
Metagenomic investigation of Dehalococcoidia used for bioremediation of groundwater and soil contaminated with chlorinated ethenes and ethanes

by

Olivia Molenda

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Department of Chemical Engineering and Applied Chemistry

University of Toronto

© Copyright by Olivia Molenda 2018

ABSTRACT

Chlorinated aliphatic hydrocarbons (CAHs) such as vinyl chloride (VC) and perchloroethene (PCE) are persistent, toxic groundwater and soil pollutants. Bioremediation using mixed microbial cultures such as KB-1 and WBC-2 has proven to be a technically viable and economically feasible remediation strategy. These cultures include members of Dehalococcoides and Dehalogenimonas, genera from the class Dehalococcoidia, which use reductive dehalogenases (RDases) to dechlorinate CAHs in an obligatory respiratory process. Metagenomic DNA sequencing was conducted on the KB-1 and WBC-2 enrichment cultures to describe the diversity of microorganisms living in the cultures, and to close genomes of key organisms. Ten new complete genomes of Dehalococcoidia are presented here. Their reductive dehalogenase genes and their evolutionary history are described. Dehalococcoides mccartyi in KB-1 was used as an example to study mobile DNA, with a particular focus on the possible exchange of reductive dehalogenase genes. Extrachromosomal circular elements were discovered in D. mccartyi, and found to replicate in a circular intermediate state separate from the genome. Simultaneously, the VC reductase gene (vcrA)-containing genomic island was also found to periodically replicate in a circular intermediate state. Twelve new prophages and their interaction with CRISPR systems carried by D. mccartyi were described. Additionally, a reductive dehalogenase from Dehalogenimonas was identified and biochemically characterized using a combination of blue-native polyacrylamide gel electrophoresis followed by enzymatic assays and peptide sequencing. In summary, this work has nearly doubled the number of Dehalococcoidia genomes available in NCBI, has led to the discovery of new dehalogenases and new types of mobile DNA, and has begun to elucidate a mechanism for lateral gene transfer in D. mccartyi. Together, this knowledge will help better manage contaminated sites and design more effective bioremediation approaches.

ACKNOWLEDGEMENTS

I wanted to study bioremediation since high school when I first heard about it. I can’t believe how lucky I am for landing in the Edwards lab at U of T. I can’t thank my supervisor Elizabeth Edwards enough, who taught me so much more than the science I came for. Elizabeth is an incredible person who is kind, patient, so smart, enthusiastic, and she created the incredible BioZone, where multiple biology-oriented chemical engineers can share and collaborate. The Edwards lab itself was always filled with many friends and mentors, especially Line and Susie who none of us would be able to complete degrees without, and everyone who has filled the Lab Project Manager role: Mel, Christina, Angelika, Katrina and now Vinthiya. Thanks to all Edwards lab students over the years I have been here, especially to Alfredo Perez de Mora and Mel Duhamel who made time for me when I was new, and to Shuiquan Tang for training me on everything and answering all of my questions even long after his graduation. Thanks Cheryl, Fei, Sarah, Nigel, Nadia, Ivy, Luz, Peter, Sean, Suzanna and Tommy for all the friendship, laughs, conference partying and the science too. To my family and friends: Thank you for supporting me, encouraging me, and always asking me “When do you graduate?” even though the answer always was a couple months from now for at least 2 years. Herb, thank you for being my favourite brother and even for reimaging my computer with Ubuntu without my permission which in the end set me up to do all the bioinformatics I did here. Parents: you are the ones who made me a biologist (thanks Mom) and an engineer (thanks Dad). The last and most important thank you goes to the love of my life, Michael, I can’t wait to start this next chapter with you.

Table of Contents

ACKNOWLEDGEMENTS
Table of Contents
List of Tables
List of Figures
List of Appendices
List of Abbreviations
Chapter 1 Introduction and Background
1.1 Chlorinated ethenes and ethanes as groundwater pollutants
1.2 Bioremediation and organohalide respiring bacteria
1.3 OHRB and reductive dehalogenase enzymes
1.4 The role of molecular biology in bioremediation
1.5 Dechlorinating consortia KB-1® and WBC-2
1.6 Rationale and research objectives
1.7 Thesis outline and Publication Status
1.8 Statement of Authorship
Chapter 2 Metagenomic sequencing of KB-1 enrichment cultures
2.1 Introduction
2.2 Methods
2.3 Nucleotide sequence accession numbers
2.4 Results and Discussion
2.5 Conclusions
Chapter 3 Metagenomic sequencing of the WBC-2 enrichment cultures
3.1 Introduction
3.2 Methods
3.3 Results and Discussion
3.4 Conclusions
Chapter 4 Extrachromosomal circular elements targeted by CRISPR-Cas in Dehalococcoides mccartyi are linked to mobilization of reductive dehalogenase genes
4.0 Introduction
4.1 Methods
4.2 Results and Discussion
4.3 Conclusions
Chapter 5 Vinyl chloride reductase (vcrA) containing genomic island circularizes in Dehalococcoides mccartyi found in the KB-1 and WBC-2 microbial consortia
5.1 Introduction
5.2 Methods
5.3 Results and Discussion
5.4 Conclusions
Chapter 6 Identification of new trans-dichloroethene reductive dehalogenase, TdrA, found in Dehalogenimonas sp. WBC-2
6.1 Introduction
6.2 Methods
6.3 Results and Discussion
6.4 Conclusions
Chapter 7 Summary, Significance and Future Work
7.1 Summary and Significance
7.2 Future Work
REFERENCES
Appendix A: Supplemental Information for Chapter 2
Appendix B: Supplemental Information for Chapter 3
Appendix C: Supplemental Information for Chapter 4
Appendix D: Supplemental Information for Chapter 5
Appendix E: Supplemental Information for Chapter 6
Appendix F: List of co-authored publications
Appendix References

List of Tables

Table 2-1. General features of Dehalococcoides mccartyi genomes closed from KB-1 trichloroethene (TCE), 1,2-dichloroethane (1,2-DCA) and vinyl chloride (VC) enrichment cultures compared to type strain 195.
Table 2-2. Number of mutations incurred since Most Recent Common Ancestor (MRCA) of D. mccartyi and select reductive dehalogenase genes. myr - million years.
Table 2-3. Common features of orthologous groups (OGs) of reductive dehalogenases from D. mccartyi.
Table 3-1. Nucleotide pairwise identity of three Dehalococcoides partial 16S rRNA sequences found in WBC-2 compared with 16S rRNA sequences from three Dehalococcoides mccartyi representative genomes from NCBI. Cornell – strain CG4, Pinellas – strain BTF08, Victoria – strain GY50. Best hits for each OTU are highlighted.
Table 3-2. Homologous gene clustering of Dehalococcoidia pangenome from 24 Dehalococcoides mccartyi and 5 Dehalogenimonas genomes.
Table 3-3. Summary of contents of reductive dehalogenase (RdhA) containing homologous protein clusters. Clusters are named A to AO.
Table 4-1. Summary of D. mccartyi CRISPR-Cas system targets.
Table 4-2. General features of mobile DNA found in Dehalococcoides mccartyi targeted by CRISPR-Cas system.
Table 4-3. Summary of PCR amplification of CRISPR I-E and I-C array from different KB-1 enrichment cultures over time. NT - not tested. Gel photos in Figure C1.
Table 5-1. Proportion of time culture was found to have equal to or greater than 2 copies of vcrA for every D. mccartyi 16S rRNA gene copy as determined by qPCR.
Table 6-1. Protein normalized dechlorination activity (nmol·min⁻¹·mg protein⁻¹) from tDCE2/EL_2011_A and VC/EL_2010 WBC-2 enrichment cultures.

List of Figures

Figure 1-1. Cross section of a Dehalococcoides mccartyi cell membrane containing the hypothetical organohalide respiratory protein complex adapted from (30).
Figure 1-2. Summary overview of thesis structure and the relationships between studies presented. Additional experiments were conducted as a result of findings from DNA sequencing.
Figure 2-1. Lineage of KB-1 enrichment cultures. Cultures are listed by name with in-laboratory name in brackets, date of creation and electron acceptor and donor amended.
Figure 2-2. Schematic flow chart of workflow used to assemble Dehalococcoides mccartyi genomes from KB-1 metagenomes.
Figure 2-3. Overview of D. mccartyi genomes closed from three different KB-1 enrichment cultures. Each culture is labeled by electron acceptor and electron donor and the date the culture was first created.
Figure 2-4. Culture composition and Dehalococcoides mccartyi (Dhc) genomes closed from 16S rRNA amplicon sequencing, Illumina sequencing and qPCR of rdhA genes.
Figure 2-5. Phylogenetic amino acid tree of reductive dehalogenases from D. mccartyi closed genomes. Most likely tree of 100 bootstraps.
Figure 2-6. The number of new orthologous groups (OGs) of reductive dehalogenases (RDases) found with each new Dehalococcoides mccartyi genome closed available from NCBI. Single RDases with no group were added.
Figure 2-7A. Order of rdhA found in high plasticity region one (HPR 1) in twenty-two Dehalococcoides mccartyi genomes labeled by strain name. RdhA are labeled by orthologous group (OG) number.
Figure 2-7B. Order of rdhA found in high plasticity region two (HPR 2) in twenty-two Dehalococcoides mccartyi genomes labeled by strain name. RdhA are labeled by orthologous group (OG) number.
Figure 2-8. Phylogenetic tree created from an alignment of 109 concatenated core genes from Dehalococcoides mccartyi closed genomes and Dehalogenimonas closed genomes with Chloroflexi Sphaerobacter thermophilus as out-group.
Figure 2-9. Phylogenetic tree of reductive dehalogenase genes which belong to orthologous group (OG) 5, 13, 71, 15 and 34.
Figure 3-1. Composition of dechlorinating genera Dehalococcoides (Dhc), Dehalogenimonas (Dhg) and Dehalobacter (Dhb) quantified using qPCR in WBC-2 cultures enriched on different chlorinated substrates.
Figure 3-2. Number of reads of operational taxonomic units (OTU) found in WBC-2 enrichment cultures enriched on different electron acceptors.
Figure 3-3. The complete Dehalococcoides mccartyi WBC-2 genome. Inner scale shows position along the chromosome.
Figure 3-4. Correspondence analysis ordination plot of Dehalococcoidia pangenome. Points indicate clusters of homologous protein sequences (triangles).
Figure 4-1. Overview of the eight Dehalococcoides mccartyi genomes closed from the metagenomes of KB-1 enrichment cultures.
Figure 4-2. Diagram of the concatenated gene plasmid used as a qPCR standard.
Figure 4-3. Dehalococcoides mccartyi KBVC1 and KBDCA3 complete genomes and homology to related genomes.
Figure 4-4. CRISPR-Cas operon from D. mccartyi Type I-E and Type I-C. Black diamonds indicate repeats in CRISPR array, which differ based on strain (CBDB1, KBVC1, KBDCA3, GT or DCMB5).
Figure 4-5. CRISPR Cas1 maximum likelihood tree constructed from an alignment of 227 Cas1 protein sequences.
Figure 4-6. Maximum likelihood phylogenetic tree of prophages identified in D. mccartyi closed genomes including those from KB-1.
Figure 4-7. MAUVE nucleotide sequence alignment diagram between two similar KB-1 prophages. Prophage KB1/TCE-0 was found in the KB-1/TCE-MeOH culture in 2007 and prophage KBTCE1/VC2-1 was found in the same culture in 2013.
Figure 4-8. Representation of Dehalococcoides mccartyi KBVC1 CRISPR type I-E Cascade action on prophage target. D. mccartyi crRNA from the CRISPR array from sequencing information is shown in blue.
Figure 4-9. Maximum likelihood phylogenetic tree of integrative mobilizable elements (IMEs) targeted by KBVC1 and KBDCA3 CRISPR-Cas systems.
Figure 4-10. Metagenomic evidence for circular existence of integrative mobilizable elements (IMEs) in D. mccartyi.
Figure 4-11. Illustration of the circularization of the vcrA genomic island (GI). PCR reactions used to verify sequencing results are shown targeting the vcrA genomic island integrated within the genome (PCR 1) or in circular form (PCR 2).
Figure 4-12. Evidence of circularization of the vcrA genomic island. Panel a) shows vinyl chloride (VC) and ethene concentrations (left axis) and gene abundances (right axis) as VC is dechlorinated to ethene in a KB-1 sub-culture. Panels b) and c) show the agarose gel images with amplification products generated using reactions PCR 1 and PCR 2.
Figure 4-13. Quantitative PCR (qPCR) and gas chromatography (GC) tracking of two KB-1/VC-H2 cultures during routine growth. Graphs A and B are two of triplicate cultures studied. The third culture can be found in Figure 4-12.
Figure 4-14. Evidence for the extension of D. mccartyi KBVC1 CRISPR array and new targets acquired. (a) Agarose gel showing PCR product of CRISPR array from KB-1/VC-H2 amplified from DNA extracted over time. DNA extracted in different years was stored at -80 °C. (b) Schematic of partial CRISPR array in 2016 compared to 2002 indicating the addition of three new spacers as determined from sequencing of PCR products. Repeats are shown as black diamonds. Blue rectangles indicate spacer match to prophage. Yellow rectangles indicate spacer match to D. mccartyi IME1. (c) New spacers have best matches to two different D. mccartyi prophages indicated with black arrows. Prophages are annotated using same abbreviations as in Fig. 4-6 and Table C4.
Figure 5-1. Illustration of the circularization of the vcrA genomic island (GI). PCR reactions used to verify sequencing results are shown targeting the vcrA genomic island integrated within the genome (PCR 1) or in circular form (PCR 2 and PCR 3).
Figure 5-2. Quantitative PCR (qPCR) and gas chromatography (GC) tracking of two KB-1/VC-H2 cultures (Sa and Sb) during routine growth.
Figure 5-3. Quantitative PCR (qPCR) and gas chromatography (GC) tracking of two WBC-2/VC-EL (Wa and Wb) cultures during routine growth.
Figure 5-4. A) and B) Agarose gel (1% TAE) of PCR amplified region of circular vcrA genomic island from DNA taken from KB-1/VC-H2 parent culture, KB-1/VC-H2 Ha, KB-1/TCE-MeOH and WBC-2/VC-EL. C) Quantitative PCR of the same DNA used in A) and B) PCR reactions targeting D. mccartyi (Dhc) 16S rRNA and vinyl chloride reductase (vcrA) genes.
Figure 6-1. Reductive dehalogenases (RDases) identified from LC-MS/MS analysis from trans-dichloroethene (tDCE) tDCE/EL_2011 enrichment culture and vinyl chloride (VC) VC/EL_2010 enrichment culture. Bars indicate total spectra assigned per protein from Band 4 and Band 5 where highest dechlorination activity was observed and most RDase hits occur. Protein coverage is indicated as a percentage on each bar.
Figure 6-2. Quantification of 16S rRNA and specific rdhA genes in various WBC-2 enrichment cultures.
Figure 6-3. The complete Dehalogenimonas WBC-2 genome.
Figure 6-4. Phylogenetic maximum likelihood amino acid tree displayed in radial form of two hundred and twenty-five reductive dehalogenases, A subunit only.

List of Appendices

Appendix A: Supplemental Information for Chapter 2
Table A1. qPCR survey of Dehalococcoides mccartyi 16S rRNA gene (Dhc), General Bacteria 16S rRNA (GenBac) and reductive dehalogenases vcrA, bvcA and tceA of four different KB-1 enrichment cultures. ND - not detected, NT - not tested.
Figure A1. Maximum likelihood phylogenetic tree (of 100 bootstraps) of nucleotide alignment of 16S rRNA in Dehalococcoides mccartyi (Dhc) strains.
Figure A2. Illustration of synteny found in KB-1 D. mccartyi genomes.
Figure A3. Phylogenetic nucleotide tree of reductive dehalogenase genes from D. mccartyi closed genomes.
Table A2. Sections 1-4. Parameters used to estimate divergence age. Parts 1-4 were used to calculate final divergence times in Table A2-4.
Table A3. Summary of correspondence analysis conducted on homologous protein clusters from Dehalococcoidia pangenome.

Appendix B: Supplemental Information for Chapter 3
Figure B1. Culture history for WBC-2 enrichment transfer cultures.
Figure B2. Composition of Dehalogenimonas (Dhg), Dehalococcoides (Dhc) and Dehalobacter (Dhb) found in WBC-2 enrichment cultures maintained on different chlorinated electron acceptors.
Table B1. Summary of correspondence analysis conducted on homologous protein clusters from Dehalococcoidia pangenome.
Figure B3. Stacked bar graph displaying distribution of Dehalogenimonas and Dehalococcoides protein sequences from pangenome analysis.
Figure B4. Correspondence analysis outputs.

Appendix C: Supplemental Information for Chapter 4
Table C1. Primers used in this study.
Table C2. General features of Dehalococcoides mccartyi genomes. Genomes closed from KB-1 derived enrichment cultures compared with other CRISPR-Cas containing strains GT, CBDB1 and DCMB5.
Table C3. Potential targets of Dehalococcoides mccartyi KBVC1, KBDCA3, CBDB1, GT and DCMB5 CRISPRs.
Table C4. Putative annotation of open reading frames (ORFs) found in Dehalococcoides mccartyi prophages and Enterobacteria lambda and HK97 phages.
Table C5. Integrative Mobilizable Elements (IMEs) found targeted by the CRISPR systems of D. mccartyi KBVC1 and KBDCA3. Nucleotide sequences available from NCBI. Dmc – Dehalococcoides mccartyi.
Table C6. IME1-like constructs found in other bacteria whose genomes are available in NCBI. MP – mega plasmid.
Table C7. Quantitative PCR (qPCR) raw data used to make Figure 4-12 & 4-13. Dhc - Dehalococcoides mccartyi. DNA extracts from 2 mL culture.
Table C8. Details of standard curves generated for qPCR including slope, efficiency, R², and Y-intercepts. Dhc - Dehalococcoides mccartyi.
Figure C1 (a-c). Gel photos of PCR amplified CRISPR I-E and I-C arrays from different KB-1 enrichment cultures over time.

Appendix D: Supplemental Information for Chapter 5
Changing donor from 5x hydrogen to 5x methanol with 3.5x ethanol increases methane production and dechlorination in the KB-1/VC-H2 progeny cultures.
Initial tracking of the number of copies of vcrA and D. mccartyi 16S rRNA during a dechlorination cycle.

Appendix E: Supplemental Information for Chapter 6
Figure E1. Image of the blue native-PAGE gels showing molecular weight ladder and first sample lane.
Figure E2. Genomic orientation of tdrA and tdrB genes in Dehalogenimonas in the 5’-3’ direction.
Figure E3. Phylogenetic maximum likelihood amino acid tree of two hundred and twenty-five reductive dehalogenases, A subunit only.
Table E1. qPCR primers used in this study.
Table E2. Initial Assays on CFEs to investigate substrate range.
Table E3. The calibration curves generated from qPCR runs using SsoFast EvaGreen qPCR reagent. E = efficiency of the reaction. Standard curve equations x = log of starting amount to y = CT (threshold cycle; cycle at which amplified products can be detected).

Appendix F: List of co-authored publications
Appendix References

List of Abbreviations

1,1-DCA    1,1-Dichloroethane
1,1-DCE    1,1-Dichloroethene
1,2-DCA    1,2-Dichloroethane
1,1,2-TCA    1,1,2-Trichloroethane
BN-PAGE    Blue-Native Polyacrylamide Gel Electrophoresis
cDCE    cis-1,2-Dichloroethene
Dhg    Dehalogenimonas
Dhb    Dehalobacter
Dhc    Dehalococcoides
DNAPL    Dense non-aqueous phase liquid
EtOH    Ethanol
GI    Genomic island
HPR    High plasticity region (in Dehalococcoides genome)
IME    Integrative and mobilizable element
LC-MS/MS    Liquid-Chromatography Tandem Mass Spectrometry
MeOH    Methanol
OHRB    Organohalide respiring bacteria
PCE    Tetrachloroethene
qPCR    Quantitative Polymerase Chain Reaction
RDase    Reductive dehalogenase
rdhA    Reductive dehalogenase homologous gene
RdhA    Reductive dehalogenase protein, function unknown
rRNA    Ribosomal Ribonucleic Acid
SNPs    Single nucleotide polymorphisms
TCE    Trichloroethene
tDCE    trans-1,2-Dichloroethene
TeCA    1,1,2,2-Tetrachloroethane
VC    Vinyl Chloride
WBC-2    West Branch Canal Creek Microbial Consortium

Chapter 1 Introduction and Background

1.1 Chlorinated ethenes and ethanes as groundwater pollutants

Manufactured chlorinated aliphatic hydrocarbons (CAHs) are widespread groundwater and soil pollutants. Their extensive use as industrial degreasers, solvents and chemical feedstocks, in combination with poor operating practices, led to their release into the environment (1). CAHs have low solubility in water and are denser than water. In the environment, CAHs sink below the water table and form high-concentration plumes; they are therefore classified as dense non-aqueous phase liquids (DNAPLs). DNAPLs pose major technical and economic challenges for clean-up (2). Notoriously recalcitrant and toxic at low levels, CAHs are the most frequently detected groundwater contaminants reported under the Superfund program (3).

Examples of CAHs include 1,1,2,2-tetrachloroethane (TeCA), perchloroethene (PCE), and vinyl chloride (VC). TeCA was widely used as a solvent, as a refrigerant (R-130) and as a chemical manufacturing intermediate before concerns arose about its toxicity. Classified as a Group C possible human carcinogen, it causes jaundice, enlarged liver, headaches, tremors, numbness, and drowsiness in humans with chronic inhalation exposure (4). PCE is used for dry cleaning of fabrics; however, some jurisdictions are discontinuing its use, for example through the California Air Resources Board ban. PCE is currently classified as a Group 2A carcinogen and a central nervous system (CNS) depressant. Furthermore, PCE is suspected of increasing the risk of Parkinson’s disease nine-fold (5). VC is a Group 1 human carcinogen, mutagen and CNS depressant. VC is the parent compound in the production of PVC and readily degrades in the atmosphere, with a half-life of 1.5 days. In soil and groundwater, VC is naturally produced in small quantities (6); however, a significant amount can be produced during bacterial degradation of PCE, trichloroethene (TCE) or 1,1,1-trichloroethane (1,1,1-TCA) (7), and VC is as persistent in soil as its parent compounds.

1.2 Bioremediation and organohalide respiring bacteria

At sites where groundwater and soil are contaminated with CAHs, bio-processes are always present and in some cases can be very effectively accelerated (8). The use of bacteria to degrade PCE dates back to the early 1950s, but only became a viable option upon the discovery of Dehalococcoides mccartyi bacteria, which are capable of dechlorinating VC, the last and most toxic daughter product of stepwise PCE dechlorination (9, 10). Bioremediation techniques can include one or a combination of natural bioattenuation, biostimulation and bioaugmentation. Biostimulation involves adding limiting nutrients to soil to stimulate native degradation, while bioaugmentation involves the addition of a living culture known to be capable of degrading site pollutants. Three decades of using bioremediation to clean up CAHs demonstrate that it is an effective method, limited primarily by the adequacy of site characterization and the ability to provide appropriate mixing during in-situ site injection (8).

Anaerobic biodegradation of CAHs can occur co-metabolically, in a process that is slow and possibly incomplete, or CAHs can be used by bacteria as terminal electron acceptors in respiration for energy conservation and growth, which proceeds at catabolic rates. Organohalide respiring bacteria (OHRB) derive energy for growth from dehalogenation. The success of bioremediation is largely due to the discovery of OHRB, which are highly specialized and effective at dehalogenation. The first OHRB identified was Desulfomonile tiedjei (11), which is capable of meeting its energy requirements by using 3-chlorobenzoate as an electron acceptor. OHRB come from many different lineages including Deltaproteobacteria (Geobacter, Desulfuromonas, Anaeromyxobacter, Desulfoluna, Desulfomonile and Desulfovibrio), Epsilonproteobacteria (Sulfurospirillum), Betaproteobacteria (Shewanella and Comamonas), Firmicutes (Dehalobacter and Desulfitobacterium) and Chloroflexi (Dehalococcoides, Dehalogenimonas and Dehalobium) (12). The Dehalococcoidia, composed of Dehalococcoides and Dehalogenimonas, are obligate organohalide respiring bacteria for which no other growth-supporting redox couples are known. Characterized members of the Dehalococcoidia are capable of anaerobically dechlorinating CAHs with two or fewer chlorine substituents and are thus key to complete detoxification. OHRB are phylogenetically diverse, ancient microbes which contribute not only to halogen degradation at contaminated sites, but also to the global halogen cycle (12-14).

1.3 OHRB and reductive dehalogenase enzymes

OHRB use reductive dehalogenase enzymes (RDases, or RdhA if uncharacterized) to catalyze dechlorination. RDases are encoded in an operon containing rdhA, rdhB, rdhC, and sometimes additional genes. RdhA has been identified as the catalytically active subunit, and RdhB has long been suspected, and was recently implicated, to act as a membrane anchor containing two or sometimes three trans-membrane helices (15-19). The RdhA protein is identified based on the presence of three motifs: a twin-arginine (TAT) membrane export sequence (RRXFXK), an eight-iron ferredoxin cluster binding motif (CXXCXXCXXXCP)₂ and various modifications of a cobalamin binding motif (DXHXXGSXLGG) (20-22). A corrinoid molecule is a critical co-factor for folding and function (23-25).
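The motif-based identification described above can be illustrated with a minimal Python sketch. It is not the annotation method used in this thesis. It assumes that each "X" in the quoted consensus strings stands for any residue, reads the subscript 2 on the ferredoxin motif as two copies, and uses a synthetic toy sequence rather than a real RdhA. Because the text notes that the cobalamin binding motif occurs in "various modifications", an exact-consensus search such as this one would likely miss genuine RdhA sequences.

```python
import re

# Consensus motifs quoted in the text. Treating "X" as "any residue"
# is an assumption made for this illustration.
MOTIFS = {
    "TAT signal": "RRXFXK",
    "Fe-S cluster binding": "CXXCXXCXXXCP",
    "cobalamin binding": "DXHXXGSXLGG",
}


def consensus_to_regex(consensus):
    """Turn a consensus string into a compiled regex (X -> any residue)."""
    return re.compile(consensus.replace("X", "."))


def scan_motifs(protein):
    """Return the 0-based start positions of each motif in the sequence."""
    hits = {}
    for name, consensus in MOTIFS.items():
        pattern = consensus_to_regex(consensus)
        hits[name] = [m.start() for m in pattern.finditer(protein)]
    return hits


# Synthetic toy sequence (hypothetical, not a real protein): one TAT
# signal, two Fe-S motifs and one cobalamin motif separated by spacers.
toy = (
    "M" + "RRAFLK" + "GGS"
    + "CAACAACAAACP" + "TT"
    + "CGGCGGCGGGCP" + "AA"
    + "DAHAAGSALGG"
)

hits = scan_motifs(toy)
for name, positions in hits.items():
    print(name, positions)
```

In practice, profile-based searches tolerate the sequence variation that a fixed pattern cannot, so this sketch is best read as a way to make the three motif definitions concrete.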

Heterologous expression of a functioning dehalogenase was only recently demonstrated (25, 26), and neither system has proven its utility for all RDases. The RDase PceA from Desulfitobacterium hafniense was expressed in active form in Shimwellia blattae, an organism capable of de novo production of cobamides with either an adeninyl moiety (pseudo-B12) or 5,6-dimethylbenzimidazole (DMB-cobamide) as the lower ligand. Subsequently, A. Parthasarathy, et al. (25) successfully expressed VcrA in an E. coli iscR strain and reconstituted it with the addition of hydroxocobalamin/adenosylcobalamin, Fe³⁺ and sulfide in the presence of mercaptoethanol. Electron paramagnetic resonance of reconstituted VcrA suggests that a reduced [4Fe-4S] cluster reduces Co(II) to Co(I) of the bound cobalamin; VC is then reduced when cob(I)alamin transfers an electron to the substrate, creating a vinyl radical intermediate (25). Recent studies have demonstrated the preference of Dehalococcoides mccartyi for a benzimidazolyl-cobamide (24). The only available structure of a respiratory reductive dehalogenase is that of PceA, a tetrachloroethene dehalogenase (PDB 4UQU) produced by Sulfurospirillum multivorans (27). The 464-residue dehalogenase can be divided into an N-terminal domain (residues 1-138), two vitamin B12 binding units (residues 139-163 and 216-323), a “letter-box” substrate channel (residues 164-215) and an iron-sulfur cluster binding unit (residues 324-394). Two such monomers combine to form one symmetrical dimer with two substrate channels.

Dehalococcoides mccartyi reductive dehalogenase A and B subunits form a unique protein complex which is currently considered to be a fully functional stand-alone respiratory chain without quinone or cytochrome involvement (28). Other than RdhAB, the protein complex includes the organohalide respiration molybdoenzyme (OmeA), formerly called the complex iron-sulfur cluster molybdoenzyme (CISM) (28) and found to have hydrogenase activity (29), its putative membrane anchor (OmeB), and the hydrogen uptake hydrogenase (HupX) with a [NiFe] large subunit (HupL) and an iron-sulfur containing small subunit (HupS). It is currently hypothesized that electrons are fed into the complex via HupL and transferred via iron-sulfur clusters to RdhA, which reduces organohalides. It is possible, but not yet tested, that this process induces conformational changes in the protein complex that directly drive proton translocation across the membrane, since D. mccartyi was not found to contain any conventional ATPases in its membrane (30) (Figure 1-1).

Figure 1-1. Cross section of a Dehalococcoides mccartyi cell membrane containing the hypothetical organohalide respiratory protein complex adapted from (30).

Prior to the recent development of heterologous expression systems (25, 26), RDases were identified using genomic inference, reverse-transcription PCR or partial biochemical characterization, often using blue native polyacrylamide gel electrophoresis (BN-PAGE) paired with liquid chromatography tandem mass spectrometry (LC-MS/MS). These studies have revealed differences in the substrate preferences of the Dehalococcoides mccartyi dehalogenases that act on chlorinated ethenes. VcrA has a preference for VC, cDCE, 1,1-DCE and TCE as well as 1,2-dichloroethane (1,2-DCA) (25, 31); TceA has activity on TCE, cDCE, trans-dichloroethene (tDCE), 1,1-dichloroethene (1,1-DCE), VC, 1,2-DCA and several brominated aliphatic acids (15); and BvcA has activity on VC, cDCE, 1,2-DCA, 1,1-DCE and tDCE (32). The dehalogenase MbrA degrades PCE to tDCE (33), and CbrA degrades 1,2,3,4-tetrachlorobenzene (1,2,3,4-TeCB) to 1,2,4-trichlorobenzene (1,2,4-TCB) (34). Three dehalogenases have been suspected of involvement in PCB degradation (35). Most recently, the dehalogenase PteA was implicated in PCE to VC degradation (36).
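The substrate ranges listed above for VcrA, TceA and BvcA can be arranged as a small lookup table, sketched here in Python as an illustration only. The sets record just the named substrates from this paragraph; the unnamed brominated aliphatic acids reported for TceA are omitted, and the table says nothing about relative rates or preference order. The function names are hypothetical.

```python
# Substrates named in the text for three D. mccartyi dehalogenases.
# Presence in a set means "activity reported", not a rate or ranking.
SUBSTRATES = {
    "VcrA": {"VC", "cDCE", "1,1-DCE", "TCE", "1,2-DCA"},
    "TceA": {"TCE", "cDCE", "tDCE", "1,1-DCE", "VC", "1,2-DCA"},
    "BvcA": {"VC", "cDCE", "1,2-DCA", "1,1-DCE", "tDCE"},
}


def rdases_acting_on(substrate):
    """List the RDases (alphabetically) reported to act on a substrate."""
    return sorted(
        name for name, subs in SUBSTRATES.items() if substrate in subs
    )


def shared_substrates(*names):
    """Return the substrates common to all of the named RDases."""
    return set.intersection(*(SUBSTRATES[name] for name in names))


print(rdases_acting_on("tDCE"))
print(rdases_acting_on("TCE"))
print(sorted(shared_substrates("VcrA", "TceA", "BvcA")))
```

Arranged this way, the overlap between the three enzymes is easy to see: on the basis of the listed substrates alone, they differ only in whether TCE and tDCE are included.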


1.4 The role of molecular biology in bioremediation

A major challenge for bioremediation is acquiring sufficient knowledge, from laboratory study, of the key microorganisms capable of degrading select contaminants, and applying this knowledge in a real remediation environment. If bioremediation goals are not met, the reasons are often not understood because of the complex biology involved, which could deter practitioners. As a result, science has turned to a multitude of molecular techniques targeted at monitoring biological activity. Advances in high-throughput DNA sequencing, gene expression and metabolism modeling have revolutionized the ability of biologists to track biological functions in the environment (37). Using polymerase chain reaction (PCR) and quantitative PCR (qPCR) to interrogate DNA extracted from soil and groundwater is a routine way to survey and track OHRB such as Dehalococcoides mccartyi and key genes coding for reductive dehalogenases. Effective molecular biology techniques are essential to bioremediation success and have the potential to fill the gap in studying microbes which cannot be cultured in a laboratory setting (38).

The first genome of Dehalococcoides mccartyi was made publicly available in 2005 (9). At the completion of this thesis, 24 genomes had been sequenced, nine of which were sequenced as part of this thesis. Dehalococcoides mccartyi are ancient microorganisms with one of the smallest genomes (1.4 Mbp) found in free-living bacteria. Initial analysis of genomes revealed two high plasticity regions (HPRs, individually HPR1 and HPR2) located on either side of the predicted origin of replication. HPRs often contain insertion sequences, repeated elements, deletions, inversions, evidence of phage infiltration and genomic islands (GIs). Central metabolic functions of D. mccartyi are located in the core genome, while RDases are typically in the HPRs, some on distinct mobile genetic elements. It is generally thought that lateral gene transfer in D. mccartyi is used to adapt to environmental conditions and different halogenated substrates (39, 40).

McMurdie et al. (40) found that the vcrA gene is on a highly conserved genomic island in all vcrA-containing D. mccartyi. The VcrA enzyme is critical for effective clean-up of sites contaminated with chlorinated ethenes, as it catalyzes the degradation of VC, a known human carcinogen, to non-toxic ethene. As such, the vcrA gene is used as a biomarker during in-situ bioremediation of PCE or TCE to ethene (41-43). Interestingly, there have been multiple reports of higher vcrA copy numbers than highly conserved, single-copy D. mccartyi 16S rRNA gene copies from field and laboratory samples (43-45), yet vcrA has never been found in any species other than D. mccartyi (45). Additionally, a high number of copies of vcrA has been found to correlate with high levels of VC rather than ethene (45). Sites at which PCE or TCE degradation results in the accumulation of VC have been a subject of debate in research, with significant economic and environmental consequences (46). At the beginning of this work, a major goal was to explain these puzzling gene copy numbers and to determine whether they reflect methodological issues or a true biological phenomenon.

1.5 Dechlorinating consortia KB-1® and WBC-2

KB-1 is a consortium originating from soil taken from a trichloroethene (TCE)-contaminated site in southern Ontario (47). Major archaeal and bacterial groups have been identified in the consortium and much is already understood about their individual roles (48). The dechlorinating organisms present include Dehalococcoides mccartyi, Geobacter and a recently identified small population of Dehalobacter. The types of dechlorinating bacteria present and their abundance vary by enrichment culture. Previous work identified the presence of multiple strains of D. mccartyi in KB-1 (49); a better understanding of these different populations is an objective of this thesis.

WBC-2 is a mixed microbial culture originally isolated from the Aberdeen Proving Ground at the West Branch Canal Creek by USGS researcher Michelle Lorah and her group (50). WBC-2 contains three major dechlorinating organisms belonging to Dehalobacter, Dehalococcoides, and Dehalogenimonas each with unique dechlorinating abilities. WBC-2 is commercially available, and marketed under the name KB-1 Plus.

At the beginning of this work, very little was known about the Dehalogenimonas genus. The Dehalogenimonas 16S rRNA sequence in WBC-2 had only been identified in 2012 (50). The first species characterized from the Dehalogenimonas was D. lykanthroporepellens (51) in 2009, whose genome was made publicly available in 2012 (52), a year after the commencement of this work. The Dehalogenimonas in WBC-2 was of particular interest because of its ability to degrade trans-1,2-dichloroethene (tDCE). This was the first time that any genus other than Dehalococcoides had been found to degrade lower chlorinated ethenes with only two halogen substituents. As a part of this work, the complete WBC-2 Dehalogenimonas genome was sequenced, closed and analysed, and the tDCE dehalogenase was identified and partially biochemically characterized (53). Subsequently, the genomes of two additional Dehalogenimonas strains were sequenced (D. alkenigignens in 2015 and D. formiexcedens in 2017), along with one draft genome of D. etheniformans (54). D. lykanthroporepellens and D. alkenigignens dihaloeliminate chlorinated aliphatic alkanes (55). BN-PAGE was used to identify the 1,2-dichloropropane (1,2-DCP) reductive dehalogenase from D. lykanthroporepellens (56), and the RDase transcripts have been identified using RT-PCR (57). D. etheniformans is the newest addition and is capable of dechlorinating TCE, 1,1-DCE and VC (54).

Study of the KB-1 and WBC-2 cultures can be used to improve existing bioremediation techniques, and also provides a platform to study bacteria in a mixed culture environment including observation of microbial interactions and population dynamics. The mixed culture environment is closer to field conditions, and allows us to culture certain bacteria which otherwise would not grow as an isolate in a laboratory setting.

1.6 Rationale and research objectives

The focus of this thesis is the KB-1 and WBC-2 mixed microbial cultures, which are capable of degrading chlorinated ethenes and ethanes and are both used for in-situ bioremediation. Special focus is placed on the class Dehalococcoidia, including Dehalococcoides mccartyi and the novel Dehalogenimonas. Prior to this work, the metabolic potentials of the Dehalococcoidia in KB-1 and WBC-2 were unrealized and very little was known about the Dehalogenimonas in general. Molecular biology techniques, which are essential to effective bioremediation, will be used to expand the knowledge of KB-1 and WBC-2, including 16S rRNA amplicon sequencing, catabolic gene PCR/qPCR, and whole metagenome and genome DNA sequencing. The use of catabolic genes as biomarkers will be revisited in order to resolve discrepancies previously observed in KB-1. Additional efforts in identifying and analyzing reductive dehalogenase enzymes from Dehalococcoidia will be made using DNA sequencing and enzymatic activity assays. This research will contribute to resolving gaps in our current understanding of Dehalococcoidia, including finding new pathways and new substrates to improve bioremediation design.

The major objectives of this work were (summarized in Figure 1-2):

1. Identify and characterize microorganisms found in KB-1 and WBC-2, especially Dehalococcoidia responsible for dechlorination.
   a. Assemble complete genomes of Dehalococcoides (Chapter 2) and Dehalogenimonas (Chapter 3).
   b. Analyze and compare Dehalococcoidia genomes, especially their reductive dehalogenase genes (Chapter 2 and 3).
   c. Identify main phylogenetic groups present using 16S rRNA amplicon sequencing (Chapter 2 and 3).
2. Identify the active reductive dehalogenase enzymes that are responsible for observed dechlorinating activity in the mixed cultures.
   a. Find the enzyme responsible for tDCE hydrogenolysis to VC in Dehalogenimonas (Chapter 6).
   b. Identify evolutionary origins of reductive dehalogenase enzymes in the Dehalococcoidia and their impact on substrate range (Chapter 2 and 6).
3. Investigate the role of lateral gene transfer in shaping Dehalococcoides mccartyi genomes and influencing dechlorinating ability (Chapter 4 and Chapter 5).

Figure 1-2. Summary overview of thesis structure and the relationships between studies presented. Additional experiments were conducted as a result of findings from DNA sequencing.

1.7 Thesis outline and Publication Status

Chapter 1 – Introduction and Background

Chapter 2 – Metagenomic sequencing of KB-1 enrichment cultures

Chapter 2 describes the DNA sequencing conducted on the KB-1 enrichment cultures. Eight new genomes of Dehalococcoides mccartyi were closed and annotated from three KB-1 cultures enriched on different electron acceptors. The genomes were compared using multivariate analysis, gene clustering, time-since-divergence analysis and gene mapping. The chapter has been written in preparation for publication in BMC Genomics as an original research article, and is currently available as a pre-print from BioRxiv, doi: https://doi.org/10.1101/345173.

Chapter 3 – Metagenomic sequencing of the WBC-2 enrichment cultures

This chapter describes the DNA sequencing conducted on the WBC-2 enrichment cultures including closing a D. mccartyi genome, written into a genome announcement which has been published (58) and a complete Dehalogenimonas genome which was also published as part of a larger research article (59).

Chapter 4 – Extrachromosomal circular elements targeted by CRISPR-Cas in Dehalococcoides mccartyi are linked to mobilization of reductive dehalogenase genes.

During the assembly of D. mccartyi genomes from KB-1 we found small circular pieces of DNA sequence which were not part of the greater genome sequence. The purpose of this chapter was to identify these circular pieces of DNA, and any other mobile DNA found. The outcome was a paper, currently accepted in ISMEJ, describing mobile DNA, its interaction with CRISPR-Cas systems and its link to the lateral gene transfer of reductive dehalogenases.

Chapter 5 – Vinyl chloride reductase (vcrA) containing genomic island circularizes in Dehalococcoides mccartyi from the KB-1 and WBC-2 consortia

Building on findings presented in Chapter 4, this chapter focuses on a mobile DNA element containing the genes encoding the vinyl chloride reductase (vcrA). We investigated whether circularization of the vcrA-GI could be induced in a laboratory setting by disrupting culturing conditions. Parts of this chapter have been added to the paper generated from Chapter 4 on CRISPR. These data were presented at the Aug. 2016 ISME conference (poster) and at the March 2017 DehaloconII conference (oral presentation).

Chapter 6 – Identification of new trans-dichloroethene reductive dehalogenase, TdrA found in Dehalogenimonas sp. WBC-2

The Dehalogenimonas found in WBC-2 grows by dechlorinating tDCE to VC. The dehalogenase responsible was identified using BN-PAGE and LC-MS/MS and named TdrA, short for trans-dichloroethene reductive dehalogenase A catalytic subunit. This work was published in Applied and Environmental Microbiology (59).

1.8 Statement of Authorship

Chapter 2 Metagenomic sequencing of KB-1 enrichment cultures

Excerpted from: Eight new genomes of organohalide-respiring Dehalococcoides mccartyi reveal evolutionary trends in reductive dehalogenases

Authors: Olivia Molenda1, Shuiquan Tang2, Line Lomheim1 and Elizabeth A. Edwards1,3

Affiliations: Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada1

Zymo Research, Irvine, CA, USA2

Department of Cell and Systems Biology, University of Toronto3

Contributions: EAE and OM conceived of the experiments. ST closed genomes with OM. OM annotated genomes and analyzed the data. LL conducted the qPCR on KB-1 cultures and 16S rRNA amplicon analysis. OM, EAE and LL contributed to the preparation of the manuscript.

Publication Information: manuscript prepared for BMC Genomics, currently available from BioRxiv Molenda et al. 2018 BioRxiv 345173 doi.org/10.1101/345173.

Chapter 3 Metagenomic sequencing of the WBC-2 enrichment cultures

Excerpted from two publications: (1) Complete genome sequence of Dehalococcoides mccartyi strain WBC-2, capable of anaerobic reductive dechlorination of vinyl chloride. (2) Dehalogenimonas sp. Strain WBC-2 genome and identification of its trans-dichloroethene reductive dehalogenase TdrA.

Authors: (1) Olivia Molenda1, Shuiquan Tang2 and Elizabeth A. Edwards1,3

(2) Olivia Molenda1, Shuiquan Tang2, Andrew Quaile and Elizabeth A. Edwards1,3

Affiliations: Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada1

Zymo Research, Irvine, CA, USA2

Department of Cell and Systems Biology, University of Toronto3

Contributions: EAE and OM conceived of the experiments. Cheryl Devine trained/supervised OM on metagenome assembly and binning, ST trained/supervised OM on genome closing. OM collected and analyzed the data.

Publication Information: Molenda et al. (2016) Appl. Environ Microbiol. 82:1,40-50. And Molenda et al. (2016) Genome Announc. 4:6, e01375-16.

Chapter 4 Extrachromosomal circular elements targeted by CRISPR-Cas in Dehalococcoides mccartyi are linked to the lateral transfer of reductive dehalogenases

Excerpted from: Extrachromosomal circular elements targeted by CRISPR-Cas in Dehalococcoides mccartyi are linked to the lateral transfer of reductive dehalogenases

Authors: Olivia Molenda1, Shuiquan Tang2, Line Lomheim1, Vasu Guatam3, Sofia Lemak1, Alexander F. Yakunin1, Karen L. Maxwell3 and Elizabeth A. Edwards1,4,*

Affiliations: Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada1

Zymo Research, Irvine, CA, USA2

Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada3

Department of Cell and Systems Biology, University of Toronto4

Contributions: OM conceived of and conducted the experiments. ST closed genomes with OM and found the first IME1 sequence. LL helped prepare DNA for sequencing. VG and KLM annotated the prophages. OM, EAE, ST, KLM, SL and SLY contributed to the preparation of the manuscript.

Publication Information: Accepted (June 19, 2018) in ISMEJ.

Chapter 5 Vinyl chloride reductase (vcrA) containing genomic island circularizes in Dehalococcoides mccartyi from the KB-1 and WBC-2 microbial consortia

Authors: Olivia Molenda1, Shuiquan Tang2, Line Lomheim1 and Elizabeth A. Edwards1,3

Affiliations: Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada1

Zymo Research, Irvine, CA, USA2

Department of Cell and Systems Biology, University of Toronto3

Contributions: OM conceived of and conducted the experiments. ST closed genomes with OM. OM polished/annotated the genomes and analysed the data. LL conducted qPCR on DNA sent for sequencing.

Publication Information: Not yet prepared for publication. New experiments in progress.

Chapter 6 Identification of new trans-dichloroethene reductive dehalogenase, TdrA, found in Dehalogenimonas sp. WBC-2

Excerpted from: Dehalogenimonas sp. strain WBC-2 genome and identification of its trans-dichloroethene reductive dehalogenase, TdrA.

Authors: Olivia Molenda1, Andrew Quaile1 and Elizabeth A. Edwards1,2

Affiliations: Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada1

Department of Cell and Systems Biology, University of Toronto2

Contributions: EAE and OM conceived of the experiments. OM collected and analyzed the data. AQ ran the LC-MS/MS and helped develop an in house protocol for in gel extraction and digestion of protein.

Publication Information: Molenda et al. (2016) Appl. Environ Microbiol. 82:1,40-50.

Chapter 2 Metagenomic sequencing of KB-1 enrichment cultures

Posted as pre-print on the BioRxiv Preprint Server for Biology. Copyright © CC-BY-NC 4.0 International License. Molenda et al. doi: https://doi.org/10.1101/345173. June 2018.

2.1 Introduction

Bioaugmentation using mixed microbial consortia capable of reductive dechlorination is commonly used in attenuating chlorinated aliphatic hydrocarbons (CAHs) in groundwater and soil. Exposure to CAHs is of public concern due to their known toxicity and/or carcinogenicity (60). The efficacy of in-situ bioaugmentation to transform contaminants such as perchloroethene (PCE) and downstream products such as trichloroethene (TCE), cis-dichloroethene (cDCE) and vinyl chloride (VC) to ethene is already well established (47, 61). Anaerobic reductive dechlorination of these contaminants, especially those with two or fewer chlorine substituents, is attributed to anaerobic bacteria from the class Dehalococcoidia, including Dehalococcoides mccartyi (62, 63) and Dehalogenimonas (53, 54). The KB-1 mixed microbial culture is well-suited for this purpose, containing multiple strains of Dehalococcoides mccartyi capable of complete detoxification (61, 64).

D. mccartyi are uniquely adapted for CAH degradation by having reductive dehalogenase enzymes to respire CAHs in an obligatory process required for growth. These bacteria have small genomes (~1.4 Mbp) containing a core syntenic region encoding "housekeeping" functions such as biosynthesis of amino acids, cell components, transcription/translation, nutrient transport and energy conservation (39, 65). The differences between strains in this region are no more than single nucleotide polymorphisms (SNPs). This stable core genome is interrupted by two variable regions flanking the origin, commonly referred to as High Plasticity Regions (HPRs, individually HPR1 and HPR2). HPRs show signs of recombination including repeats, duplication events, insertion sequences, genomic islands and phage-related sequences, and hold the majority of the reductive dehalogenases used for organohalide respiration. The primary type of recombination event observable in the HPRs is site-specific recombination, which involves the reciprocal exchange of DNA between defined DNA sites (66). D. mccartyi genomes contain as many as ten recombinases per genome, in contrast to Escherichia coli K-12 which has six despite a genome which is ~3.5 times larger. Prokaryotes also use recombination to change gene expression by manipulating regulatory sequences relative to coding sequences (67, 68), or while accepting DNA from conjugation (69). Several characterized dehalogenases lie on genomic islands with evidence of site-specific integration, such as the VC reductase genes vcrA and bvcA (40) and the TCE reductase gene tceA (70). It is generally thought that recombination events have allowed D. mccartyi to adapt to naturally-occurring and anthropogenic halogenated compounds (39, 40).

D. mccartyi are ancient, tiny organisms, having diverged from their most recent Dehalococcoidia ancestor anywhere from 40,000 to 400,000 years ago (40). The circumstances which led to the reduced genome we observe in D. mccartyi today can be explained by the genome streamlining hypothesis (71-73). Small genome sizes could also arise as a by-product of niche specialization due to increased rates of mutation (74), or as a result of gene loss (75). The evolution of genomes in general is thought to be dominated by long-term reduction and simplification, with brief episodes of complexification in response to environmental conditions (76). Evidently, D. mccartyi must have experienced long-term reduction and simplification to achieve its current state.

Dehalococcoides and Dehalococcoides-like Chloroflexi are present in both contaminated and uncontaminated environments, playing a role in the global halogen cycle unrelated to releases of organohalides from anthropogenic activities (77). However, some reductive dehalogenase genes found in the genomes of D. mccartyi have very few mutations at the nucleotide or amino acid level, suggesting that they have not been a part of D. mccartyi genomes for as long. Over time, a gene under strong selective pressure will acquire more synonymous than non-synonymous mutations as a result of purifying selection. The vcrA gene encoding the vinyl chloride reductase has very few mutations, being 98% conserved (amino acid level) across eight different strains of D. mccartyi, suggesting that it was recently laterally distributed across populations, possibly in response to industrial activities and the release of chlorinated ethenes into the environment (40).

While many dehalogenases can be identified from any particular D. mccartyi genome, only a select few have been found expressed in response to different CAHs (31, 78). As a result, only a few dehalogenases have been characterized out of the over five hundred sequences that are in NCBI. To classify newly found dehalogenases whose functions are unknown, a naming system has been developed that groups dehalogenases with upwards of 90% pairwise amino acid identity into sets of orthologous rdhA (i.e. Ortholog Groups, or OGs) which likely share the same function (14, 53). In the current state of knowledge surrounding dehalogenases, the vast majority of identified sequences from metagenomic and genomic sequencing have no known function.

Several strains of D. mccartyi, including the type strain 195 (10), GT (79) or BAV1 (80) have been obtained in pure culture. Perez-de-Mora et al. (49) found that in KB-1, a consortium routinely used for bioaugmentation, multiple strains of D. mccartyi were present despite years of enrichment on a single chlorinated substrate. The first objective of this study was to identify the different strains of D. mccartyi in a set of related KB-1 subcultures amended with different chlorinated electron acceptors. From three cultures, 8 new genomes were closed, increasing the total number of publicly available D. mccartyi genomes to 24. The second objective was to compare these genomes to identify key trends in genome evolution, DNA recombination and specifically in the evolution of dehalogenase genes in this species.

2.2 Methods

2.2.0 Enrichment cultures analysed

The KB-1 cultures were originally enriched from aquifer materials at a TCE-contaminated site in southern Ontario as previously described (47, 61, 80). The parent enrichment culture, referred to as KB-1/TCE-MeOH, has been maintained in a glass 2 L bottle (0.7 L headspace) in anaerobic mineral medium amended with 100 mg/L TCE as electron acceptor and methanol (MeOH) as electron donor, added at 5x the electron equivalents (eeq) required for complete dechlorination, as previously described (80-82). TCE is completely dechlorinated to ethene prior to refeeding, approximately every 2-4 weeks. Acetogens in the mixed culture ferment the added methanol to the hydrogen and acetate required by D. mccartyi. In 2001, a sub-culture was created with a 2% transfer into pre-reduced anaerobic mineral medium and maintained on VC and hydrogen as described in (80); it is referred to as KB-1/VC-H2. This VC enrichment (200 mL) was maintained in a 250 mL glass bottle sealed using a Teflon Mininert cap and was amended with 55 mg/L VC (supplied as 5 mL pure VC gas) and 5x eeq hydrogen gas (supplied as 5 mL 80% H2:20% CO2, Praxair). The KB-1/VC-H2 enrichment bottle was also amended with 0.5 mM sodium acetate as a carbon source every ten feedings. In 2003, a 1,2-dichloroethane (1,2-DCA)-fed enrichment culture was created (KB-1/1,2-DCA-MeOH culture), fed with 250 mg/L 1,2-DCA and 5x MeOH in a 2 L glass Pyrex bottle (containing 1.6 L culture with 0.7 L headspace). See Figure 2-1 below for a culturing summary. Approximately every 6 months, half of each culture was removed and replaced with fresh pre-reduced anaerobic mineral medium to replenish vitamins and buffer. In all enrichment cultures, chlorinated ethenes/ethanes, methane and ethene concentrations were monitored using gas chromatography with a flame ionization detector (FID) (Agilent 7890A GC system, G1888 auto-sampler, helium used as carrier gas, packed inlet, Agilent GSQ-Plot column 0.53 mm x 30 m), calibrated with external standards.
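The 5x eeq donor dosing described above can be illustrated with a short calculation. The following Python sketch is not taken from the culture protocols: it assumes two electrons per hydrogenolysis step, three steps from TCE to ethene, and six electron equivalents per mole of methanol fully oxidized to CO2. All function and constant names are illustrative.

```python
# Illustrative electron-equivalent (eeq) bookkeeping for donor dosing.
# Assumptions (not from the thesis): stoichiometries and names below.
TCE_MW_G_PER_MOL = 131.39      # molar mass of trichloroethene, C2HCl3
ELECTRONS_PER_STEP = 2         # each hydrogenolysis step consumes 2 e-
TCE_STEPS_TO_ETHENE = 3        # TCE -> cDCE -> VC -> ethene
MEOH_EEQ_PER_MOL = 6           # CH3OH + H2O -> CO2 + 6 H+ + 6 e-


def acceptor_demand_meeq_per_l(conc_mg_per_l, mw_g_per_mol, steps):
    """Electron demand (meeq/L) for complete dechlorination of the acceptor."""
    conc_mmol_per_l = conc_mg_per_l / mw_g_per_mol  # mg/L over g/mol = mmol/L
    return conc_mmol_per_l * steps * ELECTRONS_PER_STEP


def donor_dose_mmol_per_l(demand_meeq_per_l, excess=5.0,
                          donor_eeq_per_mol=MEOH_EEQ_PER_MOL):
    """Donor concentration (mmol/L) supplying `excess` times the demand."""
    return excess * demand_meeq_per_l / donor_eeq_per_mol


demand = acceptor_demand_meeq_per_l(100.0, TCE_MW_G_PER_MOL, TCE_STEPS_TO_ETHENE)
methanol_mM = donor_dose_mmol_per_l(demand)
print(f"TCE demand: {demand:.2f} meeq/L; methanol at 5x: {methanol_mM:.2f} mM")
```

Under these assumptions, 100 mg/L TCE corresponds to roughly 4.6 meeq/L of electron demand, so a 5x excess is about 3.8 mM methanol.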

Figure 2-1. Lineage of KB-1 enrichment cultures. Cultures are listed by name with in-laboratory name in brackets, date of creation and electron acceptor and donor amended. Cultures which are highlighted in coloured boxes were sent for Illumina sequencing (mate-pair and paired-end for all except KB-1/cDCE-MeOH where only paired-end was done) and 454 16S rRNA amplicon sequencing. All sequencing was conducted in 2013.

2.2.1 Metagenomic sequencing and genome assembly

DNA for metagenome sequencing was extracted from larger samples (40-615 mL) taken from the functionally stable enrichment cultures shown in Figure 2-1: KB-1/VC-H2 (40 mL culture sample), KB-1/TCE-MeOH (500 mL sample), KB-1/cDCE-MeOH (300 mL culture) and KB-1/1,2-DCA-MeOH (615 mL sample). Extractions were conducted between February and May, 2013. Cultures were filtered using Sterivex™ filters (Millipore 0.2 µm) and the DNA was extracted using the CTAB method (JGI bacterial genomic DNA isolation using CTAB protocol v.3). DNA was sequenced at the Genome Quebec Innovation Sequencing Centre using Illumina HiSeq 2500 technology. Paired-end sequencing with an insert size of ~400 bp and read length of ~150 bp provided roughly 50 million reads per culture. Additional mate-pair sequencing with an insert size of ~8000 bp and read length of ~100 bp was conducted for the KB-1/TCE-MeOH and KB-1/1,2-DCA-MeOH cultures, where we had more DNA. For metagenomic sequencing using short-read Next Generation Sequencing (NGS), we have demonstrated the utility of long-insert mate-pair data in resolving challenges in metagenomic assembly, especially those related to repeat elements and strain variation (83). In this study, we applied a combination of Illumina mate-pair and paired-end sequencing data. Although long-read sequencing technologies (e.g. PacBio SMRT sequencing, Nanopore sequencing, Illumina Synthetic Long-Read Sequencing Technique and 10x Genomics) are available, Illumina mate-pair sequencing is a cost-effective choice for obtaining high sequencing depth and accuracy together with long-distance mate-pair links. Raw sequences were trimmed with Trimmomatic (84) to remove bases of low quality and to remove adapters.

The D. mccartyi genomes were assembled in six steps as described below and illustrated in Figure 2-2. In Step 1, we generated unitigs from the Illumina paired-end data using the ABySS assembler (85). These unitigs were the main building blocks in the assembly of the complete genomes. ABySS assemblies generate both unitigs and contigs. Unlike contigs, unitigs are generated solely by overlapping k-mers and their assembly does not utilize the paired-end constraints. As a result, the maximum overlapping length between unitigs is the k-mer size minus one. When using ABySS to assemble metagenomic data, we used the maximum k-mer size allowed, 96 bp, since the raw read length of 150 bp was much longer than 96 bp. When configuring ABySS runs, it was critical to utilize the -c parameter, which specifies a cut-off: the minimum k-mer depth/coverage used in the assembly. Sequences/unitigs with k-mer coverage lower than this cut-off are ignored in the assembly, which gives good assemblies of high-abundance organisms because the interference caused by low-abundance organisms (especially close relatives) and by sequencing errors is removed. It is important to make sure that the k-mer depth of the sequences of the target genomes is higher than this threshold so that all sequences/unitigs needed to close the target genomes are retained. For example, if the average k-mer depth of the target genome is 100, try 20 for the -c cut-off. We used a combination of 16S rRNA amplicon sequencing and qPCR to get an idea of the relative abundances of our target organisms in our metagenome prior to attempting different ABySS assemblies.
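The choice of cut-off in Step 1 can be sketched numerically. The Python snippet below is illustrative only: the conversion from base depth to k-mer depth is the standard relationship for error-free reads (depth x (L - k + 1) / L), which is an assumption not stated in the text, and the 20% fraction is simply generalized from the single example above (a cut-off of 20 for a depth of 100). Function names are hypothetical.

```python
def kmer_depth(base_depth, read_len, k):
    """Expected k-mer depth for a given base (read) depth.

    Each read of length L contributes L - k + 1 k-mers, so k-mer depth is
    lower than base depth by the factor (L - k + 1) / L.
    """
    if k > read_len:
        raise ValueError("k cannot exceed the read length")
    return base_depth * (read_len - k + 1) / read_len


def suggest_cutoff(target_kmer_depth, fraction=0.2):
    """Heuristic minimum k-mer coverage: a fraction of the target's depth.

    fraction=0.2 mirrors the worked example in the text; it is not a
    general recommendation.
    """
    return max(1, round(target_kmer_depth * fraction))


# 150 bp reads and k = 96, as used for the KB-1 metagenomes.
depth_k = kmer_depth(base_depth=100, read_len=150, k=96)
print(f"k-mer depth at 100x base depth: {depth_k:.1f}")
print(f"cut-off for a target k-mer depth of 100: {suggest_cutoff(100)}")
```

With 150 bp reads and k = 96, each read yields only 55 k-mers, so k-mer depth is roughly 37% of base depth. This is why the cut-off should be chosen from k-mer depth rather than read depth.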

In Step 2, we generated a genome-wide reference sequence for the target genome, which was subsequently used to guide the scaffolding of unitigs. This reference sequence can be obtained in different ways. If long-distance mate-pair data are available, as they were for the KB-1/TCE-MeOH and KB-1/1,2-DCA-MeOH cultures, this reference sequence can be built de novo. We used two ways to build it: (1) using a standalone scaffolding program, SSPACE v. 2.0 (86), to generate scaffolds from ABySS contigs/unitigs utilizing the mate-pair constraints, and (2) using ALLPATHS-LG (87) to generate the assembly with both paired-end and mate-pair data as inputs. ALLPATHS-LG turned out to be the most effective way in most cases. A publicly available, closely related closed genome might also be able to serve as a reference genome to guide scaffolding of unitigs in the next step.

One major challenge in metagenomic assembly is cross-interference between closely related genomes, such as strains of the same species. The mixture of similar and dissimilar sequence between these closely related genomes tends to break the assembly. If a genome had a closely related genome interfering in its assembly, we attempted to assemble the genome with ALLPATHS-LG using both short-insert paired-end data and long-insert mate-pair data. For genomes that have no closely related genomes, a surprisingly effective way to assemble is to combine Digital Normalization (88) with ALLPATHS-LG. This approach first reduces the redundancy of the raw sequences by k-mer with Digital Normalization, and the resulting data are then assembled with ALLPATHS-LG. In our case, we had multiple D. mccartyi strains in each metagenome and could not use Digital Normalization. ALLPATHS-LG was able to differentiate our similar strains because their abundances were distinct.

In Step 3, we used the best assembly generated from Step 2 to guide the scaffolding of unitigs generated from Step 1. The scaffolding process is based on sequence comparison between the unitigs and the reference assembly by BLAST. After that, unitigs that have a k-mer depth significantly lower or higher than the average k-mer depth of the genome are removed. The basic assumption here is that unitigs with k-mer depth higher than average likely belong to repetitive sequences (such as rRNA gene operons and transposons) and unitigs with a k-mer depth lower than average are more likely to be strain specific. In other words, only unitigs with k-mer depth around average (we used 90%-110% of average) are kept; these unitigs are likely shared by closely related genomes. After that, the gap distance between neighbouring unitigs is estimated based on the reference assembly. In brief, this process generates a scaffold consisting of unitigs shared by all closely related strains, which serves as a backbone for subsequent gap resolution.
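The depth filter in Step 3 can be sketched as follows. This Python example is a simplified illustration and not the script used in this work: unitig names and depths are made up, and the average k-mer depth of the genome is passed in as a known value.

```python
def classify_unitigs(unitigs, average_depth, low=0.90, high=1.10):
    """Split unitigs into backbone, putative repeats and putative
    strain-specific sequence using k-mer depth relative to the average.

    unitigs: iterable of (name, kmer_depth) pairs (hypothetical format).
    low/high: bounds of the retained window (90%-110% in the text).
    """
    backbone, repeats, strain_specific = [], [], []
    for name, depth in unitigs:
        ratio = depth / average_depth
        if ratio > high:
            repeats.append(name)          # e.g. rRNA operons, transposons
        elif ratio < low:
            strain_specific.append(name)  # present in only some strains
        else:
            backbone.append(name)         # likely shared by all strains
    return backbone, repeats, strain_specific


# Toy data: depths are invented for illustration.
toy_unitigs = [("u1", 100.0), ("u2", 310.0), ("u3", 42.0),
               ("u4", 95.0), ("u5", 109.0)]
backbone, repeats, strain_specific = classify_unitigs(toy_unitigs, 100.0)
print("backbone:", backbone)
print("repeats:", repeats)
print("strain-specific:", strain_specific)
```

Only the backbone unitigs are carried into the scaffold; the others are set aside and become candidates for filling gaps in Step 4.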

In Step 4, we identify all potential solutions for all gaps between unitigs in the scaffold. This step is performed by filling the gaps with the remaining unitigs mostly based on sequence overlap between unitigs; we published a similar process previously (83). In the updated script, we have improved the process by incorporating paired-end and mate-pair link information between unitigs to help guide the searching process. The paired-end and mate-pair links were obtained by mapping raw reads against unitigs. Solutions identified this way fulfill the constraints of sequence overlap, paired-end links and mate-pair links. If there are multiple solutions to a gap and they have k-mer depth lower than the average, this suggests the presence of strain variation. In the end, this step generates a closed assembly, having some gaps with multiple solutions in cases of strain variation.

In Step 5 we bin these multiple solutions caused by strain variation to different genomes based on sequencing depth or k-mer depth. For example, if there are always two solutions, one with a k-mer depth of 60 and the other with a k-mer depth of 40, we assign all solutions with the higher depth to one strain and the rest to the other strain. This approach is unfeasible if the two strains happen to have similar abundance and therefore similar sequencing depth. Things become more complicated when there are more than two strains; in such cases, we only try to resolve the genome of the highest abundance by gathering solutions of highest k-mer depth. The editing of the genome sequences is facilitated by the use of Geneious v. 6.1 (89). Finally, in Step 6 we polish the assembled genome by mapping raw reads back to the final assembly. SNPs caused by strain variation are identified. If possible, they are resolved based on abundance, on the same principle as using k-mer depth to assign alternative solutions in Step 5.
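The depth-based binning in Step 5 can be sketched in a few lines. The Python example below is a hypothetical illustration, not the script used in this work: gap and solution names are invented, and the `min_ratio` threshold used to flag gaps whose alternative solutions have depths too similar to separate is an assumption introduced here.

```python
def bin_solutions_by_depth(gap_solutions, min_ratio=1.2):
    """Assign alternative gap solutions to strains by k-mer depth rank.

    gap_solutions: dict mapping gap id -> list of (solution id, kmer depth).
    Returns (bins, unresolved): bins maps rank (1 = most abundant strain)
    to {gap id: solution id}; unresolved lists gaps whose neighbouring
    depths differ by less than min_ratio (an assumed threshold).
    """
    bins, unresolved = {}, []
    for gap, solutions in gap_solutions.items():
        ranked = sorted(solutions, key=lambda s: s[1], reverse=True)
        depths = [depth for _, depth in ranked]
        too_close = any(hi / lo < min_ratio
                        for hi, lo in zip(depths, depths[1:]))
        if too_close:
            unresolved.append(gap)
            continue
        for rank, (solution, _) in enumerate(ranked, start=1):
            bins.setdefault(rank, {})[gap] = solution
    return bins, unresolved


# Toy example echoing the 60 versus 40 depth case in the text.
toy_gaps = {
    "gap1": [("g1_a", 60), ("g1_b", 40)],
    "gap2": [("g2_a", 41), ("g2_b", 58)],
    "gap3": [("g3_a", 50), ("g3_b", 49)],  # depths too similar to assign
}
strain_bins, unresolved_gaps = bin_solutions_by_depth(toy_gaps)
print(strain_bins)
print(unresolved_gaps)
```

In this toy case the higher-depth solutions of gap1 and gap2 are assigned to the most abundant strain, while gap3 is left unresolved, mirroring the limitation noted above for strains of similar abundance.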

In all cases multiple genomes could be closed from a single enrichment culture because the different populations of D. mccartyi were at different abundances (as inferred from read depth) at the time of sampling. Two complete genomes, each containing a vinyl chloride reductase gene (vcrA), were closed from the KB-1/VC-H2 culture. The naming convention used here distinguishes the KB-1 lineage (KB-1), the electron acceptor (in this case vinyl chloride, or VC) and the relative abundance (number 1 for highest abundance and so on), naming the strains from the KB-1/VC-H2 culture D. mccartyi strains KBVC1 and KBVC2. Three genomes, each containing bvcA, were closed from the KB-1/1,2-DCA enrichment culture and are referred to as strains KBDCA1, KBDCA2 and KBDCA3. Two genomes, each containing tceA, were closed from the KB-1/TCE-MeOH culture: strains KBTCE2 and KBTCE3. A complete D. mccartyi genome containing a vcrA gene was also assembled from the KB-1/TCE-MeOH culture and named strain KBTCE1. In all cases, low-abundance strains of D. mccartyi could not be assembled, implying that although eight genomes were closed, the total number of KB-1 D. mccartyi strains is at least eleven. The genomes were annotated using the RAST (90) and BASys (91) servers; results were manually inspected and corrected where required. Additional searches for conserved domains were conducted using NCBI conserved domain search (E-value threshold of 0.01). The origin of replication was identified using Oriloc in R (92).

2.2.3 Alignments and phylogenetic trees

A core gene alignment was created by aligning a set of 109 core genes found in Dehalococcoides mccartyi (22 strains), Dehalogenimonas lykanthroporepellens, D. alkenigignens, Dehalogenimonas sp. WBC-2 and a Chloroflexi out-group Sphaerobacter thermophilus. Core genes are defined as orthologous genes which are present in all genomes analyzed. Core genes were identified using reciprocal BLASTp followed by manual inspection. Each gene was aligned using muscle v. 3.8.3.1 (93) with default settings. The alignments were concatenated to create one long alignment (138,334 bp long, 26 sequences, 83% pairwise identity). A maximum likelihood (ML) tree was built using RAxML (94) plugin in Geneious 8.1.8 (89) with GTR gamma nucleotide substitution model and 100 bootstrap replicates. The best scoring ML tree was chosen as the final tree.
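The concatenation of per-gene alignments into one long alignment can be sketched as follows. This is a minimal illustration, assuming each core gene alignment is held as a dictionary from genome name to aligned sequence; the function name and data layout are hypothetical, and the actual concatenation in this work was done within Geneious.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments into one long alignment per genome.

    gene_alignments: list of dicts, one per core gene, mapping genome name ->
        aligned sequence (gaps as '-'). Core genes are present in every genome,
        so each dict is expected to hold the same set of genome names.
    """
    if not gene_alignments:
        return {}
    genomes = sorted(gene_alignments[0])
    parts = {g: [] for g in genomes}
    for i, aln in enumerate(gene_alignments):
        if sorted(aln) != genomes:
            raise ValueError(f"gene {i}: genome set differs from the first gene")
        # Within one gene, all aligned sequences must have the same length.
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"gene {i}: aligned sequences differ in length")
        for g in genomes:
            parts[g].append(aln[g])
    return {g: "".join(chunks) for g, chunks in parts.items()}
```

Keeping the genes in a fixed order for every genome is what guarantees that each column of the concatenated alignment still compares homologous positions.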

Five hundred and fifty-one rdhA sequences were selected to create a nucleotide phylogenetic tree using Geneious 8.1.8. These included all rdhA which have been assigned an ortholog group (OG) number from the RDase database, all rdhA from three Dehalogenimonas (lykanthroporepellens, alkenigignens and sp. WBC-2) and a reductive dehalogenase from Desulfoluna spongiiphila as the out-group. D. spongiiphila is a reductively dehalogenating, anaerobic, sulfate-reducing bacterium isolated from a marine sponge (95). The alignment and tree building were conducted using muscle and RAxML as described above. FigTree 1.4.2 was used to visualize and further edit the tree to generate figures in this study (http://tree.bio.ed.ac.uk/software/figtree/).

We compared the ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates. The Ka/Ks ratio can be used to identify positive selection. If all non-synonymous mutations are either neutral or deleterious, then Ka/Ks < 1, while if Ka/Ks > 1, then positive selection occurred (96). Ka/Ks ratios were calculated using http://services.cbu.uib.no/tools/kaks.
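The interpretation rule stated above can be written down as a short sketch. It assumes Ka and Ks have already been estimated (here by the web tool cited above); the function names and the wording of the returned labels are illustrative assumptions.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks given pre-computed substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def interpret_ka_ks(ratio):
    """Apply the rule used in the text: only Ka/Ks > 1 indicates positive selection."""
    if ratio > 1:
        return "positive selection"
    return "no evidence of positive selection"
```

For example, Ka = 0.01 with Ks = 0.10 gives a ratio well below 1, the pattern expected when non-synonymous changes are neutral or deleterious.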

2.2.4 Quantitative PCR (qPCR) Analysis

Quantitative polymerase chain reaction (qPCR) was used to estimate the abundance of rdhA and D. mccartyi sequences in each of the sequenced cultures. DNA samples were diluted 10, 50 or 100 times with sterile UV-treated distilled water (UltraPure), and all subsequent sample manipulations were conducted in a PCR cabinet (ESCO Technologies, Hatboro, PA). Each qPCR reaction was run in duplicate. Four Dehalococcoides genes were targeted by qPCR: 1) the phylogenetic 16S rRNA gene for Dehalococcoides, with primers Dhc1f (5’-GATGAACGCTAGCGGCG-3’) and Dhc264r (5’-CCTCTCAGACCAGCTACCGATCGAA-3’) (97); 2) the vinyl chloride reductase gene, vcrA, with primers vcrA642f (5’-GAAAGCTCAGCCGATGACTC-3’) and vcrA846r (5’-TGGTTGAGGTAGGGTGAAGG-3’) (98); 3) the bvcA dehalogenase gene, with primers bvcA318f (5’-ATTTAGCGTGGGCAAAACAG-3’) and bvcA555r (5’-CCTTCCCACCTTGGGTATTT-3’) (98); and 4) the tceA dehalogenase gene, with primers tceA500f (5’-TAATATATGCCGCCACGAATGG-3’) and tceA795r (5’-AATCGTATACCAAGGCCCGAGG-3’) (78). Samples were also analysed using general bacteria 16S rRNA primers GenBac1055f (5’-ATGGCTGTCGTCAGCT-3’) and GenBac1392r (5’-ACGGGCGGTGTGTAC-3’) (99). Each qPCR run was calibrated by constructing a standard curve using known concentrations of plasmid DNA containing the gene insert of interest. The standard curve was run with 8 concentrations, ranging from 10 to 10⁸ gene copies/µL. All qPCR analyses were conducted using a CFX96 real-time PCR detection system, with a C1000 thermo cycler (Bio-Rad Laboratories, Hercules, CA). Each 20 µL qPCR reaction was prepared in sterile UltraPure distilled water containing 10 µL of EvaGreen® Supermix (Bio-Rad Laboratories, Hercules, CA), 0.5 µL of each primer (forward and reverse, each from 10 µM stock solutions), and 2 µL of diluted template (DNA extract or standard plasmids). The thermocycling program was as follows: initial denaturation at 95 °C for 2 min, followed by 40 cycles of denaturation at 98 °C for 5 s, annealing at 60 °C (for the 16S rRNA, vcrA and bvcA genes), 58 °C (for tceA) or 55 °C (for general bacteria), followed by extension for 10 s at 72 °C. A final melting curve analysis was conducted at the end of the program. R² values were 0.99 or greater and efficiency values were 80-110%.
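The standard-curve calibration can be illustrated with a short sketch. It assumes the usual linear model Cq = slope × log10(copies) + intercept and the commonly used efficiency definition E = 10^(-1/slope) - 1; the text does not state how the instrument software computed these values, so the functions below are an illustration with hypothetical names rather than the procedure actually used.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared, efficiency), where an efficiency of
    1.0 corresponds to 100 % (perfect doubling per cycle).
    """
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in cq)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, r_squared, efficiency


def copies_from_cq(cq_value, slope, intercept):
    """Invert the standard curve to estimate gene copies for a sample Cq."""
    return 10 ** ((cq_value - intercept) / slope)
```

A run would then be accepted under the criteria given above when `r_squared` is at least 0.99 and `efficiency` lies between 0.80 and 1.10.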

2.2.7 Amplicon Sequencing and Analysis

For microbial community analysis, amplicon sequencing was performed on extracted DNA, which was amplified by PCR using general primers for the 16S rRNA gene. The universal primer set, 926f (5’-AAACTYAAAKGAATTGACGG-3’) and 1392r (5’-ACGGGCGGTGTGTRC-3’), targeting the V6-V8 variable region of the 16S rRNA gene from bacteria and archaea, as well as the 18S rRNA gene from Eukaryota, was used (100). The purified PCR products were sent to the McGill University and Genome Quebec Innovation Centre, where they were checked for quality again, pooled and subjected to unidirectional sequencing (i.e., Lib-L chemistry) of the 16S gene libraries, using the Roche GS FLX Titanium technology (Roche Diagnostics Corporation, Indianapolis, IN). One to three independent 100 µL PCR amplification reactions were performed per sample. Each PCR reaction was set up in sterile UltraPure H2O containing 50 µL of PCR mix (Thermo Fisher Scientific, Waltham, MA), 2 µL of each primer (forward and reverse, each from 10 µM stock solutions), and 4 µL of DNA extract. PCR reactions were run on an MJ Research PTC-200 Peltier Thermal Cycler (Bio-Rad Laboratories, Hercules, CA) with the following thermocycling program: 95 °C, 3 min; 25 cycles of 95 °C 30 s, 54 °C 45 s, 72 °C 90 s; 72 °C 10 min; final hold at 4 °C (modified from (101)). The forward and reverse primers included adaptors (926f: CCATCTCATCCCTGCGTGTCTCCGACTCAG and 1392r: CCTATCCCCTGTGTGCCTTGGCAGTCTCAG), and the reverse primer also included 10 bp multiplex identifiers (MID) for distinguishing multiple samples pooled within one sequencing region. The PCR products were verified on a 2% agarose gel and replicates were combined and purified using the GeneJET™ PCR Purification Kit (Fermentas, Burlington, ON), according to the manufacturer’s instructions. The concentrations of PCR products were determined using a NanoDrop ND-1000 Spectrophotometer at a wavelength of 260 nm (NanoDrop Technologies, Wilmington, DE). The concentrations and qualities of the final PCR products were also evaluated by running them on 2% agarose gels, and comparing band intensities to those from a serial dilution of ladders with known DNA concentrations.

2.2.8 Taxonomic assignments of 16S rRNA amplicon sequences

The raw DNA sequences obtained from the sequencing center were processed using the Quantitative Insights Into Microbial Ecology (QIIME v1.5.0) pipeline (102) with default settings, unless stated otherwise. Only sequences of length between 300 and 500 bp, and with homopolymers shorter than 8 bases, were processed for downstream analysis. After filtering, sequences were de-multiplexed into respective samples based on their individual MID. Sequences were further clustered into distinct 16S rRNA gene-based Operational Taxonomic Units (OTUs) using the UCLUST algorithm (103), a similarity threshold of 0.97 and the Greengenes database (version 13.5) (104). Taxonomy was assigned to each OTU by the Ribosomal Database Project (RDP) classifier (105).
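The length and homopolymer criteria can be expressed as a small stand-alone filter. This sketch is independent of QIIME and is only meant to make the thresholds concrete; the function name is hypothetical, and treating the 300 and 500 bp bounds as inclusive is an assumption.

```python
import re


def passes_filter(seq, min_len=300, max_len=500, max_homopolymer=7):
    """Return True if a read meets the length and homopolymer criteria.

    Homopolymers must be shorter than 8 bases, so the longest run allowed
    is max_homopolymer = 7.
    """
    if not (min_len <= len(seq) <= max_len):
        return False
    # A run of (max_homopolymer + 1) or more identical bases fails the filter.
    too_long = re.compile(r"(.)\1{%d,}" % max_homopolymer)
    return too_long.search(seq) is None
```

A 400 bp read with a run of seven identical bases is kept, while the same read with a run of eight is discarded.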

2.3 Nucleotide sequence accession numbers

KB-1 Dehalococcoides mccartyi closed genome nucleotide accession numbers in the National Center for Biotechnology Information (NCBI): strain KBDCA1 CP019867, strain KBDCA2 CP019868, strain KBDCA3 CP019946, strain KBVC1 CP019968, strain KBVC2 CP019969, strain KBTCE1 CP019999, strain KBTCE2 CP019865, and strain KBTCE3 CP019866.

16S rRNA amplicon sequences have been deposited in NCBI in the short-read archive (SRA) accession no. SRP144609 as part of bioproject no. PRJNA376155.

2.4 Results and Discussion

2.4.0 General features of D. mccartyi KB-1 genomes

Eight complete genomes of D. mccartyi strains were assembled (Figure 2-2) and annotated from three different enrichment cultures (Table 2-1) using Illumina mate-pair and paired-end metagenomic sequencing in combination with 16S rRNA amplicon sequencing and qPCR of functional rdhA genes to guide assembly. In all KB-1 enrichments we found lower-abundance, fragmented D. mccartyi contigs which could not be assembled into complete genomes, indicating that the three KB-1 sub-cultures have at least eleven strains of D. mccartyi. The strains whose genomes we could assemble have been named based on the contaminant/electron acceptor amended and a number indicating rank abundance with respect to other strains found in that same enrichment culture (Figure 2-3). D. mccartyi in general have 98% sequence similarity across the 16S rRNA gene (106) (Appendix Figure A1) and fall into three clades known as the Pinellas, Victoria and Cornell groupings (107). KB-1 strains (KBVC1, KBVC2, KBDCA1, KBDCA2, KBDCA3 and KBTCE1) readily fall into the Pinellas clade (containing strains CBDB1 (65), BTF08 (108), DCMB5 (108), 11a5 (109), WBC-2 (53), GT (79), IBARAKI (110) and BAV1), while two (KBTCE2 and KBTCE3) fall into the Cornell clade (containing strains 195, MB (33) and CG4 (35)) (Appendix Figure A1). All genomes have a clear GC skew common in bacteria with one origin of replication. The majority of reductive dehalogenase genes continue to be found flanking the origin of replication, primarily coded on the leading strands.

Strains KBVC1 and KBDCA3 each contain a complete CRISPR-Cas system (111), similarly to published strains 11a (112), CBDB1 (65), DCMB5 (108) and GT (79). Strains KBTCE2 and KBTCE3 contain nitrogen fixation genes, similarly to strain 195 (113). The number of putative reductive dehalogenase genes (rdhA) varies from five to twenty-two. In general, these new genomes have some of the features which have already been seen in other D. mccartyi strains. However, here we present the smallest D. mccartyi genome closed to date, that of strain KBTCE3 (1.27 Mbp), together with the smallest number of rdhA found in a D. mccartyi genome (only five in KBTCE2 and KBTCE3) and the highest GC content (49.3% in KBTCE3) (Table 2-1).

Figure 2-2. Schematic flow chart of workflow used to assemble Dehalococcoides mccartyi genomes from KB-1 metagenomes.

Table 2-1. General features of Dehalococcoides mccartyi genomes closed from KB-1 trichloroethene (TCE), 1,2-dichloroethane (1,2-DCA) and vinyl chloride (VC) enrichment cultures compared to type strain 195.

Feature                                 KBVC1       KBVC2       KBTCE1      KBTCE2      KBTCE3      KBDCA1      KBDCA2      KBDCA3      195 (Reference)
Genome size (Mbp)                       1.39        1.35        1.39        1.33        1.27        1.43        1.39        1.34        1.47
G+C content (%)                         47.3        47.2        47.3        49.1        49.3        47.4        47.5        47.6        48.9
Protein coding genes                    1468        1432        1451        1381        1319        1496        1462        1404        1582
Hypothetical genes (%)                  31.1        30.2        30.3        29.1        26.8        32.8        31.9        29.1        34.4
tRNA                                    47          48          47          45          45          47          46          46          46
CRISPR-Cas genes                        7           0           0           0           0           0           0           6           0
Nitrogen fixation genes                 0           0           0           9           9           0           0           0           10
Serine recombinases                     2           0           0           5           2           5           4           2           2
Tyrosine recombinases                   2           4           4           5           5           5           5           2           2
Sub-group/Clade                         Pinellas    Pinellas    Pinellas    Cornell     Cornell     Pinellas    Pinellas    Pinellas    Cornell
Electron acceptor provided to culture   VC          VC          TCE         TCE         TCE         1,2-DCA     1,2-DCA     1,2-DCA     PCE
rdhA genes                              22          16          16          5           5           7           7           9           18
Identifiable rdhA*                      VcrA, PceA  VcrA, PceA  VcrA, PceA  TceA        TceA        BvcA        BvcA        BvcA        TceA, PceA
NCBI accession number                   CP019968    CP019969    CP019999    CP019865    CP019866    CP019867    CP019868    CP019946    NC002936

* Identifiable rdhA indicates the presence of an rdhA gene whose protein product was characterized in a different study. It is not known whether these are currently expressed by these strains.

Figure 2-3. Overview of D. mccartyi genomes closed from three different KB-1 enrichment cultures. Each culture is labeled by electron acceptor and electron donor and the date the culture was first created. Trichloroethene (TCE), vinyl chloride (VC), and 1,2-dichloroethane (1,2-DCA) were the electron acceptors for each enrichment culture. Genomes which could be closed are identified by a name indicating electron acceptor and rank abundance as determined from read depth. Reductive dehalogenase homologous genes (rdhA) are marked on each genome, coloured by orthologous (OG) group. HPR- High plasticity region, individually HPR1 and HPR2. Dotted line in genome indicates blocks of rdhA missing.

2.4.1 Amplicon sequencing of four KB-1 enrichment cultures

The microbial diversity found in the KB-1 consortium and derived sub-cultures has been studied since its first enrichment from TCE-contaminated soils in 1996 (48, 82). In this study we used 16S rRNA amplicon sequencing to confirm the community composition of four different cultures, each amended with the same amount of electron donor (5x methanol or 5x hydrogen gas) and different chlorinated substrates including TCE, cDCE, VC and 1,2-DCA. The main roles of both dechlorinating and non-dechlorinating organisms have been well established in the KB-1/TCE-MeOH enrichment culture, with D. mccartyi being responsible for dechlorination of TCE and all daughter products to ethene, and Geobacter sp. capable of stepwise dechlorination of PCE to cDCE (31, 48). Non-dechlorinating organisms such as acetogens/fermenters (Acetobacterium, Spirochaetaceae, Synergistales), methanogens (Methanoregula and Methanomethylovorans) and Firmicutes (Sporomusa) degrade methanol to hydrogen or methane and some provide key nutrients such as corrinoid cofactors (48) (Figure 2-3). While individual organisms carrying out a particular function have been known to vary in relative abundance (114), community-level functioning remains consistent. Different techniques have been previously used to track microbial diversity within KB-1 cultures (qPCR, metagenome sequencing (48), and shot-gun metagenomic microarray (115)), identifying a similar set of genera over many decades, which is also reflected in the results presented here (Figure 2-4, Appendix Table A1).

The main purpose of this round of 16S rRNA amplicon sequencing was to assist with the assembly of the mate-pair and paired-end sequencing data generated in order to close D. mccartyi genomes. Better assemblies produced longer contigs, which aided in genome closing. Figure 2-4 shows the combined data from 16S rRNA amplicon sequencing, metagenomic Illumina sequencing and qPCR used in concert to close D. mccartyi genomes.

Figure 2-4. Culture composition and Dehalococcoides mccartyi (Dhc) genomes closed from 16S rRNA amplicon sequencing, Illumina sequencing and qPCR of rdhA genes. DNA was extracted from four KB-1 enrichment cultures. The same DNA was split and analysed using 16S rRNA amplicon sequencing (bar charts), Illumina mate-pair and paired-end assembly and genome closing (genomes listed by rank abundance and strain name if closed), and surveyed for rdhA genes using qPCR. Genes found above the detection limit are listed. See Appendix Table A1 for tabular qPCR results.

2.4.2 Reductive dehalogenase genes in Dehalococcoides mccartyi and conserved synteny

Hug et al. (2013) (14) developed a classification system for reductive dehalogenases in which sequences were assigned to orthologous groups (OGs) based on ≥ 90% amino-acid pairwise identity, in an attempt to cluster rdhA sequences into groups with activity on the same specific halogenated electron acceptors (53) (Figure 2-5). The database is available on Google Drive (https://drive.google.com/drive/folders/0BwCzK8wzlz8ON1o2Z3FTbHFPYXc) and is user-updated to include new strains. Previous studies identified 31 distinct rdhA genes in the KB-1/TCE-MeOH culture (48, 49). Although 104 RdhA sequences were found in this round of DNA sequencing of KB-1, only three new OGs could be described. The remaining RdhA fell into already identified groupings (Figure 2-5). In general, as more D. mccartyi genomes are closed, fewer dehalogenases are found that do not already belong to an OG (Figure 2-6); at this time there are 84 OGs.
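Grouping sequences by a 90% identity threshold can be sketched as below. The sketch assumes pairwise amino-acid identities have already been computed and uses single-linkage grouping; the linkage rule, the function name and the data layout are assumptions made for illustration and may differ from the criteria applied when curating the RDase database.

```python
def cluster_by_identity(names, identity, threshold=90.0):
    """Group sequences whose pairwise identity is >= threshold (single linkage).

    names: list of sequence names.
    identity: dict mapping frozenset({a, b}) -> percent amino-acid identity
        for two distinct sequences; pairs that are absent count as below
        the threshold.
    Returns a sorted list of sorted groups, singletons included.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for pair, pid in identity.items():
        if pid >= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

Sequences left in a group of size one correspond to the singletons mentioned in the figure captions, that is, RdhA without an OG number.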

Whole genome alignments of Dehalococcoides mccartyi generally result in strong core-genome region alignments (~90% nucleotide pairwise identity), with poor alignments of two regions flanking the origin, deemed the high plasticity regions (HPR) (<30% pairwise nucleotide identity). Analysing twenty-four closed genomes has revealed that the differences between HPRs of different strains are not only a result of recombination, but also a result of gene loss (Figure A2). This is also true for rdhA genes found in each strain. In fact, a conserved sequential order of rdhA exists which is the same in all D. mccartyi strains (Figure 2-7A HPR1, Figure 2-7B HPR2). For example, at the end of HPR2 the most common order of RdhA ortholog groups is 5’-40:30:34:11:10:17-3’. Some strains have all of these OGs (CG5, BTF08, CBDB1, KBVC1, GT, KBVC2, KBTCE1, 11a5) while certain strains only have a select few, such as IBARAKI 5’-30:11-3’, CG4 5’-10:17-3’ or GY50 5’-40:30:10:17-3’ (Figure 2-7B).
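The idea that strain-specific rdhA orders arise from a consensus order by gene loss alone can be checked with a subsequence test. The sketch below uses the HPR2 example given above; the function name is hypothetical and the test ignores strand and singleton rdhA.

```python
def follows_consensus_order(strain_ogs, consensus):
    """True if strain_ogs can be obtained from consensus by deleting genes only,
    i.e. the OGs present appear in the same relative order as in consensus."""
    remaining = iter(consensus)
    # "og in remaining" advances the iterator, so each match must lie
    # downstream of the previous one.
    return all(og in remaining for og in strain_ogs)


consensus_hpr2 = [40, 30, 34, 11, 10, 17]
ibaraki = [30, 11]
cg4 = [10, 17]
gy50 = [40, 30, 10, 17]
```

All three example strains pass the test against `consensus_hpr2`, whereas a hypothetical order such as 11:30 would fail and would instead point to a rearrangement.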

Figure 2-5. Phylogenetic amino acid tree of reductive dehalogenases from D. mccartyi closed genomes. Most likely tree of 100 bootstraps. Scale indicates number of substitutions per site. Orthologous groups (OGs) of dehalogenases with upwards of 90% amino acid identity are highlighted and identified by number. OGs containing a functionally characterized representative are annotated by dehalogenase name. RdhA sequences are coloured by genome they originated from. RdhA are named by NCBI locus tag. D. mccartyi strain name is indicated before locus tag unless included in locus tag.

[Plot: x-axis, number of genomes closed (0 to 25); y-axis, number of new OGs found (0 to 90).]

Figure 2-6. The number of new orthologous groups (OGs) of reductive dehalogenases (RDases) found with each new Dehalococcoides mccartyi genome closed available from NCBI. Single RDases with no group were added.

Figure 2-7A. Order of rdhA found in high plasticity region one (HPR 1) in twenty-two Dehalococcoides mccartyi genomes labeled by strain name. RdhA are labeled by orthologous group (OG) number. RdhA of the same OG share >90% amino acid pairwise identity. RdhA without a group number (i.e. singletons) are not included due to space limitations. Green and red accents indicate which DNA strand the gene is located on, green being leading strand clockwise from the oriC. The majority of rdhA are on the leading strand. * found in all D. mccartyi strains.

Figure 2-7B. Order of rdhA found in high plasticity region two (HPR 2) in twenty-two Dehalococcoides mccartyi genomes labeled by strain name. RdhA are labeled by orthologous group (OG) number. RdhA without a group number are not included due to space limitations. RdhA of the same OG share >90% amino acid identity. Green and red accents indicate which DNA strand the gene is located on, green being leading strand clockwise from the oriC. The majority of rdhA are on the leading strand. HPR2 starts at tRNA-Leu/Arg/Val approximately 1.2 Mbp from the origin. OG 35 only in strain 195 after tRNA-Ala 1.3Mbp from the origin. + OG known to be expressed during starvation.

2.4.3 Trends in Dehalococcoides rdhA: acquisition, loss and evolution

For the general condition where rdhA occur according to the consensus order, these rdhA are typically preceded by MarR-type regulators, such as the MarR-type regulator Rdh2R (cbdb1456), found to suppress downstream rdhA expression in CBDB1 (116). These rdhA are defined as syntenic since their location does not vary between genomes. Syntenic OGs are also older than the Dehalococcoides clade speciation event (Figure 2-8). The same variations in nucleotides are conserved within a particular clade, which results in the clade-specific branching seen in Figure 2-9 (detailed examples) and Figure A3 (larger nucleotide rdhA tree). Additionally, the number of nucleotide mutations observed dates their most recent common ancestral sequence earlier than when clade separation occurred (Table 2-2). Syntenic OG alignments have small Ka/Ks ratios, suggesting that while a small fraction of mutations are deleterious, the rest are neutral, meaning they are not currently under selective pressure. The first, and most common, trend observed from sequence information is a consensus order of rdhA genes which are ancestral and not in current use (Table 2-3).

In four instances there is evidence for positive selection on RdhA. For example, OG 13 and OG 71 occur in the same position in the genome, but each strain carries only one member of the pair. OG 13 (Figure 2-9), which is found in the Pinellas and Victoria clades, is highly similar to OG 71, found in the same location in the genome only in Cornell strains (Figure 2-9). In this case OG 13 dehalogenases have small Ka/Ks ratios, while OG 71 dehalogenases have high ratios, suggesting a positive selection event. Conditions experienced by Cornell strains likely led to the use and specialization of this dehalogenase to local conditions. Therefore, the second trend in dehalogenases in D. mccartyi is the evolution of new dehalogenases from existing ones (Table 2-3).

Three rdhA are thought to have been acquired laterally by D. mccartyi, including tceA (OG5, Figure 2-8) (70), bvcA (OG28) (39), and vcrA (OG8) (39, 40), due to their location on genomic islands and very high nucleotide and amino acid sequence conservation among strains which span vast geographical distances. Additionally, cbrA (OG53), mbrA (OG52) and pteA (OG16) also have very few mutations and occur in the vicinity of mobile elements. McMurdie et al. (2011) established that the vcrA gene nucleotide polymorphisms indicate that it was acquired by Dehalococcoides approximately 1000 years ago, possibly earlier, after the Dehalococcoides clade speciation event. All rdhA associated with mobility genes also have few mutations within their OG, similarly to vcrA, suggesting that they were acquired in a similarly recent time frame (Table 2-2). All of these mobile OGs have members which have been biochemically characterized due to their connection with industrial pollutant degradation.

Four OGs display movement within a genome, rather than between genomes (as described above). Victoria clade strains VS, CG1 and GY50 contain examples of an OG occurring twice in the same genome, suggesting a duplication event. Duplicated OGs occur in different HPRs, not in tandem (Figure 2-7). In one case, movement of rdhA appears to have occurred within a genome without duplication, as in the case of pceA (OG30). OG 30 is syntenic in HPR2 in 14 strains, with the exception of strain 195 where it occurs in HPR1 (Figure 2-7). In strain 195, pceA is located near a serine recombinase which possibly mediated its movement. Additionally, strain 195 pceA is present without its usual upstream transcriptional regulator thought to be responsible for rdhA regulation (116, 117). Transcriptomic studies show that strain 195 will continue to produce high transcript levels of pceA regardless of starvation or TCE amendment (118). Additionally, 195 is the only strain to express pceA in the presence of PCE. Other strains such as CG5 transcribe pceA in the presence of multiple PCB congeners (35) and CBDB1 was found to transcribe pceA in the presence of 2,3-DCP (78, 119). Thus, the third observable trend in dehalogenase evolution is mobility, either within or between strains (Table 2-3).

Figure 2-8. Phylogenetic tree created from an alignment of 109 concatenated core genes from Dehalococcoides mccartyi closed genomes and Dehalogenimonas closed genomes with Chloroflexi Sphaerobacter thermophilus as out-group. Most likely tree of 100 bootstraps. Bottom scale shows timing of key events including clade separation in D. mccartyi and the most recent common ancestor (MRCA) of several dehalogenases listed by name or by ortholog group if uncharacterized. D. mccartyi clades are highlighted in common colour. Scale indicates number of substitutions per site. Double cross-hatching indicates this branch was reduced in length by half for visualization purposes.

Figure 2-9. Phylogenetic tree of reductive dehalogenase genes which belong to orthologous group (OG) 5, 13, 71, 15 and 34. Most likely nucleotide tree displayed from 100 bootstraps. Scale shows number of substitutions per site. A trichloroethene dehalogenase from Desulfitobacterium (Dhaf_0696) used as out-group. The rdhA in the tree are coloured by clade: blue (Pinellas), green (Victoria), red (Cornell) and identified with strain name followed by locus tag of rdhA in parentheses. Dehalogenimonas homologous rdhA are shown in black.

Table 2-2. Number of mutations incurred since the Most Recent Common Ancestor (MRCA) of D. mccartyi and select reductive dehalogenase genes. myr, million years. Sample calculation in Table A2.

Ancestral node                    Method and reference          # mutations   Divergence time (myr)
Dehalococcoidia MRCA              Core genes tree (Fig. 2-7)    8.6e05        0.28-3.21
D. mccartyi MRCA (clades split)   Core gene tree (Fig. 2-7)     8.4e04        0.03-0.31
OG 13 and OG 71 MRCA              rdhA gene tree                1.5e05        0.05-0.58
OG 15 MRCA                        rdhA gene tree                9.6e04        0.03-0.36
OG 34 (duplicate) MRCA            rdhA gene tree                7.3e04        0.02-0.27
OG 5 (tceA) MRCA                  rdhA gene tree                5.9e03        0.00-0.22
OG 8 (vcrA) MRCA                  rdhA gene tree (40)           5.0e03        0.00-0.19
OG 28 (bvcA) MRCA                 rdhA gene tree                2.8e03        0.00-0.10
OG 52 (mbrA) MRCA                 rdhA gene tree                3.5e03        0.00-0.13
OG 53 (cbrA) MRCA                 rdhA gene tree                2.0e03        0.00-0.07
OG 16 (pteA) MRCA                 rdhA gene tree                1.0e03        0.00-0.04

Table 2-3. Common features of orthologous groups (OGs) of reductive dehalogenases from D. mccartyi.

Type 1
OG #s: 10, 11, 15, 17, 19, 21, 22, 23, 30, 48, 55, 60, 66
General features: Present in more than one clade. Found in D. mccartyi before clades separated. Synteny within genomes conserved. OG 15 illustrated in Fig. 2-8.
Protein information: OG 15 known to be expressed by D. mccartyi during starvation (31, 120). OG 23 found in all D. mccartyi genomes at time of analysis.

Type 2
OG #s: 13/71, 18/40, 32/69, 33/38
General features: Pairs of OG groups which have recently diverged from one another. Location of rdhA is syntenic across all genomes. OG 13 and 71 illustrated in Fig. 2-8.
Protein information: No characterized members at time of study.

Type 3
OG #s: 5, 8, 16, 28, 52, 53
General features: Highly similar in genomes regardless of clades. Often associate with mobile genes, or on genomic island. OG 5 illustrated in Fig. 2-8.
Protein information: 5 – TceA (15); 8 – VcrA (121); 16 – PteA (109); 28 – BvcA (32); 52 – MbrA (33); 53 – CbrA (122).

Type 4
OG #s: 26, 34, 57
General features: Duplicated within the same genome. OG 34 illustrated in Fig. 2-8.
Protein information: No characterized members at time of study.

2.4.4 Gene loss and genomic streamlining in Dehalococcoides mccartyi

D. mccartyi genomes are unique in that they are amongst the smallest genomes found in free-living bacteria (avg. 1.4 Mbp and 1451 protein-coding genes). A common theme amongst small free-living prokaryotes is their high niche specialization and low-nutrient environments (123). The D. mccartyi genomes required an extensive period of time to become as specialized and small as they currently are. Wolf et al. 2013 (76) present the theory that gene loss is equally or even more important than lateral gene transfer in shaping genomes. The high level of synteny and the number of mutations found in orthologous groups of dehalogenases are consistent with the same premise in D. mccartyi. Given that all dehalogenases which show evidence of mobility dehalogenate industrial contaminants, it is possible that anthropogenic releases of organohalides have caused D. mccartyi’s genome to enter a period of complexification by sharing select rdhA across vast geographic spans and causing rearrangements within genomes. The exchange of key reductive dehalogenases amongst the D. mccartyi can be likened to the recent dissemination of resistance genes in the natural environment (124).

2.5 Conclusions

Metagenomic sequencing of KB-1 sub-cultures enriched on different electron acceptors has given us new genomes and, with them, new insights into the multiple co-existing strains of D. mccartyi in each sub-culture. The KB-1 consortium is robust as a result of functional redundancy within its complement of fermenting, acetogenic, methanogenic and dechlorinating organisms, even to the extent of including significant strain variation within the Dehalococcoides. D. mccartyi is an ancient species whose small genomes are an example of extreme genome streamlining, niche specialization and gene loss. The majority of rdhA genes display a much higher degree of synteny between genomes than previously appreciated and have likely been found in D. mccartyi for hundreds of thousands of years, from the time of a Dehalococcoidia common ancestor. It is possible that the relatively recent anthropogenic releases of high concentrations of specific chloroorganic compounds fueled the dissemination of select reductive dehalogenases capable of their degradation, initiating a period of adaptation and complexification of the D. mccartyi genome. The D. mccartyi rdhA complement has been shaped by: (1) adaptation of existing rdhA to new substrates, (2) assimilation of new rdhA from the environment, or movement between genomes, and (3) duplication or movement within genomes.

Chapter 3 Metagenomic sequencing of the WBC-2 enrichment cultures

Reproduced with permission from the journal of Applied and Environmental Microbiology, the American Society for Microbiology. Copyright © American Society for Microbiology, Applied and Environmental Microbiology, January 2016 vol. 82 no. 1 40-50. And the journal Genome Announcements Copyright © 2016 Molenda et al. Genome Announc. November/December 2016 vol. 4 no. 6 e01375-16.

3.1 Introduction

The West Branch Canal Creek microbial consortium (WBC-2) is used for bioremediation of sites contaminated with chlorinated alkenes and alkanes in the subsurface, most notably 1,1,2,2-tetrachloroethane (TeCA) (50, 125). In this consortium, TeCA is dihaloeliminated to 1,2-trans-dichloroethene (tDCE) by Dehalobacter, tDCE is dechlorinated to vinyl chloride (VC) by Dehalogenimonas sp. WBC-2 and VC to ethene by Dehalococcoides mccartyi WBC-2 (50, 53).

A sub-culture enriched solely on tDCE contains only a Dehalogenimonas strain and a Dehalococcoides mccartyi. This sub-culture has been maintained batch style in anaerobic minimal medium with 0.635 meeq tDCE, 3.2 meeq ethanol and 2.2 meeq lactate since 2010 as previously described (50, 53). The purpose of this study was to conduct DNA sequencing on the WBC-2 enrichment cultures in order to identify species present, and close the genomes of the Dehalococcoides mccartyi and novel Dehalogenimonas strain from the tDCE enrichment culture.

3.2 Methods 3.2.1 Developing enrichment cultures for analysis. The West Branch Canal Creek (WBC-2) consortium was originally enriched from contaminated wetland sediment at Aberdeen Proving Grounds (APG) MD, U.S.A as described in E. J. P. Jones, et al. (125). In early 2007, one litre of WBC-2 was sent to the Edwards Laboratory at the University of Toronto and separated into enrichment cultures with different terminal electron acceptors using transfers into anaerobic mineral medium (126). On average 60 ± 11% of enrichment cultures are composed of OHRB including Dehalococcoides, Dehalobacter and Dehalogenimonas (50, 125, 127). The parent 1,1,2,2-TeCA-fed WBC-2 culture was split into different sub-cultures each maintained on different electron acceptors as described in M. Manchester, et al. (50) to functionally characterize the OHRB in the culture. We have now generated additional sub-cultures amended with a total of eleven different electron acceptors,

including individual substrates PCE, TCE, 1,1-DCE, cDCE, tDCE, VC, TeCA, 1,1,2-TCA, and 1,2-DCA, and a combination of three chlorinated substrates: 1,1,2-TCA, TeCA and cDCE. Each WBC-2 enrichment is also amended with an electron donor mix of 5x ethanol and 3.5x sodium lactate. This chapter focuses on DNA collected from the TeCA, VC and tDCE enrichment cultures (Figure D1).

The first tDCE enrichment culture was a 1:50 transfer created in 2010 abbreviated tDCE/EL_2010 and the second two cultures were created a year later from tDCE/EL_2010 in parallel 1:1000 transfers referred to as tDCE/EL_2011_A and tDCE/EL_2011_B (Fig. S3.1 from Chapter 3). A fourth 1:40 transfer was created in 2013 referred to as tDCE/EL_2013. The two 2011 A and B cultures were purged every week to remove accumulating VC prior to its conversion to ethene to increase the ratio of Dehalogenimonas (Dhg) to Dehalococcoides (Dhc). Not all VC could be removed; a small amount was still converted to ethene. This regime was successful in changing the ratio of Dhg to Dhc from approximately 1:1 as in the tDCE/EL_2010 culture to 4:1 in tDCE/EL_2011 cultures.

3.2.2 Metagenomic paired-end Illumina DNA sequencing The WBC-2 enrichment culture maintained with TeCA was selected for paired-end sequencing because this culture contains all three dechlorinating genera (Dehalobacter, Dehalococcoides and Dehalogenimonas) and thus serves as a database for LC-MS/MS analyses (described in Chapter 6). DNA was extracted using UltraClean® Microbial DNA isolation kit using the standard protocol provided by MoBio. DNA was paired-end sequenced at the Centre for the Analysis of Genome Evolution & Function (CAGEF) at the University of Toronto (insert size of ~500 bp and read length of ~75bp) in two lanes of 46 million reads each. The reads were assembled using ABySS 1.3.2 with a k-mer set to 70 resulting in 13,170 contigs of 12.9 Mbp. Contigs were binned using k-means clustering of tetranucleotide frequency followed by manual bin curation using read-depth. Contigs were six-way translated using Geneious 6.0.5 and separated into individual proteins for use as the LC-MS/MS database. Proteins with fewer than eight amino acids were removed from the database. Proteins which were identified were annotated based on a Blastp ref_seq search. RDases were further curated to ensure appropriate length and presence of known RDase motifs. After closing the Dehalogenimonas genome (see below) the “Dehalogenimonas” bin was compared and found to contain only Dehalogenimonas

sequences but missing 215 kbp out of 1725 kbp (12.5%), suggesting that this binning technique was accurate but conservative.
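The construction of the LC-MS/MS search database described in this section can be illustrated with a minimal Python sketch. It assumes the standard genetic code, splits each of the six reading frames at stop codons, and discards peptides shorter than eight amino acids. The function names are hypothetical, codons containing ambiguous bases are translated as X, and this is not the Geneious implementation used in the study.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate one reading frame; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_peptides(contig, min_length=8):
    """Translate all six frames and keep stop-to-stop peptides of at least min_length."""
    contig = contig.upper()
    peptides = []
    for strand in (contig, reverse_complement(contig)):
        for offset in range(3):
            for peptide in translate(strand[offset:]).split("*"):
                if len(peptide) >= min_length:
                    peptides.append(peptide)
    return peptides
```

Calling six_frame_peptides on each contig and pooling the results would give a peptide list comparable in spirit to the database described above.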

3.2.3 Dehalogenimonas and Dehalococcoides mccartyi genome assembly DNA from the WBC-2 tDCE/EL_2013 enrichment culture (enriched for Dehalogenimonas) was extracted using UltraClean® Microbial DNA isolation kit using the standard protocol provided by MoBio. DNA was sent for paired-end and mate-pair sequencing at the Genome Quebec Innovation Sequencing Centre using Illumina HiSeq 2500 technology. Paired end sequencing with an insert size of ~400 bp and read length of ~150 bp returned 50 million reads. Mate-pair sequencing with insert size of ~ 8000bp and read length of ~100bp returned 150 million reads. Genome assembly was conducted using the same workflow as described in Chapter 2. All gaps were resolved using an in-house automatic gap resolution program (83) except two gaps which were resolved manually. Gaps occur in initial scaffolds due to repeated regions, clonal variation, and missing information due to ABySS k-mer restrictions. One of the gaps in Dehalogenimonas that had to be resolved manually included a high GC content region which was resolved by randomly sampling more reads (high GC regions are difficult to sequence hence more reads were required to get adequate coverage). The second Dehalogenimonas gap had some adapter contamination from the sequencing process which was removed. The genomes were polished by read mapping in Geneious v. 6.1 with a 90% cut-off on single nucleotide polymorphisms (SNPs). Both genomes had fewer than 20 SNPs and a consistent read depth range. Both SNPs and read depth suggest that there is only one population of Dehalogenimonas and one population of Dehalococcoides mccartyi in the WBC-2 tDCE enrichment culture. The origin of replication was identified using Oriloc R program package (92). The genome was annotated using the RAST server; results were manually inspected and corrected where required. Genomic visualization was conducted using BRIG 0.95 and genoPlotR 2.15.3 package in R 2.15.2. 
Identification of transmembrane helices in rdhB genes was conducted using EMBOSS tmap (128).
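Oriloc itself is not reproduced here. As a simpler, related illustration, the sketch below uses cumulative GC skew, whose minimum along a linearized bacterial chromosome is a common heuristic for the position of the origin of replication. The window size and function name are assumptions made for this example and are not taken from the study.

```python
def cumulative_gc_skew_minimum(genome, window=1000):
    """Return (position, value) of the minimum cumulative GC skew.

    GC skew per window is (G - C) / (G + C). The position reported is the
    end coordinate of the window at which the running sum is lowest.
    """
    genome = genome.upper()
    cumulative = 0.0
    best_position, best_value = 0, float("inf")
    for start in range(0, len(genome), window):
        chunk = genome[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # skip windows without any G or C
            cumulative += (g - c) / (g + c)
        if cumulative < best_value:
            best_position, best_value = start + len(chunk), cumulative
    return best_position, best_value
```

Because the start coordinate of a circular chromosome is arbitrary, the reported position is relative to wherever the sequence was linearized.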

3.2.4 Quantitative polymerase chain reaction (qPCR) analysis DNA was extracted from samples (5 mL) from nine different WBC-2 enrichment cultures during mid-dechlorination using the PowerSoil®DNA Isolation Kit following the standard extraction protocol provided by MoBio (enriched separately on TeCA, 1,1,2-TCA, VC, PCE, TCE, cDCE, and a mixture of three electron acceptors TeCA, 1,1,2-TCA & cDCE). DNA concentrations were measured using NanoDrop. DNA samples were quantified using Dehalococcoides 16S rRNA,

vcrA qPCR (see Chapter 2), Dehalogenimonas 16S rRNA (98°C, 2 min; 39 cycles of: 98°C 5 s, 58°C 10 s; melt curve: 65°C-95°C; 4°C ∞) and Dehalobacter 16S rRNA (98°C, 2 min; 39 cycles of: 98°C 5 s, 62.5°C 10 s; melt curve: 65°C-95°C; 4°C ∞) qPCR. The same qPCR method as described in Chapter 2 was used here.

3.2.4 16S rRNA amplicon sequencing DNA from each WBC-2 enrichment culture was extracted using 30 mL of culture. 16S rRNA primers were used to amplify the signature gene (16S rRNA gene) using PCR. Primers with different barcodes were used for the different enrichment cultures. This allowed multiple samples to be sequenced in the same lane since the reads could later be mapped back to the culture they came from. The cleaned PCR products were sent to Genome Quebec for Roche 454 GS-FLX Titanium sequencing resulting in ~250K reads per quarter plate, as described in Chapter 2.

3.2.5 Analysis using Quantitative Insights Into Microbial Ecology (QIIME) The same workflow was used to analyze the 16S rRNA data here as in Chapter 2. Briefly, a series of QIIME 1.5.0 scripts were used. First, reads of poor quality were filtered out; this included reads with a quality score lower than 25, a length shorter than 220 bases, ambiguous bases, mismatches to the primer sequences, or homopolymer runs of 8 or more bases. Operational taxonomic units (OTUs) were clustered using the UCLUST algorithm. Each cluster was aligned with itself and a representative OTU was selected. Representative OTUs were identified using the RDP classifier. Communities were then summarized by taxonomic composition and diversity.
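The read filtering criteria listed above can be sketched in a few lines of Python. This is an illustration only and not the QIIME code: the function name is hypothetical, the quality score is assumed to be a per-read mean, and the primer check assumes a non-degenerate primer that must match the start of the read exactly.

```python
import re

AMBIGUOUS = re.compile(r"[^ACGT]")


def passes_quality_filter(sequence, mean_quality, primer,
                          min_quality=25, min_length=220, max_homopolymer=7):
    """Return True if a read survives the filters described in the text."""
    sequence = sequence.upper()
    if mean_quality < min_quality:
        return False
    if len(sequence) < min_length:
        return False
    if AMBIGUOUS.search(sequence):
        return False
    if not sequence.startswith(primer.upper()):
        return False
    # A base followed by max_homopolymer or more copies of itself is a run
    # longer than the allowed maximum (8 or more with the default of 7).
    too_long_run = re.compile(r"(.)\1{%d,}" % max_homopolymer)
    return too_long_run.search(sequence) is None
```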

3.2.6 Homologous gene clustering and pangenome analysis Twenty-four D. mccartyi genomes and five Dehalogenimonas genomes (KB-1 and from NCBI) were analyzed using GET_HOMOLOGUES (129), an open-source software package designed for pangenome analysis. The prokaryotic genome pipeline was used to cluster homologous protein families, first using BLASTp (minimum coverage 75% and E-value 1e-5) to calculate bidirectional best hits (BDBH), followed by Markov clustering (the OrthoMCL method). Minimum cluster size was one protein sequence. A pangenome matrix was created summarizing which protein clusters were present in each Dehalococcoidia genome, and how many representatives of each cluster each genome contained.
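To make the use of the pangenome matrix concrete, the following sketch partitions protein clusters into core (present in every genome), unique (present in exactly one genome) and accessory (everything in between). The data structure and function name are assumptions chosen for illustration; GET_HOMOLOGUES provides its own scripts for this step.

```python
def classify_clusters(pangenome_matrix, n_genomes):
    """Partition clusters by how many genomes contain them.

    pangenome_matrix maps a cluster id to a dict of {genome name: copy count}.
    """
    summary = {"core": [], "accessory": [], "unique": []}
    for cluster_id, counts in pangenome_matrix.items():
        present = sum(1 for n in counts.values() if n > 0)
        if present == 0:
            continue  # empty cluster, nothing to classify
        if present == n_genomes:
            summary["core"].append(cluster_id)
        elif present == 1:
            summary["unique"].append(cluster_id)
        else:
            summary["accessory"].append(cluster_id)
    return summary
```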

In order to investigate synteny, a series of whole genome alignments was produced using the Mauve (130) plugin in Geneious 8.1.8 (89). Subsequently the MCScanX (131) package was chosen to calculate collinear blocks of coding sequences and create figures. All KB-1 Dehalococcoides mccartyi genome coding sequences as well as Dehalogenimonas WBC-2 (Dehalococcoidia) and Sphaerobacter thermophilus (a Chloroflexi) coding sequences were compared using BLASTp. One best alignment was chosen for each coding sequence from each of the genomes, using an E-value cut-off of 1e-2. MCScanX used the BLASTp output as input to calculate collinear blocks and progressively align multiple collinear blocks between genomes on default settings. MCScanX’s circle plotter program was used to generate figures.
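The best-alignment selection can be sketched as follows, assuming BLASTp results in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The function name and the genome_of callable, which maps a subject protein id to its genome, are hypothetical helpers introduced for this example.

```python
def best_hits(blast_lines, genome_of, max_evalue=1e-2):
    """Keep the highest-scoring hit per (query, subject genome) pair.

    Hits with an E-value above max_evalue are discarded.
    """
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or comment lines
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        key = (query, genome_of(subject))
        if key not in best or bitscore > best[key][1]:
            best[key] = (subject, bitscore, evalue)
    return best
```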

3.2.7 Statistical analysis of pangenome homologous protein clusters Pearson’s chi-squared test was used to determine whether the proteins found in each homologous cluster differed significantly between Dehalococcoides mccartyi and Dehalogenimonas. A correspondence analysis (CA) was used to compare the contents of the protein clusters with the genomes they were generated from. In this analysis a total of 2875 homologous protein clusters were generated. A scree plot was used to compare the percentage contribution of each dimension to the expected value (3.6%, if all dimensions contributed equally) in order to only consider significant dimensions. As a result we reduced the number of dimensions from 28 to 9. The contributions of individual genomes and protein clusters were identified by creating a bar plot of top contributors and comparing to a reference line corresponding to the expected value if the contributions were uniform. Any row/column with a contribution above the reference line was considered important to the final ordination. All analyses were conducted using R 3.4.0 with the FactoMineR, factoextra and vegan packages.
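The scree-plot cut-off follows from simple arithmetic: with 28 dimensions, a uniform contribution would be 100/28, or about 3.6% per dimension. The sketch below applies that rule to a list of percentage contributions. The function name is hypothetical and the analysis itself was done in R; this Python version only illustrates the threshold.

```python
def significant_dimensions(percent_contributions):
    """Return 1-based indices of dimensions above the uniform expectation."""
    expected = 100.0 / len(percent_contributions)
    return [i + 1 for i, p in enumerate(percent_contributions) if p > expected]


# Uniform expectation for the 28 dimensions mentioned in the text.
expected_for_28 = round(100.0 / 28, 1)
```

For example, made-up contributions of 40, 30, 20, 5 and 5 percent give an expected value of 20%, so only the first two dimensions would be retained.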

3.2.8 NCBI accession numbers The binned contigs from the TeCA-dechlorinating parent culture metagenome have been deposited at NCBI under whole genome shotgun project JXWO00000000. The Dehalogenimonas sp. WBC-2 genome has been deposited under accession number CP011392. The Dehalococcoides mccartyi WBC-2 genome has been deposited in GenBank under accession no. CP017572. WBC-2 16S rRNA amplicon reads are also available from NCBI (SRA) under accession no. SRP051778 as part of bioproject no. PRJNA269960.

3.3 Results and Discussion 3.3.1 Microbial diversity found in WBC-2 enrichment cultures In 2006, two clone libraries were constructed from the WBC-2 culture: one by SiREM laboratories, finding (from highest to lowest abundance) Acetobacteria, Clostridium, Dehalobacter, Veillonellaceae, Clostridiaceae, Syntrophomonadaceae, Peptococcaceae, Clostridiales and Dehalococcoides, and one by USGS researchers, finding Clostridium, Acetobacterium, Dehalobacter, Bacteroides, Geobacter and Pseudomonas (125, 127). After enrichment on specific chlorinated electron acceptors in the Edwards laboratory, the dominant dechlorinating organisms became species from the genera Dehalobacter, Dehalogenimonas and Dehalococcoides (50) (Figure 3-1), with the main non-dechlorinating organisms being Anaeromusa, Acetobacterium and Methanosphaerula.

[Figure 3-1 image: stacked bar chart titled "WBC-2 Enrichment Cultures". Y-axis: composition (%) based on qPCR (copies/mL culture), 0% to 100%. X-axis: enrichment cultures by electron acceptor (TeCA, 112TCA, cDCE mixture; TeCA; 112TCA; PCE; TCE; VC; tDCE; tDCE purge VC). Series: Dhb, Dhg, Dhc.]

Figure 3-1. Composition of dechlorinating genera Dehalococcoides (Dhc), Dehalogenimonas (Dhg) and Dehalobacter (Dhb) quantified using qPCR in WBC-2 cultures enriched on different chlorinated substrates. TeCA – 1,1,2,2-tetrachloroethane, cDCE – cis-dichloroethene, 112TCA – 1,1,2-trichloroethane, PCE – perchloroethene, TCE – trichloroethene, VC – vinyl chloride, tDCE – trans-dichloroethene. November 2012. Replicate cultures and additional qPCR time point found in Appendix Figure B2.

The type of electron acceptor used has an impact on which dechlorinating organisms are present. The three dechlorinating genera likely co-exist because of their niche specificities. Each species dechlorinates a substrate which the others cannot, or do with lesser ability. Other OTU found

which may come from dechlorinating genera include sequences classified as Desulfitobacterium, Geobacter, and Dehalobacterium. The amplicon sequencing allowed for a greater resolution of Dehalococcoides in WBC-2, identifying three highly similar OTUs (named Dehalococcoides 1, 2 and 3 in Figure 3-2). It appears that WBC-2 also contains multiple strains of Dehalococcoides, similar to the KB-1 enrichment cultures. OTU Dehalococcoides 3 (OTU 1047), from the Pinellas clade, is the most dominant and is present in all cultures. OTU Dehalococcoides 1 (OTU 527), also a member of the Pinellas clade, is in far lower abundance but is also present in all cultures. The OTU referred to as Dehalococcoides 2 (OTU 663) only appears in cultures amended with chlorinated ethanes (Figure 3-2) and comes from the Cornell clade (Table 3-1). In the laboratory setting, WBC-2 exists as a closed population; in such environments organisms with high niche specificity tend to thrive (132). Where niche specialization is not possible and resources become limiting, communities may remain functionally stable but experience drastic changes in abundance, as inferred from 16S rRNA copy number at the species level. Both features have recently been observed in microbial communities of bioreactors (132-134), anoxic habitats in general (135) and dechlorinating bioreactors (136), and may explain some of the species-level variation seen in WBC-2. The dechlorinating microbial community observed in this study has been functionally stable, and WBC-2 enrichment cultures have continued to present the same dominant dechlorinating genera since 2010 (Figure 3-1).

Table 3-1. Nucleotide pairwise identity of three Dehalococcoides partial 16S rRNA sequences found in WBC-2 compared with 16S rRNA sequences from three Dehalococcoides mccartyi representative genomes from NCBI. Cornell – strain CG4, Pinellas – strain BTF08, Victoria – strain GY50. Best hits for each OTU are marked with an asterisk.

OTU                             Aligned with clade representative   Percent pairwise nucleotide identity
Dehalococcoides 3 (OTU 1047)    Cornell                             98.5
                                Victoria                            98.2
                                Pinellas                            100 *
Dehalococcoides 2 (OTU 527)     Cornell                             96.7
                                Victoria                            96
                                Pinellas                            98.2 *
Dehalococcoides 1 (OTU 663)     Cornell                             99 *
                                Victoria                            98.7
                                Pinellas                            97.2

WBC-2 as a dechlorinating community also requires bacteria which do not participate in dechlorination but either provide essential nutrients to the dechlorinators and/or contribute to the

production of hydrogen gas required by dechlorinating organisms. Ethanol and lactate are added with each feeding, which sustains an assortment of non-dechlorinating organisms. A particularly enriched non-dechlorinating species classified through 16S rRNA amplicon sequencing belongs to the family Veillonellaceae, which was identified in 2006 clone libraries as Anaeromusa (7/75 clones) (50). Members of the family Veillonellaceae are anaerobes capable of degrading lactate with simultaneous production of CO2 and H2 gas (137). It is known that the types and amounts of donor used have significant effects on the diversity and abundance of non-dechlorinating genera, which is also true for WBC-2 (138). One WBC-2 enrichment culture is grown only on TeCA and ethanol, and this particular culture has very few hits to Anaeromusa. Instead, Desulfovibrio and Acetobacterium increased in abundance, as inferred from the proportion of 16S rRNA reads recovered during amplicon sequencing. No differences in dechlorination ability were found between TeCA cultures fed with or without lactate during routine culture maintenance. Desulfovibrio and Acetobacterium likely fill a role similar to the one Anaeromusa plays in the lactate-fed cultures (Figure 3-2).

[Figure 3-2 image: stacked bar chart. Y-axis: number of reads (0 to 50,000). X-axis: WBC-2 enrichment culture by electron acceptor. Legend: less than 0.5%, Desulfovibrio, Spirochaetes 1, Spirochaetes 2, Methanosphaerula, Acetobacterium, Pelotomaculum, Anaeromusa, Dehalobacter, Dehalogenimonas, Dehalococcoides 1, Dehalococcoides 2, Dehalococcoides 3.]

Figure 3-2. Number of reads of operational taxonomic units (OTU) found in WBC-2 enrichment cultures enriched on different electron acceptors. All amended with 3.5x sodium lactate and 5x Ethanol (EtOH) except where only amended with EtOH, annotated as -EtOH. OTUs identified from 16S rRNA amplicon sequencing reads are listed by phylogenetic classification.

3.3.2 Dehalogenimonas WBC-2 complete genome from tDCE enrichment culture The complete Dehalogenimonas WBC-2 genome was closed. It is 1.72 Mbp in size and has a GC content of 49% (Fig. 6-3). Its size is similar to the Dehalogenimonas lykanthroporepellens genome (1.69 Mbp), while its GC content lies between that of D. lykanthroporepellens (55% GC content) and Dehalococcoides mccartyi CBDB1 (47% GC content, 1.4 Mbp). The genome contains a total of twenty-two rdhA and two rdhB genes as predicted by RAST. No other rdhB genes could be identified in the vicinity of rdhA genes. D. lykanthroporepellens (Dhg lyk) also has few rdhB genes; both genomes have one ‘lone’ rdhB gene (Dehly_1504 in Dhg lyk and DGWBC_0212 in Dhg WBC-2) that does not occur near any rdhA genes (51). As in Dehalococcoides genomes, the majority of rdhA genes are clustered in regions flanking the origin of replication.

3.3.3 Dehalococcoides mccartyi complete genome from tDCE enrichment culture The complete genome is 1.37 Mbp in size with 47% GC content. Based on 16S rRNA identity, this particular D. mccartyi belongs to the Pinellas sub-group (107), having 100% 16S rRNA identity with D. mccartyi CBDB1 (65) and GT (79). The pairwise whole genome identity is 74.6% with CBDB1 and 70.5% with GT. Excluding high plasticity regions flanking the origin typical of D. mccartyi strains (39), the core genome is 94.6% identical to D. mccartyi CBDB1 (Figure 3-3). This genome has fifteen putative reductive dehalogenase catalytic A subunit (rdhA) genes and twelve putative reductive dehalogenase membrane anchor (rdhB) genes. D. mccartyi WBC-2 contains the previously identified vcrABC operon encoding the VC reductase (VcrA) located on a genomic island (40). VcrA was the only dehalogenase expressed by this population of D. mccartyi, as determined from blue-native polyacrylamide gel electrophoresis coupled with liquid chromatography tandem mass spectrometry (Chapter 6) (53). In the current rdhA naming system developed by L. A. Hug, et al. (14), twelve of fifteen rdhA genes fall into known ortholog groups (OGs). Two rdhA have not been previously found in D. mccartyi and one will create a new OG (OG 58), sharing 99.5% amino acid pairwise identity with D. mccartyi CBDB1 rdhA cbdbA1539. This strain harbours many characteristic features common to D. mccartyi, such as an intact prophage, mobile elements, and a multitude of rdhA whose functions are not yet known. This genome was taken from a WBC-2 tDCE enrichment culture which had been diluted and re-grown several times to enhance enrichment, especially for Dehalogenimonas. From the amplicon

sequencing only one dominant Dehalococcoides OTU was found (Figure 3-2) and this one Dehalococcoides genome was closed.

Figure 3-3. The complete Dehalococcoides mccartyi WBC-2 genome. Inner scale shows position along the chromosome. Rings moving in to out: Ring 1 (light grey): Coding regions on leading strand. Ring 2 (dark grey): Coding regions on lagging strand. Ring 3 (purple and green): GC content. Ring 4 (black): GC skew. Ring 5 (green): BLASTN result against Dehalococcoides mccartyi CBDB1. Ring 6 (dark blue): BLASTN result against Dehalococcoides mccartyi VS. Ring 7 (light blue): BLASTN result against Dehalococcoides mccartyi 195. BLASTN rings (5, 6, 7) are colored where any significant BLASTN hit was found from the genome searched (expected threshold less than 10).

3.3.4 Dehalococcoidia pangenome analysis In order to place these new genomes in relation to all of the genomes currently closed from the Dehalococcoidia, we conducted a pangenome analysis using the OrthoMCL method to cluster homologous protein groups. A total of 40,864 protein sequences from 24 D. mccartyi genomes and 5 Dehalogenimonas genomes available from NCBI were used to create 2875 protein families, of which 623 are found in all 29 genomes and represent the core-genome. Of the remaining 2252 protein families, 2203 are part of the accessory-genome and 49 are unique (i.e. only present in one strain) (Table 3-2). The first pangenome analysis, conducted in 2010 on the four Dehalococcoides genomes available at the time, resulted in 1118 core genes, 457 accessory and 486 unique genes (139). The most striking difference is that amongst the current Dehalococcoidia genomes, the number of unique protein families has been reduced from 486 to 49. A correspondence analysis (CA) was conducted on the protein families to identify the main differences between genomes (Figure 2-9). The CA ordination highlights the level of similarity between different strains of Dehalococcoides mccartyi, which could not be distinguished from one another in this analysis with statistical significance. Only the Dehalococcoides and the Dehalogenimonas genera were significantly different from each other (χ² p-value=0.0004998, Figure B3). The Dehalococcoides genomes are different strains of the same species, which corresponds with the outcome of the CA ordination. In contrast, the Dehalogenimonas genomes come from different species, and those differences can be seen both along axis 1 (x-axis), as distance from the Dehalococcoides cluster, and along axis 2 (y-axis), as differences between the different Dehalogenimonas species in the correspondence analysis ordination plot (Figure B4 C&D). The first two dimensions accounted for 36% of the variation between the genomes (Table B1). 
The only protein families significantly contributing to both axis one and axis two in the CA ordination were calculated based only on the distribution of protein families in the five Dehalogenimonas genomes (Figure B4 B).

Homologous protein families generated in the Dehalococcoidia pangenome analysis group all 83 OG Dehalococcoides RdhA and currently unclassified Dehalogenimonas RdhA into only 41 groups, with roughly half (19 of 41) containing both Dehalococcoides and Dehalogenimonas RdhA (Table 3-3). In other words, 41 protein clusters (or families) contain RdhA sequences, and some of these clusters contain several OGs. These protein families suggest that a Dehalococcoidia ancestor could have had at least 41 reductive dehalogenase genes. From all of the rdhA

sequences found in the Dehalococcoides and Dehalogenimonas genomes it is clear that certain groups of rdhA are more similar to each other, and presumably have a more recent evolutionary link based on these homologous protein clusters. When looking at Dehalococcoides dehalogenases alone we found that certain OGs, although still upwards of 90% conserved at the amino acid level, showed enough sequence divergence at the nucleotide level to place their most recent common ancestral gene prior to the divergence of Dehalococcoides and Dehalogenimonas (such as OG 15, Figure 2-9). The homologous protein clusters correspond with this premise since they cluster a representative rdhA sequence from Dehalogenimonas with established Dehalococcoides OGs, such as in the case of OG 15 (Figure 2-9), now part of homologous cluster I (Table 3-3).

Table 3-2. Homologous gene clustering of Dehalococcoidia pangenome from 24 Dehalococcoides mccartyi and 5 Dehalogenimonas genomes.

                     Dehalococcoides mccartyi¹ and Dehalogenimonas²   Only Dehalococcoides mccartyi¹
Core genes           623                                              993
Accessory genes      2203                                             1092
Unique genes         49                                               87
Number of genomes    29                                               24

¹Dehalococcoides mccartyi strains 195, CBDB1, BAV1, VS, WBC-2, GT, CG1, CG3, CG5, CG4, UCH-ATV1, IBARAKI, KBTCE1, KBTCE2, KBTCE3, KBVC1, KBVC2, KBDCA1, KBDCA2, KBDCA3, DCMB5, BTF08, 11a5, GY50.
²Dehalogenimonas: lykanthroporepellens, alkenigignens, formicexedens, and sp. GP and WBC-2.

Figure 3-4. Correspondence analysis ordination plot of Dehalococcoidia pangenome. Points indicate clusters of homologous protein sequences (triangles). Clusters are coloured based on whether they contain only Dehalococcoides protein sequences, only Dehalogenimonas protein or protein from both Dehalococcoidia. In total 2875 clusters were identified.

Table 3-3. Summary of contents of reductive dehalogenase (RdhA) containing homologous protein clusters. Clusters are named A to AO. The percentage of sequences in the cluster from Dehalococcoides mccartyi (Dhc) is normalized to the number of genomes surveyed. Clusters which contain RdhA sequences from Dehalococcoides orthologous groups (OGs) are listed by group number. RdhA which have been partially biochemically characterized are listed by name.

Cluster name   Number of sequences   Percentage of sequences from Dhc¹   Dhc OG# and characterized representatives
B              15                    100                                 30, 70
M              19                    100                                 13, 71
Q              8                     100                                 20
T              5                     100                                 73
X              2                     100                                 27
Y              4                     100                                 50, 82
Z              2                     100
AA             2                     100                                 61
AB             2                     100                                 62
AC             3                     100                                 80
C              31                    86                                  18, 21, 37, 40
F              42                    81                                  19, 32, 54, 55, 56, 57, 69
A              33                    76                                  23-conserved in all Dhc, 58, 66, 72
E              85                    73                                  17, 26, 33, 34, 36, 38, 63, 70, 75
H              23                    69                                  5-TceA, 8-VcrA, 28-BvcA, 49, TdrA², CerA²
N              21                    66                                  29, 47, 60, 65, 67
R              57                    56                                  10, 11, 12, 24, 68
J              32                    53                                  22, 39, 79, 81
P              5                     45                                  53-CbrA, 74
O              11                    36                                  52-MbrA, 35
I              26                    32                                  15-expressed during starvation
AD             3                     29                                  83
U              11                    27                                  14
G              7                     22                                  16-PteA
W              10                    17                                  25, 64
AF             2                     17
K              14                    14                                  48
S              13                    12                                  51, 59
AE             8                     11                                  76
V              4                     6
D              1                     0
L              5                     0
AG             2                     0
AH             2                     0
AI             3                     0
AJ             2                     0
AK             3                     0
AL             4                     0
AM             2                     0
AN             3                     0
AO             1                     0

¹Percentage of sequences which come from Dhc = (#Dhc RdhA in cluster/# Dhc genomes)/((#Dhc RdhA in cluster/# Dhc genomes) + (#Dhg RdhA in cluster/# Dhg genomes)) x 100.
²RdhA from Dehalogenimonas
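The normalization in the first footnote of Table 3-3 can be written as a short function. The sketch below uses the genome counts from this chapter (24 Dhc and 5 Dhg genomes) as defaults; the function name and the per-cluster counts passed to it in any example are hypothetical, since the table reports only totals and percentages.

```python
def normalized_dhc_percentage(n_dhc_rdha, n_dhg_rdha,
                              n_dhc_genomes=24, n_dhg_genomes=5):
    """Percentage of a cluster's RdhA from Dhc, normalized by genomes surveyed."""
    dhc_rate = n_dhc_rdha / n_dhc_genomes
    dhg_rate = n_dhg_rdha / n_dhg_genomes
    if dhc_rate + dhg_rate == 0:
        raise ValueError("cluster contains no RdhA sequences")
    return 100.0 * dhc_rate / (dhc_rate + dhg_rate)
```

Because there are far fewer Dehalogenimonas genomes, a cluster holding one sequence from each genus scores well below 50%: (1/24)/((1/24) + (1/5)) x 100 is about 17%.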

3.4 Conclusions The 16S rRNA amplicon sequencing of the WBC-2 culture demonstrated that the main genera found in each enrichment culture have remained the same since 2010. With the increased resolution of amplicon sequencing we found that there are multiple strains of Dehalococcoides present in WBC-2, originating from different clades of the Dehalococcoides. The amplicon sequencing aided in the assembly of additional mate-pair and paired-end Illumina reads, which allowed the Dehalogenimonas sp. WBC-2 and Dehalococcoides mccartyi WBC-2 strain genomes to be assembled, closed and annotated. Dehalogenimonas sp. WBC-2 is one of only four closed genomes available for this genus; it will help to better describe the genus and allow us to better design future experiments with WBC-2.

Chapter 4 Extrachromosomal circular elements targeted by CRISPR-Cas in Dehalococcoides mccartyi are linked to mobilization of reductive dehalogenase genes

4.0 Introduction Dehalococcoides mccartyi are hydrogen-utilizing, obligate organohalide-respiring bacteria. They are of interest because of their unique dehalogenating metabolism catalyzed by highly specific reductive dehalogenase enzymes that have widespread application in bioremediation and detoxification (106). D. mccartyi are remarkably small (diameter <500 nm) disk-shaped, strictly anaerobic microbes belonging to the Chloroflexi (106) that are difficult to isolate and have not been successfully grown on agar media. They have highly streamlined genomes under 1.4 Mbp, while individual strains have been shown to contain up to 36 reductive dehalogenase homologous genes (rdhAB) some of which are known to be used for organohalide respiration. Many genomes contain contiguous regions or clusters of genes that appear to have been acquired laterally and are referred to as genomic islands (140). McMurdie et al. (2011) identified the first D. mccartyi genomic island carrying the vcrABC operon coding for a functional vinyl chloride reductase (VcrA) (121). The VcrA enzyme catalyzes the conversion of vinyl chloride to non- toxic ethene and is critical for effective clean-up of sites contaminated with chlorinated ethenes. The vcrA gene is the target of monitoring tools to assess remediation progress during in-situ bioremediation (43). Previous analysis of D. mccartyi genomes revealed the presence of two high plasticity regions (HPRs) flanking the origin, separated by a highly conserved core region common to all strains (39). These HPRs contain a majority of a strain’s rdhAB genes. Lateral transfer of these reductive dehalogenase genes is thought to be a fundamental ecological strategy used by D. mccartyi (39, 40) to adapt to naturally occurring and anthropogenic halogenated compounds, although no mechanism has been identified. Many D. 
mccartyi genomes harbour prophages that may contribute to the spread of these reductive dehalogenase genes between different strains via lateral gene transfer. Some strains of D. mccartyi also encode CRISPR-Cas (clustered regularly interspaced short palindromic repeats–CRISPR associated) systems involved in adaptive defense mechanisms that protect the host from invading mobile elements including plasmids, phages and transposons (141-143). The role of phages and CRISPR-Cas systems in

facilitating or blocking lateral transfer of rdhAB genes among D. mccartyi strains has never been investigated.

CRISPR-Cas adaptive immune systems are encoded by a genetic locus that is comprised of one or more CRISPR arrays and several CRISPR associated (cas) genes. They are divided into two main classes, six types and 21 subtypes based on the cas genes present. The CRISPR arrays contain identical 21-48 base pair (bp) repeats interspaced by unique spacer sequences (26-76 bp), some of which are homologous to sequences in mobile genetic elements. The Cas proteins are involved in all stages of CRISPR immunity including adaptation, maturation and interference processes (141). During adaptation, fragments of an invading genetic element, known as spacers, are added to the CRISPR array. In maturation, the CRISPR array is first transcribed into pre-CRISPR RNA and then separated into short pieces each containing an individual recognition sequence between repeats (crRNAs) (144). During interference, Cas proteins and crRNAs form effector complexes that use the crRNAs as a guide to target and destroy invading DNA in a sequence-specific manner (143, 145). To date the function and targets of D. mccartyi CRISPR-Cas systems have not been described.
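The repeat-spacer layout of a CRISPR array can be illustrated with a small Python sketch that recovers spacers from an array sequence. It assumes that the repeat sequence is already known and that all repeats are perfectly identical, as described above; the function name is hypothetical and dedicated CRISPR detection tools handle degenerate repeats that this example does not.

```python
def extract_spacers(array_sequence, repeat):
    """Return the spacer sequences lying between consecutive repeats."""
    array_sequence, repeat = array_sequence.upper(), repeat.upper()
    parts = array_sequence.split(repeat)
    # Sequence before the first repeat and after the last repeat is
    # flanking DNA, not a spacer, so only the interior pieces are kept.
    return [piece for piece in parts[1:-1] if piece]
```

Comparing the spacer lists recovered from samples taken at different times would show which spacers were added to the array in between.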

In Chapter 2, we described the metagenomic sequencing of the KB-1 enrichment cultures. We closed the genomes of eight new and distinct strains of D. mccartyi. Two strains were found to contain CRISPR-Cas systems. Sequence data suggested the existence of circular extrachromosomal elements, some of which appeared to contain reductive dehalogenase genes. The purpose of this study was to verify the existence of these circular extrachromosomal elements and their relationship with CRISPR-Cas systems in these strains. The CRISPR array was found to target prophage and circular extrachromosomal elements. The CRISPR array itself was found to adapt over time acquiring three new spacers over eleven years. Active circularization of extrachromosomal elements, particularly the island containing the vcrABC operon, was inferred from sequence data and confirmed by polymerase chain reaction (PCR), providing the first clues to a mechanism for lateral transfer of DNA in D. mccartyi. The existence of a circular form of the genomic island containing the vcrABC operon also explains the periodic observation of higher vcrA to 16S rRNA gene copies in DNA samples from field sites. We have gained new insights to lateral transfer of dehalogenase genes, discovered new mobile elements and their connection with CRISPR-Cas systems in D. mccartyi.

4.1 Methods 4.1.1 Enrichment cultures The KB-1 set of enrichment cultures originated from microcosms prepared with aquifer materials from a TCE-contaminated site in southern Ontario in 1996 as previously described (82). The KB-1 parent enrichment culture, KB-1/TCE-MeOH, has been maintained with ~100 mg/L TCE as electron acceptor and methanol (MeOH) as electron donor, added at 5x the electron equivalents (eeq) required for complete dechlorination, as previously described (80-82). The parent culture was used to inoculate several sub-cultures that were established between 2001 and 2003 and maintained on different chlorinated acceptors, including daughter products cDCE (KB-1/cDCE-MeOH) and VC (KB-1/VC-MeOH), as well as 1,2-DCA (KB-1/1,2-DCA-MeOH). For additional culturing details refer to Chapter 2.

4.1.2 Metagenomic sequencing and genome assembly

DNA extraction and genome assembly for the KB-1 enrichment cultures are described in Chapter 2. This chapter primarily focuses on the genomes of Dehalococcoides mccartyi strain KBVC1 from the KB-1/VC-H2 culture and D. mccartyi strain KBDCA3 from the KB-1/1,2-DCA-MeOH enrichment culture, both of which contain CRISPR-Cas systems.


Figure 4-1. Overview of the eight Dehalococcoides mccartyi genomes closed from the metagenomes of KB-1 enrichment cultures. Enrichment sub-cultures are named by electron acceptor, electron donor and the date the sub-culture was first created. Electron acceptors include trichloroethene (TCE), vinyl chloride (VC), 1,2-dichloroethane (1,2-DCA) and cis-dichloroethene (cDCE) and donors are either methanol (MeOH) or hydrogen. Closed D. mccartyi genomes were assigned a strain name based on electron acceptor and rank abundance in the mixed culture. The number of reductive dehalogenase homologous genes (rdhA) per strain genome is indicated and any rdhA that has been functionally characterized is listed by name (vcrA, pceA, tceA, bvcA). Prophages identified in each genome are listed by name in purple. CRISPR-Cas systems are indicated in red.


4.1.3 PCR amplification of CRISPR-Cas Array and sequencing

DNA samples dating back to 2002 had been taken periodically from the KB-1 enrichment cultures and stored at -80°C. DNA from samples archived from 2002 to 2012 was extracted using the UltraClean Soil DNA kit (Mo Bio Laboratories, Inc.), and the DNA extracts were stored at -80°C. After 2012, DNA was extracted using the PowerSoil DNA kit (Mo Bio Laboratories, Inc.). Two sets of primers were designed using the Geneious Pro 5.5.4 Primer Design feature: one to anneal just outside of the array region in the type I-E D. mccartyi CRISPR region, and a second to anneal just outside of the type I-C CRISPR region. Primers were searched against the NCBI nr bacteria database and against KB-1 metagenomes using Primer-BLAST to confirm that each primer had one unique target. PCR reactions were amplified using a high-fidelity polymerase to increase speed and improve product quality (ThermoFisher Phusion high-fidelity DNA polymerase) on an MJ Research PTC-200 Peltier Thermal Cycler (for both sets of primers: 98°C for 30 s; 30 cycles of 98°C for 10 s, 65°C or 59°C for 30 s, and 72°C for 1 min; and a final extension at 72°C for 5 min). PCR products were run on a 1% TAE agarose gel to estimate product size. Additional primers were designed in order to sequence entire PCR products (all primers are in Appendix Table C1). PCR products were sequenced (Sanger) at the SickKids Centre for Applied Genomics (Toronto, Ontario).

4.1.4 Prophage sequence identification and CRISPR array sequence alignments

A nucleotide BLAST search of metagenomic contigs was conducted in order to identify all phage sequences from the metagenomes, closed genomes and D. mccartyi genomes in NCBI (BLASTN 2.2.29). Sequences with BLAST hits were submitted to RAST (146) for annotation. Prophage regions were defined using PHAST (147) and Phage Finder (148) to identify and improve RAST annotations. Additional annotation was carried out using our in-house Phage Annotation Toolkit (PAT), which uses HMMs and genomic position to accurately annotate the morphogenetic genes of phage genomes. The KB-1 metagenomic contigs were trimmed to only include putative prophage regions to create a KB-1 putative prophage database containing twelve new prophages. Other strains of D. mccartyi in the NCBI database were also searched for prophages and seven additional prophages were identified in the genomes of strains 195, 11a5, CG1, CG3, WBC-2 (described in Ch.3) and BTF08. Prophages were named by strain of origin followed by a number in the case of multiple prophages. If the prophage sequence came from a KB-1 metagenome contig where the strain name is unknown, the prophage was named by the culture it originated from followed by a number. The naming convention excludes the prophage found in 11a5, which is named pg11a5 and was first described in S. Zhao, et al. (36).

The spacers from CRISPR arrays from five different D. mccartyi strains (KBVC1, KBDCA3, CBDB1, DCMB5 and GT) were aligned (BLASTN) against sequences from a database of D. mccartyi prophages (KB-1 and from NCBI genomes) and against KB-1 metagenome contigs (Discontiguous Megablast with max E-value of <0). If spacers hit regions of D. mccartyi genomes that were not prophages, those regions were further inspected to see whether the hit involved a putative mobilizable element. Genomic islands were identified with the assistance of IslandViewer (149). Islands were manually inspected for indicative features such as sequence composition bias, location near a tRNA gene, flanked by direct repeats and over-representation of mobility, virulence, phage-related and unknown genes.
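The idea behind matching spacers to candidate targets can be sketched in a few lines. The following is a deliberately simplified stand-in for the BLASTN searches described above, not a reproduction of them: it scans both strands of a target for ungapped matches with a bounded number of mismatches. The sequences, the mismatch threshold and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spacer_hits(spacer: str, target: str, max_mismatches: int = 2):
    """Return (start, strand, mismatches) for ungapped matches on either strand.

    `start` is the 0-based position of the match on the forward strand of `target`.
    """
    spacer, target = spacer.upper(), target.upper()
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for start in range(len(target) - len(query) + 1):
            window = target[start:start + len(query)]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


# Toy example: one exact forward-strand match and one reverse-strand match
# carrying a single mismatch. These sequences are invented for illustration.
toy_spacer = "GATTACAGGC"
toy_target = "TTTGATTACAGGCAAAAGCCTGTTATCCC"
print(spacer_hits(toy_spacer, toy_target, max_mismatches=1))
```

A real search would also need to handle gaps, ambiguous bases and statistical significance, which is why an alignment tool such as BLASTN was used in this work.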

4.1.5 Construction of Phylogenetic trees

A database of two hundred and twenty-seven NCBI cas1 genes was updated with all D. mccartyi cas1 genes pulled from NCBI (KBVC1, KBDCA3, GT, CBDB1, DCMB5 strains). Sequences were aligned using MUSCLE (MUSCLE plugin for Geneious 8.1.8) and a maximum likelihood (ML) tree was constructed with 100 bootstraps (RAxML plugin for Geneious 8.1.8 using Gamma BLOSUM62 substitution matrix and rapid bootstrapping with searching for best-scoring ML tree). Clades were highlighted using FigTree 1.4.2.

Genomic islands from D. mccartyi whose sequences matched CRISPR spacers were extracted and aligned, and a phylogenetic tree was made based on this alignment using the same technique as above. Prophage sequences were aligned using kalign and a phylogenetic tree was made from these sequences using the same technique as above. The genoPlotR package in R 3.2.5 (150) was used to create figures from these alignments and ML trees.

4.1.6 Quantitative PCR (qPCR) and PCR to track vcrA genomic island

The abundance of D. mccartyi genes was measured by qPCR using 16S rRNA gene primers Dhc1f and Dhc264r (107) and primers vcrA670f and vcrA440r targeting the vcrA gene (53). All D. mccartyi have a single 16S rRNA gene per genome. Reactions were prepared in a PCR cabinet (ESCO Technologies, Hatboro, PA) and each qPCR reaction was run in triplicate. A concatenated DNA sequence comprising four D. mccartyi gene fragments, corresponding to the 16S rRNA gene and three partial reductive dehalogenase genes (including vcrA), was designed and synthesized (IDT technologies). The Geneious 8.1.8 DNA fold feature was used to choose the best orientation of gene fragments to reduce DNA folding, especially where primers were expected to anneal. The synthesized plasmid (Figure 4-2) was cloned into Escherichia coli using Invitrogen TOP10 cells. This single concatenated gene plasmid served as the standard for qPCR targeting both 16S rRNA and vcrA genes to obtain accurate ratios of vcrA to 16S rRNA gene copies.

Figure 4-2. Diagram of the concatenated gene plasmid used as a qPCR standard. The plasmid contains fragments of the D. mccartyi 16S rRNA, vcrA, bvcA and tceA genes.

All qPCR analyses were conducted using a CFX96 real-time PCR detection system, with a C1000 Thermal Cycler (Bio-Rad Laboratories, Hercules, CA). Each 20 µL qPCR reaction was prepared in sterile UltraPure distilled water containing 10 µL of EvaGreen® Supermix (Bio-Rad Laboratories, Hercules, CA), 0.5 µL of each primer (forward and reverse, each from 10 µM stock solutions), and 2 µL of diluted template (DNA extract at 1:10 or standard plasmid dilution series). The thermocycling program was as follows: initial denaturation at 95°C for 2 min, followed by 40 cycles of denaturation at 98°C for 5 s, annealing at 60°C followed by extension for 10 s at 72°C. A final melting curve analysis was conducted at the end of the program. Calibration R² values were 0.99 or greater and efficiencies were 80-110%.
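The calibration quantities reported above (R² and amplification efficiency) follow from a linear fit of quantification cycle (Cq) against log10 of the standard copy number, with efficiency conventionally computed as 10^(-1/slope) - 1. The sketch below illustrates that calculation. The dilution series and Cq values are hypothetical, and the function names are illustrative rather than part of any instrument software.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared, efficiency), where
    efficiency = 10 ** (-1 / slope) - 1 (1.0 corresponds to 100%).
    """
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in cq)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, r_squared, efficiency


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate copies per reaction for a sample."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical ten-fold plasmid dilution series (copies per reaction) and Cq values.
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_cq = [28.04, 24.72, 21.40, 18.08, 14.76]

slope, intercept, r_squared, efficiency = fit_standard_curve(standard_copies, standard_cq)
print(f"slope={slope:.2f}, R2={r_squared:.3f}, efficiency={efficiency:.0%}")
```

Because both primer pairs are calibrated against the same plasmid dilution series, fitting one curve per primer pair and dividing the two copy estimates for a sample gives the vcrA to 16S rRNA gene ratio without relying on two independently quantified standards.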

PCR reactions were designed to amplify the vcrA genomic island (vcrA-GI) in either of two states (genomic or circular). The first set of primers targeted conserved genomic regions outside of the vcrA-GI, yielding an amplicon of predicted length 19,361 bp (PCR 1). The second set of primers faced outwards in both directions from the vcrA gene, targeting a circular version of the island and yielding an amplicon of predicted length 10,426 bp (PCR 2). Primers were designed using the Geneious 8.1.8 Primer Design feature. PCR reactions were performed using a high-fidelity polymerase (ThermoFisher Phusion) on an MJ Research PTC-200 Peltier Thermal Cycler. All reactions started with an initial denaturation at 98°C for 30 s, followed by 30 cycles of 10 s of denaturation at 98°C, primer-specific annealing and elongation at 72°C, and a final extension of 5 min at 72°C. For the genomic primers (PCR 1), the annealing temperature was 59.5°C and elongation was 10 min at 72°C. For the vcrA-to-vcrA primers (PCR 2), the annealing temperature was 65°C and elongation was 2 min at 72°C. PCR products were separated on a 1% TAE agarose gel to estimate product size and check for non-specific amplification. PCR products were sequenced (Sanger) at the SickKids Centre for Applied Genomics (TCAG) sequencing/synthesis facility (Toronto, Ontario). The sequences of all qPCR, PCR and internal sequencing primers are provided in Table C1.

4.2 Results and Discussion

4.2.1 Two new CRISPR-containing D. mccartyi genomes

Metagenomes were sequenced from four related KB-1 mixed microbial cultures that have been grown and maintained since 2003 or earlier on TCE, cDCE, VC or 1,2-DCA. The metagenomes obtained from these cultures enabled the closure of eight new D. mccartyi genomes as illustrated and named in Figure 4-1 and described in Chapter 2. These closed genomes were found to contain as few as five, and as many as 22, reductive dehalogenase homologous (rdhA) genes (Table C2). Herein, we focus on two of the eight closed genomes that were found to contain CRISPR-Cas systems: strains KBVC1 and KBDCA3 (Figure 4-3). The genome of strain KBVC1 was found to contain 22 rdhA genes, including a gene that encodes the functionally characterized vinyl chloride reductase, VcrA. The genome of strain KBDCA3 was found to contain only 9 reductive dehalogenase genes, including a gene for a second functionally characterized vinyl chloride reductase, BvcA (151).


Figure 4-3. Dehalococcoides mccartyi KBVC1 and KBDCA3 complete genomes and homology to related genomes. Rings inner to outer: (1) position on genome, (2) GC skew where green (+)/purple (-), (3-5) BLASTN hits (>70% identity) to other D. mccartyi strains with CRISPR systems including DCMB5, CBDB1, and GT, (6) open reading frames (ORFs). Predicted prophage genes are highlighted in purple, reductive dehalogenase (rdhA) in black, CRISPR-associated proteins in red.

4.2.2 Dehalococcoides mccartyi strains encode type I CRISPR-Cas systems

In addition to the genomes of strain KBVC1 and strain KBDCA3, the genomes of three previously described strains, namely GT (79), CBDB1 (65) and DCMB5 (108), also encode CRISPR-Cas loci, bringing the total to five out of 24 sequenced genomes (Table C2). The genomes of strains KBVC1, GT and CBDB1 contain Class I type I-E systems, while strain KBDCA3 contains a Class I type I-C system. Strain DCMB5 was found to encode both type I-E and I-C CRISPR-Cas systems. The hallmark of the Class I CRISPR-Cas systems is the presence of a multi-subunit crRNA-effector complex. This ribonucleoprotein complex mediates the processing and interference stages of CRISPR-Cas activity. The signature gene for type I systems is cas3, which contains helicase and nuclease domains that unwind and cleave target DNA (152-154). Both the type I-E and I-C CRISPR-Cas subtypes in D. mccartyi have cas3 genes, and the order and presence of other cas genes do not differ from common examples of I-E and I-C systems across many different bacterial genera (153) (Figure 4-4).


Figure 4-4. CRISPR-Cas operon from D. mccartyi Type I-E and Type I-C. Black diamonds indicate repeats in the CRISPR array, which differ based on strain (CBDB1, KBVC1, KBDCA3, GT or DCMB5).

In general, CRISPR-Cas systems are considered to be polyphyletic, with many rearrangements and lateral gene transfer events occurring between organisms (153). The type I-E and I-C CRISPR-Cas systems identified in D. mccartyi likely arose from independent acquisition events. This is especially evident in strain DCMB5, which contains both type I systems (Figure 4-5). Each system sub-type is well conserved among different strains. The type I-E cas genes from strains KBVC1, GT, CBDB1, and DCMB5 share 99.5% pairwise nucleotide identity, while the type I-C cas genes of KBDCA3 and DCMB5 share 99.6% pairwise nucleotide identity. The CRISPR arrays themselves show low sequence identity due to the high variability in the spacer sequences. The I-E spacers from KBVC1 are not the same as the I-C spacers from KBDCA3. These systems were found in different enrichment cultures, which means they are subject to different mobile DNA. Furthermore, different CRISPR systems may recognize different PAM motifs, which would result in different spacer recruitment. The CRISPR arrays contain between 19 and 54 spacer sequences (Table C2 & Table 4-1). The unique sets of spacer sequences reflect adaptation of the CRISPR locus to the specific conditions experienced by each strain. We investigated the targets of CRISPR spacers in D. mccartyi to understand the nature and history of invasion by mobile genetic elements. We were able to identify likely targets of 45-86% of the CRISPR spacers in the six D. mccartyi CRISPR arrays. These spacers target primarily phages and genomic islands (Table 4-1 & 4-2).


Figure 4-5. CRISPR Cas1 maximum likelihood tree constructed from an alignment of 227 Cas1 protein sequences. Only Cas1 sequences found in closed genomes were used in the alignment. Cas1 come from different types of CRISPR systems. Systems with clear clades are highlighted in different colours and identified by type. Dehalococcoides mccartyi Cas1 are type I-E (orange) and I-C (blue) labeled by strain name. Scale shows number of amino acid substitutions per site. Shown is most likely tree of 100 bootstraps.


Table 4-1. Summary of D. mccartyi CRISPR-Cas system targets

D. mccartyi   CRISPR   Number of spacers that match specific types of mobile DNA*    Total number
strain        type     Prophage    IME1       Other IME   rdhA IME   Unknown          of spacers
KBDCA3        I-C      15 (41%)    9 (24%)    0 (0%)      3 (8%)     10 (27%)         37
DCMB5         I-C      17 (31%)    5 (9%)     2 (4%)      1 (2%)     29 (54%)         54
DCMB5         I-E      18 (64%)    3 (11%)    2 (7%)      1 (4%)     4 (14%)          28
CBDB1         I-E      11 (58%)    3 (16%)    0 (0%)      0 (0%)     5 (26%)          19
GT            I-E      18 (49%)    3 (8%)     2 (5%)      1 (3%)     13 (35%)         37
KBVC1         I-E      12 (29%)    6 (15%)    1 (2%)      1 (2%)     21 (51%)         41

*IME: Integrative and Mobilizable Element; IME1 refers to type 1 IMEs as defined in the text; rdhA IME are IMEs that contain an rdhA gene.
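The percentages in Table 4-1 are each category's share of the total spacers in an array. As a quick consistency check, the short sketch below recomputes the totals and rounded percentages from the counts in the table; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Counts transcribed from Table 4-1, in the order:
# prophage, IME1, other IME, rdhA IME, unknown.
SPACER_TARGETS = {
    ("KBDCA3", "I-C"): (15, 9, 0, 3, 10),
    ("DCMB5", "I-C"): (17, 5, 2, 1, 29),
    ("DCMB5", "I-E"): (18, 3, 2, 1, 4),
    ("CBDB1", "I-E"): (11, 3, 0, 0, 5),
    ("GT", "I-E"): (18, 3, 2, 1, 13),
    ("KBVC1", "I-E"): (12, 6, 1, 1, 21),
}


def summarize(counts):
    """Return (total spacers, rounded percent per category, percent with an identified target)."""
    total = sum(counts)
    percents = [round(100 * c / total) for c in counts]
    identified = 100 * (total - counts[-1]) / total  # everything except "unknown"
    return total, percents, identified


for (strain, crispr_type), counts in SPACER_TARGETS.items():
    total, percents, identified = summarize(counts)
    print(f"{strain} {crispr_type}: total={total}, percents={percents}, identified={identified:.1f}%")
```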


Table 4-2. General features of mobile DNA found in Dehalococcoides mccartyi targeted by CRISPR-Cas system

Type                 Prophage/Phage           Integrative and mobilizable elements (IMEs)
Class                Prophage/Phage           IME1                    IME with rdhA gene        IME2
Size (kbp)           6-40                     21-23                   4-11                      8-9
GC content           49%                      44%                     41%                       38%
Integration sites    At ~800 and ~300 kbp     tRNA-Ala, Ile or Lys    Multiple tRNA or tmRNA    At ~780 kbp
Naming convention    strain name - number     IME1-number             IME-rdhA present          IME2-number

General description
Prophage/Phage: Siphoviridae type virus with lytic/lysogenic cycle.
IME1: Contain phage-like integration and excision machinery. Observed to replicate as a high copy circular element or integrated within genome.
IME with rdhA gene: Contain phage-like integration and excision machinery. Observed to replicate as a high copy circular element or integrated within genome. Carry a reductive dehalogenase (rdhA) gene.
IME2: Predicted genomic island. Targeted by CRISPR, function unknown.

4.2.3 Dehalococcoides mccartyi CRISPR-Cas systems target phages and genomic islands

Phages were the most common target of D. mccartyi CRISPR spacers (Table 4-1). We identified a total of 20 prophage sequences in D. mccartyi genomes (Figure 4-6). We identified twelve new prophage sequences in the KB-1 cultures sequenced in this work, and seven additional sequences were found in published D. mccartyi genomes from other strains available in NCBI. One had previously been identified in KB-1 (prophage KB/TCE-0) through an earlier sequencing project (115) and another in 11a5 (36). In the current study, we found a prophage sequence (prophage KBTCE1/KBVC2-1) that is 99.9% similar on a nucleotide basis to the KB-1 prophage previously reported and is presumably the same prophage. The few nucleotide substitutions observed were all aggregated in one region of the prophage (Figure 4-7) where the phage tail proteins are encoded (Table C4). Phage tail fibers interact with the cell surface and thus are typically subject to stronger adaptive selection pressure than the rest of the phage genome (155). D. mccartyi prophages appear to fall into three different groups based on open reading frames: i) prophage sequences similar to Escherichia coli phage HK97; ii) prophage sequences similar to Bacillus subtilis SPP1 phage; and iii) hybrid prophages with HK97-like heads and SPP1 tails (Table C3). The CRISPR spacers found in strain KBVC1 had matches to all twelve prophages found in KB-1 D. mccartyi genomes, such that interference would be expected without the addition of new spacers. The spacers in strain KBDCA3 matched seven KB-1 prophage sequences (Figure 4-6). An example of a spacer match to a prophage sequence is depicted in Figure 4-8.

Figure 4-6. Maximum likelihood phylogenetic tree of prophages identified in D. mccartyi closed genomes including those from KB-1. Most likely tree of 100 bootstraps, scale indicates number of nucleotide substitutions per site. Spacer matches are highlighted with red bars (KBDCA3 I-C system) or blue bars (KBVC1 I-E system) with spacer number indicated below hit. Matches from all D. mccartyi spacers are provided in Table C3. Two incomplete and two highly similar D. mccartyi prophages were omitted but can be found in Table C4. Sequences corresponding to prophage morphological proteins are abbreviated by name in first instance and subsequently colour coded: ST – small terminase; LT – large terminase; PO – portal; HP – head protease; HD – head decoration; MH – major head; PC – packaging chaperone; TC – tail connector; TR – tail terminator; TT – tail tube; TG – tail assembly chaperone; TM – tail tape measure; HN – HTH endonuclease; SA – recombinase; DT – tail protein; BH – baseplate; HS – head scaffold; 6F – T6SS IcmF; RP – replication protein; VV – DSMS3 protein; IN – phage ; CR – HTH repressor; LM – tail tip; PG – PG ; C1 – connector 1; EP – encapsulin packaged protein; S4 – PMR-associated domain; HC - hcp1 Type VI SS.


Figure 4-7. MAUVE nucleotide sequence alignment diagram between two similar KB-1 prophages. Prophage KB1/TCE-0 was found in the KB-1/TCE-MeOH culture in 2007 and prophage KBTCE1/VC2-1 was found in the same culture in 2013. Solid block region indicates sequence homology across entire prophage sequence (blue). Sequence similarity profiles (black) correspond to the average level of conservation in the region. The height of the profile is inversely proportional to the average alignment column change over the alignment. In total 71 of 19,420 bp are different which results in 24 amino acid substitutions (based on predicted open reading frames).

Figure 4-8. Representation of Dehalococcoides mccartyi KBVC1 CRISPR type I-E Cascade action on prophage target. D. mccartyi crRNA from the CRISPR array from sequencing information is shown in blue. Protospacer adjacent motif (PAM) is shown in green. Matching DNA sequence from prophage KBDCA3-12 is shown in red. Stars indicate positions that were found to readily tolerate mutations in E. coli type I-E CRISPR system (143). Due to the small number of mismatches and none in the PAM or seed regions (seed comprises 8-12 PAM-proximal bases), we expect the strain KBVC1 CRISPR system to be able to run interference against this prophage.

Eight of 41 spacers in the strain KBVC1 type I-E CRISPR-Cas system and 12 of 37 in the KBDCA3 type I-C system were found to match regions of D. mccartyi genomes that appear to have been laterally acquired based on sequence information. These genomic islands (GIs) are clearly distinct from prophage sequences because standard phage morphogenetic proteins are absent. What they have in common is the presence of genes coding for integration, excision and replication. These types of genomic islands, often referred to as integrative and mobilizable elements or IMEs (156, 157), encode their own integration, excision and replication proteins but rely on other helper proteins to be transferred outside of the cell. An alignment was created of all IMEs that were targeted by CRISPR-Cas systems (Figure 4-9). This alignment revealed one group of IMEs, referred to as IME1, which were similar at the nucleotide and predicted protein level. Integration of IME1 sequences is site-specific and mediated by a tyrosine recombinase, with insertion at either tRNA-Ala (seen in strains BAV1, KBDCA2, KBTCE1), tRNA-Ile (strains KBVC1, KBDCA2, WBC-2, and KBDCA1) or tRNA-Lys (strain 195). IME1 in the chromosome is flanked by repeats. Spacers from all D. mccartyi CRISPR-Cas systems had matches to IME1 family sequences.


Figure 4-9. Maximum likelihood phylogenetic tree of integrative mobilizable elements (IMEs) targeted by KBVC1 and KBDCA3 CRISPR-Cas systems. KBVC1 CRISPR spacer matches as blue bars with spacer number underneath, KBDCA3 spacer matches as red bars. (a) D. mccartyi (Dmc) IME1 family; (b) other IMEs. Most likely tree of 100 bootstraps, scale indicates number of nucleotide substitutions per site. Coding regions are shown as arrows and are identified by name, or by number as follows: 1- transcriptional regulator; 2- phage antirepressor; 3- recombinase/integrase; 4-Phage DNA primase/polymerase; 6- DNA segregation protein; 7- transposase. A star (*) indicates flanking repeat region. See Table C3 for tabular version of spacer matches to targets. Grey arrows are hypothetical coding regions.

4.2.4 Mobilization of IME1 sequences

Illumina mate-pair and paired-end sequencing data suggested that IME1 sequences are capable of mobilization. The IME1 found in strain KBVC1 (Figure 4-3) had twice the read depth of the rest of the genome. Furthermore, mate-pair and paired-end information mapped reads back to the genome or back to itself. Searching through metagenomic contigs, we found IME1 sequences with read depths of 700x and 900x, well above that of the highest abundance D. mccartyi genome closed from that metagenome (Figure 4-10), indicating that IME1 sequences integrated into the genome can simultaneously occur as circular extrachromosomal elements. A review of the literature on IMEs revealed that such mobile elements have been observed to replicate autonomously using rolling circle replication or by hijacking integrative and conjugative element machinery (158) and that they can be either mutualistic or opportunistic (159). D. mccartyi strain 11a5 is reported to carry a plasmid-like circular extrachromosomal element (eDhc6) (109); however, this element was found as a single copy per cell, and is not targeted by any of the D. mccartyi CRISPR-Cas systems. The IMEs identified herein have no sequence or predicted protein similarity to sequences in this plasmid-like element (eDhc6), and thus appear to be the first example of CRISPR-targeted IMEs in D. mccartyi.
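The inference from read depth to copy number can be made explicit with a small calculation. The sketch below assumes that read depth scales linearly with copy number, that reads over the element derive from a single strain, and that each genome carries one integrated copy. These are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def extrachromosomal_copies(element_depth: float, chromosome_depth: float,
                            integrated_copies: int = 1) -> float:
    """Average number of extrachromosomal element copies per genome copy.

    Assumes read depth is proportional to copy number and that
    `integrated_copies` copies of the element sit in the chromosome.
    """
    if chromosome_depth <= 0:
        raise ValueError("chromosome_depth must be positive")
    return element_depth / chromosome_depth - integrated_copies


# Approximate depths reported for strain KBTCE2 in Figure 4-10:
# ~100x across the genome and ~700x over the IME1-8 region.
print(extrachromosomal_copies(700, 100))

# A two-fold depth over an island corresponds to one extra copy per genome on average.
print(extrachromosomal_copies(200, 100))
```

In a mixed culture, reads from closely related strains can inflate depth over shared regions, so values obtained this way are best read as rough averages across the population.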


Figure 4-10. Metagenomic evidence for circular existence of integrative mobilizable elements (IMEs) in D. mccartyi. (A) Illustration of partial D. mccartyi genome (strain KBTCE2) showing high read depth over IME region (IME1-8). Strain KBTCE2 has an average read depth of 100x with ~700x over the IME1-8 region. (B) IME1-1 found circularized from metagenomic data (mate-pair and paired-end reads map only to itself). Read depth is shown in blue ranging from 0 to 1000x. Inner dark green circle indicates significant BLAST hits to IME1-8 and IME1-2 (70-100% lighter to darker). Coding regions are shown as arrows: colored are annotated genes, in black showing conserved domains (domains annotated individually). In (A) 23S and 5S have high read depth due to multiple strains of D. mccartyi in this culture and 100% sequence identity across this region among all strains. See Figure 4-9 for different IMEs found.

4.2.5 Genes found in D. mccartyi IME1 sequences

IME1 sequences (Figure 4-9a) are the second most frequent target of CRISPR-Cas systems (Table 4-1). These IMEs have a 70.5% pairwise nucleotide identity when aligned and contain similar coding regions. They are 21-23 kbp long with an average GC content of 44% (Figure 4-9a). The most conserved protein that could be annotated is the D. mccartyi integrase, with 97.2% pairwise nucleotide identity, which is similar to the integrase (dsiB) that is found on the vcrA-GI (40). No IME1s were found to contain rdhA homologous genes. Other protein sequences that could be identified in IME1 include a transcriptional regulator from either the XRE, LexA or Cro/CI family, a bifunctional primase/polymerase and a protein thought to be involved in DNA repair (Figure 4-9a). Proteins that could not be annotated were checked for the presence of conserved domains to help predict their function. Transcriptional regulators from the XRE, LexA or Cro/CI families are typically used by phages to regulate lytic growth (160-162). The bifunctional primase/polymerase is related to the phage P4 family of primases/polymerases. These are encoded in a single multifunctional gene, with primase, helicase and specific DNA binding activities, and are used by phages to replicate as plasmids. The P4 plasmid phage uses this type of primase/polymerase to replicate as a high copy plasmid in its host E. coli. P4 relies on helper phage P2 for morphological functions related to an infectious state such as packaging and cell lysis (163-165). Double-stranded DNA repair proteins, although used in the maintenance of genomes, can also be used by prokaryotes during homologous recombination (166, 167).

All IME1 sequences contain a putative bifunctional primase/polymerase which belongs to a large family of proteins, referred to as the “D5 ”, containing five sub-families (NCBI). The D. mccartyi IME1 primase/polymerase belongs to the sub-family PRK07078 along with 355 other proteins in NCBI without any citing publications at this time. Upon closer inspection of PRK07078 proteins found in other bacteria with closed genomes, the primase/polymerase protein was also found in the vicinity of the same predicted coding regions as in a D. mccartyi IME1, including an integrase and a transcriptional regulator. Functionally speaking, it is possible that other bacteria also have IME1-like sequences, noting that the similarities found here are at the amino acid level, not at the nucleotide level. IME1-like constructs were found in a wide range of different bacterial genomes such as Ralstonia, Lysobacter, Desulfovibrio, Geobacter, and Nitrospira (see Appendix Table C6 for the full list). Twenty-five of 29 IME1-like sequences occurred directly adjacent to a complete phage, possibly functioning as a satellite phage much like P4 does in E. coli. Eight of 29 genomes also contained a CRISPR-Cas system, and 2 of 8 targeted their own IME1-like sequence (Table C6). In Ralstonia solanacearum (N. Peeters, et al. (168) for review) the construct is present in different strains and targeted by CRISPR-Cas, as is the case in D. mccartyi. This type of element possibly exists in a wide variety of bacteria, not just D. mccartyi, but has received little attention. The IME1s in D. mccartyi are important because of their similarity to the vcrA-genomic island, and provide the first hint to the mechanism of lateral gene transfer in this species. We defined a naming scheme to identify the different IME1 sequences discovered in this research (Figure 4-9a). The name begins with Dmc (for host D. mccartyi), IME1, followed by a number (Table C5).

4.2.6 Integrative and mobilizable elements that contain reductive dehalogenase genes are also targets of CRISPR-Cas systems

The CRISPR-Cas systems in D. mccartyi were found to primarily target prophages (Figure 4-6) and IME1s (Figure 4-9a) (Table 4-1). However, a few spacer sequences were found to target genomic islands that contain reductive dehalogenase genes, and in one case a reductive dehalogenase gene itself (Figure 4-9b). The previously identified vcrA (40) and tceA (70) genomic islands were targeted, as well as genomic islands newly identified in strains BAV1 (rdhA OG 24) and DCMB5 (rdhA OG 49), where OG refers to the ortholog group to which the RdhA sequence belongs, as defined by L. A. Hug, et al. (14). Several spacers were found to target the integrase gene, dsiB, which is similar to the integrase found in IME1 sequences. Others were found to match regions unique to rdhA-genomic islands (Figure 4-9b, Table C2), suggesting that these islands could be specifically targeted by CRISPR-Cas systems.

4.2.7 IME1s and the vcrA genomic island both replicate via a circular intermediate

While assembling contigs from the metagenome of the KB-1/VC-H2 culture, we found that the vcrA-containing genomic island had twice the read depth of the rest of the genome (Figure 4-11), similar to the high read depths we found over IME1 regions in strain KBVC1 (Figure 4-3). Furthermore, half of the mate-pair and paired-end sequencing information linked the island to itself, suggesting the simultaneous occurrence of a circular intermediate and a chromosomal copy, which would explain the high read depth. To further investigate this possibility, triplicate bottles with sterile medium were inoculated with KB-1/VC-H2, amended with VC, methanol and ethanol, and grown to stationary phase. The abundance of vcrA gene copies relative to 16S rRNA gene copies over three successive dechlorination cycles was measured using quantitative PCR (qPCR) calibrated with a single concatenated gene plasmid standard to yield accurate relative abundances. In all three replicate culture bottles, we found that the ratio of vcrA to 16S gene copies was not constant: it varied repeatedly between 1:1 and 2:1 (Figure 4-12a and Figure 4-13). Amplification of vcrA-GI using genomic primers on either side of the island was always successful (PCR 1), indicating a stable single copy of the integrated island per genome (Figure 4-12b). We were also able to obtain the PCR amplicon for the circular element using outward-facing primers (PCR 2, Figure 4-12c), confirming that the element does exist in its circular form as well. The circular vcrA-GI was detected by PCR even when the ratio of vcrA to 16S rRNA was close to one as measured by qPCR. It seems that a fraction of the D. mccartyi population is always producing circular vcrA-GI and that endpoint PCR is sensitive enough to detect this, especially after 30 cycles as used in this study.

Figure 4-11. Illustration of the circularization of the vcrA genomic island (GI). PCR reactions used to verify sequencing results are shown targeting the vcrA genomic island integrated within the genome (PCR 1) or in circular form (PCR 2). Read depth observed over the vcrA island compared to the rest of the genome is shown in blue where darker blue indicates read depth > 120.


Figure 4-12. Evidence of circularization of the vcrA genomic island. Panel a) shows vinyl chloride (VC) and ethene concentrations (left axis) and gene abundances (right axis) as VC is dechlorinated to ethene in a KB-1 sub-culture. The culture was purged and re-fed on days indicated by black arrows. The gene copies per mL of culture of D. mccartyi 16S rRNA (Dhc, grey diamonds) and vinyl chloride reductase (vcrA, black squares) were tracked using qPCR, and the copy number ratio is shown above each point. Panels b) and c) show the agarose gel images with amplification products generated using reactions PCR 1 and PCR 2 (illustrated in Figure 4-11) on the same DNA samples (days indicated above gel) used for qPCR analyses shown in panel a). The PCR 1 amplicon size (panel b) is always the expected 19,361 bp, indicating one copy of the island integrated in the genome; this gel was run with the Lambda/HindIII ladder (L). The PCR 2 amplicon (panel c) confirms simultaneous detection of the circular vcrA genomic island with the expected 10,426 bp fragment size; this gel was run with the 10kbPlus ladder (L). Expected fragment sizes are indicated with black arrows. PCR products were verified by Sanger sequencing.


Figure 4-13. Quantitative PCR (qPCR) and gas chromatography (GC) tracking of two KB-1/VC-H2 cultures during routine growth. Vinyl chloride (VC) is degraded to ethene (left axes). The number of gene copies of D. mccartyi 16S rRNA (Dhc, grey triangle) and vinyl chloride reductase (vcrA, black square) was tracked using qPCR (right axes). Cultures were maintained in batch mode, with periodic purging of gases and re-feeding with VC as indicated on graphs with black triangles. Graphs A and B are two of the triplicate cultures studied. The third culture can be found in Figure 4-12. All qPCR raw data are in Table C7 and standard curves are in Table C8.

Four observations could be made from the DNA analyses in these cultures: (1) the ratio of vcrA to D. mccartyi 16S rRNA gene copies was not associated with any particular time point during batch-style feeding and dechlorination; (2) when the ratio was greater than 1, it was because of a higher number of copies of vcrA, not a lower number of copies of 16S rRNA genes; (3) vcrA copies doubled within a time span of 2-3 days or less, faster than typical D. mccartyi growth in these cultures; and (4) amplification of the circular state was successful. Sequence annotation cues suggest that the vcrA genomic island is mobilizable (40), and several studies have reported the puzzling result of finding many more D. mccartyi vcrA copies than 16S rRNA gene copies (44, 45, 169) without an explanation other than PCR error, since vcrA has never been found in any species other than D. mccartyi. Periodic replication of the vcrA island as an independent extrachromosomal circular element now explains these observations. These data indicate that the vcrA genomic island, and possibly other rdhA-containing genomic islands, can excise from the genome and circularize as integrative and mobilizable elements (IMEs).

4.2.8 Type I-E CRISPR-Cas is active in KB-1 and adapted to invading DNA over time

Through a combination of PCR and sequencing of archived frozen DNA, we were able to determine that the type I-E CRISPR-Cas system in strain KBVC1 actively recruited three new spacers over eleven years of maintenance of this batch culture in our laboratory (Figure 4-14). Two spacers were added at the promoter region of the array, and one integrated in the middle of the array, which is less common but has been observed in other bacteria (170-172). The new spacer occurring in the middle of the array had similarity (BLASTN) to a phage portal protein from a D. mccartyi prophage (Figure 4-14). The second new spacer occurring at the promoter region of the array was a match to a D. mccartyi prophage capsid. We could not identify the target for the third spacer (Figure 4-14).


Figure 4-14. Evidence for the extension of D. mccartyi KBVC1 CRISPR array and new targets acquired. (a) Agarose gel showing PCR product of CRISPR array from KB-1/VC-H2 amplified from DNA extracted over time. DNA extracted in different years was stored at -80 °C. Amplified products were sequenced with internal primers; primers used are provided in Table C1. The expected 1,725 bp PCR product was produced from 2013-2016 DNA. A smaller 1,542 bp PCR product was produced from 2002 DNA. (b) Schematic of partial CRISPR array in 2016 compared to 2002 indicating the addition of three new spacers as determined from sequencing of PCR products. Repeats are shown as black diamonds. Blue rectangles indicate spacer match to prophage. Yellow rectangles indicate spacer match to D. mccartyi IME1. Only 24 of the 41 spacers are shown in the region where new spacers were added; cross-hatches indicate where the other spacer-repeat sequences occur. New spacers are identified by spacer number (1, 2 and 20). (c) New spacers have best matches to two different D. mccartyi prophages indicated with black arrows. Prophages are annotated using the same abbreviations as in Fig. 4-6 and Table C4.
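The two band sizes reported in the caption can be checked against the number of new spacers with a line of arithmetic. The sketch below is only an illustration using the sizes quoted above; the function name is hypothetical. The result, about 61 bp per added spacer-repeat unit, is in the range generally expected for a type I-E array (a repeat of roughly 29 bp plus a spacer of roughly 32 bp), which is a general expectation and not a value measured in this work.

```python
# Band sizes quoted in the Figure 4-14 caption (bp).
PRODUCT_2016_BP = 1725   # product from 2013-2016 DNA
PRODUCT_2002_BP = 1542   # product from 2002 DNA
NEW_SPACERS = 3          # spacers 1, 2 and 20


def bp_per_new_unit(larger_bp: int, smaller_bp: int, n_units: int) -> float:
    """Average length added per new spacer-repeat unit (hypothetical helper)."""
    return (larger_bp - smaller_bp) / n_units


unit_bp = bp_per_new_unit(PRODUCT_2016_BP, PRODUCT_2002_BP, NEW_SPACERS)
print(f"{PRODUCT_2016_BP - PRODUCT_2002_BP} bp gained, {unit_bp:.0f} bp per unit")
```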

Further evidence for the functioning of the CRISPR-Cas system can be inferred from the sequences of multiple strains inhabiting the same enrichment cultures. The highest abundance strain (KBVC1) in the KB-1/VC-H2 enrichment culture contains a CRISPR-Cas system while the lower abundance strain (KBVC2) does not. Strain KBVC2, which lacks a CRISPR-Cas system, contains two prophages (KBTCE1/KBVC2-1 and KBTCE1/KBVC2-5) in its genome, while strain KBVC1 does not, and its I-E CRISPR-Cas system has spacer matches to these prophages (Table C3, Figure 4-6). Similarly, in the KB-1/1,2-DCA-MeOH enrichment culture, the prophage found in the two most abundant strains (KBDCA1/KBDCA2-6) is not found in the lower abundance strain KBDCA3, which harbors the I-C CRISPR-Cas system and a spacer match to this prophage (Table C3, Figure 4-6).


4.2.9 CRISPR-Cas in other KB-1 Cultures over time

Two PCR reactions were designed to target the I-E and I-C KB-1 D. mccartyi CRISPR arrays. As described above, we found that the array in the KB-1/VC-H2 culture grew over time. We also tested the KB-1/TCE-MeOH, KB-1/cDCE-MeOH and KB-1/1,2-DCA-MeOH cultures using the same primer sets. Starting with the parent culture (KB-1/TCE-MeOH), the I-E CRISPR system could be amplified from all DNA samples over time; however, the CRISPR I-C system could only be amplified from 2002 and 2004 DNA. Apparently, over time the abundance of the strain containing the I-C system fell below the detection limit of this method. At the time of Illumina DNA sequencing, any strains containing the CRISPR I-E system in the parent (KB-1/TCE-MeOH) culture were in low enough abundance that their genomes could not be assembled. Both the I-E and I-C PCRs yielded products visible on agarose gel from the KB-1/cDCE-MeOH and KB-1/1,2-DCA-MeOH cultures (Table 4-3 and Figure C1 for gel photos).

Table 4-3. Summary of PCR amplification of CRISPR I-E and I-C array from different KB-1 enrichment cultures over time. NT- not tested. Gel photos in Figure C1.

| KB-1 Enrichment Culture | In-house name of culture | DNA extraction date | Lab member responsible | CRISPR I-E visible band amplified? | CRISPR I-C visible band amplified? |
|---|---|---|---|---|---|
| KB-1/VC-H2 | HTVC2 | May 18, 2016 | OM | yes | no |
| KB-1/VC-H2 | HTVC2 | July 15, 2015 | OM | yes | NT |
| KB-1/VC-H2 | HTVC2 | Oct 26, 2015 | OM | yes | NT |
| KB-1/VC-H2 | HTVC2 | June 3, 2013 | LL | yes | NT |
| KB-1/VC-H2 | HTVC2 | Nov 9, 2002 | MD | yes | NT |
| KB-1/cDCE-MeOH | cisAW | May 18, 2016 | OM | yes | yes |
| KB-1/cDCE-MeOH | cisAW | Oct 26, 2015 | OM | yes | yes |
| KB-1/cDCE-MeOH | cisAW | June 3, 2013 | LL | yes | yes |
| KB-1/TCE-MeOH | T3MP1 | May 18, 2016 | OM | yes | no |
| KB-1/TCE-MeOH | T3MP1 | Apr 20, 2016 | OM | yes | no |
| KB-1/TCE-MeOH | T3MP1 | Oct 26, 2015 | OM | yes | no |
| KB-1/TCE-MeOH | T3MP1 | Dec 14, 2015 | OM | yes | no |
| KB-1/TCE-MeOH | T3MP1 | June 10, 2013 | ST | yes | no |
| KB-1/PCE-MeOH | T5 PCE | Mar 15, 2004 | MD | yes | yes |
| KB-1/TCE-MeOH | T5 TCE | Nov 9, 2002 | MD | yes | yes |
| KB-1/1,2DCA-MeOH | 1,2DCA AW | Apr 22, 2016 | OM | yes | yes |
| KB-1/1,2DCA-MeOH | 1,2DCA AW | Apr 21, 2013 | LL | yes | yes |


4.3 Conclusions

This analysis of the CRISPR-Cas system in D. mccartyi has led to the discovery of circular extrachromosomal elements, defined as the IME1 family, that share commonalities with vital rdhA-containing IMEs. Both IME1 and the vcrA genomic island were found to replicate independently via a circular intermediate. Considering that all D. mccartyi genomes contain predicted comEA competence genes, it is possible that lateral gene transfer by transformation occurs in this genus. The mechanism by which D. mccartyi exports IME1 and rdhA-containing IMEs is yet to be determined; however, all D. mccartyi genomes also harbour genes encoding the FtsK/SpoIIIE domain, which is involved in DNA export in other genera. Specifically, Actinobacteria use a single FtsK/SpoIIIE-like translocation protein to move double-stranded plasmids outside the cell (173-176). Integrative and conjugative elements in Actinobacteria, once excised from the chromosome, also replicate autonomously before translocation (177, 178).

It is interesting to consider why CRISPR-Cas-containing D. mccartyi strains would reject incoming DNA that encodes new reductive dehalogenase genes that may be favorable to growth, especially since D. mccartyi are so reliant on these genes. Other bacteria that maintain CRISPR-Cas systems have also sometimes been found to target beneficial or conjugative DNA, indicating that these systems are in a constant state of flux between the beneficial and negative consequences of CRISPR-Cas maintenance (179-181). In our strains we did not find any relationship between CRISPR-Cas and strain abundance, suggesting that in our cultures at this time there is no clear benefit or cost. Bacteria may have other ways to resist phage predation, so the CRISPR-Cas system may not need to play a significant role and thus may not be selected for. One explanation could be related to the energetic cost of IMEs and the fact that our laboratory cultures are constantly maintained on the same chlorinated substrates and do not require new genes. Considering that the average genome size of D. mccartyi is 1.4 Mbp and some of these IMEs, such as IME1, are up to 23 kbp in length, if replicated six times (using an average of six from read depths of IMEs identified from metagenomic sequencing), D. mccartyi is expending ~8% of the energy required to replicate its entire genome on replicating IMEs. This energy requirement is even higher for strains that have more than one IME per genome.
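The replication-cost estimate above can be written as a simple ratio of replicated IME bases to genome bases. The sketch below is only an illustration of that arithmetic under the stated figures; the function names are hypothetical, and it assumes cost scales linearly with the number of bases replicated. With the upper-bound IME length of 23 kbp the ratio comes out near 10%, whereas the quoted ~8% would correspond to an IME of roughly 19 kbp at six copies; which length was used for the quoted value is not stated, so both are shown.

```python
GENOME_BP = 1_400_000   # average D. mccartyi genome size quoted in the text
IME_COPIES = 6          # average IME copy number inferred from read depth


def ime_replication_fraction(ime_bp: float, copies: float, genome_bp: float) -> float:
    """Bases spent replicating IME copies, as a fraction of one genome replication."""
    return ime_bp * copies / genome_bp


def implied_ime_length(fraction: float, copies: float, genome_bp: float) -> float:
    """IME length (bp) that would give the stated fraction at the given copy number."""
    return fraction * genome_bp / copies


upper_bound = ime_replication_fraction(23_000, IME_COPIES, GENOME_BP)
implied_bp = implied_ime_length(0.08, IME_COPIES, GENOME_BP)
print(f"23 kbp IME at 6 copies: {upper_bound:.1%} of genome replication")
print(f"IME length implied by 8%: {implied_bp:,.0f} bp")
```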


The analysis of the CRISPR-Cas systems and mobile elements across a growing set of Dehalococcoides genomes has enabled the discovery of different types of actively replicating extrachromosomal elements, and the demonstration of a growing CRISPR array. This study has begun to explain observations of higher ratios of vcrA and other rdhA genes to 16S rRNA gene copies in experimental data and has expanded our understanding of population dynamics and mobile DNA in D. mccartyi, thus pointing towards avenues of future research to decipher mechanisms behind lateral gene transfer, especially of reductive dehalogenase genes.


Chapter 5 Vinyl chloride reductase (vcrA) containing genomic island circularizes in Dehalococcoides mccartyi found in the KB-1 and WBC-2 microbial consortia

5.1 Introduction

Dehalococcoides mccartyi are ancient microorganisms with one of the smallest genomes found in a free-living bacterium. In Chapter 2, we found evidence that some dehalogenase genes have been horizontally transferred, based on their high sequence conservation and presence on genomic islands (GIs). These dehalogenases are also the ones that have been partially biochemically characterized, and they play an important role in bioremediation processes. The question remains: how does D. mccartyi share dehalogenase genes?

In 2011 P. J. McMurdie, et al. (40) identified a special case GI in D. mccartyi which contains the complete vcrABC operon coding for vinyl chloride reductase VcrA. The vcrA gene is routinely used as a biomarker at contaminated sites during in-situ bioremediation to test for VC degrading ability by the natural community (41, 42). Interestingly, there have been multiple reports of higher vcrA copy numbers than highly conserved, single-copy D. mccartyi 16S rRNA genes from field and laboratory samples (43-45), yet vcrA has never been found in any species other than D. mccartyi (45). Additionally, high copy numbers of vcrA have been found to correlate to high levels of VC rather than ethene (45). Sites at which PCE or TCE degradation results in the accumulation of VC have been a subject of debate in research with significant economic and environmental consequences (46).

The vcrA-GI is always integrated after a tm-rRNA gene followed by an integration module consisting of ccrB, pinR, recF, parBc, mom and hyp-rnap genes thought to be responsible for the integration of the GI, followed by the vcrABC operon. The vcrABC operon is highly conserved, showing 98% pairwise identity (40). Two different types of integration modules exist that are 77.4% similar to each other. Within the first type there are two strains which have a deletion of the mom and parBc genes (PM and EV) (40). The integration modules occur in different strains of D. mccartyi and are not geographically specific; for example, KB-1 has strains with both types of integration modules.

In Chapter 4, the vcrA island in KB-1 was found to periodically exist as a circular extrachromosomal element. The purpose of this Chapter was to further investigate under which conditions this occurred, and whether it could be reliably replicated in a laboratory setting for further study. We hypothesized that stressors might trigger mobilization of the vcrA-GI, similarly to temperate phage, which led to the experiments described herein.

5.2 Methods

5.2.1 Developing an enrichment culture for analysis

The KB-1 cultures were originally enriched from aquifer materials at a TCE contaminated site in southern Ontario in 1996 as previously described (61), and discussed in Chapter 2. The West Branch Canal Creek (WBC-2) consortium was originally enriched from contaminated wetland sediment at Aberdeen Proving Grounds (APG) MD, U.S.A as described in E. J. P. Jones, et al. (125) and Chapter 3. In early 2007, one litre of WBC-2 was sent to the Edwards Laboratory at the University of Toronto and separated into enrichment cultures with different terminal electron acceptors using 25% transfers into anaerobic mineral medium, including 1,1,2,2-tetrachloroethane (TeCA) with 5x ethanol and 3.5x sodium lactate (126). In 2010 an enrichment culture was created from the TeCA parent culture in a 1:10 transfer and enriched on vinyl chloride (VC); this is referred to as the WBC-2/VC-EL culture.

5.2.2 Metagenomic sequencing and genome assembly

Both KB-1 and WBC-2 VC enrichment cultures were sent for Illumina sequencing in 2013. The KB-1 DNA sequencing is described in Chapter 2; the WBC-2 DNA sequencing is described in Chapter 3.

5.2.3 PCR amplification, qPCR and Sanger sequencing

PCR and qPCR were conducted using the same reagents developed in Chapter 4. Briefly, the abundance of two D. mccartyi genes was measured by qPCR: the 16S rRNA gene (described in Chapter 2) (97) and the vinyl chloride reductase gene, vcrA, using primers vcrA670f (5’-GCCCTCCAGATGCTCCCTTTAC-3’) and vcrA440r (5’-TGCCCTTCCTCACCACTACCAG-3’) (53). A plasmid with four D. mccartyi gene fragments was designed using the Geneious 8.1.8 DNA fold feature to choose the orientation of gene fragments that best reduced DNA folding, especially where primers were expected to anneal. The synthesized plasmid (Figure 4-2) was cloned into Escherichia coli using Invitrogen TOP10 cells. This concatenated gene plasmid served as the standard for qPCR for both the 16S rRNA and vcrA genes to obtain accurate ratios of vcrA to 16S rRNA gene copies.


All qPCR analyses were conducted using a CFX96 real-time PCR detection system with a C1000 Thermal Cycler (Bio-Rad Laboratories, Hercules, CA). The thermocycling program was as follows: initial denaturation at 95 °C for 2 min, followed by 40 cycles of denaturation at 98 °C for 5 s, annealing at 60 °C, followed by extension for 10 s at 72 °C. A final melting curve analysis was conducted at the end of the program. Calibration R² values were 0.99 or greater and efficiencies were 80-110%.
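For readers less familiar with how the calibration R² and efficiency figures are obtained, the following is a minimal sketch of the standard calculation: Cq is regressed against log10 of the standard copy number, and efficiency is derived from the slope as 10^(-1/slope) - 1. The dilution series and Cq values below are hypothetical, as are the function names; this is not the instrument software's own procedure.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq against log10(copies); returns (slope, intercept, r2)."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in cq)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 means perfect doubling each cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate copies in an unknown sample."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical ten-fold dilution series of a plasmid standard.
std_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
std_cq = [30.1, 26.7, 23.4, 20.0, 16.7]

slope, intercept, r2 = fit_standard_curve(std_copies, std_cq)
eff = amplification_efficiency(slope)
print(f"slope={slope:.2f}, R2={r2:.4f}, efficiency={eff:.0%}")
```

Because both target genes sit on the same concatenated plasmid, each dilution carries the same copy number of both targets, so the vcrA and 16S rRNA curves are built from identical standard copy numbers.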

The same two PCR reactions used in Chapter 4 were used again here. The first set of primers (PCR 1) targeted the conserved genomic region outside of the vcrA genomic island (vcrA-GI). The second set of primers (PCR 2) faced outwards from the vcrA gene, targeting a circular version of the vcrA genomic island. A third set of primers targeting the circular element (PCR 3) was designed to amplify a shorter product, in case the first circular-targeting set (PCR 2), which amplifies 10,426 bp (about 900 bp short of the entire circular element), was sensitive to DNA degradation. PCR 3 primers amplify 1,975 bp from the beginning of the vcrA gene to 669 bp after the insertion repeat. The primers for PCR 3 are PCR3f 5'-TATACAGTATATGGTCCAGCAC-3' and PCR3r 5'-TCAACACTTTGTCCTAAATCTA-3'. Primer sequences for PCR 1 and PCR 2 are in Table C1. Reactions were amplified using a high-fidelity polymerase (ThermoFisher Phusion high-fidelity DNA polymerase) on an MJ Research PTC-200 Peltier Thermal Cycler with the following program: initial denaturation at 98 °C for 30 s; 30 cycles of 98 °C for 10 s, a reaction-specific annealing temperature for 30 s (65 °C for PCR 1, 59.5 °C for PCR 2 and 52.7 °C for PCR 3) and extension at 72 °C for 2 min (PCR 2 and PCR 3) or 10 min (PCR 1); and a final extension at 72 °C for 5 min. PCR products were separated on a 1% TAE agarose gel to estimate product size and check for non-specific amplification. PCR products were sequenced (Sanger) at the SickKids Centre for Applied Genomics TCAG DNA sequencing/synthesis facility (Toronto, Ontario).
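The logic of using outward-facing primers to detect a circular element can be illustrated with a small coordinate model. This is a sketch under simplifying assumptions: the primer coordinates are hypothetical, the circle length of 11,326 bp is only the approximate value implied by a 10,426 bp product that falls about 900 bp short of the full element, and the "linear" case stands in for the integrated island, where the primers point away from each other and would have to read around the rest of the chromosome to converge.

```python
from typing import Optional


def amplicon_length(template_len: int, fwd_start: int, rev_end: int,
                    circular: bool) -> Optional[int]:
    """Product length for one primer pair, or None if the primers never converge.

    fwd_start: 0-based position of the forward primer's 5' end (extends to higher coordinates).
    rev_end:   0-based position of the reverse primer's 5' end (extends to lower coordinates).
    """
    if fwd_start <= rev_end:
        # Primers face each other: ordinary inward-facing PCR.
        return rev_end - fwd_start + 1
    if circular:
        # Outward-facing primers converge by reading across the circle junction.
        return template_len - fwd_start + rev_end + 1
    # Outward-facing primers on a linear template extend away from each other.
    return None


CIRCLE_BP = 11_326                 # assumed: 10,426 bp product + ~900 bp not amplified
FWD_START, REV_END = 1_000, 99     # hypothetical coordinates leaving a 900 bp gap

print(amplicon_length(CIRCLE_BP, FWD_START, REV_END, circular=True))
print(amplicon_length(CIRCLE_BP, FWD_START, REV_END, circular=False))
```

In this model a band of the expected size appears only when a circular template is present, which is why PCR 2 and PCR 3 products are read as evidence of the excised form.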

5.2.5 Experimental conditions – changing feeding conditions to upset routine culture growth

The KB-1/VC-H2 culture was used to inoculate three 250 mL glass bottles with butyl rubber stoppers in a 1:4 dilution into anaerobic mineral medium with a 50x higher concentration of vitamins compared to the regular medium used in the Edwards Laboratory. This series of cultures was labeled a to c with the prefix “H”, i.e., Ha, Hb and Hc. At the same time, three cultures in 250 mL glass bottles with butyl rubber stoppers were inoculated from the WBC-2/VC-EL parent culture using the same dilution ratio and the same medium as above, and named a to c with the prefix “W”: Wa, Wb and Wc.

The KB-1/VC-H2 cultures were originally grown with hydrogen gas supplied at 5x the electron donor required for complete dechlorination. In order to speed up growth, the new cultures were changed to 5x methanol and 3.5x ethanol (to mirror WBC-2’s donor excess and donor complexity of 5x ethanol and 3.5x lactate). The first dechlorination cycle was fed 17 mg/l VC and 5x H2, the same as the parent culture. Dechlorination proceeded slowly and was not complete after 60 days. On day 60, 5x MeOH and 3.5x EtOH were added, and complete dechlorination to ethene occurred within 14 days. The second and subsequent cycles were fed 17 mg/l VC and 5x MeOH with 3.5x EtOH, and the cultures continued to dechlorinate VC to ethene in ~14 days (data available in Appendix Figure D1).

After 144 days and six dechlorination cycles in the Ha, Hb and Hc cultures, the three cultures were mixed together (total 600 mL of culture) with an equal volume of fresh mineral medium in order to create more cultures. Six new 250 mL bottles were created, each containing 200 mL of the culture/medium mix. The three bottles intended to serve as controls continued to carry the names Ha, Hb and Hc. The other three were named Sa, Sb and Sc, using the prefix “S” for “switch”, since their growth conditions would be switched during the experiment. Prior to the start of the experiment, all six cultures were fed 25 mg/l VC with 5x MeOH and 3.5x EtOH and grown over three dechlorination cycles (34 days) to stationary phase. For the fourth dechlorination cycle, the control cultures (Ha, Hb and Hc) were kept at regular feeding conditions (25 mg/l VC with 5x MeOH and 3.5x EtOH), while the Sa, Sb and Sc cultures were switched to 25 mg/l VC and no donor to upset routine feeding conditions. No dechlorination was detectable during the fourth cycle in Sa, Sb and Sc. The Sa-c cultures were purged and re-amended with 25 mg/l VC after 20 days. Again, no dechlorination occurred for a subsequent 25 days, after which the cultures were amended with 5x MeOH/3.5x EtOH, which re-initiated dechlorination at the regular rate. During this time, control cultures Ha-c continued to dechlorinate VC at the same rate as before.

The three WBC-2 cultures Wa-c dechlorinated VC (25 mg/l) to ethene in 14 days after transfer, the same as the parent culture. The WBC-2 cultures were grown for 157 days and completed 10 dechlorination cycles. In order to upset routine growth, for the 11th dechlorination cycle the three cultures were fed twice as much VC (50 mg/l rather than 25 mg/l) but only 1x donor. At the 12th cycle the cultures were fed 50 mg/l VC and no electron donor; however, unlike KB-1 Sa-c, the WBC-2 cultures continued to dechlorinate VC. At the 13th cycle the cultures were again fed 50 mg/l VC and no electron donor. DNA was taken before electron donor/acceptor conditions were changed and over the three subsequent cycles.

GC measurements were taken three times a week. After GC sampling, 2 mL of culture was removed and pelleted (13,000 x g, 15 min, 21 °C, Eppendorf centrifuge 5417R), and the pellet was decanted and frozen at -80 °C. DNA samples for the WBC-2 series were taken on Dec 18 and 21, 2015 (before changing feeding conditions), on Dec 22, 2015 (after changing feeding conditions), and then three times a week until Feb 22, 2016, for a total of 22 DNA samples. The KB-1/VC-H2 series DNA was sampled on Jan 13, 2016 (before changing feeding conditions for the Sa, Sb and Sc cultures), on Jan 15, 2016 (after changing feeding conditions for Sa, Sb and Sc), and then three times a week until Feb 22, 2016, for a total of 15 DNA samples.

5.2.6 Statistics

In this study we were interested in the absolute number of copies of the vcrA gene in comparison with the D. mccartyi 16S rRNA gene. There is a known error associated with qPCR of approximately half an order of magnitude. To reduce this error, the concatenated gene plasmid was created so that one standard could be used to quantify both genes. To normalize between cultures and between standards, the ratio of vcrA to D. mccartyi 16S rRNA was used to compare different enrichment cultures to each other. A t-test was conducted to compare the average ratio of vcrA to D. mccartyi 16S rRNA measured in DNA samples under normal conditions with that measured in DNA samples after donor/electron acceptor amounts had changed. A major limitation of this test is that the effects of time are unknown. All analyses were conducted using R 3.4.0.
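The comparison described above can be sketched as follows. The non-integer degrees of freedom reported in the Results are consistent with Welch's unequal-variance t-test, which is the default behaviour of R's t.test; that the Welch form was used is an inference, not something stated in the text. The sketch computes only the t statistic and the Welch-Satterthwaite degrees of freedom, and the ratio values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = variance(sample_a) / len(sample_a)   # squared standard error of mean A
    var_b = variance(sample_b) / len(sample_b)   # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical vcrA:16S rRNA ratios before and after a change in feeding conditions.
ratios_before = [1.0, 1.1, 1.2, 1.3, 1.1, 1.2, 1.4]
ratios_after = [1.3, 1.5, 2.7, 2.9, 1.4, 1.6, 2.9, 1.1]

t_stat, dof = welch_t(ratios_before, ratios_after)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The p-value would then be read from a t distribution with the computed degrees of freedom, which is what a statistics package reports alongside t and df.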

5.3 Results and Discussion

5.3.1 Assembly of two vcrA containing D. mccartyi genomes

Two D. mccartyi genomes were assembled from the parent KB-1/VC-H2 culture. Please refer to Chapter 2 for the detailed discussion of these genomes and all D. mccartyi genomes closed from the KB-1 enrichment cultures. Briefly, the two genomes are both Pinellas type D. mccartyi, named KBVC1 (NCBI accession CP019968) and KBVC2 (CP019969). KBVC1 is 1.39 Mbp with a GC content of 47.3% and 22 rdhA genes including the VC reductase gene vcrA. KBVC2 is 1.35 Mbp with a GC content of 47.2% and 16 rdhA genes including vcrA. A whole genome alignment results in 82.3% pairwise nucleotide identity. The core genome is 99.3% identical; such high similarity across the core genome is common for D. mccartyi from the same clade (39, 40).

In Chapter 4, I reported that the vcrA genomic island was identified in a circular extrachromosomal state in the KB-1/VC-H2 culture. The read depth of the vcrA-GI was twice as high as the read depth of the rest of the KBVC1 and KBVC2 genomes. Mate-pair and paired-end reads suggested the simultaneous occurrence of a circular extrachromosomal copy of the vcrA-GI. PCR and qPCR were used to verify the existence of the circular copy (Figure 5-1). The purpose of this study was to determine whether circularization could be controlled in the laboratory by changing culturing conditions.

Figure 5-1. Illustration of the circularization of the vcrA genomic island (GI). PCR reactions used to verify sequencing results are shown targeting the vcrA genomic island integrated within the genome (PCR 1) or in circular form (PCR 2 and PCR 3). Read depth observed over the vcrA island compared to the rest of the genome is shown in blue where darker blue indicates read depth > 120.

5.3.2 Limiting electron donor changed the ratios of D. mccartyi vcrA gene copies to 16S rRNA gene copies in WBC-2 but not KB-1 subcultures

For comparison purposes, the number of copies of vcrA in each DNA sample was divided by the number of copies of the D. mccartyi 16S rRNA gene measured in the same sample. This ratio was compared between DNA samples from different cultures. We changed feeding conditions and monitored dechlorination activity as well as the vcrA to D. mccartyi 16S rRNA gene ratio. Feeding conditions were changed by feeding VC either without electron donor or with a limiting amount of electron donor. The KB-1/VC-H2 progeny cultures cycled through periods where vcrA was higher than D. mccartyi 16S rRNA at a similar frequency during the study period regardless of whether feeding conditions were changed (Table 5-1, t-test: df=29.6, p=0.2241, Figure 3-13 and 5-2). In the WBC-2 progeny cultures, the number of copies of the vcrA gene was on average higher than that of the D. mccartyi 16S rRNA gene after feeding conditions were changed (Table 5-1, t-test: df = 38.5, p=0.001705, Figure 5-3). In WBC-2, all three cultures continued to degrade VC and produce methane even after no donor was added. Because methane continued to be generated, it is likely that WBC-2 does not consume all of the donor provided during one dechlorination cycle. In KB-1, cultures without donor did not produce methane and did not dechlorinate VC.

Table 5-1. Proportion of time culture was found to have equal to or greater than 2 copies of vcrA for every D. mccartyi 16S rRNA gene copy as determined by qPCR.

| Lineage | Culture name | Condition | # DNA samples | Proportion of samples ≥ 2:1 |
|---|---|---|---|---|
| KB-1 | Ha | Control | 13 | 30% |
| KB-1 | Hb | Control | 15 | 27% |
| KB-1 | Hc | Control | 15 | 47% |
| KB-1 | Sa | No Donor | 12 | 25% |
| KB-1 | Sb | No Donor | 12 | 50% |
| WBC-2 | Wa | Control | 7 | 0% |
| WBC-2 | Wb | Control | 7 | 14% |
| WBC-2 | Wa | Limited Donor | 15 | 6% |
| WBC-2 | Wb | Limited Donor | 15 | 47% |
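The proportions in Table 5-1 follow from a per-sample ratio and a threshold count. The sketch below shows that calculation on hypothetical gene copy numbers; the function name and the example values are illustrative and are not taken from the raw qPCR data.

```python
def proportion_at_or_above(vcra_copies, dhc_copies, threshold=2.0):
    """Per-sample vcrA:16S ratios and the fraction of samples at or above the threshold."""
    ratios = [v / d for v, d in zip(vcra_copies, dhc_copies)]
    hits = sum(1 for r in ratios if r >= threshold)
    return hits / len(ratios), ratios


# Hypothetical gene copies per mL of culture for four DNA samples from one bottle.
vcra = [2.0e9, 1.2e9, 4.4e9, 1.0e9]
dhc = [1.0e9, 1.0e9, 1.0e9, 1.0e9]

fraction, ratios = proportion_at_or_above(vcra, dhc)
print([round(r, 1) for r in ratios], f"{fraction:.0%} of samples >= 2:1")
```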


[Figure 5-2 panels: KB-1/VC-H2 Sa and KB-1/VC-H2 Sb. Left axis: VC and ethene (mmol/bottle); right axis: vcrA and Dhc gene copies/mL culture; x-axis: day. Feeding annotation: purge, feed 4 mL VC, no donor.]

Figure 5-2. Quantitative PCR (qPCR) and gas chromatography (GC) tracking of two KB-1/VC-H2 cultures (Sa and Sb) during routine growth. Vinyl chloride (VC) is degraded to ethene (left axes). The number of gene copies of D. mccartyi 16S rRNA (Dhc, grey triangle) and vinyl chloride reductase (vcrA, black square) was tracked using qPCR (right axes). Cultures were maintained in batch mode, with periodic purging of gases and re-feeding with VC as indicated on graphs with black triangles. The ratio of vcrA to Dhc 16S rRNA is indicated above qPCR data points.


[Figure 5-3 panels: WBC-2/VC-EL Wb and WBC-2/VC-EL Wa. Left axis: VC and ethene (mmol/bottle); right axis: vcrA and Dhc gene copies/mL culture; x-axis: day. Feeding annotations: purge, feed 4 mL VC, 8.5x donor; feed 5 mL VC, 1x donor; purge, feed 5 mL VC, no donor.]

Figure 5-3. Quantitative PCR (qPCR) and gas chromatography (GC) tracking of two WBC-2/VC-EL (Wa and Wb) cultures during routine growth. Vinyl chloride (VC) is degraded to ethene (left axes). The number of gene copies of D. mccartyi 16S rRNA (Dhc, grey triangle) and vinyl chloride reductase (vcrA, black square) was tracked using qPCR (right axes). Cultures were maintained in batch mode, with periodic purging of gases and re-feeding with VC as indicated on graphs with black triangles. The ratio of vcrA to Dhc 16S rRNA is indicated above qPCR data points.


5.3.3 Amplification of DNA using PCR suggests vcrA genomic island excises from genome as a circular intermediate.

In Chapter 4, the vcrA-GI was found to exist as a circular extrachromosomal element, supported by data from qPCR, PCR and DNA sequencing. During primer testing, DNA from the parent KB-1/VC-H2 culture, the Ha culture, the Wa culture and the KB-1/TCE-MeOH culture was used as possible template DNA. At the time of PCR and qPCR, only the KB-1/TCE-MeOH parent culture had more copies of vcrA than of D. mccartyi 16S rRNA, and it was also the only culture for which the circular element could be amplified and visualized on a gel (Figure 5-4). Using these PCR and qPCR techniques, we find that it is possible that no circular elements are amplified when the ratio of vcrA to 16S rRNA is close to 1:1.


Figure 5-4. A) and B) Agarose gel (1% TAE) of PCR amplified region of circular vcrA genomic island from DNA taken from KB-1/VC-H2 parent culture, KB-1/VC-H2 Ha, KB-1/TCE-MeOH and WBC-2/VC-EL (ladder used is GeneRuler 10kb plus; expected PCR amplicon size is 1,975 bp). C) Quantitative PCR of the same DNA used in the A) and B) PCR reactions, targeting D. mccartyi (Dhc) 16S rRNA and vinyl chloride reductase (vcrA) genes.

5.4 Conclusions

In Chapter 4, we identified that the vcrA island periodically circularizes separate from the genome in KB-1 D. mccartyi. In this chapter a preliminary experiment was carried out to see whether changing the amount of electron donor and electron acceptor fed could induce vcrA island circularization. All cultures, whether experiencing a change in feeding amounts or not, had periods where the number of copies of vcrA exceeded the number of copies of D. mccartyi 16S rRNA. The ratio of vcrA to D. mccartyi 16S rRNA was significantly different after changing the feeding regime in WBC-2 cultures, but not in KB-1 cultures. This was the first study of vcrA island circularization, and it will hopefully help direct future experiments. The most important finding from this chapter is that this phenomenon is not unique to D. mccartyi from the KB-1/VC-H2 culture. We also observed circularization of vcrA in the KB-1/TCE-MeOH parent culture, as well as in the WBC-2 VC parent culture and WBC-2 VC subcultures.

Based on sequencing information it was previously hypothesized that the vcrA genomic island was transferred between different strains of D. mccartyi (40) and furthermore, several studies have quantified the number of copies of vcrA and found them to be higher than D. mccartyi 16S rRNA (44, 45, 169). Here we find both observations to be likely explained by the periodic, repetitive replication of the vcrA island as an independent circular element.


Chapter 6 Identification of new trans-dichloroethene reductive dehalogenase, TdrA, found in Dehalogenimonas sp. WBC-2

Reproduced with permission from the journal of Applied and Environmental Microbiology, the American Society for Microbiology. Copyright © American Society for Microbiology, Applied and Environmental Microbiology, January 2016 vol. 82 no. 1 40-50.

6.1 Introduction

Industrial, military and agricultural practices have resulted in the release of a growing number of chemical compounds hazardous to human health and the environment. Contamination of soil and groundwater with chlorinated ethenes and ethanes is of concern due to their known toxicity and/or carcinogenicity. Although in-situ bioremediation is already a well-established remediation approach, greater understanding of new microbial capabilities is important for optimization of this strategy (61, 182). Some of the most interesting dechlorinating organisms are organohalide respiring bacteria (OHRB), which couple dehalogenation with a respiration process required for growth (14, 183, 184). The most studied genus of OHRB is Dehalococcoides because of its ability to respire the most common chlorinated solvent contaminants: trichloroethene (TCE) and perchloroethene (PCE). As recently as 2009, Dehalogenimonas was identified as a new genus of OHRB most similar to Dehalococcoides. Dehalogenimonas lykanthroporepellens and D. alkenigignens are the only two species of Dehalogenimonas described to date; they are capable of dihaloelimination of polychlorinated aliphatic alkanes (51, 55, 185), although Dehalogenimonas-like organisms are known to be widely distributed (186). The West Branch Canal Creek consortium (WBC-2) was derived from sediments contaminated with 1,1,2,2-tetrachloroethane (1,1,2,2-TeCA) (125). In this consortium, 1,1,2,2-TeCA dihaloelimination to trans-dichloroethene (tDCE) is carried out by Dehalobacter, tDCE dechlorination to VC is carried out by Dehalogenimonas, and VC dechlorination to ethene by Dehalococcoides (50). On a 16S rRNA gene sequence basis, the Dehalogenimonas population from the WBC-2 culture is 96% identical to Dehalogenimonas lykanthroporepellens while only 91% identical to Dehalococcoides (50). The Dehalogenimonas carrying out the middle step, reducing tDCE to VC, is of particular interest because it is the first non-Dehalococcoides OHRB found to dechlorinate a lesser chlorinated dichloroethene isomer (50).

Many chlorinated ethene dechlorinating mixed and pure cultures reduce PCE and TCE to cDCE as the main intermediate; however, tDCE has also been observed in both laboratory and field studies (187, 188). Notably, RDase MbrA from Dehalococcoides mccartyi strain MB degrades PCE and TCE primarily to tDCE (33, 187-191). Dehalococcoides CBDB1 dechlorinates PCE and TCE primarily to tDCE (192). Other OHRB such as Dehalobacter sp. strain TCP1 also dechlorinate PCE/TCE to tDCE (193). Dihaloelimination reactions for polychlorinated ethanes such as 1,1,2,2-TeCA can also produce tDCE; thus, a better understanding of the enzymes capable of tDCE dechlorination is important to decontamination.

The objective of this study was to identify the tDCE RDase used by the population of Dehalogenimonas growing in the WBC-2 culture. Although no isolate was available, we sequenced the DNA of the mixed culture and were able to close a Dehalogenimonas genome completely from metagenomic data (Chapter 3). We used a BN-PAGE approach to identify and partially characterize the first tDCE dechlorinating RDase from an organism other than Dehalococcoides, herein annotated as TdrA, short for trans-dichloroethene reductive dehalogenase, A subunit.

6.2 Methods

6.2.1 Developing enrichment cultures for analysis

The West Branch Canal Creek (WBC-2) consortium was originally enriched from contaminated wetland sediment at Aberdeen Proving Grounds (APG), MD, U.S.A., as described in E. J. P. Jones, et al. (125). The WBC-2 enrichment cultures at the Edwards laboratory are described in Chapter 3. The two cultures used for the experiments described in this chapter include the VC and tDCE enrichment cultures. The VC enrichment culture was created in 2010 as a 1:50 transfer from the TeCA parent and is abbreviated VC/EL_2010, with the name format indicating electron acceptor amended/donor used_year created. The first tDCE enrichment culture was a 1:50 transfer created in 2010, abbreviated tDCE/EL_2010, and the second two cultures were created a year later from tDCE/EL_2010 in parallel 1:1000 transfers, referred to as tDCE/EL_2011_A and tDCE/EL_2011_B (Fig. B1 from Chapter 3). A fourth 1:40 transfer, referred to as tDCE/EL_2013, was created in 2013. The two 2011 A and B cultures were purged every week to remove accumulating VC prior to its conversion to ethene, in order to increase the ratio of Dehalogenimonas (Dhg) to Dehalococcoides (Dhc). Not all VC could be removed; a small amount was still converted to ethene. This regime was successful in changing the ratio of Dhg to Dhc from approximately 1:1 as in the tDCE/EL_2010 culture to 4:1 in tDCE/EL_2011 cultures.

6.2.2 Protein extraction

Cultures were actively dechlorinating at the time of protein extraction. A culture sample (60 mL) was purged with N2/CO2 (80/20, v/v) to remove residual chlorinated compounds. The 60 mL sample was centrifuged under anaerobic conditions at 4°C, 8000 × g for 30 minutes (Beckman Coulter Avanti J-E centrifuge rotor J.S. 5.3). The pellet was re-suspended in 200 µL of buffer containing 1% digitonin, 50 mM Bis-Tris, 50 mM NaCl, 10% w/v glycerol (adjusted to pH 7.2 with HCl), including 55% v/v autoclaved culture. Cells were transferred into a 2 mL o-ringed Eppendorf tube with 50 µL of glass beads. Cells were bead-beaten using the Vortex Genie 2 for six minutes with one minute on ice every two minutes. Lysed cells were spun at 13,000 × g for 10 minutes (Mandel Sorvall Fresco centrifuge). Supernatant containing crude protein extract was removed and placed in a fresh 2 mL Eppendorf tube. Portions of this crude protein extract (referred to as cell-free extract or CFE) were loaded into the Native PAGE mini-cell (as described below) while remaining extracts were stored on ice to be used as positive controls in subsequent dechlorination assays (also described below).

6.2.3 BN-PAGE: gel loading, electrophoresis, staining and LC-MS/MS analysis

Multiple lanes were loaded with 20 µL of cell free extract onto a 4-16% gradient Bis-Tris gel stained with Coomassie blue (NativePAGE Novex Bis-Tris gel system protocol, Invitrogen) as previously described (32). Buffers were chilled (4°C) prior to use and the entire cell was placed in an ice bath. The mini-cell ran for 60 minutes at 150 V then for another 45 minutes at 200 V. The blank, ladder and first lane were separated from the gel and stained according to the fast Coomassie G-250 staining protocol (NativePAGE Novex Bis-Tris gel system protocol, Invitrogen). Based on visualization of the ladder and first lane, the remaining lane bands were cut. In-gel digestion and mass spectrometry analysis of gel slices for culture tDCE/EL_2011_A was performed at the Advanced Protein Technology Centre of SickKids’ Hospital (Toronto, Ontario) as previously described (32). Briefly, gel slices were reduced with 100 mM DTT, followed by alkylation with iodoacetamide and digestion with porcine trypsin overnight. Peptides were loaded onto a 150 µm diameter C18 pre-column prior to separation over 60 minutes using a 0-40% acetonitrile gradient in 0.1% formic acid over a 75 µm diameter C18 analytical column at 300 nL min-1. Peptide data were acquired in data dependent mode on an LTQ-Orbitrap mass spectrometer (Thermo-Fisher, San Jose, CA, USA); 2 Da isolation width, 32% normalized collision energy (NCE), 3+ default charge state, 0.250 activation Q, 30 ms activation time using collision induced dissociation (CID) fragmentation. Mass spectrometry analysis of samples from cultures tDCE/EL_2011_B & VC/EL_2010 was conducted at the University of Toronto BioZone Mass Spectrometry Facility. Gel slices were destained and digested with porcine trypsin according to A. Shevchenko, et al. (194) without reduction and alkylation of cysteines. Peptides were separated over a 10.5 cm, 75 µm diameter C18 analytical column over 60 minutes using a 0-40% acetonitrile gradient in 0.1% formic acid. Peptide m/z values were measured in data dependent mode using an LTQ-XL mass spectrometer (Thermo-Fisher); 2 Da isolation width, 35% NCE, 3+ default charge state, 0.250 activation Q, 30 ms activation time.

6.2.4 Dechlorination activity assays

Activity assays were conducted in a Coy anaerobic chamber using 20 µL of the crude cell free extract (CFE) serving as positive control, or a single gel slice cut from a blue-native gel as source of enzyme. The assays were conducted in 2 mL glass vials in buffer containing 100 mM Tris-HCl, 2 mM titanium citrate, 2 mM methyl viologen, pH 7.4 and 0.5 mM of the chlorinated substrate being tested. Substrates tested included PCE, TCE, 1,1-DCE, cDCE, tDCE, VC, TeCA, 1,1,1-trichloroethane (1,1,1-TCA), 1,1,2-TCA, 1,1-dichloroethane (1,1-DCA), and 1,2-DCA. A gel lane was cut into six bands (as shown in Appendix Fig. E1) and each band placed into a separate 2 mL glass vial. Each assay was incubated in anaerobic conditions for 24 hours and analyzed for dechlorination products using gas chromatography with an FID detector (Agilent G1888 auto-sampler with Agilent 7890A GC system, split ratio 2.5:1, column Agilent GS-Q) calibrated with external standards. Incubations with sample buffer and substrate only, as well as with sample buffer, substrate and denatured CFE (10 min at 99°C), were run as negative controls. No dechlorination was observed in either negative control. Protein concentrations in samples of the CFEs (50 µL) were measured using the Bradford assay (BioRad). The ImageJ program (http://imagej.nih.gov/ij/) was used to estimate the amount of protein found in gel slices as described elsewhere (31).
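
The protein-normalized activities reported in this chapter (nmol·min-1·mg protein-1) follow from the amount of product formed, the incubation time and the protein mass in the vial. The sketch below illustrates that arithmetic only; the function name and the numeric inputs are hypothetical and are not measurements from this study.

```python
def specific_activity(product_nmol, minutes, protein_mg):
    """Return activity in nmol product per minute per mg protein."""
    if minutes <= 0 or protein_mg <= 0:
        raise ValueError("incubation time and protein mass must be positive")
    return product_nmol / (minutes * protein_mg)


# Hypothetical inputs: 20 uL of extract at 500 ug/mL protein, 24 h incubation,
# 50 nmol of dechlorination product measured by GC.
extract_ml = 20e-3            # 20 uL expressed in mL
protein_mg_per_ml = 500e-3    # 500 ug/mL expressed in mg/mL
protein_mg = extract_ml * protein_mg_per_ml

activity = specific_activity(product_nmol=50.0, minutes=24 * 60, protein_mg=protein_mg)
print(f"{activity:.2f} nmol/min/mg protein")
```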

6.2.5 Database searching of LC-MS/MS spectra

Raw data were converted to mzXML using MSconvert (195). X! Tandem (The GPM, thegpm.org; version CYCLONE 2010.12.01.1) was used to identify peptides and proteins. The database used included 13,170 contigs assembled from paired-end reads and six-way translated (see WBC-2 metagenome Chapter 3) augmented with 116 common contaminants (706,106 proteins total). Search parameters for X! Tandem were adjusted based on the instrument used: fragment ion mass tolerance of 0.60 Da and a parent ion tolerance of ±10.0 ppm (LTQ-Orbitrap) or fragment ion mass tolerance of 0.40 Da and parent ion tolerance of +3 and -2 Da (LTQ-XL). Carbamidomethyl modification of cysteine was specified as a fixed modification where applicable. For both instruments dehydration of the N-terminus, ammonia-loss of the N-terminus, deamidation of asparagine and glutamine, oxidation of methionine and acetylation of the N-terminus were specified as variable modifications.
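
To illustrate what a six-way translation of a contig involves, the following minimal sketch translates a nucleotide sequence in all three forward and three reverse-complement reading frames using the standard genetic code. It is an illustration only, not the tool used to build the search database; the function names are hypothetical, and codons containing ambiguous bases are simply reported as X.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Stop codons appear as `*` in the output, so candidate open reading frames would still need to be split on that character before being used as database entries.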

6.2.6 Validation of peptide and protein identifications using Scaffold software

Scaffold (version Scaffold 4.4.1.1, Proteome Software Inc., Portland, OR) was used to visualize and validate MS/MS based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 95.0% probability by the Peptide Prophet algorithm (196). Protein identifications were accepted if they could be established at greater than 99.0% probability and contained at least two unique identified peptides. Protein probabilities were assigned by the Protein Prophet algorithm (197).
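
The acceptance criteria above amount to a simple filter. The sketch below expresses them in code for clarity; it is not Scaffold's implementation or API, and the `ProteinHit` structure and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    name: str
    protein_probability: float      # e.g. as assigned by Protein Prophet
    peptide_probabilities: dict     # unique peptide sequence -> probability


def is_accepted(hit, peptide_min=0.95, protein_min=0.99, min_unique_peptides=2):
    """Apply the thresholds described in the text to one protein hit."""
    accepted_peptides = [
        pep for pep, prob in hit.peptide_probabilities.items() if prob > peptide_min
    ]
    return (
        hit.protein_probability > protein_min
        and len(accepted_peptides) >= min_unique_peptides
    )


# Hypothetical hits for illustration only.
strong = ProteinHit("hit_A", 0.999, {"PEPTIDEA": 0.99, "PEPTIDEB": 0.97})
weak = ProteinHit("hit_B", 0.999, {"PEPTIDEC": 0.99, "PEPTIDED": 0.90})
print(is_accepted(strong), is_accepted(weak))
```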

6.2.7 Quantitative polymerase-chain reaction (qPCR)

Primers to detect 16S rRNA and RDase gene sequences were selected from literature or were designed (for tdrA and vcrA) using the Geneious Pro 5.5.4 Primer Design Feature. Primers used for tdrA were TdrA1404F (5’-GCCTCTCGCCCCCACTAAACC-3’) and TdrA1516R (5’-GCCATCCTTCATAACCACTCACGCA-3’). Primers used for vcrA were VcrA 670F (5’-GCCCTCCAGATGCTCCCTTTAC-3’) and VcrA 440R (5’-TGCCCTTCCTCACCACTACCAG-3’). All primers are listed in Table E1. Primers were searched against the NCBI nr bacteria database using Primer-BLAST to see whether primers were unique, particularly for the novel gene, tdrA. The only possible unintended target for tdrA was Dehalococcoides mccartyi tceA, with three mismatches on the forward primer and four mismatches on the reverse primer. Primers were tested against DNA from the KB-1 consortium, known to contain tceA but not tdrA, and against DNA from WBC-2, known to contain both. For both cultures, melt-curve analysis and visualization of PCR products on a 1% agarose gel confirmed no amplification of tceA and strong specificity for tdrA. A portion of the gene for tdrA was amplified using PCR (MJ Research PTC-200 Peltier Thermal Cycler, 94°C for 5 min, 29 cycles of: 94°C for 30 sec, 59°C for 30 sec, 72°C for 5 min, 4°C ∞), cloned into a plasmid and transformed into Escherichia coli (Invitrogen TOPO® TA Cloning Kit with pCR®2.1-TOPO® vector and TOP10 cells) to be used as a qPCR standard. Plasmids from selected clones were extracted and Sanger sequenced at the SickKids Centre for Applied Genomics TCAG DNA sequencing/synthesis facility (Toronto, Ontario) to verify successful transformation.
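
Counting mismatches between a primer and a candidate off-target binding site, as reported above for tceA, can be sketched as follows. The forward primer is the one given in the text; the off-target site shown is a made-up sequence carrying three substitutions and is not the actual tceA sequence.

```python
def count_mismatches(primer, site):
    """Count position-wise differences between a primer and an equal-length site."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(1 for p, s in zip(primer.upper(), site.upper()) if p != s)


tdra_forward = "GCCTCTCGCCCCCACTAAACC"          # TdrA1404F, from the text
hypothetical_site = "ACCTCTCGCTCCCACTAAACG"     # illustrative only, 3 substitutions

print(count_mismatches(tdra_forward, hypothetical_site))
```

For a reverse primer the comparison would be made against the reverse complement of the template strand; that step is omitted here to keep the example short.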

6.2.8 qPCR of 16S rRNA gene and rdhA from different WBC-2 enrichment cultures

DNA was extracted from samples (5 mL) from five different WBC-2 enrichment cultures during mid-dechlorination using the PowerSoil® DNA Isolation Kit following the standard extraction protocol provided by MoBio. DNA concentrations were measured using NanoDrop. DNA samples were quantified using Dehalococcoides 16S rRNA and vcrA qPCR (98°C, 2 min, 39 cycles of: 98°C 5 s, 60°C 10 s, melt-curve: 65°C-95°C, 4°C ∞), Dehalogenimonas 16S rRNA qPCR (98°C, 2 min, 39 cycles of: 98°C 5 s, 58°C 10 s, melt-curve: 65°C-95°C, 4°C ∞) and Dehalobacter 16S rRNA and tdrA qPCR (98°C, 2 min, 39 cycles of: 98°C 5 s, 62.5°C 10 s, melt-curve: 65°C-95°C, 4°C ∞). All DNA manipulations post extraction were conducted in a PCR cabinet (ESCO Technologies, Hatboro, PA) with the fan off. Plasmids containing an appropriate 16S rRNA gene or RDase gene were used to calibrate the standard curve. Each standard was pipetted three times onto the plate and each unknown sample was pipetted twice. The standard curve consisted of serial dilutions of the plasmid DNA. Each well contained 20 µL of the reaction mixture, comprising 10 µL of 2x real-time PCR mix (SsoFast™ EvaGreen® Supermix from BioRad), 0.5 µM of each forward and reverse primer and 2 µL of 1 in 9 diluted DNA template. Analysis was conducted using the BioRad CFX96 Touch™ Real-Time modular thermal cycler platform and CFX Manager™ software. Copies/mL of culture were calculated assuming 100% DNA extraction efficiency and by taking into account DNA dilution and culture volumes used in extraction.
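
The back-calculation from copies per reaction to copies per mL of culture can be sketched as below, assuming 100% extraction efficiency as stated above. The function is illustrative; the elution volume is not given in the text and is a hypothetical value here, and the dilution factor is left as a parameter because it depends on how the "1 in 9" dilution is read.

```python
def copies_per_ml_culture(copies_per_reaction, template_ul, dilution_factor,
                          elution_ul, culture_ml):
    """Scale qPCR copies per reaction back to copies per mL of culture.

    copies_per_reaction : copies read off the standard curve for one well
    template_ul         : volume of diluted template added to the well (uL)
    dilution_factor     : fold dilution of the DNA extract before qPCR
    elution_ul          : total volume of the DNA extract (uL), assumed value
    culture_ml          : volume of culture extracted (mL)
    """
    copies_per_ul_extract = copies_per_reaction / template_ul * dilution_factor
    return copies_per_ul_extract * elution_ul / culture_ml


# Hypothetical example: 1e5 copies in a well, 2 uL template, tenfold dilution,
# 100 uL elution volume (assumed), 5 mL of culture extracted.
print(copies_per_ml_culture(1e5, 2, 10, 100, 5))
```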

6.2.9 Construction of phylogenetic trees

Two-hundred and twenty-five RdhA sequences were selected to create an amino acid phylogenetic tree using Geneious 8.4.1. These included an RDase from Geobacter KB-1, PceA from Sulfurospirillum multivorans, 189 RdhA sequences from Dehalococcoides mccartyi, 13 from Dehalogenimonas lykanthroporepellens and 21 from Dehalogenimonas WBC-2. RdhA sequences were pulled from the public access RDase database (link: docs.google.com/folder/d/0BwCzK8wzlz8ON1o2Z3FTbHFPYXc/edit). All RdhA sequences used contained all characteristic motifs. The protein alignment was created using the MUSCLE plugin in Geneious with 10 iterations. To guide the alignment, five trees were built grouping sequences by similarity. The distance measure used was k-mer6_6 followed by pctid_kimura, clustered using UPGMB, a pseudo rooting method and CLUSTALW weighting scheme. The alignment was manually inspected for errors. An additional nucleotide alignment was produced using the same method and compared with the amino acid alignment to aid manual inspection. The PHYML maximum likelihood tree builder plugin in Geneious was used to create the final maximum likelihood tree with 100 bootstraps and the Le Gascuel substitution model. Dehalococcoides ortholog groups and tree branches have been established by L. A. Hug, et al. (14) and A. D. Fricker, et al. (198) with 100 bootstraps. FigTree 1.4.2 was used to visualize and further edit the tree using circular or radial conformation to generate figures in this paper (http://tree.bio.ed.ac.uk/software/figtree/).
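
For readers unfamiliar with identity-based distance measures, the sketch below shows the Kimura approximation commonly used to convert fractional amino acid identity into an estimated evolutionary distance, d = -ln(1 - p - 0.2p²), where p is the fraction of differing sites. Whether this matches the pctid_kimura option as implemented in MUSCLE in every detail (for example, for very divergent pairs) is an assumption; the function name is hypothetical.

```python
import math


def kimura_protein_distance(fractional_identity):
    """Estimate substitutions per site from fractional identity (0 to 1)."""
    if not 0.0 <= fractional_identity <= 1.0:
        raise ValueError("identity must be between 0 and 1")
    p = 1.0 - fractional_identity          # observed fraction of differing sites
    argument = 1.0 - p - 0.2 * p * p
    if argument <= 0.0:
        raise ValueError("sequences too divergent for this approximation")
    return -math.log(argument)


# The corrected distance grows faster than the raw difference as identity drops.
for identity in (0.95, 0.75, 0.50):
    print(identity, round(kimura_protein_distance(identity), 3))
```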

6.2.10 NCBI accession numbers for new sequences from this study

Dehalogenimonas WBC-2 dehalogenase sequence accession numbers are as follows: RdhA1 AKG52809, RdhA2 AKG52810, RdhA3 AKG52822, RdhA4 AKG52887, RdhA5 AKG52899, RdhA6 AKG52961, RdhA7 AKG52966, TdrA (RdhA8) AKG53095, RdhA9 AKG53203, RdhA10 AKG53799, RdhA11 AKG53917, RdhA12 AKG54159, RdhA13 AKG54198, RdhA14 AKG54203, RdhA15 AKG54204, RdhA16 AKG54214, RdhA17 AKG54216, RdhA18 AKG54217, RdhA19 AKG54218, RdhA20 AKG54266, RdhA21 AKG54383, RdhA22 AKG54384.

6.3 Results and Discussion

6.3.1 Dechlorinating activity in cell free extracts

The various WBC-2 enrichment cultures are ideal tools to help distinguish the activity of the various organisms and enzymes within because they harbour different complements of OHRB. M. Manchester, et al. (50) previously established that the Dehalococcoides population in WBC-2 grows on VC while the Dehalogenimonas population grows on tDCE. In order to distinguish between the dechlorinating activities of Dehalogenimonas and Dehalococcoides, a tDCE enrichment culture was analyzed in parallel with a VC enrichment culture. Cell free extracts (CFE) from tDCE- and VC-fed enrichment cultures were first tested against a wide range of chlorinated ethenes and ethanes including 1,1,2,2-TeCA, 1,1,2-TCA, 1,1,1-TCA, 1,2-DCA, 1,1-DCA, PCE, TCE, cDCE, tDCE, 1,1-DCE and VC. The cell free extract samples from the tDCE enrichment culture showed the highest activity with 1,1-DCE followed by cDCE, TCE, 1,2-DCA, tDCE and VC while the samples from the VC culture exhibited highest activity on cDCE followed by 1,1-DCE, VC, TCE, and 1,2-DCA (Table E2). Extracts from both cultures had low activity on PCE and on all chlorinated ethanes except 1,2-DCA. The most notable difference between the two cultures was the order of magnitude higher activity on the substrate tDCE by the tDCE enrichment culture (Table E2).

In order to identify the reductive dehalogenase responsible for the tDCE dechlorinating activity, cell free extracts from tDCE/EL_2011_A, a culture more enriched in Dehalogenimonas, were prepared alongside extracts from VC/EL_2010, and separated using BN-PAGE. Bands from distinct lanes were excised and assayed for activity, this time on a smaller number of substrates (VC, tDCE, cDCE, 1,1-DCE, TCE, PCE). Most of the dechlorination activity was observed in band four and band five, in the region of 242 kDa (Fig. E1), at the same molecular weight (MW) where other dehalogenases have been identified in previous studies (31, 32). Cell free extract samples and corresponding gel slices from the tDCE enrichment culture showed higher activities with tDCE as substrate than the VC enrichment culture (Table 6-1). The parallel samples for the VC enrichment culture showed low relative activity on tDCE (Table 6-1) and much higher relative activity on VC (as seen previously in Table E2 and previous studies with Dehalococcoides cultures). 1,1-DCE and cDCE were dechlorinated well in extracts and gel slices from both cultures. The tDCE/EL_2011_A enrichment culture (Table 6-1) had a greater proportion of Dehalogenimonas compared to the tDCE/EL_2010 enrichment culture (Table E2). The culture with more Dehalogenimonas had higher relative and absolute tDCE dechlorination activity.

Table 6-1. Protein normalized dechlorination activity (nmol·min-1·mg protein-1) from tDCE2/EL_2011_A and VC/EL_2010 WBC-2 enrichment cultures. Samples include cell free extracts (CFE) as well as active gel slices from BN-PAGE gel. Activities less than 10% of the maximum observed for that sample are shaded in grey.

Activity columns (PCE to VC) give dechlorination activity (nmol·min-1·mg protein-1) by chlorinated substrate.

| Culture | Sample type and date | Protein conc. (µg/mL) or (µg/slice*) | PCE | TCE | cDCE | tDCE | 1,1-DCE | VC |
|---|---|---|---|---|---|---|---|---|
| tDCE2/EL_2011_A | CFE (12-05-2014) | 503 ± 262 | 0.0 ± 0.1 | 0.1 ± 0.0 | 2.8 ± 0.1 | 3.7 ± 0.1 | 2.5 ± 0.2 | 0.7 ± 0.0 |
| tDCE2/EL_2011_A | % of max | - | 0% | 3% | 76% | 100% | 68% | 19% |
| tDCE2/EL_2011_A | Gel Slice (12-05-2014) | 0.33 | 0 | 1.9 | 6.3 | 36 | 18 | 2.1 |
| VC/EL_2010 | CFE (18-05-2014) a | 698 ± 170 | 0.4 | 4.6 | 12 | n.a. | n.a. | n.a. |
| VC/EL_2010 | CFE (19-05-2014) b | 984 ± 413 | n.a. | n.a. | n.a. | 1.1 | 11 | 8.1 |
| VC/EL_2010 | % of max | - | 3% | 38% | 100% | 9% | 92% | 68% |
| VC/EL_2010 | Gel Slice (18-05-2014) a | 0.28 | 1.7 | 6.8 | 18 | n.a. | n.a. | n.a. |
| VC/EL_2010 | Gel Slice (19-05-2014) b | 0.21 | n.a. | n.a. | n.a. | 13 | 110 | 87 |

Protein concentration and activity in CFEs is average +/- standard deviation of three assay runs. Shaded values are close to detection limit. Active regions of gel slices consist of slices #4 and #5.
a: PCE, TCE, cDCE assays.
b: tDCE, 1,1-DCE, VC (assay one day later).
n.a.: not analyzed.
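
The "% of max" rows in Table 6-1 can be reproduced from the CFE activities. The short sketch below does so for the tDCE2/EL_2011_A extract, using the mean activities from the table, and lists the substrates falling below the 10% threshold named in the caption; the function name is hypothetical.

```python
def percent_of_max(activities):
    """Express each activity as a rounded percentage of the largest value."""
    peak = max(activities.values())
    return {substrate: round(100 * value / peak)
            for substrate, value in activities.items()}


# Mean CFE activities (nmol/min/mg protein) for tDCE2/EL_2011_A from Table 6-1.
cfe_tdce = {"PCE": 0.0, "TCE": 0.1, "cDCE": 2.8, "tDCE": 3.7, "1,1-DCE": 2.5, "VC": 0.7}

percentages = percent_of_max(cfe_tdce)
below_threshold = [s for s, pct in percentages.items() if pct < 10]
print(percentages)
print(below_threshold)
```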

6.3.2 RDases identified in the WBC-2 dechlorinating cultures by BN-PAGE

Abundant proteins present in all of the gel slices from tDCE and VC enrichment cultures were identified using LC-MS/MS (Table S3 Excel file from (59)). The results were filtered of obvious contaminants; supporting statistical information is provided in Tables S4a and S4b from (59). Five different dehalogenases were identified in bands 4 and 5, around 242 kDa, where most activity was observed in dechlorination assays (Figure 6-1). VcrA was the only dehalogenase detected in the VC enrichment culture and was also detected in much lower abundance in the tDCE enrichment cultures. The remaining four dehalogenases were only present in the tDCE enrichment cultures. One dehalogenase (rdhA8 DGWBC_0411) had the most unique peptide hits, affirming its positive identification, and the highest total peptide hits, indicating its high abundance in the samples (Figure 6-1). The remaining three dehalogenases were barely detectable using LC-MS/MS (locus tags rdhA9 DGWBC_1144, rdhA14 DGWBC_1574, rdhA6 DGWBC_0279), and therefore their influence on the activities in dechlorination assays was likely insignificant. Of the low-abundance dehalogenases, RdhA6 had the most hits and was found to cluster with RdhA from OG 15 in Dehalococcoides (Chapter 2), known to be expressed during starvation. We therefore conclude that the most abundant dehalogenase, RdhA8, is the protein primarily responsible for tDCE dechlorination, and it is hereby annotated as TdrA, trans-dichloroethene reductive dehalogenase, A subunit. TdrA itself is predicted to have a molecular weight of 64 kDa and presumably co-elutes as a protein complex on a native gel (32).

Figure 6-1. Reductive dehalogenases (RDases) identified from LC-MS/MS analysis from trans-dichloroethene (tDCE) tDCE/EL_2011 enrichment culture and vinyl chloride (VC) VC/EL_2010 enrichment culture. Bars indicate total spectra assigned per protein from Band 4 and Band 5 where highest dechlorination activity was observed and most RDase hits occur. Protein coverage is indicated as a percentage on each bar.

6.3.3 Relative abundance of RDases in WBC-2 enrichment cultures

In order to verify that TdrA is in fact a Dehalogenimonas dehalogenase, a qPCR survey of cultures enriched on different electron acceptors was conducted, measuring the dehalogenase genes tdrA and vcrA and the 16S rRNA genes of Dehalococcoides, Dehalogenimonas and Dehalobacter. Efficiencies for qPCR reactions were all between 93 and 106% (Appendix Table E3). As previously observed by M. Manchester, et al. (50), Dehalogenimonas is only found in cultures maintained on 1,1,2,2-TeCA or tDCE (Fig. 6-2). The newly identified tdrA gene was only detected in enrichment cultures where Dehalogenimonas 16S rRNA genes were also present (Fig. 6-2). Furthermore, tdrA occurs at a ratio of about 1:1 with Dehalogenimonas 16S rRNA when considering the error of qPCR analysis, which is about half an order of magnitude. We therefore conclude that the tdrA gene belongs to Dehalogenimonas in WBC-2.
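
Two calculations underlie this paragraph: the amplification efficiency derived from the slope of a standard curve (quantification cycle against log10 copies), and the check that two gene abundances agree within half an order of magnitude. The sketch below shows both under the usual convention that 100% efficiency corresponds to a doubling per cycle; the function names and example abundances are hypothetical.

```python
import math


def amplification_efficiency(slope):
    """Efficiency from the standard-curve slope (Cq vs log10 copies); 1.0 = 100%."""
    return 10 ** (-1.0 / slope) - 1.0


def within_half_order(copies_a, copies_b):
    """True if two abundances differ by no more than half a log10 unit."""
    return abs(math.log10(copies_a / copies_b)) <= 0.5


# A slope of about -3.32 corresponds to perfect doubling each cycle.
print(round(amplification_efficiency(-3.32), 3))

# Hypothetical abundances (copies/mL): a twofold difference is consistent
# with a 1:1 ratio under this error estimate, a tenfold difference is not.
print(within_half_order(2e7, 1e7), within_half_order(1e8, 1e7))
```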

Figure 6-2. Quantification of 16S rRNA and specific rdhA genes in various WBC-2 enrichment cultures. Dehalogenimonas (Dhg) tdrA is detected only where Dehalogenimonas 16S rRNA genes are also present, while vcrA is associated with Dehalococcoides (Dhc) 16S rRNA as determined in a quantitative PCR survey of a series of WBC-2 sub-cultures enriched separately on 1,1,2-TCA (containing Dehalobacter and Dehalococcoides), VC (only Dehalococcoides), 1,1,2,2-TeCA (all three OHRB) and two distinct cultures enriched on tDCE (containing Dehalogenimonas and Dehalococcoides). Data are an average of two qPCR reactions; error bars show ± half an order of magnitude which is an estimate of the error associated with qPCR.

6.3.4 Characteristics of the newly discovered dehalogenase, TdrA

Similarly to other characterized RDases, TdrA contains a TAT export sequence and two iron-sulfur cluster motifs (4Fe-4S), and is accompanied by a RdhB protein with one trans-membrane helix (Fig. E2, 55 amino acids). This putative B protein yields five peptide fragments after trypsin digest, all of which were outside of the peptide mass limitations of the mass spectrometer (Range ~814-5451 AMU, Average 1569 AMU). Thus, even if the RdhB protein was present, it could not be detected by LC-MS/MS using this method. It is known that RdhA proteins contain a corrinoid essential to their dechlorinating activity. A recognizable corrinoid-binding motif has been observed in some, but not all, RdhA proteins. For example, DcpA contains DHXG-X39-S-X32-G (22, 56, 199, 200). No such binding motif can be identified in TdrA. Aligning Sulfurospirillum multivorans PceA, whose structure is solved, with TdrA using MUSCLE results in 20.4% pairwise identity. Higher than average identity occurs along PceA B-12 binding residues (24.3 and 22.5%) while lower than average identity occurs over the substrate channel forming residues (16.2%). This suggests a putative B-12 binding region in TdrA could occur at residues 234-271 and 346-455 (Fig. E2) while a unique insertion channel speaks to substrate differences between these two dehalogenases.

6.3.5 Similarity of WBC-2 RDases to other known RDases

From the current NCBI and RDase databases (14) (as of March 2015), TdrA is most similar to TceA from Dehalococcoides mccartyi sp. FL2 (AY16309), with a 74.6% pairwise identity. The RDase database first described by L. A. Hug, et al. (14) shows that the closest ortholog group (OG) to TdrA is OG 5, which contains all TceA RDases (Fig. E3). E. Padilla-Crespo, et al. (56) suggested that Dehalogenimonas RDases will cluster separately from Dehalococcoides RDases based on analysis of DcpA, a 1,2-DCA dehalogenase from Dehalogenimonas lykanthroporepellens; however, this is not the case here. TdrA has poor identity with other Dehalogenimonas RDases, sharing 31% pairwise identity with the closest Dehalogenimonas lykanthroporepellens dehalogenase (WP_013217644). The high similarity of TdrA to Dehalococcoides TceA may be either evidence of a recent shared common ancestor of the Dehalococcoides and Dehalogenimonas or evidence of lateral gene transfer between closely related OHRB.

The Dehalococcoides population in the WBC-2 consortium also contains vcrA (AEI59432.1) and bvcA (AGY78804.1) genes both of which are highly similar to previously identified vcrA and bvcA sequences. WBC-2 VcrA has upwards of 97% pairwise identity to other members of OG8 while BvcA has upwards of 96% pairwise identity to members of OG 28 (Fig. E3). VcrA protein was detected in this study, however BvcA protein was not.

6.3.6 Dehalogenimonas WBC-2 complete genome

The complete Dehalogenimonas WBC-2 genome was closed (Chapter 3). It contains tdrA and tdrB genes, further confirming that TdrA is a Dehalogenimonas dehalogenase, consistent with qPCR data presented above. The rdhA genes have been numbered 1 through 22 indicating their clockwise position from the origin of replication (Fig. 6-3). The three most abundant Dehalogenimonas proteins detected in BN-PAGE gel slices (listed from most to least abundant) include an uncharacterized protein (DGWBC_1659), TdrA (DGWBC_0411) and molecular chaperone GroEL (DGWBC_1696), also shown on Fig. 6-3. Dehalogenimonas were thought to dechlorinate only polychlorinated alkanes through dihaloelimination (51). Now it is evident that hydrogenolysis is also possible for Dehalogenimonas. All proteins identified in gel slices from Dehalogenimonas WBC-2 are reported in supplemental information Table S3 Excel file from (59).

Figure 6-3. A) The complete Dehalogenimonas WBC-2 genome. Rings moving from inside to outside: Ring 1: scale indicating position along the genome, Ring 2: GC-content, Ring 3: GC-skew, Ring 4: (grey) blastn hit (with E-value of at least 10) against Dehalogenimonas lykanthroporepellens complete genome, Ring 5: (blue) blastn hit (with E-value of at least 10) against Dehalococcoides mccartyi BAV1 complete genome, Ring 6: coding regions of Dehalogenimonas WBC-2 leading strand, Ring 7: coding regions of Dehalogenimonas WBC-2 lagging strand. Outer ring: In black are all predicted RdhA, in red are any proteins which were detected in gel slices from LC-MS/MS analysis. 1 – branched-chain amino acid aminotransferase (DGWBC_0198), 2 – phosphoribosylformylglycinamidine synthase (DGWBC_0354), 3 – hypothetical protein (DGWBC_0397), 4 – S-adenosylmethionine synthetase (DGWBC_0515), 4 – adenosylhomocysteinase (DGWBC_0516), 5 – aminopeptidase (DGWBC_0539), 6 – aspartyl-tRNA synthetase (DGWBC_0813), 7 – serine-glyoxylate aminotransferase (DGWBC_1097), 8 – butyryl-CoA dehydrogenase (DGWBC_1214), 9 – ribose-phosphate pyrophosphokinase (DGWBC_1365), 10 – hypothetical protein (DGWBC_1659), 11 – molecular chaperone GroEL (DGWBC_1696), 12 – hypothetical protein (DGWBC_1766). B) Close up detailed view of tdrAB genomic island region from Dehalogenimonas WBC-2 genome. * indicates 11 bp repeat sequence, blue: mobility genes, dark green/light green: pairs of integrases and transposases. Proteins which could be putatively annotated include 1: DNA-binding response regulator (DGWBC_0406); 2: ribonucleotide reductase class II (DGWBC_0410); 3: sensory box histidine kinase (DGWBC_0415); 4: transposase (DGWBC_0418); 5: integrase (DGWBC_0419); 6: high-affinity carbon uptake protein (DGWBC_0427); 7: DNA (DGWBC_0430).

6.3.8 Evidence for lateral acquisition of tdrAB

There are several lines of evidence to suggest that the tdrAB operon lies on a 37 kbp mobile element, and was acquired through lateral gene transfer. The island can be identified using IslandPath-DIMOB due to a significant difference in dinucleotide frequency and presence of mobility genes (Figure 6-3B). The GC content of the island itself is 46%, whereas the GC content of the entire genome is 49%. The operon is directly adjacent to a tRNA and flanked by eleven base-pair direct repeats. There is an unusually high concentration of mobility genes (nine of thirteen) on the genomic island (Figure 6-3B). The island also contains an insertion element flanked by two pairs of integrases and transposases. Many of the genes on the island could not be assigned a known function, which is also a common feature of genomic islands. Taken together, the island (Figure 6-3B) and the features listed are, according to M. G. I. Langille, et al. (140), strong evidence for a region’s lateral origin. No other dehalogenase present in the Dehalogenimonas genome has as prominent evidence for lateral acquisition as tdrA, suggesting that this dehalogenase was most recently acquired (140).
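
Comparing the GC content of a candidate island with that of the rest of the genome is a simple composition check. The sketch below computes GC fraction overall and in sliding windows; it is an illustration of the idea only and not the IslandPath-DIMOB algorithm, which relies on dinucleotide bias and mobility genes. Function names and the toy sequence are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


def sliding_gc(seq, window, step):
    """Return (start, GC fraction) for each full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATATATAT" + "GCGCGCGCGC"
print(gc_fraction(toy))
print(sliding_gc(toy, window=10, step=10))
```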

Figure 6-4. Phylogenetic maximum likelihood amino acid tree displayed in radial form of two-hundred and twenty-five reductive dehalogenases, A subunit only. Main clusters are coloured and numbered. Dehalogenimonas WBC-2 RdhA are highlighted by light green dots, D. lykanthroporepellens RdhA by dark green dots, remaining branch tips contain RdhA from various Dehalococcoides mccartyi except Geobacter (Geo) and Sulfurospirillum multivorans (Sul) dehalogenases which are annotated accordingly. Characterized RDases are listed by name. Scale shows number of substitutions per site. Tree branches were based on 100 bootstraps (not shown).

6.3.9 Defining clusters of RdhA sequences within the Dehalococcoidia

Ortholog groups (OGs) for RdhA proteins were established by L. A. Hug, et al. (14) to aid in naming new and unclassified dehalogenases based on a 90% amino acid sequence identity cut off. The Dehalogenimonas WBC-2 RdhAs were compared to known Dehalococcoides ortholog groups using this system. None of the Dehalogenimonas WBC-2 dehalogenases met the 90% cut off; however, five dehalogenases could be placed near established ortholog groups, suggesting vertical transfer from the last common shared ancestor, including RdhA2 & RdhA6 (OG 15), an ortholog group containing RdhAs known to be induced by starvation (98), RdhA9 (OG 48), RdhA5 (OG 39) and TdrA (OG 5) (Figure E3). The pangenome analysis from Chapter 2 also clustered these RdhA. Dehalococcoides are thought to acquire reductive dehalogenase homologous genes through both vertical and lateral transfer. Prior to recent additional sequenced genomes, RdhA protein and associated gene sequences seemed to parse into a core group found only in the Dehalococcoides and a second group found in Dehalococcoides as well as in other dechlorinating species (39). With the characterization of additional Dehalogenimonas genomes it now appears that RdhA homologs of the Dehalococcoidia separate into at least five different mixed clusters (Figure 6-4). Cluster 1 contains core RdhA sequences and RdhA sequences from non-Chloroflexi dechlorinating species including Geobacter RDase and Sulfurospirillum multivorans PceA. Cluster 2 contains characterized chlorinated ethene degrading dehalogenases including VcrA, BvcA, TceA, PceA and TdrA. This cluster is dominated by Dehalococcoides RdhA with a few Dehalogenimonas homologs. Cluster 3 contains a concentration of Dehalogenimonas lykanthroporepellens RdhA (5 of 13 RdhA). Cluster 4 is dominated by Dehalococcoides RdhA sequences (including MbrA) with the exception of Dehalogenimonas lykanthroporepellens DcpA; however, DcpA is also found in Dehalococcoides mccartyi KS and RC with 95% and 92% amino acid similarity (56). Cluster 5 has no characterized representatives but has a concentration of Dehalogenimonas WBC-2 RdhA (11 of 22) (Figure 6-4). The Dehalogenimonas are similar enough to the Dehalococcoides in their RdhA complements that no pure “Dehalococcoides” cluster remains (Figure 6-4).
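
The ortholog group comparison rests on pairwise amino acid identity and a 90% cut off. The sketch below computes identity from two aligned sequences and reports the nearest group and whether the cut off is met. It is a simplified illustration: identity is taken over alignment columns where at least one sequence has a residue, which is one of several conventions, and the group name and sequences are toy examples rather than real RdhA data.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Fractional identity over aligned columns, ignoring gap-gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return matches / compared


def nearest_ortholog_group(query, representatives, cutoff=0.90):
    """Return (closest group, identity, meets cutoff) for an aligned query."""
    best_name, best_identity = None, 0.0
    for name, reference in representatives.items():
        identity = pairwise_identity(query, reference)
        if identity > best_identity:
            best_name, best_identity = name, identity
    return best_name, best_identity, best_identity >= cutoff


# Toy aligned sequences; "OG_demo" is a hypothetical group name.
representatives = {"OG_demo": "ACDEFGHIKL"}
print(nearest_ortholog_group("ACDEFGHIAA", representatives))
```

A sequence such as the toy query here would be reported as lying near a group without being assigned to it, which mirrors how the WBC-2 RdhAs are placed near, but not within, established ortholog groups.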

6.3.10 Sequence alignment and structural motifs

An alignment of 225 RdhA sequences illustrates the main similarities and differences between RdhA (Table S6, Supplemental Information in (53)). First, the most conserved area corresponds to the second B-12 binding motif of Sulfurospirillum multivorans PceA, especially the region closest to the first iron-sulfur cluster unit (residues 216-323 in PceA), which has 40% pairwise identity across the 225 RdhA sequences, including completely conserved glycine and arginine residues. The remaining regions associated with known motifs, including the double iron-sulfur cluster motif, the first B-12 binding region, the N-terminal TAT region and the C-terminal region, are less conserved. The least conserved region comprises the “letterbox” substrate channel-forming residues, which have only ~2.9% pairwise identity. This highly variable area (41 bp in the alignment) is a clear contributor to the groupings of RdhA. Members of previously established ortholog groups often share “letterbox” substrate channel-forming residues with each other, but not with other ortholog groups (Table S6, substrate channel-forming residues are highlighted). This region is likely the most informative with respect to the substrate identity for these RdhA enzymes. Ortholog groups show promise for grouping dehalogenases with functional relevance, as was the case here: WBC-2 VcrA has 97% identity to the known VcrA-containing ortholog group 8, has an identical substrate channel, and demonstrates similar dechlorinating activity to other already characterized VcrA enzymes from this group (25, 31, 32, 121) (Table S6; shading showing putative substrate-binding residues, pages 21 to 25 of supplemental information in (59)).
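The region-by-region comparison above can be sketched as a mean pairwise identity computed over a window of alignment columns. This is only an illustration under stated assumptions: the thesis does not specify how pairwise identity was calculated, so the definition used here (the average, over all sequence pairs, of percent identity within the window, ignoring gapped columns) is one common choice. The function name and the three-sequence toy alignment are hypothetical.

```python
from itertools import combinations


def mean_pairwise_identity(alignment, start, end):
    """Mean percent identity over all sequence pairs within columns [start, end)."""
    region = [seq[start:end] for seq in alignment]
    total = 0.0
    pairs = 0
    for a, b in combinations(region, 2):
        # Only columns where both sequences have a residue are compared.
        compared = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
        if compared == 0:
            continue
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        total += 100.0 * matches / compared
        pairs += 1
    return total / pairs if pairs else 0.0


# Toy alignment (hypothetical): columns 0-1 conserved, columns 2-3 variable.
toy_alignment = ["GRAAKLT", "GRACKMT", "GRTDKNT"]
print(mean_pairwise_identity(toy_alignment, 0, 2))
print(mean_pairwise_identity(toy_alignment, 2, 4))
```

Scanning such a window along an alignment highlights conserved stretches, such as a cofactor-binding motif, and variable stretches, such as the residues lining a substrate channel.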

6.4 Conclusions

In summary, Dehalogenimonas WBC-2 is another OHRB highly evolved for dechlorination. This bacterium in mixed culture was shown to express one abundant dehalogenase, TdrA, capable of tDCE to VC hydrogenolysis. The use of primarily one RDase with strong substrate specialization likely enabled this Dehalogenimonas to out-compete Dehalococcoides for tDCE dechlorination in WBC-2. Of all the Dehalogenimonas WBC-2 dehalogenase genes, the tdrA gene was the most recently laterally acquired, as evidenced by its prominent island features. It is likely that lateral gene transfer of rdhA genes is a feature not only of Dehalococcoides but also of other Dehalococcoidia, including the Dehalogenimonas. TdrA is highly specific to the WBC-2 culture and is a good candidate for tracking WBC-2 Dehalogenimonas in the subsurface or after injection during bioaugmentation. The characteristic motifs of RDases now include a recently established putative “letterbox” substrate channel, which may begin to reveal clusters with similar substrate specificity.


Chapter 7 Summary, Significance and Future Work

7.1 Summary and Significance

The overall aim of this thesis was to study the Dehalococcoidia found in the KB-1 and WBC-2 cultures in order to improve bioremediation applications. The three major objectives were to (1) identify and characterize the dechlorinating organisms in the WBC-2 and KB-1 cultures; (2) identify reductive dehalogenase enzymes (RDases), especially those responsible for the observed dechlorinating function; and (3) investigate the role of lateral gene transfer in shaping Dehalococcoidia genomes and, as a result, influencing their dechlorinating ability.

Next generation DNA sequencing of KB-1 and WBC-2 served as a snapshot of cultures in time, allowing us to identify key players, resolve their genomes and identify associated DNA

As a part of Objective (1) we used a combination of mate-pair and paired-end Illumina sequencing along with 454 16S rRNA gene amplicon sequencing to identify the major microbial groups in both the KB-1 and WBC-2 enrichment cultures. We were surprised to find multiple strains of D. mccartyi in the cultures even though they had been enriched on specific chlorinated substrates. From both sets of cultures, nine genomes of D. mccartyi and one genome of Dehalogenimonas were closed and analyzed. D. mccartyi genomes are remarkably similar to the Dehalogenimonas, and even more so to each other. The contribution of our ten genomes increased the total number of Dehalococcoidia complete genomes in NCBI to twenty-seven. The reductive dehalogenases found gave us an idea of the potential dechlorinating ability of each strain. A total of 104 rdhA sequences were recovered from the KB-1 metagenomes and 96 rdhA sequences from the WBC-2 metagenome, of which 186 had never been seen before (Objective (2)).

We found potentially mobile DNA associated with the genomes we closed based on sequence features, read depth and mate-pair/paired-end linking information. We additionally identified two CRISPR-Cas systems in two of our strains whose target sequences helped us identify mobile DNA. The CRISPR-Cas systems primarily targeted prophages and a novel type of DNA element that we have named integrative mobilizable element #1 (IME1). The IME1s are a new, previously undescribed mobile element of about 23 kbp which exists simultaneously in the genome and in an independent, high-copy, circular form. We catalogued all of the IME1 sequences found in our metagenomes and in other D. mccartyi genomes in NCBI, as well as possibly similar sequences in other bacteria. Additionally, we found twelve new D. mccartyi prophage sequences in genomes and in metagenomic contigs. Prior to this study only two prophages had been described in D. mccartyi. The most interesting target DNA found in the CRISPR arrays were fragments of reductive dehalogenase-containing genomic islands, including the VC reductase-containing genomic island (vcrA-GI). Upon closer inspection of the DNA sequencing data, the vcrA-GI also occurred at unusually high read depth, and could be assembled either as integrated into the genome or as a separate circular form, similarly to the IME1s. The investigation of lateral gene transfer, especially of reductive dehalogenases, was part of Objective (3).
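The read-depth signal described above can be illustrated with a minimal sketch that compares the median depth inside a candidate region with the median depth of the rest of the genome. The function name, the per-base depth list and the region coordinates are hypothetical, and no threshold is implied; in this work read depth was considered together with sequence features and mate-pair/paired-end linking information.

```python
from statistics import median


def relative_depth(depths, start, end):
    """Ratio of the median depth in [start, end) to the median depth elsewhere.

    depths is a list of per-base (or per-window) read depths along one genome.
    """
    inside = depths[start:end]
    outside = depths[:start] + depths[end:]
    if not inside or not outside:
        raise ValueError("region must be a proper, non-empty part of the genome")
    background = median(outside)
    if background == 0:
        raise ValueError("background depth is zero")
    return median(inside) / background


# Toy depth profile (hypothetical): a 10-position region at three-fold depth.
toy_depths = [100] * 50 + [300] * 10 + [100] * 40
print(relative_depth(toy_depths, 50, 60))
```

A ratio well above one is consistent with extra copies of the region, for example as an extrachromosomal circular form, but it could also arise from repeats or from a mixture of strains, so it is a prompt for further checks rather than proof.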

DNA sequencing helps guide experiments whose findings contribute to science and engineering on a broader scale

The comparison of our Dehalococcoidia genomes with the genomes in NCBI allowed us to see not only differences, but also similarities. Both the nucleotide and amino acid alignments have given us insights into the evolutionary pressures shaping reductive dehalogenases, as well as the entire genomes of this family of bacteria.

D. mccartyi genomes are unique in that they are amongst the smallest genomes found in free-living bacteria (avg. 1.4 Mbp and 1451 protein-coding genes). The D. mccartyi genome required an extensive period of time to become as specialized and small as it currently is. Each of the D. mccartyi genomes has a very clear GC skew and strand coding bias, as is commonly found in bacteria with only one origin of replication. Prior to this study, it was generally thought that the reductive dehalogenase gene complement arose as a result of lateral gene transfer and represented the unique, vast and largely uncharacterized dechlorinating abilities of each strain. After the addition of these new genomes to the NCBI database, we found that the major differences in dehalogenase gene complement were more a result of gene loss than of recombination. Furthermore, the diversity in RdhA sequences is not as high as previously thought, and as more genomes are closed we find fewer new D. mccartyi dehalogenase genes. This finding corresponds with the theory that gene loss is as important as, or even more important than, lateral gene transfer in shaping genomes. Despite this, some dehalogenases do show evidence of mobility. The mobile dehalogenases are also the ones that show activity on industrial contaminants. It is possible that anthropogenic releases of organohalides have caused D. mccartyi’s genome to enter a period of complexification by sharing select rdhA across vast geographic spans, causing rearrangements within genomes. The exchange of key reductive dehalogenases amongst the D. mccartyi can be likened to the recent dissemination of antibiotic resistance genes in the natural environment. The Dehalogenimonas whose genome we closed was also found to be using a dehalogenase located on a mobile element. These D. mccartyi genomes will contribute to research on the theory of genome evolution in bacteria, especially as more genomes are closed and trends become increasingly evident.
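The GC skew mentioned above is conventionally computed as (G - C)/(G + C) in windows along the chromosome. In many bacteria with a single origin of replication, the cumulative skew reaches a minimum near the origin and a maximum near the terminus. The sketch below is a minimal illustration of that calculation and is not the tool used in this thesis; the function names, the window size and the toy sequence are hypothetical.

```python
def gc_skew(seq, window):
    """GC skew (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew_extrema(skews):
    """Window indices where the cumulative skew is minimal and maximal."""
    running = 0.0
    cumulative = []
    for s in skews:
        running += s
        cumulative.append(running)
    return cumulative.index(min(cumulative)), cumulative.index(max(cumulative))


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy_seq = "GGGC" * 5 + "CCCG" * 5
toy_skews = gc_skew(toy_seq, window=4)
print(toy_skews)
print(cumulative_skew_extrema(toy_skews))
```

For a real genome the windows would be thousands of bases long, and because the sequence is circular the position of the extrema depends on where the linear representation starts.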

After identifying CRISPR-Cas modules in two of our genomes, we went one step further and PCR-amplified the entire array from present and historical culture DNA. We observed the CRISPR array grow by three spacer sequences over eleven years; two of these spacers match prophage DNA and the target of the third is unknown at this time. We now know that the systems encoded in the genomes are functioning, and have been adapting to invading phage DNA in our regularly maintained cultures. The discovery of a functioning CRISPR-Cas system expands our understanding of this individual system and adds to current research on CRISPR-Cas systems.
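The comparison of a historical and a present-day array can be sketched as follows. This is a simplified illustration that assumes the direct repeat is known and perfectly conserved, so that the array can be split on exact repeat matches; real arrays may contain degenerate repeats and would need an approximate search. The function names, the repeat and the short toy sequences are all hypothetical.

```python
def extract_spacers(array_seq, repeat):
    """Split a CRISPR array on its direct repeat and return the spacers in order."""
    parts = array_seq.split(repeat)
    # Text before the first repeat and after the last repeat is flanking sequence.
    return [p for p in parts[1:-1] if p]


def new_spacers(old_array, new_array, repeat):
    """Spacers present in the newer array but absent from the older one."""
    old = set(extract_spacers(old_array, repeat))
    return [s for s in extract_spacers(new_array, repeat) if s not in old]


# Toy arrays (hypothetical): the newer array has gained one spacer.
REPEAT = "GTTTCA"
old_array = "AA" + REPEAT + "ACGTAC" + REPEAT + "TTGGCC" + REPEAT + "CC"
new_array = ("AA" + REPEAT + "GGATCC" + REPEAT + "ACGTAC"
             + REPEAT + "TTGGCC" + REPEAT + "CC")
print(extract_spacers(old_array, REPEAT))
print(new_spacers(old_array, new_array, REPEAT))
```

Spacers found only in the newer array could then be searched against prophage and other mobile element sequences to look for their targets.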

DNA sequencing indicated that the vcrA-containing genomic island (vcrA-GI) had mobilized in our cultures. We designed experiments to explain the sequencing results and found that the vcrA-GI was periodically maintained as a circular extrachromosomal element in cells. We ran the first pilot study to see whether there was a link between vcrA-GI circularization and donor limitation, and observed vcrA-GI circularization in both KB-1 and WBC-2 subcultures. While we did not find a clear relationship, we did discover that the vcrA-GI behaved the same way in WBC-2, indicating that this is common to other vcrA-GIs, not just the one from KB-1. This genetic element was previously considered a genomic island which, considering the high sequence similarity across different D. mccartyi genomes, could have been laterally transferred in the last 1000 years (40). It is remarkable to have identified what is possibly the first step in lateral gene transfer of this element, happening presently in the KB-1 and WBC-2 cultures. This research will not only expand our understanding of lateral gene transfer in D. mccartyi, but also contribute to the existing knowledge of gene transfer in bacteria in general.

Findings in this thesis will be used to improve the study of dechlorinating enrichment cultures and to monitor cultures during in-situ bioremediation

Several previously developed methods used in this thesis were modified and improved. Firstly, the BN-PAGE method was modified by adding reducing agents during cell lysis and using nitrogen gas to flush enzyme activity vials. As a result, significantly higher enzymatic activity was recovered. We can now use less culture and observe protein activity more accurately during assays. This modified method has already been used to make new discoveries such as the TdrA dehalogenase presented in Chapter 6, and in two additional studies which resulted in two co-authored publications, one in collaboration with Stanford University (201) and the second in collaboration with The University of Tennessee (202).

Quantitative PCR (qPCR) is a frequently used tool to track catabolic genes in the laboratory and during in-situ bioremediation. Improvements were made to the existing method, such as designing new universal vcrA primers to capture all vcrA sequences known at the time, available from Molenda et al. (59). Also, a plasmid was designed with four Dehalococcoides gene fragments, such that the same standard could be used for four different qPCR reactions. Without this plasmid, it would not have been possible to accurately study the ratios of vcrA to 16S rRNA genes. To track the tdrA gene, new primers and a new qPCR method were developed. All reagents listed here were shared with SiREM, who now offer Dehalogenimonas and tdrA biomarker testing.
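The way a shared plasmid standard supports gene-ratio measurements can be sketched with the usual qPCR standard curve, Cq = slope × log10(copies) + intercept, fitted separately for each assay from the same dilution series. The sketch below is illustrative only: the function names are hypothetical and all Cq values are invented for the example, not measured data from this thesis.

```python
def fit_standard_curve(log10_copies, cq):
    """Least-squares line Cq = slope * log10(copies) + intercept."""
    n = len(cq)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, cq))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate gene copies in a reaction."""
    return 10 ** ((cq - intercept) / slope)


# One dilution series of the same plasmid standard, read by two assays.
# All Cq values below are invented for illustration.
standard_log10 = [3, 4, 5, 6, 7]
vcra_curve = fit_standard_curve(standard_log10, [29.5, 26.0, 22.5, 19.0, 15.5])
rrna_curve = fit_standard_curve(standard_log10, [28.8, 25.4, 22.0, 18.6, 15.2])

# Hypothetical sample measured with both assays.
vcra_copies = copies_from_cq(19.0, *vcra_curve)
rrna_copies = copies_from_cq(22.0, *rrna_curve)
vcra_to_16s = vcra_copies / rrna_copies
print(vcra_to_16s)
```

Because both curves come from the same plasmid dilutions, errors in the absolute concentration of the standard affect both estimates equally and cancel in the ratio, which is the benefit of a single multi-gene standard.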

To summarize, the major contributions of this thesis are (1) ten new closed genomes from the Dehalococcoidia; (2) a newly defined architecture of D. mccartyi genomes, and their rdhA; (3) characterization of a new Dehalogenimonas dehalogenase, TdrA; (4) identification of a functioning CRISPR-Cas system, and novel mobile elements including IME1 in D. mccartyi and twelve new D. mccartyi prophages; and (5) the first pilot study of the mobilization of the vcrA genomic island in KB-1 and WBC-2 cultures.

7.2 Future Work

Further study of vcrA-genomic island mobility in D. mccartyi

In Chapters 4 and 5 the vcrA genomic island was shown to mobilize via a circular intermediate state in KB-1 and WBC-2 VC enrichments. It is possible that more circular extrachromosomal islands are produced when a culture is stressed. Further studies could be designed to confirm this, and to make it possible to produce extrachromosomal vcrA in the laboratory. Little is understood about how many cells of a population produce extrachromosomal vcrA, and how that circular element may be transferred between cells. Given the DNA extraction method used, the circular vcrA element detected was located within cells. Culture supernatant should be tested to see whether circular vcrA is exported outside of the cell. Proteomic and transcriptomic studies could be used to identify proteins involved in extrachromosomal vcrA production, by comparison with proteins expressed when the circular form is not being produced. At this point it is known that more gene copies of vcrA occur within the cell; whether this results in an increase in transcripts or protein production is unknown and is another potential area of investigation.

Inducing lateral gene transfer in the laboratory setting

During this thesis we designed an experiment to induce lateral gene transfer in dechlorinating mixed microbial cultures in collaboration with Dr. Ruth Richardson (Cornell University). The Richardson Laboratory provided us with the DKB culture, a culture made from a mixture of DONNAII and KB-1. We know that in KB-1 and DONNAII there are strains of D. mccartyi which contain the tceA dehalogenase and are especially good at degrading trichloroethene (TCE). These strains also differ from other D. mccartyi in that they contain an intact nitrogen fixation operon. In KB-1 and DKB, the vinyl chloride (VC) degrading strains have the vcrA dehalogenase but lack nitrogen fixation genes. By creating subcultures in ammonia-free medium amended with N2 and VC, we created an environment where a strain with the combination of vcrA reductase and nitrogen fixation genes would outcompete other strains of D. mccartyi. We hoped that the absence of ammonia would work as an environmental stressor to induce lateral gene transfer. The experiment was set up and monitored for 518 days by gas chromatography and by taking periodic DNA samples before being handed off to a new Ph.D. student (Nadia Morson).

Apart from the experiment described above, the Dehalococcoides are known to have DNA uptake genes which may make them naturally competent. Additional experiments could be conducted to test whether D. mccartyi can take up DNA by natural transformation in a laboratory setting. Also, the prophage KBDCA3-12 was found in very close proximity to the bvcA gene. It would be interesting to test whether bvcA may be transferred as part of the prophage.

Remaining metagenomic work

In 2013, many DNA samples from a variety of enrichment cultures were sent for Illumina mate-pair and paired-end sequencing. From this round of sequencing there are several datasets which have not yet been analyzed, including a KB-1 culture from SiREM, WBC-2 1,1,2-TCA and VC enrichment cultures, and a WL TCA enrichment culture. The KB-1 cDCE culture dataset is only partially analyzed at this time. There are also three genomes which were assembled during this work but still have gaps: a Dehalobacter from the KB-1/1,2-DCA-MeOH culture, a Geobacter from the KB-1/TCE-MeOH culture and a vcrA-containing Dehalococcoides mccartyi from the KB-1/cDCE-MeOH culture.


Historical DNA metagenome sequencing

In 2017 a sample from a KB-1 TCE enrichment culture from 2003 was paired-end Illumina sequenced by the Center for the Analysis of Genome Evolution and Function (CAGEF) at the University of Toronto. This historical metagenome could be compared to the KB-1 metagenomes analysed from 2013 DNA in order to observe what kinds of changes have occurred over ten years of continuous laboratory culture. This dataset may allow us to draw new and better-supported conclusions about lateral gene transfer, CRISPR adaptation, RDases and genome evolution of D. mccartyi.

REFERENCES 1. Elena-Roxana A. 2011. An approach to bioremediation. Journal of Engineering Studies and Research 17:7-11. 2. Kavanaugh MC, Rao PS. 2003. The DNAPL Remediation Challenge: is there a case for source depletion? U.S. Environmental Protection Agency, National Risk Management Research Laboratory, Cincinnati, OH. 3. Council NR. 1994. Alternatives for ground water cleanup. Press TNA, Washington, DC. 4. U.S. Dept. of Health and Human Services. 2008. Toxicological profile for 1,1,2,2- tetrachloroethane. U.S. Dept. of Health and Human Services, Public Health Service, Agency for Toxic Substances and Disease Registry, Atlanta, GA. 5. Guyton KZ, Hogan KA, Scott CS, Cooper GS, Bale AS, Kopylev L, Barone S, Jr., Makris SL, Glenn B, Subramaniam RP, Gwinn MR, Dzubow RC, Chiu WA. 2014. Human health effects of tetrachloroethylene: key findings and scientific issues. Environ Health Perspect 122:325. 6. Keppler F, Borchers R, Pracht J, Rheinberger S, Schöler HF. 2002. Natural Formation of Vinyl Chloride in the Terrestrial Environment. Environ Sci Technol 36:2479-2483. 7. U.S. Environmental Protection Agency. 2000. Toxicological review of vinyl chloride. Agency USEP, Washington, DC. 8. Suthersan SS, Schnobrich M, Martin J, Horst JF, Gates E. 2017. Three Decades of Solvent Bioremediation: The Evolution from Innovation to Conventional Practice. Groundwater Monitoring & Remediation 37:14-23. 9. Seshardi R, Adrian L, Fouts DE, Eisen JA, Phillippy AM, Methe BA, Ward NL, Nelson WC, Dobson RJ, Daugherty SC, Brinkac LM, Sullivan SA, Madupu R, Nelson KE, Kang KH, Impraim M, Tran K, Robinson JM, Forberger HA, Fraser CM, Zinder SH, Heidelberg JF. 2005 Genome sequence of the PCE-dechlorinating bacterium Dehalococcoides ethenogenes. Science 307:105-108. 10. Maymo-Gatell X, Chien Y, Gossett JM, Zinder SH. 1997. Isolation of a bacterium that reductively dechlorinates tetrachloroethene to ethene. Science 276:1568-1571. 11. DeWeerd KA, Mandelco L, Tanner RS, Woese CR, Suflita JM. 1990. 
Desulfomonile tiedjei gen. nov. and sp. nov., a novel anaerobic, dehalogenating, sulfate-reducing bacterium. Arch Microbiol 154:23-30. 12. Adrian L, Löffler FE. 2006. Organohalide-respiring bacteria: an introduction. In Adrian L, Löffler FE (ed), Organohalide-Respiring Bacteria. Heidelberg, Berlin. 13. Krzmarzick MJ, Crary BB, Harding JJ, Oyerinde OO, Leri AC, Myneni SCB, Novak PJ. 2012. Natural Niche for Organohalide-Respiring Chloroflexi. Appl Environ Microbiol 78:393-401.


14. Hug LA, Maphosa F, Leys D, Löffler FE, Smidt H, Edwards EA, Adrian L. 2013. Overview of organohalide-respiring bacteria and a proposal for a classification system for reductive dehalogenases. Philos Trans Soc Biol Sci 368:1-10. 15. Magnuson JK, Romine MF, Burris DR, Kingsley MT. 2000. Trichloroethene reductive dehalogenase from Dehalococcoides ethenogenes: sequence of tceA and substrate range characterization. Appl Environ Microbiol 66:5141-5147. 16. Barras F, Losieau, L., and Beatrice, P. 2005. How Escherichia coli and Saccharomyces cerevisiae build Fe/S proteins. Adv Microb Physiol:42-101. 17. Seshardi R, Adrian L, Fouts DE, Eisen JA, Phillippy AM, Methe BA, Ward NL, Nelson WC, Deboy RT, Brinkac LM, Sullivan SA, Kolonay JT, Dodson RJ, Daugherty SC, Brinkac LM, Madupu R, Nelson KE, Kang KH, Impraim M, Tran K, Robinson JM, Forberger HA, Fraser CM, Zinder SH. 2005. Genome sequencing of the PCE-dechlorinating bacterium Dehalococcoides ethenogenes. Science 307:105-108. 18. Smidt H, Van Leest M, van der Oost J, De Vos WM. 2000. Transcriptional regulation of the cpr gene cluster in ortho-chlorophenol respiring Desulfitobacterium dehalogenans. J Bacteriol 182:5683-5691. 19. Chen K, Huang L, Xu C, Liu X, He J, Zinder SH, Li S, Jiang J. 2013. Molecular characterization of the enzymes involved in degradation of a brominated aromatic herbicide. Mol Microbiol 89:1121-1139. 20. Banerjee R, Ragsdale SW. 2003. The many faces of vitamin B12: catalysis by cobalamin- dependent enzymes. Annu Rev Biochem 72:209-247. 21. Marcy YT, Ishoey RS, Laksen TB, Stockwell BP, Walenz AL, Halpern KY, Beeson KY, Goldberg SM, Quake SR. 2007. Nanoliter reactors improve multiple displacement amplification of genomes from single cells. PLoS Genet 3:1702-1708. 22. Morita Y, Futagami T, Goto M, Furukawa K. 2009. Functional characterization of the trigger factor protein PceT of a tetrachloroethene-dechlorinating Desulfitobacterium hafniense Y51. Appl Microbiol Biotechnol 83:775-781. 23. 
Sunyama A, Yamashita M, Yoshino S, Furukawa K. 2002. Molecular characterization of the PceA dehalogenase of Desulfitobacterium sp. strain Y51. Bacteriol 184:3419-3425. 24. Yan J, Ritalahti KM, Wagner DD, Löffler FL. 2012. Unexpected specificty of interspecies cobamid transfer from Geobacter spp. to organohalide-respiring Dehalococcoides mccartyi strains. 78:6630-6636. 25. Parthasarathy A, Stich TA, Lohner ST, Lesnefsky A, Britt RD, Spormann A. 2015. Heterologously expressed vinyl chloride reductive dehalogenase (VcrA) from Dehalococcoides mccartyi strain VS. J Am Chem Soc 137:3525-3532. 26. MacNelly A, Kai, M., Svatos, M., Diekert, G.,& Schubert, T. 2014. Functional heterologous production of reductive dehalogenases from Desulfitobacterium hafniense strains. App Envron Microbiol:80(14), 4313-4322. 27. Bommer M, Kunze C, Fesseler J, Schubert T, Diekert G, Dobbek H. 2014. Structural basis for organohalide respiration. Science 346:455-458. 28. Kublik A, Deobald D, Hartwig S, Schiffmann CL, Andrades A, von Bergen M, Sawers RG, Adrian L. 2016. Identification of a multi‐protein reductive dehalogenase complex in Dehalococcoides mccartyi strain CBDB1 suggests a protein‐dependent respiratory electron transport chain obviating quinone involvement. Environ Microbiol 18:3044-3056. 29. Hartwig S, Dragomirova N, Kublik A, Türkowsky D, von Bergen M, Lechner U, Adrian L, Sawers RG. 2017. A H2‐oxidizing, 1,2,3‐trichlorobenzene‐reducing multienzyme complex isolated from the obligately organohalide‐respiring bacterium Dehalococcoides mccartyi strain CBDB1. Environmental Microbiology Reports 9:618-625. 30. Seidel K, Kühnert J, Adrian L. 2018. The complexome of Dehalococcoides mccartyi reveals its organohalide respiration-complex is modular. Frontiers in Microbiology 9.


31. Liang X, Molenda O, Tang S, Edwards EA. 2015. Identity and substrate-specificity of reductive dehalogenases expressed in Dehalococcoides-containing enrichment cultures maintained on different chlorinated ethenes. Appl Environ Microbiol 81:4626-4633. 32. Tang S, Chan WMW, Fletcher KE, Seifert J, Liang X, Löffler FE, Edwards EA, Adrian L. 2013. Functional characterizayion of reductive dehalogenases by using blue native polyacrylamide gel electrophoresis. Appl Environ Microbiol 79:974-981. 33. Cheng D, He J. 2009. Isolation and characterization of "Dehalococcoides" sp. strain MB which dechlorinates tetrachloroethene to trans-1,2-dichloroethene. Appl Environ Microbiol 75:5910- 5918. 34. Adrian L, Rahnenführer J, Gobom J, Hölscher T. 2007. Identification of a Chlorobenzene Reductive Dehalogenase in Dehalococcoides sp. Strain CBDB1. Appl Environ Microbiol 73:7717-7724. 35. Wang S, Chng KR, Wilm A, Zhao S, Yang K-L, Nagarajan N, He J. 2014. Genomic characterization of three unique Dehalococcoides that respire on persistent polychlorinated biphenyls. PNAS 111:12103-12108. 36. Zhao S, Ding C, He J. 2017. Genomic characterization of Dehalococcoides mccartyi strain 11a5 reveals a circular extrachromosomal genetic element and a new tetrachloroethene reductive dehalogenase gene. FEMS Microbiol Ecol 93:fiw235-fiw235. 37. Lovley DR. 2003. Cleaning up with genomics: applying molecular biology to bioremediation. Nature Reviews Microbiology 1:35+. 38. Kapley A, Purohit H. 2009. Genomic tools in bioremediation. Indian J Microbiol 49:108-113. 39. McMurdie PJ, Behrens SF, Müller JA, Goke J, Ritalahti KM, Wagner R, Goltsman E, Lapidus A, Holmes S, Löffler FE, Spormann AM. 2009. Localized plasticity in the streamlined genomes of vinyl chloride respiring Dehalococcoides. PLoS Genet 5:e1000714. 40. McMurdie PJ, Hug LA, Edwards EA, Holmes S, Spormann AA, . 2011. Site-specific mobilization of vinyl-chloride respiration islands by a mechanism common in Dehalococcoides. 
BMC Genomics 12:287-302. 41. Lee PK, Johnson DR, Holmes VF, He J, Alvarez-Cohen L. 2006. Reductive dehalogenase gene expression as a biomarker for physiological activity of Dehalococcoides spp. Appl Environ Microbiol 72:6161-6168. 42. Lee PK, Macbeth TW, Sorenson KS, Jr., Deeb RA, Alvarez-Cohen L. 2008. Quantifying genes and transcripts to assess the in situ physiology of "Dehalococcoides" spp. in a trichloroethene-contaminated groundwater site. Appl Environ Microbiol 74:2728-2739. 43. Ritalahti KM, Hatt JK, Lugmayr V, Henn K, Petrovskis EA, Ogles DM, Davis GA, Yeager CM, Lebron CA, Löffler FE. 2010. Comparing on-site to off-site biomass collection for Dehalococcoides biomarker gene quantification to predict in situ chlorinated ethene detoxification potential. Environ Sci Technol 44:5127-5133. 44. Kocur CMD, Lomheim L, Molenda O, Weber KP, Austrins LM, Sleep BE, Boparai HK, Edwards EA, O’Carroll DM. 2016. Long-Term Field Study of Microbial Community and Dechlorinating Activity Following Carboxymethyl Cellulose-Stabilized Nanoscale Zero-Valent Iron Injection. Environ Sci Technol 50:7658-7670. 45. van der Zaan B, Hannes F, Hoekstra N, Rijnaarts H, de Vos WM, Smidt H, Gerritse J. 2010. Correlation of Dehalococcoides 16S rRNA and Chloroethene-Reductive Dehalogenase Genes with Geochemical Conditions in Chloroethene-Contaminated Groundwater. Appl Environ Microbiol 76:843-850. 46. Schmidt KR, Tiehm A. 2008. Natural attenuation of chloroethenes: identification of sequential reductive/oxidative biodegradation by microcosm studies. Water Sci Technol 58:1137-1145. 47. Duhamel M, S. D. Wehr, L. Yu, H. Rizvi, D. Seepersad, S. Dworatzek, E. E. Cox, and E. A. Edwards. 2002. Comparison of anaerobic dechlorinating enrichment cultures maintained on tetrachloroethene, trichloroethene, cis-dichloroethene and vinyl chloride. Water Res:36,4193- 4202.


48. Hug LA, Beiko RG, Rowe AR, Richardson RE, Edwards EA. 2012. Comparative metagenomics of three Dehalococcoides-containing enrichment cultures: the role of the non- dechlorinating community. BMC Genomics 13:327-327. 49. Perez de Mora A, Lacourt A, McMaster ML, Liang X, Dworatzek S, Edwards EA. 2017. Chlorinated electron acceptor availability selects for specific Dehalococcoides populations in dechlorinating enrichment cultures and in groundwater. bioRxiv. 50. Manchester M, Hug LA, Zarek M, Zila A, Edwards EA. 2012. Discovery of a trans- dichloroethene-respiring Dehalogenimonas species in the 1,1,2,2-tetrachloroethane- dechlorinating WBC-2 consortium. Appl Environ Microbiol 78:5280-5287. 51. Moe WM, Yan J, Nobre MF, Da Costa MS, Rainey FA. 2009. Dehalogenimonas lykanthroporepellens gen. nov., sp. nov., a reductively dehalogenating bacterium isolated from chlorinated-solvent contaminated groundwater. Int J Syst Evol Microbiol 59:2692-2697. 52. Siddaramappa S, Challacombe JF, Delano SF, Green LD, Daligault H, Bruce D, Detter C, Tapia R, Han S, Goodwin L, Han J, Woyke T, Pitluck S, Pennacchio L, Nolan M, Land M, Chang YJ, Kyrpides NC, Ovchinnikova G, Hauser L, Lapidus A, Yan J, Bowman KS, da Costa MS, Rainey FA, Moe WM. 2012. Complete genome sequence of Dehalogenimonas lykanthroporepellens type strain (BL-DC-9(T)) and comparison to "Dehalococcoides" strains. Stand Genomic Sci 6:251-264. 53. Molenda O, Quaile AT, Edwards EA. 2016. Dehalogenimonas sp. strain WBC-2 genome and identification of its trans-dichloroethene reductive dehalogenase, TdrA. Appl Environ Microbiol 82:40-50. 54. Yang Y, Higgins SA, Yan J, Simsir B, Chourey K, Iyer R, Hettich RL, Baldwin B, Ogles DM, Löffler FE. 2017. Grape pomace compost harbors organohalide-respiring Dehalogenimonas species with novel reductive dehalogenase genes. ISME J doi:10.1038/ismej.2017.127. 55. Maness AD, Bowman KS, Yan J, Rainey FA, Moe WM. 2012. Dehalogenimonas spp. 
can reductively dehalogenate high concentrations of 1,2-dichloroethane, 1,2-dichloropropane, and 1,1,2-trichloroethane. AMB Express 2:1-7. 56. Padilla-Crespo E, Yan J, Swift C, Wagner DD, Chourey K, Hettich RL, Ritalahti KM, Löffler FE. 2014. Identification and environmental distribution of dcpA, which encodes the reductive dehalogenase catalyzing the dichloroelimination of 1,2-dichloropropane to propene in organo-halide-respiring chloroflexi. Appl Environ Microbiol 80:808-818. 57. Murkjerjee K, Bowman, K.S., Rainey, F.A., Siddaramappa, S., Challacombe, J.F., & Moe, W.M. 2014. Dehalogenimonas lykanthroporpellens BL-DC-9 simultaneously transcribes many rdhA genes during organohalide respiration with 1,2-DCA, 1,2-DCP and 1,2,3-TCP as electron acceptors. FEMS Microbiol Lett:354, 111-118. 58. Molenda O, Tang S, Edwards EA. 2016. Complete genome sequence of Dehalococcoides mccartyi strain WBC-2, capable of anaerobic reductive dechlorination of vinyl chloride. Genome Announcements 4:e01375-01316. 59. Molenda O, Quaile AT, Edwards EA. 2015. Dehalogenimonas sp. strain WBC-2 genome and identification of its trans-dichloroethene reductive dehalogenase, TdrA. Appl Environ Microbiol 82:40-50. 60. Stroo HF, Leeson A, Ward CH. 2012. Bioaugmentation for Groundwater Remediation. Springer New York. 61. Major DJ, McMaster ML, Cox EE, Edwards EA, Dwortzek SM, Hendrickson ER, Starr MG, Payne JA, Buonamici LW. 2000. Field demonstration of sucessful bioaugmentation to achieve dechlorination of tetrachloroethene to ethene. Environ Sci Technol 36:5106-5116. 62. Maymo-Gatell X, Chien Y, Gossett JM, Zinder SH. 1997. Isolation of a bacterium that reductively dechlorinates tetrachloroethene to ethene. Science 276:1568-1571. 63. He J, Ritalahti KM, Yang KL, Koenigsberg SS, Löffler FE. 2003. Detoxification of vinyl chloride to ethene coupled to growth of an anaerobic bacterium. Nature 424:62-65.


64. Calabrese EJ, Kostecki PT, Dragun J. 2006. Contaminated Soils, Sediments and Water: Science in the Real World. Springer US. 65. Kube M, Beck A, Zinder SH, Kuhl H, Reinhardt R, Adrian L. 2005. Genome sequence of the chlorinated compound-respiring bacterium Dehalococcoides species strain CBDB1. Nat Biotech 23:1269-1273. 66. Grindley ND, Whiteson KL, Rice PA. 2006. Mechanisms of site-specific recombination. Annu Rev Biochem 75:567-605. 67. Komano T, Kim SR, Yoshida T, Nisioka T. 1994. DNA rearrangement of the shufflon determines recipient specificity in liquid mating of IncI1 plasmid R64. J Mol Biol 243:6-9. 68. Johnson RC. 2015. Site-specific DNA Inversion by Serine Recombinases. Microbiol Spectr 3:Mdna3-0047-2014. 69. Birge EA. 1983. Site-specific recombination following conjugation in Escherichia coli K-12. Mol Gen Genet 192:366-372. 70. Krajmalnik-Brown R, Sung Y, Ritalahti KM, Saunders FM, Löffler FE. 2007. Environmental distribution of the trichloroethene reductive dehalogenase gene (tceA) suggests lateral gene transfer among Dehalococcoides. FEMS Microbiol Ecol 59:206-214. 71. Martínez-Cano DJ, Reyes-Prieto M, Martínez-Romero E, Partida-Martínez LP, Latorre A, Moya A, Delaye L. 2014. Evolution of small prokaryotic genomes. Frontiers in Microbiology 5:742. 72. Dufresne A, Garczarek L, Partensky F. 2005. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol 6:R14. 73. Mira A, Ochman H, Moran NA. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet 17:589-596. 74. Marais GA, Calteau A, Tenaillon O. 2008. Mutation rate and genome reduction in endosymbiotic and free-living bacteria. Genetica 134:205-210. 75. Berg OG, Kurland CG. 2002. Evolution of microbial genomes: sequence acquisition and loss. Mol Biol Evol 19:2265-2276. 76. Wolf YI, Koonin EV. 2013. Genome reduction as the dominant mode of evolution. Bioessays 35:829-837. 77. 
Krzmarzick MJ, Crary BB, Harding JJ, Oyerinde OO, Leri AC, Myneni SC, Novak PJ. 2012. Natural niche for organohalide-respiring Chloroflexi. Appl Environ Microbiol 78:393-401. 78. Fung JM, Morris RM, Adrian L, Zinder SH. 2007. Expression of reductive dehalogenase genes in Dehalococcoides ethenogenes strain 195 growing on tetrachloroethene, trichloroethene, or 2,3-dichlorophenol. Appl Environ Microbiol 73:4439-4445. 79. Sung Y, Ritalahti KM, Apkarian RP, Löffler FE. 2006. Quantitative PCR confirms purity of strain GT, a novel trichloroethene-to-ethene-respiring Dehalococcoides isolate. Appl Environ Microbiol 72:1980-1987. 80. Duhamel M, Mo K, Edwards EA. 2004. Characterization of a highly enriched Dehalococcoides-containing culture that grows on vinyl chloride and trichloroethene. Appl Environ Microbiol 70:5538-5545. 81. Duhamel M, Edwards EA. 2007. Growth and yields of dechlorinators, acetogens, and methanogens during reductive dechlorination of chlorinated ethenes and dihaloelimination of 1 ,2-dichloroethane. Environ Sci Technol 41:2303-2310. 82. Duhamel M, Wehr SD, Yu L, Rizvi H, Seepersad D, Dworatzek S, Cox EE, Edwards EA. 2002. Comparison of anaerobic dechlorinating enrichment cultures maintained on tetrachloroethene, trichloroethene, cis-dichloroethene and vinyl chloride. Water Res 36:4193- 4202. 83. Tang S, Gong Y, Edwards EA. 2012. Semi-automatic in silico gap closure enabled de novo assembly of two Dehalobacter genomes from metagenomic data. PLoS One 7. 84. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114-2120.


85. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. 2009. ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117-1123. 86. Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578-579. 87. Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, Berlin AM, Aird D, Costello M, Daza R, Williams L, Nicol R, Gnirke A, Nusbaum C, Lander ES, Jaffe DB. 2011. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A 108:1513-1518. 88. Brown CT, Howe A, Zhang Q, Pyrkosz AB, Brom TH. 2012. A reference-free algorithm for computational normalization of shotgun sequencing data. Quantitative Biology. 89. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Meintjes P, Drummond A. 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28:1647-1649. 90. Aziz R, Bartels D, Best A, DeJongh M, Disz T, Edwards R, Formsma K, Gerdes S, Glass EM, Kubal M, Meyer F, Olsen G, Olson R, Osterman A, Overbeek R, McNeil L, Paarmann D, Paczian T, Parrello B, Pusch G, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST server: rapid annotations using subsystems technology. BMC Genomics 9. 91. Van Domselaar GH, Stothard P, Shrivastava S, Cruz JA, Guo A, Dong X, Lu P, Szafron D, Greiner R, Wishart DS. 2005. BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res 33:W455-W459. 92. Frank A, Lobry J. 2000. Oriloc: prediction of replication boundaries in unannotated bacterial chromosomes. Bioinformatics 16:566-567. 93. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792-1797. 94. Stamatakis A. 2006. 
RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688-2690. 95. Ahn YB, Kerkhof LJ, Haggblom MM. 2009. Desulfoluna spongiiphila sp. nov., a dehalogenating bacterium in the Desulfobacteraceae from the marine sponge Aplysina aerophoba. Int J Syst Evol Microbiol 59:2133-2139. 96. Nekrutenko A, Makova KD, Li W-H. 2002. The K(A)/K(S) Ratio Test for Assessing the Protein-Coding Potential of Genomic Regions: An Empirical and Simulation Study. Genome Res 12:198-202. 97. Grostern A, Edwards EA. 2009. Characterization of a Dehalobacter coculture that dechlorinates 1,2-dichloroethane to ethene and identification of the putative reductive dehalogenase gene. Appl Environ Microbiol 75:2684-2693. 98. Waller AS, Krajmalnik-Brown R, Löffler FE, Edwards EA. 2005. Multiple reductive- dehalogenase-homologous genes are simultaneously transcribed during dechlorination by Dehalococcoides-containing cultures. Appl Environ Microbiol 71:8257-8264. 99. Ritalahti KM, Amos BK, Sung Y, Wu Q, Koenigsberg SS, Löffler FE. 2006. Quantitative PCR targeting 16S rRNA and multiple reductive dehalogenase genes simultaneously monitors multiple Dehalococcoides strains. Appl Environ Microbiol 72:2765–2774. 100. Engelbrektson A, Kunin V, Wrighton KC, Zvenigorodsky N, Chen F, Ochman H, Hugenholtz P. 2010. Experimental factors affecting PCR-based estimates of microbial species richness and evenness. ISME J 4:642-647. 101. Ramos-Padron E, Bordenave S, Lin S, Bhaskar IM, Dong X, Sensen CW, Fournier J, Voordouw G, Gieg LM. 2011. Carbon and sulfur cycling by microbial communities in a gypsum-treated oil sands tailings pond. Environ Sci Technol 45:439-446. 102. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh


PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7. 103. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460-2461. 104. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069-5072. 105. Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73:5261-5267. 106. Löffler FE, Yan J, Ritalahti KM, Adrian L, Edwards EA, Konstantinidis KT, Müller JA, Fullerton H, Zinder SH, Spormann AM. 2013. Dehalococcoides mccartyi gen. nov., sp. nov., obligately organohalide-respiring anaerobic bacteria relevant to halogen cycling and bioremediation, belong to a novel bacterial class, Dehalococcoidia classis nov., order Dehalococcoidales ord. nov. and family Dehalococcoidaceae fam. nov., within the phylum Chloroflexi. Int J Syst Evol Microbiol 63:625-635. 107. Hendrickson ER, Payne JA, Young RM, Starr MG, Perry MP, Fahnestock S, Ellis DE, Ebersole RC. 2002. Molecular analysis of Dehalococcoides 16S Ribosomal DNA from chloroethene-contaminated sites throughout North America and Europe. Appl Environ Microbiol 68:485-495. 108. Pöritz M, Goris T, Wubet T, Tarkka MT, Buscot F, Nijenhuis I, Lechner U, Adrian L. 2013. Genome sequences of two dehalogenation specialists - Dehalococcoides mccartyi strains BTF08 and DCMB5 enriched from the highly polluted Bitterfeld region. FEMS Microbiol Lett 343:101- 104. 109. Zhao S, Ding C, He J. 2016. Genomic characterization of Dehalococcoides mccartyi strain 11a5 reveals a circular extrachromosomal genetic element and a new tetrachloroethene reductive dehalogenase gene. FEMS Microbiol Ecol doi:10.1093/femsec/fiw235. 110. 
Yohda M, Yagi O, Takechi A, Kitajima M, Matsuda H, Miyamura N, Aizawa T, Nakajima M, Sunairi M, Daiba A, Miyajima T, Teruya M, Teruya K, Shiroma A, Shimoji M, Tamotsu H, Juan A, Nakano K, Aoyama M, Terabayashi Y, Satou K, Hirano T. 2015. Genome sequence determination and metagenomic characterization of a Dehalococcoides mixed culture grown on cis-1,2-dichloroethene. J Biosci Bioeng 120:69-77. 111. Molenda O, Tang S, Lomheim L, Vasu G, Lemak S, Yakunin AF, Edwards EA. 2018. Extrachromosomal circular elements targeted by CRISPR-Cas in Dehalococcoides mccartyi are linked to mobilization of reductive dehalogenase genes. ISME J accepted. 112. Low A, Shen Z, Cheng D, Rogers MJ, Lee PKH, He J. 2015. A comparative genomics and reductive dehalogenase gene transcription study of two chloroethene-respiring bacteria, Dehalococcoides mccartyi strains MB and 11a. Scientific Reports 5:15204. 113. Lee PKH, He J, Zinder SH, Alvarez-Cohen L. 2009. Evidence for nitrogen fixation by “Dehalococcoides ethenogenes” strain 195. Appl Environ Microbiol 75:7551-7555. 114. Waller A. 2010. Molecular investigation of chloroethene reductive dehalogenation by the mixed microbial community KB-1. Ph.D. University of Toronto. 115. Waller AS, Hug LA, Mo K, Radford DR, Maxwell KL, Edwards EA. 2012. Transcriptional analysis of a Dehalococcoides-containing microbial consortium reveals prophage activation. Appl Environ Microbiol 78:1178-1186. 116. Wagner A, Segler L, Kleinsteuber S, Sawers G, Smidt H, Lechner U. 2013. Regulation of reductive dehalogenase gene transcription in Dehalococcoides mccartyi. Philos Trans Soc Biol Sci 368:20120317. 117. Krasper L, Lilie H, Kublik A, Adrian L, Golbik R, Lechner U. 2016. The MarR-Type Regulator Rdh2R Regulates rdh Gene Transcription in Dehalococcoides mccartyi strain CBDB1. J Bacteriol 198:3130-3141.


118. Mansfeldt CB, Rowe AR, Heavner GLW, Zinder SH, Richardson RE. 2014. Meta-analyses of Dehalococcoides mccartyi strain 195 transcriptomic profiles identify a respiration rate-related gene expression transition point and interoperon recruitment of a key subunit. Appl Environ Microbiol 80:6062-6072. 119. Morris RM, Fung JM, Rahm BG, Zhang S, Freedman DL, Zinder SH, Richardson RE. 2007. Comparative proteomics of Dehalococcoides spp. reveals strain-specific peptides associated with activity. Appl Environ Microbiol 73:320-326. 120. Johnson DR, Brodie EL, Hubbard AE, Andersen GL, Zinder SH, Alvarez-Cohen L. 2008. Temporal Transcriptomic Microarray Analysis of “Dehalococcoides ethenogenes” Strain 195 during the Transition into Stationary Phase. Applied and Environmental Microbiology 74:2864-2872. 121. Müller JA, Rosner BM, Von Abendroth G, Meshulam-Simon G, McCarty PL, Spormann AM. 2004. Molecular identification of the catabolic vinyl chloride reductase from Dehalococcoides sp. strain VS and its environmental distribution. Appl Environ Microbiol 70:4880-4888. 122. Adrian L, Rahnenfuhrer J, Gobom J, Holscher T. 2007. Identification of a chlorobenzene reductive dehalogenase in Dehalococcoides sp. strain CBDB1. Appl Environ Microbiol 73:7717-7724. 123. Lauro FM, McDougald D, Thomas T, Williams TJ, Egan S, Rice S, DeMaere MZ, Ting L, Ertan H, Johnson J, Ferriera S, Lapidus A, Anderson I, Kyrpides N, Munk AC, Detter C, Han CS, Brown MV, Robb FT, Kjelleberg S, Cavicchioli R. 2009. The genomic basis of trophic strategy in marine bacteria. Proc Natl Acad Sci U S A 106:15527-15533. 124. Allen HK, Donato J, Wang HH, Cloud-Hansen KA, Davies J, Handelsman J. 2010. Call of the wild: antibiotic resistance genes in natural environments. Nat Rev Micro 8:251-259. 125. Jones EJP, Voytek MA, Lorah MM, Kirshtein JD. 2006. 
Characterization of a microbial consortium capable of rapid and simultaneous dechlorination of 1,1,2,2-tetrachloroethane and chlorinated ethane and ethene intermediates. Bioremediation J 10:153-168. 126. Edwards EA, Grbić-Galić D. 1994. Anaerobic degradation of toluene and o-xylene by a methanogenic consortium. Appl Environ Microbiol 60:313-322. 127. Lorah MM, Majcher EH, Jones EJ, Voytek MA. 2007. Microbial consortia development and microcosm and column experiments for enhanced bioremediation of chlorinated volatile organic compounds, West Branch Canal Creek Wetland Area, Aberdeen Proving Ground, Maryland. US Department of the Interior, US Geological Survey Scientific Investigations Report 2007-5165, Reston, Virginia. 128. Persson B, Argos P. 1994. Prediction of transmembrane segments in proteins utilising multiple sequence alignments. J Mol Biol 237:182-192. 129. Contreras-Moreira B, Vinuesa P. 2013. GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis. Appl Environ Microbiol 79:7696-7701. 130. Darling AE, Mau B, Perna NT. 2010. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5:e11147. 131. Wang Y, Tang H, Debarry JD, Tan X, Li J, Wang X, Lee TH, Jin H, Marler B, Guo H, Kissinger JC, Paterson AH. 2012. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res 40:e49. 132. Valentín-Vargas A, Toro-Labrador G, Massol-Deya AA. Bacterial Community Dynamics in Full-Scale Activated Sludge Bioreactors: Operational and Ecological Factors Driving Community Assembly and Performance. PLoS ONE 7:e42524. 133. Werner JJ, Knights D, Garcia ML, Scalfone NB, Smith S, Yarasheski K, Cummings TA, Beers AR, Knight R, Angenent LT. 2011. Bacterial community structures are unique and resilient in full-scale bioenergy systems. PNAS 108:4158–4163. 134. Ziv-El M, Popat SC, Cai K, Halden RU, Krajmalnik-Brown R, Rittmann BE. 2012. 
Managing Methanogens and Homoacetogens to Promote Reductive Dechlorination of


Trichloroethene With Direct Delivery of H2 in a Membrane Biofilm Reactor. Biotechnol Bioeng 109:2200-2210. 135. Forster D, Behnke A, Stoeck T. 2012. Meta-analyses of environmental sequence data identify anoxia and salinity as parameters shaping ciliate communities. Syst Biodivers 10:277–288. 136. Marshall IPG, Azizian MF, Semprini L, Spormann AM. 2014. Inferring community dynamics of organohalide-respiring bacteria in chemostats by covariance of rdhA gene abundance. FEMS Microbiol Ecol 87:428-440. 137. Rogosa M. 1971. Transfer of Veillonella prevot and Acidaminococcus rogosa from Neisseriaceae to Veillonellaceae fam. nov., and the inclusion of Megasphaera rogosa in Veillonellaceae. International Journal of Systematic Bacteriology 21:231-233. 138. Xu M, Chen X, Qiu M, Zeng X, Xu J, Deng D, Sun G, Li X, Guo J. 2012. Bar-Coded Pyrosequencing Reveals the Responses of PBDE-Degrading Microbial Communities to Electron Donor Amendments. PLoS ONE 7:e30439. 139. Islam MA, Waller AS, Hug LA, Provart NJ, Edwards EA, Mahadevan R. 2014. New Insights into Dehalococcoides mccartyi Metabolism from a Reconstructed Metabolic Network-Based Systems-Level Analysis of D. mccartyi Transcriptomes. PLoS ONE 9:e94808. 140. Langille MGI, Hsiao WWL, Brinkman FSL. 2010. Detecting genomic islands using bioinformatics approaches. Nat Rev Microbiol 8:373-382. 141. Makarova KS, Haft DH, Barrangou R, Brouns SJJ, Charpentier E, Horvath P, Moineau S, Mojica FJM, Wolf YI, Yakunin AF, van der Oost J, Koonin EV. 2011. Evolution and classification of the CRISPR–Cas systems. Nat Rev Micro 9:467-477. 142. van der Oost J, Jore MM, Westra ER, Lundgren M, Brouns SJJ. 2009. CRISPR-based adaptive and heritable immunity in prokaryotes. Trends Biochem Sci 34:401-407. 143. Fineran PC, Gerritzen MJH, Suárez-Diez M, Künne T, Boekhorst J, van Hijum SAFT, Staals RHJ, Brouns SJJ. 2014. Degenerate target sites mediate rapid primed CRISPR adaptation. Proceedings of the National Academy of Sciences 111:E1629-E1638. 144. 
Hatoum-Aslan A, Maniv I, Marraffini LA. 2011. Mature clustered, regularly interspaced, short palindromic repeats RNA (crRNA) length is measured by a ruler mechanism anchored at the precursor processing site. Proceedings of the National Academy of Sciences of the United States of America 108:21218-21222. 145. Fineran PC, Charpentier E. 2012. Memory of viral infections by CRISPR-Cas adaptive immune systems: Acquisition of new information. Virology 434:202-209. 146. Aziz R, Bartels D, Best A, DeJongh M, Disz T, Edwards R, Formsma K, Gerdes S, Glass E, Kubal M, Meyer F, Olsen G, Olson R, Osterman A, Overbeek R, McNeil L, Paarmann D, Paczian T, Parrello B, Pusch G, Reich C, Stevens R, Vassieva O, Vonstein V, Wilke A, Zagnitko O. 2008. The RAST server: rapid annotations using subsystems technology. BMC Genomics 9. 147. Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. 2011. PHAST: A Fast Phage Search Tool. Nucleic Acids Res 39:W347-W352. 148. Fouts DE. 2006. Phage_Finder: Automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res 34:5839-5851. 149. Dhillon BK, Laird MR, Shay JA, Winsor GL, Lo R, Nizam F, Pereira SK, Waglechner N, McArthur AG, Langille MGI, Brinkman FSL. 2015. IslandViewer 3: more flexible, interactive genomic island discovery, visualization and analysis. Nucleic Acids Res 43:W104-W108. 150. Guy L, Kultima JR, Andersson SGE. 2010. genoPlotR: comparative gene and genome visualization in R. Bioinformatics 26:2334-2335. 151. Krajmalnik-Brown R, Holscher T, Thomson IN, Saunders FM, Ritalahti KM, Löffler FE. 2004. Genetic identification of a putative vinyl chloride reductase in Dehalococcoides sp. strain BAV1. Appl Environ Microbiol 70:6347-6351.


152. Gong B, Shin M, Sun J, Jung C-H, Bolt EL, van der Oost J, Kim J-S. 2014. Molecular insights into DNA interference by CRISPR-associated nuclease-helicase Cas3. Proceedings of the National Academy of Sciences of the United States of America 111:16359-16364. 153. Makarova KS, Wolf YI, Alkhnbashi OS, Costa F, Shah SA, Saunders SJ, Barrangou R, Brouns SJJ, Charpentier E, Haft DH, Horvath P, Moineau S, Mojica FJM, Terns RM, Terns MP, White MF, Yakunin AF, Garrett RA, van der Oost J, Backofen R, Koonin EV. 2015. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Micro 13:722- 736. 154. Huo Y, Nam KH, Ding F, Lee H, Wu L, Xiao Y, Farchione FD, Zhou S, Rajashankar R, Kurinov I, Zhang R, Ke A. 2014. Structures of CRISPR Cas3 offer mechanistic insights into Cascade-activated DNA unwinding and degradation. Nat Struct Mol Biol 21:771-777. 155. Canchaya C, Proux C, Fournous G, Bruttin A, Brüssow H. 2003. Prophage Genomics. Microbiol Mol Biol Rev 67:238-276. 156. Wozniak RA, Waldor MK. 2010. Integrative and conjugative elements: mosaic mobile genetic elements enabling dynamic lateral gene flow. Nat Rev Microbiol 8:552-563. 157. Roberts AP, Chandler M, Courvalin P, Guedon G, Mullany P, Pembroke T, Rood JI, Smith CJ, Summers AO, Tsuda M, Berg DE. 2008. Revised nomenclature for transposable genetic elements. Plasmid 60:167-173. 158. Daccord A, Ceccarelli D, Rodrigue S, Burrus V. 2013. Comparative Analysis of Mobilizable Genomic Islands. J Bacteriol 195:606-614. 159. Rankin DJ, Rocha EPC, Brown SP. 2011. What traits are carried on mobile genetic elements, and why? Heredity 106:1-10. 160. McDonnell GE, McConnell DJ. 1994. Overproduction, isolation, and DNA-binding characteristics of Xre, the repressor protein from the Bacillus subtilis defective prophage PBSX. J Bacteriol 176:5831-5834. 161. Ibarra JA, Pérez-Rueda E, Carroll RK, Shaw LN. 2013. Global analysis of transcriptional regulators in Staphylococcus aureus. BMC Genomics 14:126. 162. 
Barragan MJ, Blazquez B, Zamarro MT, Mancheno JM, Garcia JL, Diaz E, Carmona M. 2005. BzdR, a repressor that controls the anaerobic catabolism of benzoate in Azoarcus sp. CIB, is the first member of a new subfamily of transcriptional regulators. J Biol Chem 280. 163. Tocchetti A, Galimberti G, Dehò G, Ghisotti D. 1999. Characterization of the oriI and oriII Origins of Replication in Phage-Plasmid P4. J Virol 73:7308-7316. 164. Ziegelin G, Scherzinger E, Lurz R, Lanka E. 1993. Phage P4 alpha protein is multifunctional with origin recognition, helicase and primase activities. EMBO J 12:3703-3708. 165. Briani F, Deho G, Forti F, Ghisotti D. 2001. The plasmid status of satellite bacteriophage P4. Plasmid 45:1-17. 166. Rojowska A, Lammens K, Seifert FU, Direnberger C, Feldmann H, Hopfner K-P. 2014. Structure of the Rad50 DNA double-strand break repair protein in complex with DNA. The EMBO Journal 33:2847-2859. 167. Ayora S, Carrasco B, Cardenas PP, Cesar CE, Canas C, Yadav T, Marchisone C, Alonso JC. 2011. Double-strand break repair in bacteria: a view from Bacillus subtilis. FEMS Microbiol Rev 35:1055-1081. 168. Peeters N, Guidot A, Vailleau F, Valls M. 2013. Ralstonia solanacearum, a widespread bacterial plant pathogen in the post-genomic era. Mol Plant Pathol 14:651-662. 169. Maphosa F, Smidt H, de Vos WM, Röling WFM. 2010. Microbial Community- And Metabolite Dynamics of an Anoxic Dechlorinating Bioreactor. Environ Sci Technol 44:4884- 4890. 170. Biswas A, Staals RHJ, Morales SE, Fineran PC, Brown CM. 2016. CRISPRDetect: A flexible algorithm to define CRISPR arrays. BMC Genomics 17:356.


171. Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. 2007. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315:1709-1712. 172. Nuñez JK, Harrington LB, Kranzusch PJ, Engelman AN, Doudna JA. 2015. Foreign DNA capture during CRISPR–Cas adaptive immunity. Nature 527:535-538. 173. Ghinet MG, Bordeleau E, Beaudin J, Brzezinski R, Roy S, Burrus V. 2011. Uncovering the Prevalence and Diversity of Integrating Conjugative Elements in Actinobacteria. PLOS ONE 6:e27846. 174. Poele EMt, Bolhuis H, Dijkhuizen L. 2008. Actinomycete integrative and conjugative elements. Antonie Leeuwenhoek 94:127-143. 175. Possoz C, Ribard C, Gagnat J, Pernodet JL, Guerineau M. 2001. The integrative element pSAM2 from Streptomyces: kinetics and mode of conjugal transfer. Mol Microbiol 42:159-166. 176. Vogelmann J, Ammelburg M, Finger C, Guezguez J, Linke D, Flötenmeyer M, Stierhof Y- D, Wohlleben W, Muth G. 2011. Conjugal plasmid transfer in Streptomyces resembles bacterial chromosome segregation by FtsK/SpoIIIE. The EMBO Journal 30:2246-2254. 177. Sezonov G, Duchene AM, Friedmann A, Guerineau M, Pernodet JL. 1998. Replicase, excisionase, and integrase genes of the Streptomyces element pSAM2 constitute an operon positively regulated by the pra gene. J Bacteriol 180:3056-3061. 178. Sezonov G, Hagege J, Pernodet JL, Friedmann A, Guerineau M. 1995. Characterization of pra, a gene for replication control in pSAM2, the integrating element of Streptomyces ambofaciens. Mol Microbiol 17:533-544. 179. Jiang W, Maniv I, Arain F, Wang Y, Levin BR, Marraffini LA. 2013. Dealing with the evolutionary downside of CRISPR immunity: bacteria and beneficial plasmids. PLoS Genet 9:e1003844. 180. Westra ER, Staals RHJ, Gort G, Høgh S, Neumann S, de la Cruz F, Fineran PC, Brouns SJJ. 2013. CRISPR-Cas systems preferentially target the leading regions of MOBF conjugative plasmids. RNA Biology 10:749-761. 181. Weinberger AD, Gilmore MS. 2012. 
CRISPR-Cas: To take up DNA or not, that is the question. Cell Host & Microbe 12:125-126. 182. Aulenta F, Majone M, Tandoi V. 2006. Enhanced anaerobic bioremediation of chlorinated solvents: environmental factors influencing microbial activity and their relevance under field conditions. J Chem Technol Biotechnol 81:1463-1474. 183. Löffler FE, Sanford RA, Tiedje JM. 1996. Initial characterization of a reductive dehalogenase from Desulfitobacterium chlororespirans Co23. Appl Environ Microbiol 62:3809-3813. 184. Smidt H, De Vos WM. 2004. Anaerobic microbial dehalogenation. Annu Rev Microbiol 58:43-73. 185. Bowman K, Nobre MF, Da Costa MS, Rainey FA, Moe WM. 2013. Dehalogenimonas alkenigignens sp. nov., a chlorinated-alkane-dehalogenating bacterium isolated from groundwater. Int J Syst Evol Microbiol 63:1492-1498. 186. Chen J, Bowman KS, Rainey FA, Moe WM. 2014. Reassessment of PCR primers targeting 16S rRNA genes of the organohalide-respiring genus Dehalogenimonas. Biodegradation 25:747-756. 187. Kittelmann S, Friedrich MW. 2008. Novel uncultured Chloroflexi dechlorinate perchloroethene to trans-dichloroethene in tidal flat sediments. Environ Microbiol 10:1557-1570. 188. Griffin BM, Tiedje JM, Löffler FE. 2004. Anaerobic microbial reductive dechlorination of tetrachloroethene to predominantly trans-1,2-dichloroethene. Environ Sci Technol 38:4300-4303. 189. Cheng D, Chow WL, He J. 2010. A Dehalococcoides-containing co-culture that dechlorinates tetrachloroethene to trans-1,2-dichloroethene. ISME J 4:88-97. 190. Chow WL, Cheng D, Wang S, He J. 2010. Identification and transcription analysis of trans-DCE producing reductive dehalogenases in Dehalococcoides species. The ISME Journal 4:1020-2013.


191. Miller GS, Milliken CE, Sowers KR, May HD. 2005. Reductive dechlorination of tetrachloroethene to trans-dichloroethene and cis-dichloroethene by PCB-dechlorinating bacterium DF-1. Environ Sci Technol 39:2631-2635. 192. Marco-Urrea E, Nijenhuis I, Adrian L. 2011. Transformation and carbon isotope fractionation of tetra- and trichloroethene to trans-dichloroethene by Dehalococcoides sp. strain CBDB1. Environ Sci Technol 45:1555-1562. 193. Wang S, Zhang W, Yang K-L, He J. 2014. Isolation and characterization of a novel Dehalobacter species strain TCP1 that reductively dechlorinates 2,4,6-trichlorophenol. Biodegradation 25:313-323. 194. Shevchenko A, Tomas H, Havlis J, Olsen JV, Mann M. 2006. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc 1:2856-2860. 195. Kessner D, Chambers M, Burke R, Agus D, Mallick P. 2008. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24:2534-2536. 196. Keller A, Nesvizhskii AI, Kolker E, Aebersold R. 2002. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74:5383-5392. 197. Nesvizhskii AI, Keller A, Kolker E, Aebersold R. 2003. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 75:4646-4658. 198. Fricker AD, LaRoe SL, Shea ME, Bedard DL. 2014. Dehalococcoides mccartyi strain JNA dechlorinates multiple chlorinated phenols including pentachlorophenol and harbors at least 19 reductive dehalogenase homologous genes. Environ Sci Technol 48:14300-14308. 199. Matthews RG. 2009. Cobalamin- and corrinoid-dependent enzymes. Met Ions Life Science 6:53-114. 200. Bollag JA. 1995. Soil contamination and feasibility of biological remediation. Soil Science Society of America, Madison, Wisconsin. 201. Mayer-Blackwell K, Fincker M, Molenda O, Callahan B, Sewell H, Holmes S, Edwards EA, Spormann AM. 2016. 
1,2-Dichloroethane Exposure Alters the Population Structure, Metabolism, and Kinetics of a Trichloroethene-Dechlorinating Dehalococcoides mccartyi Consortium. Environ Sci Technol 50:12187-12196. 202. Yan J, Bi M, Bourdon AK, Farmer AT, Wang P-H, Molenda O, Quaile A, Jiang N, Yang Y, Yin Y, Simsir B, Campagna SR, Edwards EA, Loeffler FE. 2018. Purinyl-cobamide is a native prosthetic group of reductive dehalogenases. Nat Chem Biol 14:8-14. 203. Grostern A, Edwards EA. 2006. Growth of Dehalobacter and Dehalococcoides spp. during degradation of chlorinated ethanes. Appl Environ Microbiol 72:428-436.


Appendix A: Supplemental Information for Chapter 2

Table A1. qPCR survey of the Dehalococcoides mccartyi 16S rRNA gene (Dhc), general bacterial 16S rRNA gene (GenBac) and the reductive dehalogenase genes vcrA, bvcA and tceA in four different KB-1 enrichment cultures. ND, not detected; NT, not tested. Raw data are starting quantities (SQ, copies/µL); concentrations in culture are in copies/mL.

| Culture | Dilution | Amount of culture filtered (mL) | Dilution factor for sample | Elution volume after DNA extraction (µL) | vcrA SQ | bvcA SQ | tceA SQ | Dhc SQ | GenBac SQ | vcrA (copies/mL) | bvcA (copies/mL) | tceA (copies/mL) | Dhc (copies/mL) | GenBac (copies/mL) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KB-1/1,2-DCA-MeOH sample 1 | 1:10 | 40 | 10 | 50 | 6761 | 150007 | 29615 | 45702 | 626868 | 8.45E+04 | 1.88E+06 | 3.70E+05 | 5.71E+05 | 7.84E+06 |
| KB-1/1,2-DCA-MeOH sample 1 | 1:10 | 40 | 10 | 50 | 6621 | 133025 | 29201 | 74777 | 545799 | 8.28E+04 | 1.66E+06 | 3.65E+05 | 9.35E+05 | 6.82E+06 |
| KB-1/1,2-DCA-MeOH sample 1 | 1:50 | 40 | 50 | 50 | 1524 | 21897 | 3977 | 11583 | 127699 | 9.53E+04 | 1.37E+06 | 2.49E+05 | 7.24E+05 | 7.98E+06 |
| KB-1/1,2-DCA-MeOH sample 1 | 1:50 | 40 | 50 | 50 | 1711 | 21125 | NT | 12343 | 127051 | 1.07E+05 | 1.32E+06 | NT | 7.71E+05 | 7.94E+06 |
| KB-1/1,2-DCA-MeOH sample 2 | 1:10 | 40 | 10 | 50 | 170565 | 2073701 | 887853 | 1024617 | 3996939 | 2.13E+06 | 2.59E+07 | 1.11E+07 | 1.28E+07 | 5.00E+07 |
| KB-1/1,2-DCA-MeOH sample 2 | 1:10 | 40 | 10 | 50 | 170764 | 2255491 | 828537 | 1022573 | 3916842 | 2.13E+06 | 2.82E+07 | 1.04E+07 | 1.28E+07 | 4.90E+07 |
| KB-1/1,2-DCA-MeOH sample 2 | 1:50 | 40 | 50 | 50 | 45550 | 574361 | 120877 | 265142 | 981316 | 2.85E+06 | 3.59E+07 | 7.55E+06 | 1.66E+07 | 6.13E+07 |
| KB-1/1,2-DCA-MeOH sample 2 | 1:50 | 40 | 50 | 50 | 47177 | 585780 | 125475 | 278127 | 1051573 | 2.95E+06 | 3.66E+07 | 7.84E+06 | 1.74E+07 | 6.57E+07 |
| KB-1/cDCE-MeOH | 1:10 | 40 | 10 | 50 | 85917 | 306010 | ND | 235024 | 630396 | 1.07E+06 | 3.83E+06 | ND | 2.94E+06 | 7.88E+06 |
| KB-1/cDCE-MeOH | 1:10 | 40 | 10 | 50 | 85548 | 327799 | ND | 286083 | 655866 | 1.07E+06 | 4.10E+06 | ND | 3.58E+06 | 8.20E+06 |
| KB-1/cDCE-MeOH | 1:50 | 40 | 50 | 50 | 21990 | 81368 | ND | 44930 | 149034 | 1.37E+06 | 5.09E+06 | ND | 2.81E+06 | 9.31E+06 |
| KB-1/cDCE-MeOH | 1:50 | 40 | 50 | 50 | 23477 | 82195 | ND | 46705 | 171444 | 1.47E+06 | 5.14E+06 | ND | 2.92E+06 | 1.07E+07 |
| KB-1/VC-H2 | 1:10 | 40 | 10 | 50 | 7015011 | ND | ND | 3423443 | 4154129 | 8.77E+07 | ND | ND | 4.28E+07 | 5.19E+07 |
| KB-1/VC-H2 | 1:10 | 40 | 10 | 50 | 6988920 | ND | ND | 3380919 | 4210962 | 8.74E+07 | ND | ND | 4.23E+07 | 5.26E+07 |
| KB-1/VC-H2 | 1:50 | 40 | 50 | 50 | 1886096 | ND | ND | 923854 | 1072650 | 1.18E+08 | ND | ND | 5.77E+07 | 6.70E+07 |
| KB-1/VC-H2 | 1:50 | 40 | 50 | 50 | 1975811 | ND | NT | 995994 | 1075217 | 1.23E+08 | ND | NT | 6.22E+07 | 6.72E+07 |
| KB-1/TCE-MeOH | 1:50 | 400 | 50 | 50 | 7006510 | ND | 235535 | 9655782 | 15516397 | 4.38E+07 | ND | 1.47E+06 | 6.03E+07 | 9.70E+07 |
| KB-1/TCE-MeOH | 1:50 | 400 | 50 | 50 | 6723999 | ND | 230317 | 9821546 | 13875805 | 4.20E+07 | ND | 1.44E+06 | 6.14E+07 | 8.67E+07 |
| KB-1/TCE-MeOH | 1:100 | 400 | 100 | 50 | 6359129 | ND | 236495 | NT | NT | 7.95E+07 | ND | 2.96E+06 | NT | NT |
| KB-1/TCE-MeOH | 1:100 | 400 | 100 | 50 | 3704187 | ND | NT | NT | NT | 4.63E+07 | ND | NT | NT | NT |
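The concentrations in Table A1 appear to follow from the raw starting quantities by scaling for the sample dilution, the elution volume and the amount of culture filtered. The conversion is not spelled out in the table, so the formula below is inferred from the tabulated numbers, and the function and parameter names are illustrative only. A minimal sketch in Python:

```python
def copies_per_ml(sq_copies_per_ul, dilution_factor, elution_volume_ul, culture_filtered):
    """Convert a qPCR starting quantity (copies/uL of diluted template)
    into gene copies per unit volume of culture.

    Assumed relation (inferred from Table A1, not stated in the source):
        concentration = SQ * dilution factor * elution volume / culture filtered
    """
    copies_in_eluate = sq_copies_per_ul * dilution_factor * elution_volume_ul
    return copies_in_eluate / culture_filtered


# First vcrA replicate of KB-1/1,2-DCA-MeOH sample 1 at 1:10 dilution
print(f"{copies_per_ml(6761, 10, 50, 40):.2E}")       # 8.45E+04

# First vcrA replicate of KB-1/TCE-MeOH at 1:50 dilution
print(f"{copies_per_ml(7006510, 50, 50, 400):.2E}")   # 4.38E+07
```

Both values match the corresponding vcrA entries in the table, which supports the inferred relation for these rows.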

Figure A1. Maximum likelihood phylogenetic tree (of 100 bootstraps) of nucleotide alignment of 16S rRNA in Dehalococcoides mccartyi (Dhc) strains. D. mccartyi genomes available in NCBI and Dehalogenimonas sp. WBC-2 used as out-group. Scale indicates number of nucleotide substitutions per site. Dhc strains identified by clade.


Figure A2. Illustration of synteny found in KB-1 D. mccartyi genomes. Whole genome alignments and putative open reading frames were used to calculate collinear blocks of coding sequences, which are displayed using lines. Each curved line links a pair of collinear genes; different line colours represent different collinear blocks. The nearest Dehalococcoidia relative, Dehalogenimonas sp. WBC-2, and the most similar Chloroflexi relative, Sphaerobacter thermophilus, are included as out-groups.


Figure A3. Phylogenetic nucleotide tree of reductive dehalogenase genes from D. mccartyi closed genomes.

Most likely tree of 100 bootstraps. Scale indicates number of substitutions per site. Orthologous groups (OGs) of dehalogenases with upwards of 90% amino acid identity are highlighted and identified by number. OGs containing a functionally characterized representative are annotated by dehalogenase gene name. RdhA are coloured by strain of origin. Strain name is listed before locus tag if not included in locus tag.


Table A2. Sections 1-4. Parameters used to estimate divergence age. Parts 1-4 were used to calculate final divergence times in Table A2-4. Table A2-1: Mutation rates used in divergence calculations: (1) estimated universal bacterial error rate, (2) empirically determined Escherichia coli mutation rate and (3) empirically determined mutation rate between D. mccartyi DONNA2 and 195 over 16 years. Table A2-2: Calculation for average Dehalococcoidia genome size. Table A2-3: Calculation for D. mccartyi average doubling time. Table A2-4: Example calculated divergence times in years for three different mutation rates (found in Table A2-1). Dhc: Dehalococcoides; Dhg: Dehalogenimonas.

Table A2-1. Mutation rates

| Parameter | bacteria | E. coli | D. mccartyi DONNA2/strain 195 | Unit |
|---|---|---|---|---|
| Reference | Ochman et al 1999 | Drake et al 1998 | McMurdie et al 2011 | |
| Overall error rate (nt/genome/generation) | 1.00E-09 | 5.40E-09 | 2.08405E-06 | bp⁻¹ |
| # bp per mutation | 1.00E+09 | 185185185.2 | 479835.6062 | bp |
| Average Dehalococcoidia genome size | 1439363 | 1439363 | 1439363 | bp |
| Replications per single fixed neutral mutation per genome | 694.8 | 128.7 | 0.3 | replications |
| D. mccartyi doubling time | 2 | 2 | 2 | days |
| Time estimated for single mutation | 1.37E+03 | 2.54E+02 | 6.58E-01 | days |
| Time estimated for single mutation | 3.76 | 0.70 | 0.00 | years |
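The arithmetic behind Table A2-1 can be sketched as follows. This is a reconstruction from the tabulated numbers rather than the author's own script: it assumes that the number of replications per fixed mutation is the number of bp per mutation divided by the average genome size, that the doubling time is the unrounded mean of the four values in Table A2-3 (1.975 days, shown as 2 in the table), and that a year has 365 days. All names are illustrative.

```python
AVG_GENOME_SIZE_BP = 1439363                 # average Dehalococcoidia genome size (Table A2-2)
DOUBLING_TIMES_DAYS = [2.4, 0.8, 2.5, 2.2]   # FL2, DE195, VS, BAV1 (Table A2-3)
DAYS_PER_YEAR = 365                          # assumption, not stated in the source

# Number of bp replicated per mutation for each rate in Table A2-1
BP_PER_MUTATION = {
    "bacteria": 1.00e9,
    "E. coli": 185185185.2,
    "D. mccartyi DONNA2/strain 195": 479835.6062,
}


def time_per_mutation(bp_per_mutation, genome_size_bp, doubling_time_days):
    """Return (replications, days, years) expected per single fixed mutation."""
    replications = bp_per_mutation / genome_size_bp
    days = replications * doubling_time_days
    return replications, days, days / DAYS_PER_YEAR


mean_doubling_time = sum(DOUBLING_TIMES_DAYS) / len(DOUBLING_TIMES_DAYS)

for name, bp in BP_PER_MUTATION.items():
    reps, days, years = time_per_mutation(bp, AVG_GENOME_SIZE_BP, mean_doubling_time)
    print(f"{name}: {reps:.1f} replications, {days:.2E} days, {years:.2f} years")
```

Using the rounded doubling time of 2 days instead would give about 1.39E+03 days for the universal bacterial rate, so the tabulated 1.37E+03 days is consistent with the unrounded mean.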


Table A2-2. Genome sizes

| Species | Genome size (bp) |
|---|---|
| D. mccartyi GT | 1360154 |
| D. mccartyi CBDB1 | 1395502 |
| D. mccartyi 195 | 1469720 |
| D. mccartyi BAV1 | 1341892 |
| D. mccartyi CG5 | 1362151 |
| D. mccartyi WBC-2 | 1374583 |
| D. mccartyi VS | 1413462 |
| D. mccartyi CG4 | 1382308 |
| D. mccartyi GY50 | 1407418 |
| D. mccartyi DCMB5 | 1431902 |
| D. mccartyi IBARAKI | 1451902 |
| D. mccartyi BTF08 | 1451056 |
| D. mccartyi 11a5 | 1461973 |
| D. mccartyi CG1 | 1486678 |
| D. mccartyi CG3 | 1521286 |
| D. mccartyi KBDCA1 | 1428463 |
| D. mccartyi KBDCA2 | 1394319 |
| D. mccartyi KBDCA3 | 1337486 |
| D. mccartyi KBTCE1 | 1388914 |
| D. mccartyi KBTCE2 | 1329198 |
| D. mccartyi KBTCE3 | 1271604 |
| D. mccartyi KBVC1 | 1359904 |
| D. mccartyi KBVC2 | 1337731 |
| Dehalogenimonas sp. WBC-2 | 1725730 |
| Dehalogenimonas lykanthroporepellens | 1686510 |
| Dehalogenimonas alkenigignens | 1851580 |
| Average Dhc genome | 1398244 |
| Average Dhg genome | 1754607 |
| Weighted Average Genome Size | 1439363 |
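The averages in Table A2-2 can be checked with a short calculation. The sketch below is illustrative only: the dictionary and variable names are not from the thesis, and it assumes that the "weighted average" weights each genus mean by its number of genomes, which is equivalent to the plain mean over all 26 genomes.

```python
# Genome sizes (bp) copied from Table A2-2; names below are illustrative.
DHC_GENOME_SIZES = {
    "GT": 1360154, "CBDB1": 1395502, "195": 1469720, "BAV1": 1341892,
    "CG5": 1362151, "WBC-2": 1374583, "VS": 1413462, "CG4": 1382308,
    "GY50": 1407418, "DCMB5": 1431902, "IBARAKI": 1451902, "BTF08": 1451056,
    "11a5": 1461973, "CG1": 1486678, "CG3": 1521286, "KBDCA1": 1428463,
    "KBDCA2": 1394319, "KBDCA3": 1337486, "KBTCE1": 1388914, "KBTCE2": 1329198,
    "KBTCE3": 1271604, "KBVC1": 1359904, "KBVC2": 1337731,
}
DHG_GENOME_SIZES = {
    "Dehalogenimonas sp. WBC-2": 1725730,
    "Dehalogenimonas lykanthroporepellens": 1686510,
    "Dehalogenimonas alkenigignens": 1851580,
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


mean_dhc = mean(DHC_GENOME_SIZES.values())
mean_dhg = mean(DHG_GENOME_SIZES.values())

# Assumption: weights are the number of genomes in each genus.
n_dhc, n_dhg = len(DHC_GENOME_SIZES), len(DHG_GENOME_SIZES)
weighted_mean = (mean_dhc * n_dhc + mean_dhg * n_dhg) / (n_dhc + n_dhg)

print(round(mean_dhc), round(mean_dhg), round(weighted_mean))
```

Under this assumption the rounded results agree with the three averages listed at the bottom of the table.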

Table A2-3. D. mccartyi doubling times

| D. mccartyi strain | Doubling time (days) | Reference |
|---|---|---|
| FL2 | 2.4 | He et al. EM, 2005 |
| DE195 | 0.8 | Maymo-Gatell et al. Science, 1997 |
| VS | 2.5 | Cupples et al. AEM 2003 |
| BAV1 | 2.2 | He et al. Science, 2003 |
| AVERAGE growth rate | 2 | |
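The chain of conversions in Table A2-1 (mutation rate, bp per mutation, replications per fixed mutation, days, years) can be sketched as follows. This is a minimal illustration, not the author's script: the function and variable names are hypothetical, a 365-day year is assumed, and the unrounded mean of the four doubling times in Table A2-3 (1.975 days) is used because it appears to reproduce the tabulated day counts more closely than the rounded value of 2 days.

```python
# Inputs copied from Tables A2-1 to A2-3; all names are illustrative.
MUTATION_RATES = {  # mutations per bp per generation
    "bacteria": 1.00e-9,           # Ochman et al 1999
    "E. coli": 5.40e-9,            # Drake et al 1998
    "DONNA2/strain 195": 2.08405e-6,  # McMurdie et al 2011
}
GENOME_SIZE_BP = 1439363                    # average Dehalococcoidia genome size
DOUBLING_TIMES_DAYS = [2.4, 0.8, 2.5, 2.2]  # FL2, DE195, VS, BAV1


def time_per_mutation(rate_per_bp, genome_size_bp, doubling_time_days,
                      days_per_year=365.0):
    """Return (replications, days, years) per single fixed mutation per genome."""
    bp_per_mutation = 1.0 / rate_per_bp
    replications = bp_per_mutation / genome_size_bp
    days = replications * doubling_time_days
    return replications, days, days / days_per_year  # 365-day year is an assumption


mean_doubling_time = sum(DOUBLING_TIMES_DAYS) / len(DOUBLING_TIMES_DAYS)
results = {
    name: time_per_mutation(rate, GENOME_SIZE_BP, mean_doubling_time)
    for name, rate in MUTATION_RATES.items()
}

for name, (replications, days, years) in results.items():
    print(f"{name}: {replications:.1f} replications, {days:.3g} days, {years:.2f} years")
```

The years-per-mutation values produced here are the conversion factors applied to the mutation counts in Table A2-4.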


Table A2-4. Time since divergence (years) from three different mutation rates

| Divergence | method | branch lengths | # mutations | bacteria | E. coli | D. mccartyi DONNA2/strain 195 |
|---|---|---|---|---|---|---|
| MRCA Dhg | concat tree (ML) | 0.2589 | 372651 | 1,395,876 | 258,496 | 124,229 |
| MRCA Dhc | concat tree (ML) | 0.2832 | 407627 | 1,526,891 | 282,758 | 135,889 |
| Dehalococcoidia | concat tree (ML) | 0.5936 | 854406 | 3,200,433 | 592,673 | 284,830 |
| D. mccartyi clades (Pinellas, Cornell, Victoria) | concat tree (ML) | 0.0582 | 81378 | 304,825 | 56,449 | 27,926 |
| KBTCE2 MRCA (w/KBTCE3) | concat tree (ML) | 1.80E-05 | 25 | 94 | 17 | 9 |
| KBTCE3 MRCA (w/KBTCE2) | concat tree (ML) | 1.14E-04 | 159 | 597 | 111 | 55 |
| KBTCE1 MRCA (w/KBVC2) | concat tree (ML) | 8.00E-06 | 11 | 42 | 8 | 4 |
| KBVC2 MRCA (w/KBTCE1) | concat tree (ML) | 3.30E-05 | 46 | 173 | 32 | 16 |
| GT MRCA (w/KBVC1) | concat tree (ML) | 5.75E-04 | 804 | 3,012 | 558 | 276 |
| KBVC1 MRCA (w/GT) | concat tree (ML) | 9.69E-04 | 1355 | 5,075 | 940 | 465 |
| KBDCA1 MRCA (w/KBDCA2) | concat tree (ML) | 8.00E-06 | 11 | 42 | 8 | 4 |
| KBDCA2 MRCA (w/KBDCA1) | concat tree (ML) | 4.10E-05 | 57 | 215 | 40 | 20 |
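The "# mutations" column of Table A2-4 is consistent with multiplying each branch length (substitutions per site) by the average genome size from Table A2-2, and the divergence times then follow from the years-per-mutation factors in Table A2-1. The sketch below illustrates this for the first two rate columns only. The function name is hypothetical, the factors are the rounded values printed in Table A2-1, and small differences from the tabulated years (well under 1%) are expected because the exact rounding and year-length conventions used are not stated.

```python
# Values copied from Tables A2-1, A2-2 and A2-4; names are illustrative.
GENOME_SIZE_BP = 1439363
YEARS_PER_MUTATION = {"bacteria": 3.76, "E. coli": 0.70}  # rounded, from Table A2-1

BRANCH_LENGTHS = {  # substitutions per site, concatenated ML tree
    "MRCA Dhg": 0.2589,
    "MRCA Dhc": 0.2832,
    "Dehalococcoidia": 0.5936,
}


def divergence_estimate(branch_length, genome_size_bp, years_per_mutation):
    """Return (number of mutations, divergence time in years) for one branch."""
    n_mutations = branch_length * genome_size_bp
    return n_mutations, n_mutations * years_per_mutation


for node, branch_length in BRANCH_LENGTHS.items():
    for rate_name, factor in YEARS_PER_MUTATION.items():
        n_mut, years = divergence_estimate(branch_length, GENOME_SIZE_BP, factor)
        print(f"{node} ({rate_name}): {n_mut:,.0f} mutations, ~{years:,.0f} years")
```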


Table A3. Summary of correspondence analysis conducted on homologous protein clusters from the Dehalococcoidia pangenome. The first 9 of 28 components contributed significantly to the ordination, accounting for 74% of the variation.

Importance of Components:

| Axes | CA1 | CA2 | CA3 | CA4 | CA5 | CA6 | CA7 | CA8 | CA9 |
|---|---|---|---|---|---|---|---|---|---|
| Eigenvalue | 0.2996 | 0.11236 | 0.08827 | 0.07502 | 0.07047 | 0.07007 | 0.05571 | 0.04732 | 0.03897 |
| Proportion Explained | 26% | 10% | 8% | 7% | 6% | 6% | 5% | 4% | 3% |
| Cumulative Proportion Explained | 26% | 36% | 43% | 50% | 56% | 62% | 67% | 71% | 74% |

Total Inertia* for 28 components: 1.15
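In correspondence analysis, the proportion of variation explained by an axis is its eigenvalue divided by the total inertia. The following sketch recomputes the two proportion rows from the eigenvalues. The eigenvalues are read from the table as reconstructed from digits split across lines in the source, and the total inertia is the rounded value 1.15, so the recomputed percentages can differ from the printed ones by about one percentage point.

```python
from itertools import accumulate

# Eigenvalues for CA1..CA9 and total inertia, as read from Table A3.
EIGENVALUES = [0.2996, 0.11236, 0.08827, 0.07502, 0.07047,
               0.07007, 0.05571, 0.04732, 0.03897]
TOTAL_INERTIA = 1.15  # rounded value given for all 28 components

# Proportion explained per axis and running cumulative proportion.
proportion = [ev / TOTAL_INERTIA for ev in EIGENVALUES]
cumulative = list(accumulate(proportion))

for axis, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"CA{axis}: proportion {p:.1%}, cumulative {c:.1%}")
```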


Appendix B: Supplemental Information for Chapter 3

Figure B1. Culture history for WBC-2 enrichment transfer cultures.

Samples were originally obtained from United States Geological Survey (USGS) researchers. Dates at the top indicate the year when each transfer culture was first prepared. Boxes indicate the enrichment substrate. All cultures are also amended with 8.5x ethanol and sodium lactate as donor. Cultures that have been maintained until the present appear on the right and are named according to the labeling used in this paper. Right-hand labels indicate electron acceptor/donor_year created. A grey background indicates that a sample from the culture was used in the qPCR survey; a red border indicates use for initial cell-free extract (CFE) activity assays. A blue star indicates investigation using blue-native polyacrylamide gel electrophoresis (BN-PAGE), liquid chromatography followed by tandem mass spectrometry (LC-MS/MS) and gel slice activity assays. A light blue star indicates BN-PAGE separation followed by LC-MS/MS only (described in Chapter 6). Illumina shotgun sequencing was carried out on DNA from cultures marked with a gold sun (described in Chapter 2).


[Figure B2 panels. A) WBC-2 Enrichment Cultures June 2012. B) WBC-2 Enrichment Cultures Nov. 2012. Vertical axis: % composition based on qPCR (copies/mL culture), 0% to 100%. Legend: Dhc, Dhg, Dhb.]

Figure B2. Composition of Dehalogenimonas (Dhg), Dehalococcoides (Dhc) and Dehalobacter (Dhb) found in WBC-2 enrichment cultures maintained on different chlorinated electron acceptors. A) Determined using qPCR in June 2012. B) Determined using qPCR in November 2012. All cultures were also fed ethanol and lactate as donor.


Table B1. Summary of correspondence analysis conducted on homologous protein clusters from the Dehalococcoidia pangenome. The first 9 of 28 components contributed significantly to the ordination, accounting for 74% of the variation.

Importance of Components:

| Axes | CA1 | CA2 | CA3 | CA4 | CA5 | CA6 | CA7 | CA8 | CA9 |
|---|---|---|---|---|---|---|---|---|---|
| Eigenvalue | 0.2996 | 0.11236 | 0.08827 | 0.07502 | 0.07047 | 0.07007 | 0.05571 | 0.04732 | 0.03897 |
| Proportion Explained | 26% | 10% | 8% | 7% | 6% | 6% | 5% | 4% | 3% |
| Cumulative Proportion Explained | 26% | 36% | 43% | 50% | 56% | 62% | 67% | 71% | 74% |

Total Inertia* for 28 components: 1.15

Figure B3. Stacked bar graph displaying the distribution of Dehalogenimonas and Dehalococcoides protein sequences from the pangenome analysis. A total of 2876 homologous protein clusters were generated during pangenome analysis (χ² p-value = 0.0004998). Protein clusters are on the x-axis, but are not listed by name due to space limitations.


Figure B4. Correspondence analysis outputs. A) Scree plot used to determine the number of dimensions. B) Contribution of the top 10 homologous protein clusters to CA ordination axis 1; cluster names are abbreviated by number. C) Contribution of genomes to axis 1. Dhg – Dehalogenimonas, Dhc – Dehalococcoides mccartyi. D) Contribution of genomes to axis 2. In all cases, red dotted lines indicate the expected value if the data were random; data points above this line are considered significant to the ordination.


Appendix C: Supplemental Information for Chapter 4

Table C1. Primers used in this study.

| Primer Name | Direction | Sequence | Target | Use | Annealing temp. (°C) | Ref. |
|---|---|---|---|---|---|---|
| CRISPR I-E 1f | f | 5'-TTGCCCGGTTCGTCATTA-3' | Annealing outside CRISPR array | PCR | 65 | This thesis |
| CRISPR I-E 4561r | r | 5'-GCCAATCCAACCGACCTT-3' | Annealing outside CRISPR array | PCR | 65 | This thesis |
| CRISPR I-E 1f | f | 5'-TACCTAGGAGTCAAGCATCG-3' | Half of CRISPR array | PCR | 59 | This thesis |
| CRISPR I-E 1725r | r | 5'-AAAACGCAAAATTCTGACAC-3' | Half of CRISPR array | PCR | 59 | This thesis |
| CRISPR I-C 1f | f | 5'-CGCCAGAGGGGTCAGGTA-3' | Annealing outside CRISPR array | PCR | 65 | This thesis |
| CRISPR I-C 3390r | r | 5'-CCTCCTCGCCCATGACAC-3' | Annealing outside CRISPR array | PCR | 65 | This thesis |
| Dhc 1f | f | 5'-GATGAACGCTAGCGGCG-3' | D. mccartyi 16S rRNA gene | qPCR | 60 | 1 |
| Dhc 265r | r | 5'-CCTCTCAGACCAGCTACCGATCGAA-3' | D. mccartyi 16S rRNA gene | qPCR | 60 | 1 |
| vcrA 670f | f | 5'-GCCCTCCAGATGCTCCCTTTAC-3' | vcrA gene | qPCR | 60 | 2 |
| vcrA 440r | r | 5'-TGCCCTTCCTCACCACTACCAG-3' | vcrA gene | qPCR | 60 | 2 |
| vcrA-GI PCR 1 1f | f | 5'-CTAGATTCTTAATAAATTCGCGTGT-3' | genome region outside of vcrA-GI | PCR | 65 | This thesis |
| vcrA-GI PCR 1 19361r | r | 5'-AAATAACAAATGAGCTCTCAGAAAA-3' | genome region outside of vcrA-GI | PCR | 65 | This thesis |
| vcrA GI PCR 2 1f | f | 5'-GTAAAACTTACCGGGTTGAT-3' | circular vcrA genomic island | PCR | 59.5 | This thesis |
| vcrA GI PCR 2 10426r | r | 5'-TTGTTGCTAACACGACTGTT-3' | circular vcrA genomic island | PCR | 59.5 | This thesis |
| seq-1 | f | 5'-ACAGGAACGAAGACGAACGGGG-3' | 100 bp before CRISPR | sequencing | n/a | This thesis |
| seq-2 | f | 5'-AGTGCCTACCCTACGAGCAAGG-3' | 31 bp before CRISPR | sequencing | n/a | This thesis |
| seq-3 | f | 5'-ATCGCTAGGGCTTGCCGAACT-3' | spacer # 6 | sequencing | n/a | This thesis |
| seq-1 | r | 5'-ACGCTTCGTTCTGGTGGAA-3' | spacer # 16 | sequencing | n/a | This thesis |
| seq-2 | r | 5'-GGCGTTTAGACCATTTTCACGC-3' | spacer # 27 | sequencing | n/a | This thesis |
| seq-3 | r | 5'-GCGAGTCGTACTGTATTGAGG-3' | spacer # 33 | sequencing | n/a | This thesis |
| seq-4 | r | 5'-ACCAGCACTATCAAAACGCA-3' | 48 bp after the end of CRISPR | sequencing | n/a | This thesis |
| seq-T3-1348-1f | f | 5'-GAGATCTCTCCATGCCCAGC3-3' | 300 bp after vcrA-GI | sequencing | n/a | This thesis |
| seq-T1-1572r | r | 5'-GAGCAGGCTGGGAAACTCAT-3' | 233 bp after vcrA-GI | sequencing | n/a | This thesis |

1 - Hendrickson et al. (2002) Appl. Environ. Microbiol. 68, 485-495
2 - Molenda, O. et al. (2016) Appl. Environ. Microbiol. 82, 40-50


Table C2. General features of Dehalococcoides mccartyi genomes. Genomes closed from KB-1 derived enrichment cultures compared with other CRISPR-Cas containing strains GT, CBDB1 and DCMB5.

| Strain | KBVC1 | GT | CBDB1 | DCMB5 | KBDCA3 |
|---|---|---|---|---|---|
| Origin | KB-1 VC enrichment, S. ON, CAN | Cont. aquifer, Cottage Grove, WI, USA | Saale river near Jena, GER | Bitterfeld region, GER | KB-1 1,2-DCA enrichment, S. ON, CAN |
| Genome size (Mbp) | 1.39 | 1.36 | 1.4 | 1.43 | 1.34 |
| G+C content (%) | 47.3 | 47 | 47 | 47.1 | 47.6 |
| Protein coding genes | 1468 | 1416 | 1460 | 1458 | 1404 |
| Hypothetical genes (%) | 31.1 | 18.6 | 28.7 | 25.4 | 29.1 |
| tRNA | 47 | 46 | 47 | 46 | 46 |
| Number of Recombinases | 4 | 7 | 4 | 7 | 4 |
| Sub-group/Clade | Pinellas | Pinellas | Pinellas | Pinellas | Pinellas |
| Chlorinated substrate¹ | VC | TCE | 1,2,3-TCB & 1,2,4-TCB | 1,2,3-TCB | 1,2-DCA |
| CRISPR-Cas type | I-E | I-E | I-E | I-E & I-C | I-C |
| rdhA genes | 22 | 20 | 32 | 23 | 9 |
| Characterized² RDases | VcrA, PceA | VcrA, PceA | MbrA, CbrA, PceA | MbrA, CbrA | BvcA |
| Genomic position of CRISPR-Cas (kbp) | 1208 | 1176 | 1201 | 1318 (I-E), 97 (I-C) | 276 |
| No. of CRISPR spacers | 41 | 37 | 19 | 28 (I-E), 54 (I-C) | 38 |
| NCBI accession number | CP019968 | CP001924 | AJ965256 | CP004079 | CP019946 |

¹ 1,2,3-trichlorobenzene (1,2,3-TCB); 1,2,4-trichlorobenzene (1,2,4-TCB).
² Characterized indicates any RDase which has been at least partially biochemically characterized, not what is being expressed in culture by each strain.


Table C3. Potential targets of Dehalococcoides mccartyi KBVC1, KBDCA3, CBDB1, GT and DCMB5 CRISPRs. Results from BLAST searches: (1) against a database of KB-1 prophages and all D. mccartyi prophages from NCBI closed genomes and (2) against a database of all D. mccartyi genomes. A fasta file with all spacer sequences is provided in supplemental information separately from this excel file. GI: Genomic Island. A hit was listed in the table if E-value < 1e-1. rdhA-containing genomic islands (GI) are highlighted in red. New spacers acquired between 2002 and 2013 are highlighted in blue.

| Strain | CRISPR type | Spacer # | FASTA name of spacer | Spacer length (bp) | Type of mobile DNA which matched to spacer | D. mccartyi mobile DNA name | BLAST hit length | % pairwise identity | E-value | Gene annotation of BLAST hit | Best hit |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CBDB1 | I-E | 1 | CBDB1_AJ965256_spacer_1 | 33 | prophage | KBTCE1/KBVC2-12 prophage | 16 | 93.8 | 3.22E-01 | hypothetical protein | * |
| CBDB1 | I-E | 2 | CBDB1_AJ965256_spacer_2 | 33 | prophage | KBTCE1/KBVC2-12 prophage | 13 | 100 | 3.03E-01 | CBS-domain containing protein | * |
| CBDB1 | I-E | 3 | CBDB1_AJ965256_spacer_3 | 34 | IME1 | IME1-13 | 32 | 96.9 | 2.50E-08 | hypothetical protein | * |
| CBDB1 | I-E | 4 | CBDB1_AJ965256_spacer_4 | 33 | prophage | BTF08 prophage | 13 | 100 | 3.03E-01 | hypothetical protein | * |
| CBDB1 | I-E | 4 | CBDB1_AJ965256_spacer_4 | 33 | prophage | KBTCE1/KBVC2-12 prophage | 13 | 100 | 3.03E-01 | hypothetical protein | |
| CBDB1 | I-E | 5 | CBDB1_AJ965256_spacer_5 | 33 | no hits | | | | | | * |
| CBDB1 | I-E | 6 | CBDB1_AJ965256_spacer_6 | 33 | prophage | CG1 prophage | 32 | 93.8 | 7.60E-09 | endolysin | * |
| CBDB1 | I-E | 6 | CBDB1_AJ965256_spacer_6 | 33 | prophage | 195 prophage | 18 | 88.9 | 3.03E-01 | hypothetical protein | |
| CBDB1 | I-E | 6 | CBDB1_AJ965256_spacer_6 | 33 | prophage | 195 prophage | 21 | 85.7 | 3.03E-01 | hypothetical protein | |
| CBDB1 | I-E | 7 | CBDB1_AJ965256_spacer_7 | 33 | no hits | | | | | | * |
| CBDB1 | I-E | 8 | CBDB1_AJ965256_spacer_8 | 33 | no hits | | | | | | * |
| CBDB1 | I-E | 9 | CBDB1_AJ965256_spacer_9 | 34 | IME1 | IME1-9 | 33 | 97 | 7.81E-09 | hypothetical protein | * |
| CBDB1 | I-E | 10 | CBDB1_AJ965256_spacer_10 | 33 | prophage | BTF08 prophage | 25 | 96 | 3.94E-06 | hypothetical protein | |
| CBDB1 | I-E | 10 | CBDB1_AJ965256_spacer_10 | 33 | prophage | CG1 prophage | 33 | 97 | 1.79E-10 | hypothetical protein | * |
| CBDB1 | I-E | 10 | CBDB1_AJ965256_spacer_10 | 33 | prophage | pg11a5 prophage | 25 | 96 | 3.94E-06 | hypothetical protein | |
| CBDB1 | I-E | 10 | CBDB1_AJ965256_spacer_10 | 33 | prophage | KB1/cDCE-4 prophage | 33 | 87.9 | 1.13E-06 | non-coding | |
| CBDB1 | I-E | 10 | CBDB1_AJ965256_spacer_10 | 33 | prophage | KB1/TCE-0 prophage | 33 | 87.9 | 1.13E-06 | hypothetical protein | |
| CBDB1 | I-E | 10 | CBDB1_AJ965256_spacer_10 | 33 | prophage | KBDCA1/KBDCA2-6 prophage | 33 | 87.9 | 1.13E-06 | hypothetical protein | |
| CBDB1 | I-E | 10 | CBDB1_AJ965256_spacer_10 | 33 | prophage | KBTCE1/KBVC2-1 prophage | 33 | 87.9 | 1.13E-06 | hypothetical protein | |
| CBDB1 | I-E | 10 | CBDB1_AJ965256_spacer_10 | 33 | prophage | KBTCE1/KBVC2-5 prophage | 33 | 87.9 | 1.13E-06 | hypothetical protein | |
| CBDB1 | I-E | 11 | CBDB1_AJ965256_spacer_11 | 34 | prophage | CG1 prophage | 27 | 92.6 | 1.61E-04 | hypothetical protein | * |
| CBDB1 | I-E | 12 | CBDB1_AJ965256_spacer_12 | 34 | prophage | KBTCE1/KBVC2-11 prophage | 13 | 100 | 3.22E-01 | phage antirepressor | * |
| CBDB1 | I-E | 13 | CBDB1_AJ965256_spacer_13 | 33 | prophage | CG3 prophage | 32 | 96.9 | 6.24E-10 | non-coding | |
| CBDB1 | I-E | 13 | CBDB1_AJ965256_spacer_13 | 33 | prophage | KB1/cDCE-8 prophage | 33 | 100 | 4.10E-12 | hypothetical protein | * |
| CBDB1 | I-E | 14 | CBDB1_AJ965256_spacer_14 | 33 | no hits | | | | | | * |
| CBDB1 | I-E | 15 | CBDB1_AJ965256_spacer_15 | 33 | no hits | | | | | | * |
| CBDB1 | I-E | 16 | CBDB1_AJ965256_spacer_16 | 33 | prophage | KBTCE1/KBVC2-10 prophage | 13 | 100 | 3.03E-01 | phage tail protein | * |
| CBDB1 | I-E | 17 | CBDB1_AJ965256_spacer_17 | 33 | prophage | WBC-2 prophage | 13 | 100 | 3.03E-01 | non-coding | * |
| CBDB1 | I-E | 17 | CBDB1_AJ965256_spacer_17 | 33 | prophage | KBDCA1-9 prophage | 13 | 100 | 3.03E-01 | RNA polymerase | |
| CBDB1 | I-E | 17 | CBDB1_AJ965256_spacer_17 | 33 | prophage | KBTCE1/KBVC2-10 prophage | 13 | 100 | 3.03E-01 | RNA polymerase | |
| CBDB1 | I-E | 18 | CBDB1_AJ965256_spacer_18 | 33 | IME1 | IME1-10 | 33 | 93.9 | 8.22E-08 | non-coding | |
| CBDB1 | I-E | 18 | CBDB1_AJ965256_spacer_18 | 33 | IME1 | IME1-11 | 33 | 93.9 | 8.22E-08 | hypothetical protein | |
| CBDB1 | I-E | 18 | CBDB1_AJ965256_spacer_18 | 33 | IME1 | IME1-12 | 33 | 97 | 8.22E-08 | non-coding | |
| CBDB1 | I-E | 18 | CBDB1_AJ965256_spacer_18 | 33 | IME1 | IME1-13 | 33 | 100 | 6.74E-09 | non-coding | * |
| CBDB1 | I-E | 19 | CBDB1_AJ965256_spacer_19 | 33 | prophage | pg11a5 prophage | 33 | 100 | 4.2E-12 | hypothetical protein | |
| CBDB1 | I-E | 19 | CBDB1_AJ965256_spacer_19 | 33 | prophage | KB1/TCE-0 prophage | 33 | 100 | 4.2E-12 | hypothetical protein | * |
| CBDB1 | I-E | 19 | CBDB1_AJ965256_spacer_19 | 33 | prophage | KBDCA1/KBDCA2-6 prophage | 33 | 100 | 4.2E-12 | hypothetical protein | |
| CBDB1 | I-E | 19 | CBDB1_AJ965256_spacer_19 | 33 | prophage | KBTCE1/KBVC2-1 prophage | 33 | 100 | 4.2E-12 | hypothetical protein | |
| CBDB1 | I-E | 19 | CBDB1_AJ965256_spacer_19 | 33 | prophage | KBTCE1/KBVC2-5 prophage | 33 | 100 | 4.2E-12 | hypothetical protein | |
| DCMB5 | I-C | 1 | DCMB5_CP004079_I-C_spacer_1 | 36 | prophage | KBTCE1/KBVC2-12 prophage | 25 | 84 | 2.95E-02 | non-coding | * |
| DCMB5 | I-E | 1 | DCMB5_CP004079_I-E_spacer_1 | 33 | no hits | | | | | | * |
| DCMB5 | I-C | 2 | DCMB5_CP004079_I-C_spacer_2 | 35 | no hits | | | | | | * |
| DCMB5 | I-E | 2 | DCMB5_CP004079_I-E_spacer_2 | 33 | no hits | | | | | | * |
| DCMB5 | I-C | 3 | DCMB5_CP004079_I-C_spacer_3 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 3 | DCMB5_CP004079_I-E_spacer_3 | 33 | no hits | | | | | | * |
| DCMB5 | I-C | 4 | DCMB5_CP004079_I-C_spacer_4 | 34 | no hits | | | | | | * |
| DCMB5 | I-E | 4 | DCMB5_CP004079_I-E_spacer_4 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 5 | DCMB5_CP004079_I-E_spacer_5 | 33 | IME1 | IME1-13 | 32 | 100 | 7.70E-02 | hypothetical protein | * |
| DCMB5 | I-C | 5 | DCMB5_CP004079_I-C_spacer_5 | 34 | no hits | | | | | | * |
| DCMB5 | I-E | 6 | DCMB5_CP004079_I-E_spacer_6 | 33 | prophage | KB1/cDCE-4 prophage | 33 | 100 | 4.20E-12 | phage tail tape measure | |
| DCMB5 | I-E | 6 | DCMB5_CP004079_I-E_spacer_6 | 33 | prophage | KB1/cDCE-7 prophage | 33 | 100 | 4.20E-12 | phage tail tape measure | |
| DCMB5 | I-E | 6 | DCMB5_CP004079_I-E_spacer_6 | 33 | prophage | KB1/TCE-0 prophage | 33 | 100 | 4.20E-12 | phage tail tape measure | * |
| DCMB5 | I-E | 6 | DCMB5_CP004079_I-E_spacer_6 | 33 | prophage | KBDCA1/KBDCA2-6 prophage | 33 | 100 | 4.20E-12 | phage tail tape measure | |
| DCMB5 | I-E | 6 | DCMB5_CP004079_I-E_spacer_6 | 33 | prophage | KBTCE1/KBVC2-1 prophage | 33 | 100 | 4.20E-12 | phage tail tape measure | |
| DCMB5 | I-E | 6 | DCMB5_CP004079_I-E_spacer_6 | 33 | prophage | KBTCE1/KBVC2-5 prophage | 33 | 100 | 4.20E-12 | phage tail tape measure | |
| DCMB5 | I-C | 6 | DCMB5_CP004079_I-C_spacer_6 | 34 | no hits | | | | | | * |
| DCMB5 | I-C | 7 | DCMB5_CP004079_I-C_spacer_7 | 32 | prophage | WBC-2 prophage | 13 | 100 | 2.84E-01 | s-adenosylmethionine synthetase | * |
| DCMB5 | I-C | 7 | DCMB5_CP004079_I-C_spacer_7 | 32 | prophage | KBTCE1/KBVC2-12 prophage | 12 | 100 | 9.90E-01 | phage integrase | |
| DCMB5 | I-C | 7 | DCMB5_CP004079_I-C_spacer_7 | 32 | prophage | KBTCE2-2 prophage | 12 | 100 | 9.90E-01 | phage terminase | |
| DCMB5 | I-C | 7 | DCMB5_CP004079_I-C_spacer_7 | 32 | prophage | KBTCE3-3 prophage | 12 | 100 | 9.90E-01 | phage terminase | |
| DCMB5 | I-E | 7 | DCMB5_CP004079_I-E_spacer_7 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 8 | DCMB5_CP004079_I-E_spacer_8 | 33 | prophage | KB1/cDCE-8 prophage | 33 | 100 | 4.20E-12 | hypothetical protein | * |
| DCMB5 | I-C | 8 | DCMB5_CP004079_I-C_spacer_8 | 35 | no hits | | | | | | * |
| DCMB5 | I-E | 9 | DCMB5_CP004079_I-E_spacer_9 | 33 | IME1 | IME1-12 | 33 | 100 | 1.59E-10 | hypothetical protein | * |
| DCMB5 | I-E | 9 | DCMB5_CP004079_I-E_spacer_9 | 33 | IME1 | IME1-8 | 31 | 100 | 1.93E-09 | non-coding | |
| DCMB5 | I-E | 9 | DCMB5_CP004079_I-E_spacer_9 | 33 | IME1 | IME1-9 | 33 | 100 | 1.59E-10 | hypothetical protein | |
| DCMB5 | I-E | 9 | DCMB5_CP004079_I-E_spacer_9 | 33 | IME1 | IME1-13 | 33 | 100 | 1.59E-10 | hypothetical protein | |
| DCMB5 | I-C | 9 | DCMB5_CP004079_I-C_spacer_9 | 35 | prophage | BTF08 prophage | 13 | 100 | 3.40E-01 | DNA polymerase | * |
| DCMB5 | I-E | 9 | DCMB5_CP004079_I-E_spacer_9 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 10 | DCMB5_CP004079_I-E_spacer_10 | 33 | prophage | KBTCE2-2 prophage | 13 | 13 | 3.03E-01 | phage integrase | * |
| DCMB5 | I-E | 10 | DCMB5_CP004079_I-E_spacer_10 | 33 | prophage | KBTCE3-3 prophage | 13 | 13 | 3.03E-01 | phage integrase | |
| DCMB5 | I-C | 10 | DCMB5_CP004079_I-C_spacer_10 | 35 | IME1 | IME1-10 | 35 | 100 | 1.41E-11 | hypothetical protein | * |
| DCMB5 | I-C | 10 | DCMB5_CP004079_I-C_spacer_10 | 35 | IME1 | IME1-8 | 35 | 100 | 1.41E-11 | hypothetical protein | |
| DCMB5 | I-C | 10 | DCMB5_CP004079_I-C_spacer_10 | 35 | IME1 | IME1-8 | 17 | 100 | 8.34E-02 | hypothetical protein | |
| DCMB5 | I-C | 10 | DCMB5_CP004079_I-C_spacer_10 | 35 | IME1 | IME1-13 | 35 | 100 | 1.41E-11 | hypothetical protein | |
| DCMB5 | I-C | 11 | DCMB5_CP004079_I-C_spacer_11 | 32 | no hits | | | | | | * |
| DCMB5 | I-E | 11 | DCMB5_CP004079_I-E_spacer_11 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 12 | DCMB5_CP004079_I-E_spacer_12 | 33 | prophage | BTF08 prophage | 31 | 83.9 | 5.84E-04 | portal protein | * |
| DCMB5 | I-E | 12 | DCMB5_CP004079_I-E_spacer_12 | 33 | prophage | KBTCE1/KBVC2-12 prophage | 31 | 83.9 | 5.84E-04 | hypothetical protein | |
| DCMB5 | I-C | 12 | DCMB5_CP004079_I-C_spacer_12 | 30 | no hits | | | | | | * |
| DCMB5 | I-E | 13 | DCMB5_CP004079_I-E_spacer_13 | 33 | prophage | KBTCE1/KBVC2-11 prophage | 12 | 100 | 3.03E-01 | DNA primase | * |
| DCMB5 | I-C | 13 | DCMB5_CP004079_I-C_spacer_13 | 34 | no hits | | | | | | * |
| DCMB5 | I-E | 14 | DCMB5_CP004079_I-E_spacer_14 | 33 | prophage | BTF08 prophage | 21 | 90.5 | 8.67E-02 | nuclease | |
| DCMB5 | I-E | 14 | DCMB5_CP004079_I-E_spacer_14 | 33 | prophage | KBDCA1-9 prophage | 13 | 100 | 3.03E-01 | nuclease | |
| DCMB5 | I-E | 14 | DCMB5_CP004079_I-E_spacer_14 | 33 | prophage | KBTCE1/KBVC2-10 prophage | 13 | 100 | 3.03E-01 | hypothetical protein | * |
| DCMB5 | I-C | 14 | DCMB5_CP004079_I-C_spacer_14 | 34 | no hits | | | | | | * |
| DCMB5 | I-C | 15 | DCMB5_CP004079_I-C_spacer_15 | 34 | prophage | KB1/cDCE-4 prophage | 13 | 100 | 3.22E-01 | hypothetical protein | * |
| DCMB5 | I-C | 15 | DCMB5_CP004079_I-C_spacer_15 | 34 | prophage | KB1/cDCE-7 prophage | 13 | 100 | 3.22E-01 | hypothetical protein | |
| DCMB5 | I-E | 15 | DCMB5_CP004079_I-E_spacer_15 | 33 | no hits | | | | | | * |
| DCMB5 | I-C | 16 | DCMB5_CP004079_I-C_spacer_16 | 35 | no hits | | | | | | * |
| DCMB5 | I-E | 16 | DCMB5_CP004079_I-E_spacer_16 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 17 | DCMB5_CP004079_I-E_spacer_17 | 33 | prophage | BTF08 prophage | 18 | 94.4 | 2.48E-02 | DNA polymerase | |
| DCMB5 | I-E | 17 | DCMB5_CP004079_I-E_spacer_17 | 33 | prophage | CG1 prophage | 33 | 93.9 | 2.18E-09 | phage terminase | |
| DCMB5 | I-E | 17 | DCMB5_CP004079_I-E_spacer_17 | 33 | prophage | CG3 prophage | 33 | 93.9 | 2.18E-09 | hypothetical protein | |
| DCMB5 | I-E | 17 | DCMB5_CP004079_I-E_spacer_17 | 33 | IME-vcrA | IME-vcrA GT | 13 | 100 | 5.36E-02 | dsiB | * |
| DCMB5 | I-E | 17 | DCMB5_CP004079_I-E_spacer_17 | 33 | IME-vcrA | IME-vcrA KBVC1 | 13 | 100 | 5.36E-02 | dsiB | |
| DCMB5 | I-E | 17 | DCMB5_CP004079_I-E_spacer_17 | 33 | IME-vcrA | IME-vcrA VS | 13 | 100 | 5.36E-02 | dsiB | |
| DCMB5 | I-C | 17 | DCMB5_CP004079_I-C_spacer_17 | 37 | no hits | | | | | | * |
| DCMB5 | I-C | 18 | DCMB5_CP004079_I-C_spacer_18 | 36 | no hits | | | | | | * |
| DCMB5 | I-E | 18 | DCMB5_CP004079_I-E_spacer_18 | 33 | no hits | | | | | | * |
| DCMB5 | I-C | 19 | DCMB5_CP004079_I-C_spacer_19 | 37 | no hits | | | | | | * |
| DCMB5 | I-E | 19 | DCMB5_CP004079_I-E_spacer_19 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 20 | DCMB5_CP004079_I-E_spacer_20 | 34 | prophage | KBTCE1/KBVC2-1 prophage | 34 | 100 | 4.92E-11 | hypothetical protein | |
| DCMB5 | I-E | 20 | DCMB5_CP004079_I-E_spacer_20 | 34 | prophage | KBTCE1/KBVC2-11 prophage | 34 | 100 | 4.92E-11 | hypothetical protein | |
| DCMB5 | I-C | 20 | DCMB5_CP004079_I-C_spacer_20 | 36 | IME1 | IME1-10 | 36 | 91.7 | 9.59E-08 | hypothetical protein | * |
| DCMB5 | I-C | 20 | DCMB5_CP004079_I-C_spacer_20 | 36 | IME1 | IME1-11 | 36 | 91.7 | 9.59E-08 | hypothetical protein | |
| DCMB5 | I-C | 20 | DCMB5_CP004079_I-C_spacer_20 | 36 | IME1 | IME1-10 | 36 | 91.7 | 9.59E-08 | hypothetical protein | |
| DCMB5 | I-C | 21 | DCMB5_CP004079_I-C_spacer_21 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 21 | DCMB5_CP004079_I-E_spacer_21 | 34 | no hits | | | | | | * |
| DCMB5 | I-E | 22 | DCMB5_CP004079_I-E_spacer_22 | 33 | prophage | CG1 prophage | 31 | 100 | 5.12E-11 | prohead protease | * |
| DCMB5 | I-E | 22 | DCMB5_CP004079_I-E_spacer_22 | 33 | prophage | CG3 prophage | 26 | 88.5 | 5.84E-04 | hypothetical protein | |
| DCMB5 | I-C | 22 | DCMB5_CP004079_I-C_spacer_22 | 33 | no hits | | | | | | * |
| DCMB5 | I-C | 23 | DCMB5_CP004079_I-C_spacer_23 | 33 | unknown GI | KBDCA1 | 27 | 85.2 | 7.70E-01 | hypothetical protein | * |
| DCMB5 | I-C | 23 | DCMB5_CP004079_I-C_spacer_23 | 33 | unknown GI | KBDCA2 | 27 | 85.2 | 7.70E-01 | hypothetical protein | |
| DCMB5 | I-E | 23 | DCMB5_CP004079_I-E_spacer_23 | 33 | no hits | | | | | | * |
| DCMB5 | I-C | 24 | DCMB5_CP004079_I-C_spacer_24 | 32 | IME-OG19 | IME-OG19 CG1 | 22 | 90.9 | 7.05E-02 | rdhA OG19 | * |
| DCMB5 | I-C | 24 | DCMB5_CP004079_I-C_spacer_24 | 32 | IME-OG19 | IME-OG19 VS | 22 | 90.9 | 7.05E-02 | rdhA OG19 | |
| DCMB5 | I-E | 24 | DCMB5_CP004079_I-E_spacer_24 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 25 | DCMB5_CP004079_I-E_spacer_25 | 33 | prophage | KBDCA1-9 prophage | 25 | 84 | 3.03E-01 | hypothetical protein | |
| DCMB5 | I-E | 25 | DCMB5_CP004079_I-E_spacer_25 | 33 | prophage | KBTCE1/KBVC2-10 prophage | 25 | 84 | 3.03E-01 | hypothetical protein | |
| DCMB5 | I-E | 25 | DCMB5_CP004079_I-E_spacer_25 | 33 | prophage | KBTCE1/KBVC2-11 prophage | 25 | 84 | 3.03E-01 | hypothetical protein | |
| DCMB5 | I-E | 25 | DCMB5_CP004079_I-E_spacer_25 | 33 | IME-vcrA | IME-vcrA KBVC1 | 12 | 100 | 1.87E-01 | hypr-nap | * |
| DCMB5 | I-C | 25 | DCMB5_CP004079_I-C_spacer_25 | 34 | prophage | 195 prophage | 13 | 100 | 3.22E-01 | phage terminase | * |
| DCMB5 | I-C | 26 | DCMB5_CP004079_I-C_spacer_26 | 34 | no hits | | | | | | * |
| DCMB5 | I-E | 26 | DCMB5_CP004079_I-E_spacer_26 | 33 | no hits | | | | | | * |
| DCMB5 | I-E | 27 | DCMB5_CP004079_I-E_spacer_27 | 33 | prophage | CG3 prophage | 32 | 90.6 | 3.32E-07 | hypothetical protein | |
| DCMB5 | I-C | 27 | DCMB5_CP004079_I-C_spacer_27 | 34 | prophage | KBTCE1/KBVC2-10 prophage | 18 | 94.4 | 3.22E-01 | non-coding | * |
| DCMB5 | I-E | 28 | DCMB5_CP004079_I-E_spacer_28 | 33 | prophage | CG1 prophage | 33 | 90.6 | 9.26E-08 | portal protein | * |
| DCMB5 | I-C | 28 | DCMB5_CP004079_I-C_spacer_28 | 36 | no hits | | | | | | * |
| DCMB5 | I-C | 29 | DCMB5_CP004079_I-C_spacer_29 | 33 | prophage | KBTCE1/KBVC2-11 prophage | 13 | 100 | 3.03E-01 | hypothetical protein | * |
| DCMB5 | I-C | 30 | DCMB5_CP004079_I-C_spacer_30 | 35 | prophage | BTF08 prophage | 13 | 100 | 3.40E-01 | hypothetical protein | * |
| DCMB5 | I-C | 31 | DCMB5_CP004079_I-C_spacer_31 | 34 | no hits | | | | | | * |
| DCMB5 | I-C | 32 | DCMB5_CP004079_I-C_spacer_32 | 35 | IME1 | IME1-10 | 35 | 85.7 | 1.61E-04 | hypothetical protein | * |
| DCMB5 | I-C | 32 | DCMB5_CP004079_I-C_spacer_32 | 35 | IME1 | IME1-8 | 35 | 85.7 | 1.61E-04 | hypothetical protein | |
| DCMB5 | I-C | 32 | DCMB5_CP004079_I-C_spacer_32 | 35 | IME1 | IME1-13 | 35 | 85.7 | 1.61E-04 | hypothetical protein | |
| DCMB5 | I-C | 33 | DCMB5_CP004079_I-C_spacer_33 | 36 | unknown GI | GT | 18 | 100 | 2.57E-02 | peptidase | |
| DCMB5 | I-C | 33 | DCMB5_CP004079_I-C_spacer_33 | 36 | unknown target | WBC-2 | 18 | 100 | 2.57E-02 | TldD | |
| DCMB5 | I-C | 33 | DCMB5_CP004079_I-C_spacer_33 | 36 | unknown GI | 11a5-GI | 18 | 100 | 2.57E-02 | TldD | * |
| DCMB5 | I-C | 34 | DCMB5_CP004079_I-C_spacer_34 | 35 | no hits | | | | | | * |
| DCMB5 | I-C | 35 | DCMB5_CP004079_I-C_spacer_35 | 34 | no hits | | | | | | * |
| DCMB5 | I-C | 36 | DCMB5_CP004079_I-C_spacer_36 | 32 | prophage | KB1/cDCE-7 prophage | 18 | 94.4 | 2.84E-01 | hypothetical protein | |
| DCMB5 | I-C | 36 | DCMB5_CP004079_I-C_spacer_36 | 32 | prophage | KB1/TCE-0 prophage | 18 | 94.4 | 2.84E-01 | hypothetical protein | |
| DCMB5 | I-C | 36 | DCMB5_CP004079_I-C_spacer_36 | 32 | prophage | KBDCA1/KBDCA2-6 prophage | 18 | 94.4 | 2.84E-01 | hypothetical protein | |
| DCMB5 | I-C | 36 | DCMB5_CP004079_I-C_spacer_36 | 32 | prophage | KBTCE1/KBVC2-1 prophage | 18 | 94.4 | 2.84E-01 | AMP-binding protein | |
| DCMB5 | I-C | 36 | DCMB5_CP004079_I-C_spacer_36 | 32 | prophage | KBTCE1/KBVC2-12 prophage | 13 | 100 | 2.84E-01 | hypothetical protein | * |
| DCMB5 | I-C | 36 | DCMB5_CP004079_I-C_spacer_36 | 32 | prophage | KBTCE1/KBVC2-5 prophage | 18 | 94.4 | 2.84E-01 | hypothetical protein | |
| DCMB5 | I-C | 37 | DCMB5_CP004079_I-C_spacer_37 | 35 | IME1 | IME1-13 | 23 | 100 | 4.61E-05 | hypothetical protein | * |
| DCMB5 | I-C | 38 | DCMB5_CP004079_I-C_spacer_38 | 35 | no hits | | | | | | * |
| DCMB5 | I-C | 39 | DCMB5_CP004079_I-C_spacer_39 | 34 | no hits | | | | | | * |
| DCMB5 | I-C | 40 | DCMB5_CP004079_I-C_spacer_40 | 32 | prophage | pg11a5 prophage | 18 | 94.4 | 2.84E-01 | hypothetical protein | * |
| DCMB5 | I-C | 41 | DCMB5_CP004079_I-C_spacer_41 | 33 | IME1 | IME1-10 | 30 | 80 | 9.37E-01 | hypothetical protein | * |
| DCMB5 | I-C | 41 | DCMB5_CP004079_I-C_spacer_41 | 33 | IME1 | IME1-8 | 30 | 80 | 9.37E-01 | hypothetical protein | |
| DCMB5 | I-C | 42 | DCMB5_CP004079_I-C_spacer_42 | 32 | prophage | BTF08 prophage | 17 | 88.2 | 9.90E-01 | hypothetical protein | * |
| DCMB5 | I-C | 42 | DCMB5_CP004079_I-C_spacer_42 | 32 | prophage | WBC-2 prophage | 17 | 88.2 | 9.90E-01 | hypothetical protein | |
| DCMB5 | I-C | 43 | DCMB5_CP004079_I-C_spacer_43 | 31 | prophage | BTF08 prophage | 16 | 93.8 | 2.65E-01 | hypothetical protein | * |
| DCMB5 | I-C | 43 | DCMB5_CP004079_I-C_spacer_43 | 31 | prophage | KBTCE2-2 prophage | 16 | 93.8 | 2.65E-01 | hypothetical protein | |
| DCMB5 | I-C | 43 | DCMB5_CP004079_I-C_spacer_43 | 31 | prophage | KBTCE3-3 prophage | 16 | 93.8 | 2.65E-01 | hypothetical protein | |
| DCMB5 | I-C | 44 | DCMB5_CP004079_I-C_spacer_44 | 36 | prophage | 195 prophage | 13 | 100 | 3.59E-01 | adenine-specific methyltransferase | * |
| DCMB5 | I-C | 45 | DCMB5_CP004079_I-C_spacer_45 | 35 | no hits | | | | | | * |
| DCMB5 | I-C | 46 | DCMB5_CP004079_I-C_spacer_46 | 36 | prophage | pg11a5 prophage | 36 | 97.2 | 4.99E-12 | hypothetical protein | * |
| DCMB5 | I-C | 46 | DCMB5_CP004079_I-C_spacer_46 | 36 | prophage | KB1/cDCE-4 prophage | 36 | 97.2 | 4.99E-12 | hypothetical protein | |
| DCMB5 | I-C | 46 | DCMB5_CP004079_I-C_spacer_46 | 36 | prophage | KB1/cDCE-7 prophage | 36 | 97.2 | 4.99E-12 | hypothetical protein | |
| DCMB5 | I-C | 47 | DCMB5_CP004079_I-C_spacer_47 | 35 | prophage | CG1 prophage | 31 | 87.1 | 1.55E-05 | hypothetical protein | |
| DCMB5 | I-C | 47 | DCMB5_CP004079_I-C_spacer_47 | 35 | prophage | CG3 prophage | 32 | 87.5 | 4.43E-06 | phage major capsid | |
| DCMB5 | I-C | 47 | DCMB5_CP004079_I-C_spacer_47 | 35 | prophage | 195 prophage | 13 | 100 | 3.40E-01 | hypothetical protein | * |
| DCMB5 | I-C | 48 | DCMB5_CP004079_I-C_spacer_48 | 34 | no hits | | | | | | * |
| DCMB5 | I-C | 49 | DCMB5_CP004079_I-C_spacer_49 | 35 | no hits | | | | | | * |
| DCMB5 | I-C | 50 | DCMB5_CP004079_I-C_spacer_50 | 37 | no hits | | | | | | * |
| DCMB5 | I-C | 51 | DCMB5_CP004079_I-C_spacer_51 | 32 | no hits | | | | | | * |
| DCMB5 | I-C | 52 | DCMB5_CP004079_I-C_spacer_52 | 30 | prophage | KBTCE1/KBVC2-11 prophage | 12 | 100 | 8.58E-01 | hypothetical protein | * |
| DCMB5 | I-C | 53 | DCMB5_CP004079_I-C_spacer_53 | 35 | no hits | | | | | | * |
| DCMB5 | I-C | 54 | DCMB5_CP004079_I-C_spacer_54 | 36 | prophage | CG3 prophage | 13 | 100 | 3.59E-01 | hypothetical protein | * |
| GT | I-E | 1 | GT_CP001924_spacer_1 | 31 | prophage | 195 prophage | 17 | 88.2 | 9.24E-01 | phage tape measure | * |
| GT | I-E | 2 | GT_CP001924_spacer_2 | 32 | no hits | | | | | | * |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | pg11a5 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KB1/cDCE-4 prophage | 12 | 100 | 9.90E-01 | protease | |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KB1/cDCE-7 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KB1/TCE-0 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KBDCA1/KBDCA2-6 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KBTCE1/KBVC2-1 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KBTCE1/KBVC2-11 prophage | 21 | 85.7 | 2.84E-01 | hypothetical protein | * |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KBTCE1/KBVC2-5 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KBTCE2-2 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | |
| GT | I-E | 3 | GT_CP001924_spacer_3 | 32 | prophage | KBTCE3-3 prophage | 12 | 100 | 9.90E-01 | protease | |
| GT | I-E | 4 | GT_CP001924_spacer_4 | 33 | no hits | | | | | | * |
| GT | I-E | 5 | GT_CP001924_spacer_5 | 32 | prophage | CG1 prophage | 32 | 90.6 | 3.03E-07 | phage terminase | * |
| GT | I-E | 5 | GT_CP001924_spacer_5 | 32 | prophage | CG3 prophage | 32 | 90.6 | 3.03E-07 | hypothetical protein | |
| GT | I-E | 6 | GT_CP001924_spacer_6 | 32 | no hits | | | | | | * |
| GT | I-E | 7 | GT_CP001924_spacer_7 | 32 | prophage | KBTCE2-2 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | * |
| GT | I-E | 7 | GT_CP001924_spacer_7 | 32 | prophage | KBTCE3-3 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | |
| GT | I-E | 8 | GT_CP001924_spacer_8 | 32 | no hits | | | | | | * |
| GT | I-E | 9 | GT_CP001924_spacer_9 | 32 | prophage | KB1/cDCE-8 prophage | 32 | 100 | 1.38E-11 | non-coding | * |
| GT | I-E | 10 | GT_CP001924_spacer_10 | 32 | prophage | CG3 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | * |
| GT | I-E | 11 | GT_CP001924_spacer_11 | 32 | prophage | WBC-2 prophage | 14 | 100 | 8.13E-02 | hypothetical protein | * |
| GT | I-E | 11 | GT_CP001924_spacer_11 | 32 | prophage | KBDCA1-9 prophage | 14 | 100 | 8.13E-02 | phage terminase | |
| GT | I-E | 11 | GT_CP001924_spacer_11 | 32 | prophage | KBTCE1/KBVC2-10 prophage | 14 | 100 | 8.13E-02 | phage terminase | |
| GT | I-E | 11 | GT_CP001924_spacer_11 | 32 | prophage | KBTCE1/KBVC2-11 prophage | 14 | 100 | 8.13E-02 | phage terminase | |
| GT | I-E | 12 | GT_CP001924_spacer_12 | 32 | IME-vcrA | IME-vcrA KBVC2 | 14 | 92.9 | 6.19E-01 | dsiB | * |
| GT | I-E | 13 | GT_CP001924_spacer_13 | 32 | prophage | 195 prophage | 12 | 100 | 9.90E-01 | phage integrase | * |
| GT | I-E | 14 | GT_CP001924_spacer_14 | 32 | unknown GI | CG1 GI | 24 | 87.5 | 2.46E-01 | hypothetical protein | * |
| GT | I-E | 15 | GT_CP001924_spacer_15 | 32 | prophage | BTF08 prophage | 32 | 84.4 | 1.57E-04 | hypothetical protein | * |
| GT | I-E | 15 | GT_CP001924_spacer_15 | 32 | prophage | KBDCA1-9 prophage | 16 | 93.8 | 2.84E-01 | non-coding | |
| GT | I-E | 15 | GT_CP001924_spacer_15 | 32 | prophage | KBTCE1/KBVC2-10 prophage | 16 | 93.8 | 2.84E-01 | phage terminase | |
| GT | I-E | 15 | GT_CP001924_spacer_15 | 32 | prophage | KBTCE1/KBVC2-11 prophage | 16 | 93.8 | 2.84E-01 | non-coding | |
| GT | I-E | 15 | GT_CP001924_spacer_15 | 32 | prophage | KBTCE1/KBVC2-12 prophage | 32 | 84.4 | 1.57E-04 | non-coding | |
| GT | I-E | 16 | GT_CP001924_spacer_16 | 33 | prophage | CG1 prophage | 33 | 87.9 | 1.13E-06 | phage terminase | * |
| GT | I-E | 16 | GT_CP001924_spacer_16 | 33 | prophage | CG3 prophage | 33 | 87.9 | 1.13E-06 | hypothetical protein | |
| GT | I-E | 17 | GT_CP001924_spacer_17 | 33 | unknown GI | BAV1 GI | 22 | 90.9 | 7.70E-02 | hypothetical protein | * |
| GT | I-E | 18 | GT_CP001924_spacer_18 | 32 | no hits | | | | | | * |
| GT | I-E | 19 | GT_CP001924_spacer_19 | 32 | prophage | BTF08 prophage | 18 | 94.4 | 2.33E-02 | DNA polymerase | |
| GT | I-E | 19 | GT_CP001924_spacer_19 | 32 | prophage | CG1 prophage | 32 | 93.8 | 7.12E-09 | phage terminase | * |
| GT | I-E | 19 | GT_CP001924_spacer_19 | 32 | prophage | CG3 prophage | 32 | 93.8 | 7.12E-09 | hypothetical protein | |
| GT | I-E | 20 | GT_CP001924_spacer_20 | 32 | IME1 | IME1-14 | 32 | 94.4 | 8.59E-01 | hypothetical protein | * |
| GT | I-E | 21 | GT_CP001924_spacer_21 | 32 | no hits | | | | | | * |
| GT | I-E | 22 | GT_CP001924_spacer_22 | 32 | prophage | BTF08 prophage | 21 | 90.5 | 8.13E-02 | nuclease | |
| GT | I-E | 22 | GT_CP001924_spacer_22 | 32 | prophage | KBDCA1-9 prophage | 13 | 100 | 2.84E-01 | nuclease | |
| GT | I-E | 22 | GT_CP001924_spacer_22 | 32 | prophage | KBTCE1/KBVC2-10 prophage | 13 | 100 | 2.84E-01 | hypothetical protein | * |
| GT | I-E | 23 | GT_CP001924_spacer_23 | 32 | prophage | KBTCE1/KBVC2-11 prophage | 13 | 100 | 2.84E-01 | DNA primase | * |
| GT | I-E | 24 | GT_CP001924_spacer_24 | 32 | prophage | BTF08 prophage | 28 | 85.7 | 5.84E-04 | DNA polymerase I | |
| GT | I-E | 24 | GT_CP001924_spacer_24 | 32 | prophage | KBTCE1/KBVC2-12 prophage | 28 | 85.7 | 5.84E-04 | DNA polymerase I | |
| GT | I-E | 24 | GT_CP001924_spacer_24 | 32 | prophage | KBTCE2-2 prophage | 15 | 93.3 | 9.90E-01 | phage portal protein | * |
| GT | I-E | 24 | GT_CP001924_spacer_24 | 32 | prophage | KBTCE3-3 prophage | 15 | 93.3 | 9.90E-01 | phage portal protein | |
| GT | I-E | 25 | GT_CP001924_spacer_25 | 32 | no hits | | | | | | * |
| GT | I-E | 26 | GT_CP001924_spacer_26 | 32 | prophage | KBTCE2-2 prophage | 13 | 100 | 2.84E-01 | phage integrase | * |
| GT | I-E | 26 | GT_CP001924_spacer_26 | 32 | prophage | KBTCE3-3 prophage | 13 | 100 | 2.84E-01 | phage integrase | |
| GT | I-E | 27 | GT_CP001924_spacer_27 | 32 | IME1 | IME1-13 | 32 | 100 | 5.07E-10 | hypothetical protein | * |
| GT | I-E | 28 | GT_CP001924_spacer_28 | 32 | no hits | | | | | | * |
| GT | I-E | 29 | GT_CP001924_spacer_29 | 33 | no hits | | | | | | * |
| GT | I-E | 30 | GT_CP001924_spacer_30 | 32 | no hits | | | | | | * |
| GT | I-E | 31 | GT_CP001924_spacer_31 | 32 | prophage | KBTCE1/KBVC2-12 prophage | 13 | 100 | 2.84E-01 | non-coding | * |
| GT | I-E | 32 | GT_CP001924_spacer_32 | 33 | no hits | | | | | | * |
| GT | I-E | 33 | GT_CP001924_spacer_33 | 33 | IME1 | IME1-13 | 33 | 100 | 1.59E-10 | hypothetical protein | * |
| GT | I-E | 34 | GT_CP001924_spacer_34 | 32 | no hits | | | | | | * |
| GT | I-E | 35 | GT_CP001924_spacer_35 | 31 | prophage | KBTCE1/KBVC2-10 prophage | 13 | 100 | 2.65E-01 | hypothetical protein | |
| GT | I-E | 35 | GT_CP001924_spacer_35 | 31 | prophage | KBTCE2-2 prophage | 12 | 100 | 9.24E-01 | hypothetical protein | |
| GT | I-E | 35 | GT_CP001924_spacer_35 | 31 | prophage | KBTCE3-3 prophage | 12 | 100 | 9.24E-01 | hypothetical protein | |
| GT | I-E | 35 | GT_CP001924_spacer_35 | 32 | prophage | KBDCA1-9 prophage | 13 | 100 | 2.65E-01 | hypothetical protein | * |
| GT | I-E | 36 | GT_CP001924_spacer_36 | 32 | no hits | | | | | | * |
| GT | I-E | 37 | GT_CP001924_spacer_37 | 32 | prophage | CG3 prophage | 12 | 100 | 9.90E-01 | hypothetical protein | * |
| KBDCA3 | I-C | 1 | KBDCA3_CP019946_spacer_1 | 34 | no hits | | | | | | * |
| KBDCA3 | I-C | 2 | KBDCA3_CP019946_spacer_2 | 34 | prophage | KBTCE2-2 prophage | 12 | 100 | 8.01E-01 | phage protein | * |
| KBDCA3 | I-C | 2 | KBDCA3_CP019946_spacer_2 | 34 | prophage | KBTCE3-3 prophage | 12 | 100 | 8.01E-01 | phage protein | |
| KBDCA3 | I-C | 3 | KBDCA3_CP019946_spacer_3 | 34 | prophage | KBDCA1/KBDCA2-6 prophage | 12 | 100 | 8.01E-01 | phage tape measure | |
| KBDCA3 | I-C | 3 | KBDCA3_CP019946_spacer_3 | 34 | prophage | KBTCE1/KBVC2-1 prophage | 12 | 100 | 8.01E-01 | phage tape measure | * |
| KBDCA3 | I-C | 3 | KBDCA3_CP019946_spacer_3 | 34 | prophage | KBTCE1/KBVC2-1 prophage | 12 | 100 | 8.01E-01 | phage tape measure | |
| KBDCA3 | I-C | 3 | KBDCA3_CP019946_spacer_3 | 34 | prophage | KBTCE1/KBVC2-5 prophage | 12 | 100 | 8.01E-01 | hypothetical protein | |
| KBDCA3 | I-C | 4 | KBDCA3_CP019946_spacer_4 | 31 | no hits | | | | | | * |
| KBDCA3 | I-C | 5 | KBDCA3_CP019946_spacer_5 | 33 | prophage | pg11a5 prophage | 12 | 100 | 7.56E-01 | hypothetical protein | * |
| KBDCA3 | I-C | 5 | KBDCA3_CP019946_spacer_5 | 33 | prophage | 195 prophage | 12 | 100 | 7.56E-01 | phage protein | |
| KBDCA3 | I-C | 5 | KBDCA3_CP019946_spacer_5 | 33 | prophage | KBTCE1/KBVC2-1 prophage | 12 | 100 | 7.56E-01 | hypothetical protein | |
| KBDCA3 | I-C | 6 | KBDCA3_CP019946_spacer_6 | 35 | unknown GI | BTF08 GI | 25 | 92 | 1.96E-03 | transcriptional repressor LexA | * |
| KBDCA3 | I-C | 6 | KBDCA3_CP019946_spacer_6 | 35 | prophage | CG1 prophage | 24 | 91.7 | 6.84E-03 | transcriptional regulator | |
| KBDCA3 | I-C | 6 | KBDCA3_CP019946_spacer_6 | 35 | prophage | CG1 prophage | 12 | 100 | 8.45E-01 | hypothetical protein | |
| KBDCA3 | I-C | 6 | KBDCA3_CP019946_spacer_6 | 35 | unknown GI | 11a5-GI | 25 | 92 | 1.96E-03 | | |
| KBDCA3 | I-C | 7 | KBDCA3_CP019946_spacer_7 | 35 | prophage | 195 prophage | 15 | 93.3 | 8.45E-01 | phage tape measure | * |
| KBDCA3 | I-C | 8 | KBDCA3_CP019946_spacer_8 | 34 | IME-OG49 | DCMB5 GI rdhA containing | 29 | 90.9 | 8.34E-02 | reductive dehalogenase OG 49 | * |
| KBDCA3 | I-C | 9 | KBDCA3_CP019946_spacer_9 | 34 | IME-vcrA | IME-vcrA KBVC2 | 12 | 100 | 1.87E-01 | parBc | * |
| KBDCA3 | I-C | 10 | KBDCA3_CP019946_spacer_10 | 35 | no hits | | | | | | * |
| KBDCA3 | I-C | 11 | KBDCA3_CP019946_spacer_11 | 34 | prophage | pg11a5 prophage | 26 | 84.6 | 6.57E-02 | phage terminase | * |
| KBDCA3 | I-C | 12 | KBDCA3_CP019946_spacer_12 | 34 | no hits | | | | | | * |
| KBDCA3 | I-C | 13 | KBDCA3_CP019946_spacer_13 | 35 | IME1 | IME1-11 | 18 | 94.4 | 8.73E-01 | DsiB | * |
| KBDCA3 | I-C | 13 | KBDCA3_CP019946_spacer_13 | 35 | IME1 | IME1-8 | 18 | 94.4 | 8.73E-01 | DsiB | |
| KBDCA3 | I-C | 14 | KBDCA3_CP019946_spacer_14 | 36 | prophage | 195 prophage | 12 | 100 | 8.90E-01 | hypothetical protein | |
| KBDCA3 | I-C | 14 | KBDCA3_CP019946_spacer_14 | 36 | IME1 | IME1-13 | 35 | 94.3 | 7.87E-09 | hypothetical protein | * |
| KBDCA3 | I-C | 14 | KBDCA3_CP019946_spacer_14 | 36 | prophage | KB1/cDCE-4 prophage | 22 | 86.4 | 2.55E-01 | non-coding | |
| KBDCA3 | I-C | 14 | KBDCA3_CP019946_spacer_14 | 36 | prophage | KBDCA1/KBDCA2-6 prophage | 22 | 86.4 | 2.55E-01 | hypothetical protein | |
| KBDCA3 | I-C | 14 | KBDCA3_CP019946_spacer_14 | 36 | prophage | KBTCE1/KBVC2-1 prophage | 17 | 88.2 | 8.90E-01 | hypothetical protein | |
| KBDCA3 | I-C | 14 | KBDCA3_CP019946_spacer_14 | 36 | prophage | KBTCE1/KBVC2-1 prophage | 22 | 86.4 | 2.55E-01 | hypothetical protein | |
| KBDCA3 | I-C | 14 | KBDCA3_CP019946_spacer_14 | 36 | prophage | KBTCE1/KBVC2-1 prophage | 22 | 86.4 | 2.55E-01 | hypothetical protein | |
| KBDCA3 | I-C | 14 | KBDCA3_CP019946_spacer_14 | 36 | prophage | KBTCE1/KBVC2-10 prophage | 15 | 93.3 | 8.90E-01 | hypothetical protein | |

KBDCA3 I-C 14 KBDCA3_CP019946_spacer_14 36 prophage
KBTCE1/KBVC2prophage -11 15 93.3 8.90E-01 phage tape measure KBDCA3 I-C 14 KBDCA3_CP019946_spacer_14 36 prophage KBTCE1/KBVC2prophage -11 15 93.3 8.90E-01 phage tape measure KBDCA3 I-C 14 KBDCA3_CP019946_spacer_14 36 prophage KBTCE1/KBVC2prophage -5 22 86.4 2.55E-01 phage tape measure KBDCA3 I-C 14 KBDCA3_CP019946_spacer_14 36 IME-tceA IMEprophage-tceA 36 91.7 9.59E-08 phage portal protein KBDCA3 I-C 15 KBDCA3_CP019946_spacer_15 34 no hits * KBDCA3 I-C 16 KBDCA3_CP019946_spacer_16 34 prophage KBTCE2-2 prophage 12 100 8.01E-01 site-specific * KBDCA3 I-C 16 KBDCA3_CP019946_spacer_16 34 prophage KBTCE3-3 prophage 12 100 8.01E-01 recombinasesite-specific KBDCA3 I-C 17 KBDCA3_CP019946_spacer_17 35 no hits recombinase * KBDCA3 I-C 18 KBDCA3_CP019946_spacer_18 34 prophage CG1 prophage 12 100 8.01E-01 major capsid protein * KBDCA3 I-C 19 KBDCA3_CP019946_spacer_19 34 no hits * KBDCA3 I-C 20 KBDCA3_CP019946_spacer_20 33 prophage KB1/cDCE-4 33 87.9 8.08E-07 hypothetical protein KBDCA3 I-C 20 KBDCA3_CP019946_spacer_20 33 prophage KBDCA1/KBDCA2prophage -6 33 87.9 8.08E-07 hypothetical protein KBDCA3 I-C 20 KBDCA3_CP019946_spacer_20 33 prophage KBTCE1/KBVC2prophage -1 33 87.9 8.08E-07 hypothetical protein * KBDCA3 I-C 20 KBDCA3_CP019946_spacer_20 33 prophage KBTCE1/KBVC2prophage -1 33 87.9 8.08E-07 hypothetical protein KBDCA3 I-C 21 KBDCA3_CP019946_spacer_21 34 IME1 prophageIME1-9 34 100 4.92E-11 hypothetical protein * KBDCA3 I-C 21 KBDCA3_CP019946_spacer_21 34 prophage KBDCA1/KBDCA2-6 12 100 8.01E-01 phage prohead KBDCA3 I-C 21 KBDCA3_CP019946_spacer_21 34 prophage KBTCE1/KBVC2prophage -1 12 100 8.01E-01 phageprotease prohe ad KBDCA3 I-C 21 KBDCA3_CP019946_spacer_21 34 prophage KBTCE1/KBVC2prophage -1 12 100 8.01E-01 phageprotease prohead KBDCA3 I-C 21 KBDCA3_CP019946_spacer_21 34 prophage KBTCE1/KBVC2prophage -5 12 100 8.01E-01 phageprotease prohead prophage protease 157

CRISPR spacer spacer Type of mobile D.mccartyi Mobile BLAST % Gene annotation of best strain FASTA name of spacer E-value type # length DNA which DNA name hit pairwise BLAST hit hit (bp) matched to spacer length identity KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 unknown GI 195 GI 35 85.7 1.60E-04 DNA primase KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME1 IME1-14 35 85.7 1.60E-04 dsiB KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME1 IME1-10 35 100 1.41E-11 hypothetical protein * KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME1 IME1-11 35 85.7 1.60E-04 phage tape measure KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME1 IME1-8 35 100 1.41E-11 hypothetical protein KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME1 IME1-8 35 85.7 1.60E-04 phage tape measure KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME1 IME1-9 35 85.7 1.60E-04 phage tape measure KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME1 IME1-13 35 100 1.41E-11 hypothetical protein KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME1 IME1-13 35 85.7 1.60E-04 hypothetical protein KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 prophage KBDCA1/KBDCA2-6 12 100 8.45E-01 hypothetical protein KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 prophage KBTCE1/KBVC2prophage -1 12 100 8.45E-01 hypothetical protein KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 prophage KBTCE1/KBVC2prophage -1 12 100 8.45E-01 hypothetical protein KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 prophage KBTCE1/KBVC2prophage -5 12 100 8.45E-01 hypothetical protein KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME-vcrA IMEprophage-vcrA GT 14 92.9 6.88E-01 hypothetical protein KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME-vcrA IME-vcrA KBVC1 14 92.9 6.88E-01 dsiB KBDCA3 I-C 22 KBDCA3_CP019946_spacer_22 35 IME-vcrA IME-vcrA WBC-2 14 92.9 6.88E-01 dsiB KBDCA3 I-C 23 KBDCA3_CP019946_spacer_23 36 prophage KBTCE2-2 prophage 16 100 6.00E-03 hypothetical protein * KBDCA3 I-C 23 KBDCA3_CP019946_spacer_23 36 prophage KBTCE3-3 prophage 16 100 6.00E-03 hypothetical protein KBDCA3 I-C 24 
KBDCA3_CP019946_spacer_24 36 prophage BTF08 prophage 12 100 7.65E-01 phage tape measure KBDCA3 I-C 24 KBDCA3_CP019946_spacer_24 36 prophage KBTCE1/KBVC2-12 12 100 8.90E-01 non-coding KBDCA3 I-C 24 KBDCA3_CP019946_spacer_24 36 prophage KBTCE2prophage-2 prophage 14 100 7.30E-02 hypothetical protein * KBDCA3 I-C 24 KBDCA3_CP019946_spacer_24 36 prophage KBTCE2-2 prophage 15 93.3 8.90E-01 non-coding KBDCA3 I-C 24 KBDCA3_CP019946_spacer_24 36 prophage KBTCE3-3 prophage 14 100 7.30E-02 hypothetical protein KBDCA3 I-C 24 KBDCA3_CP019946_spacer_24 36 prophage KBTCE3-3 prophage 15 93.3 8.90E-01 hypothetical protein KBDCA3 I-C 25 KBDCA3_CP019946_spacer_25 36 no hits * KBDCA3 I-C 26 KBDCA3_CP019946_spacer_26 34 prophage 195 prophage 12 100 8.01E-01 hypothetical protein KBDCA3 I-C 26 KBDCA3_CP019946_spacer_26 34 IME1 IME1-9 20 90 9.35E-01 phage DNA helicase * KBDCA3 I-C 27 KBDCA3_CP019946_spacer_27 32 IME1 IME1-13 32 100 5.07E-10 hypothetical protein * KBDCA3 I-C 28 KBDCA3_CP019946_spacer_28 32 prophage pg11a5 prophage 12 100 7.12E-01 hypothetical protein * KBDCA3 I-C 28 KBDCA3_CP019946_spacer_28 32 prophage KB1/cDCE-4 12 100 7.12E-01 hypothetical protein KBDCA3 I-C 28 KBDCA3_CP019946_spacer_28 32 prophage KBDCA1/KBDCA2prophage -6 12 100 7.12E-01 hypothetical protein KBDCA3 I-C 28 KBDCA3_CP019946_spacer_28 32 prophage KBTCE1/KBVC2prophage -1 12 100 7.12E-01 hypothetical protein KBDCA3 I-C 28 KBDCA3_CP019946_spacer_28 32 prophage KBTCE1/KBVC2prophage -1 12 100 7.12E-01 hypothetical protein KBDCA3 I-C 28 KBDCA3_CP019946_spacer_28 32 prophage KBTCE1/KBVC2prophage -5 12 100 7.12E-01 hypothetical protein KBDCA3 I-C 28 KBDCA3_CP019946_spacer_28 32 prophage KBTCE2prophage-2 prophage 12 100 7.12E-01 phage terminase KBDCA3 I-C 28 KBDCA3_CP019946_spacer_28 32 prophage KBTCE3-3 prophage 12 100 7.12E-01 phage terminase KBDCA3 I-C 29 KBDCA3_CP019946_spacer_29 34 prophage KB1/cDCE-4 12 100 8.01E-01 hypothetical protein prophage 158

CRISPR spacer spacer Type of mobile D.mccartyi Mobile BLAST % Gene annotation of best strain FASTA name of spacer E-value type # length DNA which DNA name hit pairwise BLAST hit hit (bp) matched to spacer length identity KBDCA3 I-C 29 KBDCA3_CP019946_spacer_29 34 prophage KBDCA1/KBDCA2-6 12 100 8.01E-01 phage related protein KBDCA3 I-C 29 KBDCA3_CP019946_spacer_29 34 prophage KBTCE1/KBVC2prophage -1 12 100 8.01E-01 hypothetical protein * KBDCA3 I-C 29 KBDCA3_CP019946_spacer_29 34 prophage KBTCE1/KBVC2prophage -1 12 100 8.01E-01 hypothetical protein KBDCA3 I-C 30 KBDCA3_CP019946_spacer_30 35 prophage KB1/cDCEprophage -4 12 100 8.45E-01 hypothetical protein KBDCA3 I-C 30 KBDCA3_CP019946_spacer_30 35 prophage KBDCA1/KBDCA2prophage -6 12 100 8.45E-01 phage related protein KBDCA3 I-C 30 KBDCA3_CP019946_spacer_30 35 prophage KBTCE1/KBVC2prophage -1 12 100 8.45E-01 hypothetical protein * KBDCA3 I-C 30 KBDCA3_CP019946_spacer_30 35 prophage KBTCE1/KBVC2prophage -1 12 100 8.45E-01 hypothetical protein KBDCA3 I-C 31 KBDCA3_CP019946_spacer_31 36 IME1 prophageIME1-10 36 100 3.72E-12 hypothetical protein * KBDCA3 I-C 31 KBDCA3_CP019946_spacer_31 36 IME1 IME1-8 36 100 3.72E-12 hypothetical protein KBDCA3 I-C 32 KBDCA3_CP019946_spacer_32 35 IME1 IME1-10 26 88.5 2.05E-02 hypothetical protein * KBDCA3 I-C 32 KBDCA3_CP019946_spacer_32 35 IME1 IME1-11 26 88.5 2.05E-02 hypothetical protein KBDCA3 I-C 32 KBDCA3_CP019946_spacer_32 35 IME1 IME1-9 26 88.5 2.05E-02 hypothetical protein KBDCA3 I-C 33 KBDCA3_CP019946_spacer_33 35 IME1 IME1-8 33 97 6.28E-09 phage protein * KBDCA3 I-C 33 KBDCA3_CP019946_spacer_33 35 prophage KBDCA1/KBDCA2-6 12 100 8.45E-01 phage protein KBDCA3 I-C 33 KBDCA3_CP019946_spacer_33 35 prophage KBTCE1/KBVC2prophage -1 12 100 8.45E-01 non-coding KBDCA3 I-C 33 KBDCA3_CP019946_spacer_33 35 prophage KBTCE1/KBVC2prophage -1 12 100 8.45E-01 phage portal protein KBDCA3 I-C 33 KBDCA3_CP019946_spacer_33 35 prophage KBTCE1/KBVC2prophage -5 12 100 8.45E-01 phage portal protein 
KBDCA3 I-C 33 KBDCA3_CP019946_spacer_33 35 prophage KBTCE2prophage-2 prophage 27 81.5 8.45E-01 phage portal protein KBDCA3 I-C 33 KBDCA3_CP019946_spacer_33 35 prophage KBTCE3-3 prophage 27 81.5 8.45E-01 phage portal protein KBDCA3 I-C 34 KBDCA3_CP019946_spacer_34 34 prophage KBTCE2-2 prophage 16 93.8 1.97E-01 phage tape measure * KBDCA3 I-C 34 KBDCA3_CP019946_spacer_34 34 prophage KBTCE3-3 prophage 16 93.8 1.97E-01 phage tape measure KBDCA3 I-C 35 KBDCA3_CP019946_spacer_35 35 no hits * KBDCA3 I-C 36 KBDCA3_CP019946_spacer_36 36 no hits * KBDCA3 I-C 37 KBDCA3_CP019946_spacer_37 34 prophage pg11a5 prophage 12 100 8.01E-01 phage portal protein KBDCA3 I-C 37 KBDCA3_CP019946_spacer_37 34 IME-OG24 BAV1 GI rdhA 34 100 4.92E-11 hypothetical protein * KBVC1 I-E 1 KBVC1_CP019968_spacer_1 32 no hits containing near rdhA OG24 * KBVC1 I-E 2 KBVC1_CP019968_spacer_2 32 prophage pg11a5 prophage 32 100 9.89E-12 major capsid protein * KBVC1 I-E 2 KBVC1_CP019968_spacer_2 32 prophage 195 prophage 12 100 7.12E-01 endolysin KBVC1 I-E 3 KBVC1_CP019968_spacer_3 31 prophage pg11a5 prophage 32 88.2 6.67E-01 phage tape measure * KBVC1 I-E 4 KBVC1_CP019968_spacer_4 32 no hits * KBVC1 I-E 5 KBVC1_CP019968_spacer_5 32 prophage pg11a5 prophage 12 100 7.12E-01 hypothetical protein * KBVC1 I-E 5 KBVC1_CP019968_spacer_5 32 prophage KB1/cDCE-4 12 100 7.12E-01 hypothetical protein KBVC1 I-E 5 KBVC1_CP019968_spacer_5 32 prophage KBDCA1/KBDCA2prophage -6 12 100 7.12E-01 hypothetical protein KBVC1 I-E 5 KBVC1_CP019968_spacer_5 32 prophage KBTCE1/KBVC2prophage -1 12 100 7.12E-01 hypothetical protein KBVC1 I-E 5 KBVC1_CP019968_spacer_5 32 prophage KBTCE1/KBVC2prophage -1 12 100 7.12E-01 hypothetical protein KBVC1 I-E 5 KBVC1_CP019968_spacer_5 32 prophage KBTCE1/KBVC2prophage -5 12 100 7.12E-01 hypothetical protein KBVC1 I-E 5 KBVC1_CP019968_spacer_5 32 prophage KBTCE2prophage-2 prophage 12 100 7.12E-01 phage Clp protease- like protein 159

CRISPR spacer spacer Type of mobile D.mccartyi Mobile BLAST % Gene annotation of best strain FASTA name of spacer E-value type # length DNA which DNA name hit pairwise BLAST hit hit (bp) matched to spacer length identity KBVC1 I-E 5 KBVC1_CP019968_spacer_5 32 prophage KBTCE3-3 prophage 12 100 7.12E-01 phage Clp protease- KBVC1 I-E 6 KBVC1_CP019968_spacer_6 33 prophage 195 prophage 12 100 7.56E-01 likenon -pcodingrotein * KBVC1 I-E 6 KBVC1_CP019968_spacer_6 33 prophage 195 prophage 12 100 7.56E-01 major capsid protein KBVC1 I-E 6 KBVC1_CP019968_spacer_6 33 prophage 195 prophage 12 94.1 7.56E-01 phage recombinase KBVC1 I-E 6 KBVC1_CP019968_spacer_6 33 prophage KBTCE1/KBVC2-10 12 100 7.56E-01 non-coding KBVC1 I-E 7 KBVC1_CP019968_spacer_7 32 prophage CG1prophage prophage 32 90.6 2.18E-07 phage terminase * KBVC1 I-E 7 KBVC1_CP019968_spacer_7 32 prophage CG1 prophage 32 90.6 1.12E-05 terminase KBVC1 I-E 8 KBVC1_CP019968_spacer_8 32 no hits * KBVC1 I-E 9 KBVC1_CP019968_spacer_9 32 IME1 IME1-10 32 96.9 2.16E-08 hypothetical protein KBVC1 I-E 9 KBVC1_CP019968_spacer_9 32 IME1 IME1-8 32 100 5.07E-10 hypothetical protein * KBVC1 I-E 9 KBVC1_CP019968_spacer_9 32 IME1 IME1-13 32 100 5.07E-10 hypothetical protein KBVC1 I-E 9 KBVC1_CP019968_spacer_9 32 prophage KBTCE2-2 prophage 12 100 7.12E-01 hypothetical protein KBVC1 I-E 9 KBVC1_CP019968_spacer_9 32 prophage KBTCE3-3 prophage 12 100 7.12E-01 hypothetical protein KBVC1 I-E 10 KBVC1_CP019968_spacer_10 32 IME1 IME1-11 32 96.6 1.72E-08 hypothetical protein KBVC1 I-E 10 KBVC1_CP019968_spacer_10 32 IME1 IME1-8 32 100 4.05E-10 hypothetical protein * KBVC1 I-E 11 KBVC1_CP019968_spacer_11 32 prophage CG1 prophage 31 96.8 7.53E-08 KBVC1 I-E 11 KBVC1_CP019968_spacer_11 32 prophage KB1/cDCE-8 32 100 9.89E-12 hypothetical protein * KBVC1 I-E 12 KBVC1_CP019968_spacer_12 32 no hits prophage * KBVC1 I-E 13 KBVC1_CP019968_spacer_13 32 no hits * KBVC1 I-E 14 KBVC1_CP019968_spacer_14 32 no hits * KBVC1 I-E 15 KBVC1_CP019968_spacer_15 32 
prophage 195 prophage 12 100 7.12E-01 hypothetical protein KBVC1 I-E 15 KBVC1_CP019968_spacer_15 32 IME1 IME1-10 32 96.9 2.16E-08 hypothetical protein * KBVC1 I-E 15 KBVC1_CP019968_spacer_15 32 IME1 IME1-10 32 93.8 2.63E-07 hypothetical protein KBVC1 I-E 15 KBVC1_CP019968_spacer_15 32 IME1 IME1-8 32 93.8 2.63E-07 hypothetical protein KBVC1 I-E 15 KBVC1_CP019968_spacer_15 32 IME1 IME1-8 32 93.8 2.63E-07 hypothetical protein KBVC1 I-E 15 KBVC1_CP019968_spacer_15 32 IME1 IME1-8 32 93.8 2.63E-07 hypothetical protein KBVC1 I-E 15 KBVC1_CP019968_spacer_15 32 IME1 IME1-13 32 90.6 1.12E-05 phage integrase KBVC1 I-E 15 KBVC1_CP019968_spacer_15 32 IME-tceA IME-tceA 32 93.8 2.63E-07 hypothetical protein KBVC1 I-E 16 KBVC1_CP019968_spacer_16 32 IME1 IME1-10 32 96.6 1.72E-08 non-coding KBVC1 I-E 16 KBVC1_CP019968_spacer_16 32 IME1 IME1-8 32 96.6 1.72E-08 non-coding KBVC1 I-E 16 KBVC1_CP019968_spacer_16 32 IME1 IME1-8 32 93.8 2.10E-07 non-coding * KBVC1 I-E 17 KBVC1_CP019968_spacer_17 32 prophage BTF08-2 prophage 32 84.4 1.13E-04 hypothetical protein * KBVC1 I-E 17 KBVC1_CP019968_spacer_17 32 prophage KBTCE1/KBVC2-10 16 93.8 2.04E-01 hypothetical protein KBVC1 I-E 17 KBVC1_CP019968_spacer_17 32 prophage KBTCE1/KBVC2prophage -11 16 93.8 2.04E-01 hypothetical protein KBVC1 I-E 17 KBVC1_CP019968_spacer_17 32 prophage KBTCE1/KBVC2prophage -12 32 84.4 1.13E-04 hypothetical protein KBVC1 I-E 17 KBVC1_CP019968_spacer_17 32 prophage KBTCE1/KBVC2prophage -12 32 84.4 1.13E-04 phage terminase KBVC1 I-E 18 KBVC1_CP019968_spacer_18 33 prophage CG1prophage prophage 33 87.9 8.08E-07 major capsid protein *

CRISPR spacer spacer Type of mobile D.mccartyi Mobile BLAST % Gene annotation of best strain FASTA name of spacer E-value type # length DNA which DNA name hit pairwise BLAST hit hit (bp) matched to spacer length identity KBVC1 I-E 18 KBVC1_CP019968_spacer_18 33 prophage KBDCA1/KBDCA2-6 15 93.3 7.56E-01 major capsid protein KBVC1 I-E 18 KBVC1_CP019968_spacer_18 33 prophage KBTCE1/KBVC2prophage -1 15 93.3 7.56E-01 phage terminase KBVC1 I-E 18 KBVC1_CP019968_spacer_18 33 prophage KBTCE1/KBVC2prophage -1 15 93.3 7.56E-01 major capsid protein KBVC1 I-E 18 KBVC1_CP019968_spacer_18 33 prophage KBTCE1/KBVC2prophage -5 15 93.3 7.56E-01 major capsid protein KBVC1 I-E 19 KBVC1_CP019968_spacer_19 33 unknown GI BAV1prophage GI 22 90.9 7.70E-02 hypothetical protein * KBVC1 I-E 20 KBVC1_CP019968_spacer_20 32 prophage KBTCE2-2 prophage 12 100 7.12E-01 phage portal protein * KBVC1 I-E 20 KBVC1_CP019968_spacer_20 32 prophage KBTCE3-3 prophage 12 100 7.12E-01 phage portal protein KBVC1 I-E 21 KBVC1_CP019968_spacer_21 32 no hits * KBVC1 I-E 22 KBVC1_CP019968_spacer_22 32 prophage CG1 prophage 32 93.8 5.12E-09 phage terminase * KBVC1 I-E 22 KBVC1_CP019968_spacer_22 32 prophage CG1 prophage 32 93.8 2.63E-07 terminase KBVC1 I-E 22 KBVC1_CP019968_spacer_22 32 IME-vcrA IME-vcrA GT 13 100 5.08E-02 dsiB KBVC1 I-E 22 KBVC1_CP019968_spacer_22 32 IME-vcrA IME-vcrA KBVC1 13 100 5.08E-02 dsiB KBVC1 I-E 23 KBVC1_CP019968_spacer_23 32 no hits * KBVC1 I-E 24 KBVC1_CP019968_spacer_24 32 no hits * KBVC1 I-E 25 KBVC1_CP019968_spacer_25 32 no hits * KBVC1 I-E 26 KBVC1_CP019968_spacer_26 32 no hits * KBVC1 I-E 27 KBVC1_CP019968_spacer_27 32 prophage BTF08-2 prophage 28 85.7 3.94E-04 hypothetical protein * KBVC1 I-E 27 KBVC1_CP019968_spacer_27 32 prophage KBTCE1/KBVC2-12 28 85.7 3.94E-04 DNA polymerase I KBVC1 I-E 27 KBVC1_CP019968_spacer_27 32 prophage KBTCE1/KBVC2prophage -12 28 85.7 3.94E-04 DNA polymerase I KBVC1 I-E 27 KBVC1_CP019968_spacer_27 32 prophage KBTCE2prophage-2 prophage 15 93.3 7.12E-01 
hypothetical protein KBVC1 I-E 27 KBVC1_CP019968_spacer_27 32 prophage KBTCE3-3 prophage 15 93.3 7.12E-01 phage portal protein KBVC1 I-E 28 KBVC1_CP019968_spacer_28 32 no hits * KBVC1 I-E 29 KBVC1_CP019968_spacer_30 32 prophage KBTCE2-2 prophage 13 100 2.04E-01 phage integrase KBVC1 I-E 29 KBVC1_CP019968_spacer_29 32 no hits * KBVC1 I-E 30 KBVC1_CP019968_spacer_30 32 IME1 IME1-13 32 100 5.07E-10 hypothetical protein * KBVC1 I-E 30 KBVC1_CP019968_spacer_30 32 prophage KBTCE3-3 prophage 13 100 2.04E-01 phage integrase KBVC1 I-E 31 KBVC1_CP019968_spacer_31 32 no hits * KBVC1 I-E 32 KBVC1_CP019968_spacer_32 33 no hits * KBVC1 I-E 33 KBVC1_CP019968_spacer_33 32 IME-vcrA IME-vcrA GT 19 89.5 1.46E-02 dsiB * KBVC1 I-E 33 KBVC1_CP019968_spacer_33 32 IME-vcrA IME-vcrA KBVC1 19 89.5 1.46E-02 dsiB KBVC1 I-E 33 KBVC1_CP019968_spacer_33 32 IME-vcrA IME-vcrA WBC-2 19 89.5 1.46E-02 dsiB KBVC1 I-E 34 KBVC1_CP019968_spacer_34 32 no hits * KBVC1 I-E 35 KBVC1_CP019968_spacer_35 33 no hits * KBVC1 I-E 36 KBVC1_CP019968_spacer_36 33 IME1 IME1-13 33 100 1.59E-10 hypothetical protein * KBVC1 I-E 37 KBVC1_CP019968_spacer_37 32 no hits * KBVC1 I-E 38 KBVC1_CP019968_spacer_38 31 prophage KBTCE2-2 prophage 12 100 6.67E-01 hypothetical protein * KBVC1 I-E 38 KBVC1_CP019968_spacer_38 31 prophage KBTCE3-3 prophage 12 100 6.67E-01 hypothetical protein

KBVC1 I-E 39 KBVC1_CP019968_spacer_39 32 no hits *
KBVC1 I-E 40 KBVC1_CP019968_spacer_40 32 no hits *
KBVC1 I-E 41 KBVC1_CP019968_spacer_41 34 no hits *

Table C4. Putative annotation of open reading frames (ORFs) found in Dehalococcoides mccartyi prophages and Enterobacteria lambda and HK97 phages. Annotations were generated using the Phage Annotation Toolkit and NCBI database. Annotations are abbreviated by two letters and listed in full length above lambda phage, and below in Table C4A if not found in lambda phage. ORFs which could not be annotated are shown as white boxes with number indicating number of un-annotated open reading frames. Classification is based on protein similarity. All nucleotide sequences for phages are available in NCBI.

Table C5. Integrative Mobilizable Elements (IMEs) found targeted by the CRISPR systems of D. mccartyi KBVC1 and KBDCA3. Nucleotide sequences available from NCBI. Dmc – Dehalococcoides mccartyi.

| D. mccartyi host strain | Circularized? | Evidence for circularization | Type | Name |
|---|---|---|---|---|
| D. mccartyi contigs | yes | Illumina sequencing: mate-pair and paired-end | IME1 | Dmc IME1-1 |
| D. mccartyi contigs | yes | Illumina sequencing: mate-pair and paired-end | IME1 | Dmc IME1-2 |
| D. mccartyi contigs | | | IME1 | Dmc IME1-3 |
| D. mccartyi contigs | | | IME1 | Dmc IME1-4 |
| D. mccartyi contigs | | | IME1 | Dmc IME1-5 |
| D. mccartyi contigs | | | IME1 | Dmc IME1-6 |
| D. mccartyi contigs | | | IME1 | Dmc IME1-7 |
| KBTCE2 | | | IME1 | Dmc IME1-8 |
| KBVC1 | yes | Illumina sequencing: mate-pair and paired-end | IME1 | Dmc IME1-9 |
| KBDCA1 | | | IME1 | Dmc IME1-10 |
| KBDCA1 | | | IME1 | Dmc IME1-11 |
| KBDCA2 | | | IME1 | Dmc IME1-12 |
| WBC-2 | | | IME1 | Dmc IME1-13 |
| BAV1 | | | IME1 | Dmc IME1-14 |
| 195 | yes | published as tandem repeat | IME1 | Dmc IME1-15 |
| 11a5 | | | IME2 | Dmc IME2 |
| BAV1 | | | rdhA containing IME | Dmc IME-OG24 |
| DCMB5 | | | rdhA containing IME | Dmc IME-OG49 |
| KBVC1, KBVC2, KBTCE1, WBC-2 | yes | 1) Illumina sequencing: mate-pair and paired-end 2) qPCR plus PCR | rdhA containing IME | Dmc IME-vcrA |
| 195, KBTCE2, KBTCE3 | | sequence alignments indicate mobilization in last 100 years | rdhA containing IME | Dmc IME-tceA |

Table C6. IME1-like constructs found in other bacteria whose genomes are available in NCBI. MP – mega plasmid.

Columns, in row order: Phylum; Class; Species; NCBI Accession; # plasmids published on NCBI; # IME1-like constructs in genome; is IME1-like construct adjacent to prophage; CRISPR type if present; do CRISPR spacers have matches to IME1-like sequence?

Proteo. Betaproteo. Nitrosomonas communis NZ_CP011451.1 0 1 no

Proteo. Betaproteo. Paucibacter sp. KCTC NZ_CP013692.1 0 1 yes

Proteo. Betaproteo. Ralstonia solanacearum FJAT-1458 NZ_CP016554.1 1 MP 3 no

Proteo. Betaproteo. Ralstonia solanacearum FJAT-91 NZ_CP016612.1 1 MP 4 no

Proteo. Betaproteo. Ralstonia solanacearum CMR15 NC_017559.1 1+ 1 MP 1 no

Proteo. Betaproteo. Ralstonia solanacearum p082 NC_017574.1 1 MP 1 yes CAS-I-E yes

Proteo. Betaproteo. Ralstonia solanacearum pS107 NC_014311.1 1 MP 1 yes

Proteo. Betaproteo. Ralstonia solanacearum GMI1000 NC_003295.1 1 MP 2 yes

Proteo. Betaproteo. Ralstonia solanacearum OE1-1 NZ_CP009764.1 1 MP 1 yes

Proteo. Betaproteo. Hydrogenophaga sp. PBC NZ_CP017311.1 0 2 tandem duplicate

Proteo. Betaproteo. Rugosibacter aromaticivorans CP010554.1 0 1 yes

Proteo. Betaproteo. Janthinobacterium marseille NC_009659.1 0 1 yes

Proteo. Betaproteo. Acidovorax ebreus NC_011992.1 0 1 yes

Proteo. Betaproteo. Burkholderiales bacterium GJ-E10 NZ_AP014683.1 0 1 yes

Proteo. Betaproteo. Comamonadaceae bacterium B1 NZ_AP014569.1 1 1 yes

Proteo. Betaproteo. Cupriavidus gilardii NZ_CP010516.1 0 1 yes

Proteo. Betaproteo. Polyangium brachysporum NZ_CP011371.1 0 1 yes

Proteo. Betaproteo. Variovorax sp. HW608 NZ_LT607803.1 0 1 yes

Proteo. Betaproteo. Alicycliphilus denitrificans K601 NC_015422.1 1 1 yes CAS-II-C no

Proteo. Betaproteo. Burkholderia vietnamiensis G4 CP000614.1 5 1 yes

Proteo. Gammaproteo. Hahella chejuensis NC_007645.1 0 1 yes CAS-I-C no

Proteo. Gammaproteo. Stenotrophomonas maltophilia K279a NC_010943.1 2 1 yes

Proteo. Gammaproteo. Lysobacter gummosus NZ_CP011131.1 0 1 yes

Proteo. Gammaproteo. Xanthomonas albilineans NC_013722.1 3 1 yes CAS-I-C yes

Proteo. Deltaproteo. Desulfovibrio vulgaris DP4 NC_008751.1 1 1 yes and I-F

Proteo. Deltaproteo. Desulfarculus baarsii NC_014365.1 0 1 yes CAS-I-C no

Proteo. Deltaproteo. Geobacter sulfurreducens AM-1 NZ_CP010430.1 0 1 yes andCAS III-I-DE no

Proteo. Acidithiobacillia Acidithiobacillus caldus ATCC 51756 NZ_CP005986.1 2 + 1 MP 1 yes CASand- IVI-U-A no

Nitro. Nitrospira Nitrospira defluvii NC_014355.1 0 1 yes CAS-I-E no

Table C7. Quantitative PCR (qPCR) raw data used to make Figure 4-12 & 4-13. Dhc - Dehalococcoides mccartyi. DNA extracts from 2 mL culture. Raw data are the mean starting quantity (copies/uL), or the range if from 2 qPCR reactions; concentration in culture is in copies/mL culture; the vcrA/Dhc ratio is calculated.

| Sub-culture | Day | See Figure | Number of qPCR reactions, vcrA | Number of qPCR reactions, Dhc | vcrA (copies/uL) | Dhc 16S rRNA (copies/uL) | vcrA (copies/mL culture) | Dhc 16S rRNA (copies/mL culture) | vcrA/Dhc |
|---|---|---|---|---|---|---|---|---|---|
| KB1/VC-ME b | 0 | 4-12 | 3 | 2 | 3.98E+06 | 3.33E+06 | 1.99E+09 | 1.67E+09 | 1.2 |
| KB1/VC-ME b | 2 | 4-12 | 2 | 3 | 3.06E+06 | 2.54E+06 | 1.53E+09 | 1.27E+09 | 1.2 |
| KB1/VC-ME b | 5 | 4-12 | 3 | 2 | 4.52E+06 | 3.35E+06 | 2.26E+09 | 1.67E+09 | 1.3 |
| KB1/VC-ME b | 7 | 4-12 | 3 | 3 | 4.33E+06 | 3.64E+06 | 2.17E+09 | 1.82E+09 | 1.2 |
| KB1/VC-ME b | 9 | 4-12 | 2 | 3 | 4.47E+06 | 3.62E+06 | 2.23E+09 | 1.81E+09 | 1.2 |
| KB1/VC-ME b | 12 | 4-12 | 2 | 3 | 3.49E+06 | 2.97E+06 | 1.75E+09 | 1.48E+09 | 1.2 |
| KB1/VC-ME b | 16 | 4-12 | 3 | 3 | 3.76E+06 | 3.20E+06 | 1.88E+09 | 1.60E+09 | 1.2 |
| KB1/VC-ME b | 19 | 4-12 | 3 | 2 | 3.99E+06 | 3.37E+06 | 2.00E+09 | 1.69E+09 | 1.2 |
| KB1/VC-ME b | 21 | 4-12 | 3 | 2 | 3.72E+06 | 3.19E+06 | 1.86E+09 | 1.59E+09 | 1.2 |
| KB1/VC-ME b | 23 | 4-12 | 3 | 2 | 6.28E+06 | 3.31E+06 | 3.14E+09 | 1.66E+09 | 1.9 |
| KB1/VC-ME b | 26 | 4-12 | 2 | 2 | 7.32E+06 | 3.49E+06 | 3.66E+09 | 1.75E+09 | 2.1 |
| KB1/VC-ME b | 30 | 4-12 | 3 | 3 | 5.64E+06 | 3.39E+06 | 2.82E+09 | 1.70E+09 | 1.7 |
| KB1/VC-ME b | 35 | 4-12 | 2 | 2 | 5.30E+06 | 3.32E+06 | 2.65E+09 | 1.66E+09 | 1.6 |
| KB1/VC-ME b | 37 | 4-12 | 3 | 3 | 5.68E+06 | 3.79E+06 | 2.84E+09 | 1.90E+09 | 1.5 |
| KB1/VC-ME b | 40 | 4-12 | 3 | 3 | 4.50E+06 | 3.46E+06 | 2.25E+09 | 1.73E+09 | 1.3 |
| KB1/VC-ME c | 0 | 4-13(b) | 3 | 3 | 4.76E+06 | 3.82E+06 | 2.38E+09 | 1.91E+09 | 1.2 |
| KB1/VC-ME c | 2 | 4-13(b) | 3 | 3 | 4.72E+06 | 3.65E+06 | 2.36E+09 | 1.82E+09 | 1.3 |
| KB1/VC-ME c | 5 | 4-13(b) | 3 | 2 | 5.25E+06 | 5.08E+06 | 2.62E+09 | 2.54E+09 | 1.0 |
| KB1/VC-ME c | 7 | 4-13(b) | 3 | 2 | 6.06E+06 | 5.91E+06 | 3.03E+09 | 2.96E+09 | 1.0 |
| KB1/VC-ME c | 9 | 4-13(b) | 3 | 3 | 4.94E+06 | 5.08E+06 | 2.47E+09 | 2.54E+09 | 1.0 |
| KB1/VC-ME c | 12 | 4-13(b) | 3 | 2 | 4.76E+06 | 3.70E+06 | 2.38E+09 | 1.85E+09 | 1.3 |
| KB1/VC-ME c | 16 | 4-13(b) | 3 | 3 | 4.57E+06 | 3.83E+06 | 2.28E+09 | 1.91E+09 | 1.2 |
| KB1/VC-ME c | 19 | 4-13(b) | 3 | 3 | 2.75E+06 | 2.76E+06 | 1.37E+09 | 1.38E+09 | 1.0 |

Table C7 (continued).

| Sub-culture | Day | See Figure | Number of qPCR reactions, vcrA | Number of qPCR reactions, Dhc | vcrA (copies/uL) | Dhc 16S rRNA (copies/uL) | vcrA (copies/mL culture) | Dhc 16S rRNA (copies/mL culture) | vcrA/Dhc |
|---|---|---|---|---|---|---|---|---|---|
| KB1/VC-ME c | 21 | 4-13(b) | 3 | 3 | 4.17E+06 | 2.37E+06 | 2.09E+09 | 1.19E+09 | 1.8 |
| KB1/VC-ME c | 23 | 4-13(b) | 3 | 3 | 5.48E+06 | 3.49E+06 | 2.74E+09 | 1.74E+09 | 1.6 |
| KB1/VC-ME c | 26 | 4-13(b) | 3 | 3 | 5.85E+06 | 3.58E+06 | 2.93E+09 | 1.79E+09 | 1.6 |
| KB1/VC-ME c | 30 | 4-13(b) | 3 | 3 | 6.91E+06 | 4.26E+06 | 3.45E+09 | 2.13E+09 | 1.6 |
| KB1/VC-ME c | 35 | 4-13(b) | 3 | 3 | 5.08E+06 | 3.35E+06 | 2.54E+09 | 1.68E+09 | 1.5 |
| KB1/VC-ME c | 37 | 4-13(b) | 3 | 3 | 3.99E+06 | 2.46E+06 | 1.99E+09 | 1.23E+09 | 1.6 |
| KB1/VC-ME c | 40 | 4-13(b) | 3 | 3 | 5.10E+06 | 3.50E+06 | 2.55E+09 | 1.75E+09 | 1.5 |
| KB1/VC-ME a | 0 | 4-13(a) | 3 | 3 | 3.26E+06 | 2.71E+06 | 1.63E+09 | 1.36E+09 | 1.2 |
| KB1/VC-ME a | 2 | 4-13(a) | 3 | 2 | 3.36E+06 | 2.72E+06 | 1.68E+09 | 1.36E+09 | 1.2 |
| KB1/VC-ME a | 5 | 4-13(a) | 3 | 3 | 4.56E+06 | 2.87E+06 | 2.28E+09 | 1.44E+09 | 1.6 |
| KB1/VC-ME a | 7 | 4-13(a) | 3 | 2 | 5.24E+06 | 2.81E+06 | 2.62E+09 | 1.40E+09 | 1.9 |
| KB1/VC-ME a | 9 | 4-13(a) | 3 | 2 | 4.63E+06 | 2.94E+06 | 2.31E+09 | 1.47E+09 | 1.6 |
| KB1/VC-ME a | 12 | 4-13(a) | 3 | 3 | 3.19E+06 | 2.58E+06 | 1.59E+09 | 1.29E+09 | 1.2 |
| KB1/VC-ME a | 16 | 4-13(a) | 3 | 2 | 3.13E+06 | 2.24E+06 | 1.56E+09 | 1.12E+09 | 1.4 |
| KB1/VC-ME a | 19 | 4-13(a) | 3 | 2 | 3.16E+06 | 2.14E+06 | 1.58E+09 | 1.07E+09 | 1.5 |
| KB1/VC-ME a | 21 | 4-13(a) | 3 | 3 | 2.74E+06 | 1.86E+06 | 1.37E+09 | 9.32E+08 | 1.5 |
| KB1/VC-ME a | 23 | 4-13(a) | 3 | 3 | 3.13E+06 | 2.09E+06 | 1.56E+09 | 1.05E+09 | 1.5 |
| KB1/VC-ME a | 30 | 4-13(a) | 3 | 3 | 5.08E+06 | 2.24E+06 | 2.54E+09 | 1.12E+09 | 2.3 |
| KB1/VC-ME a | 35 | 4-13(a) | 3 | 2 | 3.26E+06 | 2.49E+06 | 1.63E+09 | 1.25E+09 | 1.3 |
| KB1/VC-ME a | 40 | 4-13(a) | 3 | 2 | 3.44E+06 | 3.06E+06 | 1.72E+09 | 1.53E+09 | 1.1 |
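
The derived columns of Table C7 can be reproduced from the raw starting quantities. The following is a minimal sketch, not the procedure used in the thesis: the function names are hypothetical, and the factor of 500 between copies/uL and copies/mL culture is an assumption inferred from the tabulated values (for example 3.98E+06 copies/uL is listed as 1.99E+09 copies/mL culture), since the elution volume is not stated in the table.

```python
# Assumption: factor inferred from the tabulated values in Table C7
# (3.98E+06 copies/uL is listed as 1.99E+09 copies/mL culture).
# The elution volume and any dilution are not stated in the table.
UL_TO_ML_CULTURE_FACTOR = 500.0


def copies_per_ml_culture(copies_per_ul, factor=UL_TO_ML_CULTURE_FACTOR):
    """Scale a mean starting quantity (copies/uL) to copies/mL culture."""
    return copies_per_ul * factor


def vcra_to_dhc_ratio(vcra_copies, dhc_copies):
    """Ratio of vcrA copies to Dhc 16S rRNA gene copies."""
    if dhc_copies <= 0:
        raise ValueError("Dhc 16S rRNA copy number must be positive")
    return vcra_copies / dhc_copies


# Three rows copied from Table C7:
# (sub-culture, day, vcrA copies/uL, Dhc 16S rRNA copies/uL)
rows = [
    ("b", 0, 3.98e6, 3.33e6),
    ("b", 26, 7.32e6, 3.49e6),
    ("a", 30, 5.08e6, 2.24e6),
]

for sub_culture, day, vcra, dhc in rows:
    ratio = vcra_to_dhc_ratio(vcra, dhc)
    print(
        f"KB1/VC-ME {sub_culture} day {day}: "
        f"vcrA {copies_per_ml_culture(vcra):.2E} copies/mL culture, "
        f"vcrA/Dhc {ratio:.1f}"
    )
```

Rounded to one decimal place, the three example rows give ratios of 1.2, 2.1 and 2.3, which agree with the tabulated vcrA/Dhc values for those rows.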

Table C8. Details of standard curves generated for qPCR including slope, efficiency, R², and y-intercepts. Dhc - Dehalococcoides mccartyi

Run # Samples quantified gene target slope y-intercept efficiency (%) R²

1 KB1/VC-ME a day 0-12 Dhc 16S rRNA -3.905 38.2 80.3 0.997

1 KB1/VC-ME a day 0-12 vcrA -3.910 39.9 80.2 0.998

2 KB1/VC-ME a day 16-40 Dhc 16S rRNA -3.918 39.0 80.0 0.999

2 KB1/VC-ME a day 16-40 vcrA -3.908 39.7 80.3 0.993

3 KB1/VC-ME b day 0-40 Dhc 16S rRNA -3.916 41.4 80.0 0.998

3 KB1/VC-ME b day 0-40 vcrA -3.907 43.4 80.3 1.000

4 KB1/VC-ME c day 0-16 Dhc 16S rRNA -3.841 39.5 80.7 1.000

4 KB1/VC-ME c day 0-16 vcrA -3.917 40.3 80.0 0.999

5 KB1/VC-ME c day 19-40 Dhc 16S rRNA -3.912 40.7 80.1 0.996

5 KB1/VC-ME c day 19-40 vcrA -3.741 42.6 85.0 0.996
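
The slope and y-intercept in Table C8 define a standard curve of the usual form Cq = slope × log10(starting quantity) + y-intercept, and the efficiency column follows from the slope as (10^(-1/slope) - 1) × 100. The sketch below illustrates those two standard relationships; the function names are hypothetical, and it is an illustration under those assumptions rather than the software actually used to generate the table.

```python
def amplification_efficiency_percent(slope):
    """Amplification efficiency (%) implied by a standard-curve slope.

    Uses the standard relationship E = 10**(-1/slope) - 1.
    """
    if slope >= 0:
        raise ValueError("a qPCR standard-curve slope is expected to be negative")
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def starting_quantity(cq, slope, y_intercept):
    """Invert Cq = slope * log10(N0) + y_intercept to recover N0."""
    return 10 ** ((cq - y_intercept) / slope)


# Slope and y-intercept for Run 1, Dhc 16S rRNA, copied from Table C8.
run1_slope = -3.905
run1_intercept = 38.2

print(f"efficiency: {amplification_efficiency_percent(run1_slope):.1f} %")

# Hypothetical Cq value, chosen only to illustrate the inversion.
example_cq = 14.77
print(f"starting quantity: {starting_quantity(example_cq, run1_slope, run1_intercept):.2E}")
```

For the Run 1 slope of -3.905 the efficiency evaluates to about 80.3 %, in line with the tabulated value.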

Figure C1 (a-c). Gel photos of PCR amplified CRISPR I-E and I-C arrays from different KB-1 enrichment cultures over time.

Figure C1a. Agarose gel showing PCR amplification product of CRISPR I-E array from three KB-1 enrichment cultures over time.

Figure C1b. Agarose gel showing PCR amplification product of CRISPR I-E (first 9 lanes) and I-C (last two lanes) arrays over time.

Figure C1c. Agarose gel showing PCR amplification product of CRISPR I-C array from two KB-1 enrichment cultures over time.

Appendix D: Supplemental Information for Chapter 5

Changing donor from 5x hydrogen to 5x methanol with 3.5x ethanol increases methane production and dechlorination in the KB-1/VC-H2 progeny cultures.

The parent KB-1/VC-H2 culture is known to dechlorinate slowly and to support a smaller Dehalococcoides mccartyi population than KB-1 enrichment cultures maintained on 5x methanol. The experimental cultures were therefore switched from 5x H2 to 5x methanol with 3.5x ethanol. Faster dechlorination was observed after changing the donor (Figure D1). Without changing the donor, study of these enrichment cultures would be difficult because of the time required to complete one dechlorination cycle.

[Figure D1 plot, Ha culture: ethene, methane and VC (mmol/bottle) versus date (axis ticks 1 to 81). Annotated events: feed 0.07 mmol VC and 5x H2; add 5x MeOH and 3.5x EtOH; purge and re-feed 0.07 mmol VC and 5x MeOH and 3.5x EtOH.]

Figure D1. Dechlorination profile of the Ha culture, first amended with 0.07 mmol VC (17 mg/l) and H2 at 5x the electron equivalents required for complete dechlorination, then changed after sixty days to 5x MeOH with 3.5x EtOH.

Initial tracking of the number of copies of vcrA and D. mccartyi 16S rRNA during a dechlorination cycle

The WBC-2-VC parent culture was tracked over a dechlorination cycle to determine whether the ratio of vcrA to D. mccartyi 16S rRNA gene copies could shift during cell (and genome) duplication, given that the vcrA gene is approximately 100 kbp from the origin of replication whereas the 16S rRNA gene is 450 kbp from the origin. During a dechlorination cycle the ratio of vcrA to D. mccartyi 16S rRNA remained 1:1 (Figure D2).

[Figure D2 plots. Left panel: VC (mmol/culture) versus day (0 to 10), with two feedings of 16 mg/l VC marked. Right panel: vcrA and Dhc 16S rRNA (copies/mL culture) versus day (0 to 10); data labels read 1.1, 1.1, 1.0, 1.1 and 1.1.]

Figure D2. Left: Dechlorination of vinyl chloride (VC) by WBC-2 VC parent enrichment culture. Arrows indicate DNA sampling points. Right: The number of copies of vcrA and D. mccartyi (Dhc) 16S rRNA as determined by qPCR.

Appendix E: Supplemental Information for Chapter 6

Figure E1. Image of the blue native-PAGE gels showing molecular weight ladder and first sample lane. Remaining sample lanes were not stained, but instead cut and used in activity assays. In the tables on the right, total dechlorination products produced (nmol) are shown per band for tDCE and VC activity assays. A) WBC-2 tDCE enrichment culture tDCE/EL_2011_A. B) WBC-2 VC enrichment culture VC/EL_2010.


Figure E2. Genomic orientation of tdrA and tdrB genes in Dehalogenimonas in the 5’-3’ direction.

Approximate binding location of tdrA primers is indicated on black line below image. Characteristic RDase features such as the twin-arginine export sequence, two iron-sulfur clusters and B gene trans-membrane helix (as predicted by tmap (EMBOSS tools)) are indicated. Putative domains as derived from Sulfurospirillum multivorans PceA alignment are also indicated.


Figure E3. Maximum likelihood phylogenetic tree of two hundred and twenty-five reductive dehalogenase amino acid sequences, A subunit only.

Coloring and numbering show different ortholog groups (greater than 90% amino acid identity). Dehalogenimonas WBC-2 RDases are in green, with a star marking those detected during LC-MS/MS analysis; D. lykanthroporepellens RDases are in dark green; the remaining RDases belong to various Dehalococcoides mccartyi strains. Red stars indicate known function. The scale bar shows the amount of genetic change. Branching is based on 100 bootstrap replicates.


Table E1. qPCR primers used in this study

| Target and Accession | Primer Name | Primer Sequence (5’ to 3’) | Tm (°C) | Source or Reference |
|---|---|---|---|---|
| Dehalobacter | Dhb 477F | GATTGACGGTACCTAACGAGG | 62.5 | 1 |
| | Dhb 647R | TACAGTTTCCAATGCTTTACG | | |
| Dehalococcoides | Dhc1F | GATGAACGCTAGCGGCG | 60 | 2 |
| | Dhc264R | CCTCTCAGACCAGCTACCGATCGAA | | |
| Dehalogenimonas | Dhg273F | TAGCTCCCGGTCGCCCG | 59 | 3 |
| | Dhg537R | CCTCACCAGGGTTTGACATGTTAGAAG | | |
| tdrA AKG53095.1 | TdrA1404F | GCCTCTCGCCCCCACTAAACC | 62.5 | This study |
| | TdrA1516R | GCCATCCTTCATAACCACTCACGCA | | |
| vcrA AEI59432.1 | VcrA 670F | GCCCTCCAGATGCTCCCTTTAC | 60 | This study |
| | VcrA 440R | TGCCCTTCCTCACCACTACCAG | | |
| bvcA AGY78804.1 | BvcA 318F | ATTTAGCGTGGGCAAAACAG | 60 | 4 |
| | BvcA 555R | CCTTCCCACCTTGGGTATTT | | |

Table E2. Initial assays on CFEs to investigate substrate range. In an activity survey of different chlorinated ethanes and ethenes, cell-free extracts (CFEs) from cultures containing both Dehalococcoides and Dehalogenimonas showed higher activity on tDCE than extracts from cultures containing only Dehalococcoides. Protein-normalized activity (nmol·min-1·mg protein-1) is shown, as well as the percentage of the maximum dechlorination activity observed. Each culture was sampled three times and protein extraction was conducted in parallel. Protein-normalized activity is reported as the average of the three runs with standard deviation, except where specified. Relative rates were normalized to 1,1-DCE. Shading indicates <10% of 1,1-DCE activity.

| Substrate | tDCE1/EL_2010 extract (Dehalogenimonas and Dehalococcoides): nmol·min-1·mg protein-1 | % of 1,1-DCE | VC/EL_2010 extract (Dehalococcoides only): nmol·min-1·mg protein-1 | % of 1,1-DCE |
|---|---|---|---|---|
| PCE | 0.0 ± 0.0 | 0% | 0.1 ± 0.0 | 1% |
| TCE | 4.1 ± 0.7 | 40% | 2.3 ± 0.2 | 23% |
| cis-DCE | 9.4 ± 2.3 | 91% | 15 ± 0.3 | 150% |
| tDCE | 1.8 ± 0.3 | 17% | 0.1 ± 0.0 | 1% |
| 1,1-DCE | 10 ± 1.9 | 100% | *10 ± 0.1 | 100% |
| VC | 1.3 ± 0.3 | 13% | 4.4 ± 0.2 | 43% |
| 1,1,2,2-TeCA | 1.0 ± 0.3 | 10% | 0.9 ± 0.4 | 9% |
| 1,1,2-TCA | 0.2 ± 0.0 | 2% | 0.2 ± 0.0 | 2% |
| 1,1,1-TCA | 0.1 ± 0.1 | 1% | 0.2 ± 0.1 | 2% |
| 1,2-DCA | 3.2 ± 1.2 | 31% | 1.3 ± 0.2 | 13% |
| 1,1-DCA | 0.0 ± 0.3 | 0% | 0.0 ± 0.0 | 0% |

* Range of two samples. Shaded values are close to detection limits.


Table E3. Calibration curves generated from qPCR runs using SsoFast EvaGreen qPCR reagent. E = efficiency of the reaction. The standard curve equations relate x = log of starting amount to y = CT (threshold cycle; the cycle at which amplified products can be detected).

| Target | Standard Curve Equation | r2 | E (%) |
|---|---|---|---|
| Dehalobacter 16S rRNA | y = -3.262x + 35.345 | 0.99 | 102.6 |
| Dehalococcoides 16S rRNA | y = -3.201x + 34.713 | 0.99 | 105.3 |
| Dehalogenimonas 16S rRNA | y = -3.492x + 39.099 | 0.99 | 93.4 |
| tdrA | y = -3.463x + 40.744 | 0.99 | 94.4 |
| vcrA | y = -3.318x + 39.082 | 0.99 | 100.2 |
| bvcA | y = -3.191x + 39.376 | 0.99 | 105.8 |
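The efficiencies in Table E3 are consistent with the commonly used relation E = 10^(-1/slope) - 1, and a standard curve can be inverted to estimate the starting amount from a measured CT. The sketch below illustrates both steps. The function names are hypothetical, the logarithm is assumed to be base 10, and the units of the starting amount are whatever the standards were expressed in, which Table E3 does not state.

```python
def efficiency_percent(slope):
    """Amplification efficiency (%) from a standard-curve slope.

    A slope of about -3.32 corresponds to perfect doubling per cycle (100%).
    """
    return (10 ** (-1 / slope) - 1) * 100


def starting_amount(ct, slope, intercept):
    """Invert y = slope * x + intercept, where x = log10(starting amount)."""
    return 10 ** ((ct - intercept) / slope)


# (slope, intercept) pairs taken from Table E3.
CURVES = {
    "Dehalobacter 16S rRNA": (-3.262, 35.345),
    "Dehalococcoides 16S rRNA": (-3.201, 34.713),
    "Dehalogenimonas 16S rRNA": (-3.492, 39.099),
    "tdrA": (-3.463, 40.744),
    "vcrA": (-3.318, 39.082),
    "bvcA": (-3.191, 39.376),
}

if __name__ == "__main__":
    for target, (slope, intercept) in CURVES.items():
        print(f"{target}: E = {efficiency_percent(slope):.1f}%")
    # The CT of 20.0 is a hypothetical reading, used only to show the inversion.
    slope, intercept = CURVES["vcrA"]
    print(f"vcrA at CT 20.0: {starting_amount(20.0, slope, intercept):.3g}")
```

Recomputing E from the slopes this way is a quick check that a transcribed standard curve and its reported efficiency agree.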

Appendix F: List of co-authored publications

Liang X, Molenda O, Tang S, Edwards EA. 2015. Identity and substrate-specificity of reductive dehalogenases expressed in Dehalococcoides-containing enrichment cultures maintained on different chlorinated ethenes. Appl Environ Microbiol 81:4626-4633.

Mayer-Blackwell K, Fincker M, Molenda O, Callahan B, Sewell H, Holmes S, Edwards EA, Spormann AM. 2016. 1,2-Dichloroethane Exposure Alters the Population Structure, Metabolism, and Kinetics of a Trichloroethene-Dechlorinating Dehalococcoides mccartyi Consortium. Environ Sci Technol 50:12187-12196.

Schnurr PJ, Molenda O, Edwards E, Espie GS, Allen DG. 2016. Improved biomass productivity in algal biofilms through synergistic interactions between photon flux density and carbon dioxide concentration. Bioresour Technol 219:72-79.

Kocur CMD, Lomheim L, Molenda O, Weber KP, Austrins LM, Sleep BE, Boparai HK, Edwards EA, O’Carroll DM. 2016. Long-Term Field Study of Microbial Community and Dechlorinating Activity Following Carboxymethyl Cellulose-Stabilized Nanoscale Zero-Valent Iron Injection. Environ Sci Technol 50:7658-7670.

Yan J, Bi M, Bourdon AK, Farmer AT, Wang P-H, Molenda O, Quaile A, Jiang N, Yang Y, Yin Y, Simsir B, Campagna SR, Edwards EA, Loeffler FE. 2018. Purinyl-cobamide is a native prosthetic group of reductive dehalogenases. Nat Chem Biol 14:8-14.

Puentes Jacome LA, Wang P, Molenda O, Li J, Krivushin K, Islam MA, Edwards EA. 2018. Sustainable and complete chloroethene dechlorination in Dehalococcoides-enriched cultures grown without exogenous vitamin amendments at varying pH conditions. Environ Sci Technol in prep.

Appendix References

1. Hendrickson ER, Payne JA, Young RM, Starr MG, Perry MP, Fahnestock S, Ellis DE, Ebersole RC. 2002. Molecular analysis of Dehalococcoides 16S Ribosomal DNA from chloroethene-contaminated sites throughout North America and Europe. Appl Environ Microbiol 68:485-495.

2. Manchester M, Hug LA, Zarek M, Zila A, Edwards EA. 2012. Discovery of a trans-dichloroethene-respiring Dehalogenimonas species in the 1,1,2,2-tetrachloroethane-dechlorinating WBC-2 consortium. Appl Environ Microbiol 78:5280-5287.

3. Grostern A, Edwards EA. 2006. Growth of Dehalobacter and Dehalococcoides spp. during degradation of chlorinated ethanes. Appl Environ Microbiol 72:428-436.

4. Waller AS, Krajmalnik-Brown R, Löffler FE, Edwards EA. 2005. Multiple reductive-dehalogenase-homologous genes are simultaneously transcribed during dechlorination by Dehalococcoides-containing cultures. Appl Environ Microbiol 71:8257-8264.
