INVESTIGATION OF DEHALOBACTER-CONTAINING CULTURES THAT REDUCTIVELY DECHLORINATE CHLOROFORM AND 1,1,1-TRICHLOROETHANE

by

Shuiquan Tang

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Chemical Engineering and Applied Chemistry University of Toronto

© Copyright by Shuiquan Tang 2014

INVESTIGATION OF DEHALOBACTER-CONTAINING CULTURES THAT REDUCTIVELY DECHLORINATE CHLOROFORM AND 1,1,1-TRICHLOROETHANE

Shuiquan Tang

Doctor of Philosophy

Chemical Engineering and Applied Chemistry University of Toronto

2014

Abstract

ACT-3 is an enrichment culture derived from aquifer material collected from a contaminated site in the northeastern United States. ACT-3 is currently used as a bioaugmentation culture to detoxify groundwater at sites contaminated by 1,1,1-trichloroethane (1,1,1-TCA) and chloroform (CF). This thesis aims to improve the understanding of this culture and its dominant dechlorinating organisms, Dehalobacter spp., which are commonly involved in organohalide respiration but thus far poorly understood. This research contributes to the expanding application of for site remediation.

ACT-3 can dechlorinate 1,1,1-TCA to monochloroethane (CA) via 1,1-dichloroethane (1,1-DCA), and CF to dichloromethane (DCM). By partially separating reductive dehalogenases (RDases) using blue native polyacrylamide gel electrophoresis, followed by enzymatic assays for dechlorination and peptide sequencing with liquid chromatography tandem mass spectrometry, two novel RDases were found to be co-expressed in ACT-3: CfrA, which dechlorinates 1,1,1-TCA to 1,1-DCA and CF to DCM, and DcrA, which dechlorinates 1,1-DCA to CA. The identification of these two RDases indicated the potential coexistence of two Dehalobacter strains in ACT-3. This was later confirmed when two complete Dehalobacter genomes were assembled from metagenomic sequences of ACT-3 and a CF-amended subculture. The coexistence of these two highly similar Dehalobacter genomes in the ACT-3 metagenome resulted in severe fragmentation in the preliminary assembly. An in silico gap-resolution method was developed, enabling assembly of Dehalobacter sequences from ACT-3 into a hybrid genome from two strains. By comparing this hybrid genome with the sequencing data from the CF subculture, which has only one of the two Dehalobacter strains, two separate and complete Dehalobacter genomes were obtained. One of the two genomes harbours the gene encoding CfrA and the other harbours the gene encoding DcrA. The analysis of these two genomes has significantly improved our understanding of Dehalobacter. In addition, two distinct Dehalobacter strains were successfully isolated from ACT-3 after a series of dilution-to-extinction transfers using media amended with sterile mixed culture supernatant collected from ACT-3. Finally, the complete genome of a CF- and 1,1,1-TCA-tolerant fermenting Bacteroidales strain was also assembled from metagenomic sequences.

Acknowledgments

Before joining the Edwards lab, my self-confidence was at the lowest point of my life. The research described in this thesis has restored my self-confidence. I have to thank lots of people who have helped me to accomplish this work; in the following, I will highlight some.

First of all, I have to thank my supervisor, Elizabeth Edwards. There is no one better placed than I to appreciate the importance of having a good supervisor for Ph.D. work. “This time, you are in good luck.” This is what I told myself and what I told new students in the lab. I wished I could copy all of her shining traits and paste them into mine. Although it is impossible to learn everything from her, I have definitely learned a lot. I especially appreciate her knowledge, her compassion for everyone, her passion for science, her patience, and her way of empowering people. Every time I discovered something exciting in my research I couldn’t wait to share it with her because she loves science. Every time I felt frustrated with difficulties in my research, she motivated me and offered endless ideas. “Recharged” was what I felt after research meetings with her.

Before this thesis, my background was bioprocess engineering, which is different from my current work, i.e. anaerobic microbial cultivation, molecular biology and metagenomics. Besides my supervisor, there are some great teachers in this lab, from whom I learned new techniques and knowledge. This thesis would have been impossible without Ariel Grostern’s work. He is the previous Ph.D. student who established all the cultures and passed them to me. He taught me how to grow anaerobic cultures. I learned PCR and Geneious from Laura Hug, a model researcher. I used to bug her frequently with research questions because she has such a large knowledge base, comprehensive and organized. Cheryl Washer is another great teacher and a good friend. I also need to thank another three colleagues who taught me some important techniques: Melanie Duhamel, who taught me how to fix a glovebox and work with gas cylinders; Winnie Chan, who taught me the blue native polyacrylamide gel electrophoresis technique; and Alison Waller, who taught me the CTAB DNA extraction method. They are all great teachers and nice people to work with.

I have to thank the supporting staff in BioZone who have made this Ph.D. a lot easier. Endang Susilawati is like our lab mum, who is always ready to help. She saved me lots of time in ordering, searching for chemicals and glassware, etc. Before Susie joined the lab, it was Angelika Duffy who helped me with these. She is great too. Weijun Gao is our IT manager and always ready to help. He saved me lots of time installing bioinformatics software packages on our server and taught me great computer skills. Line Lomheim is our research assistant. She has helped with regular lab work and some side projects. She is such a nice coworker. I have to thank Christina Heidorn, who taught me good presentation skills and helped correct our English writing.

I miss the days playing ping-pong with my colleagues, Xiaoming Liang, Fei Luo and Kai Wei. Playing ping-pong was the most relaxing part of my Ph.D. study; thanks for that. Thanks to Ivy Yang, who provided important advice about taking care of my newborn baby. Thanks to Alfredo Perez de Mora, who was friendly and excited about my research progress. Thanks to the other colleagues in the Edwards lab, a cozy family; thanks to everyone for being so supportive all the time.

Finally, I would like to thank my wife, Liqin Xu, and my baby, Ella Tang. They are the power that keeps me going forward. Thanks to my mother and father in China for patiently waiting for my graduation.

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices
List of Abbreviations

Chapter 1 General Introduction
1.1 CHLOROFORM AND 1,1,1-TRICHLOROETHANE AS GROUNDWATER POLLUTANTS
1.2 ORGANOHALIDE RESPIRATION OF CF AND 1,1,1-TCA
1.3 DEHALOBACTER
1.4 REDUCTIVE DEHALOGENASES
1.5 ACT-3 AND ITS SUBCULTURES
1.6 RATIONALE AND RESEARCH OBJECTIVES
1.7 THESIS OUTLINE
1.8 STATEMENT OF AUTHORSHIP AND PUBLICATION STATUS

Chapter 2 Functional Characterization of Reductive Dehalogenases Using Blue Native Polyacrylamide Gel Electrophoresis
2.1 ABSTRACT
2.2 INTRODUCTION
2.3 MATERIALS AND METHODS
2.3.1 Cultures and growth conditions
2.3.2 Preparation of crude protein extracts
2.3.3 BN-PAGE gel electrophoresis and staining
2.3.4 Protein quantification in BN-PAGE gels
2.3.5 Dechlorination activity assays using gel slices
2.3.6 Analysis of dechlorination products
2.3.7 SDS-PAGE
2.3.8 LC-MS/MS analysis
2.3.9 Reference databases used for LC-MS/MS analysis
2.4 RESULTS
2.4.1 RDase expression in BAV1 cultures
2.4.2 RDase expression in KB-1-derived cultures
2.4.3 Investigating RDase enrichment during BN-PAGE
2.4.4 Identification of other proteins in active gel slices
2.5 DISCUSSION
2.5.1 Functional characterization of RDases
2.5.2 Features of BN-PAGE
2.6 ACKNOWLEDGEMENTS

Chapter 3 Identification of Dehalobacter Reductive Dehalogenases that Catalyze Dechlorination of Chloroform, 1,1,1-Trichloroethane and 1,1-Dichloroethane
3.1 ABSTRACT
3.2 INTRODUCTION
3.3 MATERIALS AND METHODS
3.3.1 Cultures and culture history
3.3.2 Metagenome sequencing and assembly
3.3.3 Identification of putative rdhA and rdhB genes
3.3.4 Sample preparation for Blue Native Polyacrylamide Gel Electrophoresis
3.3.5 BN-PAGE gel electrophoresis and staining
3.3.6 SDS-PAGE
3.3.7 Assaying dechlorinating activity in gel slices
3.3.8 LC-MS/MS analysis of gel slices
3.3.9 PCR reactions
3.3.10 Sequence analysis
3.3.11 Nucleotide sequence accession numbers
3.4 RESULTS
3.4.1 Metagenome sequencing
3.4.2 Functional differentiation of the three mixed cultures
3.4.3 RDase expression in the CF subculture
3.4.4 RDase expression in the DCA subculture
3.4.5 RDase expression in the ACT-3 parent culture
3.4.6 Distribution of cfrA and dcrA genes
3.4.7 Expression of non-RDase proteins
3.4.8 Sequence analysis of CfrA and DcrA
3.5 DISCUSSION
3.5.1 CfrA and DcrA
3.5.2 CfrB and DcrB
3.5.3 Non-RDase proteins
3.6 ACKNOWLEDGEMENTS

Chapter 4 Semi-Automatic In Silico Gap Closure Enabled De Novo Assembly of Two Dehalobacter Genomes from Metagenomic Data
4.1 ABSTRACT
4.2 INTRODUCTION
4.3 MATERIALS AND METHODS
4.3.1 Culture description and metagenomic DNA sequencing
4.3.2 Genome assembly and gap resolution
4.3.3 Separation of the two Dehalobacter genomes
4.3.4 PCR reactions
4.3.5 Genome polishing
4.3.6 Testing on a published genome
4.3.7 Accession numbers
4.4 RESULTS
4.4.1 Draft assembly of the ACT-3 metagenome
4.4.2 In Silico gap resolution
4.4.3 Alternative scaffolds
4.4.4 Order of scaffolds
4.4.5 Read mapping
4.4.6 Recalcitrant gaps
4.4.7 PCR verification
4.4.8 The two Dehalobacter genomes
4.4.9 Testing re-assembly of a published genome
4.5 DISCUSSION
4.6 CONCLUSIONS
4.7 ACKNOWLEDGEMENTS

Chapter 5 Sister Dehalobacter Genomes Revealed Specialization in Organohalide Respiration and Strain Differentiation Driven by Chlorinated Substrates
5.1 ABSTRACT
5.2 INTRODUCTION
5.3 METHODS
5.4 RESULTS AND DISCUSSION
5.4.1 General genome features
5.4.2 Recent differentiation between strains CF and DCA
5.4.3 rdhA gene clusters
5.4.4 Horizontal gene transfer
5.4.5 Metabolism
5.5 CLOSING REMARKS
5.6 ACKNOWLEDGEMENTS

Chapter 6 Isolation of Three Dehalobacter Strains
6.1 INTRODUCTION
6.2 MATERIALS AND METHODS
6.2.1 Medium and mixed culture supernatants
6.2.2 Cultures
6.2.3 Dilution-to-extinction transfers
6.2.4 Purity tests
6.3 RESULTS AND DISCUSSION
6.3.1 Stimulatory effects of Mixed Culture Supernatant (MCS) from ACT-3
6.3.2 Dilution-to-extinction transfers
6.3.3 Purity tests
6.3.4 Different morphologies of Dehalobacter cells
6.4 CONCLUSIONS

Chapter 7 Complete Genome of Bacteroidales strain CF from a Chloroform-Dechlorinating Enrichment Culture
7.1 ABSTRACT
7.2 INTRODUCTION
7.3 METHODS AND RESULTS
7.4 ACKNOWLEDGEMENTS

Chapter 8 Summary, Significance and Future Work
8.1 SUMMARY
8.2 SIGNIFICANCE
8.3 FUTURE WORK

References
Appendix

List of Tables

Table 2.1 D. mccartyi strain BAV1 proteins identified in the BN-PAGE gel region of enriched dechlorinating activity.
Table 2.2 Summary of BN-PAGE analyses for the four different cultures.
Table 3.1 Results of 24-hour assays of the crude protein extracts from the three mixed cultures.
Table 3.2 Proteins identified in the gel slices of enriched activity from BN-PAGE gels using the reference database of all proteins identified in the ACT-3 metagenome.
Table 4.1 Dehalobacter scaffolds in the ACT-3 metagenome.
Table 4.2 Length, read depth and annotation of multi-copy sequences in the Dehalobacter genomes.
Table 5.1 General features of the four Dehalobacter genomes.
Table 6.1 Examples of Dehalobacter and strains isolated in the literature.
Table 6.2 Summary of assignment in the results of 16S rRNA gene pyrotag sequencing on the two cultures of DHB-11DCA/H2 and DHB-12DCA/H2.

List of Figures

Figure 1.1 ACT-3 and its subcultures.
Figure 2.1 RDase expression in a strain BAV1 culture grown on cis-DCE.
Figure 2.2 RDase expression in a BAV1 culture grown on 1,2-DCA.
Figure 2.3 BN-PAGE of protein extracts of the VC-induced KB-1 culture and VC dechlorinating activity in the bands.
Figure 2.4 Peptide hits and coverage of the RDases identified from the three KB-1 related cultures.
Figure 3.1 Image of left two lanes of the BN-PAGE gel, showing molecular weight ladder and the first stained sample lane.
Figure 3.2 Results of BN-PAGE with protein samples from the CF subculture.
Figure 3.3 Results of BN-PAGE with protein samples from the DCA subculture.
Figure 3.4 Results of BN-PAGE with protein samples from the ACT-3 culture.
Figure 3.5 Amino acid sequence alignment for CfrA versus DcrA, and CfrB versus DcrB. Sequences shown are for CfrA and CfrB.
Figure 3.6 Maximum likelihood phylogenetic tree of the RDases that have functional characterization.
Figure 4.1 Flow chart of the sequence assembly process.
Figure 4.2 Microbial composition determined by 16S rRNA pyrotag sequences from the ACT-3 parent culture and two subcultures.
Figure 4.3 Overview of the in silico gap-resolution process.
Figure 4.4 Separation of the genome of strain CF by progressive read-mapping.
Figure 4.5 Contig distribution in the ACT-3 metagenome.
Figure 4.6 Typical gaps in Group A.
Figure 4.7 Typical gaps in Group B.
Figure 4.8 Typical gaps in Group C.
Figure 4.9 Typical gaps in Group D.
Figure 4.10 Assessment of the assembly using gap-distance comparisons.
Figure 4.11 Combinations of alternative scaffolds.
Figure 4.12 Schematic of the draft chimeric Dehalobacter genome from the ACT-3 metagenome.
Figure 4.13 Gap 00229-G-00230.
Figure 4.14 The incomplete resolution of three gaps in which 5S, 16S, and 23S rRNA genes are located.
Figure 4.15 The alignment of the two Dehalobacter genomes: strain CF and strain DCA.
Figure 4.16 Alignment of the published assembly versus the new (this study) assembly of the B. salanitronis genome.
Figure 5.1 A maximum-likelihood phylogenetic tree of the four Dehalobacter strains.
Figure 5.2 Genome circular map of Dehalobacter sp. strain CF.
Figure 5.3 Whole genome alignment of strains CF and DCA.
Figure 5.4 Sequence alignment of the gene neighborhoods of cfrA and dcrA.
Figure 5.5 Sequence alignment between strain PER-K23 and strain CF or strain E1 focused on the two rdhA clusters.
Figure 5.6 Central carbon metabolism.
Figure 5.7 Comparing strain CF, PER-K23 and E1 on three gene clusters related to (a) the Wood-Ljungdahl pathway, (b) molybdopterin biosynthesis, and (c) cobalamin biosynthesis.
Figure 6.1 The dechlorination profiles of the cultures of (a) DHB-111TCA/H2 and (b) DHB-11DCA/H2 with and without ACT-3 mixed culture supernatant (MCS).
Figure 6.2 Typical dechlorination profiles of dilution-to-extinction transfers of the three Dehalobacter cultures in this study when ACT-3 MCS was added to the medium.
Figure 6.3 PCR reactions targeting the 16S rRNA gene of the Acetobacterium strain in the co-culture of DHB-12DCA/H2 before and after dilution-to-extinction transfers.
Figure 6.4 Fluorescence microscopy with DAPI staining of the three Dehalobacter cultures: DHB-111TCA/H2 (a-b), DHB-11DCA/H2 (c-d), and DHB-12DCA/H2 (e-h).

List of Appendices

Appendix
Appendix A: Supplemental Information for Chapter 2
Appendix B: Supplemental Information for Chapter 3
Appendix C: Supplemental Information for Chapter 4
Appendix D: Supplemental Information for Chapter 5

Figure A1 Distribution of 1,2-DCA dechlorinating activity on a BN-PAGE gel lane using protein extracts from a BAV1 culture grown on 1,2-DCA.
Figure A2 Dechlorination assays with protein extracts and gel slices from the KB-1 culture. (A) of the VC-induced KB-1 culture and (B) of the TCE-induced KB-1 culture.
Figure A3 The distribution of dechlorination activity on BN-PAGE gel lanes using protein extracts of the 1,2-DCA KB-1 subculture.
Figure A4 Enrichment of RDase activity after separation by BN-PAGE. The crude protein extract from a KB-1 culture grown on TCE was applied to the gel.
Table A1 Specific dechlorination rates determined by methyl viologen assays with crude protein extracts.
Table A2 Peptide hits and coverage of the RDases identified from the three KB-1 cultures.
Table A3 Proteins identified from the active BN-PAGE gel slices from the three KB-1 cultures.
Table A4 The custom RDase database (amino acid sequences) used as the reference database for RDase identification from the samples of the KB-1 cultures.

Figure B1 Microbial composition of each of the three mixed cultures determined by 16S rRNA gene pyrotag sequencing.
Figure B2 The re-construction of cfrA and dcrA genes.
Figure B3 The re-construction of cfrB and dcrB genes.
Figure B4 SDS-PAGE separation of protein samples extracted from BN-PAGE gel slices. This picture shows protein extracts from the CF subculture.
Figure B5 Gel showing PCR reaction results with primers distinguishing cfrA and dcrA genes in the three mixed cultures.
Figure B6 Maximum likelihood phylogenetic tree of (putative) RDases retrieved from the ACT-3 metagenome.
Figure B7 Potential trypsin digestion sites of six putative RdhB proteins.
Table B1 Tabulated results of LC-MS/MS analyses in all gel slices analyzed.
Table B2 Clustering and phylogenetic assignments of 454 pyrotag sequences (16S rRNA tags) from the three mixed cultures.
Table B3 The DNA sequences of all rdhA and rdhB genes identified from the ACT-3 metagenome.

Figure C1 Visualization of raw reads suppressed at the 5’ edge of contig00270.
Figure C2 Detection of tandem repeats by read mapping.
Table C1 Contigs that contain reads whose sequences were suppressed.
Table C2 Experimental verification of the resolution of 22 assembly gaps.

Figure D1 Multiple sequence alignment of 16S rRNA genes from strain CF, PER-K23 and E1.
Figure D2 Three potential genome rearrangement events between strain CF and PER-K23.
Figure D3 Genome circular map of strain PER-K23.
Figure D4 A maximum-likelihood phylogenetic tree of all RDases (characterized or putative) from the four Dehalobacter genomes together with 15 RDases with known functions from other organisms.
Figure D5 Comparing strain CF, PER-K23 and E1 on the gene neighborhood of pceABCT.
Table D1 Insertion sequences in strain CF, DCA and PER-K23.
Table D2 Genes in Dehalobacter strain CF involved in selected metabolic pathways or categories.

List of Abbreviations

1,1,1-TCA    1,1,1-trichloroethane
1,1-DCA      1,1-dichloroethane
1,1-DCE      1,1-dichloroethene
1,2-DCA      1,2-dichloroethane
AA           amino acid
BN-PAGE      blue native polyacrylamide gel electrophoresis
CA           monochloroethane
CDS          coding sequence
CF           chloroform
cis-DCE      cis-dichloroethene
CN-PAGE      clear native polyacrylamide gel electrophoresis
CT           carbon tetrachloride
DCE          dichloroethene
DCM          dichloromethane
FeS          iron sulfide
FID          flame ionization detector
FISH         fluorescence in situ hybridization
GC           gas chromatography
IMG          Integrated Microbial Genomes
IS           insertion sequence
JGI          Joint Genome Institute
LC-MS/MS     liquid chromatography tandem mass spectrometry
MCS          mixed-culture supernatant
MEAL         a solution of methanol, ethanol, acetate and lactate
MEL          a solution of methanol, ethanol and lactate
NGS          next-generation sequencing
PCE          tetrachloroethene (perchloroethene)
PCR          polymerase chain reaction
qPCR         quantitative polymerase chain reaction
RDase        reductive dehalogenase
rdhA         the gene of reductive dehalogenase catalytic subunit A
rdhB         the gene of reductive dehalogenase membrane anchor subunit B
rRNA         ribosomal RNA
SDS-PAGE     sodium dodecyl sulfate polyacrylamide gel electrophoresis
SNP          single nucleotide polymorphism
TCA cycle    tricarboxylic acid cycle
TCE          trichloroethene
TE           transposable element
trans-DCE    trans-dichloroethene
VC           vinyl chloride

Chapter 1 General Introduction

1.1 CHLOROFORM AND 1,1,1-TRICHLOROETHANE AS GROUNDWATER POLLUTANTS

Chlorinated ethanes and methanes, such as 1,1,1-trichloroethane (1,1,1-TCA) and chloroform (CF), are pervasive pollutants throughout the industrialized world due to their extensive applications in industry. The use of 1,1,1-TCA began in the 1950s when it became a common industrial solvent to remove grease, oil and wax (Doherty, 2000b). Its usage peaked in the 1970s as a replacement for trichloroethene (TCE), which had been used heavily in the early 1970s before it was banned due to adverse human and environmental effects (Doherty, 2000b). Ironically, in the 1980s, 1,1,1-TCA itself was banned because of its ozone-depleting potential (Doherty, 2000b). Currently, 1,1,1-TCA is mainly used in the production of chlorofluorocarbon-142 (ATSDR, 2006). Chloroform was one of the first surgical anaesthetics (Grostern et al., 2010). In 1976, it was banned from consumer products in the United States because of carcinogenic effects seen in animal tests (Rosenthal, 1987). Currently, CF is mainly used for the production of the refrigerant monochlorodifluoromethane and the production of fluoropolymers (ATSDR, 2011). CF is also used as an extraction solvent, a heat transfer medium in fire extinguishers, and an intermediate in dye and pesticide production (ATSDR, 2011).

Both CF and 1,1,1-TCA have adverse effects on human health. CF is classified as possibly carcinogenic to humans (Group 2B): its carcinogenicity has been confirmed in animals but not in humans (Cappelletti et al., 2012). Chronic exposure to 1,1,1-TCA can damage the liver and the nervous and circulatory systems (ATSDR, 2006). The US Environmental Protection Agency sets the Maximum Contaminant Level in drinking water for 1,1,1-TCA and CF at 0.2 mg/L and 0.07 mg/L, respectively (US-EPA, 2010). Currently, the use of these two compounds is strictly regulated in various countries. Unfortunately, these two compounds have already become widespread contaminants in the environment, especially groundwater, due to accidental spills and improper disposal in the past. Prior to 1984, direct disposal into soil was an established disposal practice for chlorinated solvents (Jackson, 2004). In the United States, 1,1,1-TCA was found in 399 of 1681 National Priorities List sites, while CF was found in 427 sites (a search on September 2, 2013 at http://cfpub.epa.gov/supercpad/cursites/srchsites.cfm). Another cause of CF occurrence in contaminated sites is that CF is a dechlorination product of carbon tetrachloride (CT), another common industrial solvent and widespread pollutant (Grostern et al., 2010). In addition, CF occurs naturally because of volcanic emissions and production by marine algae (Laturnus et al., 2002).

When spilled or dumped, because of their low solubility in water, chlorinated solvents including 1,1,1-TCA and CF tend to form dense non-aqueous phase liquid (DNAPL) layers in soils, which act as a continuous source of groundwater pollution. Traditionally, the remediation of a site contaminated by chlorinated solvents is performed through an ex situ approach in which groundwater is pumped out and treated; however, since this approach only decontaminates the water, the non-aqueous phase sources remain underground. For better contamination source control, in situ remediation techniques, such as bioremediation, appear more favorable. In situ bioremediation, including bioaugmentation with microbial cultures capable of reductive dechlorination, has proven successful in the detoxification of sites contaminated by chlorinated ethenes (ESTCP, 2006), such as tetrachloroethene (PCE) and trichloroethene (TCE), two other widely used chlorinated solvents.

In addition to negative impacts on human health, 1,1,1-TCA and CF are known for their inhibitory effects on many microbial processes, such as methanogenesis (Yang, 1981; Suidan et al., 1991; de Best et al., 1999; Adamson and Parkin, 2000; Weathers, 2000; Yu and Smith, 2000) and reductive dechlorination (Bagley et al., 2000; Duhamel et al., 2002; Futagami et al., 2008). Although the bioremediation of chlorinated ethenes has been successful, its efficacy can be limited by the presence of inhibitory co-contaminants (Löffler and Edwards, 2006), and the two most common inhibitory co-contaminants are 1,1,1-TCA and CF. In US National Priorities List sites, 1,1,1-TCA and TCE were found to coexist at over 310 locations (a search of the National Priorities List database in May 2006). Similarly, CF is often found coexisting with other chlorinated organics. Therefore, the detoxification of 1,1,1-TCA and CF is of special significance to bioremediation efforts of other chlorinated organics.

1.2 ORGANOHALIDE RESPIRATION OF CF AND 1,1,1-TCA

Both 1,1,1-TCA and CF undergo abiotic and biotic degradation in groundwater, which is normally anoxic. Abiotically, 1,1,1-TCA dehydrohalogenates to 1,1-dichloroethene (1,1-DCE) and acetic acid with a half-life of more than 2.8 years (Vogel and McCarty, 1987; Gerkens and Franklin, 1989), while CF is extremely stable: it has a half-life of 3,100 years in groundwater (Mabey and Mill, 1978). In microbial processes, 1,1,1-TCA and CF can be degraded cometabolically by methanogens and sulfate reducers (Bouwer and McCarty, 1983; Freeman et al., 1995; Weathers and Parkin, 1995; Adamson and Parkin, 1999; de Best et al., 1999; Harper, 2000; Koons et al., 2001; Olivas et al., 2002; Guerrero-Barajas and Field, 2005; Chung and Rittmann, 2007). Cometabolic transformation describes a process in which compounds are fortuitously transformed by microbes but the process is not associated with cell growth or energy production. From the application point of view, cometabolic transformation is less favorable than metabolic or growth-linked transformation because the latter is more efficient and more sustainable. An example of metabolic transformation of chlorinated organics is organohalide respiration, in which microorganisms use halogenated organics as the terminal electron acceptors in the electron transfer chain to generate free energy for cell growth. Organohalide respiration was first demonstrated by Dolfing (Dolfing, 1990), and Mohn and Tiedje (Mohn and Tiedje, 1990). Currently, many bioremediation efforts rely on microorganisms capable of organohalide respiration.
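To put the abiotic half-lives quoted above into perspective, the following minimal sketch estimates how much of each compound would remain after a given time. It assumes simple first-order decay, which is an illustrative assumption rather than a kinetic model taken from the cited studies; the function name and the 10-year horizon are hypothetical choices made for this example.

```python
import math


def fraction_remaining(elapsed_years: float, half_life_years: float) -> float:
    """Fraction of the initial concentration left after elapsed_years.

    Assumes first-order decay (an illustrative assumption), for which
    the rate constant is k = ln(2) / half-life.
    """
    if half_life_years <= 0:
        raise ValueError("half-life must be positive")
    rate_constant = math.log(2) / half_life_years
    return math.exp(-rate_constant * elapsed_years)


# Abiotic half-lives quoted in the text, in years.
# The 1,1,1-TCA value is a lower bound ("more than 2.8 years"),
# so the true remaining fraction would be at least the value computed here.
HALF_LIFE_TCA_YEARS = 2.8
HALF_LIFE_CF_YEARS = 3100.0

if __name__ == "__main__":
    horizon_years = 10.0  # hypothetical time horizon for illustration
    for name, half_life in (("1,1,1-TCA", HALF_LIFE_TCA_YEARS),
                            ("CF", HALF_LIFE_CF_YEARS)):
        left = fraction_remaining(horizon_years, half_life)
        print(f"{name}: {left:.1%} remaining after {horizon_years:g} years")
```

Under this assumption, abiotic decay alone leaves CF essentially untouched on any practical remediation timescale, which is why growth-linked biological dechlorination is of interest.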

Organohalide-respiring microbes take advantage of reductive dehalogenation reactions as electron sinks in their metabolism. These reactions involve the removal of a halogen substituent from a molecule with concurrent addition of electrons to the molecule. Two types of reductive dechlorination reactions are observed in organohalide-respiring microbes: hydrogenolysis and dihaloelimination. In hydrogenolysis, a halide is replaced by a hydrogen, such as in the dechlorination of 1,1,1-TCA to 1,1-dichloroethane (1,1-DCA). In dihaloelimination, two neighboring (vicinal) halogen substituents are removed simultaneously and a carbon-carbon double bond is formed, such as in the dechlorination of 1,2-dichloroethane (1,2-DCA) to ethene. The halogen atoms are released as halide anions, and can contribute to lowering pH in poorly buffered systems. The mechanism for these reactions in living cells involves electron transfer to transition metal complexes (Fe, Ni, and especially Co), which can be reduced in the presence of a reductant and then donate electrons to halogenated compounds (Vogel et al., 1987). Vitamin B12 is a corrinoid (cobalt-containing) transition metal complex often associated with catalyzing reductive dehalogenation (Smidt and de Vos, 2004).
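The two reaction types named above can be written as balanced half-reactions for the examples given in the text. The equations below are a sketch for illustration; the third line assumes hydrogen as the electron donor and shows why the overall reaction releases acid along with chloride.

```latex
% Hydrogenolysis: 1,1,1-TCA to 1,1-DCA (one Cl replaced by H)
\mathrm{CH_3CCl_3} + \mathrm{H^+} + 2e^- \rightarrow \mathrm{CH_3CHCl_2} + \mathrm{Cl^-}

% Dihaloelimination: 1,2-DCA to ethene (two vicinal Cl removed, C=C formed)
\mathrm{ClCH_2CH_2Cl} + 2e^- \rightarrow \mathrm{CH_2{=}CH_2} + 2\,\mathrm{Cl^-}

% Hydrogenolysis coupled to H2 oxidation (assumed donor): net acid production
\mathrm{CH_3CCl_3} + \mathrm{H_2} \rightarrow \mathrm{CH_3CHCl_2} + \mathrm{H^+} + \mathrm{Cl^-}
```

Each step consumes two electrons per reaction, whether one chlorine is replaced by hydrogen or two are eliminated together.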

The reductive dechlorination of 1,1,1-TCA was first reported as early as 1983 (Bouwer and McCarty, 1983; Parsons and Lage, 1985; Egli et al., 1987; Galli and McCarty, 1989; Ahlert and Enzminger, 1992; de Best et al., 1997; Deipser and Stegmann, 1997; Chen et al., 1999), but organohalide respiration of 1,1,1-TCA by hydrogenolysis was not demonstrated until 2002 (Sun et al., 2002), when Sun et al. isolated a Dehalobacter strain that respires 1,1,1-TCA and produces monochloroethane (CA) as the end product. In the Edwards lab, an enrichment culture now referred to as “ACT-3” was found to respire 1,1,1-TCA, producing CA as the end product via 1,1-DCA (Grostern and Edwards, 2006b). The dechlorinating organisms in this mixed culture also belong to the Dehalobacter genus. Although the dechlorination of CA to ethane by a Methanosarcina barkeri pure culture has been reported (Holliger et al., 1990), CA generally appears resistant to biological degradation under anaerobic conditions. CA is found as a persistent end product in field sites contaminated by 1,1,1-TCA (Hoekstra, 2005; Borden, 2007; Duchesneau et al., 2007). However, CA can be hydrolyzed to ethanol abiotically (Laughton and Robertson, 1959) and it can also be degraded by some aerobic microbes (Keuning et al., 1985; Scholtz et al., 1987). As a result, regulations on CA contaminant levels in groundwater are typically much less stringent than for more chlorinated compounds.

CF is a recalcitrant organic compound, generally resistant to biodegradation. Organohalide respiration of CF was first demonstrated in the Edwards lab, where Dehalobacter within the ACT-3 culture were shown to respire and dechlorinate CF to dichloromethane (DCM) (Grostern et al., 2010). DCM is not further degraded or dechlorinated by ACT-3. DCM fermentation and degradation into non-chlorinated compounds by a distinct Dehalobacter strain was recently reported by Justicia-Leon et al. (Justicia-Leon et al., 2012). In addition, a culture capable of complete CF dechlorination to non-chlorinated end products was reported, in which two Dehalobacter strains were involved: one reductively dechlorinates CF to DCM and the other performs DCM fermentation (Lee et al., 2012). Thus the genus Dehalobacter has emerged as an important player in chloromethane and chloroethane detoxification.

1.3 DEHALOBACTER

Organohalide-respirating bacteria are phylogenetically diverse, including Dehalococcoides, Geobacter, Sulfurospirillum, Desulfitobacterium, Dehalobacter, Dehalogenimonas et al (Löffler and Edwards, 2006). Dehalobacter spp. (Phylum ) are among the organisms that are most commonly found in groundwater contaminated with chlorinated organics, and they have attracted more attention in the past few years as more studies revealed their frequent involvement in organohalide respiration of a variety of halogenated organics. Dehalobacter have been found to dechlorinate many chlorinated organics including chlorinated ethenes (Wild et al., 1996; Holliger et al., 1998), chlorinated ethanes (Sun et al., 2002; Grostern and Edwards, 2006b, a, 2009; Grostern et al., 2010), chlorobenzenes (Nelson et al., 2011), and others (Schlotelburg et al., 2002; van Doesburg et al., 2005; Yoshida et al., 2009b, a). To date, only three Dehalobacter isolates (Wild et al., 1996; Holliger et al., 1998; Sun et al., 2002) were reported. Phylogenetically, Dehalobacter spp. belong to the group of gram-positive bacterium with low DNA G+C content, although they often stain negative in the Gram stain (Holliger et al., 1998). Dehalobacter cells exist as rods with one to four flagella (Wild et al., 1996; Holliger et al., 1998). The characterization of these three isolates depicted Dehalobacter as strict anaerobes and obligate dechlorinators: they only grow with hydrogen or formate as an electron donor, chlorinated compounds as electron acceptors and acetate as carbon source. Their narrow metabolism restricted to organohalide respiration appears similar to that of Dehalococcoides (from the Phylum of Chloroflexi). However, based on 16S phylogeny, Dehalobacter is more closely related to Desulfitobacterium: both belong to the Family. In contrast, Desulfitobacterium are well known for their versatility of using different electron donors and acceptors in (Villemur et al., 2006). 
Interestingly, although Dehalobacter were characterized as obligate dechlorinators, two recent reports described Dehalobacter strains that grow by dichloromethane fermentation (Justicia-Leon et al., 2012; Lee et al., 2012), indicating that their metabolism can be more versatile than previously thought. No Dehalobacter genome was available before 2012; since then, four Dehalobacter genomes, three complete (Tang et al., 2012; Rupakula et al., 2013) and one draft (Maphosa et al., 2012), have been released. The first two complete Dehalobacter genomes published were from this Ph.D. work and are included in this thesis (Chapter 4).


1.4 REDUCTIVE DEHALOGENASES

In organohalide respiration, the key enzymes that reside at the end of electron transfer chain and directly react with halogenated organics are known as reductive dehalogenases (RDases). RDases form a large protein family. Currently, there are hundreds of reductive dehalogenase homologous sequences available in public databases, but only fifteen of them have a known substrate (Hug et al., 2013). The study of this protein family has been challenging due to the failure of heterologous expression efforts, and the lack of protein structure and knowledge of reaction sites.

To date, the characterization of most RDases has been achieved by their direct purification from organohalide-respiring cultures (Ni et al., 1995; Neumann et al., 1996; Christiansen et al., 1998; Magnuson et al., 1998; Miller et al., 1998; van de Pas et al., 1999; Krasotkina et al., 2001; Okeke et al., 2001; van de Pas et al., 2001; Suyama et al., 2002; Boyer et al., 2003; Maillard et al., 2003; Thibodeau et al., 2004). All characterized RDases have a length of 440-560 amino acids. They are typically membrane-bound, monomeric, corrinoid-dependent, iron-sulfur cluster containing enzymes, although there might be an exception: the 3-chlorobenzoate RDase from Desulfomonile tiedjei might be a heterodimer that has a subunit presumably containing a heme as a cofactor and has no corrinoids or iron-sulfur clusters (Ni et al., 1995). Notably, the corrinoid cofactors contained by RDases are not necessarily identical. The corrinoid isolated from the PCE RDase of Dehalobacter restrictus has the same properties as commercially available cobalamin (Maillard et al., 2003). Dehalococcoides RDases are corrinoid-dependent, but Dehalococcoides cannot synthesize corrinoids themselves. Recently, interspecies corrinoid transfer between Dehalococcoides and other anaerobes was demonstrated (Yan et al., 2012; Yan et al., 2013); interestingly, it was shown that Dehalococcoides only use certain types of corrinoid, cobalamins, which use 5’,6’-dimethylbenzimidazole as the lower α-ligand base (Yan et al., 2013). However, not all RDases use a cobalamin as the corrinoid cofactor: the corrinoid from the PCE RDase of Sulfurospirillum multivorans is a norpseudovitamin B12, which uses adenine as the lower α-ligand base (Krautler et al., 2003).

All characterized RDases show specificity for a limited set of chlorinated substrates that have similar chemical structures. Some catalyze the dechlorination of chlorinated ethenes, such as PceA (Neumann et al., 1996; Magnuson et al., 1998; Miller et al., 1998; Okeke et al., 2001; Suyama et al., 2002; Maillard et al., 2003), MbrA (Chow et al., 2010), TceA (Magnuson et al., 2000), VcrA (Müller et al., 2004) and BvcA (Krajmalnik-Brown et al., 2004); some prefer chlorinated ethanes, such as DcaA (Marzorati et al., 2007; Grostern and Edwards, 2009), CfrA (Tang and Edwards, 2013) and DcrA (Tang and Edwards, 2013); some prefer chlorinated phenols, such as CprA (van de Pas et al., 1999; Thibodeau et al., 2004), and CrdA (van de Pas et al., 1999; Boyer et al., 2003); some prefer chlorinated benzenes, such as CbrA (Adrian et al., 2007). Notably, three Dehalococcoides RDases, TceA, VcrA and BvcA, also catalyze the dichloroelimination of 1,2-DCA to ethene (Tang et al., 2013). The phylogeny of RDases has been studied (Hug et al., 2013): it was found that functional similarity does not always agree with sequence similarity. In some cases, it does: TceA, VcrA and BvcA share a similar substrate range and are phylogenetically close, with a pair-wise amino acid identity higher than 37%. However, three PceA RDases (Neumann et al., 1996; Magnuson et al., 2000; Maillard et al., 2003) share identical substrates but are phylogenetically distant, and two of them share an amino acid identity lower than 22%. In dramatic contrast, two RDases, CfrA and DcrA, with amino acid identity as high as 95% share no known substrate (Tang and Edwards, 2013). CfrA catalyzes the dechlorination of chloroform and 1,1,1-TCA, while DcrA catalyzes the dechlorination of 1,1-DCA; their substrates are similar in structure, but they have no or negligible activity on each other’s substrates. Explanations for these contrasting observations may become possible once insights into the protein structures of RDases are obtained; unfortunately, no protein structure of any RDase has been determined so far.
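The pairwise amino acid identities quoted above are derived from sequence alignments. As a minimal sketch of that arithmetic, the following assumes two sequences that have already been aligned to equal length with "-" marking gaps, and counts identity over ungapped columns only (other denominators are also in common use). The function name `percent_identity` and the toy sequences are illustrative assumptions; they are not real RDase sequences and this is not necessarily the procedure used in the cited studies.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real protein sequences).
toy_a = "MKT-AYIAKQ"
toy_b = "MKTLAYLA-Q"
print(percent_identity(toy_a, toy_b))
```

Because the result depends on the alignment and on how gaps are treated, identities reported by different tools for the same pair of proteins can differ slightly.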

In most sequenced genomes of organohalide-respiring bacteria, intact RDase genes (putative or not) are present in operons composed of at least two genes: rdhA and rdhB. rdhA encodes the catalytic subunit while rdhB is assumed to encode a membrane anchor for the catalytic subunit (Smidt and de Vos, 2004). Notably, a recently released Dehalogenimonas genome (Siddaramappa et al., 2012) has several rdhA genes with no rdhB gene nearby. The catalytic subunit encoded by rdhA typically has two conserved features: (a) it has two conserved iron-sulfur binding motifs associated with two iron-sulfur clusters and (b) it has a twin arginine signal peptide sequence (RRXFXK). This signal peptide indicates that RDases are transported across the membrane by a twin arginine translocation system, which transports extracytoplasmic proteins that contain complex redox cofactors to the periplasmic space.
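The twin arginine motif RRXFXK (X being any residue) can be searched for with a simple pattern match. The sketch below is illustrative only: the 60-residue N-terminal window, the function name `find_tat_motif` and the toy sequence are assumptions made for this example, and dedicated signal-peptide predictors consider more than the consensus motif.

```python
import re

# RRXFXK, where X is any amino acid (one-letter code).
TAT_MOTIF = re.compile(r"RR[A-Z]F[A-Z]K")


def find_tat_motif(protein: str, n_terminal_window: int = 60):
    """Return (0-based start, matched motif) for the first RRXFXK match
    within the N-terminal window, or None if there is no match.
    The window size is an assumption of this sketch."""
    match = TAT_MOTIF.search(protein[:n_terminal_window].upper())
    if match is None:
        return None
    return match.start(), match.group()


# Hypothetical toy sequence (not a real RdhA N-terminus).
toy = "MDKNNSGLSRRDFLKAGAAVTAGLA"
print(find_tat_motif(toy))
```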


To date, many genomes of organohalide-respiring organisms have been sequenced. Multiple rdhA genes often coexist in one genome, ranging from a few to dozens. An extreme case is Dehalococcoides mccartyi strain VS, which has 36 rdhA genes (McMurdie et al., 2009). The coexistence of multiple rdhA genes suggests multiple substrates, hypothetically allowing the organism to have better adaptability. It also suggests the existence of regulatory systems to control the expression of different RDase genes. The expression of RDase genes was found to be induced by the addition of chlorinated substrates in several studies (Suyama et al., 2002; Krajmalnik-Brown et al., 2004; Gauthier et al., 2006; Morris et al., 2006; Rahm et al., 2006; Tsukagoshi et al., 2006) and a transcriptional factor that regulates the expression of an RDase gene has been identified (Pop et al., 2004). However, there are also examples showing that some RDases can be expressed constitutively (Gauthier et al., 2006; Rupakula et al., 2013).

Because no protein structure of an RDase is available, the reaction site(s) and reaction mechanism(s) of organohalide respiration are unknown, although potential models of the reaction mechanism have been proposed based on the knowledge of the corrinoid and the iron-sulfur clusters (Smidt and de Vos, 2004). The main difficulty in protein structure determination is that direct purification from dechlorinating cultures, which typically grow slowly and have a relatively low cell density, yields too little purified RDase for crystallization trials. Meanwhile, attempts to use a heterologous host, such as Escherichia coli (E. coli), to express RDases were unsuccessful (Neumann et al., 1998; Suyama et al., 2002). Due to the complexity of RDases, there could be many reasons for the improper folding of RDases expressed in a heterologous host, such as incorrect cofactors, different TAT translocation systems and special chaperones required for correct folding. Morita et al. (Morris et al., 2006) identified a trigger factor protein PceT in an RDase operon that contains four genes including pceA (rdhA gene) and pceB (rdhB gene). This trigger protein is likely responsible for the correct folding of PceA. Smidt et al. (Smidt et al., 2000) found a putative “trigger-factor” protein gene close to a chlorophenol RDase gene in Desulfitobacterium dehalogenans. Coexpression of trigger proteins might help solve problems in heterologous expression. Recently, some progress was achieved in heterologous expression of RDases in E. coli (Sjuts et al., 2012): soluble expression of the PceA RDase from Dehalobacter restrictus was achieved by fusing an E. coli trigger protein; unfortunately, although this recombinant PceA protein can bind to cobalamin and the two iron-sulfur clusters can be reconstituted, no dechlorination activity was recovered.

1.5 ACT-3 AND ITS SUBCULTURES

This thesis focuses on the characterization of a mixed microbial community known as ACT-3 and its subcultures. ACT-3 is an anaerobic enrichment culture originally derived from microcosms prepared from material from a site contaminated with chlorinated ethanes and ethenes (Grostern and Edwards, 2006b). ACT-3 is now a commercialized culture that has been used for bioremediation applications. Currently it is blended with KB-1 (a chlorinated-ethene-dechlorinating culture) to form KB-1 plus (http://www.siremlab.com/products/kb-1-plus), which is used to clean up groundwater sites contaminated by both chlorinated ethenes and 1,1,1-TCA (or CF), because the presence of 1,1,1-TCA inhibits complete dechlorination of chlorinated ethenes by Dehalococcoides (Grostern and Edwards, 2006b), the major dechlorinators in KB-1.

In the Edwards lab, we typically grow anaerobic dechlorinating enrichment cultures with chlorinated substrates as electron acceptors and small organic compounds (such as methanol and ethanol) as electron donors. Based on their functions, we often classify microbes in these enrichment cultures into three categories: dechlorinating organisms, fermenting organisms and methanogens. Fermenting organisms grow by fermenting organic electron donors into hydrogen (H2) or formate, and acetate, which are then used by dechlorinating organisms as electron donor and carbon source. Methanogens compete with dechlorinating organisms for H2 (or formate) and acetate to produce methane. Besides these, other interactions could exist between them. For example, any organisms in these cultures (including dechlorinating organisms) could rely on others to provide essential nutrients that they cannot synthesize themselves; such nutrients could be amino acids or special vitamins that are not provided in the original medium.

ACT-3 is grown on an anaerobic defined medium (Edwards and Grbić-Galić, 1994) amended with 1,1,1-TCA as an electron acceptor and a mixture of methanol, ethanol, acetate and lactate (MEAL) as electron donors. This defined medium only contains inorganic salts, trace metals, and some basic vitamins; it is buffered at pH ~7 with bicarbonate and carbon dioxide; and iron (II) sulfide is used to remove oxygen. A mixture of donors was used to ensure enrichment of a broad range of fermenting organisms, aiming to increase the robustness of the culture. In the Edwards lab, ACT-3 and other enrichment cultures are maintained under anaerobic conditions by growing them with anaerobic media in air-tight containers (such as serum bottles) inside anaerobic chambers (gloveboxes). In ACT-3, Dehalobacter spp. are the sole dechlorinators, accounting for ~70% of the whole microbial community. The methanogens in ACT-3 belong to the family of Methanocullaceae. Notably, methanogenesis in this culture only occurs when 1,1,1-TCA is depleted, showing that 1,1,1-TCA inhibits methanogenesis. Other microbes in ACT-3 include Bacteroidales (the most abundant non-dechlorinating organism), Clostridium, Spirochaetes, Desulfovibrio, etc. Many of them should be fermenting organisms that convert MEAL into H2 (or formate) and acetate.

Figure 1.1 ACT-3 and its subcultures. The culture names are in bold and the substrates amended are shown below the names. MEL represents an equal electron-equivalent mixture of methanol, ethanol and lactate supplied as fermentable electron donors.

To facilitate the characterization of the ACT-3 culture, four ACT-3 subcultures were established using different electron donors and acceptors (Figure 1.1). An ACT-3 subculture known as the CF subculture was amended with chloroform as an electron acceptor and MEAL as electron donors. Another subculture known as the 1,1-DCA subculture was amended with 1,1-DCA as an electron acceptor and MEAL as electron donors. About two years ago, acetate was removed from the list of electron donors because it was amply produced from the fermentation of the other substrates; since then, these three cultures have been amended with a mixture of methanol, ethanol and lactate (MEL) as electron donors. Dehalobacter were known as the sole dechlorinators in ACT-3. As Dehalobacter were mainly known as obligate dechlorinators that use hydrogen (H2) or formate as an electron donor and acetate as carbon source (Wild et al., 1996; Holliger et al., 1998; Sun et al., 2002), another two subcultures were established accordingly to enrich Dehalobacter strains: DHB-111TCA/H2 was amended with 1,1,1-TCA as an electron acceptor, H2 as an electron donor, and acetate as carbon source; and DHB-11DCA/H2 was amended with 1,1-DCA as an electron acceptor, H2 as an electron donor, and acetate as carbon source.


Before this thesis, some characterization of ACT-3 and its subcultures had been performed by previous students (Grostern, 2009). From their work, we learned that ACT-3 can dechlorinate three different substrates: CF, 1,1,1-TCA and 1,1-DCA. It can dechlorinate 1,1,1-TCA to CA via 1,1-DCA, and CF to DCM. Interestingly, after being maintained and transferred many times in medium amended with different electron donors and acceptors, the ACT-3 subcultures lost some of the dechlorination functions of the parent culture ACT-3. The CF subculture and DHB-111TCA/H2 only dechlorinate 1,1,1-TCA (to 1,1-DCA) and CF (to DCM). The 1,1-DCA subculture and DHB-11DCA/H2 only dechlorinate 1,1-DCA (to CA). Such differences strongly suggested the presence of two different Dehalobacter strains (Grostern, 2009), a hypothesis that was subsequently proven by the work in this Ph.D. thesis.
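The substrate ranges described above can be summarized as a small lookup structure. The following is only an illustrative sketch: the dictionary layout and the function name `end_product` are assumptions introduced here, while the reactions and culture names are those stated in the text.

```python
# Single-step reactions reported in the text: substrate -> product.
STEPS = {"1,1,1-TCA": "1,1-DCA", "1,1-DCA": "CA", "CF": "DCM"}

# Substrates each culture was reported to dechlorinate.
CULTURE_SUBSTRATES = {
    "ACT-3": {"1,1,1-TCA", "1,1-DCA", "CF"},
    "CF subculture": {"1,1,1-TCA", "CF"},
    "DHB-111TCA/H2": {"1,1,1-TCA", "CF"},
    "1,1-DCA subculture": {"1,1-DCA"},
    "DHB-11DCA/H2": {"1,1-DCA"},
}


def end_product(culture: str, substrate: str) -> str:
    """Follow dechlorination steps for as long as the culture acts on the
    current compound; return the compound that accumulates."""
    active = CULTURE_SUBSTRATES[culture]
    current = substrate
    while current in active:
        current = STEPS[current]
    return current


for culture in CULTURE_SUBSTRATES:
    print(culture, "| 1,1,1-TCA ->", end_product(culture, "1,1,1-TCA"))
```

A return value equal to the input substrate means the culture does not transform that compound, which is the pattern that pointed to two Dehalobacter strains with complementary activities.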

1.6 RATIONALE AND RESEARCH OBJECTIVES

This thesis focuses on the characterization of ACT-3, a commercialized culture that is being used to detoxify groundwater contaminated by 1,1,1-TCA and chloroform, two widespread, toxic and recalcitrant organics. Dehalobacter, the dechlorinating organisms in ACT-3, are among the microbes that are most commonly associated with organohalide respiration of different chlorinated organics. However, Dehalobacter was a poorly understood genus. The research presented below aims to improve our understanding of the ACT-3 culture and the Dehalobacter genus. Ultimately, better understanding of the dominant dechlorinating organisms and the whole culture will improve the growth and the bioremediation applications of ACT-3.

Major objectives of this thesis are the following:

1. Establish a protein-purification method that allows rapid functional characterization of RDases expressed in dechlorinating cultures.

2. Identify the RDases responsible for the dechlorination functions observed in ACT-3.

3. Assemble complete Dehalobacter genomes from the ACT-3 metagenome.

4. Annotate and analyze assembled Dehalobacter contigs or genomes.

5. Isolate the Dehalobacter strains in ACT-3.


6. Investigate non-dechlorinating organisms in ACT-3.

1.7 THESIS OUTLINE

Chapter 1: General Introduction

Chapter 2: Functional Characterization of Reductive Dehalogenases Using Blue Native Polyacrylamide Gel Electrophoresis

This chapter reports the development of a new method that allows rapid functional characterization of active RDases from dechlorinating cultures while requiring low biomass consumption, and the application of this method in the characterization of RDases expressed in Dehalococcoides-containing cultures. A version of this chapter has been published (See Section 1.8).

Chapter 3: Identification of Dehalobacter Reductive Dehalogenases that Catalyze Dechlorination of Chloroform, 1,1,1-Trichloroethane and 1,1-Dichloroethane

This chapter reports the application of the method developed as described in Chapter 2 to the identification of active RDases expressed in ACT-3, which resulted in the identification of two novel RDases, CfrA and DcrA. CfrA dechlorinates chloroform and 1,1,1-TCA, and DcrA dechlorinates 1,1-DCA. A version of this chapter has been published (See Section 1.8).

Chapter 4: Semi-Automatic In Silico Gap Closure Enabled De Novo Assembly of Two Dehalobacter Genomes from Metagenomic Data

This chapter reports the assembly and separation of two highly similar Dehalobacter genomes from the metagenomes of the ACT-3 culture and the CF subculture. The assembly of these two genomes was a challenging task and it was achieved by the development of an in silico gap-resolution method, which has the potential to help other researchers solve similar problems in metagenomic assembly. A version of this chapter has been published (See Section 1.8).

Chapter 5: Sister Dehalobacter Genomes Revealed Specialization in Organohalide Respiration and Strain Differentiation Driven by Chlorinated Substrates


This chapter describes the annotation and analyses of the two Dehalobacter genomes assembled as described in Chapter 4. One genome, strain CF, encodes the gene of the RDase CfrA and the other genome, strain DCA, encodes the gene of the RDase DcrA; CfrA and DcrA are the two RDases responsible for the dechlorination reactions found in the ACT-3 cultures as described in Chapter 3. In addition, these two genomes were compared with another two Dehalobacter genomes released recently: Dehalobacter restrictus strain PER-K23 (Rupakula et al., 2013) and Dehalobacter sp. strain E1 (Maphosa et al., 2012). A version of this chapter has been published (See Section 1.8).

Chapter 6: Isolation of Three Dehalobacter Strains

This chapter describes the isolation of the two Dehalobacter strains co-existing in ACT-3 and another Dehalobacter strain from a culture unrelated to ACT-3. The two strains isolated from ACT-3 correspond to the two genomes of Dehalobacter strain CF and strain DCA assembled in Chapter 4 and annotated in Chapter 5. The isolation of these three Dehalobacter strains was achieved by dilution-to-extinction transfers with a defined medium amended with sterile mixed-culture supernatant from ACT-3.

Chapter 7: Complete Genome of Bacteroidales strain CF from a Chloroform-Dechlorinating Enrichment Culture

This chapter describes the assembly of another complete genome, Bacteroidales strain CF, from the metagenomes of the ACT-3 culture and the CF subculture. This organism is the most abundant non-dechlorinating organism in the two cultures. Earlier chapters focussed on the study of the dechlorinating organisms (Dehalobacter spp.) in the ACT-3 culture, but the characterization of non-dechlorinating organisms is equally important to the understanding of the whole microbial community. In Chapter 6, the growth of Dehalobacter isolates was found to rely on the addition of sterile mixed-culture supernatant from the ACT-3 culture, indicating nutrient dependence of Dehalobacter spp. on non-dechlorinating organisms. Chapter 7 begins to explore non-dechlorinating organisms by first closing a genome of one such organism. A version of this chapter has been published (See Section 1.8).

Chapter 8: Summary, Significance and Future Work


This chapter integrates and summarizes the work of this thesis, and describes further research directions.

1.8 STATEMENT OF AUTHORSHIP AND PUBLICATION STATUS

Chapter 2: Functional Characterization of Reductive Dehalogenases Using Blue Native Polyacrylamide Gel Electrophoresis

Authors: Shuiquan Tang1, Winnie W. M. Chan1, Kelly E. Fletcher2, Jana Seifert3, Xiaoming Liang1, Frank E. Löffler4,5, Elizabeth A. Edwards1, and Lorenz Adrian6. Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada; 2-School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA; 3-Helmholtz Centre for Environmental Research-UFZ, Department of Proteomics, Leipzig, Germany; 4-Department of and Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, Tennessee, USA; 5-Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA; 6-Helmholtz Centre for Environmental Research-UFZ, Department of Isotope Biogeochemistry, Leipzig, Germany. Contributions: EAE, FEL, KEF, LA, ST, and WWMC conceived of the experiments. KEF, ST and WWMC conducted the experiments with different dechlorinating cultures using the same method. XL conducted the experiments regarding protein quantification in gels of blue native PAGE. JS and ST performed protein identification analyses of mass spectrum data from LC-MS/MS peptide sequencing. ST and EAE drafted the manuscript. Reference to publication: Applied and Environmental Microbiology, 2013, Vol. 79, No. 3, P. 974-981 (doi: 10.1128/AEM.01873-12).

Chapter 3: Identification of Dehalobacter Reductive Dehalogenases that Catalyze Dechlorination of Chloroform, 1,1,1-Trichloroethane and 1,1-Dichloroethane

Authors: Shuiquan Tang1 and Elizabeth A. Edwards1. Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada. Contributions: EAE and ST conceived of the experiments. ST conducted the experiments and performed the analyses. ST and EAE drafted the manuscript. Reference to publication: Philosophical Transactions of the Royal Society B: Biological Sciences, 2013, Vol. 368, 20120318 (doi: 10.1098/rstb.2012.0318).

Chapter 4: Semi-Automatic In Silico Gap Closure Enabled De Novo Assembly of Two Dehalobacter Genomes from Metagenomic Data

Authors: Shuiquan Tang1, Yunchen Gong2, and Elizabeth A. Edwards1. Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada; 2-Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada. Contributions: ST conceived of the experiments. ST conducted the experiments. ST developed the in silico gap-resolution method and conducted the assembly of the two Dehalobacter genomes. YG conducted the initial assembly of Illumina sequencing data from the CF subculture. ST and EAE drafted the manuscript. Reference to publication: PLoS One, 2012, Vol. 7, No. 12, P. e52038 (doi: 10.1371/journal.pone.0052038).

Chapter 5: Sister Dehalobacter Genomes Revealed Specialization in Organohalide Respiration and Strain Differentiation Driven by Chlorinated Substrates

Authors: Shuiquan Tang1 and Elizabeth A. Edwards1. Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada. Contributions: EAE and ST conceived of the data analyses. ST performed the analyses. EAE and ST drafted the manuscript. In preparation for: the journal of International Society for Microbial Ecology (ISME).

Chapter 7: Complete Genome of Bacteroidales strain CF from a Chloroform-Dechlorinating Enrichment Culture

Authors: Shuiquan Tang1 and Elizabeth A. Edwards1. Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada. Contributions: ST conceived of and performed the analyses. ST and EAE drafted the manuscript. Reference to publication: Genome Announcements, Accepted 15 Nov 2013, ref genomeA01066-13.

Other Publications relevant to this thesis, carried out as collaborator:

Justicia-Leon, S. D., Mack, E. E., Griffiths, D. R., Tang, S., Edwards, E.A. and F.E. Löffler. 2013. Bioaugmentation with Dehalobacter-containing consortia achieves complete chloroform detoxification in anoxic microcosms. Environmental Science & Technology (accepted Nov. 2013).

Chan, C.C.H., Mundle, S.O.C., Eckert, T., Liang, X., Tang, S., Lacrampe-Couloume, G., Edwards, E.A. and B. Sherwood Lollar. 2012. Large carbon isotope fractionation during biodegradation of chloroform by Dehalobacter cultures. Environmental Science & Technology 46:10154–10160


Chapter 2 Functional Characterization of Reductive Dehalogenases Using Blue Native Polyacrylamide Gel Electrophoresis

Reproduced with permission from the journal of Applied and Environmental Microbiology, the American Society for Microbiology. Copyright © American Society for Microbiology, Applied and Environmental Microbiology, 2013, Vol. 79, P. 974-981. DOI: 10.1128/AEM.01873-12.

2.1 ABSTRACT

Dehalococcoides mccartyi strains are obligate organohalide-respiring bacteria harboring multiple distinct reductive dehalogenase (RDase) genes within their genomes. A major challenge is to identify substrates for the enzymes encoded by these RDase genes. We demonstrate an approach that involves blue native polyacrylamide gel electrophoresis (BN-PAGE) followed by enzyme activity assays with gel slices and subsequent identification of proteins in gel slices using liquid chromatography tandem mass spectrometry (LC-MS/MS). RDase expression was investigated in cultures of Dehalococcoides mccartyi strain BAV1 and in the KB-1 consortium growing on chlorinated ethenes and 1,2-dichloroethane. In cultures of strain BAV1, BvcA was the only RDase detected, revealing that this enzyme catalyzes the dechlorination not only of vinyl chloride, but also of all dichloroethene isomers and 1,2-dichloroethane. In cultures of consortium KB-1, five distinct Dehalococcoides RDases and one Geobacter RDase were expressed under the conditions tested. Three of the five RDases included orthologs to the previously identified chlorinated ethene-dechlorinating enzymes VcrA, BvcA and TceA. This study revealed substrate promiscuity for these three enzymes, and provides a path forward to further explore the largely unknown RDase protein family.

2.2 INTRODUCTION

Chlorinated ethenes and ethanes are widespread groundwater contaminants (De Wildeman and Verstraete, 2003; Löffler and Edwards, 2006). A viable approach for the remediation of chlorinated solvent contamination is microbial reductive dechlorination (Major et al., 2002; Lendvay et al., 2003; Ward and Stroo, 2010). Phylogenetically diverse bacteria partially dechlorinate tetrachloroethene (PCE) via trichloroethene (TCE) to cis-1,2-dichloroethene (cis-DCE), including Dehalococcoides, Geobacter, Sulfurospirillum, Dehalobacter, and Desulfitobacterium among others (Löffler and Edwards, 2006). Dehalococcoides mccartyi strains are the only organisms known to dechlorinate cis-DCE and vinyl chloride (VC) to nontoxic ethene (Löffler et al., 2012). Some D. mccartyi strains are also capable of catalyzing the reductive dihaloelimination of 1,2-dichloroethane (1,2-DCA) to ethene and 1,2-dichloropropane to propene (Maymó-Gatell et al., 1999; He et al., 2003; Ritalahti and Löffler, 2004).

Reductive dechlorination of these groundwater pollutants is catalyzed by reductive dehalogenases (RDases). The catalytic unit is encoded by the RDase subunit A gene (rdhA). Over 650 rdhA genes have been identified from fully sequenced genomes based on sequence homology and of these over 100 are from Dehalococcoides (McMurdie et al., 2009; Hug, 2012). Only a few rdhA genes have been functionally characterized because of difficulties inherent to working with slow-growing anaerobes with low biomass yields, the lack of genetic systems for these organisms, and the inability to successfully express functional RDases heterologously. Partial purification of the Dehalococcoides RDases TceA (Magnuson et al., 2000), PceA (Magnuson et al., 1998) and VcrA (Müller et al., 2004) enabled the preliminary characterization of their activity and substrate range, but difficulties in obtaining sufficient biomass hampered biochemical studies. Substrates for the RDases BvcA (Krajmalnik-Brown et al., 2004) and MbrA (Chow et al., 2010) were inferred from transcriptional analysis, though biochemical confirmation is still missing. Adrian et al. (2007) identified the first chlorobenzene RDase, CbrA, using a combination of clear native polyacrylamide gel electrophoresis (CN-PAGE), enzyme assays and liquid chromatography tandem mass spectrometry (LC-MS/MS) peptide identification. This approach enabled functional attribution without requiring large amounts of biomass. Here we build upon this approach using blue native PAGE (BN-PAGE) (Wittig et al., 2006), which substantially improved recovery of dechlorinating activity after electrophoresis, resulting in higher sensitivity and enabling analysis of a wider range of substrates.

KB-1 is a Dehalococcoides-containing consortium capable of complete PCE hydrogenolysis to ethene via TCE, cis-DCE and VC as intermediates, as well as 1,2-DCA dihaloelimination to ethene (Duhamel and Edwards, 2006, 2007). Metagenome sequencing revealed at least 36 rdhA genes in consortium KB-1 (Hug, 2012) and multiple rdhA genes were transcribed simultaneously when the culture was grown on different chlorinated solvents as electron acceptors (Waller et al., 2005; Waller, 2009). D. mccartyi strain BAV1 is capable of hydrogenolysis of all three dichloroethene (DCE) isomers (cis-DCE, trans-DCE and 1,1-DCE) and VC, as well as the dihaloelimination of 1,2-DCA to ethene (He et al., 2003); 11 rdhA genes were found in its genome (McMurdie et al., 2009). Previous gene expression studies implicated bvcA in VC-to-ethene reductive dechlorination (Krajmalnik-Brown et al., 2004); however, the RDases that catalyze other dechlorinating reactions are unknown. Therefore, the BN-PAGE approach was applied to identify the RDases expressed and active in strain BAV1 and in the KB-1 consortium.

2.3 MATERIALS AND METHODS

2.3.1 Cultures and growth conditions

The KB-1 consortium, a KB-1 subculture grown on 1,2-DCA (referred to as the 1,2-DCA KB-1 subculture), and pure cultures of D. mccartyi strain BAV1 (He et al., 2003) were used in this study. KB-1 was enriched from sediment from a contaminated site in Ontario (Canada) and contains multiple D. mccartyi strains (Duhamel et al., 2004). KB-1 also contains a PCE- and TCE-dechlorinating Geobacter lovleyi strain KB-1 (Wagner et al., 2012). Consortium KB-1 was routinely maintained with TCE as an electron acceptor and methanol and ethanol as electron donors in a defined mineral salts medium (Edwards and Grbić-Galić, 1994). The 1,2-DCA KB-1 subculture was maintained for over 4 years with 1,2-DCA as an electron acceptor and methanol as an electron donor. D. mccartyi strain BAV1 was isolated from dechlorinating microcosms established with aquifer material collected at the contaminated Bachman Road site in Oscoda, Michigan (USA) (He et al., 2003).

2.3.2 Preparation of crude protein extracts

Prior to preparing cell-free crude extracts, an aliquot of KB-1 culture was separated into two bottles, and flushed with N2/CO2 (80/20, vol/vol) to purge residual chlorinated ethenes and ethene. The purged cultures were incubated for 5 days (starvation phase), and then flushed with H2/CO2 (80/20, vol/vol) to provide electron donor and amended with 30 mg/L (aqueous concentration) of either TCE (230 µM; “TCE-induced”) or VC (480 µM; “VC-induced”). One day later, cells from 8-10 mL culture samples were pelleted by centrifugation for 6 min at 10,000 × g at 4 °C. For the 1,2-DCA KB-1 subculture, 40 mL of culture were collected during the dechlorination of 1,2-DCA to ethene and cells were collected by centrifugation. Cell pellets were used immediately or stored at -80 °C. D. mccartyi strain BAV1 cultures were grown with either cis-DCE or 1,2-DCA as follows. Replicate vessels containing a mineral salts medium (Fletcher et al., 2009) supplemented with 5 mM sodium acetate, ~30 mg/L (~300 µM; aqueous concentration) of cis-DCE or 1,2-DCA and hydrogen (nominal aqueous concentration of 7.5 mM) as the electron donor were inoculated with strain BAV1 (1.3 % vol/vol). When at least 50 % of the amended cis-DCE had been dechlorinated to VC (with at least traces of ethene produced as well) or at least 50 % of the amended 1,2-DCA had been dechlorinated to ethene, 600 mL BAV1 cell suspension was pelleted by centrifugation for 60 min at 4,000 rpm and 15 °C. Cell pellets were stored at -80 °C.
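The paired mass and molar concentrations given above follow from dividing by the molar mass of each compound. A minimal sketch of the conversion is shown below; the molar masses are standard values supplied here for illustration (they are not stated in the text), and the function name `mg_per_l_to_micromolar` is hypothetical.

```python
# Standard molar masses in g/mol (supplied for this sketch, not from the text).
MOLAR_MASS = {
    "TCE": 131.39,      # C2HCl3
    "VC": 62.50,        # C2H3Cl
    "cis-DCE": 96.94,   # C2H2Cl2
    "1,2-DCA": 98.96,   # C2H4Cl2
}


def mg_per_l_to_micromolar(mg_per_l: float, molar_mass_g_per_mol: float) -> float:
    """Convert an aqueous concentration from mg/L to µmol/L."""
    return mg_per_l / molar_mass_g_per_mol * 1000.0


for compound, molar_mass in MOLAR_MASS.items():
    micromolar = mg_per_l_to_micromolar(30.0, molar_mass)
    print(f"{compound}: 30 mg/L = {micromolar:.0f} µM")
```

With these molar masses, 30 mg/L corresponds to roughly 230 µM TCE, 480 µM VC and about 300 µM cis-DCE or 1,2-DCA, in line with the rounded values quoted in the text.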

In an anoxic chamber, fresh or thawed frozen cell pellets were suspended in 200-250 µL of anoxic 1x Sample Buffer provided with the NativePAGE Sample Prep Kit (Invitrogen, Carlsbad, CA) containing 1 % (wt/vol) digitonin. During method optimization, four different detergents were compared, including digitonin, dodecyl-β-d-maltoside, taurodeoxycholate, and Triton-X 100. Digitonin was found to be most effective (Chan, 2009). Suspended cell pellets were combined with approximately ~ 200 mg of 75 µm diameter glass beads, sealed in a 1.5 mL screw-top tube, shaken in a bead beater (FastPrep DNA extractor, Savant Instruments, Holbrook, NY) at intensity 4.0 for 10 sec, and then placed immediately on ice. To separate solubilized proteins from cell debris, the suspensions were centrifuged at 13,000 × g for 10 min at 4 °C. For the 1,2-DCA KB-1 subculture only, cells were lysed by shaking with glass beads on a horizontal vortex mixer (Scientific Industries Inc., Bohemia, NY, USA) at maximum amplitude for three cycles; each cycle consisted of 2 min shaking and 1 min incubation in an ice bath. Supernatants (crude protein extracts), containing solubilized proteins, were transferred to new 1.5 mL Eppendorf tubes. Protein concentrations were determined by using the Bradford assay (Bradford, 1976) with bovine serum albumin as a standard. Before loading the samples on the BN-PAGE, crude protein extracts were amended with 5 % (wt/vol) G-250 sample additive from the NativePAGE Sample Prep Kit (Invitrogen) to a final concentration of 0.25 % (wt/vol).

2.3.3 BN-PAGE gel electrophoresis and staining

Electrophoresis was performed in an 11 °C room using the NativePAGE Novex Bis-Tris Gel System (Invitrogen). The anode and cathode buffers were prepared according to the manufacturer’s instructions in the NativePAGE Running Buffer Kit (Invitrogen) and were
pre-chilled to 4°C prior to use. A precast gradient Bis-Tris gel (4-16 %, 1.0 mm thick; Invitrogen) was placed into the XCell SureLock Mini Cell and 5 µL of NativeMark Unstained Protein Standard (Invitrogen) was loaded to one lane to serve as the size standard. Volumes of 20 to 25 µL of crude protein extract, corresponding to about 12 to 30 µg of total protein, were loaded to each of the other lanes of the gel. Remaining crude protein extracts were stored on ice to be used as positive controls in subsequent dechlorination assays. Replicate lanes were prepared for: i) staining to visualize protein bands, ii) excision of gel slices and elution of proteins for SDS-PAGE, and iii) excision of gel slices for activity assays. The loaded gel was run successively at 150 V for 60 min, then at 250 V for 30 min, and finally at 300 V for 15 min. For the 1,2-DCA KB-1 subculture only, the BN-PAGE electrophoresis was run for 60 min at 150 V followed by 45 min at 200 V while the whole chamber was placed in an ice bath. Once electrophoresis was complete, the lane containing the protein ladder and one lane loaded with the crude protein extract were cut from the rest of the gel using a scalpel and silver stained according to Nesterenko et al. (1994). For the 1,2-DCA KB-1 subculture only, the staining was performed using the “Fast Coomassie G-250 Staining” protocol from Invitrogen (http://tools.invitrogen.com/content/sfs/manuals/nativepage_man.pdf, page 23). The remainder of the gel was stored in anode buffer at 4° or 11 °C during the staining procedure. Stained lanes or gel slices were saved at -20° to 4°C in a solution containing 1 % (vol/vol) glacial acetic acid for subsequent LC-MS/MS analysis.

2.3.4 Protein quantification in BN-PAGE gels

Protein amounts in bands excised from BN-PAGE gel lanes were estimated by comparing stain intensity to standards. In this procedure, two gel lanes, one containing a protein ladder and one containing crude protein extract, were first stained following the “Fast Coomassie G-250 Staining” protocol (Invitrogen) and then destained by incubating overnight with gentle shaking in 7 % (vol/vol) acetic acid to reduce the Coomassie Blue background. A digital picture (G:BOX Chemi HR16, Syngene, US) of the two gel lanes was then analyzed using ImageJ (http://rsb.info.nih.gov/ij/), comparing the grey values of protein bands in the crude-extract lane with those of ladder bands containing known amounts of protein. Protein amounts in the ladder bands had been previously determined in the same way using a series of bovine serum albumin standards. The protein content in BN-PAGE gel slices was not measured in our initial experiments, but only in a later experiment with KB-1 culture extract to investigate enrichment of RDases during BN-PAGE.
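The densitometric estimate described above amounts to fitting a calibration line to bands of known protein content and inverting it for the unknown band. The sketch below assumes a linear stain response over the range used, which the text does not state; all intensities and protein amounts are hypothetical, and the function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_protein_ug(band_intensity, slope, intercept):
    """Invert the calibration line to obtain protein mass (µg) from band intensity."""
    return (band_intensity - intercept) / slope


# Hypothetical ladder bands: known protein (µg) and background-corrected grey values.
ladder_ug = [0.25, 0.5, 1.0, 2.0]
ladder_intensity = [510.0, 1010.0, 2010.0, 4010.0]

slope, intercept = fit_line(ladder_ug, ladder_intensity)
sample_ug = estimate_protein_ug(1510.0, slope, intercept)
print(f"estimated protein in band: {sample_ug:.2f} µg")
```

In practice the ladder itself was calibrated against bovine serum albumin standards, so the same fit would be applied twice: once to assign protein amounts to the ladder bands, and once to read sample bands against the ladder.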

2.3.5 Dechlorination activity assays using gel slices

To determine the location of proteins in an unstained gel lane, the corresponding lanes with silver stained proteins were aligned, and gel slices were excised using a scalpel. Individual gel slices were cut into 1 mm square pieces and were transferred to 2 mL crimp-top glass vials. As a positive control, 10-25 µL of the crude protein extract from the same original sample was added to an additional glass vial. Dechlorination activity assays were performed essentially as described previously (Hölscher et al., 2003). In an anoxic chamber, the samples from TCE-induced and VC-induced KB-1 cultures were tested for TCE and VC dechlorination in 2.0 mL crimp-top vials amended with 1.0 mL assay buffer that contained 100 mM Tris-HCl (pH 7.4), 2 mM titanium citrate, 2 mM methyl viologen, and 30-70 mg/L (aqueous concentration) of chlorinated compounds. The samples from the 1,2-DCA KB-1 subculture were assayed for PCE, TCE, cis-DCE, trans-1,2-dichloroethene (trans-DCE), VC and 1,2-DCA dechlorination. The samples from the BAV1 culture were assayed for cis-DCE and 1,2-DCA dechlorination in 0.8 mL vials with 0.4 mL assay buffer that contained 100 mM potassium acetate (pH 5.8), 4 mM titanium citrate, 4 mM methyl viologen, and 25 mg/L (aqueous concentration) of cis-DCE or 1,2-DCA. The crude protein extract from the BAV1 culture grown on cis-DCE was also assayed for PCE, TCE, all three isomers of DCE, VC and 1,2-DCA. Crimp-top vials were closed with Teflon-coated septa immediately after assay buffer addition, thoroughly mixed, and were stored upside-down inside the anoxic chamber for 24-48 h prior to headspace analysis.

2.3.6 Analysis of dechlorination products

To determine the concentrations of chlorinated substrates and their dechlorination products following incubation, 250 µL (for KB-1 assays) and 50 µL (for BAV1 assays) headspace samples were removed from activity assay vials and directly injected into a Chrompack CP-3800 gas chromatograph connected to a flame ionization detector (FID) (Varian, Middelburg, the Netherlands) equipped with a 30 m-by-0.53 mm GS-Q column (J&W Scientific, Waldbronn, Germany). The following temperature program was used: 1 min at 100 °C, 50 °C/min to 225 °C, hold for 2.5 min. The FID was operated at 250 °C, with helium as the carrier gas at an input pressure of 680 hPa. For 1,2-DCA KB-1 subcultures only, 300 µL headspace samples were similarly analyzed by gas chromatography as described previously (Grostern and Edwards, 2006a).

2.3.7 SDS-PAGE

While excising gel slices for dechlorination activity assays, parallel gel slices were also excised from a second unstained lane to elute proteins for sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). BN-PAGE gel slices were cut into 1 mm square pieces and transferred to 1.5 mL Eppendorf tubes containing 250 µL of SDS elution buffer (100 mM Tris-HCl, pH 7.0, and 0.1 % wt/vol SDS). Following 12-20 h of shaking at 750 rpm, the solution containing eluted proteins was concentrated to 10-15 µL using an Amicon Ultra Centrifugal Filter with a 10 kDa cutoff (Millipore, Billerica, MA) following the manufacturer’s instructions. The concentrate was then analyzed by SDS-PAGE and the gels were silver stained according to Nesterenko et al. (1994).

2.3.8 LC-MS/MS analysis

The gel slices of interest from the stained lanes of BN-PAGE or SDS-PAGE gels were destained, and proteins were reduced with 100 mM dithiothreitol, alkylated with 10 mM iodoacetamide and digested with trypsin as described (Adrian et al., 2007). The masses of peptide fragments were determined by liquid chromatography linked to mass spectrometry via electrospray ionization (LC-ESI-MS/MS) as described (Benndorf et al., 2007) and by nanoLC-LTQ Orbitrap MS/MS (Bastida et al., 2010). Peptide fragments were identified using the MS/MS ion search in the Mascot server with previously described parameters (Kellner et al., 2007; Bastida et al., 2010). For the 1,2-DCA KB-1 subculture only, the LC-MS/MS analysis was performed at the Advanced Protein Technology Center of SickKids’ Hospital (Toronto, Canada). After reduction, alkylation, and tryptic digestion of proteins (Adrian et al., 2007), the resulting peptides were loaded onto a 150 μm ID pre-column (Magic C18, Michrom Biosciences) at 4 μL/min and separated over a 75 μm ID analytical column packed into an emitter tip containing the same packing material. The peptides were eluted over 60 min at 300 nL/min using a 0 to 40 % acetonitrile gradient in 0.1 % formic acid delivered by an EASY n-LC nano-chromatography pump (Proxeon Biosystems, Odense, Denmark). The peptides were eluted into an LTQ linear ion trap mass spectrometer (Thermo-Fisher, San Jose, CA) operated in data-dependent mode. Six MS/MS scans were obtained per MS cycle. The raw data files were searched using X!Tandem (Beavis Informatics) with a parent ion accuracy of 2 Da and a fragment accuracy of 0.5 Da. A fixed modification of carbamidomethyl cysteine and a variable modification of oxidized methionine were included in the search.

2.3.9 Reference databases used for LC-MS/MS analysis

Mass spectra from samples of the BAV1 culture were searched against proteins from the BAV1 genome (NCBI accession No., NC_009455) (McMurdie et al., 2009). The genome of BAV1 has 11 rdhA genes, including one transcriptionally-identified vinyl chloride reductase gene referred to as bvcA (Krajmalnik-Brown et al., 2004). Mass spectra from samples of KB-1 cultures were searched against two reference databases: 1) a database of all predicted protein sequences from the KB-1 metagenome and 2) a custom RDase database. The KB-1 metagenome was obtained from shotgun sequencing of the KB-1 culture DNA using Sanger sequencing (Hug, 2012). The assembly and annotation of the KB-1 metagenome were performed using the in-house pipelines of the DoE Joint Genome Institute (JGI, Walnut Creek, CA) (Hug, 2012) and can be accessed through the IMG/M platform (http://img.jgi.doe.gov/cgi-bin/m/main.cgi) with the IMG taxon object ID 2013843002. A clone library of RDase genes generated by Waller et al. (2005) identified 15 partial rdhA sequences (KB1_RdhA1 to KB1_RdhA14) in KB-1 cultures. The assembly and annotation of the KB-1 metagenome identified an additional 21 rdhA sequences, 18 of which are complete genes including a Geobacter rdhA gene (KB1_GeobRD) (Wagner et al., 2012). Because this collection of 36 rdhA sequences from the KB-1 culture was from metagenomic data, it may not cover all rdhA sequences in KB-1 cultures. Therefore, a custom curated protein database containing 182 RdhA sequences (including those from KB-1) was created. These additional RdhA sequences were mined from NCBI and JGI public sequence databases from the genomes of Dehalococcoides and other organisms (Table A3).

2.4 RESULTS

2.4.1 RDase expression in BAV1 cultures

Crude protein extract of strain BAV1 cultures grown on cis-DCE reductively dechlorinated TCE, cis-DCE, trans-DCE, 1,1-DCE, and VC but not PCE. 1,2-DCA was transformed by dihaloelimination to ethene (Table A1). The highest dechlorinating activities were observed for the three DCE isomers. These activities are similar to those observed in growing cultures (15), although relative rates may not be comparable because the rate-limiting step in organohalide
respiration is not necessarily the reductive dehalogenation step that is assayed in these experiments (recall methyl viologen is used as artificial electron donor). Other electron transfer steps could be growth rate-limiting in cultures. After BN-PAGE separation of strain BAV1 crude extracts, cis-DCE dechlorinating activity was mostly constrained to a gel segment around the 242 kDa standard band (Figure 2.1). Analysis of proteins eluted from this gel segment by SDS PAGE revealed a protein band with a molecular weight between 45 and 66 kDa, the molecular weight range of RDases (Figure 2.1). LC-MS/MS analysis of the SDS-PAGE gel slice revealed the presence of only one RDase, BvcA (Figure 2.1). With crude protein extracts from the BAV1 culture grown on 1,2-DCA instead of cis-DCE, the 1,2-DCA dechlorinating activity was again mostly constrained to a narrow gel slice (3-4 mm) around 242 kDa on the BN-PAGE gel (Figure 2.2 and Figure A1). Again, LC-MS/MS analysis of the gel slices from both BN-PAGE and SDS-PAGE revealed the presence of only one RDase, BvcA (Figure 2.2). Although non-RDase proteins were also detected in the gel slice, BvcA showed highest coverage (59%) and peptide hits (42); BvcA was therefore the most abundant protein in the BN-PAGE gel slice with the highest dechlorinating activity (Figure 2.2 and Table 2.1).

Figure 2.1 RDase expression in a strain BAV1 culture grown on cis-DCE. Shown are the results from BN PAGE indicating position of gel slices, amounts of dechlorination product(s) obtained with the different gel slices in activity tests, an SDS gel of proteins in slice 10, and the mass spectrometric identification of the band at ~50kDa. “+”: positive control for dechlorination assayed with 10 µL crude protein extract instead of the protein from a gel slice

Figure 2.2 RDase expression in a BAV1 culture grown on 1,2-DCA Shown are the results from BN PAGE indicating the position of gel slices, amounts of dechlorination product(s) obtained with the different gel slices in activity tests, an SDS gel of proteins in slice 4, and the mass spectrometric identification of the band at ~ 50kDa. “+”: positive control for dechlorination assayed with 10 µL crude protein extract instead of the protein from a gel slice.

2.4.2 RDase expression in KB-1-derived cultures

The crude protein extract from KB-1 cultures grown on TCE dechlorinated PCE, TCE, cis-DCE, trans-DCE, VC and 1,2-DCA in methyl viologen-amended activity tests (Table A1). The crude protein extracts from the 1,2-DCA KB-1 subculture dechlorinated the same substrates (Table A1), despite having been maintained on 1,2-DCA as growth substrate for 4 years. Dechlorination rates reported in Table A1 were measured over a 24-hour period and were normalized to protein concentration to obtain a rough estimate of the relative specific activity of the enzymes in the crude protein extracts on different substrates. These estimates, ranging from about 4-16 nmol · min⁻¹ · mg protein⁻¹ depending on substrate, were lower than specific activities determined previously for KB-1, which ranged between 50 and 90 nmol · min⁻¹ · mg protein⁻¹ (5). However, in the previous study, activity was determined after 2 to 4 hours (instead of 24 hours) and cell-free extracts were prepared by sonication and without detergents (Chan et al., 2011).

Table 2.1 D. mccartyi strain BAV1 proteins identified in the BN-PAGE gel region of enriched dechlorinating activity. The BAV1 culture was grown on 1,2-DCA prior to analysis.

Description                                                            NCBI GI No.   No. of peptide hits   Coverage (%)
Reductive dehalogenase, BvcA                                           48995937      42                    59
Nicotinate nucleotide dimethylbenzimidazole phosphoribosyltransferase  147669271     20                    57
Chaperone protein DnaK                                                 147669847     19                    33
Chaperonin Cpn10                                                       147669874     13                    65
DNA polymerase III, beta subunit                                       147669676     7                     21
Transketolase                                                          147669261     7                     13
Pyruvate ferredoxin oxidoreductase, alpha subunit                      147669303     6                     18
Formate dehydrogenase, alpha subunit                                   147668816     6                     9
General substrate transporter                                          147669750     5                     11
Chaperonin GroEL                                                       147669875     4                     9
Hypothetical protein                                                   147668976     3                     8
Periplasmic binding protein                                            147669265     2                     7
GrpE protein                                                           147669848     2                     14
AIR synthase-like protein                                              147669977     2                     7
Hypothetical protein (putative S-layer protein)                        147669853     2                     8

Figure 2.3 BN PAGE of protein extracts of the VC-induced KB-1 culture and VC dechlorinating activity in the bands. Shown is the BN PAGE gel indicating the position of gel slices. Activity towards VC was measured as nmol ethene produced. “+”: positive control for dechlorination assayed with 25 µL crude protein extract instead of protein from a gel slice.

In BN-PAGE gels using crude protein extracts from the three KB-1 cultures (VC-induced KB-1 culture, TCE-induced KB-1 culture and 1,2-DCA KB-1 subculture), the dechlorinating activity distribution along the gel lanes was essentially identical to that seen with BAV1 cultures. Again, dechlorinating activity was mainly constrained to a narrow region around 242 kDa (Figure 2.3, Figure A2 and Figure A3). LC-MS/MS analysis was focused on the gel slices with higher activity in order to identify the RDases in the three cultures. In total, only six distinct RDases out of 36 sequences in KB-1 were expressed (Figure 2.4). KB1_VcrA, KB1_BvcA, KB1_TceA and KB1_RdhA5 were expressed in all three cultures (Figure 2.4). KB1_VcrA was the most abundant RDase in all three cultures based on the number of peptide hits and coverage (Figure 2.4). KB1_RdhA1 was only found in the VC-induced KB-1 culture. KB1_GeobRD, which belongs to Geobacter lovleyi strain KB-1, was found in the TCE-induced and to a lesser extent in the VC-induced KB-1 cultures, but not in the KB-1 1,2-DCA subculture (Figure 2.4).

Figure 2.4 Peptide hits and coverage of the RDases identified from the three KB-1 related cultures. The MS spectra were searched against a custom RDase database (Table S4). For each culture, three consecutive gel slices covering the active region were subject to LC-MS/MS, separately. Peptide Hits were reported by summing up the peptide hits for each protein from the three gel slices. Peptide coverage (%) was reported using the highest coverage seen in the three slices. Values of peptide hits and coverage for the three gel slices individually are provided in Table A1. a IMG gene object ID; b NCBI GI number; c NCBI accession number.
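The aggregation rule stated in the caption of Figure 2.4 (peptide hits summed over the three slices, coverage taken as the maximum) can be sketched as follows. The per-slice numbers are hypothetical placeholders rather than data from this study, and the function name and data layout are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_slices(slices):
    """Combine per-slice results: sum peptide hits, keep the highest coverage.

    `slices` is a list of dicts mapping protein name -> (peptide_hits, coverage_percent).
    """
    hits = defaultdict(int)
    coverage = defaultdict(float)
    for slice_result in slices:
        for protein, (n_hits, cov) in slice_result.items():
            hits[protein] += n_hits
            coverage[protein] = max(coverage[protein], cov)
    return {protein: (hits[protein], coverage[protein]) for protein in hits}


# Hypothetical values for three consecutive gel slices (not data from the study).
example = [
    {"KB1_VcrA": (12, 30.0), "KB1_TceA": (3, 8.0)},
    {"KB1_VcrA": (20, 41.0), "KB1_TceA": (5, 12.0)},
    {"KB1_VcrA": (9, 25.0)},
]
summary = summarize_slices(example)
for protein, (total_hits, best_coverage) in summary.items():
    print(f"{protein}: {total_hits} peptide hits, {best_coverage:.0f} % coverage")
```

A protein missing from a slice simply contributes no hits for that slice, so RDases detected in only one or two slices are still reported.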

2.4.3 Investigating RDase enrichment during BN-PAGE

To determine if specific activity in the gel slices was higher than that in the crude extract, it was necessary to obtain an estimate of protein content in gel slices. The protein content in gel slices from a KB-1 culture extract was quantified from gel images, as described under Materials and Methods. Consistent with all prior experiments, the majority of the activity on TCE, VC and 1,2-DCA was found in a gel segment around 242 kDa (slice 3 in Figure A4). The total amount of protein in slice 3 was estimated to be 0.18 µg, resulting in specific activities of 77, 90, and 6.9 nmol · min⁻¹ · mg protein⁻¹ for VC, TCE and 1,2-DCA, respectively. The specific activity in slice 3 was 5 to 15 times higher than that in the crude protein extract (Figure A4), demonstrating enrichment of RDases by BN-PAGE.
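The normalization behind these numbers is product formed divided by incubation time and protein mass, and the enrichment factor is the ratio of the slice value to the crude-extract value. The sketch below uses the 0.18 µg protein estimate from the text; the product amounts, the 24 h incubation time and the crude-extract protein mass are hypothetical values chosen only to give the right order of magnitude.

```python
def specific_activity(product_nmol, minutes, protein_mg):
    """Specific activity in nmol product per minute per mg protein."""
    return product_nmol / (minutes * protein_mg)


def fold_enrichment(slice_activity, crude_activity):
    """Ratio of specific activity in a gel slice to that in the crude extract."""
    return slice_activity / crude_activity


ASSAY_MINUTES = 24 * 60  # assumed 24 h incubation

# Gel slice: 0.18 µg protein (from the text), 20 nmol product (hypothetical).
slice_sa = specific_activity(product_nmol=20.0, minutes=ASSAY_MINUTES, protein_mg=0.18e-3)
# Crude extract: 30 µg protein and 400 nmol product (both hypothetical).
crude_sa = specific_activity(product_nmol=400.0, minutes=ASSAY_MINUTES, protein_mg=0.03)

print(f"slice: {slice_sa:.1f} nmol/min/mg, crude: {crude_sa:.1f} nmol/min/mg")
print(f"enrichment: {fold_enrichment(slice_sa, crude_sa):.1f}-fold")
```

Because the protein mass in a slice is small, the estimate is sensitive to errors in the densitometric protein quantification, so the resulting enrichment factor is best read as approximate.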

2.4.4 Identification of other proteins in active gel slices

Many non-RDase proteins were identified in selected gel slices of enriched activity when the MS spectra were searched against all proteins and protein fragments from the KB-1 metagenome (Table A3) or proteins from the BAV1 genome (Table 2.1). In all cultures except the 1,2-DCA KB-1 subculture, the protein with most peptide hits identified in gel slices showing activity was an RDase. In the 1,2-DCA KB-1 subculture, chaperonin GroEL (Dehalococcoides, 2013887541) and chaperone protein DnaK (Dehalococcoides, 2013903864) were the most abundant proteins, followed by an RDase. GroEL and DnaK were also abundant in all other samples (Tables 2.1 and A3). Other non-RDase proteins identified with high peptides hits included the Dehalococcoides α-subunit of pyruvate:ferredoxin oxidoreductase (2013890603). In samples from the KB-1 cultures, the majority of proteins belonged to Dehalococcoides, consistent with the dominance of Dehalococcoides in the KB-1 consortium. While Geobacter proteins were found in the TCE-induced and VC-induced KB-1 cultures, no Geobacter proteins were detected in the 1,2-DCA KB-1 subculture, consistent with the absence of Geobacter in this subculture.

2.5 DISCUSSION

2.5.1 Functional characterization of RDases

BvcA had previously been associated with VC dechlorination by transcriptional analysis (Krajmalnik-Brown et al., 2004). BvcA was the only RDase detected in active BN-PAGE gel segments obtained from electrophoresis of crude protein extracts from the BAV1 culture grown with either cis-DCE or 1,2-DCA (Figure 2.1 and Figure 2.2). It follows that BvcA must also dechlorinate cis-DCE and 1,2-DCA. Because BN-PAGE does not separate RDases from each other, as shown with data for the KB-1 cultures, BvcA, the only RDase detected in the gel segment showing maximum dechlorinating activity, is likely also the only RDase expressed by strain BAV1 under these conditions. BvcA must then be responsible for all dechlorinating activities detected in the crude protein extracts of the BAV1 culture grown on cis-DCE (Table A1). Therefore, in addition to cis-DCE, 1,2-DCA and VC, the substrates of BvcA include 1,1-DCE, trans-DCE and TCE, but not PCE. Similar substrate ranges were observed for two other characterized Dehalococcoides RDases, TceA (Magnuson et al., 2000) and VcrA (Müller et al., 2004). These results are consistent with the fact that strain BAV1 grows on all three DCE isomers, VC and 1,2-DCA, but not TCE or PCE. TCE and possibly PCE are, however, co-metabolized in the presence of growth-supporting electron acceptors (He et al., 2003).

In total, only six distinct RDases were detected in the KB-1 cultures (Figure 2.4) with three of these orthologous to the characterized Dehalococcoides RDases VcrA, TceA, and BvcA. KB1_VcrA, the most abundant RDase expressed in all KB-1 cultures tested (Figure 2.4 and Table 2.2), shares 97.1% a.a. identity (15 differences/519) with VcrA from D. mccartyi strain VS, that was shown to dechlorinate VC, all three DCE isomers and TCE, the latter much more slowly (Müller et al., 2004). KB1_VcrA was also most abundant in the 1,2-DCA KB-1 subculture, suggesting that it might also dechlorinate 1,2-DCA (Figure 2.4 and Table 2.2). KB1_TceA shares 97.3 % a. a. identity (15 differences/560) with TceA from D. mccartyi strain 195; TceA is known to dechlorinate TCE, cis-DCE, 1,1-DCE, trans-DCE (slowly), VC (extremely slowly) and 1,2-DCA, but not PCE (Magnuson et al., 2000). KB1_BvcA shares 99.0 % a.a. identity (5 differences/516) with BvcA from D. mccartyi strain BAV1, that in this study was found to dechlorinate all three DCE isomers, VC, 1,2-DCA and TCE, but not PCE. In summary, VcrA, BvcA and TceA dechlorinate similar substrates with some differences in substrate preferences. However, they are manifestly distinct in pairwise comparisons, sharing 37.2% (TceA and VcrA ), 41.2% (TceA and BvcA) and 39.3% (BvcA and VcrA) a.a. identity.
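The identity percentages quoted above follow directly from the number of differing residues over the aligned length. The sketch below assumes identity is computed as matching positions divided by alignment length; the alignment tool and gap treatment used in the study are not stated, and the function name is illustrative.

```python
def percent_identity(differences: int, aligned_length: int) -> float:
    """Percent amino acid identity from the count of differing positions."""
    return 100.0 * (aligned_length - differences) / aligned_length


# (differences, aligned length) pairs as quoted in the text.
pairs = {
    "KB1_VcrA vs VcrA (strain VS)": (15, 519),
    "KB1_TceA vs TceA (strain 195)": (15, 560),
    "KB1_BvcA vs BvcA (strain BAV1)": (5, 516),
}
for label, (diff, length) in pairs.items():
    print(f"{label}: {percent_identity(diff, length):.1f} %")
```

These reproduce the 97.1 %, 97.3 % and 99.0 % values given for the three KB-1 orthologs.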

KB1_RdhA5 was also detected in all three KB-1 cultures. In a previous transcriptional study (Waller et al., 2005), the gene encoding KB1_RdhA5 was transcribed with all investigated substrates (TCE, cis-DCE, VC, 1,2-DCA), corresponding well with the results of this study. In cultures of D. mccartyi strain 195, the ortholog of KB1_RdhA5 (DET1545) was most up-regulated at low chlorinated ethene concentrations or respiration rates (Rahm and Richardson, 2008). The low and possibly constitutive expression of this RDase under all conditions complicates a functional assignment, and thus the substrate(s) of this RDase remain unknown.

KB1_GeobRD shares ~95 % a.a. identity with the two nearly identical RDases (IMG gene object ID 642678287 and 642678289, differing from each other by only 4/515 a.a.) of Geobacter lovleyi strain SZ. Both G. lovleyi SZ and G. lovleyi strain KB-1 dechlorinate PCE and TCE to cis-DCE (Wagner et al., 2012) using acetate as electron donor. Since there are no other RDase sequences in these strains, KB1_GeobRD likely dechlorinates both PCE and TCE, which is consistent with our observations: KB1_GeobRD was relatively more abundant in the TCE-induced KB-1 culture, was less abundant in the VC-induced KB-1 culture, which was normally grown on TCE but amended with VC just prior to analysis, and was not detected in the 1,2-DCA KB-1 subculture (Figure 2.4 and Table 2.2). These results are also consistent with previous microarray data showing the transcription of KB1_GeobRD in TCE- but not in VC-amended KB-1 cultures (Waller, 2009).

Table 2.2 Summary of BN-PAGE analyses for the four different cultures.

Culture                   Culture condition before   Chlorinated substrate   Activity      RDases identified in
                          protein extraction         tested on gel slices    detected?     active gel slices (a)

KB-1 maintained on        Starved for 5 days, then   TCE                     Yes           KB1_VcrA (*)
TCE and methanol          amended with TCE and H2    cis-DCE                 Yes           KB1_BvcA (*)
                                                     trans-DCE               Yes           KB1_GeobRD (*)
                                                     VC                      Yes           KB1_RdhA5
                                                     1,2-DCA                 Yes           KB1_TceA
                                                     PCE                     Yes

KB-1 maintained on        Starved for 5 days, then   TCE                     Yes           KB1_VcrA (*)
TCE and methanol          amended with VC and H2     cis-DCE                 Yes           KB1_BvcA (*)
(same culture as above)                              trans-DCE               Yes           KB1_GeobRD
                                                     VC                      Yes           KB1_RdhA5
                                                     1,2-DCA                 Yes           KB1_TceA
                                                     PCE                     Yes           KB1_RdhA1

1,2-DCA KB-1 subculture   Grown exclusively on       TCE                     Yes           KB1_VcrA (*)
                          1,2-DCA and methanol       cis-DCE                 Yes           KB1_TceA (*)
                          for > 4 years              trans-DCE               Yes           KB1_BvcA
                                                     VC                      Yes           KB1_RdhA5
                                                     1,2-DCA                 Yes
                                                     PCE                     Yes

D. mccartyi strain BAV1   Grown on cis-DCE           TCE                     Yes           BvcA
                                                     cis-DCE                 Yes
                                                     trans-DCE               Yes
                                                     1,1-DCE                 Yes
                                                     VC                      Yes
                                                     1,2-DCA                 Yes
                                                     PCE                     No

D. mccartyi strain BAV1   Grown on 1,2-DCA           1,2-DCA                 Yes           BvcA

(a) The identified RDases are listed in order of decreasing peptide hit counts and do not correspond row by row to the substrates. Accession numbers are provided in Figure 2.4.
(*) The dominant RDases are highlighted with asterisks.

2.5.2 Features of BN-PAGE

Native PAGE is an electrophoresis technique that separates proteins while preserving their native states, enabling subsequent protein identification using activity assays. Previously, Adrian et al. (2007) reported the use of Clear Native PAGE to identify a chlorobenzene RDase (CbrA) from D. mccartyi strain CBDB1. However, when this technique was applied to protein extracts from the KB-1 cultures, no dechlorination activity was recovered from gel slices. BN-PAGE was then investigated and found to significantly increase the activity recovered from the gel. The major difference between CN-PAGE and BN-PAGE is the use of Coomassie Blue G-250 to impart negative charges on protein surfaces for greater mobility through the gel (Wittig and Schägger, 2008). Another difference between the BN-PAGE approach described herein and the CN-PAGE used by Adrian et al. (2007) is that precast 4-16 % acrylamide gradient gels (Invitrogen) were used. Another important component of method optimization was the selection of an appropriate non-ionic, mild detergent required in the sample buffer to solubilize membrane-associated proteins such as RDases. Of the four different detergents tested, digitonin, dodecyl-β-d-maltoside, taurodeoxycholate, and Triton-X 100, digitonin was found to be most effective.

The BN-PAGE gradient gels (from Invitrogen) separated proteins based on size. Dechlorination activity was highly constrained to regions of electrophoretic mobility corresponding to 242 kDa in all cases studied. The fact that activity was constrained to a particular region suggests enrichment of RDases during electrophoresis. Specific activity in gel slices was found to be 5 to 15 times greater than in crude protein extracts (Figure A4), further indicating enrichment of RDases by this approach. However, although the BN-PAGE technique can partially separate and enrich RDases from other proteins, it does not separate different RDases from each other. Even when the active gel regions were divided into three consecutive narrow slices (Figure A2), no significant differences were found in the RDases identified (Table A2). Therefore, further separation of the proteins in each slice is needed to resolve cases where multiple RDases are co-expressed.

2.6 ACKNOWLEDGEMENTS

We thank Benjamin Scheer for his assistance in the laboratory and Laura Hug for the curation of the RDase database. Support for this research was provided by the Saxon State Ministry for
Science and Art Fellowships awarded to both K.E.F. and W.C. through L.A. K.E.F. acknowledges support through a NSF graduate research fellowship. S.T. received awards from the Government of Ontario through the Ontario Graduate Scholarships in Science and Technology (OGSST) and the Natural Sciences and Engineering Research Council of Canada (NSERC PGS B). LA was supported by the European Research Council and the DFG (FOR1530). Support was provided by the Government of Canada through Genome Canada and the Ontario Genomics Institute (2009-OGI-ABC-1405), by the Government of Ontario through the ORF-GL2 program and by the United States Department of Defense through the Strategic Environmental Research and Development Program (SERDP) under contract W912HQ-07-C-0036 (project ER-1586). Metagenome sequencing was provided by the U.S. Department of Energy Joint Genome Institute's Community Sequencing Program (CSP 2010).

Chapter 3 Identification of Dehalobacter Reductive Dehalogenases that Catalyze Dechlorination of Chloroform, 1,1,1-Trichloroethane and 1,1-Dichloroethane

Reproduced with permission from Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. Citation: Tang S, Edwards EA. 2013. Identification of Dehalobacter reductive dehalogenases that catalyse dechlorination of chloroform, 1,1,1-trichloroethane and 1,1-dichloroethane. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 368:20120318.

3.1 ABSTRACT

Two novel reductive dehalogenases (RDases) that are highly similar to each other but catalyze distinct dechlorination reactions were identified from Dehalobacter-containing mixed cultures. These two RDases were partially purified from crude protein extracts of anaerobic dechlorinating enrichment cultures using blue native polyacrylamide gel electrophoresis. Gel slices were assayed for dechlorinating activity and associated proteins were identified using liquid chromatography tandem mass spectrometry with the metagenome of the parent culture as the reference database. The two RDases identified, annotated as CfrA and DcrA, share an amino acid identity of 95.2 %, but use different substrates: CfrA dechlorinates chloroform and 1,1,1-trichloroethane, but not 1,1-dichloroethane; DcrA dechlorinates 1,1-dichloroethane, but not chloroform or 1,1,1-trichloroethane. These two novel RDases share no more than 40 % amino acid identity to any other known or putative RDases, but both have a twin arginine motif and two iron-sulfur binding motifs conserved in most RDases. Peptides specific to two putative membrane anchor proteins, annotated as CfrB and DcrB, were also detected in gel slices.

3.2 INTRODUCTION

Chloroform (CF) and 1,1,1-trichloroethane (1,1,1-TCA) are persistent groundwater contaminants owing to their historical and widespread industrial use as organic solvents, and improper disposal in the past. In the US, CF is present at 416 of 1723 National Priorities List (NPL) sites, while 1,1,1-TCA is present at 393 sites (NPL database search, June 2012, http://cfpub.epa.gov/supercpad/cursites/srchsites.cfm). In addition to having adverse impacts on human health and the environment (Doherty, 2000a; Meek et al., 2002; Sjuts et al., 2012), CF and 1,1,1-TCA inhibit many microbial processes, including methanogenesis (Yang, 1981; Suidan et al., 1991; Weathers and Parkin, 1995; de Best et al., 1999; Adamson and Parkin, 2000; Yu and Smith, 2000) and reductive dechlorination (Bagley et al., 2000; Duhamel et al., 2002; Futagami et al., 2008). Since many sites are contaminated by multiple chlorinated organics including CF and 1,1,1-TCA, the removal of these two inhibitory compounds is of special importance to bioremediation efforts (Grostern and Edwards, 2006b).

We previously reported the discovery of a Dehalobacter-containing mixed culture (herein referred to as ACT-3) capable of dechlorinating chloroform, 1,1,1-trichloroethane and 1,1-dichloroethane (1,1-DCA) (Grostern and Edwards, 2006b; Grostern et al., 2010). A Dehalobacter strain (16S rRNA gene, DQ663785) from the ACT-3 culture is the first organism shown to respire CF (Grostern et al., 2010). The important role of Dehalobacter in organohalide respiration is clear: they can respire chloroethenes (Holliger et al., 1998), chloroethanes (Sun et al., 2002; Grostern and Edwards, 2006b, 2009), chloromethanes (Grostern et al., 2010; Lee et al., 2012), chlorobenzenes (Nelson et al., 2011), and other halogenated compounds (van Doesburg et al., 2005; Yoshida et al., 2009b). Although Dehalobacter were previously regarded as strictly organohalide-respiring anaerobes, two strains were recently shown to also ferment dichloromethane (DCM) (Justicia-Leon et al., 2012; Lee et al., 2012).

Organohalide-respiring bacteria such as Dehalococcoides and Dehalobacter catalyze reductive dechlorination using reductive dehalogenases (RDases) (Smidt and de Vos, 2004). Genome sequencing has revealed that these organisms tend to harbour multiple distinct putative reductive dehalogenase genes within their genomes; some have more than 30 (Kube et al., 2005; McMurdie et al., 2009). In total, hundreds of putative reductive dehalogenase genes have been identified; however, only a handful have been functionally characterized because of difficulties inherent to working with slow-growing strict anaerobes, the lack of genetic systems to manipulate these organisms, and the inability to express functional reductive dehalogenases heterologously (Sjuts et al., 2012). In this study, we report the identification of two RDases that catalyze dechlorination reactions in the ACT-3 culture. The genes of these two RDases were among the 19 Dehalobacter RDase genes determined by metagenomic sequencing of the ACT-3 culture. The RDase-identification process, similar to previous reports (Adrian et al., 2007; Tang et al., 2013), combined blue native polyacrylamide gel electrophoresis (BN-PAGE), dechlorination activity assays of gel slices and liquid chromatography tandem mass spectrometry (LC-MS/MS) for protein identification.

3.3 MATERIALS AND METHODS

3.3.1 Cultures and culture history

The ACT-3 enrichment culture was originally derived from microcosms prepared with contaminated aquifer material. It has been maintained for more than 10 years in a mineral medium (Grostern et al., 2010) amended with 1,1,1-TCA as an electron acceptor and a mixture of methanol, ethanol, acetate and lactate (MEAL) as electron donors. This culture sequentially dechlorinates 1,1,1-TCA via 1,1-DCA to monochloroethane (CA). It also dechlorinates CF to dichloromethane (DCM). Dehalobacter was shown to be responsible for these dechlorination reactions (Grostern and Edwards, 2006b; Grostern et al., 2010). Two subcultures of the ACT-3 culture were created over 6 years ago with different chlorinated substrates: the DCA subculture has been maintained with 1,1-DCA as an electron acceptor and MEAL as electron donor mixture; the CF subculture has been maintained with CF as an electron acceptor and a mixture of methanol, ethanol and lactate (MEL) as electron donors. Over time, these two subcultures have lost specific dechlorinating activities compared to the parent culture: the CF subculture dechlorinates 1,1,1-TCA and CF, but no longer dechlorinates 1,1-DCA, while the DCA subculture dechlorinates 1,1-DCA, but no longer 1,1,1-TCA or CF.

3.3.2 Metagenome sequencing and assembly

DNA from the ACT-3 parent culture was extracted using a CTAB protocol recommended by the U.S. Department of Energy, Joint Genome Institute (JGI, Walnut Creek, CA). The protocol is available online: http://my.jgi.doe.gov/general/protocols/DNA_Isolation_Bacterial_CTAB_Protocol.doc. The sequencing was performed by JGI using 454 pyrosequencing. A total of 444 Mb of sequence was generated, including paired-end 454 reads from an 8 kb insert library. Initial assembly was performed with Newbler v. 2.5. The collection of resulting contigs is referred to as the ACT-3 metagenome. The initial assembly and annotation were performed by the JGI and can be accessed by the JGI taxon object ID of 2100351010 (http://img.jgi.doe.gov/cgi-bin/m/main.cgi). DNA samples from the CF and DCA subcultures were extracted with UltraClean™ soil DNA isolation kit (MOBIO). To determine community structure, DNA samples from these three mixed cultures were sequenced by 16S rRNA gene pyrotag sequencing. This sequencing and the phylogenetic assignment of sequenced reads were also performed by the JGI.

3.3.3 Identification of putative rdhA and rdhB genes

From the ACT-3 metagenome, contigs encoding putative rdhA genes were identified using a BLASTX search with a query database consisting of hundreds of known or putative RDases from public databases. Fragmentation in some rdhA genes (sequences that belong to one rdhA gene were not assembled into one contig) due to strain variations of the dominant organism, Dehalobacter, was observed; the re-assembly and curation of these fragmented rdhA genes were achieved by performing sequence alignments using Geneious Pro v. 5.4.2 (Drummond et al., 2011) as illustrated in Figure B2. Two re-assembled rdhA genes were found expressed in the cultures in subsequent LC-MS/MS analysis; therefore, the DNA sequences of these two genes were further confirmed and polished with read mapping using Geneious Pro and additional Sanger sequencing directed by specific PCR primers (Figure B2). Putative rdhB genes were identified by analyzing the gene neighborhoods of rdhA genes. Two rdhB genes were also re-constructed from fragmented sequences (Figure B3).
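
The contig-screening step can be illustrated with a minimal sketch that parses BLAST tabular output (the standard 12-column `-outfmt 6` layout) and keeps the contigs with at least one hit to a reference RDase. The contig names, reference identifiers and the 1e-5 e-value cutoff are hypothetical; the search parameters actually used are not stated in this chapter.

```python
# Standard BLAST tabular (-outfmt 6) column order.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def contigs_with_rdha_hits(lines, max_evalue=1e-5):
    """Return {contig: lowest e-value} for contigs with a qualifying hit.

    max_evalue is an assumed cutoff, chosen only for illustration.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = dict(zip(COLUMNS, line.split("\t")))
        evalue = float(fields["evalue"])
        if evalue > max_evalue:
            continue
        contig = fields["qseqid"]
        if contig not in best or evalue < best[contig]:
            best[contig] = evalue
    return best

# Hypothetical tab-separated records, not real ACT-3 data.
example = [
    "contig_001\tRDase_ref_A\t41.2\t430\t240\t6\t120\t1400\t10\t440\t1e-90\t310",
    "contig_001\tRDase_ref_B\t38.0\t400\t230\t8\t150\t1340\t20\t420\t1e-70\t250",
    "contig_002\tRDase_ref_A\t29.5\t60\t40\t1\t10\t190\t5\t64\t0.5\t30",
]
hits = contigs_with_rdha_hits(example)
```

Contigs retained by such a screen would still need the manual re-assembly and curation described above, because fragments of one rdhA gene can sit on different contigs.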

3.3.4 Sample preparation for Blue Native Polyacrylamide Gel Electrophoresis

Inside an anaerobic chamber (Coy, Michigan, US), culture samples (40 mL) were transferred to a 50 mL Falcon tube sealed with anaerobic tape. The tube was centrifuged at 10000 ×g for 20 min and returned to the anaerobic chamber. The cell pellet was resuspended in 1 mL remaining supernatant and transferred to a 2 mL o-ring capped microcentrifuge tube, and centrifuged again at 10000 ×g for 10 min. Inside the anaerobic chamber, with the supernatant discarded, the cell pellet was resuspended in 200 µL BN-PAGE sample buffer (1 % digitonin, 50 mM BisTris, 50 mM NaCl, 10 % w/v glycerol, pH 7.2 with HCl). Glass beads (~ 50 mg) were added to the tube and cells were lysed during 3 rounds of vortexing in a horizontal vortex (Scientific Industries Inc., Bohemia, NY, USA) at the maximum amplitude; each round consisted of 2 min vortex followed by 1 min cooling in an ice bath. The sample was centrifuged at 10000 ×g for 10 min, and the supernatant (crude protein extract) was transferred to a new Eppendorf tube. Before being subjected to electrophoresis, 200 µL crude protein extract was supplemented with 10 µL of a 5 % Coomassie Blue G solution.
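
For reference, the final Coomassie Blue G concentration in the loaded sample follows from a simple dilution. The short calculation below assumes that the two volumes are additive; the function name is illustrative only.

```python
def final_percent(stock_percent, stock_volume_ul, sample_volume_ul):
    """Final concentration (%) after adding a stock solution to a sample.

    Assumes the volumes are additive.
    """
    total_volume_ul = stock_volume_ul + sample_volume_ul
    return stock_percent * stock_volume_ul / total_volume_ul

# 10 uL of 5 % Coomassie Blue G added to 200 uL crude protein extract.
coomassie_final = final_percent(5.0, 10.0, 200.0)
print(f"Final Coomassie Blue G: {coomassie_final:.2f} %")
```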

3.3.5 BN-PAGE gel electrophoresis and staining

BN-PAGE gel electrophoresis and staining were performed outside the anaerobic chamber. Precast 4-16 % gradient Bis-Tris gels (NativePAGE™ Novex®, Invitrogen) and an electrophoresis device (XCell SureLock™, Invitrogen) were used. The preparation of running buffers and the setup of the device were performed according to the manufacturer’s manual. Typically, a protein ladder (NativeMark™, Invitrogen) was loaded into the first lane of the gel and 20 µL of the crude protein extract was loaded into each of the nine remaining lanes. Electrophoresis was run at 150 V for 60 min and then at 200 V for another 45 min, with the entire chamber cooling in an ice bath. After electrophoresis, the first two lanes (the protein ladder and one of the culture samples) were cut for staining and the remaining gel lanes were saved for dechlorinating enzyme assays or for further separation by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). For the staining of the first two lanes, we used the “Fast Coomassie G-250 Staining” procedure from the manufacturer’s manual of the precast gels (Page 23).

3.3.6 SDS-PAGE

For SDS-PAGE analysis, BN-PAGE gel slices were cut from one unstained lane, chopped into <1 mm pieces and transferred to Eppendorf tubes containing 250 µL elution buffer (100 mM Tris-HCl, pH 7.0 and 0.1 % SDS). Protein elution was performed by shaking the tubes at 1000 rpm for 4 h at room temperature. After elution, the 250 µL protein sample was concentrated to a volume of ~ 20 µL by ultrafiltration with 3 kDa cut-off membranes (Pall Co., Canada), before being loaded on a SDS-PAGE gel. The SDS-PAGE gels were stained by silver staining following established protocols.

3.3.7 Assaying dechlorinating activity in gel slices

For assaying dechlorinating activity, an unstained gel lane was aligned with the stained gel lane and was cut into gel slices; this was carried out outside of the glove box. The gel slicing pattern, as shown in Figure 3.1, was designed to cover the regions that were known to have dechlorination activity. In preliminary assays, the other regions of a gel lane were tested and were found to have no dechlorination activity. The same pattern was kept when we prepared gel slices for subsequent separation with SDS-PAGE or LC-MS/MS analysis for protein identification. The same pattern was used for the protein extracts from all three mixed cultures. These gel slices were then transferred to 2 mL screw-top glass vials, and were brought into the anaerobic chamber. For each glass vial, 1 mL assay buffer was added, which contained 100 mM Tris-HCl (pH 7.4), 2 mM titanium citrate, 2 mM methyl viologen, and ~ 0.5 mM chlorinated substrate (1,1,1-TCA, 1,1-DCA or CF). These vials were incubated in the anaerobic chamber for 24 h, and then 0.3 mL headspace samples were taken for gas chromatography (GC) analysis to evaluate the extent of dechlorination of the substrate. As positive controls, 20 µL of the original crude cell extract (equal to the volume of sample loaded into one well of the BN-PAGE gel) was assayed. As negative controls, 20 µL heat-killed (incubated at 80 °C for 15 min) crude protein extracts were assayed in preliminary tests; no dechlorination activity on 1,1,1-TCA, CF or 1,1-DCA was observed in such negative controls.

Figure 3.1 Image of left two lanes of the BN-PAGE Gel, showing molecular weight ladder and the first stained sample lane. Remaining identical sample lanes (unstained) from the same gel were sliced at specific positions consistently by aligning to the stained lane as indicated. Each gel slice was then used in dechlorination assays. BP = Band Position.

3.3.8 LC-MS/MS analysis of gel slices

To identify the proteins contained in the gel slices, the slices were sent for LC-MS/MS analysis at the Advanced Protein Technology Center at SickKids’ Hospital (Toronto, Canada). The proteins in the excised gel slices were reduced with 100 mM dithiothreitol (in 50 mM ammonium bicarbonate), alkylated with 10 mM iodoacetamide (in 50 mM ammonium bicarbonate) and digested by overnight incubation with porcine trypsin (13 µg/µl). The resulting peptides were extracted with 25 mM ammonium bicarbonate buffer and 100% acetonitrile. The peptides thus produced were loaded onto a 150 µm ID pre-column (Magic C18, Michrom Biosciences) at 4 µL/min and separated over a 75 µm ID analytical column packed into an emitter tip containing the same packing material. The peptides were eluted over 60 min at 300 nL/min using a 0 to 40 % acetonitrile gradient in 0.1 % formic acid using an EASY n-LC nano-chromatography pump (Proxeon Biosystems, Odense, Denmark). The peptides were eluted into an LTQ linear ion trap mass spectrometer (Thermo-Fisher, San Jose, CA) operated in a data dependent mode. Six MS/MS scans were obtained per MS cycle.

The raw data files were searched with X!Tandem (Beavis Informatics Ltd., Canada) using a parent ion accuracy of 1.8 Da, a fragment accuracy of 0.4 Da, no semi-enzymatic cleavage, the maximum missed cleavage sites of 1 and the maximum expectation value for recorded peptides of 0.01. A fixed modification of carbamidomethyl cysteine and a variable modification of oxidized methionine were included in the search and the refinement, which also included variable modifications of the deamidation of asparagine and glutamine. The peptides were searched against two protein databases. The first consists of all proteins and protein fragments (from the ACT-3 metagenome) predicted and annotated through JGI’s IMG/m annotation pipelines (accessed by IMG taxon object ID of 2100351010). The second database is a custom database, consisting of only curated rdhA and rdhB genes found in the ACT-3 metagenome (Table B3). In a separate study, we were able to close the genomes of two Dehalobacter strains present in the ACT-3 Culture (Tang et al., 2012). The complete sequences of the two genomes were used to pull out proteins that are certain to belong to Dehalobacter (with an amino acid identity > 95%) and help to assign taxonomy. The remaining non-Dehalobacter proteins were then searched against the NCBI RefSeq_protein database and assigned taxonomy based on the best hit.
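
The two-step taxonomy assignment described above can be summarized in a minimal sketch. The input dictionaries, protein identifiers and function name are hypothetical; only the > 95 % identity rule and the fallback to the best RefSeq hit come from the text.

```python
def assign_taxonomy(protein_id, dehalobacter_identity, refseq_best_hit,
                    threshold=95.0):
    """Assign a genus to a protein using the two-step rule in the text.

    dehalobacter_identity: protein -> best % amino acid identity to the
        closed Dehalobacter genomes (missing if there was no match).
    refseq_best_hit: protein -> genus of the best RefSeq_protein hit.
    """
    if dehalobacter_identity.get(protein_id, 0.0) > threshold:
        return "Dehalobacter"
    # Everything else is assigned from its best RefSeq hit, if any.
    return refseq_best_hit.get(protein_id, "unassigned")

# Hypothetical inputs for illustration only.
identity = {"prot_1": 99.3, "prot_2": 62.0}
refseq = {"prot_2": "Desulfovibrio", "prot_3": "Methanosarcina"}
calls = {p: assign_taxonomy(p, identity, refseq)
         for p in ["prot_1", "prot_2", "prot_3", "prot_4"]}
```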

3.3.9 PCR reactions

Four primers were designed to specifically amplify each of the two similar rdhA genes identified in this study. They were cfrA-413f (CCCGAACCTCTAGCACTTGTAG), cfrA-531r (ACGGCAAAGCTTGCACGA), dcrA-424f (AGCACTCAGAGAGCGTTTTGC), and dcrA-533r (CAACGGCCCAGCTTGCAT). For PCR reactions, a Taq DNA polymerase (Fermentas, Canada) was used. The thermocycling program was as follows: initial denaturation of 10 min at 94 °C; 40 cycles of 30 s denaturation at 94 °C, 30 s annealing at 60 °C and 30 s extension at 72 °C; and final extension of 10 min at 72 °C.
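
As a quick check on the primer set and the program above, the following sketch reports primer length and GC content and sums the programmed hold times. The helper names are illustrative, and the total ignores ramping between temperatures, so an actual run would take longer.

```python
PRIMERS = {
    "cfrA-413f": "CCCGAACCTCTAGCACTTGTAG",
    "cfrA-531r": "ACGGCAAAGCTTGCACGA",
    "dcrA-424f": "AGCACTCAGAGAGCGTTTTGC",
    "dcrA-533r": "CAACGGCCCAGCTTGCAT",
}

def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def program_hold_seconds(initial_s, cycles, step_seconds, final_s):
    """Sum of programmed hold times, excluding temperature ramping."""
    return initial_s + cycles * sum(step_seconds) + final_s

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}")

# 10 min initial denaturation, 40 cycles of 3 x 30 s, 10 min final extension.
total_s = program_hold_seconds(10 * 60, 40, [30, 30, 30], 10 * 60)
print(f"Total hold time: {total_s / 60:.0f} min")
```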

3.3.10 Sequence analysis

The multiple sequence alignment and phylogenetic analysis of the two identified RdhA proteins and 18 additional characterized RdhA proteins were performed using MUSCLE (Edgar, 2004) and PHYML (Guindon and Gascuel, 2003) accessed through Geneious Pro. Potential promoter sites were predicted using NNPP 2.2 (Reese, 2001). Potential ribosome binding sites were predicted using SIGSCAN 4.05 (Prestridge, 1991). The twin-arginine signal peptide was predicted by TatP (Bendtsen et al., 2005). Transmembrane helices were predicted by TMHMM v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM-2.0/). The theoretical molecular weights of proteins were calculated by the Compute PI/Mw tool of ExPASy (http://expasy.org/tools/pi_tool.html).

3.3.11 Nucleotide sequence accession numbers

The DNA sequences of cfrA, dcrA, cfrB and dcrB have been deposited in GenBank with the following accession numbers: JX282329 for cfrA, JX282330 for dcrA, JX282334 for cfrB and JX282335 for dcrB.

3.4 RESULTS

3.4.1 Metagenome sequencing.

Dehalobacter was previously shown to be responsible for the reductive dechlorination of 1,1,1-TCA, 1,1-DCA and CF in ACT-3 (Grostern and Edwards, 2006b; Grostern et al., 2010). This corresponds well with the dominance of Dehalobacter in the three ACT-3 related cultures as determined by pyrotag sequencing of the 16S rRNA gene (Figure B1). Shotgun and paired-end 454 pyrosequencing of the ACT-3 culture metagenomic DNA produced 13479 contigs (>500 bp) with N50 of 1708 bp and the largest contig of 169374 bp. In a separate study (Tang et al., 2012), we successfully assembled all Dehalobacter contigs into a closed circular assembly. However, certain locations in this assembly consist of alternative contigs that belong to two highly similar but different Dehalobacter genomes; by comparing this assembly with additional sequencing data from the CF subculture, which happens to have only one Dehalobacter strain, we managed to assemble and separate two complete Dehalobacter genomes. The coexistence of two Dehalobacter strains/genomes in ACT-3 resulted in fragmentation of some Dehalobacter contigs in the initial assembly produced by Newbler. The read depth of the contigs that were shared by both strains was ~ 80 and the two strains were found in similar abundance as indicated by read depth analysis. With homology search and manual curation, 19 putative intact rdhA genes and one truncated rdhA homologous gene (786 bp) were retrieved from the ACT-3 metagenome. The genome context investigation of these putative rdhA genes revealed 17 putative rdhB genes. Read depth analysis of these rdhA and rdhB genes (data not shown) showed that they belong to the dominant genus, Dehalobacter, which was further confirmed after the closure of the two Dehalobacter genomes (Tang et al., 2012).
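
The N50 statistic quoted above is the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that common definition is given below with a toy list of lengths; it is not the ACT-3 contig set, and the assembler's own reporting may differ in detail.

```python
def n50(lengths):
    """Contig length at which the cumulative sum of lengths, taken from
    longest to shortest, first reaches half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")

# Toy example: total = 1500, half = 750; 500 + 400 = 900 reaches it.
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```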

3.4.2 Functional differentiation of the three mixed cultures

The ACT-3 parent culture dechlorinates CF, 1,1,1-TCA and 1,1-DCA, while the CF subculture only dechlorinates CF and 1,1,1-TCA, and the DCA subculture only dechlorinates 1,1-DCA (Grostern et al., 2010). These observations were confirmed in dechlorination assays performed using the crude protein extracts from the three cultures (Table 3.1). The crude protein extract from the CF subculture did not dechlorinate 1,1-DCA, the one from the DCA subculture did not dechlorinate 1,1,1-TCA or CF, and the one from the parent ACT-3 culture dechlorinated all three substrates (Table 3.1).

Table 3.1 Results of 24-hour enzyme assays of the crude protein extracts from the three mixed cultures. Data are averages (± standard deviation) of triplicate extracts.

Dechlorination products (nmol)

Samples           CF assay:    1,1,1-TCA assay:        1,1-DCA assay:
                  DCM          1,1-DCA (CA)1           CA
Buffer control2   BD3          BD (BD)4                BD
ACT-3             228 ± 3      98 ± 1 (6.4 ± 0.1)4     56 ± 2
CF subculture     139 ± 2      64 ± 2 (BD)4            0.1 ± 0.0
DCA subculture    2.5 ± 0.1    0.9 ± 0.1 (BD)4         47 ± 1

1 In the assays of 1,1,1-TCA, CA can be produced. The amount of CA is shown in parentheses.
2 The negative control using the reaction buffer without protein addition.
3 BD means below detection limit.
4 In the assays of 1,1,1-TCA, CA can also be produced. The amount of CA produced is shown in parentheses.
Numbers in bold show highest activities.
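
To put the product amounts in Table 3.1 in perspective, they can be expressed as a fraction of the substrate supplied. The sketch below assumes a nominal 500 nmol of substrate per assay (about 0.5 mM in 1 mL of assay buffer, as described in Section 3.3.7); that nominal amount is an assumption, so the percentages are rough estimates. For the 1,1,1-TCA assay of ACT-3, the 1,1-DCA and CA amounts are added because CA is formed via 1,1-DCA.

```python
# Assumed nominal substrate per vial: ~0.5 mM x 1 mL = ~500 nmol.
NOMINAL_SUBSTRATE_NMOL = 0.5 * 1.0 * 1000

# Mean product amounts (nmol) from Table 3.1; BD entries are omitted.
products_nmol = {
    "ACT-3": {"CF": 228, "1,1,1-TCA": 98 + 6.4, "1,1-DCA": 56},
    "CF subculture": {"CF": 139, "1,1,1-TCA": 64, "1,1-DCA": 0.1},
    "DCA subculture": {"CF": 2.5, "1,1,1-TCA": 0.9, "1,1-DCA": 47},
}

def percent_converted(product_nmol, substrate_nmol=NOMINAL_SUBSTRATE_NMOL):
    """Product formed as a percentage of the nominal substrate supplied."""
    return 100.0 * product_nmol / substrate_nmol

conversion = {
    culture: {substrate: percent_converted(nmol)
              for substrate, nmol in row.items()}
    for culture, row in products_nmol.items()
}

for culture, row in conversion.items():
    summary = ", ".join(f"{s}: {p:.1f} %" for s, p in row.items())
    print(f"{culture}: {summary}")
```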

3.4.3 RDase expression in the CF subculture

The crude protein extract from the CF subculture was shown to have negligible dechlorinating activity on 1,1-DCA (Table 3.1 and the positive control of Figure 3.2a); therefore, the gel slices from this sample were only assayed with 1,1,1-TCA and CF. Testing for dechlorination activity in gel slices revealed two regions of enriched dechlorination activity on the gel (Figure 3.2a): one was around Band Position 3 or 4 (BP-3 or BP-4), slightly below the 242 kDa marker; the other was around BP-7, slightly below the 146 kDa marker. Although the dechlorination profile with CF as substrate appears slightly different from that with 1,1,1-TCA (Figure 3.2a), this difference was likely caused by slight variations in the position of the gel slice in adjacent lanes (one individual lane was used for one specific assay). Nevertheless, we found these two regions of enriched dechlorination activity consistently in more than three preliminary trials. In the gel slices of enriched activity, proteins with molecular weight similar to that of an RDase (~ 45 kDa) were identified by second-dimension separation using SDS-PAGE (Figure B4). The presence of RDases was further confirmed by LC-MS/MS analysis of the proteins in gel slices from BP-2, BP-3, BP-4, BP-6, BP-7 and BP-8. Matching the MS spectra against the IMG-predicted proteins of the ACT-3 metagenome identified two protein hits related to RdhA proteins (DHTCA2_00197470 and DHTCA2_00327390, Table 3.2). Subsequent analysis showed that they were two fragmented sequences of one RdhA protein and the fragmentation was caused by strain variations in the initial assembly with Newbler. When the MS spectra were matched to the custom RDase database, which consists only of the putative rdhA and rdhB genes curated from the ACT-3 metagenome as described earlier, we found that only one RdhA and one RdhB were expressed in the CF subculture (Figure 3.2b). These two proteins were named CfrA and CfrB, with corresponding genes cfrA and cfrB.
The assembly and separation of two Dehalobacter genomes from ACT-3 metagenome (Tang et al., 2012) confirmed that the cfrA and cfrB genes were adjacent and located in one gene operon in one genome. Because only CfrA was identified in the CF subculture, we can conclude that CfrA dechlorinates both CF and 1,1,1-TCA, but not 1,1-DCA. The fact that CfrA dechlorinates both CF and 1,1,1-TCA is actually not surprising because these two substrates are similar in structure: 1,1,1-TCA is methyl chloroform.

3.4.4 RDase expression in the DCA subculture

The crude protein extract of the DCA subculture had negligible dechlorinating activity on either 1,1,1-TCA or CF (Table 3.1 and the positive control of Figure 3.3a); therefore, the gel slices were only assayed with 1,1-DCA (Figure 3.3a). Dechlorination assays on gel slices revealed the presence of one region of highly enriched activity (centered on BP-6, Figure 3.3a). LC-MS/MS analysis revealed the presence of only one RdhA protein and one RdhB protein (Figure 3.3b), which were named DcrA and DcrB with the corresponding genes dcrA and dcrB. The dcrA and dcrB genes were adjacent and located in one operon in one genome in the newly assembled genomes (Tang et al., 2012). Since DcrA was the only RDase identified, we concluded that DcrA dechlorinates 1,1-DCA, but not 1,1,1-TCA or CF.

Table 3.2 Proteins identified in the gel slices of enriched activity from BN-PAGE gels using the reference database of all proteins identified in the ACT-3 metagenome. Only the top 7 hits are listed for each gel slice.

Culture         Gel slice  IMG accession  Spectrum  Annotation                                      Putative organism
                           number         count
CF subculture   BP-3       2107932898     25        OAH/OAS sulfhydrylase                           Dehalobacter
                           2107922662     20        Chaperone protein DnaK                          Dehalobacter
                           2107932208     14        Chaperonin GroL                                 Dehalobacter
                           2107926888     13        Citrate lyase beta subunit                      Dehalobacter
                           2107929921     10        Dihydroxy-acid dehydratase                      Dehalobacter
                           2107935413     8         Reductive dehalogenase CfrA fragment            Dehalobacter
                           2107929614     7         Formate dehydrogenase alpha subunit             Dehalobacter

                BP-7       2107922421     23        Reductive dehalogenase CfrA fragment            Dehalobacter
                           2107935413     18        Reductive dehalogenase CfrA fragment            Dehalobacter
                           2107924383     12        Uroporphyrin-III C-methyltransferase            Dehalobacter
                           2107922662     7         Chaperone protein DnaK                          Dehalobacter
                           2107933300     7         Hup-type Ni,Fe-hydrogenase large subunit        Dehalobacter
                           2107950157     5         Alcohol dehydrogenase, class IV                 Desulfovibrio
                           2107996864     5         Translation elongation factor TU                Desulfovibrio

DCA subculture  BP-2       2107955926     27        Carbon monoxide dehydrogenase large subunit     Desulfovibrio
                           2107951343     11        Methanol-cobalamin methyltransferase subunit B  Methanosarcina
                           2107914164     10        Carbon monoxide dehydrogenase large subunit     Desulfovibrio
                           2107926575     10        Formyltetrahydrofolate synthetase               Dehalobacter
                           2107922662     9         Chaperone protein DnaK                          Dehalobacter
                           2107929614     8         Formate dehydrogenase alpha subunit             Dehalobacter
                           2107950157     5         Alcohol dehydrogenase, class IV                 Desulfovibrio

                BP-6       2107950157     17        Alcohol dehydrogenase, class IV                 Desulfovibrio
                           2107922662     14        Chaperone protein DnaK                          Dehalobacter
                           2107952893     9         Alcohol dehydrogenase, class IV                 Desulfovibrio
                           2107963039     6         Reductive dehalogenase DcrA fragment            Dehalobacter
                           2107933300     5         Hup-type Ni,Fe-hydrogenase large subunit        Dehalobacter
                           2107922866     4         Branched-chain amino acid transporter           Dehalobacter
                           2107963634     4         Pyridoxamine 5'-phosphate oxidase               Dehalobacter

ACT-3           BP-4       2107922967     9         Putative cell wall-binding domain               Dehalobacter
                           2107922662     8         Chaperone protein DnaK                          Dehalobacter
                           2107929614     5         Formate dehydrogenase alpha subunit             Dehalobacter
                           2107978239     4         DNA polymerase sliding clamp subunit            Methanohalobium
                           2107950157     3         Alcohol dehydrogenase, class IV                 Desulfovibrio
                           2107996864     3         Translation elongation factor TU                Desulfovibrio
                           2107935811     3         Acyl CoA:acetate/3-ketoacid CoA transferase     Clostridium

                BP-7       2107966468     11        Methanol-cobalamin methyltransferase subunit B  Methanohalophilus
                           2108014270     11        Methanol-cobalamin methyltransferase subunit B  Methanosalsum
                           2107922421     10        Reductive dehalogenase CfrA fragment            Dehalobacter
                           2107950157     9         Alcohol dehydrogenase, class IV                 Desulfovibrio
                           2107996864     8         Translation elongation factor TU                Desulfovibrio
                           2107935413     7         Reductive dehalogenase CfrA fragment            Dehalobacter
                           2107933351     6         Translation elongation factor TU                Dehalobacter

Bold indicates reductive dehalogenases.
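
Because the initial assembly split CfrA into two IMG entries, the RDase rows of Table 3.2 are easier to compare once fragments of the same protein are pooled. The sketch below sums the spectrum counts per culture, gel slice and protein. Pooling counts in this way is an illustration only and is not the procedure used in this study, which searched the spectra against a curated RDase database instead.

```python
from collections import defaultdict

# IMG accession -> protein, following the annotations in Table 3.2.
FRAGMENT_TO_PROTEIN = {
    "2107922421": "CfrA",
    "2107935413": "CfrA",
    "2107963039": "DcrA",
}

# (culture, gel slice, IMG accession, spectrum count) copied from Table 3.2.
rdase_rows = [
    ("CF subculture", "BP-3", "2107935413", 8),
    ("CF subculture", "BP-7", "2107922421", 23),
    ("CF subculture", "BP-7", "2107935413", 18),
    ("DCA subculture", "BP-6", "2107963039", 6),
    ("ACT-3", "BP-7", "2107922421", 10),
    ("ACT-3", "BP-7", "2107935413", 7),
]

def pooled_counts(rows):
    """Sum spectrum counts over fragments of the same protein."""
    totals = defaultdict(int)
    for culture, gel_slice, accession, count in rows:
        protein = FRAGMENT_TO_PROTEIN.get(accession)
        if protein is not None:
            totals[(culture, gel_slice, protein)] += count
    return dict(totals)

pooled = pooled_counts(rdase_rows)
```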

[Figure 3.2 image. Panel (a): total dechlorination products (nmol) for gel slices BP-1 to BP-8; series: CF to DCM, 1,1,1-TCA to 1,1-DCA, 1,1-DCA to CA; numeric bar labels 373 and 250. Panel (b): number of peptide hits for gel slices BP-1 to BP-8; series: CfrA, CfrB; two slices marked N/A.]

Figure 3.2 Results of BN-PAGE with protein samples from the CF subculture. (a) Quantification of dechlorination products in enzyme assays with gel slices. The error bars indicate the regular technical error in headspace GC measurements. “+” indicates the positive control, where activity was measured using 20 µL of protein extracts equal to the volume loaded onto each BN-PAGE well. Black bars: DCM detected from CF; grey bars: 1,1-DCA detected from 1,1,1-TCA; white bars: CA detected from 1,1-DCA. (b) Counts of LC-MS/MS-detected peptide hits in gel slices when searched against curated RdhA and RdhB proteins derived from the ACT-3 metagenome. “N/A” means the gel slice was not analyzed.

[Figure 3.3 image. Panel (a): total dechlorination products (nmol) for gel slices BP-1 to BP-8; series: CF to DCM, 1,1,1-TCA to 1,1-DCA, 1,1-DCA to CA; numeric bar label 74. Panel (b): number of peptide hits for gel slices BP-1 to BP-8; series: DcrA, DcrB; four slices marked N/A.]

Figure 3.3 Results of BN-PAGE with protein samples from the DCA subculture. (a) Quantification of dechlorination products in enzyme assays with gel slices; legend as in Figure 3.2. (b) Counts of LC-MS/MS peptide hits in gel slices when searched against curated RdhA and RdhB proteins derived from the ACT-3 metagenome. “N/A” means the gel slice was not analyzed.

3.4.5 RDase expression in the ACT-3 parent culture

The dechlorination profiles (Figure 3.4a) and RDase peptide hit profiles (Figure 3.4b) in gel slices from the ACT-3 parent culture resemble a combination of the results found for the two subcultures. Again, two similar gel regions of enriched dechlorination activity were found, and the number of RDase peptide hits was higher in the second region. With the functions assigned to CfrA and DcrA in the earlier analyses, the coexistence of CfrA and DcrA explains why the ACT-3 culture dechlorinates all three substrates. Notably, both CfrA and DcrA were found in all gel slices analyzed except BP-2 (Figure 3.4b), and both RdhB proteins, CfrB and DcrB, were also found expressed in the ACT-3 culture.

[Figure 3.4 image. Panel (a): total dechlorination products (nmol) for gel slices BP-1 to BP-8; series: CF to DCM, 1,1,1-TCA to 1,1-DCA, 1,1-DCA to CA; numeric bar labels 262, 70 and 89. Panel (b): number of peptide hits for gel slices BP-1 to BP-8; series: CfrA, CfrB, DcrA, DcrB; two slices marked N/A.]

Figure 3.4 Results of BN-PAGE with protein samples from the ACT-3 culture. (a) Quantification of dechlorination products in enzyme assays with gel slices; legend as in Figure 3.2. (b) Counts of LC-MS/MS peptide hits in gel slices when searched against curated RdhA and RdhB proteins derived from the ACT-3 metagenome. “N/A” means the gel slice was not analyzed.

3.4.6 Distribution of cfrA and dcrA genes

With PCR reactions using the primers (cfrA-413f, cfrA-531r, dcrA-424f and dcrA-533r) that specifically target cfrA and dcrA genes, we investigated the distribution of these two genes in the three mixed cultures (Figure B5). The distribution of these two genes is identical to that of the encoded RDases as determined by LC-MS/MS analysis. Therefore, we can further confirm the absence of DcrA in the CF subculture and the absence of CfrA in the DCA subculture.
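
The link between RDase distribution and culture phenotype can be written as a small lookup. The sketch below encodes the substrate assignments and the presence or absence of each RDase reported in this chapter; the data structures and function name are illustrative only.

```python
# Substrates assigned to each RDase in Sections 3.4.3 and 3.4.4.
RDASE_SUBSTRATES = {
    "CfrA": {"CF", "1,1,1-TCA"},
    "DcrA": {"1,1-DCA"},
}

# RDases (and genes) detected in each culture by LC-MS/MS and PCR.
CULTURE_RDASES = {
    "ACT-3": {"CfrA", "DcrA"},
    "CF subculture": {"CfrA"},
    "DCA subculture": {"DcrA"},
}

def predicted_substrates(culture):
    """Union of the substrates of all RDases present in a culture."""
    substrates = set()
    for rdase in CULTURE_RDASES[culture]:
        substrates |= RDASE_SUBSTRATES[rdase]
    return substrates

for culture in CULTURE_RDASES:
    print(culture, sorted(predicted_substrates(culture)))
```

The predicted substrate sets match the activities measured with crude protein extracts in Table 3.1.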

3.4.7 Expression of non-RDase proteins

A large number of other non-RDase proteins were identified from BN-PAGE gel slices using LC-MS/MS analysis (Table 3.2 and Table B1). As expected, a large proportion of identified proteins belong to Dehalobacter (Table B1), corresponding to the abundance of this organism in these three cultures (Figure B1). Looking for potential protein-protein interactions with CfrA and DcrA, we focused on the Dehalobacter proteins identified from the two regions of enriched activity on the BN-PAGE gels (Table 3.2). Besides CfrA or DcrA, there were another two proteins that were repeatedly identified in all three cultures and in relatively high peptide counts: formate dehydrogenase alpha subunit (DHTCA2_00269400) in the first region of enriched activity and hup-type Ni,Fe-hydrogenase large subunit (DHTCA2_00306260) in the second region of enriched activity. Other subunits to these proteins, the formate dehydrogenase beta subunit (DHTCA2_00269390) and hup-type Ni,Fe-hydrogenase small subunit (DHTCA2_00306250), were also found in corresponding gel slices (Table B1).

3.4.8 Sequence analysis of CfrA and DcrA

DNA sequence analysis identified potential promoter sites and ribosome binding sites upstream of cfrA and dcrA genes. Based on a start codon 1 bp downstream of the nearest ribosome binding site, cfrA and dcrA genes with a full length of 1521 bp were predicted. However, when the corresponding proteins translated from these two DNA sequences were aligned with other characterized RdhA proteins, the twin-arginine motif, RRQFLK, was found located 66 amino acids (AAs) from the N-terminus. In contrast, this motif is located only several residues from the N-terminus in other known RDases. Therefore, we chose an alternative start codon for these two rdhA genes so that the twin arginine motif is only 16 AAs from the N-terminus. With this adjustment, the gene length shortens to 1371 bp and the protein length to 456 AAs.
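
The lengths quoted above are mutually consistent, as the short calculation below shows. It assumes that the reported gene lengths include the stop codon, which is not stated explicitly in the text.

```python
def protein_length_aa(gene_length_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if gene_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return gene_length_bp // 3 - 1

original_bp = 1521
motif_offset_original = 66  # AAs between the N-terminus and RRQFLK
motif_offset_adjusted = 16

# Moving the start codon downstream removes 50 codons (150 bp).
trimmed_codons = motif_offset_original - motif_offset_adjusted
adjusted_bp = original_bp - 3 * trimmed_codons

print(adjusted_bp, protein_length_aa(adjusted_bp))
```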

CfrA and DcrA differ from each other in 22 out of 456 AAs (Figure 3.5). Despite such high similarity, they were well discriminated by LC-MS/MS analysis (Figure 3.5), highlighting the sensitivity of this technique. A signal peptide (39 AAs long), which contains the twin-arginine motif, was predicted in both RDases. No peptide from the region of the predicted signal peptide was detected by LC-MS/MS analysis, indicating the cleavage of the signal peptide in the mature forms of CfrA and DcrA. The molecular weights of CfrA before and after the cleavage of the signal peptide were 50 and 46 kDa; those of DcrA were similar. The molecular weight of 46 kDa agreed well with bands seen by SDS-PAGE (Figure B4). In addition to the twin arginine signal peptides, CfrA and DcrA also share the two iron-sulfur binding motifs that are common to RdhA proteins (Figure 3.5).

CfrA vs. DcrA:
001 MDKEKSNNDK PATKINRRQF LKFGAGASSG IAIATAATAL GGKSLIDPKQ
    ......
051 VYAGTVKELD ELPFNIPADY KPFTNQRNIY GQAVLGVPEP LALVERFDEv
    ...... H....W ...L...... R...A..
101 RWNGWQTDGS PGLTVLDGAA ARASFAVDYY FNGENSACRA NKGFFEWHPK
    ...... H..W..... L......
151 VAELNFKWGD PERNIHSPGV KSAEEGTMAV KKIARFFGAA KAGIAPFDKR
    .P....R......
201 WVFTETYAFV KTPEGESLKF IPPDFGFEPK HVISMIIPQS PEGVKCDPSF
    ...... A...... L..T..A...
251 LGSTEYGLSC AQIGYAAFGL SMFIKDLGYH AVPIGSDSAL AIPIAIQAGL
    ...A.....Y T...... A......
301 GEYSRSGLMI TPEFGSNVRL CEVFTDMPLN HDKPISFGVT EFCKTCKKCA
    ...... P......
351 EACAPQAISY EDPTIDGPRG QMQNSGIKRW YVDPVKCLEF MSRDNVRNCC
    ...... F.. W......
401 GACIAACPFT KPEAWHHTLI RSLVGAPVIT PFMKDMDDIF GYGKLNDEKA
    ...... P.....
451 IADWWK
    ......

CfrB vs. DcrB:
001 MALMTFVLGL LVASFGWGVN IWRKNTKFSI TWLGWTGIAL SFMVLLFTIA
    ......
051 WSWSCLLEGT PQAAGVGFLI FGSIMVVIGA ITRIVIIRGI PASKKDHIGG
    ...... T..... LS..IV...... T..
101 ISQST
    .....

Figure 3.5 Amino acid sequence alignment for CfrA versus DcrA, and CfrB versus DcrB. Sequences shown are for CfrA and CfrB. Residues that differ in DcrA and DcrB are in bold on next line. The twin arginine motif is boxed and the two iron-sulfur binding motifs are shaded. The dotted underline indicates the predicted signal peptide. The peptides detected by LC-MS/MS are underlined.

Tip labels of the tree in Figure 3.6, in printed order:
BvcA (Dehalococcoides sp. BAV1, YP_001214307)
TceA (Dehalococcoides ethenogenes st. 195, AAF73916)
VcrA (Dehalococcoides sp. VS, YP_003330719)
PceA (Dehalococcoides ethenogenes st. 195, YP_181066)
MbrA (Dehalococcoides sp. MB, ACF24863)
CbrA (Dehalococcoides sp. CBDB1, YP_307261)
CprA (Desulfitobacterium dehalogenans ATCC 51507, AF115542)
PceA (Sulfurospirillum multivorans, AAC60788)
PrdA (Desulfitobacterium sp. KBC1, BAE45338)
PceA (Desulfitobacterium hafniense Y51, BAC00915)
PceA (Desulfitobacterium hafniense TCE1, CAD28792)
PceA (Desulfitobacterium sp. PCE-S, AAO60101)
PceA (Dehalobacter restrictus, CAD28790)
DcaA (Dehalobacter sp. WL, ACH87594)
DcaA (Desulfitobacterium dichloroeliminans LMG P-21439, CAJ75430)
CprA (Desulfitobacterium hafniense PCP-1, AAQ54585)
DcrA (Dehalobacter sp. DCA, this study)
CfrA (Dehalobacter sp. CF, this study)
Scale bar: 0.4

Figure 3.6 Maximum likelihood phylogenetic tree of the RDases that have functional characterization. The alignment was generated using the MUSCLE algorithm, and the tree generated using the PhyML plugin in Geneious under the WAG model of evolution. Bootstrap support values (from 100 bootstrap iterations) are indicated where greater than 50%. The scale bar represents the average number of substitutions per site. For a complete tree of curated RDase sequences see the introductory chapter to this issue.

CfrA and DcrA are not closely related to other known or putative RdhA proteins. Their best hit in a BLASTP search (05-Oct-2012) against the NCBI non-redundant database was a putative reductive dehalogenase (YP_004971810) with an amino acid (AA) identity of ~ 40 %. Their novelty was further confirmed by phylogenetic analysis with 12 other RDases that have some functional characterization (Figure 3.6). CfrA and DcrA were also not closely related to other putative RdhA proteins recovered from their source metagenome (Figure B6).

3.5 DISCUSSION

3.5.1 CfrA and DcrA

CfrA and DcrA are the first RDases described to specifically dechlorinate CF and 1,1-DCA. Maillard et al. (2003) described the purification and characterization of a PceA RDase that was reported to dechlorinate 1,1,1-TCA, but at only 1.4 % of its rate of PCE dechlorination. The novelty of CfrA and DcrA is also evident from their lack of homology to other putative or known RDases in public databases, or even to other putative RDases recovered from the same metagenome (Figure 3.6 and Figure B6). Another distinctive feature of these two RDases is that they are highly similar to each other (95.2 % identical in AA sequence, 97.9 % identical in DNA sequence) yet act on entirely different substrates. Perhaps one of the RDases evolved from the other, or they evolved from a common ancestor; either way, the discovery of these two RDases shows how new functions can evolve in RDases. In a separate study (Tang et al., 2012), we showed that there were two Dehalobacter strains, and hence genomes, in the ACT-3 culture, and that CfrA belongs to one strain and DcrA to the other. The existence of these two strains appears to be an example of recent strain divergence with niche selection.

3.5.2 CfrB and DcrB

Possessing typically three transmembrane α-helices, RdhB proteins have been predicted to be the membrane anchors for the catalytic units (RdhA proteins) (Neumann et al., 1998). This is the first time that RdhB proteins have been detected by mass spectrometry. There are several reasons why RdhB proteins were not detected in previous proteomic studies (Morris et al., 2006; Adrian et al., 2007; Morris et al., 2007). Membrane proteins like RdhB proteins are poor substrates for trypsin digestion, a common sample preparation step in typical LC-MS/MS analysis. Trypsin cleaves proteins after lysine and arginine residues, which are rare in membrane proteins like RdhBs. Small peptides (<6 AAs) and large hydrophobic peptides resulting from trypsin digestion are both difficult to identify by LC-MS/MS, for different reasons (Tran et al., 2011). This explains why only one peptide for CfrB (KDHTGGISQST) and one peptide for

DcrB (KDHIGGISQST) were detected in the LC-MS/MS analyses (Figure 3.5). We examined some other putative RdhB sequences from other organisms and found that they would be very difficult to detect by LC-MS/MS analysis using trypsin digestion (Figure B7).
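The effect of sparse lysine and arginine residues can be illustrated with a minimal in silico digestion. The sketch below assumes the simplest possible rule (cleavage after every K or R, with no missed cleavages and no proline exception), and the function name `trypsin_digest` is hypothetical. The input is the C-terminal stretch of the CfrB row as printed in Figure 3.5; it is not the digestion model used by the LC-MS/MS search software.

```python
def trypsin_digest(sequence):
    """Split a protein sequence after every lysine (K) or arginine (R).

    Simplified rule: complete digestion, no proline exception.
    """
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        # Residues after the last cleavage site form the C-terminal peptide.
        peptides.append(current)
    return peptides


# C-terminal residues of the CfrB row as printed in Figure 3.5.
cfrb_c_terminus = "GIPASKKDHIGGISQST"
peptides = trypsin_digest(cfrb_c_terminus)

# Peptides shorter than 6 AAs are hard to identify by LC-MS/MS.
too_short = [p for p in peptides if len(p) < 6]

print(peptides)
print(too_short)
```

Under this rule the leading lysine of the detected peptide would be released as a single residue, so a detected peptide that retains it implies a missed cleavage, which this sketch does not model.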

3.5.3 Non-RDase proteins

We repeatedly observed two regions of enriched activity (one below 242 kDa and the other below 146 kDa) in the BN-PAGE gels in the studies of the three mixed cultures. Interestingly, when we applied the same analysis to a Dehalococcoides-containing mixed culture (Tang et al., 2013), we found only one region of enriched activity, which was close to the upper region of enriched activity (242 kDa) described in this paper. In the work of Adrian et al. (2007) using clear native PAGE, only one region of dechlorination activity was reported when protein extracts from a Dehalococcoides pure culture were used. Therefore, the observation of two distinct regions of enriched activity might be specific to Dehalobacter cultures.

As described earlier, subunits of two interesting protein complexes were found in the two regions of enriched activity in all three cultures: formate dehydrogenase in the upper region and hup-type hydrogenase in the lower region. The identified formate dehydrogenase alpha subunit (DHTCA2_00269400) is homologous (34.5% AA identity) to a protein (CBDB1A195) in Dehalococcoides mccartyi strain CBDB1, annotated as formate dehydrogenase major unit. This Dehalococcoides protein was found at high expression levels in different Dehalococcoides-containing cultures (Morris et al., 2006; Adrian et al., 2007; Morris et al., 2007). Unlike Dehalococcoides, which cannot use formate as an electron donor, one Dehalobacter strain can (Sun et al., 2002). The hup-type Ni,Fe-hydrogenase found in the lower region of enriched activity (146 kDa) is potentially involved in hydrogen uptake in Dehalobacter. These two protein complexes could be parts of the electron transfer chain directed to RDases. Based on the patterns we found in the BN-PAGE gels, it is tempting to speculate that they are physically associated with RDases, although we do not have conclusive evidence.

3.6 ACKNOWLEDGEMENTS

The authors are very grateful to Dr. Lorenz Adrian (UFZ, Leipzig) and Winnie Chan for teaching us Native PAGE techniques. We thank Ariel Grostern for establishing the three mixed cultures and Paul Taylor for help with the analysis of LC-MS/MS data.

Support was provided by the Government of Canada through Genome Canada and the Ontario Genomics Institute (2009-OGI-ABC-1405). Support was also provided by the Government of Ontario through the ORF-GL2 program and the United States Department of Defense through the Strategic Environmental Research and Development Program (SERDP) under contract W912HQ-07-C-0036 (project ER-1586). Metagenome sequencing was conducted by the U.S. Department of Energy Joint Genome Institute supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. S.T. received awards from the Government of Ontario through the Ontario Graduate Scholarships in Science and Technology (OGSST) and the Natural Sciences and Engineering Research Council of Canada (NSERC PGS B).

Chapter 4 Semi-Automatic In Silico Gap Closure Enabled De Novo Assembly of Two Dehalobacter Genomes from Metagenomic Data

Reproduced with permission from PLOS ONE. Citation: Tang S, Gong Y, Edwards EA. 2012. Semi-automatic in silico gap closure enabled de novo assembly of two Dehalobacter genomes from metagenomic data. PLoS ONE 7:e52038. Copyright: © 2012 Tang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

4.1 ABSTRACT

Typically, the assembly and closure of a complete bacterial genome requires substantial additional effort spent in a wet lab for gap resolution and genome polishing. Assembly is further confounded by subspecies polymorphism when starting from metagenome sequence data. In this paper, we describe an in silico gap-resolution strategy that can substantially improve assembly. This strategy resolves assembly gaps in scaffolds using pre-assembled contigs, followed by verification with read mapping. It is capable of resolving assembly gaps caused by repetitive elements and subspecies polymorphisms. Using this strategy, we realized the de novo assembly of the first two Dehalobacter genomes from the metagenomes of two anaerobic mixed microbial cultures capable of reductive dechlorination of chlorinated ethanes and chloroform. Only four additional PCR reactions were required even though the initial assembly with Newbler v. 2.5 produced 101 contigs within 9 scaffolds belonging to two Dehalobacter strains. By applying this strategy to the re-assembly of a recently published genome of Bacteroides, we demonstrate its potential utility for other sequencing projects, both metagenomic and genomic.

4.2 INTRODUCTION

The value of assembling complete closed bacterial genomes has been questioned (Mardis et al., 2002) considering the high cost of genome finishing (closing assembly gaps and genome polishing). In the age of Sanger sequencing, when most assembly gaps were caused by insufficient sequencing, the only way to resolve gaps was to perform additional targeted sequencing from the contig ends (Gordon et al., 2001; Assefa et al., 2009), which is a labor-intensive and costly process. Even with next-generation sequencing (NGS) techniques (454 pyrosequencing, Illumina, SOLiD and others) that provide ample sequence coverage for single microbial genomes, the volume of data and small read length make finishing difficult and time-consuming (Chain et al., 2009), and thus many genomes remain as drafts.

In the assembly of sequencing data derived from single organisms, the major cause of assembly gaps is the presence of repetitive elements, such as the genes of transposases and reverse transcriptases (Kingsford et al., 2010). A powerful way to resolve such gaps is by incorporating mate-pair sequencing data, and several genome assemblers have incorporated mate-pair constraints into assembly. New stand-alone gap resolution programs, including IMAGE (Tsai et al., 2010), GapResolution (http://www.jgi.doe.gov/software/) and GapFiller (Boetzer and Pirovano, 2012) were designed to close gaps using mate-pair data. However, gap resolution in the assembly of NGS data is still challenging. In the assembly of the recently published genome of Bacteroides salanitronis, sequenced by 454 and Illumina sequencing, 193 additional PCR reactions and 4 shatter libraries were required to close the gaps after the application of GapResolution (Gronow et al., 2011).

In the assembly of metagenomic data, the challenge is compounded by subspecies polymorphism (or strain variation), which results in fragmentation even in the assembly of non-repetitive genes. Interference from the sequences of coexisting similar genomes can cause severe fragmentation in metagenomic assembly. Before 2011, all genome assemblers and gap-resolution programs were designed to handle sequencing data derived from single genomes; they are therefore powerless to resolve gaps caused by subspecies polymorphism. Newer assemblers, including Meta-IDBA (Peng et al., 2011), Genovo (Laserson et al., 2011) and Bambus 2 (Koren et al., 2011), have been developed recently to address specific challenges faced in metagenomic assembly. With respect to subspecies polymorphism, Meta-IDBA (Peng et al., 2011) proposes to

improve the assembly by condensing the interfering sequences from subspecies organisms into consensus sequences. However, sequence condensation results in the loss of information and the danger of inadvertent frame shifts.

In this paper, we propose a strategy for in silico gap resolution that is capable of resolving assembly gaps within scaffolds caused by repetitive elements and subspecies polymorphisms. This strategy closes assembly gaps using pre-assembled contigs followed by verification with careful read mapping. Applying this strategy to the assembly of two coexisting Dehalobacter genomes in a metagenomic context, we were able to resolve nearly all assembly gaps in silico and close the genomes. By then incorporating sequencing data from a second metagenome that has only one of the two Dehalobacter genomes, the two genomes were successfully separated and polished into finished genomes. The Materials and Methods section describes the overall approach taken to obtain and assemble genomes, while the Results section provides specific step-by-step details of the procedure and assembly. An overview of the whole assembly process is shown in Figure 4.1. The Discussion and Conclusion provide a summary and comparison to other approaches, and recommendations for those considering metagenome sequencing.

4.3 MATERIALS AND METHODS

4.3.1 Culture description and metagenomic DNA sequencing

ACT-3 is an anaerobic enrichment culture that reductively dechlorinates chlorinated ethanes and methanes, including the industrial solvents and groundwater pollutants 1,1,1-trichloroethane (1,1,1-TCA), 1,1-dichloroethane (1,1-DCA), and trichloromethane (or chloroform, CF) (Grostern and Edwards, 2006b; Grostern et al., 2010). The name of the culture is derived from the contaminants it degrades: “ACT” is TCA backwards, and “3” is the number of chlorine substituents on a single carbon atom in both 1,1,1-TCA and CF. ACT-3 has been maintained in the laboratory for over a decade in a defined medium with 1,1,1-TCA as an electron acceptor and a mixture of methanol, ethanol, acetate and lactate as electron donors (Grostern and Edwards, 2006b). Two subcultures enriched with different chlorinated substrates were derived from ACT-3: the CF subculture was grown on chloroform and a mixture of methanol, ethanol and lactate and the DCA subculture was grown

Figure 4.1 Flow chart of the sequence assembly process.

on 1,1-DCA and a mixture of methanol, ethanol, acetate and lactate. While the parent culture dechlorinates 1,1,1-TCA, 1,1-DCA and CF, the CF subculture only dechlorinates 1,1,1-TCA and CF and the DCA subculture only dechlorinates 1,1-DCA. The communities of these three cultures were found to be diverse yet dominated by Dehalobacter (Figure 4.2). The identification of two different but highly similar reductive dehalogenases from these three cultures (Tang and Edwards, 2013) demonstrated the existence of two Dehalobacter strains in ACT-3: one was inherited by the CF subculture and the other by the DCA subculture (Figure 4.2). This agrees with the genome assemblies described herein.

Figure 4.2 Microbial composition determined by 16S rRNA pyrotag sequences from the ACT-3 parent culture and two subcultures. The ACT-3 culture contains two Dehalobacter strains, strain CF and strain DCA. Strain CF was inherited by the CF subculture, expressing reductive dehalogenase CfrA, which dechlorinates CF and 1,1,1-TCA. Strain DCA was inherited by the DCA subculture, expressing reductive dehalogenase DcrA,

which dechlorinates 1,1-DCA. The microbial composition was determined by pyrotag sequencing of the 16S rRNA gene (Tang and Edwards, 2013).

DNA from the ACT-3 parent culture and the CF subculture was sequenced. DNA from the ACT-3 culture was extracted by a CTAB protocol as required by the Joint Genome Institute (JGI) (http://my.jgi.doe.gov/general/protocols/DNA_Isolation_Bacterial_CTAB_Protocol.doc). This DNA sample was sequenced by the JGI in two runs of 454 pyrosequencing, including one mate-pair library with an insert size of ~ 8.6 kb. The collection of all the 454 pyrosequencing data from the ACT-3 culture is referred to as the “ACT-3 metagenome”. DNA from the CF subculture was extracted using the UltraClean™ soil DNA isolation kit (MOBIO). The DNA sample was sequenced by the Toronto Center of Applied Genomics (TCAG, Toronto, CA) using mate-pair Illumina sequencing with an insert size of ~ 647 bp and read length of 76 bp. The collection of Illumina sequencing data from the CF subculture is referred to as the “CF metagenome”. All raw sequence data have been deposited in the NCBI Sequence Read Archive: SRR554404, 454 paired-end reads of the ACT-3 metagenome; SRR554406, 454 non-paired-end reads of the ACT-3 metagenome; SRR554411, Illumina paired-end reads of the CF metagenome.

4.3.2 Genome assembly and gap resolution

The 454 sequencing data of the ACT-3 metagenome were pooled together and assembled by the JGI using Newbler v. 2.5 (Margulies et al., 2005). The resulting collection of contigs and scaffolds can be accessed with IMG taxon object ID 2100351010 on the JGI IMG/M platform (http://img.jgi.doe.gov/cgi-bin/m/main.cgi). The next step was to resolve the gaps within scaffolds. Typically, PCR reactions targeting specific gaps are run and then the resulting amplicons are sequenced. Instead, we used a different approach that resolved assembly gaps in silico using existing sequencing data. Assuming a gap was not caused by insufficient sequencing coverage, we searched with BLASTN against all contigs assembled by Newbler for overlapping contigs that could be candidates to bridge the gap (Figure 4.3a). A Perl script (Text C1) was written to automate the searching process. Briefly, this program begins by retrieving 1000 bp sequences from the edges of two presumed neighboring contigs. Using BLASTN (typically DNA sequence identity > 95% and e-value < 1e-10) to search against all contigs assembled by Newbler, other contigs that overlapped well with these two 1000 bp sequences were identified. Imperfect

sequence overlapping (< 100% DNA sequence identity) is allowed considering potential variations caused by subspecies polymorphism and repetitive elements that Newbler cannot resolve properly. From the 5’ side of the gap, each potentially overlapping contig identified was used to repeat the search in the next iteration. After each iteration, the new overlapping contigs were compared to those identified from a similar process initiated from the 3’ edge of the assumed gap. If a common overlapping contig was identified from both ends, a potential solution was suggested and output. A typical output of this program is shown in Figure 4.3b. The overlapping contigs identified by the Perl program were then input into a sequence manager, Geneious Pro v. 5.4 (Drummond et al., 2011), for sequence alignment and further analysis (Figure 4.3c).
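The iterative search from both edges of a gap can be sketched as follows. This is a minimal illustration in Python, not the Perl script of Text C1: it assumes the BLASTN hits have already been parsed into a hypothetical dictionary `overlaps` that maps each contig ID to the set of contig IDs overlapping it, and it ignores overlap orientation and coordinates. The names `expand` and `bridge_gap` and all contig IDs are invented for the example.

```python
def expand(overlaps, frontier, paths):
    """Advance one iteration: record every not-yet-seen overlapping contig."""
    next_frontier = []
    for contig in frontier:
        for neighbour in sorted(overlaps.get(contig, ())):
            if neighbour not in paths:
                paths[neighbour] = paths[contig] + [neighbour]
                next_frontier.append(neighbour)
    return next_frontier


def bridge_gap(overlaps, west, east, max_iterations=10):
    """Search from both gap edges and report chains that share a contig."""
    west_paths, east_paths = {west: [west]}, {east: [east]}
    west_frontier, east_frontier = [west], [east]
    for _ in range(max_iterations):
        west_frontier = expand(overlaps, west_frontier, west_paths)
        east_frontier = expand(overlaps, east_frontier, east_paths)
        shared = sorted(set(west_paths) & set(east_paths))
        if shared:
            solutions = []
            for contig in shared:
                # Join the west chain to the reversed east chain at the shared contig.
                chain = west_paths[contig] + east_paths[contig][::-1][1:]
                if chain not in solutions:
                    solutions.append(chain)
            return solutions
        if not west_frontier and not east_frontier:
            break
    return []


# Hypothetical overlap table: two alternative contigs bridge the same gap,
# as might happen where two strains differ.
pairs = [("west", "altA"), ("west", "altB"), ("altA", "east"), ("altB", "east")]
overlaps = {}
for a, b in pairs:
    overlaps.setdefault(a, set()).add(b)
    overlaps.setdefault(b, set()).add(a)

solutions = bridge_gap(overlaps, "west", "east")
print(solutions)
```

Returning every chain that meets at a shared contig, rather than only the first, mirrors the need to keep alternative solutions for gaps caused by subspecies polymorphism.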

Figure 4.3 Overview of the in silico gap-resolution process. (a) The principle of the Perl program that automates the search for overlapping contigs that close an assembly gap. (b) A typical output of the Perl program; shown is the case of gap 00973-G-00974. (c) The solutions to gap 00973-G-00974 represented as a multiple sequence alignment created and visualized with Geneious Pro.

Using this approach, we were able to resolve nearly all the gaps caused by repetitive elements and provide alternative solutions to gaps caused by subspecies polymorphism. Moreover, by attempting to close potential gaps between the terminal contigs of scaffolds, we were able to determine whether any two scaffolds were adjacent, and if so, we could provide solutions to the gaps between them just as for a gap within a scaffold. In this way, we successfully determined the order of Dehalobacter scaffolds and combined them into a closed circle. However, this circle turned out to be a chimeric genome, a combination of two Dehalobacter genomes coexisting in the ACT-3 metagenome: in many gaps, the solutions consisted of alternative contigs that belong to the genomes of the two coexisting Dehalobacter strains.

4.3.3 Separation of the two Dehalobacter genomes

Separation of these two Dehalobacter genomes using sequencing data from the ACT-3 metagenome alone was impossible. Fortunately, we had Illumina read pairs from the CF metagenome, which contained only one Dehalobacter strain. The two genomes were separated by mapping the Illumina reads from the CF metagenome against the chimeric genome from the ACT-3 metagenome. First, a reference genome was created from the chimeric genome by using the consensus sequence in regions where the two genomes differed because of SNPs. In regions where the two genomes differ dramatically, we screened alternative sequences until we found the one that agreed with the Illumina data. Geneious Pro offers a powerful read-mapping program that allows mapping only with read pairs; this imposes extremely high read-mapping accuracy. After the first read mapping process, regions with poor coverage (read depth < 5) were identified (Figure 4.4a); these regions were mostly where the wrong alternative sequences were chosen and incorporated into the reference genome. By switching alternative choices in these regions, the reference genome was refined (Figure 4.4b).
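The identification of poorly covered regions after each round of read mapping can be illustrated with a short sketch. It assumes that per-position read depths have been exported as a plain list of integers; the function name `low_coverage_regions` and the example depths are hypothetical, and this is not the Geneious Pro implementation.

```python
def low_coverage_regions(depths, min_depth=5):
    """Return (start, end) index pairs, end exclusive, where depth < min_depth."""
    regions, start = [], None
    for position, depth in enumerate(depths):
        if depth < min_depth and start is None:
            start = position            # a low-coverage run begins
        elif depth >= min_depth and start is not None:
            regions.append((start, position))  # the run ended at the previous base
            start = None
    if start is not None:
        # The run extends to the end of the reference.
        regions.append((start, len(depths)))
    return regions


# Hypothetical per-base read depths along a short stretch of the reference.
depths = [30, 28, 4, 0, 2, 25, 31, 3, 3]
regions = low_coverage_regions(depths)
print(regions)
```

Each reported interval marks a candidate location where the alternative sequence in the reference may need to be switched before the next round of mapping.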

Figure 4.4 Separation of the genome of strain CF by progressive read-mapping. (a) The result of the first read mapping against the draft reference genome. (b) The result of the last read mapping against the refined reference genome. Illumina read pairs from the CF metagenome, which only has the genome of strain CF, were mapped against a reference genome derived from a chimeric Dehalobacter genome from the ACT-3 metagenome, which has both strain CF and strain DCA. The progressive read-mapping process as described resulted in the refined genome (Figure 4.4b), representing the genome of strain CF. Regions that have coverage lower than 5x are highlighted in red. The read depth is highlighted in green when both DNA strands were covered and in yellow when only one strand was covered.

Another function of this read mapping process was to verify the solutions to gaps proposed by the in silico gap resolution process. Often, if a false solution was selected, it would result in a region of poor coverage (except those regions caused by subspecies polymorphism described above). However, not all false solutions could be identified by mapping with Illumina read pairs. One example is the case of tandem repeats, which is discussed below in “Results”. Another example is related to transposable elements that tend to cause sequence duplication at their target sites. The duplicate sequences (often located on both sides of a transposable element), especially those longer than the length of 454 reads, can lead to a false solution that favors the deletion of the transposase gene. Such a false solution cannot be detected by read mapping. To avoid this pitfall, for gaps that would allow either the insertion or the deletion of a multi-copy sequence (often a transposase gene), we always tested the option including the insertion first when mapping with Illumina read pairs.

After this trial-and-error process of progressive read mapping, the Dehalobacter genome shared by both metagenomes was identified (Figure 4.4b); the genome was named Dehalobacter sp. strain CF. By manually filtering the alternative gap solutions of the chimeric genome from the

ACT-3 metagenome against the genome of strain CF, the other Dehalobacter genome, named Dehalobacter sp. strain DCA, was also assembled.

4.3.4 PCR reactions

There were still four gaps that could not be fully resolved in silico: three of them were located within the three ribosomal RNA operons in both Dehalobacter genomes, and the other one was a large repetitive region (~ 800 bp long) consisting of continuous repetition of oligonucleotides. The complete resolution of these four gaps was achieved by PCR amplification followed by Sanger sequencing targeting uncertain regions on the amplicons. In addition, to verify our in silico gap resolution approach, PCR primers were designed for 22 gaps in the genome of strain CF. These gaps were caused by the presence of repeats but resolved completely in silico. In all PCR reactions, DNA from the CF subculture, containing only strain CF, was used as the template. Primer design was facilitated by the primer design function offered in Geneious Pro. For long-distance amplifications (> 3 kb), Phire® Hot Start II DNA polymerase (Fermentas, Canada) was used; in other cases, Taq DNA polymerase (New England Biolabs, Canada) was used. The temperature programs were designed based on the properties of the primers and the instruction manuals for the two enzymes. The size of amplicons was estimated by electrophoresis on a 1% agarose gel and comparing bands to those from a DNA ladder (GeneRuler™ 1 kb Plus; Fermentas, CA). Direct Sanger sequencing of the amplicons was performed by the Centre of Applied Genomics (Toronto, Canada).

4.3.5 Genome polishing

The two Dehalobacter genomes were polished further, mainly by read mapping and editing in Geneious Pro. The genome of strain CF underwent two polishing steps. First, we mapped short-insert (~ 647 bp) Illumina read pairs from the CF metagenome against the genome of strain CF to correct errors caused by the inaccuracy of 454 pyrosequencing in estimating the length of homopolynucleotides. The mapped Illumina read pairs generated a polished genome with limited SNPs (defined as positions that have a variant frequency higher than 10%). These SNPs were mainly caused by polymorphisms among multi-copy sequences within the genome. They were located so deeply inside multi-copy sequences that they could not be fixed by short-insert

read pairs. Second, we corrected these SNPs by mapping with long-insert (~ 8 kb) 454 read pairs from the ACT-3 metagenome.

Since the genome of strain DCA was only present in the ACT-3 metagenome and there were no Illumina paired-end data with which to polish it as for the genome of strain CF, it was polished slightly differently. First, a concatenated sequence containing the draft genome of strain DCA and the polished genome of strain CF was created, joined by a 20 kb poly-N bridge, which is longer than any read pair span. Against this sequence, we mapped all 454 reads (paired or not) and identified the SNPs that were not related to homopolynucleotides but had a variant frequency higher than 10%. Nearly all of these SNPs involved two nucleotides (e.g., 40% A and 60% T). For those consisting of more than 90% of one nucleotide, this nucleotide was chosen. Others were assumed to be caused by subspecies polymorphism. The positions of these SNPs were marked and the genome of strain DCA was aligned with that of strain CF. For the marked SNPs, if one nucleotide was used by strain CF, the other nucleotide was chosen for strain DCA. For variations between the two genomes that were related to homopolynucleotide sequences, we refined the genome of strain DCA by harmonizing it to that of strain CF. Notably, because the genome of strain DCA could not be polished directly by read mapping of Illumina paired-end data, potential sequence errors intrinsic to 454 pyrosequencing cannot be fully corrected for this genome. However, we expect these errors to be minimal. Most errors caused by 454 pyrosequencing appeared in homopolynucleotides, based on our observations during the polishing of the genome of strain CF, and given the extreme similarity of the two strains, most of these errors would have been corrected using the genome of strain CF as the reference.
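The decision rule applied to each marked SNP can be written down compactly. The sketch below is an illustration under stated assumptions rather than the procedure as carried out in Geneious Pro: read counts per nucleotide at one position are assumed to be available as a dictionary, the function name `call_dca_base` is hypothetical, and positions that fit neither rule are simply returned as unresolved.

```python
def call_dca_base(counts, cf_base, major_cutoff=0.90):
    """Choose the strain DCA nucleotide at one SNP position.

    counts: dict of nucleotide -> number of mapped 454 reads (hypothetical input).
    cf_base: nucleotide at the aligned position in the strain CF genome.
    Returns the chosen nucleotide, or None if the position stays unresolved.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    ranked = sorted(counts, key=counts.get, reverse=True)
    top = ranked[0]
    # Rule 1: one nucleotide accounts for more than 90% of the reads.
    if counts[top] / total > major_cutoff:
        return top
    # Rule 2: treat the two most frequent nucleotides as the two strain alleles;
    # if strain CF uses one of them, strain DCA gets the other.
    first, second = ranked[0], ranked[1]
    if cf_base == first:
        return second
    if cf_base == second:
        return first
    return None


print(call_dca_base({"A": 40, "T": 60}, cf_base="T"))
print(call_dca_base({"A": 95, "T": 5}, cf_base="A"))
```

With the 40% A and 60% T example from the text and T in strain CF, the sketch assigns A to strain DCA.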

4.3.6 Testing on a published genome

To further validate the approach, we attempted to re-assemble a published genome of B. salanitronis (Gronow et al., 2011). The raw sequencing data of the genome was kindly provided by the JGI. The data consists of two sets of sequencing data derived from pure culture DNA: one set was generated using 454 pyrosequencing, which included both mate-pair (average insert size of 6465±1616 bp) and non-mate-pair sequence data; and the other set was generated using Illumina technology and provided non-mate-pair (average read length of 36 bp) sequence data. The 454 data was first assembled with Newbler v. 2.3 accessed through Galaxy JGI

(https://galaxy.jgi-psf.org/) and default settings were used. The draft assembly with Newbler produced 121 contigs in one scaffold; other scaffolds that belong to plasmids were not considered. Our in silico gap resolution strategy was then applied to these contigs, resulting in a closed genome. We could not verify the assembly using Illumina data as we did for the genome of Dehalobacter strain CF because the data was not in pairs. Instead, we mapped the 454 long-insert read pairs against the assembled genome. We further polished the genome with Illumina reads to correct sequencing errors in homopolynucleotides in the 454 sequence data. We then mapped 454 paired reads and single reads to further correct ambiguities.

4.3.7 Accession numbers

The sequences and annotations of the two Dehalobacter genomes have been deposited in NCBI with the following accession numbers: CP003870 for strain CF and CP003869 for strain DCA.

4.4 RESULTS

4.4.1 Draft assembly of the ACT-3 metagenome

454 pyrosequencing of the ACT-3 culture generated ~ 2.2 M reads with an average read length of 198 bp. Approximately 0.9 M reads were in pairs with an insert size of ~ 8.6 kb. The draft assembly of the ACT-3 metagenome generated 28,621 contigs and 331,559 singlets, which can be accessed through IMG/M (http://img.jgi.doe.gov/cgi-bin/m/main.cgi) with IMG taxon object ID 2100351010. There were 13,437 contigs longer than 500 bp, with an N50 of 1705 bp. The largest contig was 169,374 bp long. As the read depth of a contig is proportional to the abundance of its source genome, these contigs were classified according to read depth (Figure 4.5). Subsequent assembly proved the coexistence of two Dehalobacter genomes in the ACT-3 metagenome. Therefore, there were contigs shared by both Dehalobacter genomes with read depth of ~ 70 (Region B in Figure 4.5) and there were contigs specific to each of the two Dehalobacter genomes with read depth of ~ 35 (Region C in Figure 4.5). Contigs with read depth higher than 90 (Region A in Figure 4.5) mainly belong to multi-copy sequences (such as transposable elements and ribosomal RNA sequences) in Dehalobacter genomes, while contigs with read depth lower than 20 (Region D in Figure 4.5) belong to less abundant organisms in the ACT-3 culture. The most abundant (non-Dehalobacter) fermenting organism in the ACT-3 metagenome

was a strain of Bacteroides, with read depth of ~ 14. Many of these contigs were further combined into scaffolds by Newbler incorporating mate-pair constraints. Overall, 159 scaffolds were generated. The largest scaffold had an estimated length of ~ 2.7 Mb. Ironically, this scaffold did not belong to Dehalobacter, but to Bacteroides. In order to assemble the genomes of Dehalobacter, we identified Dehalobacter scaffolds as those with read depth higher than 20 (Table 4.1).
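The read-depth classification of contigs can be sketched as a simple binning function. The cutoffs of 90 and 20 come from the text; the boundary of 50 between the shared contigs (~ 70) and the strain-specific contigs (~ 35) is an assumption made only for this illustration, as is the function name `classify_contig`.

```python
def classify_contig(read_depth, shared_cutoff=50):
    """Assign a contig to Region A, B, C or D by its average read depth.

    shared_cutoff is a hypothetical boundary between Region B (~70)
    and Region C (~35); it is not stated in the text.
    """
    if read_depth > 90:
        return "A"  # multi-copy sequences in the Dehalobacter genomes
    if read_depth >= shared_cutoff:
        return "B"  # shared by both Dehalobacter genomes
    if read_depth >= 20:
        return "C"  # specific to one Dehalobacter genome
    return "D"      # less abundant organisms


# Illustrative read depths: a multi-copy element, a shared contig,
# a strain-specific contig, and a contig at the Bacteroides depth (~14).
for depth in (120, 70, 35, 14):
    print(depth, classify_contig(depth))
```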

Figure 4.5 Contig distribution in the ACT-3 metagenome. Based on average read depth, the contigs were grouped into 4 regions. Region A: multi-copy contigs in the Dehalobacter genomes (read depth > 90); Region B: contigs shared by both Dehalobacter strains (read depth ~ 70); Region C: contigs specific to each Dehalobacter strain (read depth ~ 35); Region D: contigs that belong to other organisms of lower abundance (read depth < 20).

Table 4.1 Dehalobacter scaffolds in the ACT-3 metagenome. (scaffolds of other organisms are not included in this table)

Scaffolds      From           To             No. of Gaps
Scaffold002    Contig00228    Contig00268    40
Scaffold003    Contig00269    Contig00297    28
Scaffold004    Contig00298    Contig00314    16
Scaffold009    Contig00530    Contig00539    9
Scaffold018    Contig00677    Contig00678    1
Scaffold041    Contig00883    Contig00885    2
Scaffold054    Contig00972    Contig00975    3
Scaffold095    Contig01153    Contig01154    1
Scaffold129    Contig01244    Contig01245    1

4.4.2 In silico gap resolution

There were 101 gaps within 9 Dehalobacter scaffolds (Table 4.1). Using in silico gap resolution (Figure 4.3), we were able to close almost all these gaps from remaining contigs pre-assembled by Newbler. The gaps were classified by type into four Groups (A, B, C and D). We adopted a gap nomenclature based on neighboring contigs: for example, the gap between contig00290 and contig00291 was written as 00290-G-00291.
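The gap nomenclature and the gap count can be reproduced from Table 4.1 with a short sketch. It assumes that contigs within a scaffold are numbered consecutively, which is consistent with the gap counts in the table; the function and variable names are hypothetical.

```python
def gap_names(first_contig, last_contig):
    """Name the gaps between consecutively numbered contigs of one scaffold."""
    return [f"{i:05d}-G-{i + 1:05d}" for i in range(first_contig, last_contig)]


# (first contig, last contig) of each Dehalobacter scaffold, from Table 4.1
SCAFFOLDS = {
    "Scaffold002": (228, 268),
    "Scaffold003": (269, 297),
    "Scaffold004": (298, 314),
    "Scaffold009": (530, 539),
    "Scaffold018": (677, 678),
    "Scaffold041": (883, 885),
    "Scaffold054": (972, 975),
    "Scaffold095": (1153, 1154),
    "Scaffold129": (1244, 1245),
}

all_gaps = [
    gap
    for first, last in SCAFFOLDS.values()
    for gap in gap_names(first, last)
]
print(len(all_gaps))                 # 101
print("00290-G-00291" in all_gaps)   # True
```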

Group A (overlapping contigs, 23 gaps). For gaps in this group, the preceding contig overlapped directly with the following contig, yet the two were not assembled. Many of these gaps were caused by the presence of a repeated sequence of 500-700 bp present in two copies in the genome (Figure 4.6). The two copies were not necessarily identical, but contained homologous regions that broke the assembly. These kinds of repeated sequences tend to exist in pairs: 00237-G-00238 with 00240-G-00241, 00252-G-00253 with 00286-G-00287, 00256-G-00257 with 00261-G-00262, 00257-G-00258 with 00260-G-00261, 00280-G-00281 with 00281-G-00282, 00294-G-00295 with 00295-G-00296, and 00300-G-00301 with 00301-G-00302. Examples of the sequence annotations in these gaps include: export cytoplasm protein SecA, ATPase RNA helicase, invasion associated protein p60, spore germination B3 GerAC, flagellin protein FlaA and hypothetical proteins. The two homologous copies of each gene are probably paralogs in the genome. For the remaining 7 gaps in Group A, the reason why they were not assembled is unknown; three of these 7 gaps were chosen for verification with PCR and subsequent sequencing of the amplicons, and these additional sequencing results agreed with those determined by our in silico gap-resolution method (Table C2).
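A direct overlap of the kind found in Group A can be illustrated with an exact suffix-prefix search between the edge of the preceding (west) contig and the edge of the following (east) contig. This is a minimal sketch rather than the procedure used in this work: real overlaps may contain mismatches and were inspected in alignments, and the function names and the minimum overlap length are assumptions.

```python
def end_overlap(west, east, min_len=20):
    """Length of the longest exact suffix of `west` that is a prefix of `east`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    min_len = max(min_len, 1)
    for length in range(min(len(west), len(east)), min_len - 1, -1):
        if west[-length:] == east[:length]:
            return length
    return 0


def join_contigs(west, east, min_len=20):
    """Merge two directly overlapping contigs, or return None if they do not overlap."""
    length = end_overlap(west, east, min_len)
    if length == 0:
        return None
    return west + east[length:]


# Toy sequences (not real contig edges) with a 7 bp overlap
west_edge = "ACGTACGTTTGACCA"
east_edge = "TTGACCAGGGT"
print(end_overlap(west_edge, east_edge, min_len=5))   # 7
print(join_contigs(west_edge, east_edge, min_len=5))  # ACGTACGTTTGACCAGGGT
```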

Figure 4.6 Typical gaps in Group A. (a) The resolution of gap 00237-G-00238. (b) The resolution of gap 00240-G-00241. (c) The sequence alignment of the consensus sequences of gap 00237-G-00238 and gap 00240-G-00241. All DNA sequence alignments (including those in other figures) were generated with Geneious Pro and share the same format. As shown in Figure 4.6a, most sequence identifiers consist of three regions. Region 1 shows the ID of the sequence. Region 2 indicates specific tags: “W” means the sequence is the last 1000 bp taken from the 3’ end of the contig, and it is on the west side of the gap; “E” means the sequence is the first 1000 bp taken from the 5’ end of the contig, and it is on the east side of the gap; “F” means the sequence is a whole contig in its forward orientation; “R” means the sequence is a whole contig in its reverse orientation. Region 3 shows the average read depth of the contig from which the sequence is derived. The sequence alignment is shown on the right-hand side. Marks on the top show the scale; alignment mismatches are highlighted in black and matches in grey; gaps in sequences are indicated by dashes. In some figures (e.g., Figures 4.8, 4.11, and 4.15) the identity of the overlapping sequences is shown on top of the alignment as a coloured bar; positions with 100% identity are in green and positions with lower identity are in yellow.

Group B (multi-copy contigs, 38 gaps). Gaps in this group could often be filled by the placement of contigs that have high read depth and thus exist in multiple copies in the genome. Most of these multi-copy contigs contained sequences for repetitive elements commonly found in bacterial genomes (Table 4.2), including the genes of transposases, reverse transcriptases, integrases, and ribosomal RNAs. In the resolution of these gaps, the overlap between these multi-copy contigs and other contigs was not necessarily perfect (Figure 4.7). A multi-copy contig is originally assembled from reads that come from different loci of the genome, so it is prone to be chimeric. Therefore, the multi-copy contig can have an edge sequence that is specific to one of its multiple loci on the genome, causing an imperfect overlap in other loci (Figure 4.7).

The poorly-overlapping edge from the multi-copy contig therefore must be trimmed in resolving these gaps (rectangular boxes in Figure 4.7).

Table 4.2 Length, read depth and annotation of multi-copy sequences in the Dehalobacter genomes.

Contig ID   Length (bp)   Average read depth (1)   Annotation
01677       1957          75.4                     Putative transposase
06868       738           90.0                     -
02118       1533          106.8                    PBS lyase HEAT-like repeat family protein
01493       2317          115.7                    Putative reverse transcriptase
05122       872           128.7                    Dehalobacter 16S rRNA (partial)
01334       3439          142.1                    Putative recombinase
01997       1660          149.0                    Dehalobacter 16S rRNA (partial)
02840       1270          151.2                    Putative transposase
01481       2332          151.2                    Putative transposase
01315       4355          187.9                    Dehalobacter 5S and 23S rRNA
04728       914           194.78                   Putative transposase
01581       2052          226.0                    Putative transposase
01468       2361          277.4                    Putative transposase
04522       940           287.5                    Putative transposase
01532       2202          304.1                    Putative reverse transcriptase
03012       1216          355.2                    Putative transposase
01504       2287          355.6                    Putative transposase
02363       1449          371.5                    Putative transposase
01388       2750          395.1                    Putative transposase

(1) The average read depth of the contigs shared by both Dehalobacter genomes is 69.5.
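The read depths in Table 4.2 can be converted into rough copy-number estimates by dividing by the depth of single-copy shared contigs (69.5). The sketch below is an illustration only: it assumes that depth scales linearly with copy number and that a repeat has the same number of copies in both strains, which does not hold for strain-specific insertions; the function name is hypothetical.

```python
SHARED_SINGLE_COPY_DEPTH = 69.5  # footnote of Table 4.2


def estimate_copies(read_depth, single_copy_depth=SHARED_SINGLE_COPY_DEPTH):
    """Approximate copies per genome, assuming equal copy number in both strains."""
    return read_depth / single_copy_depth


# Contig ID -> average read depth, a few rows copied from Table 4.2
examples = {"01315": 187.9, "01468": 277.4, "01388": 395.1}
for contig_id, depth in examples.items():
    print(contig_id, round(estimate_copies(depth), 1))
```

Non-integer estimates are expected, because multi-copy contigs can be chimeric and some copies are present in only one of the two strains.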

Figure 4.7 Typical gaps in Group B. Five gaps caused by the presence of a multi-copy contig, contig01468, are shown. Notably, although part of contig01468 is shared by all five gaps, the terminal part on the 5’ edge of contig01468F (highlighted with rectangles) only belongs in the last gap. It would be more reasonable to assemble the raw reads in this region into contig03616, but Newbler did not do so. As a consequence, this kind of poor overlap (as shown in the first four gaps) prevailed in the resolution of gaps caused by multi-copy contigs. Accordingly, these poorly overlapping edges of the multi-copy contigs were trimmed in the construction of consensus solutions.

Group C (strain variation and alleles, 16 gaps). Subspecies polymorphism (or strain variation) is a challenge specific to metagenomic data. The gaps in Group C resulted from differences between the two Dehalobacter genomes coexisting in the ACT-3 metagenome. A distinctive feature of all gaps in Group C is the presence of “pairs of alternative contigs” (Figure 4.8a and 4.8b); the number of such pairs varies between gaps. The two alternative contigs are homologous and have limited differences, often in the form of dispersed SNPs. They also tend to have similar length and similar read depth (Figure 4.8a and 4.8b). The presence of these “pairs of alternative contigs” further confirms the presence of two Dehalobacter genomes in the ACT-3 metagenome.
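The three properties that characterize a pair of alternative contigs (high but imperfect identity, similar length, similar read depth) can be expressed as a simple screening rule. The sketch below assumes the two contigs have already been aligned to equal length with "-" as the gap character; all thresholds and function names are assumptions chosen for illustration, not values used in this work.

```python
def identity(aligned_a, aligned_b):
    """Fraction of alignment columns in which both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return matches / len(aligned_a)


def is_alternative_pair(aligned_a, depth_a, aligned_b, depth_b,
                        min_identity=0.9, max_depth_ratio=1.5,
                        max_length_ratio=1.1):
    """Screen two aligned contigs for the features of an alternative pair."""
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))
    similar_length = max(len_a, len_b) <= max_length_ratio * min(len_a, len_b)
    similar_depth = max(depth_a, depth_b) <= max_depth_ratio * min(depth_a, depth_b)
    ident = identity(aligned_a, aligned_b)
    # Identical sequences are excluded: alternatives must differ somewhere.
    return similar_length and similar_depth and min_identity <= ident < 1.0


# Toy alignment with a single SNP (not real contig sequence)
a = "ACGTACGTACGTACGTACGT"
b = "ACGTACGTACCTACGTACGT"
print(identity(a, b))                         # 0.95
print(is_alternative_pair(a, 33.0, b, 36.5))  # True
```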

Figure 4.8 Typical gaps in Group C. (a) The resolution of gap 00289-G-00290. (b) The resolution of gap 00290-G-00291. In Figure 4.8a and 4.8b, “pairs of alternative contigs” are highlighted in single brackets; contig01244 and contig01245 are highlighted with an asterisk. (c) The schematic graph showing the relationship between scaffold003 and scaffold129. Contigs are represented by straight lines with contig ID on the top and average read depth at the bottom; curved arrows indicate scaffolding relationships.

Group D (insertions and deletions, 23 gaps). Gaps in this group were caused by subspecies polymorphism resulting from the insertion of a sequence in one strain but not in the other. One example is shown in the resolution of three related gaps (Figure 4.9a): 00270-G-00271, 00271-G-00272, and 00270-G-00272. These three gaps were caused by the fact that contig00271 was present in only one of the two strains. Read depth further confirmed this solution: the read depth of contig00271 is 30.86, while the read depths of contig00270 and contig00272 are 71.36 and 71.68, respectively (Figure 4.9a); therefore, contig00271 belongs to only one strain. Unfortunately, superficial analysis based on scaffolding information of these contigs only favors the insertion of contig00271: these three contigs were in sequential order in scaffold002 and the overlap between contig00270 and contig00271 and between contig00271 and contig00272 appears perfect. The traces that support the deletion of contig00271 were buried in the assembly of related 454 contigs (Figure C1), a phenomenon we named “edge sequence suppression”. These traces can be revealed in the visualization of the assembly at the 5’ edge of contig00270 as shown in Figure C1, which was generated using EagleView v.2.0 (Huang and Marth, 2008) where the sequences that were suppressed are highlighted in red. For contig00270, there were 15 homogeneous reads suppressed, represented by the read labeled GJDNVXK01E3JVP (Figure 4.9a); these suppressed reads constitute an alternative ending to the contig. On the 3’ edge of contig00272, an alternative ending represented by the raw read GQIUW4002GKUZE (Figure 4.9a) was found. The alignment of these two alternative endings supports the deletion of contig00271 (Figure 4.9a). Many other similar examples were found (Table C1).

The two alternative paths between contig00270 and contig00272 arose because contig00271 is present in only one of the two strains. Contig00271 is only 1148 bp long and encodes a gene annotated as “Stage 0 sporulation two-component response regulator”. A similar case was found between contig00253 (read depth of 73.4) and contig00255 (read depth of 81.9); in this case, the ambiguity was caused by contig00254 (read depth of 43.1), which is ~ 65 kb long and includes 65 genes. In these two cases, the sequences that were either inserted or deleted were not common transposable elements. However, in all other gaps in Group D, the insertion or deletion corresponded to a common transposable element. For example, for gap 00230-G-00231 (Figure 4.9b), the ambiguity was caused by a multi-copy contig, contig01388, which had an average read depth of 395. As discussed earlier, the gaps in Group B were also caused by such multi-copy contigs. The difference between them is that solutions to the gaps in Group B were shared by both Dehalobacter strains, while those to the gaps in Group D were strain-specific. It is likely that the transposition events that caused gaps in Group B happened before the differentiation of the two strains, while the transposition events that caused gaps in Group D happened after the differentiation.
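The read-depth argument used for contig00271 and contig00254 can be written as a small check: a single-copy contig present in only one strain should have roughly half the depth of its flanking shared contigs. The accepted ratio window (0.35 to 0.65) and the function name are assumptions made for this sketch.

```python
def is_strain_specific_insert(left_depth, middle_depth, right_depth,
                              low=0.35, high=0.65):
    """True if the middle contig has roughly half the read depth of its flanks."""
    flank_depth = (left_depth + right_depth) / 2
    ratio = middle_depth / flank_depth
    return low <= ratio <= high


# Read depths reported above for the two non-transposon examples
print(is_strain_specific_insert(71.36, 30.86, 71.68))  # contig00271 -> True
print(is_strain_specific_insert(73.4, 43.1, 81.9))     # contig00254 -> True
```

This check applies only to single-copy inserts. For a multi-copy element such as contig01388 (read depth of 395), depth alone cannot show whether one particular copy is strain-specific, which is why the suppressed edge reads were examined.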

Figure 4.9 Typical gaps in Group D. (a) The insertion or deletion of contig00271. (b) The insertion or deletion of contig01388. The sequences highlighted with an asterisk are raw reads that are suppressed at the edges of different contigs. Sequence edges that are highlighted in rectangles should be trimmed in generating consensus sequences.

Gap-distance comparison. To further verify the proposed solutions for gaps within scaffolds, the gap distance determined for the proposed solution was plotted against the gap distance estimated from mate-pair constraints. The latter is obtained from the scaffolding outputs of Newbler and depends on the size distribution of the insert library. Excellent agreement between the two estimates of gap distance was found for most gaps in Group A, B and C (Figure 4.10). For gaps caused by the insertion or deletion of a sequence (Group D), the gap distance was calculated assuming insertion, while the gap distance determined by Newbler from mate-pair constraints is likely an average of the two cases. This can explain why the mate-pair estimations of the gaps in Group D were found to be lower than those predicted by assuming the case of an insertion (Figure 4.10).
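The gap distance of a proposed solution can be computed from the lengths of the bridging contigs and the overlaps at each junction, and then compared with the mate-pair estimate. The sketch below is a simplified illustration: the function names and the agreement tolerance are assumptions, and the overlap values in the example are invented (only the length of contig01468 is taken from Table 4.2).

```python
def solution_gap_distance(bridge_lengths, junction_overlaps):
    """Bases between the end of the west contig and the start of the east contig.

    bridge_lengths: lengths of the contigs placed in the gap, west to east.
    junction_overlaps: overlap lengths at each junction (one more than bridges).
    With no bridging contig, the result is the negative of the direct overlap.
    """
    if len(junction_overlaps) != len(bridge_lengths) + 1:
        raise ValueError("expected one overlap per junction")
    return sum(bridge_lengths) - sum(junction_overlaps)


def agrees_with_mate_pairs(solution_distance, mate_pair_distance, tolerance=500):
    """Compare the two estimates; the tolerance is an assumed value."""
    return abs(solution_distance - mate_pair_distance) <= tolerance


# Group A style gap: direct overlap of 35 bp (hypothetical) gives a negative distance
print(solution_gap_distance([], [35]))            # -35
# Group B style gap bridged by contig01468 (2361 bp) with hypothetical overlaps
print(solution_gap_distance([2361], [40, 25]))    # 2296
```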

Figure 4.10 Assessment of the assembly using gap-distance comparisons. When the preceding contig and the succeeding contig overlapped directly with each other, the gap distance was negative, with its magnitude equal to the length of the overlapped region. However, all gap distances calculated by Newbler were positive and the minimum value was 20 (the details of Newbler’s calculation are unknown). This explains why some gaps are located below the horizontal axis. Most gaps from Group D involve the insertion or deletion of a multi-copy sequence: insertion in one strain and deletion in the other strain. The gap distance based on insertion is longer than the one based on deletion. For simplicity, we calculated gap distance assuming insertion, while Newbler’s estimates should be average values between the gap distance in the case of deletion and the one in the case of insertion, depending on the mate pairs used for calculation. This likely explains why most gaps from Group D lie above the diagonal line. The gap distances for gaps 00285-G-00286 and 00239-G-00240 (highlighted by arrows) are consistent with mate-pair predictions if one assumes the existence of tandem copies of the multi-copy sequences involved.

Figure 4.11 Combinations of alternative scaffolds. (a) The combination of scaffold095 and scaffold054. (b) Traces of homology between contig00974 and contig01154R. (c) Traces of homology between contig00975 and contig01153R. (d) The combination of scaffold041 and scaffold003. Contigs are represented by straight lines with contig ID on the top and average read depth at the bottom; curved arrows indicate scaffolding relationships.

4.4.3 Alternative scaffolds

In the discussion of gaps caused by subspecies polymorphism in Group C, we explained the existence of “pairs of alternative contigs”. In those cases, contigs are alternative to each other. In some cases, a whole scaffold becomes the alternative. One simple example is scaffold129, which consists of two contigs: contig01244 and contig01245. Contig01244 is alternative to contig01442 in gap 00289-G-00290, while contig01245 is alternative to contig01425 in gap 00290-G-00291 (Figure 4.8). Since the two gaps are neighboring, scaffold129 can be incorporated into the two gaps of scaffold003 (Figure 4.8c). Another example of alternative scaffolds is scaffold095, consisting of contig01153 and contig01154. We found that gap 00974-G-00975 and gap 01153-G-01154 share the same bridging contig, contig10498 (Figure 4.11a); in addition, contig00973 can be bridged to either contig00974 or contig01154R (R means reverse orientation) (Figure 4.11a). Moreover, the read depth values of contig01153, contig01154, contig00974 and contig00975 were all around 30, which means they belong to only one strain; however, contig00973 has a read depth of 50.37. The alignment of contig00974 with contig01154R (Figure 4.11b) and the alignment of contig00975 with contig01153R (Figure 4.11c) further revealed traces of conservation. Based on these observations, it can be concluded that scaffold095 is alternative to contig00974 and contig00975, located at the 3’ edge of scaffold054 (Figure 4.11a). Another more complicated example is scaffold041, which consists of three contigs: contig00883, contig00884 and contig00885. It was found that scaffold041 was alternative to contigs located at the 3’ edge of scaffold003 (Figure 4.11d).

Figure 4.12 Schematic of the draft chimeric Dehalobacter genome from the ACT-3 metagenome. The major scaffolds and contigs are represented as straight lines with contig and scaffold IDs labeled; contigs shared by both strains are in blue; contigs specific to strain CF are in red; contigs specific to strain DCA are in green.

4.4.4 Order of scaffolds

Since the 3 alternative scaffolds (scaffold129, scaffold095 and scaffold041) can be incorporated into other scaffolds as described above, there were only 6 scaffolds left. Traditionally, to determine the order of these scaffolds and the sequences that fill the gaps between them, PCR reactions with primers chosen from the edges of scaffolds are required. Instead, by assuming gaps between any two scaffolds, we resolved the order of scaffolds and the gaps between scaffolds using the same gap-resolution process as for the gaps within scaffolds. Moreover, it was found that scaffold018 (with a read depth of ~ 37) is also a strain-specific scaffold. It is an alternative to contig01007, which is 32,699 bp long and has a read depth of 38.73. Using this progressive resolution of gaps within, and then between, scaffolds, the overall Dehalobacter genome structure was revealed (Figure 4.12), which is a joint representation of the two Dehalobacter genomes. Further polishing was achieved by mapping raw reads back to this assembly.

4.4.5 Read mapping

Illumina sequencing of the CF subculture generated ~ 27 million mate-pair reads with a read length of 76 bp and an average insert size of ~ 647 bp. In this study, these raw sequencing reads were only used for the separation and polishing of the two Dehalobacter genomes by read mapping. The Illumina sequencing data provided average coverage of more than 500x for the genome of strain CF. The process of read mapping was a vital component of the assembly strategy. It enabled the verification of proposed gap solutions, the separation of the two genomes, and genome polishing. Perfect agreement between raw reads and the reference genome of strain CF (Figure 4.4b) demonstrated the accuracy and effectiveness of our gap-resolution strategy. The separation of the two Dehalobacter genomes was achieved by successively mapping the Illumina data from the CF subculture as described in “Materials and Methods”. For genome polishing, the genome of strain CF was first polished by mapping short-insert (~ 647 bp) Illumina read pairs, followed by mapping long-insert (~ 8.6 kb) 454 read pairs. It was found that mapping with short-insert Illumina read pairs could not resolve all ambiguities, especially those derived from sequence variations among multiple copies of a multi-copy sequence. These recalcitrant ambiguities were then fixed by mapping long-insert 454 read pairs, showing the importance of long-insert mate-pair data in genome polishing.
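The coverage figure quoted above can be checked with a back-of-the-envelope calculation. The sketch assumes that "~ 27 million" counts individual reads and uses the finished length of the strain CF genome reported in Section 4.4.8; the fraction of reads that actually originate from strain CF is unknown, so the first value is an upper bound. The function name and the `mapped_fraction` parameter are hypothetical.

```python
def mean_coverage(n_reads, read_len, genome_len, mapped_fraction=1.0):
    """Average depth if `mapped_fraction` of the reads map to the genome."""
    return n_reads * read_len * mapped_fraction / genome_len


N_READS = 27_000_000      # approximate number of Illumina reads
READ_LEN = 76             # bp
CF_GENOME_LEN = 3_092_048  # bp, finished genome of strain CF

upper_bound = mean_coverage(N_READS, READ_LEN, CF_GENOME_LEN)
print(round(upper_bound))               # 664
# Fraction of reads that must map to strain CF to reach 500x
print(round(500 / upper_bound, 2))      # 0.75
```

Under these assumptions, coverage above 500x is reached when roughly three quarters of the reads map to strain CF.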

4.4.6 Recalcitrant gaps

Even after the process of read mapping, assemblies in some regions remained problematic. Three such examples are described here. The first example is gap 00229-G-00230, which was caused by a continuous repetition of oligonucleotides (Figure 4.13). Although the presence of such a self-repeated sequence in this gap can be concluded, the size of the repetitive region (> 450 bp) cannot be determined with 454 or Illumina data. The only way to completely resolve this type of gap is to have a read long enough to cover the whole repetitive region. This gap was eventually resolved by additional PCR amplification and Sanger sequencing.

Figure 4.13 Gap 00229-G-00230. (a) The sequence alignment of related contigs. (b) The dot plot of the consensus sequence from (a) against itself.

The second example is two gaps caused by tandem copies of multi-copy sequences: 00239-G-00240 and 00285-G-00286. Mapping with read pairs is very effective for determining whether a sequence is present in a gap; however, it is ineffective for determining whether the sequence exists in more than one copy in the gap. To determine if such repetitive elements exist in tandem copies, a simple test can be performed in silico (Figure C2): two copies of a repetitive element in the same orientation were concatenated with a stretch of poly-N (50 bp) in between; then Illumina read pairs were mapped against this artificial sequence (Figure C2). If read pairs spanning the poly-N region can be recovered with correct distances, this repeat must exist in tandem copies somewhere in the genome. In this way, we identified two repetitive elements that existed in tandem copies: they are the transposable elements associated with two multi-copy contigs, contig01504 and contig01532 (Table 4.2). By careful investigation of these two contigs, we recovered the real sequences that covered the poly-N region connecting the tandem copies. Coincidentally, we found that two gaps caused by these two multi-copy contigs, 00239-G-00240 and 00285-G-00286, had their gap distances underestimated initially (Figure 4.10). For example, if one single copy of contig01504 existed in gap 00239-G-00240, the gap distance would be 1106 bp, which is significantly smaller than the gap distance predicted from mate-pair constraints, 3111 bp. But if two tandem copies are there, the gap distance becomes 3004 bp, which is consistent with the predicted length. Based on these observations, we concluded the existence of tandem repeats in gap 00239-G-00240 and gap 00285-G-00286. This conclusion has been further confirmed with additional PCR reactions (Table C1).
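The tandem-copy test and the gap-distance argument above can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented: the function names, the 0-based coordinates of a mapped pair, and the insert-size tolerance are hypothetical, and only the 50 bp poly-N spacer, the ~ 647 bp insert size, the length of contig01504 and the three gap distances are taken from the text.

```python
def tandem_test_reference(element, spacer_len=50):
    """Two copies of a repeat in the same orientation, joined by poly-N."""
    return element + "N" * spacer_len + element


def pair_supports_tandem(left, right, element_len, spacer_len=50,
                         expected_insert=647, tolerance=200):
    """True if a mapped read pair spans the poly-N spacer at a plausible distance.

    left / right: leftmost and rightmost (exclusive) reference positions
    covered by the pair, 0-based.
    """
    spans_spacer = left < element_len and right > element_len + spacer_len
    plausible_size = abs((right - left) - expected_insert) <= tolerance
    return spans_spacer and plausible_size


def best_copy_number(distance_by_copies, mate_pair_estimate):
    """Copy number whose predicted gap distance is closest to the estimate."""
    return min(distance_by_copies,
               key=lambda n: abs(distance_by_copies[n] - mate_pair_estimate))


# Gap 00239-G-00240: predicted distances for one and two copies of contig01504
print(best_copy_number({1: 1106, 2: 3004}, mate_pair_estimate=3111))  # 2
# Hypothetical pair mapped across the spacer of a contig01504 test reference
print(pair_supports_tandem(2000, 2650, element_len=2287))             # True
```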

The last example is three gaps caused by a complicated combination of multi-copy contigs: 00310-G-00311, 00314-G-00228 and 00268-G-00269. Gap 00310-G-00311 is within scaffold004; 00314-G-00228 connects scaffold004 and scaffold002; 00268-G-00269 connects scaffold002 and scaffold003. The complexity of these three gaps stems from the fact that each gap harbored a ribosomal RNA operon, which contains one copy of each of the 5S, 16S and 23S ribosomal RNA genes. While the current in silico gap resolution revealed the overall structure (Figure 4.14), ambiguities remained in 01315-G-05122 and 01997-G-01504 (Figure 4.14). Three options were found to close 01315-G-05122, while two options were found for 01997-G-01504. Interestingly, these multiple options to resolve the gaps did not stem from subspecies polymorphism but from variations among the copies of multi-copy sequences within the genome. Such ambiguities were located between long multi-copy contigs, that is, beyond the unique mapping coverage of short-insert read pairs. Long-insert 454 read pairs have the potential to solve these ambiguities; however, our trials in this direction did not generate satisfactory results, probably due to the limited coverage (~ 11x) of the 454 long-insert mate-pair data. Finally, we had to turn to long-distance PCR amplifications, followed by sequencing with primers targeting the regions of ambiguity. Notably, although the current strategy of in silico gap resolution failed to resolve all uncertainties in these three gaps, it uncovered the overall layout correctly, which dramatically simplified additional sequencing efforts.

Figure 4.14 The incomplete resolution of three gaps in which 5S, 16S, and 23S rRNA genes are located. Straight lines represent contigs with contig IDs (top) and average read depth (bottom) indicated. The three lines between contig01315 and contig05122 indicate three potential connections between them, and the two lines between contig01504 and contig01997 indicate two potential connections.

4.4.7 PCR verification

The solutions of 19 gaps (mostly from Groups B and D) caused by the presence of long multi-copy contigs were chosen for further verification with PCR reactions. These gaps were caused at least partially by the presence of one of the three multi-copy contigs: contig01388, contig01504 and contig01532. Positive amplifications with the expected amplicon size were achieved in all PCR reactions (Table C2), except those for the two gaps, 00239-G-00240 and 00285-G-00286, in which tandem copies of multi-copy sequences were expected. Surprisingly, for both of these gaps, two amplicons with sizes corresponding to a single copy and to tandem copies were amplified simultaneously. Similar PCR reactions were designed to confirm the solutions to three gaps in Group A; amplicons with the expected size were also obtained for them (Table C2). The amplicons of all 22 gaps were further sequenced using Sanger sequencing: for 21 of them, Sanger sequencing generated sequences that match the expected amplicons derived from the assembled genome. Gaps caused by subspecies polymorphism were not chosen for PCR verification because we believe the solutions to these gaps had been solidly verified in the process that was used to separate the two Dehalobacter genomes: in every position where two alternative choices coexisted in the ACT-3 metagenome, only one of them existed in the CF metagenome.

4.4.8 The two Dehalobacter genomes

The finished genome of strain CF is 3,092,048 bp long, and the finished genome of strain DCA is 3,069,953 bp long. These two genomes share an identity of 90.6% in DNA sequence (Figure 4.15). The major differences between the two genomes relate to insertions or deletions of large DNA sequences (such as contig00254, ~ 65 kb) and transposable elements, and to alternative contigs or scaffolds (Figure 4.15). The absence of significant recombination of large genome sequences and the limited number of SNPs indicate the close relationship of these two Dehalobacter genomes. These are the first two genomes of Dehalobacter. Further analysis and annotation of these two genomes is underway (manuscript in preparation). The sequences and initial annotations of these two genomes are provided as supplemental files.

Figure 4.15 The alignment of the two Dehalobacter genomes: strain CF and strain DCA.

4.4.9 Testing re-assembly of a published genome

To test the applicability of this approach to other genomes or metagenomes, we attempted to assemble the raw sequence data of the published genome of B. salanitronis and were able to assemble a genome that agrees well with the previously published genome (Figure 4.16) that was closed with substantial additional wet lab work. Out of the total length of ~ 4.24 Mb, the variations between the two assemblies consisted of 119 SNPs and two insertion/deletion regions. The two insertion/deletion regions are highlighted in Figure 4.16 and annotated as Region 7 and Region 8. For Region 7, our strategy failed to identify a solution and additional sequencing would be necessary for closure. For Region 8, our strategy suggested the presence of tandem copies of a sequence, while the published assembly has one copy. Read mapping with 454 reads agrees with the existence of two tandem copies (data not shown). Moreover, the existence of the tandem repeats explains why the assembly broke at gap 8. Thus our strategy may actually correct an error in the published genome.

Figure 4.16 Alignment of the published assembly versus the new (this study) assembly of the B. salanitronis genome. The positions of assembly gaps caused by the 6 copies of the rRNA operons are indicated as Region 1-6. Region 7 and 8 indicate the two large regions of disagreement.

This Bacteroides genome has six copies of ribosomal RNA operons, resulting in six large assembly gaps (Region 1 to Region 6, Figure 4.16). The resolution of these gaps was challenging, just as we found for similar gaps in the Dehalobacter genomes. Although we cannot resolve all ambiguities in these gaps, we managed to uncover the overall layout of bridging contigs, which will simplify additional PCR and sequencing efforts. Many of the ambiguities in these gaps showed up as SNPs when the current assembly is compared to the published assembly: 41 out of 119 SNPs belong here. Of the remaining 78 (= 119 - 41) SNPs, 31 were caused by disagreements on the length of mono-polynucleotides (homopolymer runs). Re-examination of these 31 SNPs using read mapping with Illumina raw reads showed that we made the correct call for 30 of them and the wrong call for only one, located in a region of low coverage. Therefore, our polishing method resolves errors in mono-polynucleotides more effectively. Re-examination of the remaining 47 (= 78 - 31) SNPs using read mapping with 454 long-insert read pairs showed that most of the choices we made in these SNPs were supported by the long-insert 454 mate-pair data. As discussed earlier, mapping with long-insert mate-pair data is critical for genome polishing; however, this was not included in the current version of the published assembly. Apart from the 41 SNPs located in the 6 gaps caused by ribosomal RNA genes, the other 78 SNPs resulted from the different polishing processes used. In the published assembly, the authors used Polisher (Lapidus et al., 2008) to polish the assembled genome with Illumina reads (Gronow et al., 2011), while we used read mapping with both Illumina and 454 mate-pair reads.
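A disagreement on the length of a mono-polynucleotide can be recognized by collapsing each run of identical bases to a single base: two sequences that differ only in run lengths become identical after collapsing. The sketch below illustrates this idea on toy sequences and repeats the SNP bookkeeping from the paragraph above; it is not the comparison procedure used in this work, and the function names are hypothetical.

```python
from itertools import groupby


def collapse_runs(seq):
    """Replace every run of identical bases by a single base."""
    return "".join(base for base, _ in groupby(seq))


def differs_only_in_run_length(seq_a, seq_b):
    """True if the sequences differ, but only in the length of homopolymer runs."""
    return seq_a != seq_b and collapse_runs(seq_a) == collapse_runs(seq_b)


# Toy local sequences around a disagreement (not taken from either assembly)
print(differs_only_in_run_length("ACGAAAAATC", "ACGAAAATC"))  # True
print(differs_only_in_run_length("ACGTTC", "ACGATC"))         # False

# SNP bookkeeping from the text
total_snps, in_rrna_gaps, run_length_snps = 119, 41, 31
print(total_snps - in_rrna_gaps - run_length_snps)            # 47
```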

In summary, the published genome required 193 additional PCR reactions and 4 shatter libraries to close the gaps after the application of GapResolution (Gronow et al., 2011). Using our approach, we were able to generate a reliable assembly that only requires additional efforts to resolve uncertainties in 7 regions in the genome (Region 1-7, Figure 4.16). Also, our polishing method appears to have better performance for this genome.

4.5 DISCUSSION

Metagenome sequence assembly is challenging. Satisfactory assemblies have only been reported with sequences from microbial samples from extreme environments (Simmons et al., 2008) or enriched laboratory cultures (García Martín et al., 2006; Ettwig et al., 2010) that tend to have simpler community structure. However, even in these cases, the assemblies suffer from subspecies polymorphism (García Martín et al., 2006; Simmons et al., 2008). In this study, the ACT-3 metagenome was dominated by two highly similar Dehalobacter strains, resulting in severe fragmentation in the assembly produced by Newbler: subspecies polymorphism factored in about half of the assembly gaps in these Dehalobacter genomes. As with most genome assemblers, Newbler cannot address problems caused by subspecies polymorphism.

We propose a straightforward and effective solution that uses pre-assembled contigs to bridge gaps. The incorporation of read mapping for verification significantly increased the robustness and accuracy of the process. This strategy resulted in the in silico resolution of nearly all gaps caused by repetitive elements and subspecies polymorphisms in the de novo assembly of the two Dehalobacter genomes. Its power and general applicability were further demonstrated in the re-assembly of a published Bacteroides genome. Using pre-assembled contigs as building blocks, this strategy was not restricted to a particular sequencing technology or sample type (metagenomic or genomic). However, in searching for overlapping contigs, the length of the overlap often depends on sequencing technology and the program used to generate contigs. When Velvet (Zerbino and Birney, 2008) and ALLPATHS (Butler et al., 2008) are used with Illumina reads, the maximum overlap between contigs is shorter than the K-mer size, which is shorter than the Illumina raw reads (20-150 bp). But in the assembly of short-read mate-pair data, contigs assembled with ABySS (Simpson et al., 2009) tend to have much longer overlap. Our in silico gap-resolution strategy worked well when ABySS was used for contig assembly and SSPACE (Boetzer et al., 2011) for scaffolding.

Recent studies address specific challenges in metagenomic assembly. Genovo (Laserson et al., 2011) was designed to differentiate read variations caused by sequencing noise from those caused by true biological variation, which enabled better assembly, especially for organisms in low abundance. Bambus 2 (Koren et al., 2011) was designed to overcome challenges in scaffolding caused by subspecies polymorphism. Iverson et al. (2012) improved metagenomic scaffolding by considering not only mate-pair information, but also nucleotide composition and read-depth distribution. Meta-IDBA (Peng et al., 2011) proposed to capture the slight variations within closely related strains by multiple sequence alignment and to represent the sequence of one species by condensing the alignment. Gaps caused by subspecies polymorphism prevail in metagenomic assembly: in the current study, we encountered a simple scenario in which only two interfering genomes coexisted, yet ~ 40% of all gaps were related to subspecies polymorphism. Our gap-resolution approach proposed “partial” solutions to these gaps: representing alternative choices in a multiple sequence alignment. Further condensation of such an alignment, as occurs with Meta-IDBA (Peng et al., 2011), risks creating reading frame shifts and introducing more confusion; therefore, multiple sequence alignment appears to be a better solution to these gaps. Using multiple sequence alignment to resolve the gaps caused by subspecies polymorphism, our gap-resolution approach assembled all Dehalobacter contigs in the ACT-3 metagenome (two Dehalobacter strains) into a closed assembly. Even if the CF metagenome data had not been available, this closed assembly would still be a dramatic improvement over the initial set of contigs and scaffolds assembled by Newbler. Therefore, the proposed gap-resolution approach is not only useful in our special case; rather, it has the potential to improve the assembly of other metagenomic data sets containing two or more interfering or closely related strains. However, to resolve individual genomes and assign alternative sequences in the gaps caused by subspecies polymorphism, additional information is required, such as a genome or metagenome of an individual strain (the CF metagenome in our case). This technique could be particularly useful if genome sequence from DNA amplified from single cells sorted out of a metagenome were available, for example.

Three standalone in silico gap-resolution programs, IMAGE (Tsai et al., 2010), GapResolution (http://www.jgi.doe.gov/software/) and GapFiller (Boetzer and Pirovano, 2012), have been reported previously to improve preliminary assembly of single genomes. Unlike our strategy, these programs close assembly gaps by extending the neighboring contigs using mate-pair raw reads. Therefore, the resolution of a gap using these programs is subject to the coverage of mate-pair data over the gap. Most importantly, being designed to close gaps in the context of a single genome, these programs cannot resolve gaps that have non-unique solutions, especially those caused by subspecies polymorphism. Our strategy relies on sequence overlap between contigs to close gaps and then uses mate-pair data and read mapping for assembly verification. The closure of a gap typically requires the alignment of a small number of contigs, but would require thousands of raw reads. Therefore, dealing with contigs is computationally more efficient than dealing with raw reads. While our gap-resolution approach has yet to be fully automated (it relies on manual inspection, consideration of read depth, and analysis and adjustment of the alignment of the overlapping contigs), it can accurately resolve most gaps of a regular bacterial genome in silico. The re-assembly of the published Bacteroides genome was done in a few days. The automation of this gap-resolution process is a focus of further work, but will require iterative user input in the selection of alternative assemblies.
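The core idea of closing a gap through contig overlap can be illustrated with a minimal sketch. It is not the procedure used in this work, which also relied on alignment adjustment, mate-pair data and read mapping; it only shows how an exact suffix-prefix overlap between two contigs can bridge a gap. The function names and the `min_len` threshold are hypothetical.

```python
from typing import Optional


def suffix_prefix_overlap(left: str, right: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    Exact matching is an illustrative simplification; real contigs may
    need tolerant alignment and verification by read mapping.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, max(min_len, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def bridge_contigs(left: str, right: str, min_len: int = 20) -> Optional[str]:
    """Merge two contigs across their overlap, or return None if they do not overlap."""
    k = suffix_prefix_overlap(left, right, min_len)
    if k == 0:
        return None
    # Keep the shared bases once: all of `left`, then the non-overlapping tail of `right`.
    return left + right[k:]
```

For example, `bridge_contigs("ACGTACGTTTGACCA", "TTGACCAGGCT", min_len=5)` joins the two toy contigs across their shared seven bases. In a gap caused by subspecies polymorphism, more than one contig may overlap the same flank, which is why the alternatives have to be kept and inspected rather than merged blindly.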

In this study, we also explored the possibility of assembling complete bacterial genomes using second-generation sequencing data only. With genome polishing and most gap-resolution work completed in silico, only four recalcitrant gaps could not be resolved. We observed that short reads were not a problem except for those gaps caused by multiple repetitions of a motif (Figure 4.13). Many sequencing facilities offer mate-pair (or paired-end) sequencing of short-insert (< 1 kb) libraries because they are easy to prepare and thus inexpensive. However, the value of mate-pair sequencing of a long-insert library cannot be overemphasized. Because paired-end read data are used by most scaffolding algorithms (Koren et al., 2011), the maximum distance that can be bridged is less than the distance between paired reads. This agrees with our results: all gaps within scaffolds were less than 8 kb (Figure 4.10). Therefore, to a certain extent, the performance of scaffolding increases with the size of the insert in the mate-pair library. In addition to the benefits in long-distance scaffolding, long-insert mate-pair data are indispensable for accurate genome finishing and polishing.

4.6 CONCLUSIONS

Dehalobacter are strictly anaerobic, organohalide-respiring bacteria that reductively dechlorinate and detoxify common groundwater contaminants; they are important in the global chlorine cycle and for remediation of contaminated sites. Using a new gap-resolution strategy, the first two genomes of Dehalobacter were assembled from the metagenomes of two enrichment cultures. The strategy resolves gaps using pre-assembled contigs followed by verification with read mapping. Designed to make full use of existing sequencing data, this new strategy successfully resolved in silico nearly all gaps caused by repetitive sequences and/or subspecies polymorphism; only four additional PCR reactions and amplicon sequencing were required to clarify ambiguities. This strategy can be used to enable the accurate assembly of single genomes and metagenomes, and to substantially reduce the additional effort required in the wet lab for genome finishing.

4.7 ACKNOWLEDGEMENTS

We thank Ariel Grostern and Melanie Duhamel for establishing the three mixed cultures used in this study. We thank Da Zhang for the help in assembly verification using PCR and additional Sanger sequencing. We thank Michael Brudno (Computer Science, University of Toronto), Tanja Woyke, Patrick Chain, and Alex Copeland (Joint Genome Institute, CA) for valuable discussions.


Chapter 5 Sister Dehalobacter Genomes Revealed Specialization in Organohalide Respiration and Strain Differentiation Driven by Chlorinated Substrates

5.1 ABSTRACT

Dehalobacter spp. are anaerobes involved in organohalide respiration of a variety of chlorinated organic compounds. In this study, two highly similar and recently differentiated Dehalobacter genomes (strain CF and strain DCA) are described. The recent differentiation of these two genomes appears to be driven by the differentiation of two orthologous reductive dehalogenase genes (rdhA), cfrA and dcrA, which have similar but distinct substrates. These two genomes represent an early stage of strain differentiation, in which genes related to organohalide respiration evolve faster than other metabolic genes, and horizontal gene transfer and insertion sequence transposition exert major influences on shaping the genome landscape. These two Dehalobacter genomes were compared to another two Dehalobacter genomes (strain PER-K23 and strain E1) released recently. In each genome, a large proportion of rdhA genes (10/17 in strains CF and DCA, 21/25 in strain PER-K23 and 4/10 in strain E1) are located in two regions conserved in all four genomes. These two rdhA clusters have undergone rapid evolution, but no sequence evidence supporting site-specific horizontal gene transfer was identified. The fact that no genes encoding oxidoreductases catalyzing electron-accepting reactions other than reductive dechlorination were identified indicates that these four Dehalobacter genomes are specialized for organohalide respiration. The presence of a complete Wood-Ljungdahl pathway and of genes involved in mixed acid fermentation suggests that they evolved from a more versatile ancestor. The evolution of specialization is possibly still ongoing, as two of the strains (strain PER-K23 and strain E1) seem to have lost essential genes in the Wood-Ljungdahl and corrinoid biosynthesis pathways. Both of these pathways are absent in Dehalococcoides, a well-known obligate dechlorinator with a small and very streamlined genome.


5.2 INTRODUCTION

Chlorinated organics, including chlorinated ethenes, ethanes and aromatics, are common groundwater contaminants (De Wildeman and Verstraete, 2003; Löffler and Edwards, 2006). A viable and already commercialized technique for the detoxification of these chlorinated organics is bioremediation utilizing bacteria capable of organohalide respiration (Major et al., 2002; Lendvay et al., 2003; Ward and Stroo, 2010). Organohalide-respiring bacteria are phylogenetically diverse, including Dehalococcoides, Geobacter, Sulfurospirillum, Desulfitobacterium, Dehalobacter, etc. (Löffler and Edwards, 2006). Dehalobacter spp. (Phylum Firmicutes) are among the organisms that are most commonly found in groundwater contaminated with chlorinated organics. Dehalobacter spp. dechlorinate many chlorinated organics, including chlorinated ethenes (Wild et al., 1996; Holliger et al., 1998), chlorinated ethanes (Sun et al., 2002; Grostern and Edwards, 2006b, a, 2009; Grostern et al., 2010), chlorobenzenes (Nelson et al., 2011), and others (Schlotelburg et al., 2002; van Doesburg et al., 2005; Yoshida et al., 2009b, a).

To better understand the Dehalobacter genus, we sequenced a Dehalobacter-dominated mixed culture that can dechlorinate chloroform (CF), 1,1,1-trichloroethane (1,1,1-TCA) and 1,1-dichloroethane (1,1-DCA). Previously, we reported the de novo assembly of the first two complete Dehalobacter genomes, strain CF and strain DCA, from the metagenomic data (Tang et al., 2012). In this paper, we report the analyses of these two genomes and compare them to two other Dehalobacter genomes released recently: Dehalobacter restrictus strain PER-K23 (a complete genome) (Rupakula et al., 2013) and Dehalobacter sp. strain E1 (a draft genome) (Maphosa et al., 2012). All four of these Dehalobacter strains are capable of organohalide respiration: strain CF dechlorinates CF to dichloromethane (DCM) and 1,1,1-TCA to 1,1-dichloroethane (1,1-DCA) (Tang et al., 2012), strain DCA dechlorinates 1,1-DCA to monochloroethane (CA) (Tang et al., 2012), strain PER-K23 dechlorinates tetrachloroethene (PCE) to cis-dichloroethene (Holliger et al., 1998) and strain E1 can dechlorinate β-hexachlorocyclohexane (β-HCH) to benzene (van Doesburg et al., 2005). The identity between 16S rRNA genes from these four strains is greater than 99%.

The analyses and comparison of these four genomes shed light on the evolution of Dehalobacter. The high similarity (~90% nucleotide identity) between the two genomes of strain DCA and strain CF indicates that they differentiated recently and that this process was likely driven by the differentiation of two functional, orthologous reductive dehalogenase genes (rdhAs), cfrA and dcrA. Two clusters of rdhA genes were found in two hypervariable regions conserved in all four genomes. Dehalobacter genomes likely have a higher degree of genome plasticity due to the presence of a large number of intragenomic repeats, including ~ 70 insertion sequences and 3-4 copies of rRNA operons in each genome. The comparison between the genomes of strain CF and strain PER-K23 revealed two genome rearrangement events catalyzed by inverted repeats. The presence of genes involved in the Wood-Ljungdahl pathway and in mixed acid fermentation suggests that Dehalobacter evolved from a more versatile ancestor. Although strains CF and DCA have all the genes required for complete Wood-Ljungdahl and corrinoid biosynthesis pathways, strain PER-K23 has an incomplete corrinoid biosynthesis pathway and both pathways are incomplete in strain E1. Interestingly, every gene missing from these two pathways was lost through sequence deletion. These findings suggest that strain PER-K23 and strain E1 might be shaping their genomes towards more function loss.

This first detailed analysis of these four Dehalobacter genomes enables a full picture of this genus. The presence of a large reservoir of rdhA genes and the absence of other oxidoreductases catalyzing electron-accepting reactions in respiration show that these Dehalobacter strains are highly specialized in organohalide respiration. Based on gene annotation, all four strains have an incomplete TCA cycle and carry the genes required for the biosynthesis of all 20 amino acids and several vitamins, including menaquinone, riboflavin, nicotinamide adenine dinucleotide, folate, pantothenate and pyridoxal phosphate; only the regular biosynthesis pathways of thiamin and biotin appear to be missing. The corrinoid biosynthesis pathway in strains CF and DCA also appears complete. Overall, these four Dehalobacter strains appear similar to Dehalococcoides in terms of energy metabolism, but are more similar to Desulfitobacterium in terms of other metabolic functions.

5.3 METHODS

The assembly of the two Dehalobacter genomes of strains CF and DCA was reported in a previous publication (Tang et al., 2012). The gene annotation of strain CF was performed separately with two automatic genome annotation pipelines: RAST (Aziz et al., 2008) and IMG-ER (Markowitz et al., 2009). The results from the two annotation pipelines were compared and combined, with inconsistencies resolved by manual curation. Some annotations were manually refined based on the analyses of sequence homology and genome context. The genes of strain DCA were first annotated with RAST, and those sharing high identity (50% amino acid identity) with genes in strain CF were examined and, if needed, curated for consistency. The annotation of strain PER-K23 was retrieved from IMG (http://img.jgi.doe.gov/) with Taxon Object ID 2510065016 (Rupakula et al., 2013). The draft genome of strain E1, consisting of 102 contigs, was retrieved from GenBank with the accession number CANE00000000. Genes of the draft genome were identified with Glimmer 3 (Delcher et al., 2007), accessed through Geneious pro v. 6.1.4 (Drummond et al., 2011). The annotation of some genes of interest in strain E1 was performed by manual BLASTP against NCBI databases. Whole genome alignment between strains CF, DCA and PER-K23 was performed with the Mauve alignment (Darling et al., 2010) in Geneious pro. DNA sequence alignments of large genome regions (containing multiple genes) were extracted from the results of the Mauve alignment using the “Extract Mauve Regions” option in Geneious pro; default settings were used in most cases. Multiple sequence alignments of short DNA sequences, such as single genes, were performed with MUSCLE (Edgar, 2004) within Geneious pro. Genome circular maps were created with BRIG (Alikhan et al., 2011). DNA sequence repeats were identified with Repseek (Achaz et al., 2007) and inverted repeats were identified with Inverted Repeat Finder (Warburton et al., 2004). Orthologs between Dehalobacter genomes were identified with reciprocal BLASTP with an e-value cutoff of 1e-10. Orthologs between Dehalobacter sp. strain CF, Desulfitobacterium sp. strain Y51, and Dehalococcoides mccartyi strain 195 were identified with reciprocal BLASTP with an e-value cutoff of 1e-5.
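The reciprocal BLASTP step can be sketched as a reciprocal-best-hit filter. This is an illustrative assumption about how such a search is commonly post-processed, not a record of the exact procedure used here: the hit tuples, the function names and the choice of bit score for ranking hits are all hypothetical, and only the e-value cutoff comes from the text.

```python
from typing import Dict, Iterable, Set, Tuple

# One BLASTP hit as (query id, subject id, e-value, bit score).
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query to its highest-scoring subject among hits passing the e-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], max_evalue: float = 1e-10
) -> Set[Tuple[str, str]]:
    """Return (gene in A, gene in B) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b, max_evalue)
    best_ba = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Toy hits with made-up gene identifiers.
a_vs_b = [("a1", "b1", 1e-50, 200.0), ("a1", "b2", 1e-20, 90.0),
          ("a2", "b2", 1e-30, 120.0), ("a3", "b3", 1e-3, 30.0)]
b_vs_a = [("b1", "a1", 1e-50, 200.0), ("b2", "a1", 1e-20, 90.0),
          ("b2", "a2", 1e-30, 120.0), ("b3", "a3", 1e-3, 30.0)]
orthologs = reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10)
```

In the toy data the pair a3/b3 is discarded because its e-value exceeds the cutoff, while a1/b1 and a2/b2 are retained as putative orthologs.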

5.4 RESULTS AND DISCUSSION

5.4.1 General genome features

The three complete Dehalobacter genomes, strain CF, DCA and PER-K23, are similar in size (~ 3.0 Mb) and G+C content (44-45%) and comprise ~ 2900 coding sequences (CDS) (Table 5.1). The draft genome of strain E1 consists of 102 contigs and has similar properties (Table 5.1). Notably, because this draft genome was assembled from 454 pyrosequencing data with ~ 13× genome coverage (Maphosa et al., 2012), it can be expected that long interspersed repeats, such as multi-copy transposable elements and ribosomal RNA (rRNA) genes, are not well represented in the draft genome.

Table 5.1 General features of the four Dehalobacter genomes.

                                Strain CF            Strain DCA           Strain PER-K23       Strain E1 [a]
Genome size (bp)                3092048              3069953              2943336              ~2.6 Mb
G+C content (%)                 44.3                 44.6                 44.6                 43.8
Protein coding genes            2980                 2978                 2826                 2587 [b]
rRNA operon                     3                    3                    4                    N/A
tRNA                            51                   51                   52                   55 [b]
Genes with function prediction  2072                 2014                 2168                 N/A
Genes with COGs                 2167                 2174                 2127                 N/A
Genes with KEGG pathways        749                  751                  740                  N/A
Transmembrane protein genes     724                  739                  755                  N/A
Insertion sequences             68                   71                   69                   N/A
rdhA genes                      17                   17                   25                   10 [b]
Chlorinated substrates          1,1,1-TCA, CF        1,1-DCA              PCE                  β-HCH
Active rdhA genes (locus_tag)   cfrA (DCF50_p1247)   dcrA (DHBDCA_p1180)  pceA (Dehre_2398)    N/A
Genome accession number         CP003870 [c]         CP003869 [c]         2510065016 [d]       CANE00000000 [c]

[a] The genome of strain E1 currently exists as a draft genome consisting of 102 contigs.
[b] Data were adapted from a previous report (Maphosa et al., 2012).
[c] GenBank accession number.
[d] IMG taxon object ID.
“N/A” indicates not available.

Strains CF and DCA have three copies of the rRNA operon (5S, 16S and 23S), while strain PER-K23 has four (Table 5.1). On a 16S rRNA sequence basis, strains CF and DCA cluster phylogenetically, as do strains PER-K23 and E1 (Figure D1), consistent with a phylogenetic tree based on an alignment of concatenated orthologous genomic regions (Figure 5.1), and consistent with these strains’ two geographical origins. Strains CF and DCA are both from an enrichment culture derived from a contaminated site in the northeastern United States (Grostern and Edwards, 2006b), while strains PER-K23 and E1 were both enriched from samples from the Netherlands (Holliger et al., 1993; Maphosa et al., 2012). Specifically, the three rRNA gene operons in the genome of strain CF (or DCA) differ in both the transcribed spacer regions and in the 16S rRNA genes (Figure 5.2 and Table 5.1). Two 16S rRNA genes, DCF50_r5 and r22, share an insertion (128 bp) located 86 bp from the 5’ end (Figure D1) compared to the third gene, DCF50_r32. Strain PER-K23 has four rRNA gene operons, and the four 16S rRNA genes differ by 3 nucleotides at most. Two of them, Dehre_0224 and Dehre_0342, are identical to a known 16S sequence from strain E1 (AY766465) and share an identity of 99.4% to DCF50_r32, the 16S rRNA gene without the insertion in strain CF (Figure D1). Sequence insertions were also found in some 23S rRNA genes in the three complete Dehalobacter genomes (data not shown). Insertions in 16S rRNA genes have been reported in some Desulfitobacterium strains (Villemur et al., 2007); however, the biological significance and causes of such insertions are unknown.

Figure 5.1 A maximum-likelihood phylogenetic tree of the four Dehalobacter strains. The tree was built based on the alignment of concatenated sequences (~ 0.6 Mb) consisting of five genomic regions from each genome. The alignment was generated using Mauve, and the tree generated using the PhyML plugin in Geneious under the substitution model of Jukes-Cantor 69 (JC69). Bootstrap support values (from 100 bootstrap iterations) are indicated where greater than 50%. The scale bar represents the average number of substitutions per site. This tree aims to illustrate the phylogeny of the four strains. The tree was built this way because sequence alignment of 16S rRNA genes of these four strains has poor resolution.


Figure 5.2 Genome circular map of Dehalobacter sp. strain CF.

The DNA replication origin of the genomes of strains CF and DCA (Figure 5.2) was predicted based on the transition of GC-skew and the presence of the dnaA gene. A distinctive feature of these two genomes is the presence of two GC-skew arms of unequal size (Figure 5.2). In contrast, the genome of strain PER-K23 has two GC-skew arms of almost equal size. Similar observations in two Desulfitobacterium genomes were thought to relate to genome rearrangement events (Nonaka et al., 2006; Kim et al., 2012). We have found that the difference in GC-skew profile and overall synteny between strains CF and PER-K23 can be resolved with three genome rearrangement events (Figure D2): (i) a sequence inversion between two inverted rRNA gene operons; (ii) a sequence inversion between two inverted copies of an insertion sequence (IS), Dehre_0572 and Dehre_2499; and (iii) a translocation event, likely related to a DNA recombinase, Dehre_0459. The first two sequence inversion events were likely caused by homologous recombination of inverted repeats (Treangen et al., 2009). Sequence inversion events catalyzed by inverted rRNA operons have been reported in Escherichia coli (Hill and Gray, 1988). Sequence evidence supporting the last two steps (the IS and the DNA invertase) was found only in the genome of strain PER-K23. However, the sequence inversion between inverted repeats of rRNA operons (the first step) could have happened in either genome, as these rRNA operons are shared by both. This event is more likely to have happened in the genome of strain CF (or DCA) because reversing this event in the genome of strain CF resulted in two GC-skew arms of similar size (Figure D2).
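GC-skew, the quantity used above to locate the replication origin and to compare the two replication arms, can be computed as (G - C)/(G + C) over consecutive windows. The following is a minimal sketch of that general technique and not the tool used for Figure 5.2; the window size and function names are hypothetical, and taking the minimum of the cumulative skew as the approximate origin is a common heuristic that is assumed here rather than taken from this work.

```python
from typing import List


def gc_skew(seq: str, window: int = 1000) -> List[float]:
    """GC-skew, (G - C) / (G + C), for consecutive non-overlapping windows."""
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew_minimum(seq: str, window: int = 1000) -> int:
    """Position (end of window) where the cumulative GC-skew reaches its minimum.

    Under the assumed heuristic this approximates the replication origin;
    a candidate would still need support such as a nearby dnaA gene.
    """
    total = 0.0
    best_total = float("inf")
    best_index = 0
    for index, skew in enumerate(gc_skew(seq, window)):
        total += skew
        if total < best_total:
            best_total, best_index = total, index
    return min((best_index + 1) * window, len(seq))


# Toy sequence: a C-rich stretch followed by a G-rich stretch.
toy = "C" * 40 + "G" * 60
origin_estimate = cumulative_skew_minimum(toy, window=10)
```

On the toy sequence the skew switches sign at position 40, which is where the cumulative skew bottoms out. Unequal distances from this point to the opposite sign change on either side correspond to the GC-skew arms of unequal size described above.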

If GC-skew inconsistency is related to genome rearrangements, the Dehalobacter and Desulfitobacterium genomes do have a higher tendency towards genome plasticity due to the presence of a large number of transposable elements (TEs), especially ISs (Figure 5.2 and Table 5.1). We have identified 68 ISs, consisting of 27 unique ISs, in the genome of strain CF (Table D1): 14 unique ISs exist in more than one copy in the genome; an extreme case is an IS that has nine copies (e.g. DCF50_p264). Strain CF and strain DCA share all unique ISs, but the copy numbers of some vary, revealing many strain-specific transposition events that happened after the differentiation of the two strains (Figure 5.3). Strain PER-K23 has 69 ISs but shares only five unique ISs with strain CF (Table D1). Compared with Dehalobacter genomes, Dehalococcoides genomes do not have as many ISs. This difference cannot be fully explained by genome size, since a recently released Dehalogenimonas genome (~1.7 Mb) has more than 74 ISs (Siddaramappa et al., 2012). Previously regarded as junk DNA, TEs are now known to have positive, neutral and negative effects on the host (Rebollo et al., 2012). Some researchers believe that TEs are generally beneficial to the whole population despite deleterious effects on individuals (Rebollo et al., 2012). The comparison of the complete genomes of strain CF, strain DCA and strain PER-K23 reveals many strain-specific transposition events, some of which clearly show gene disruption (data not shown). However, most TEs detected in these four Dehalobacter genomes reside in non-coding regions and their effects cannot be evaluated without knowledge of the regulation of the host genes. Since TEs typically exist as interspersed repeats (direct or inverted) in a genome, they are hot spots for genome rearrangement through homologous recombination. We have found such an example: two inverted copies of an IS in strain PER-K23 (Dehre_0572 and Dehre_2499) are likely responsible for a sequence inversion event (Figure D2).


Figure 5.3 Whole genome alignment of strains CF and DCA. Sequence discrepancies are highlighted in black (compared to light grey) in the aligned sequences; this is a feature common to all sequence alignments presented in this paper. Blue blocks highlight three major regions of sequence variations: A1/A2, B1/B2, and C1/C2, with size and G+C content attached in braces. Red triangles represent strain-specific ISs. Two arrows show the loci of cfrA and dcrA.

5.4.2 Recent differentiation between strains CF and DCA

Dehalobacter strains CF and DCA are two dominant organisms coexisting in an enrichment culture called ACT-3, which grows on 1,1,1-TCA (as an electron acceptor) and organic electron donors (Tang et al., 2012; Tang and Edwards, 2013). In this culture, strain CF dechlorinates 1,1,1-TCA to 1,1-DCA using an RDase called CfrA; and then strain DCA dechlorinates 1,1-DCA further to CA using another RDase called DcrA. CfrA also enables strain CF to dechlorinate CF to DCM (Tang and Edwards, 2013). These two RDases are highly similar to each other, sharing an amino acid identity of 95.2% and a nucleotide identity of 97.8%, and have similar functions but no known common substrates (Tang and Edwards, 2013). These two RDases were identified in a previous study and were the only two RDases found expressed in ACT-3 amended with 1,1,1-TCA (Tang and Edwards, 2013). Surprisingly, the genomes of strain CF and DCA are also highly similar to each other: 90% nucleotide identity based on Mauve (Darling et al., 2010) alignment (Figure 5.3). There have been no genome rearrangements between the two genomes and most orthologous genes are simply identical, indicating that the two genomes were recently differentiated.
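Identity values such as those quoted for CfrA and DcrA are computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over the aligned columns that contain no gap; whether the reported values used this convention or another (for example, dividing by the full alignment length) is not stated in the text, so the convention and the function name are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns containing a gap character ('-') in either sequence are
    skipped; this gap handling is an assumed convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment: five gap-free columns, four of them identical.
toy_identity = percent_identity("ACGT-A", "ACCTTA")
```

The same function applies to amino acid and nucleotide alignments, which is why a gene pair can show a lower amino acid identity than nucleotide identity when many substitutions are non-synonymous.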


Figure 5.4 Sequence alignment of the gene neighborhoods of cfrA and dcrA. All CDSs are indicated as directional blocks in different colors: rdhA genes (yellow), rdhB genes (green), pceC-like genes (purple), crp/fnr transcriptional regulators (red), ISs (light blue) and others (gray).

Strain CF has 17 intact rdhA genes and only two differ from their orthologs in strain DCA. DCF50_p1199 differs from its ortholog in strain DCA, DHBDCA_p1132, in six amino acids. It is not closely related to any other rdhA gene of known function. Neither DCF50_p1199 nor DHBDCA_p1132 was found expressed in ACT-3 (Tang and Edwards, 2013). cfrA (DCF50_p1247) is the gene encoding the RDase CfrA. Its ortholog in strain DCA is the dcrA gene (DHBDCA_p1180) encoding DcrA. CfrA and DcrA differ in 22 amino acids. cfrA resides in a gene cluster (Figure 5.4) that has cfrB (an rdhB gene), cfrC (pceC-like), and cfrK (a crp/fnr transcriptional regulator). These genes are commonly found associated with rdhAs and are likely related to organohalide respiration (Häggblom and Bossert, 2003). Between cfrC and cfrK, there is a gene (DCF50_p1243) annotated as thiamin biosynthesis lipoprotein ApbE, which might be involved in the maturation of iron-sulfur clusters (Skovran and Downs, 2003). Downstream of cfrK, there is one gene annotated as cobalamin methyltransferase. These two genes might be functionally related to the rdhA genes, because RDases typically have two iron-sulfur clusters and one corrinoid as cofactors. In strain DCA, dcrA resides in a gene neighborhood orthologous to that of cfrA (Figure 5.4). Although most orthologous genes between the two genomes are identical, relatively intensive sequence variations were found between the cfrA and dcrA gene neighborhoods (Figure 5.4). Most of the variations are single nucleotide polymorphisms (SNPs), except for an IS insertion (DHBDCA_p1176) in the dcrA gene cluster that disrupts the crp/fnr transcriptional regulator (Figure 5.4). Notably, sequence variations between the cfrA and dcrA gene clusters are located only in the rdhA genes and downstream genes, not in the upstream region (Figure 5.4); this indicates that these variations are not randomly distributed but were likely selected for because the mutated genes affect the fitness of the host.


Based on these results, we propose a mechanism for the differentiation of these two strains. The original contaminant at the site from which the ACT-3 culture was derived was 1,1,1-TCA (Grostern and Edwards, 2006b). Therefore, it seems likely that the last common ancestor of strains CF and DCA was more similar to strain CF and expressed CfrA, which enabled its host to transform 1,1,1-TCA to 1,1-DCA. The accumulation of 1,1-DCA, which is equally useful in respiration, drove the evolution of a new strain, strain DCA. A few mutations in the cfrA gene resulted in a new rdhA gene, dcrA, which allowed the new strain to grow on 1,1-DCA. Co-evolution of other functionally related genes downstream of the cfrA/dcrA gene allowed strain DCA to function better on 1,1-DCA. This strain differentiation event enabled the whole Dehalobacter population to harvest more energy from 1,1,1-TCA.

5.4.3 rdhA gene clusters

Similar to Dehalococcoides genomes, Dehalobacter genomes are big reservoirs of rdhA genes: the numbers of rdhA genes found in the genomes of Dehalobacter strains E1, CF, DCA and PER-K23 are 10, 17, 17 and 25, respectively. In Dehalococcoides genomes, most rdhA genes cluster together in two hypervariable regions, the formation of which seems related to the presence of hot recombination sites including some tRNA genes and the tmRNA gene, ssrA (McMurdie et al., 2009). Similarly, in each Dehalobacter genome, a majority of rdhA genes (21 in strain PER-K23, 10 in strains CF and DCA, and 4 in the strain E1 contigs) cluster in two small regions that are conserved in all four Dehalobacter genomes (Figure 5.2, Figure D3, Figure 5.5). These two regions were designated as rdhA cluster 1 and rdhA cluster 2 as shown in Figure 5.2. However, unlike the Dehalococcoides rdhA gene clusters, no sequence evidence supporting the presence of genomic islands was found. In the two Dehalobacter rdhA clusters, there were no direct repeats indicating recent insertion events, and no recombinases except transposases. There was no evidence showing that these transposases form composite transposons with rdhA genes, as seen in a transposon reported in Desulfitobacterium (Maillard et al., 2005). No tRNA genes or other hot recombination sites are located near these two rdhA clusters. The formation of these two rdhA clusters in Dehalobacter genomes seems unrelated to site-specific sequence recombination events or genomic islands as seen in Dehalococcoides genomes.


Figure 5.5 Sequence alignment between strain PER-K23 and strain CF or strain E1 focused on the two rdhA clusters. (a) The rdhA cluster 1, (b) The rdhA cluster 2. All CDSs are indicated as directional blocks in different colors: rdhA genes (yellow), rdhB genes (green), pceC-like genes (purple), crp/fnr transcriptional regulators (red), ISs (light blue) and others (grey). Pairwise undirectional blocks connected with lines represent intra-genomic homologous regions; size and nucleotide identity are noted in brackets; related regions are indicated in the same color. a E1 contig of CANE01000013. b E1 contig of CANE01000004.

Sequence duplication might play a role in the development of these two rdhA clusters. The 25 rdhA genes in strain PER-K23 have surprisingly low complexity. Phylogenetically, they can be classified into three groups, one of which has 16 tightly related members (Rupakula et al., 2013). Similar results were seen in the genomes of strain DCA and strain CF. Moreover, the most similar genes tend to be located close to each other. For example, there are 17 rdhA genes located in rdhA cluster 2 in strains CF and PER-K23, but all of them except Dehre_0793 belong to two distinct rdhA groups (Figure 5.5 and Figure D4). The presence of highly conserved intra-genomic homologous regions within rdhA cluster 2 (Figure 5.5 and Figure D4) explains the low complexity among these rdhA genes within each genome. A similar scenario was observed in rdhA cluster 1 (Figure 5.5 and Figure D4). Notably, some intra-genomic homologous regions involve non-rdhA genes and non-coding regions (Figure 5.5). The degree of identity between these intra-genomic homologous sequences varies, with some of more than 90% nucleotide identity and some more than 4 kb long (Figure 5.5). Sequence duplication seems a good explanation for this phenomenon.

In these two rdhA clusters, strains CF and DCA do not differ, but strains CF, PER-K23 and E1 differ dramatically, showing that these two regions evolved rapidly. Almost all sequence variations in these two rdhA clusters can be explained by sequence insertion or deletion, except for a sequence inversion in the rdhA cluster 2 in strain E1 (Figure 5.5). If these variations are interpreted as sequence deletions, the last common ancestor of these three strains would have had many more rdhA genes than strain PER-K23. However, some sequence evidence definitely supports sequence deletion. For example, one of the 10 rdhA genes in strain E1 is a “fake” rdhA gene (in rdhA cluster 1, Figure 5.5a): half of the gene comes from Dehre_2017 (uroporphyrinogen-III decarboxylase CDS) and the other half from Dehre_2044, an intact rdhA gene. This “fake” rdhA gene must be the result of the sequence deletion of the regions between the two genes. If some of the sequence variations in these two rdhA clusters are interpreted as sequence insertions, this indicates horizontal gene transfer.

5.4.4 Horizontal gene transfer

Evidence for horizontal gene transfer of a Dehalobacter rdhA gene has been reported previously (Maillard et al., 2005). The pceABCT operon in Desulfitobacterium is located in the Tn-Dha1 transposon flanked by two direct copies of an IS. The high similarity of this operon with the pceABCT operon identified in Dehalobacter restrictus strain PER-K23 suggests horizontal gene transfer (Maillard et al., 2005). This pceABCT operon was conserved in the same genome context in strains CF, DCA and E1. However, considerable sequence variation in this operon exists between these Dehalobacter strains, indicating that this operon has been carried by Dehalobacter for some time (Figure D5). Interestingly, in the genomes of strain CF and DCA, this operon is located in a region flanked by two direct repeats (~400 bp each) and having other repeat patterns (Figure D5), indicating that the operon could have been acquired horizontally. These repeat patterns were not found in strain PER-K23 and E1 (Figure D5).

As shown in Figure D2, the reversal of three genome rearrangement events can make the two genomes of strains CF and PER-K23 completely syntenous. Accordingly, most orthologous rdhA genes between the two genomes share both local and global synteny. All unique rdhA genes in strain PER-K23 reside in rdhA cluster 1 and cluster 2 (Figure 5.5 and Figure D4). Only three rdhA genes unique to strain CF reside outside the two rdhA clusters: DCF50_p1979, p1981 and p1247 (cfrA). cfrA (or dcrA from strain DCA) resides in a hypervariable region that is poorly conserved between strain CF (or DCA), PER-K23 and E1 (Figure 5.2). In strain PER-K23, cfrA has a close relative (Dehre_0793) with 59% amino acid identity, but this gene is located in a different genome context, the rdhA cluster 2. Although cfrA and dcrA appear unique to strains CF and DCA, there is no sequence evidence showing that they were acquired by horizontal transfer.

Yet horizontal gene transfer does play an important role in shaping Dehalobacter genomes. As discussed earlier, the differentiation of strain CF and DCA appears driven by the differentiation of the cfrA and dcrA genes. However, major sequence variations between these two genomes are located in other regions (region A, B and C in Figure 5.2 and corresponding regions from each genome CF/DCA labelled A1/A2, B1/B2 and C1/C2 in Figure 5.3). Region A features a recent insertion of a 64 kb long fragment (A1) in the genome of strain CF. This region has a relatively low G+C content (40%) and most genes involved are unique to strain CF. The insertion is likely related to a phage integrase (DCF50_p74) located at the 3’ end of A1. In Region C, strain CF has many phage-related genes in sequence C1, which has a low G+C content (35%). Incorporation of C1 into strain CF is likely related to a site-specific recombinase (DCF50_p1382) located at the 5’ end of C1. On the other hand, sequence C2 from strain DCA is ~ 79 kb long and has a G+C content of 45.2%, similar to the average G+C content of the whole genome, indicating that C2 is native to Dehalobacter. The deletion of C2 and the insertion of C1 might be related. In region B, both B1 and B2 have a low G+C content, and poor sequence conservation was found between all four Dehalobacter genomes, indicating a hypervariable region. There is another big phage-related region shared by both strain CF and DCA, but not in strain PER-K23 and E1 (region D in Figure 5.2). Region D is ~ 43 kb long and many genes involved encode either phage-related or hypothetical proteins. The insertion of this region into the genomes of strain CF and DCA is likely related to a phage integrase located at the 3’ end of the region. The insertion event targeted tRNA-Thr (DCF50_r14) and resulted in a duplication (45 bp) at the insertion site, which contains a partial tRNA-Thr gene.
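The G+C comparisons above rest on a simple calculation: the fraction of G and C bases in a region is compared with the genome-wide average, and regions that fall well below the average are candidates for foreign origin. The following Python sketch illustrates that calculation only. The function names, window size, margin and toy sequence are illustrative assumptions and do not reproduce the tools or data used in this study.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction of a DNA sequence; non-ACGT characters are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def low_gc_windows(seq: str, window: int, step: int, genome_gc: float, margin: float):
    """Yield (start, gc) for each window whose G+C fraction is below genome_gc - margin."""
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if gc < genome_gc - margin:
            yield start, gc


if __name__ == "__main__":
    # Toy sequence (hypothetical): a GC-rich backbone with an AT-rich insert in the middle.
    toy = "GCGCGCGCGC" * 3 + "ATATATATGC" * 3 + "GCGCATGCGC" * 3
    average = gc_content(toy)
    for start, gc in low_gc_windows(toy, window=10, step=10, genome_gc=average, margin=0.1):
        print(f"window at {start}: G+C = {gc:.2f} (sequence average {average:.2f})")
```

A low G+C window on its own is only a hint; in the text above it is combined with gene content (phage-related genes, integrases, recombinases) and with conservation across the four genomes.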

5.4.5 Metabolism

The analyses of these four Dehalobacter genomes provide insights into the organisms’ metabolism. Although phylogenetically more closely related to the more versatile Desulfitobacterium, all three Dehalobacter isolates previously reported are obligate organohalide-respiring bacteria (Wild et al., 1996; Holliger et al., 1998; Sun et al., 2002), more similar in metabolism to Dehalococcoides and Dehalogenimonas. Yet, recent reports of Dehalobacter strains growing on dichloromethane fermentation reveal that not all strains are obligate organohalide dechlorinators (Justicia-Leon et al., 2012; Lee et al., 2012). A genome size of ~ 3.0 Mb puts Dehalobacter between Dehalococcoides (~1.4 Mb) and Desulfitobacterium (~5.8 Mb), reflective perhaps of these differences. A detailed examination of the metabolic profile of Dehalobacter from annotations of the metabolic genes was carried out to reveal overall potential and specific differences between strains. Overall, these four Dehalobacter strains are highly similar based on gene annotation. Strains CF and DCA differed only in substrate preference conferred by the enzymes of CfrA and DcrA. Genes for major metabolic pathways from strain CF, grouped by pathway, are provided in Table D2 together with corresponding orthologs identified from Dehalobacter strains PER-K23 and E1, Desulfitobacterium hafniense strain Y51 and Dehalococcoides mccartyi strain 195. Relevant gene expression information was given based on a recent proteomic study of strain PER-K23 (Rupakula et al., 2013). The following discussion of metabolic pathways applies to all four Dehalobacter strains unless specifically noted. Only locus tags for strain CF are referenced in the text; corresponding orthologs in other genomes are provided in Table D2.

Energy metabolism. Strain PER-K23 is the first well-characterized Dehalobacter isolate and has been shown to be an obligate dechlorinator. Except for rdhA genes, no genes related to electron-accepting reactions in anaerobic respiration were identified in any of the four genomes. Considering the similarity between these four Dehalobacter genomes, it would not be surprising if all four strains are obligate dechlorinators. The lack of other electron-accepting reactions and the presence of a large number of rdhA genes indicate that these four Dehalobacter strains are specialized in organohalide respiration. In dramatic contrast, although Desulfitobacterium is the closest genus to Dehalobacter, Desulfitobacterium genomes (Kim et al., 2012) encode a large number of other oxidoreductases, including trimethylamine oxide reductase, dimethyl sulfoxide reductase, nitrate reductase, arsenate reductase and thiosulfate reductase. A variety of hydrogenases and some formate dehydrogenases were identified in the four Dehalobacter genomes. For a more detailed review of the protein complexes involved in energy metabolism in Dehalobacter, we refer the reader to Table D2 and an earlier report (Rupakula et al., 2013). Notably, cytochrome b is the only type of cytochrome annotated in Dehalobacter genomes; three Hup-type hydrogenases have a cytochrome b subunit (DCF50_p1690, p1851 and p2831). This is consistent with a previous study showing that cytochrome b is the only cytochrome found in the cell biomass of strain PER-K23 (Holliger et al., 1998).

Figure 5.6 Central carbon metabolism. Enzymes in the numbered reactions are listed below with gene locus_tags from strain CF: 1. pyruvate kinase, p653; 2. pyruvate, phosphate dikinase, p2541; 3. formate dehydrogenase, p924-927; 4. formate hydrogen lyase, p760-766; 5. acetate:CoA ligase (AMP-forming), p435; 6. pyruvate formate-lyase, p2283 + p2284; 7. pyruvate-flavodoxin oxidoreductase, p269 or p2740; 8. pyruvate carboxyl transferase, p2314; 9. NADP-dependent malic enzyme, p397; 10. citrate synthase (si), p2711 or p2808; 11. aconitate hydratase, p995; 12. isocitrate dehydrogenase [NADP], p1096; 13. 2-oxoglutarate:ferredoxin oxidoreductase, p978-981; 14. succinate:CoA ligase, p393 + p395; 15. fumarate hydratase, p2651 + p2652. An X on a pathway means the gene for the enzyme was not found.

Central carbon metabolism. The characterization of the three Dehalobacter isolates (Wild et al., 1996; Holliger et al., 1998; Sun et al., 2002), including strain PER-K23, showed that acetate is essential for growth but, unlike in some other microbes (Sung et al., 2006b), cannot support organohalide respiration in the absence of hydrogen (H2). The incorporation of acetate likely starts with its conversion to acetyl-CoA by acetate-CoA ligase (DCF50_p435); carbon dioxide (CO2) can then be fixed onto acetyl-CoA by pyruvate-flavodoxin oxidoreductase (DCF50_p269 or p2740) to produce pyruvate (Figure 5.6). Heterotrophic CO2 fixation is supported by some 14CO2 fixation experiments with strain PER-K23 (Holliger et al., 1993). However, some evidence indicates that autotrophic CO2 fixation via the Wood-Ljungdahl pathway might also occur. Genes of a complete Wood-Ljungdahl pathway were found in the genomes of strains CF, DCA and PER-K23, and all of these genes except 5,10-methylenetetrahydrofolate reductase (DCF50_p297 or Dehre_0155) were found expressed in strain PER-K23 (Rupakula et al., 2013). Genes required for a functional Wood-Ljungdahl pathway were also found in Desulfitobacterium genomes, and it is known that Desulfitobacterium DCB-2 can grow on H2 and CO2 as a homoacetogen (Kim et al., 2012). However, no known Dehalobacter isolate has been able to grow with only H2 and CO2 (Wild et al., 1996; Holliger et al., 1998; Sun et al., 2002), indicating that the Wood-Ljungdahl pathway might play a non-essential role in Dehalobacter spp. The role of this pathway is even more questionable considering that it is incomplete in strain E1 due to sequence deletion. Many genes of the Wood-Ljungdahl pathway in strains CF, DCA and PER-K23 are located in one gene cluster (DCF50_p283 to p290), but a large sequence (~ 7.9 kb) was deleted from this cluster in strain E1, leaving only one gene (the ortholog of DCF50_p283) intact (Figure 5.7a). These deleted genes were not found elsewhere in the draft genome of strain E1.

The three Dehalobacter isolates, including strain PER-K23, have been identified as obligate dechlorinators, but genes have also been found that are involved in catabolic carbon metabolism including alcohol dehydrogenase (DCF50_p2281), aldehyde dehydrogenase (DCF50_p2406), 6-phosphofructokinase (DCF50_p872), pyruvate kinase (DCF50_p653), pyruvate formate lyase (DCF50_p2283 and p2284) and formate hydrogen lyase (DCF50_p760 to p766). All four Dehalobacter genomes have a complete glycolysis pathway. This suggests that the four Dehalobacter strains have the potential to ferment alcohols or use electron donors other than H2 or formate. This is consistent with two recent reports showing that two Dehalobacter strains grow on dichloromethane fermentation (Justicia-Leon et al., 2012; Lee et al., 2012). All of these genes have orthologs identified from Desulfitobacterium Y51. In contrast, most Desulfitobacterium strains are capable of pyruvate fermentation (Villemur et al., 2006) and using a variety of electron donors (Kim et al., 2012).

Figure 5.7 Comparing strain CF, PER-K23 and E1 on three gene clusters related to (a) the Wood-Ljungdahl pathway, (b) molybdopterin biosynthesis, and (c) cobalamin biosynthesis.

Many anaerobes have an incomplete tricarboxylic acid cycle (TCA cycle), which mainly serves for anabolism. The TCA cycle of Dehalobacter appears incomplete due to the lack of genes for malate dehydrogenase and succinate dehydrogenase (Figure 5.6), and the lack of glyoxylate bypass. The loss of malate dehydrogenase can be compensated by the presence of a NADP-dependent malic enzyme (DCF50_p397). Despite the incompleteness of the TCA cycle, all intermediates within the cycle can be synthesized from pyruvate and acetyl-CoA with an oxidative half-cycle to succinate and a reductive half-cycle to fumarate (Figure 5.6). All of these enzymes were found expressed in strain PER-K23 (Table D2). Desulfitobacterium also appear to have an incomplete TCA cycle, but are missing a different step, 2-oxoglutarate dehydrogenase (Nonaka et al., 2006; Kim et al., 2012), which is present in Dehalobacter genomes and consists of four subunits (DCF50_p978 to p981). Dehalococcoides are generally assumed to have an incomplete TCA cycle (Ahsanul Islam et al., 2010; Marco-Urrea et al., 2011).

Anabolic pathways. Many functional anabolic pathways were identified: gluconeogenesis, pentose pathways, and biosynthesis pathways for fatty acids, purines, pyrimidines, amino sugars, peptidoglycan, and all 20 amino acids (Table D2). Most proteins involved in these functional pathways were found expressed in strain PER-K23 (Table D2). The presence of biosynthesis pathways for all 20 amino acids is surprising since the growth of strain PER-K23 requires amending arginine, histidine, and threonine (Holliger et al., 1998). However, Dehalobacter strain TCA1 is able to grow on a defined medium amended with only one amino acid, cysteine (Sun et al., 2002).

The biosynthesis pathways for some essential cofactors appear complete, including terpenoid, menaquinone, riboflavin, nicotinamide adenine dinucleotide, folate, pantothenate and pyridoxal phosphate, while the regular biosynthesis pathways of two essential cofactors, thiamin and biotin, appear absent (Table D2). The presence of complete biosynthesis pathways for menaquinone and its terpenoid backbone, octaprenyl diphosphate, agrees with a previous finding that menaquinones are the only quinones detected in the cell biomass of Dehalobacter strain PER-K23 (Holliger et al., 1998). There is evidence suggesting that menaquinone serves as an electron mediator between hydrogenases and RDases in strain PER-K23 (Schumacher and Holliger, 1996). The absence of a thiamin biosynthesis pathway is consistent with the fact that the growth of strain PER-K23 requires a thiamin supplement (Holliger et al., 1998). The absence of a biotin biosynthesis pathway is surprising though, since strain PER-K23 does not require a biotin supplement for growth (Holliger et al., 1998).

Corrinoid is an essential cofactor of RDases. The completeness of the corrinoid biosynthesis pathway varies among the four Dehalobacter strains in this study. While the pathway is complete in strains CF and DCA, it has lost an essential gene in strain PER-K23 and several genes in strain E1 due to sequence deletion (Figure 5.7c). Most genes involved in this pathway reside in two gene clusters: genes involved in the upper pathway (from glutamyl-tRNA to cobyrinate) are located in one cluster (DCF50_p2930 to p2943 in strain CF) and genes in the lower pathway are located in another cluster (DCF50_p799 to p808 in strain CF) (Table D2). Sequence deletion was found within the former (Figure 5.7c). In strain PER-K23, there is a sequence (101 bp) deleted within the coding region of cobalt-precorrin-3b C17-methyltransferase (CbiH, Dehre_2856), resulting in a reading frame shift; this gene was annotated as a pseudogene (Rupakula et al., 2013). Besides CbiH, several enzymes encoded in the same gene cluster were not found expressed in strain PER-K23 (Rupakula et al., 2013). The disruption of this cbiH gene probably explains why the growth of strain PER-K23 requires a vitamin B12 supplement (Holliger et al., 1998). In strain E1, there is a larger sequence (~ 6 kb) deleted at the 3’ end of the same gene cluster, resulting in the loss of genes of cobalt-precorrin-4 C11-methyltransferase (CbiF), precorrin-2 C20-methyltransferase (CbiL), cobalt-precorrin-6 synthase (CbiD) and four subunits of a cobalt ECF transporter (CbiOQNM) (Figure 5.7c). These genes were not found elsewhere in the draft genome of strain E1. Dehalobacter strain E1 exists in a coculture with a Sedimentibacter strain, which has a complete corrinoid biosynthesis pathway; thus the inability to produce corrinoid has been proposed as an explanation for the dependence of strain E1 on this Sedimentibacter strain (Maphosa et al., 2012).

Dehalobacter genomes encode many genes involved in the biosynthesis pathways of heme and molybdopterin; however, it is uncertain whether these two pathways are complete. Heme is a cofactor of cytochrome b, a subunit of Hup-type Ni,Fe-hydrogenases (DCF50_p1688 to p1690, DCF50_p1849 to p1851, and DCF50_p2131 to p2133) found in all four Dehalobacter genomes. Two genes involved in the classic heme biosynthesis pathway (Layer et al., 2010), which goes from uroporphyrinogen III to heme via coproporphyrinogen III and other intermediates, were not found: oxygen-independent coproporphyrinogen III dehydrogenase (EC 1.3.3.4) and ferrochelatase (EC 4.99.1.1). Instead, three adjacent genes (DCF50_p1080 to p1082) were annotated to be related to the biosynthesis of heme d1, a cofactor of cytochrome cd1 nitrite reductase, which is present only in denitrifying bacteria (Bali et al., 2011). These three genes appear to fit an alternative heme biosynthesis pathway proposed in sulfate-reducing bacteria and Archaea (Ishida et al., 1998; Bali et al., 2011). The expression of these three genes was detected in strain PER-K23 (Table D2).

Molybdopterin is a cofactor incorporating molybdenum and tungsten for different enzymes (Kisker et al., 1997). Two selenocysteine-containing and molybdopterin-binding formate dehydrogenases (DCF50_p924 to p927 and DCF50_p1622 to p1626) are present in all four Dehalobacter genomes. These two formate dehydrogenases were found expressed in strain PER-K23 (Rupakula et al., 2013), even though this strain cannot use formate as an electron donor for organohalide respiration. Notably, Dehalobacter strain TCA1 can use both H2 and formate as electron donors (Sun et al., 2002). Most genes involved in the molybdopterin biosynthesis pathway are located in one gene cluster (DCF50_p1639 to p1645) (Figure 5.7b). However, the genes encoding molybdopterin synthase (MoaD and MoaE) were not found. Instead, in strain PER-K23 and strain E1, two MOSC domain-containing proteins (Dehre_2363 and Dehre_2365) were found in the same gene cluster. The MOSC domain is predicted to be a sulfur-carrier domain that delivers sulfur for the formation of diverse sulfur-metal clusters (Anantharaman and Aravind, 2002). It is possible that these two proteins play a role similar to that of molybdopterin synthase. The ortholog of Dehre_2365 was not found in either strain CF or strain DCA, probably due to sequence deletion (Figure 5.7b), indicating that this pathway might be non-functional in these two strains.

Other Pathways. It is not surprising that a large number of genes that distinguish Dehalobacter and Desulfitobacterium from Dehalococcoides belong to three categories: peptidoglycan biosynthesis, sporulation, and flagella biosynthesis (chemotaxis). Strain PER-K23 has a peptidoglycan-containing cell wall (Holliger et al., 1998) and all known Dehalobacter isolates have at least one flagellum (Wild et al., 1996; Holliger et al., 1998; Sun et al., 2002). However, Dehalobacter have never been shown to undertake sporulation. Strains CF and DCA have two gene clusters with metabolic functions that were not found in strains PER-K23 and E1: one (DCF50_p2009 to p2020) encodes a complete nitrogen fixation operon and the other (DCF50_p194 to p200) encodes some genes related to arsenate resistance or detoxification (Table D2). The differences between the genomes in these two gene clusters appear to be caused by sequence insertion or deletion.

5.5 CLOSING REMARKS

The recent differentiation between strains CF and DCA illustrates one way that Dehalobacter spp. evolve and that other organohalide-respiring organisms may evolve. Although the process, mutation of existing rdhA genes to acquire new functions, is not surprising and might even be anticipated, this is the first example found in organohalide-respiring organisms. Notably, this is not the only way that new dechlorination functions can be acquired. An alternative is horizontal gene transfer, which appears common in Dehalococcoides (Krajmalnik-Brown et al., 2007; McMurdie et al., 2007; McMurdie et al., 2011). There is no conclusive evidence that the rdhA genes in the four Dehalobacter genomes are related to recent horizontal gene transfer. It would be interesting to see how commonly these two ways of acquiring new rdhA genes and new dechlorination functions are used by different organohalide-respiring organisms and whether some organisms favour one over the other.

Similar to Dehalococcoides genomes, Dehalobacter genomes encode a large number of rdhA genes and most of them cluster in two small regions. In Dehalococcoides genomes, the formation of rdhA clusters appears related to genomic islands and the presence of hot sequence recombination sites (McMurdie et al., 2009). Although the landscapes of the two rdhA clusters vary dramatically between strains CF, PER-K23 and E1, the variations are not clearly related to site-specific recombination events or genomic islands as found in Dehalococcoides genomes. There is some evidence, though, showing that sequence duplication might play a role in the formation of these two rdhA clusters. The comparison of more Dehalobacter genomes is required to uncover the full story of the formation of these two rdhA clusters.

In phylogeny, Dehalobacter and Desulfitobacterium belong to the family Peptococcaceae. Compared to Dehalococcoides, Dehalobacter and Desulfitobacterium share more common characteristics: their genomes have a large number of ISs and are subject to a higher degree of genome plasticity, and they share more orthologous genes and metabolic pathways. However, if they share a common ancestor, they have evolved in two opposite directions regarding anaerobic respiration capacity. While Desulfitobacterium are remarkably versatile in using a variety of electron donors and acceptors, these four Dehalobacter strains have become specialized in organohalide respiration. Specialization might mean function and genome reduction as exemplified by Dehalococcoides, whose genomes are among the smallest (Seshadri et al., 2005; McMurdie et al., 2009). Some genes involved in a complete Wood-Ljungdahl pathway and genes involved in mixed acid fermentation identified in Dehalobacter genomes could be disposable elements carried over from an earlier ancestor. The Dehalobacter strains that are capable of dichloromethane fermentation could be close relatives that still retain some ancestral functions. Dehalococcoides might be an extreme case of specialization in organohalide respiration. They have neither a complete Wood-Ljungdahl pathway nor a biosynthesis pathway for corrinoid, the essential cofactor required for organohalide respiration. Recent studies showed that Dehalococcoides also rely on an external supply of the lower α-ligand base (5′,6′-dimethylbenzimidazole) of the corrinoid (Yan et al., 2012; Yan et al., 2013). Some evidence we found suggests that some Dehalobacter strains are evolving towards further function reduction, possibly in the same direction as Dehalococcoides. Sequence deletions have resulted in an incomplete corrinoid biosynthesis pathway in strains PER-K23 and E1, and an incomplete Wood-Ljungdahl pathway in strain E1.

5.6 ACKNOWLEDGEMENTS

I thank Ahsanul Islam for valuable discussions and help in reciprocal BLAST. Support was provided by the Government of Canada through Genome Canada and the Ontario Genomics Institute (2009-OGI-ABC-1405), the Government of Ontario through the ORF-GL2 program, and the United States Department of Defense through the Strategic Environmental Research and Development Program (SERDP) under contract W912HQ-07-C-0036 (project ER-1586). Metagenome sequencing was kindly provided by the U.S. Department of Energy Joint Genome Institute's Community Sequencing Program (CSP 2010). S.T. received awards from the Government of Ontario through the Ontario Graduate Scholarships in Science and Technology (OGSST) and the Natural Sciences and Engineering Research Council of Canada (NSERC PGS B).

Chapter 6 Isolation of Three Dehalobacter Strains

6.1 INTRODUCTION

Pure culture cultivation or strain isolation is a classic way to study microorganisms. Results from experiments using pure cultures can be directly attributed to one microbial species or strain because others are excluded. However, in samples from natural environments, less than 1% of all microbes are cultivable or can be isolated (Amann et al., 1995). This is a common justification for the use of metagenomic sequencing, in which the whole microbial community is sequenced without strain isolation efforts. There is an opposing view that all microorganisms are cultivable; the question is whether one can find the right growth conditions and technique for isolation.

What are the right conditions for isolating Dehalobacter strains? Previous successful experiences reported by other researchers are informative. Table 6.1 lists the isolation techniques and some special treatments used in the isolation of pure cultures of Dehalobacter and Dehalococcoides. All three Dehalobacter isolates reported so far are known to be obligate dechlorinators that use hydrogen (H2) and/or formate as electron donors and chlorinated compounds as electron acceptors (Wild et al., 1996; Holliger et al., 1998; Sun et al., 2002). This lifestyle is similar to that of Dehalococcoides, a different genus of obligate dechlorinators. In all cases listed in the table, defined salt media were amended with H2 (an electron donor), chlorinated compounds (electron acceptors), acetate (carbon source), a combination of trace metal elements, a combination of vitamins, at least one oxygen reductant (such as sodium sulfide, dithiothreitol, titanium (III) citrate and cysteine) and a certain concentration of carbon dioxide (CO2) in the headspace. Some strains were isolated using such a defined medium without further supplementation, including Dehalococcoides mccartyi strains BAV1, FL2, GT, MB and ANAS1, and Dehalobacter sp. strain TCA1. However, for others, additional nutritional supplementation was required (Table 6.1).

Table 6.1 Examples of Dehalobacter and Dehalococcoides strains isolated in the literature.

Genus | Strain | Isolation techniques | Special treatments | Reference
Dehalobacter | TEA | Serial high-dilution transfer | added spent medium from a fixed-bed reactor; used sodium molybdate to inhibit sulfate-reducing bacteria | (Wild et al., 1996)
Dehalobacter | PER-K23 | Serial high-dilution transfer | used 2-bromoethanesulfonate to inhibit methanogens; added “fermented yeast extract” from a mixed culture | (Holliger et al., 1993)
Dehalobacter | TCA1 | Shake tube | used 0.8% low-melting agarose | (Sun et al., 2002)
Dehalococcoides | 195 | Serial high-dilution transfer | added sterile cell extracts from mixed H2-PCE cultures; used ampicillin | (Maymó-Gatell et al., 1999)
Dehalococcoides | CBDB1 | Shake culture | used 0.3% low-melting agarose; used mixed culture supernatant | (Adrian et al., 2000)
Dehalococcoides | BAV1 | Serial high-dilution transfer; shake culture | used 0.5% low-melting agarose; used ampicillin | (He et al., 2003)
Dehalococcoides | FL2 | Serial high-dilution transfer | added 2-bromoethanesulfonate to inhibit methanogens; used ampicillin | (He et al., 2005)
Dehalococcoides | GT | Serial high-dilution transfer | used ampicillin | (Sung et al., 2006a)
Dehalococcoides | MB and ANAS1 | Shake culture; serial high-dilution transfer | used ampicillin | (Cheng and He, 2009)

Dehalobacter and Dehalococcoides have to rely on other community members (fermenting organisms) to provide H2 (or formate for some Dehalobacter) when electron donors other than H2 are provided. Even when they are using H2, they might be exposed to other nutrients (such as amino acids and vitamins) generated by other community members. The availability of certain nutrients could select for strains that lose the ability to synthesize these nutrients on their own, resulting in nutrient deficiency or nutrient dependence on other community members. To isolate such a strain, this nutrient dependence needs to be overcome. At the beginning of an isolation process, the exact nutrient requirement is unknown. A common way to circumvent this is to add complex components, such as yeast extract. However, adding these complex components is not necessarily appropriate because they might not contain all required nutrients and might contain inhibitory compounds. A better choice might be nutrient extracts derived from parent mixed cultures in which the organism of interest grows well. In the isolation of Dehalobacter restrictus strain PER-K23, “fermented yeast extract” from parent mixed cultures was added to the isolation medium (Table 6.1). For Dehalobacter sp. strain TEA, spent medium from a fixed-bed reactor was used (Table 6.1). For Dehalococcoides mccartyi strain 195, sterile cell extracts from parent mixed cultures were used (Table 6.1). For Dehalococcoides mccartyi strain CBDB1, sterile supernatant from parent mixed cultures was used (Table 6.1).

What is a proper technique for isolating Dehalobacter strains? The most convenient and straightforward technique would be to grow colonies on a solid surface (such as the surface of agar/cellulose plates) or inside solid media (such as shake tubes or roll tubes). Pure cultures can then be isolated by picking colonies that grow from single cells. Some Dehalobacter and Dehalococcoides strains have been isolated using shake tubes (Table 6.1), including Dehalobacter sp. strain TCA1 (Sun et al., 2002). However, not all microbes form colonies in solid media or on surfaces. When efforts with solid media fail, dilution-to-extinction transfers or serial high-dilution transfers with liquid media are alternative approaches. With dilution-to-extinction transfers, microbial isolation is achieved by diluting out the contaminating organisms through repeated high-dilution transfers. Compared with methods using solid media, this method is less efficient and the isolation process can take years. Unfortunately, for some Dehalobacter and Dehalococcoides strains, including Dehalobacter restrictus strain PER-K23 and Dehalobacter sp. strain TEA, this may be the only feasible approach.
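The logic of dilution-to-extinction can be illustrated with a simple Poisson model, in which the number of cells of each organism carried into a tube is treated as an independent Poisson variable whose mean falls ten-fold with each dilution step. The Python sketch below is an illustration under that assumption only; the cell density, the target fraction and the inoculum volume are hypothetical values, not measurements from this work.

```python
import math


def expected_cells(density_per_ml: float, inoculum_ml: float, dilution_exponent: int) -> float:
    """Mean number of cells delivered to a tube at a 10^-k dilution."""
    return density_per_ml * inoculum_ml * 10.0 ** (-dilution_exponent)


def p_at_least_one(mean_cells: float) -> float:
    """Poisson probability that a tube receives one or more cells."""
    return 1.0 - math.exp(-mean_cells)


def p_none(mean_cells: float) -> float:
    """Poisson probability that a tube receives no cells."""
    return math.exp(-mean_cells)


# Hypothetical inputs, chosen only for illustration.
DENSITY = 1e7          # total cells per mL in the parent culture (assumed)
TARGET_FRACTION = 0.8  # fraction of cells that are the organism of interest (assumed)
INOCULUM_ML = 1.0      # volume carried into the dilution series (assumed)

if __name__ == "__main__":
    for k in range(4, 10):
        total = expected_cells(DENSITY, INOCULUM_ML, k)
        target = TARGET_FRACTION * total
        others = (1.0 - TARGET_FRACTION) * total
        print(f"10^-{k}: P(target present) = {p_at_least_one(target):.3f}, "
              f"P(no contaminant) = {p_none(others):.3f}")
```

Under this model, a tube that receives on average one target cell from a culture that is 80% target receives on average 0.25 contaminant cells, so the chance that it is free of contaminants is about 0.78. This is one way to see why several rounds of transfers from the deepest active dilution are usually needed, and why the method works better the more enriched the starting culture is.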

6.2 MATERIALS AND METHODS

6.2.1 Medium and mixed culture supernatants

ACT-3 is a commercialized culture that can dechlorinate 1,1,1-trichloroethane (1,1,1-TCA) and chloroform (CF), and is the focus of this thesis. ACT-3 is grown on a defined medium (referred to as the MM medium in the following discussions) amended with 1,1,1-TCA as an electron acceptor and a mixture of methanol, ethanol, and lactate as electron donors (Grostern et al., 2010). The sterile mixed-culture supernatant (MCS) collected from ACT-3 was added to the MM medium to support the growth of three Dehalobacter strains in dilution-to-extinction transfers in this study. To obtain MCS, 500 mL of ACT-3 culture was collected after dechlorination plateaued and was centrifuged at 9000 × g at 4 °C for 20 min. The supernatant after centrifugation was transferred into 160 mL serum bottles, which were sealed with butyl rubber stoppers and stored frozen at -20 °C until use.

6.2.2 Cultures

Three highly-enriched Dehalobacter-containing cultures were subject to further isolation efforts:

a) DHB-111TCA/H2, an ACT-3 subculture that can dechlorinate 1,1,1-TCA to 1,1-dichloroethane (1,1-DCA) and CF to dichloromethane (DCM); it was grown on the MM medium amended with 1,1,1-TCA, H2 and acetate; Dehalobacter occupied ~ 80% of the culture according to the results of a clone library (data not shown); this culture should contain Dehalobacter strain CF,

b) DHB-11DCA/H2, another ACT-3 subculture that can dechlorinate 1,1-DCA to monochloroethane (CA); it was grown on the MM medium amended with 1,1-DCA, H2 and acetate; in a previous clone library (data not shown), no microbes other than Dehalobacter were found; this culture should contain Dehalobacter strain DCA,

c) DHB-12DCA/H2, a co-culture of a Dehalobacter strain and an Acetobacterium strain (Grostern and Edwards, 2009); it is not an ACT-3 subculture; it dechlorinates 1,2-dichloroethane (1,2-DCA) to ethene; it was grown on the MM medium amended with 1,2-DCA, H2 and acetate.

6.2.3 Dilution-to-extinction transfers

In an anaerobic chamber, 7 mL MM medium, 3 mL ACT-3 MCS, 25 µL of 2 M sodium acetate (final concentration of 5 mM), 25 µL of an iron sulfide (FeS) slurry containing ~ 2 g/L FeS, and 2 µL of chlorinated substrate (1,1,1-TCA, 1,1-DCA or 1,2-DCA) were added to 15 mL Belco tubes. The Belco tubes were sealed with butyl rubber stoppers and 6.4 mL H2/CO2 (80% H2 and 20% CO2) was added so that the amount of H2 was ~ 10 times that required for reducing the chlorinated substrates. These Belco tubes were then crimped shut, autoclaved and cooled to room temperature before inoculation. Using the medium in these Belco tubes, dilution-to-extinction transfers were made. Sterile disposable syringes and needles were flushed with nitrogen before being used to transfer cultures. The new transfers were incubated at room temperature and dechlorination was monitored by taking 0.3 mL headspace samples and analyzing them with gas chromatography as previously described (Grostern and Edwards, 2006a). Those cultures that recovered from the deepest dilution were used for the next round of dilution transfers.
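The amounts stated above can be cross-checked with a short calculation. The Python sketch below estimates the acetate concentration and the ratio of H2 supplied to H2 required, assuming ideal-gas behaviour at 25 °C and 1 atm, a nominal liquid volume of 10 mL, and one mole of H2 consumed per mole of substrate for a single dechlorination step. The densities and molar masses are approximate handbook values supplied here as assumptions; none of these inputs are taken from the thesis.

```python
R = 8.314462618  # universal gas constant, J mol^-1 K^-1


def gas_umol(volume_ml: float, fraction: float, temp_c: float = 25.0,
             pressure_pa: float = 101325.0) -> float:
    """Micromoles of one gas component in a mixture, from the ideal gas law."""
    volume_m3 = volume_ml * 1e-6
    return pressure_pa * volume_m3 * fraction / (R * (temp_c + 273.15)) * 1e6


def neat_liquid_umol(volume_ul: float, density_g_per_ml: float, molar_mass: float) -> float:
    """Micromoles in a volume of neat liquid."""
    grams = volume_ul * 1e-3 * density_g_per_ml
    return grams / molar_mass * 1e6


def final_conc_mM(stock_M: float, added_ul: float, total_ml: float) -> float:
    """Final concentration (mM) after adding a small stock volume to total_ml."""
    return stock_M * added_ul * 1e-6 / (total_ml * 1e-3) * 1e3


# Approximate handbook values (assumptions): density in g/mL, molar mass in g/mol.
SUBSTRATES = {
    "1,1,1-TCA": (1.34, 133.40),
    "1,1-DCA": (1.18, 98.96),
    "1,2-DCA": (1.25, 98.96),
}

if __name__ == "__main__":
    print(f"acetate: {final_conc_mM(2.0, 25.0, 10.0):.1f} mM")
    h2 = gas_umol(6.4, 0.80)
    print(f"H2 supplied: {h2:.0f} umol")
    for name, (density, mass) in SUBSTRATES.items():
        substrate = neat_liquid_umol(2.0, density, mass)
        print(f"{name}: {substrate:.1f} umol, H2 excess = {h2 / substrate:.1f}x")
```

With these assumed inputs the H2 excess comes out between roughly 8-fold and 10-fold for the three substrates, which is in line with the "~ 10 times" stated in the text.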

6.2.4 Purity tests

Cells were stained with 4',6-diamidino-2-phenylindole dihydrochloride (DAPI) and examined with a fluorescence microscope to check for uniformity of cell morphology. Potential contamination with other microorganisms was examined by making transfers at a 10⁻² or 10⁻³ dilution into the isolation medium described above, either excluding the chlorinated substrate (to detect homoacetogens) or replacing it with 5 mM sodium nitrate (for nitrate reducers), 5 mM sodium sulfate (for sulfate reducers), a mixture of methanol, ethanol and lactate (5 mM each) (for fermenting microbes), or 0.1 g/L yeast extract. These new transfers were incubated at room temperature for more than 1 month and then examined by fluorescence microscopy with DAPI staining.

Purity tests of two cultures, DHB-11DCA/H2 and DHB-12DCA/H2, were also performed by sequencing the 16S rRNA gene amplicons using 454 pyrotag sequencing. DNA samples were extracted using the MoBio UltraClean® Microbial DNA Isolation Kit. Primer design, PCR reactions and sample preparation before sequencing were performed according to Osburn et al. (2011). The presence of the Acetobacterium strain in the co-culture of DHB-12DCA/H2 was assessed using PCR with specific primers targeting the Acetobacterium 16S sequences (Duhamel and Edwards, 2006): 572f (GGCTCAACCGGTGACATGCA) and 784r (ACTGAGTCTCCCCAACACCT). In PCR reactions, a Taq DNA polymerase (Fermentas, Canada) was used and the thermocycling program was as follows: initial denaturation of 10 min at 94 °C; 40 cycles of 30 s denaturation at 94 °C, 30 s annealing at 51 °C and 60 s extension at 72 °C; and final extension of 10 min at 72 °C.

6.3 RESULTS AND DISCUSSION

6.3.1 Stimulatory effects of Mixed Culture Supernatant (MCS) from ACT-3

The two ACT-3 subcultures, DHB-111TCA/H2 and DHB-11DCA/H2, were initially established in order to isolate the Dehalobacter strains coexisting in ACT-3 cultures. After being maintained and transferred with H2 as the electron donor for years, these two cultures had become highly enriched in Dehalobacter. Methanogenic archaea from ACT-3 had been excluded from these two cultures by the application of 2-bromoethanesulfonate, a common inhibitor of methanogens, in early transfers. However, further purification efforts were hindered because dechlorination in these two cultures had slowed dramatically and the cultures could not sustain high-dilution transfers. The co-culture of DHB-12DCA/H2 was in slightly better condition with respect to dechlorination rate, but efforts to separate the Dehalobacter strain from the Acetobacterium strain were unsuccessful.

Figure 6.1 The dechlorination profiles of the cultures of (a) DHB-111TCA/H2 and (b) DHB-11DCA/H2 with and without ACT-3 mixed culture supernatant (MCS). Arrows indicate the time points when cultures were re-amended with chlorinated substrates.

It was found that the MCS collected from ACT-3 could stimulate dechlorination by these three Dehalobacter strains. The effects of ACT-3 MCS on the two cultures of DHB-111TCA/H2 and DHB-11DCA/H2 are shown in Figure 6.1. Similar effects of ACT-3 MCS on the co-culture DHB-12DCA/H2 were also found (data not shown). As introduced earlier, extracts from parent mixed cultures in which the organism of interest grows well have long been used by other researchers to facilitate the isolation of some Dehalobacter and Dehalococcoides strains (Table 6.1). One explanation for the function of MCS is that it contains nutrients that are required for the growth, or for the rapid growth, of these three Dehalobacter strains.

Figure 6.2 Typical dechlorination profiles of dilution-to-extinction transfers of the three Dehalobacter cultures in this study when ACT-3 MCS was added to the medium.

6.3.2 Dilution-to-extinction transfers

Another benefit of adding ACT-3 MCS is that it allows the three cultures to be transferred at dilutions as deep as 10⁻⁸ (sometimes 10⁻⁹). Moreover, transfers at a 10⁻⁸ dilution typically recovered in about three weeks to the point where subsequent deep transfers could be made. Typical dechlorination profiles of these three cultures in one batch of dilution-to-extinction transfers are shown in Figure 6.2. So far, more than 19 batches of dilution-to-extinction transfers have been performed on each of the three cultures in this study.
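The rationale for deep dilutions can be illustrated with a simple Poisson model of how many cells of each organism are carried into a transfer. This is only a sketch: the cell densities below are hypothetical placeholders (cell densities are not reported in this section), and the model assumes well-mixed, independently distributed cells.

```python
import math


def p_at_least_one(cells_per_ml, inoculum_ml, dilution):
    """Probability that a transfer receives at least one cell of an organism.

    Assumes cell counts in the inoculum follow a Poisson distribution with
    mean = density * inoculum volume * overall dilution factor.
    """
    expected_cells = cells_per_ml * inoculum_ml * dilution
    return 1.0 - math.exp(-expected_cells)


if __name__ == "__main__":
    # HYPOTHETICAL densities, chosen only to illustrate the principle.
    dominant = 1e8   # cells/mL of the dominant organism
    minor = 1e5      # cells/mL of a minor contaminant
    for dilution in (1e-6, 1e-7, 1e-8):
        print(
            f"dilution {dilution:.0e}: "
            f"dominant {p_at_least_one(dominant, 1.0, dilution):.3f}, "
            f"minor {p_at_least_one(minor, 1.0, dilution):.4f}"
        )
```

Under these assumed densities, the deepest transfer that still recovers is very likely to contain the dominant organism and very unlikely to contain the minor one, which is why the culture recovering from the deepest dilution is carried forward.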

6.3.3 Purity tests

Purity tests have not been performed using the most recent transfers of these three cultures. The following results were obtained with earlier transfers. First, dilution-to-extinction transfers using media amended with ACT-3 MCS resulted in the elimination of the Acetobacterium strain from the co-culture of DHB-12DCA/H2. This was confirmed with both PCR reactions (Figure 6.3) and 16S rRNA gene pyrotag sequencing (Table 6.2). Previously, different methods had been tried to separate the Acetobacterium strain from the Dehalobacter strain in this co-culture using the MM medium without ACT-3 MCS, but these were unsuccessful. Possibly, the Dehalobacter strain depends on the Acetobacterium strain for certain nutrients. A co-culture of Dehalobacter sp. strain E1 with a Sedimentibacter strain was reported in the literature; the separation of these two strains also appeared challenging (Maphosa et al., 2012). Our results show that the ACT-3 MCS supplement eliminated the potential nutrient dependence of the Dehalobacter strain in the DHB-12DCA/H2 culture. Moreover, with ACT-3 MCS, the Dehalobacter strain outgrew the Acetobacterium strain in dilution-to-extinction transfers, which is reasonable because organohalide respiration should yield more free energy than homoacetogenesis under regular conditions.

Figure 6.3 PCR reactions targeting the 16S rRNA gene of the Acetobacterium strain in the co-culture of DHB-12DCA/H2 before and after dilution-to-extinction transfers. 1) DNA from the culture after 3 cycles of dilution-to-extinction transfers with ACT-3 MCS; 2) 10 times dilution of sample 1; 3) DNA from the original co-culture before dilution-to-extinction transfers; 4) 10 times dilution of sample 3; 5) negative control.

Table 6.2 Summary of taxonomy assignment in the results of 16S rRNA gene pyrotag sequencing on the two cultures of DHB-11DCA/H2 and DHB-12DCA/H2.

Genus          DHB-11DCA/H2                  DHB-12DCA/H2
               (12th cycle of transfer)      (12th cycle of transfer)
Dehalobacter   23080                         30414
Clostridium    2                             4
Microcoleus    1                             4

I am not aware of any reports in the literature about using 16S rRNA gene pyrotag sequencing as a test of culture purity. From the results of pyrotag sequencing, Dehalobacter sp. occupies 99.99% and 99.98% of the whole population in DHB-12DCA/H2 and DHB-11DCA/H2, respectively. There were only a few sequence counts that belong to the genus Clostridium and the family Phormidiaceae (listed as Microcoleus in Table 6.2). Sample preparation for 16S rRNA gene pyrotag sequencing is a process involving DNA extraction, PCR reactions, PCR product cleanups, 454 sequencing, etc. It is uncertain whether these few counts of Clostridium and Phormidiaceae are due to their presence in the original culture or to contamination during sample preparation. It is also possible that they arose from abnormal sequencing errors. These transfers contained 30% (v/v) ACT-3 MCS, which was autoclaved at least once at 120 °C for 20 min. Whether all DNA in ACT-3 MCS had been decomposed during autoclaving is uncertain. Some large DNA sequences could survive autoclave treatment (Elhafi et al., 2004). However, even if there were only a few or even no counts of 16S sequences from contaminating organisms in pyrotag sequencing results, we were cautious about forming conclusions using the pyrotag data alone, because the PCR primers used cannot cover all existing microbes.

Some transfers of these three cultures were subjected to culture-based purity tests as described in MATERIALS AND METHODS. Based on fluorescence microscopy, no obvious growth of microorganisms was found in any of the media tested. Contamination was once found in a transfer of DHB-111TCA/H2 using a medium amended with yeast extract, but this test was negative with a more recent transfer of DHB-111TCA/H2.

Figure 6.4 Fluorescence microscopy with DAPI staining of the three Dehalobacter cultures: DHB-111TCA/H2 (a-b), DHB-11DCA/H2 (c-d), and DHB-12DCA/H2 (e-h). For each culture, sequential batches of transfers were performed. These images were taken with samples from different batches of transfers as indicated by the batch number (e.g. 16th batch).

6.3.4 Different morphologies of Dehalobacter cells

So far, only three Dehalobacter isolates have been reported and Dehalobacter cells were shown to be rods (Wild et al., 1996; Holliger et al., 1998; Sun et al., 2002). However, each of the three Dehalobacter strains in the cultures in the current study appeared to have at least two distinctive morphologies: rods and filaments (Figure 6.4). The presence of these two different morphologies was reported previously in the DHB-12DCA/H2 culture, the co-culture of Dehalobacter and Acetobacterium (Grostern and Edwards, 2009). However, at that time, it was uncertain whether Dehalobacter or Acetobacterium accounted for the filaments. After the elimination of the Acetobacterium strain from the culture with dilution-to-extinction transfers, which was confirmed after the 3rd cycle (Figure 6.3), the two morphologies still coexisted (Figure 6.4e), showing that both morphologies belong to Dehalobacter. This was also the case in the other two cultures, DHB-111TCA/H2 and DHB-11DCA/H2 (Figure 6.4a-d). Interestingly, in some transfers, Dehalobacter cells were mostly rods (Figure 6.4a and 6.4c) and in some they were mostly filaments (Figure 6.4d and 6.4f); more often, both were present. One of my colleagues examined the co-culture of DHB-12DCA/H2 with fluorescence in situ hybridization (FISH) microscopy using probes targeting Dehalobacter 16S rRNA. Preliminary results of FISH (data not shown) agreed with these conclusions. In the culture of DHB-12DCA/H2, the filaments can grow to an extraordinary size (Figure 6.4g). A new morphology dominating DHB-12DCA/H2 was found in a recent transfer (Figure 6.4h) using ACT-3 MCS amended with formate as an electron donor instead of H2. The new morphology is similar to a common morphology of methanogenic archaea; however, methane production has never been detected in these transfers. It is uncertain whether this new morphology was caused by contamination or is another morphology of Dehalobacter.

6.4 CONCLUSIONS

Using dilution-to-extinction transfers with media amended with ACT-3 MCS, we have obtained three Dehalobacter cultures that appear pure based on the purity tests performed. The purity of these three cultures will be further confirmed with more comprehensive and systematic tests using the most recent transfers. Dehalobacter strains from these three cultures have two distinctive morphologies, rods and filaments, a feature that has not been previously reported.

Chapter 7 Complete Genome of Bacteroidales strain CF from a Chloroform-Dechlorinating Enrichment Culture

Reproduced with permission from the journal of Genome Announcements, the American Society for Microbiology. Copyright © American Society for Microbiology, Genome Announcements, 2013, accepted.

7.1 ABSTRACT

Bacteroidales strain CF is the most abundant non-dechlorinating organism in a Dehalobacter-containing enrichment culture that consistently reductively dechlorinates over 50 mg/L chloroform or 1,1,1-trichloroethane. We assembled and closed the complete genome of this organism from metagenomic sequencing data of enrichment cultures. This organism is predicted to ferment L-lactate and ethanol.

7.2 INTRODUCTION

ACT-3 is an enrichment culture used for bioaugmentation to detoxify groundwater contaminated by 1,1,1-trichloroethane (1,1,1-TCA), 1,1-dichloroethane (1,1-DCA) and chloroform (CF) (Grostern et al., 2010). It is routinely grown in pre-reduced defined mineral medium amended with ~ 50 mg/L 1,1,1-TCA as electron acceptor and a mixture of methanol, ethanol, and lactate as fermentable electron donors. The ACT-3 culture is dominated by two Dehalobacter strains which make up ~ 70% of the whole community. In this culture, 1,1,1-TCA (methyl chloroform) is reductively dechlorinated, via 1,1-DCA, to monochloroethane, and CF is reductively dechlorinated to dichloromethane (Tang and Edwards, 2013). A subculture of the ACT-3 culture is maintained in the same way except that the electron acceptor 1,1,1-TCA is replaced by CF. This subculture is dominated (~ 80%) by only one of the two Dehalobacter strains from the parent culture (Tang et al., 2012). The parent culture and CF subculture also share a non-dechlorinating organism that is the most abundant non-Dehalobacter organism in both cultures, representing 8 to 9% of the sequences in the cultures (Tang et al., 2012). We have named this organism Bacteroidales strain CF based on 16S rRNA phylogeny. This organism is of special interest because it is most likely a fermenting organism providing hydrogen to Dehalobacter while surviving in the presence of at least 50 mg/L CF or 1,1,1-TCA, a concentration that typically completely inhibits many microbial processes including methanogenesis (Suidan et al., 1991; Weathers, 2000) and reductive dechlorination (Bagley et al., 2000; Duhamel et al., 2002). Despite several attempts, we have not yet obtained a pure culture of this organism.

7.3 METHODS AND RESULTS

The complete genome of Bacteroidales strain CF was assembled from the metagenomes of ACT-3 and the CF subculture. The ACT-3 metagenome was sequenced using 454 pyrosequencing, including paired-end sequencing of an 8-kb insert library, and was assembled with Newbler (Margulies et al., 2005) v. 2.5. The CF subculture metagenome was sequenced using Illumina paired-end sequencing, and was assembled with ABySS (Simpson et al., 2009) v. 1.3.4. Previously, we reported the assembly of two complete Dehalobacter genomes from these two metagenomes (Tang et al., 2012). Subsequently, contigs from the subculture metagenome were mapped against the two Dehalobacter genomes and removed to obtain contigs of microbes other than Dehalobacter. The non-Dehalobacter contigs were compared to scaffolds generated in the ACT-3 metagenome. A circular scaffold comprising twenty non-Dehalobacter contigs was constructed. Nineteen gaps in the scaffold were resolved in silico with a method previously described (Tang et al., 2012). The last gap, representing a highly repetitive region, was resolved by long reads generated from additional Sanger sequencing guided by PCR. The availability of both metagenome data sets enabled the complete assembly: the ACT-3 metagenome provided long-distance mate-pair constraints for scaffolding despite having insufficient coverage (~ 14×) to close the genome, while the CF subculture metagenome provided sufficient coverage (~ 40×).

The genome of Bacteroidales strain CF is ~ 2.66 Mb with an average G+C content of 42.7%. It contains 9 rRNA genes (including 3 copies of the 16S rRNA gene), 37 tRNA genes and ~ 2310 coding sequences. It was annotated with RAST (Aziz et al., 2008) and BASys (Van Domselaar et al., 2005). Based on the annotations, genes required for the fermentation of L-lactate, ethanol, and glucose, genes encoding various hydrogenases, and an incomplete citric acid cycle were identified. Notably, no essential genes involved in the Wood-Ljungdahl pathway were identified. The closest genome publicly available was a draft genome of Alistipes putredinis DSM 17216 (GenBank accession number, ABFK00000000), whose 16S rRNA sequences share only ~ 86% identity with those of Bacteroidales strain CF.
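For readers unfamiliar with the G+C statistic quoted above, the following minimal sketch shows how it is commonly computed from a nucleotide sequence. The function name and the toy sequences are illustrative and are not part of the pipeline used in this work; ambiguous bases such as N are excluded from the denominator here, which is one of several possible conventions.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    # Toy sequences, not real genome data.
    print(gc_content("ATGC"))        # half of the bases are G or C
    print(gc_content("ggccNNat"))    # N is ignored; case is normalized
```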

Nucleotide sequence accession number. This genome has been deposited in GenBank with the accession number of CP006772.

7.4 ACKNOWLEDGEMENTS

Support was provided by the Government of Canada through Genome Canada and the Ontario Genomics Institute (2009-OGI-ABC-1405). Support was also provided by the Government of Ontario through the ORF-GL2 program and the United States Department of Defense through the Strategic Environmental Research and Development Program (SERDP) under contract W912HQ-07-C-0036 (project ER-1586). Metagenome sequencing was conducted by the U.S. Department of Energy Joint Genome Institute supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. S.T. received awards from the Government of Ontario through the Ontario Graduate Scholarships in Science and Technology (OGSST) and the Natural Sciences and Engineering Research Council of Canada (NSERC PGS).

Chapter 8 Summary, Significance and Future Work

8.1 SUMMARY

This thesis focuses on the characterization of ACT-3, a commercialized enrichment culture that is being used to detoxify groundwater contaminated by chloroform (CF) and 1,1,1-trichloroethane (1,1,1-TCA). Previously, our understanding of this mixed culture was limited. The culture was known to dechlorinate CF, 1,1,1-TCA and 1,1-dichloroethane (1,1-DCA), and the dechlorinating organisms were known to belong to Dehalobacter, a poorly understood genus commonly involved in organohalide respiration. The loss of dechlorination function observed in the ACT-3 subcultures grown on different substrates suggested that there was more than one Dehalobacter strain in ACT-3, though the evidence was not conclusive. Some important questions to be answered were: (a) whether there are dechlorinating organisms other than Dehalobacter in ACT-3; (b) how many Dehalobacter strains coexist; (c) why there is more than one Dehalobacter strain and what the differences between them are; (d) how many reductive dehalogenase (RDase) genes are encoded in ACT-3 and how many of them are expressed; (e) which RDases account for the dechlorination activities observed in ACT-3; (f) what limits the growth of Dehalobacter spp. as they become more enriched; (g) whether Dehalobacter strains can be isolated for further characterization.

Three parallel lines of inquiry were pursued to characterize the ACT-3 culture: RDase identification (Chapters 2 and 3), metagenomic sequencing (Chapters 4, 5 and 7) and strain isolation (Chapter 6). Strain isolation can take years, as seen in our case. Without isolates, metagenomic sequencing offers an alternative route to uncovering the genetic information of the organisms of interest, provided the sequences can be binned or assembled accurately. We first sequenced the metagenome of ACT-3 using 454 pyrosequencing; this metagenome served as a blueprint for subsequent studies.

Chapters 2 and 3

Progress was first achieved in RDase identification. Traditionally, the characterization of RDases relied completely on direct protein purification of RDases from dechlorinating cultures (Ni et al., 1995; Neumann et al., 1996; Christiansen et al., 1998; Magnuson et al., 1998; Miller et al., 1998; van de Pas et al., 1999; Krasotkina et al., 2001; Okeke et al., 2001; van de Pas et al., 2001; Suyama et al., 2002; Boyer et al., 2003; Maillard et al., 2003; Thibodeau et al., 2004) because heterologous expression efforts were unsuccessful (Neumann et al., 1998; Suyama et al., 2002; Sjuts et al., 2012). Direct protein purification typically consists of a series of column chromatography steps. This process is time-consuming and labor-intensive, and consumes a large amount of cell biomass. The availability of the ACT-3 metagenome allowed us to use a different approach to identify the RDases in ACT-3. This method features partial protein purification with blue native polyacrylamide gel electrophoresis (BN-PAGE), followed by enzymatic dechlorination assays and protein identification with LC-MS/MS. This method was described in Chapter 2 and had been successfully applied to the characterization of RDases in Dehalococcoides-containing cultures. The advantage of this method is that it allows the association of RDase functions with protein sequences while consuming limited biomass (we typically used ~ 40 mL of mixed cultures for one run). A shortcoming of this method is that BN-PAGE is not very effective at separating two similar RDases. Nevertheless, the paper presented in Chapter 2 was selected as a “Spotlight Article” by the Applied & Environmental Microbiology editorial board, which highlights research articles deemed of significant interest. In the case of ACT-3 (Chapter 3), two highly similar RDases (95.2% amino acid identity), CfrA and DcrA, were co-expressed. Although BN-PAGE cannot separate the two RDases, we managed to separate them at the culture level: of the two ACT-3 subcultures, the one grown on chloroform (the CF subculture) inherited only CfrA from ACT-3 and the one grown on 1,1-DCA (the DCA subculture) inherited only DcrA. Comparing the RDase expression profiles of ACT-3 and these two subcultures differentiated the functions of CfrA and DcrA.

The discovery from the RDase identification work is that there are two functionally different but highly similar (95.2% amino acid identity) RDases co-expressed in ACT-3: CfrA accounts for the dechlorination of 1,1,1-TCA to 1,1-DCA and CF to dichloromethane (DCM), and DcrA accounts for the dechlorination of 1,1-DCA to chloroethane. The fact that only CfrA was expressed in the CF subculture and only DcrA was expressed in the DCA subculture explains the functional differences between these three cultures. These results suggest that there are two Dehalobacter strains coexisting and each of them expresses one of the two RDases. This hypothesis was confirmed by the metagenomic assembly, as two complete Dehalobacter genomes were assembled from the ACT-3 metagenome.

Chapter 4

De novo sequence assembly (without reference genomes) in a metagenomic context, in which strain variation prevails, is a challenging and rapidly developing research field. The ACT-3 metagenome represents a simple scenario of this kind, as two highly similar Dehalobacter genomes (~ 90% nucleotide identity) of similar abundance coexist in the ACT-3 metagenome. The reason for their similar abundance in the culture, and consequently similar read depth (35-40×) in the metagenomic sequencing, is likely that they obtain similar amounts of free energy in each of the two dechlorination steps from 1,1,1-TCA to CA via 1,1-DCA (each step represents 2 electron equivalents). The preliminary assembly of 454 data from the ACT-3 metagenome with Newbler resulted in severe fragmentation in the assembly of the two Dehalobacter genomes because regular genome assemblers like Newbler are designed to assemble sequences derived from single genomes. These assemblers always try to represent sequence assemblies as a single nucleotide sequence, which is problematic in a metagenomic context. In the ACT-3 metagenome, many gaps between scaffolded Dehalobacter contigs have two alternative solutions, from one or the other of the two Dehalobacter strains. A better way to capture the two alternative solutions is to represent them in multiple sequence alignments. To do this, an in silico gap resolution method was developed, which can resolve assembly gaps between contigs within scaffolds regardless of whether they were caused by intra-genomic repeats or strain variation. This method uses BLASTN to search for pre-assembled contigs to close gaps in scaffolds. Solutions to gaps caused by strain variation are represented as multiple sequence alignments. Using this method, almost all gaps between Dehalobacter contigs were resolved; only four recalcitrant gaps had to be resolved further with Sanger sequencing guided by PCR. However, despite resolving all gaps, we could not determine the origin of the alternative solutions and therefore could not separate the two genomes using the 454 data from the ACT-3 metagenome. Separation of these alternative contigs or solutions based on read depth differences was ineffective because the two strains happened to have similar abundance. Fortunately, we also sequenced the CF subculture, which inherited only one Dehalobacter strain from ACT-3, using Illumina paired-end sequencing. We called this the CF metagenome. By mapping the reads from the CF metagenome against the closed Dehalobacter assembly constructed from the ACT-3 metagenome, two complete Dehalobacter genomes were separated.
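The strain-separation idea described above can be illustrated with a toy sketch. It is not the method used in this work: exact k-mer matching is substituted for read mapping, reverse-complement reads and sequencing errors are ignored, and all function names and sequences are hypothetical. The sketch only shows the principle that, of two alternative gap solutions, the one supported by reads from the single-strain CF subculture can be assigned to that strain.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_support(candidate, reads, k=8):
    """Fraction of the candidate's k-mers found in at least one read."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    candidate_kmers = kmers(candidate, k)
    if not candidate_kmers:
        return 0.0
    return len(candidate_kmers & read_kmers) / len(candidate_kmers)


def assign_alternative(alt_a, alt_b, single_strain_reads, k=8):
    """Return 'alt_a' or 'alt_b', whichever is better supported, or None on a tie."""
    support_a = kmer_support(alt_a, single_strain_reads, k)
    support_b = kmer_support(alt_b, single_strain_reads, k)
    if support_a == support_b:
        return None
    return "alt_a" if support_a > support_b else "alt_b"


# Toy data: two alternative solutions differing at one position, and error-free
# "reads" drawn from the first alternative only (standing in for reads from a
# subculture that contains a single strain).
ALT_A = "ACGTTGCAAGGCTTAACCGGTACGATCG"
ALT_B = ALT_A[:12] + "A" + ALT_A[13:]
READS = [ALT_A[0:16], ALT_A[6:22], ALT_A[12:28]]

if __name__ == "__main__":
    print(kmer_support(ALT_A, READS), kmer_support(ALT_B, READS))
    print(assign_alternative(ALT_A, ALT_B, READS))
```

In practice a read mapper and coverage statistics would replace the exact k-mer lookup, but the decision rule is the same: solutions covered by the single-strain data belong to that strain, and the remaining solutions belong to the other.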

Chapter 5

As expected, the gene encoding CfrA was found in one assembled Dehalobacter genome (strain CF) and the gene encoding DcrA was found in the other Dehalobacter genome (strain DCA), helping to shed light on the whole picture of ACT-3 and its subcultures. ACT-3 has two coexisting Dehalobacter strains. When ACT-3 grows on 1,1,1-TCA, strain CF expressing CfrA dechlorinates 1,1,1-TCA to 1,1-DCA and then strain DCA expressing DcrA dechlorinates 1,1-DCA to CA. When ACT-3 was enriched with CF instead of 1,1,1-TCA, only strain CF survived in the CF subculture because strain CF grows equally well by dechlorinating CF to DCM, but strain DCA has no activity on either CF or DCM. Similarly, only strain DCA survived in the DCA subculture when 1,1-DCA was amended as an electron acceptor.

An outstanding feature of these two Dehalobacter genomes is that they are highly similar to each other (~ 90% nucleotide identity), indicating recent differentiation from a common ancestor. The differentiation of these two strains was likely driven by the divergence of the cfrA (encoding CfrA) and dcrA (encoding DcrA) genes, which share 97.9% nucleotide identity and 95.2% amino acid identity; these two genes are orthologues and reside at the same location in each genome. Because ACT-3 was enriched from a 1,1,1-TCA contaminated site, it is more likely that the common ancestor of these two strains was more similar to strain CF, which dechlorinates 1,1,1-TCA to 1,1-DCA, than to strain DCA. The accumulation of 1,1-DCA might have initiated the differentiation of strain DCA, with slight changes in the cfrA gene giving rise to the dcrA gene. The differentiation allows the Dehalobacter population to harvest more energy from 1,1,1-TCA than would otherwise be possible.
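Identity values such as those quoted above are computed from pairwise alignments. The sketch below shows one common convention, counting matches over aligned columns in which neither sequence has a gap; other tools use different denominators, and the alignment tool and convention used in this work are not specified in this section. The function name and toy sequences are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Both inputs must already be aligned (same length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: one mismatch in ten gap-free columns.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

The same function applies to aligned protein sequences, which is how a pair of genes can show a higher nucleotide identity than amino acid identity when most substitutions are non-synonymous.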

Dehalobacter is a genus that was previously poorly understood. These two Dehalobacter genomes provide, for the first time, a complete picture of the metabolic potential of Dehalobacter. Similar to Dehalococcoides genomes, Dehalobacter genomes harbor a large reservoir of RDase genes (17 in both strain CF and strain DCA), and RDases were the only oxidoreductases found in the genomes that catalyze electron-accepting reactions in respiration. This supports the previous recognition that both Dehalobacter and Dehalococcoides are specialized in organohalide respiration. However, in other ways, Dehalobacter genomes are more similar to Desulfitobacterium genomes, which is reasonable because they are phylogenetically closely related. Notably, both Dehalobacter and Desulfitobacterium have many genes in the corrinoid biosynthesis pathway, while this pathway is completely absent in Dehalococcoides. Another interesting result is that the two Dehalobacter genomes encode genes required for a functional Wood-Ljungdahl pathway and genes involved in mixed acid fermentation, suggesting that Dehalobacter evolved from a more versatile ancestor.

Comparing these two Dehalobacter genomes with two other recently released Dehalobacter genomes resulted in some interesting findings. First, many RDase genes in these four genomes were found clustered in two conserved genome regions, and loss or gain of RDase genes in these two regions seems to result from sequence deletion or insertion. However, there was no sequence evidence supporting the presence of site-specific recombination events or genomic islands as seen in Dehalococcoides genomes (McMurdie et al., 2009). Second, Dehalobacter genomes should have a higher occurrence of genome rearrangement events because of the presence of a large number of repetitive elements, especially insertion sequences. We identified two genome rearrangement events between the genomes of strain CF (or DCA) and strain PER-K23, which account for the GC-skew difference between them. These two events appear to have been mediated by inverted repeats. Third, although the genomes of Dehalobacter strains CF, PER-K23 and E1 share almost all metabolic genes, the completeness of two important pathways, Wood-Ljungdahl and corrinoid biosynthesis, varies. While both pathways are complete in strains CF and DCA, the corrinoid biosynthesis pathway is incomplete in strain PER-K23 and both pathways are incomplete in strain E1. The absence of the missing genes is likely due to sequence deletion. These results indicate that some Dehalobacter genomes are evolving towards further loss of function.

Chapter 6

After more than four years of pure culture isolation efforts, three pure Dehalobacter cultures were obtained: DHB-111TCA/H2, DHB-11DCA/H2 and DHB-12DCA/H2. The first two cultures should correspond to strain CF and strain DCA in ACT-3. The third is a Dehalobacter strain of a different origin (Grostern and Edwards, 2009). These three strains were isolated by dilution-to-extinction transfers. The key to success in the process was to eliminate the potential nutrient dependence of Dehalobacter on other community members, which was achieved by amending into the isolation medium sterile supernatant from the parent mixed culture ACT-3. In addition, although Dehalobacter cells were reported as rods in the literature, all three Dehalobacter strains had two distinctive morphologies: rods and filaments. The presence of these two distinctive morphologies complicated and slowed down the isolation process.

Chapter 7

Finally, we report the assembly of a complete genome from the ACT-3 metagenome. This genome belongs to a strain named Bacteroidales strain CF, which was the most abundant non-dechlorinating organism in ACT-3 and the CF subculture. This organism is interesting because it is potentially an efficient fermenting organism that tolerates at least 50 mg/L chloroform or 1,1,1-TCA, a concentration that inhibits the growth of many different microbes. We have not obtained an isolate of this organism, but efforts are underway. As shown in Chapter 6, the isolation of the three Dehalobacter pure cultures relied on amendment with sterile supernatant from the ACT-3 culture, indicating nutrient dependence of Dehalobacter spp. on non-dechlorinating organisms. The study of non-dechlorinating organisms and their potential interactions with Dehalobacter spp. is therefore important to the understanding of the whole culture. Unfortunately, in my Ph.D. program, I did not have enough time to study in depth the non-dechlorinating organisms in the ACT-3 cultures. This chapter, describing the complete genome of the most abundant non-dechlorinating organism in the ACT-3 culture, begins to explore these interesting organisms and highlights their importance.

8.2 SIGNIFICANCE

The contributions of this thesis can be summarized as: (a) determination of two Dehalobacter genomes and one Bacteroidales genome; (b) identification of two novel RDases, one of which dechlorinates chloroform; (c) isolation of three Dehalobacter strains, two of which are from ACT-3; (d) development of a wet-lab method for rapid RDase functional characterization; (e) development of an in silico gap-resolution method for genomic and metagenomic assembly. The assembly and annotation of the three complete genomes and the identification of the two RDases in the ACT-3 culture have significantly improved our understanding of this enrichment culture, which is being commercially applied for bioaugmentation at contaminated sites. The improved understanding of ACT-3 will help in verifying the effectiveness and safety of bioaugmentation applications of this culture. The genes encoding the two RDases will be used as functional biomarkers for the activity of this culture, and in a side project (not included in this thesis), I have designed quantitative PCR primers specifically targeting these two genes. These primers can be used in bioaugmentation studies to monitor the development of ACT-3 at field sites. The three complete genomes and two Dehalobacter isolates from ACT-3 have established a solid platform for further characterization of ACT-3. A further approach taken by my lab colleagues focuses on the mathematical modeling of metagenomes of enrichment cultures including ACT-3. The modeling aims for a systematic understanding of these cultures. The assembly and annotation of genomes of the three most abundant organisms in ACT-3 have greatly facilitated their work on modeling the ACT-3 metagenome. Ultimately, a better understanding of ACT-3 will improve its growth and field applications.

From the point of view of fundamental research, the work of this thesis has improved our understanding of Dehalobacter, an important genus commonly involved in organohalide respiration. The availability of the two Dehalobacter genomes, the first two complete genomes of Dehalobacter reported, will facilitate future proteomics studies and other researchers' work in this field; for example, they can be used as reference genomes to help parse metagenomes containing Dehalobacter. The comparison of these two genomes and the two functional RDase genes sheds light on the evolution of RDases and Dehalobacter. The isolation of the three Dehalobacter strains establishes a platform for further characterization of these organisms and for testing hypotheses generated from genome analyses. The finding that mixed-culture supernatant (MCS) from ACT-3 stimulates the growth of these three Dehalobacter isolates indicates that the isolates have nutrient requirements that the medium alone does not meet. Uncovering the mystery of ACT-3 MCS, i.e. identifying the nutrient deficiencies, will indicate how to improve the growth of Dehalobacter and possibly how to grow and apply ACT-3 better.

This thesis also contributes to technical progress. The two methods developed can help other researchers overcome obstacles that were previously very challenging. The wet-lab method for rapid RDase functional identification can help other researchers identify RDases from pure or mixed cultures. The semi-automatic in silico gap-resolution method can help other researchers perform genome assembly of pure microbial isolates or metagenomic assembly of microbial communities. In the assembly of sequencing data from pure isolates, it can resolve assembly gaps caused by intra-genomic repeats. In metagenomic assembly, it can resolve assembly gaps caused by strain variation. To my knowledge, this is the first method to demonstrate a way to resolve such gaps in metagenomic assembly. Only seven closed genomes have been assembled from metagenomic data so far; three of these genomes are from this work.

8.3 FUTURE WORK

Optimization of the RDase characterization method

In Chapter 2 and Chapter 3, we reported the development and application of a BN-PAGE-based method to associate expressed RDases with their dechlorination functions. The advantage of this method is that it is fast and consumes little biomass. The major problem is that BN-PAGE has difficulty separating two RDases of similar size. Potential ways to increase resolution are: (a) to run electrophoresis on a longer gel and (b) to develop a non-denaturing two-dimensional gel electrophoresis. Currently, we are using precast BN-PAGE gels purchased from Invitrogen. The first approach would require learning how to cast a gradient gel on our own. Isoelectric focusing plus BN-PAGE is a potential combination for non-denaturing two-dimensional gel electrophoresis. Activity loss is a problem expected with both approaches. Running the whole electrophoresis process in an anoxic chamber could help.

Uncovering potential protein-protein interactions between RDases and other proteins in Dehalobacter cultures

An interesting observation when applying the BN-PAGE-based method to identify RDases expressed in ACT-3 cultures was that both dechlorination activity and RDases were detected in two separate gel regions. This indicates that RDases might exist in different forms. CfrA and DcrA might exist as dimers or trimers, or they might interact with other proteins to form protein complexes. From the reported literature, however, it appears that RDases mainly exist as monomers, so the former possibility is less likely. Chemical cross-linking is a potential way to study protein-protein interactions.

Applying the in silico gap-resolution method to Illumina sequencing data and automating the whole process

Illumina sequencing is currently the most popular, most accurate, and least expensive next-generation sequencing technique, as it offers the best balance between read length and throughput. In Chapter 4, we reported the development of a semi-automatic in silico gap-resolution method, which enabled the assembly of two different Dehalobacter genomes. This method can resolve assembly gaps caused by both intra-genomic repeats (such as transposase genes) and strain variation. Therefore, it has potential applications in genomic and metagenomic assembly. In Chapter 4, we applied this method to the ACT-3 metagenome, which was sequenced with 454 pyrosequencing and assembled with Newbler, but the method is not limited to 454 sequencing data since it resolves assembly gaps using pre-assembled contigs. In fact, it should work better with Illumina sequencing data, for which de Bruijn graph-based assemblers (such as ABySS and Velvet) are typically used for preliminary assembly. In Chapter 7, we reported the assembly of a complete genome of Bacteroidales strain CF. In this process, the in silico gap-resolution method (with a slightly modified script) was applied to contigs assembled from Illumina data, while 454 data were used for scaffolding. This demonstrates that the method works with Illumina sequencing data. The major current limitation of this method is that it is not fully automatic and requires manual effort to resolve sequence alignments in Geneious Pro. Plans are underway to automate the whole process.
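
To illustrate why intra-genomic repeats and strain variation fragment a preliminary assembly, the following minimal Python sketch builds a toy de Bruijn graph and reports the branching nodes at which an assembler would have to stop extending a contig. It is not the gap-resolution script from Chapter 4; the sequences, the k-mer sizes, and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor; contigs break at these nodes."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Case 1 (hypothetical): one genome in which the repeat GATTACA occurs twice
# with different flanking sequence. The genome is treated as a single
# error-free read to keep the example small.
repeat_genome = "CCCGATTACATTTGGGGATTACAAAC"
repeat_graph = build_de_bruijn([repeat_genome], k=5)
print("repeat:", branching_nodes(repeat_graph))

# Case 2 (hypothetical): two strains that differ by a single base.
strain_reads = ["AACGTTGCA", "AACGATGCA"]
strain_graph = build_de_bruijn(strain_reads, k=4)
print("strain variation:", branching_nodes(strain_graph))
```

In both cases the graph contains a node with two possible successors (the end of the repeat in the first case, the position just before the variant base in the second), so the path is ambiguous and a preliminary assembly would typically end a contig there. A gap-resolution step works on the contigs on either side of such breaks.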

Uncovering the mystery of mixed culture supernatant

In Chapter 6, we reported the isolation of three pure Dehalobacter cultures. The key to success in the isolation process was the use of ACT-3 MCS. It is likely that ACT-3 MCS contains unknown nutrients required for the rapid growth of these three Dehalobacter strains. In the isolation process, there was some evidence (data not shown) that the three Dehalobacter isolates might obtain different nutrients from ACT-3 MCS. We also observed that dechlorination could slow down significantly or stop in these three isolates growing on media amended with ACT-3 MCS, even in the presence of extra electron donors (hydrogen) and acceptors. This indicates that the required nutrients in ACT-3 MCS had been depleted. We plan to use two methods to identify the required nutrients in ACT-3 MCS. The first is metabolome profiling. For example, for each isolate, samples of MCS will be collected when the required nutrients are depleted, as indicated by a plateau in dechlorination, and the metabolomes of these samples will be compared with that of the original ACT-3 MCS to see which metabolites are depleted. The second method is to find a defined component to replace ACT-3 MCS. For example, some microbial pure cultures require amino acid supplements; in this case, the alternative to ACT-3 MCS could be a mixture of 20 amino acids.
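
The metabolome comparison proposed above amounts to ranking metabolites by how strongly they are depleted between the original and the spent MCS. A minimal Python sketch follows, assuming that peak intensities have already been extracted into one dictionary per sample; the metabolite names, the intensities, and the 10% threshold are hypothetical placeholders, not data from this work.

```python
def depleted_metabolites(fresh, spent, max_remaining=0.1):
    """Return (name, remaining fraction) for metabolites whose spent/fresh
    ratio is at or below max_remaining, most depleted first."""
    hits = []
    for name, fresh_level in fresh.items():
        if fresh_level <= 0:
            continue  # not detected in the original MCS; depletion undefined
        remaining = spent.get(name, 0.0) / fresh_level
        if remaining <= max_remaining:
            hits.append((name, remaining))
    return sorted(hits, key=lambda item: item[1])


# Hypothetical peak intensities (arbitrary units).
fresh_mcs = {"metabolite_A": 1000.0, "metabolite_B": 500.0, "metabolite_C": 200.0}
spent_mcs = {"metabolite_A": 950.0, "metabolite_B": 20.0, "metabolite_C": 0.0}

for name, remaining in depleted_metabolites(fresh_mcs, spent_mcs):
    print(f"{name}: {remaining:.0%} remaining")
```

Any metabolite flagged in this way would only be a candidate; it would still need to be confirmed experimentally, for example by testing whether adding it back restores dechlorination.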

Characterization of the three Dehalobacter isolates

Although the use of ACT-3 MCS facilitated the isolation of these three Dehalobacter isolates, it hinders their systematic characterization, especially of their metabolic functions, because ACT-3 MCS is not a defined component. I recommend that systematic characterization of these three Dehalobacter isolates be performed after the mystery of ACT-3 MCS is solved and defined media can be used to grow them. Some characterization can be done earlier, such as sporulation tests, chemotaxis tests, Gram staining, electron microscopy, and substrate tests (different electron donors and acceptors). In substrate tests, good controls should be designed to account for situations in which certain substrates might be present in ACT-3 MCS. Other characterization work has to wait until a defined medium is available. According to the genome annotation of strain CF and strain DCA, these two strains should be able to synthesize all 20 amino acids and several vitamins, including vitamin B12, and to fix nitrogen from the headspace. Because of the dependence on ACT-3 MCS, it is difficult to test these hypotheses.

Isolation of other microbes in ACT-3 and its subcultures, and uncovering interspecies interactions between Dehalobacter and other community members

In ACT-3, the CF subculture, and the DCA subculture, a mixture of methanol, ethanol, and lactate was used as electron donors. Dehalobacter spp. cannot use these compounds as direct electron donors and must rely on other community members to ferment them into acetate and hydrogen (or formate). There are likely other interactions between Dehalobacter and other community members, as indicated by the dependence of the three Dehalobacter pure cultures on ACT-3 MCS. The fermenting organisms in ACT-3 and the CF subculture are of particular interest because they tolerate high levels (at least 50 mg/L) of 1,1,1-TCA and chloroform, which inhibit the growth of many microbes. In Chapter 7, we reported the assembly of the complete genome of Bacteroidales strain CF, the most abundant non-dechlorinating organism in ACT-3 and the CF subculture. This organism has not been isolated. I have obtained some isolates of non-dechlorinating organisms from the CF subculture by growing colonies in shake tubes and on agar plates. Unfortunately, Bacteroidales strain CF was not among them. I was unable to incorporate the description of these isolates in this thesis. Further characterization of the isolates of non-dechlorinating organisms will be performed in the future. Once isolates of both Dehalobacter and the fermenting organisms are available, co-culture experiments can be designed to uncover potential interactions between them. Finally, it would be interesting to see whether a minimal community consisting of these isolates can be reconstructed so that it behaves similarly to ACT-3.

De novo purification of RDase from dechlorinating cultures for protein crystallization

To date, no 3D protein structure of any RDase has been determined; therefore, the reaction sites and reaction mechanism of RDases remain unknown more than 20 years after the first demonstration of organohalide respiration. Efforts at heterologous expression of RDases have been unsuccessful. An easier approach to accumulating sufficient RDase of high purity for protein crystallization might be direct purification of RDases from dechlorinating cultures, but this approach seems problematic because dechlorinating cultures are generally slow-growing and of low cell density. Generally, dechlorinating organisms grow better in mixed cultures than in pure cultures, but the complexity of mixed cultures could complicate the downstream protein purification process. Surprisingly, I found that pure Dehalobacter cultures with ACT-3 MCS can dechlorinate faster than mixed cultures. This is reasonable because the growth rate of Dehalobacter in mixed cultures (such as ACT-3) might be limited by the rate at which fermenting organisms produce hydrogen or formate, but there is no such limitation in pure cultures, in which H2 or formate can be amended directly. How fast can such a pure culture grow? A 10^-7 dilution transfer of the DHB-12DCA/H2 culture into 100 mL ACT-3 MCS (with no fresh medium) dechlorinates 50 µL pure 1,2-DCA in less than 3 weeks. Low cell density is another potential concern; it may be possible to increase cell density in pure cultures by concentrating the ACT-3 MCS. This should be possible because ACT-3 MCS retains its stimulatory effect after autoclaving. If all these steps are feasible, the key to success would be to have enough ACT-3 MCS. This can be done by growing ACT-3 in a large reactor.
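
The dilution-transfer figures quoted above can be converted into rough numbers. The sketch below is back-of-the-envelope arithmetic only: it uses handbook values for the density (about 1.253 g/mL) and molar mass (98.96 g/mol) of 1,2-DCA, ignores partitioning into the headspace, and assumes, purely for illustration, that the diluted culture regrows to the cell density of its parent within the three weeks. That last assumption is not a measured result.

```python
import math

DENSITY_G_PER_ML = 1.253      # 1,2-DCA at about 20 °C (handbook value)
MOLAR_MASS_G_PER_MOL = 98.96  # C2H4Cl2

volume_dca_ml = 50e-3     # 50 µL of neat 1,2-DCA
culture_volume_l = 0.100  # 100 mL of ACT-3 MCS
dilution = 1e-7           # 10^-7 dilution transfer
days = 21                 # upper bound: "less than 3 weeks"

# Amount of 1,2-DCA consumed and its nominal aqueous concentration.
mmol_dca = volume_dca_ml * DENSITY_G_PER_ML / MOLAR_MASS_G_PER_MOL * 1000
nominal_mm = mmol_dca / culture_volume_l

# Doublings needed to undo the dilution (ASSUMES regrowth to parent density).
doublings = math.log2(1 / dilution)
max_doubling_time_h = days * 24 / doublings

print(f"1,2-DCA dechlorinated: {mmol_dca:.2f} mmol ({nominal_mm:.1f} mM nominal)")
print(f"Doublings required: {doublings:.1f}")
print(f"Implied doubling time: at most {max_doubling_time_h:.0f} h")
```

Under these assumptions the culture would need roughly 23 doublings in at most 21 days, which corresponds to a doubling time of less than one day. The estimate is only as good as the regrowth assumption and should be checked against direct cell counts or qPCR.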

References

Achaz, G., Boyer, F., Rocha, E.P., Viari, A., and Coissac, E. (2007) Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics 23: 119-121.

Adamson, D.T., and Parkin, G.F. (1999) Biotransformation of mixtures of chlorinated aliphatic hydrocarbons by an acetate-grown methanogenic enrichment culture. Water Res 33: 1482-1494.

Adamson, D.T., and Parkin, G.F. (2000) Impact of mixtures of chlorinated aliphatic hydrocarbons on a high-rate, tetraehloroethene-dechlorinating enrichment culture. Environ Sci Technol 34: 1959-1965.

Adrian, L., Szewzyk, U., Wecke, J., and Gorisch, H. (2000) Bacterial dehalorespiration with chlorinated benzenes. Nature 408: 580-583.

Adrian, L., Rahnenfuhrer, J., Gobom, J., and Hölscher, T. (2007) Identification of a chlorobenzene reductive dehalogenase in Dehalococcoides sp. strain CBDB1. Appl Environ Microbiol 73: 7717-7724.

Ahlert, R.C., and Enzminger, J.D. (1992) Anaerobic Processes for the Dechlorination of 1,1,1- Trichloroethane. J Environ Sci Health, Part A A27: 1675-1699.

Ahsanul Islam, M., Edwards, E.A., and Mahadevan, R. (2010) Characterizing the metabolism of Dehalococcoides with a constraint-based model. PLoS Comput Biol 6.

Alikhan, N.F., Petty, N.K., Ben Zakour, N.L., and Beatson, S.A. (2011) BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics 12: 402.

Amann, R.I., Ludwig, W., and Schleifer, K.H. (1995) Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev 59: 143-169.

Anantharaman, V., and Aravind, L. (2002) MOSC domains: ancient, predicted sulfur-carrier domains, present in diverse metal-sulfur cluster biosynthesis proteins including Molybdenum cofactor sulfurases. FEMS Microbiol Lett 207: 55-61.

Assefa, S., Keane, T.M., Otto, T.D., Newbold, C., and Berriman, M. (2009) ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics 25: 1968-1969.

ATSDR (2006). Toxicological Profile for 1,1,1-trichloroethane. URL http://www.atsdr.cdc.gov/toxprofiles/tp.asp?id=432&tid=76

ATSDR (2011). Toxicological Profile for Chloroform. URL http://www.atsdr.cdc.gov/toxprofiles/tp.asp?id=53&tid=16

Aziz, R.K., Bartels, D., Best, A.A., DeJongh, M., Disz, T., Edwards, R.A. et al. (2008) The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9: 75. 142

Bagley, D.M., Lalonde, M., Kaseros, V., Stasiuk, K.E., and Sleep, B.E. (2000) Acclimation of anaerobic systems to biodegrade tetrachloroethene in the presence of carbon tetrachloride and chloroform. Water Res 34: 171-178.

Bali, S., Lawrence, A.D., Lobo, S.A., Saraiva, L.M., Golding, B.T., Palmer, D.J. et al. (2011) Molecular hijacking of siroheme for the synthesis of heme and d1 heme. Proc Natl Acad Sci U S A 108: 18260-18265.

Bastida, F., Rosell, M., Franchini, A.G., Seifert, J., Finsterbusch, S., Jehmlich, J. et al. (2010) Elucidating MTBE degradation in a mixed consortium using a multidisciplinary approach. FEMS Microbiol Ecol 73: 370-384.

Bendtsen, J.D., Nielsen, H., Widdick, D., Palmer, T., and Brunak, S. (2005) Prediction of twin- arginine signal peptides. BMC Bioinf 6: 167.

Benndorf, D., Balcke, G.U., Harms, H., and von Bergen, M. (2007) Functional metaproteome analysis of protein extracts from contaminated soil and groundwater. ISME J 1: 224-234.

Boetzer, M., and Pirovano, W. (2012) Toward almost closed genomes with GapFiller. Genome Biol 13: R56.

Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D., and Pirovano, W. (2011) Scaffolding pre- assembled contigs using SSPACE. Bioinformatics 27: 578-579.

Borden, R.C. (2007) Concurrent bioremediation of perchlorate and 1,1,1-trichloroethane in an emulsified oil barrier. J Contam Hydrol 94: 13-33.

Bouwer, E.J., and McCarty, P.L. (1983) Transformations of 1- and 2-carbon halogenated aliphatic organic compounds under methanogenic conditions. Appl Environ Microbiol 45: 1286- 1294.

Boyer, A., Page-BeLanger, R., Saucier, M., Villemur, R., Lepine, F., Juteau, P., and Beaudet, R. (2003) Purification, cloning and sequencing of an enzyme mediating the reductive dechlorination of 2,4,6-trichlorophenol from Desulfitobacterium frappieri PCP-1. Biochem J 373: 297-303.

Bradford, M.M. (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72: 248-254.

Butler, J., MacCallum, I., Kleber, M., Shlyakhter, I.A., Belmonte, M.K., Lander, E.S. et al. (2008) ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res 18: 810-820.

Cappelletti, M., Frascari, D., Zannoni, D., and Fedi, S. (2012) Microbial degradation of chloroform. Appl Microbiol Biotechnol 96: 1395-1409.

Chain, P.S.G., Grafham, D.V., Fulton, R.S., FitzGerald, M.G., Hostetler, J., Muzny, D. et al. (2009) Genome project standards in a new era of sequencing. Science 326: 236-237.

143

Chan, W.W., Grostern, A., Löffler, F.E., and Edwards, E.A. (2011) Quantifying the effects of 1,1,1-trichloroethane and 1,1-dichloroethane on chlorinated ethene reductive dehalogenases. Environ Sci Technol 45: 9693-9702.

Chan, W.W.M. (2009) Characterization of reductive dehalogenases in a chlorinated ethene- degrading bioaugmentation culture. In Chemical Engineering and Applied Chemistry. Toronto, ON, Canada: University of Toronto.

Chen, C., Ballapragada, B.S., Puhakka, J.A., Strand, S.E., and Ferguson, J.F. (1999) Anaerobic transformation of 1,1,1-trichloroethane by municipal digester sludge. Biodegradation 10: 297- 305.

Cheng, D., and He, J. (2009) Isolation and characterization of "Dehalococcoides" sp. strain MB, which dechlorinates tetrachloroethene to trans-1,2-dichloroethene. Appl Environ Microbiol 75: 5910-5918.

Chow, W.L., Cheng, D., Wang, S., and He, J. (2010) Identification and transcriptional analysis of trans-DCE-producing reductive dehalogenases in Dehalococcoides species. ISME J 4: 1020- 1030.

Christiansen, N., Ahring, B.K., Wohlfarth, G., and Diekert, G. (1998) Purification and characterization of the 3-chloro-4-hydroxy-phenylacetate reductive dehalogenase of Desulfitobacterium hafniense. FEBS Lett 436: 159-162.

Chung, J., and Rittmann, B.E. (2007) Bio-reductive dechlorination of 1,1,1-trichloroethane and chloroform using a hydrogen-based membrane biofilm reactor. Biotechnol Bioeng 97: 52-60.

Darling, A.E., Mau, B., and Perna, N.T. (2010) progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5: e11147. de Best, J.H., Hage, A., Doddema, H.J., Janssen, D.B., and Harder, W. (1999) Complete transformation of 1,1,1-trichloroethane to chloroethane by a methanogenic mixed population. Appl Microbiol Biotechnol 51: 277-283. de Best, J.H., Jongema, H., Weijling, A., Doddema, H.J., Janssen, D.B., and Harder, W. (1997) Transformation of 1,1,1-trichloroethane in an anaerobic packed-bed reactor at various concentrations of 1,1,1-trichloroethane, acetate and sulfate. Appl Microbiol Biotechnol 48: 417- 423.

De Wildeman, S., and Verstraete, W. (2003) The quest for microbial reductive dechlorination of C(2) to C(4) chloroalkanes is warranted. Appl Microbiol Biotechnol 61: 94-102.

Deipser, A., and Stegmann, R. (1997) Biological degradation of VCCs and CFCs under simulated anaerobic landfill conditions in laboratory test digesters. Environ Sci Pollut Res 4: 209-216.

Delcher, A.L., Bratke, K.A., Powers, E.C., and Salzberg, S.L. (2007) Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23: 673-679. 144

Doherty, R.E. (2000a) A history of the production and use of carbon tetrachloride, , and 1,1,1-trichloroethane in the United States: Part 1 - Historical background; carbon tetrachloride and tetrachloroethylene. Environ Forensics 1: 69-81.

Doherty, R.E. (2000b) A history of the production and use of carbon tetrachloride, tetrachloroethylene, trichloroethylene and 1,1,1-trichloroethane in the United States: Part 2: trichloroethylene and 1,1,1-trichloroethane Environ Forensics 1: 83-93.

Dolfing, J. (1990) Reductive Dechlorination of 3-Chlorobenzoate Is Coupled to Atp Production and Growth in an Anaerobic Bacterium, Strain Dcb-1. Arch Microbiol 153: 264-266.

Drummond, A.J., Ashton, B., Buxton, S., Cheung, M., Cooper, A., Duran, C. et al. (2011) Geneious v5.4.2. Biomatters, Ltd., Auckland, New Zealand.

Duchesneau, M.N., Workman, R., Baddour, F.R., and Dennis, P. (2007) Combined Dehalobacter and Dehalococcoides Bioaugmentation for Bioremediation of 1,1,1-Trichloroethane and Chlorinated Ethenes. In International In situ and on-site bioremediation symposium. Baltimore, Maryland: Battelle Press. ISBN 978-1-57477-161-9.

Duhamel, M., and Edwards, E.A. (2006) Microbial composition of chlorinated ethene-degrading cultures dominated by Dehalococcoides. FEMS Microbiol Ecol 58: 538-549.

Duhamel, M., and Edwards, E.A. (2007) Growth and yields of dechlorinators, acetogens, and methanogens during reductive dechlorination of chlorinated ethenes and dihaloelimination of 1,2-dichloroethane. Environ Sci Technol 41: 2303-2310.

Duhamel, M., Mo, K., and Edwards, E.A. (2004) Characterization of a highly enriched Dehalococcoides-containing culture that grows on vinyl chloride and trichloroethene. Appl Environ Microbiol 70: 5538-5545.

Duhamel, M., Wehr, S.D., Yu, L., Rizvi, H., Seepersad, D., Dworatzek, S. et al. (2002) Comparison of anaerobic dechlorinating enrichment cultures maintained on tetrachloroethene, trichloroethene, cis-dichloroethene and vinyl chloride. Water Res 36: 4193-4202.

Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.

Edwards, E.A., and Grbić-Galić, D. (1994) Anaerobic degradation of toluene and o-xylene by a methanogenic consortium. Appl Environ Microbiol 60: 313-322.

Egli, C., Scholtz, R., Cook, A.M., and Leisinger, T. (1987) Anaerobic Dechlorination of Tetrachloromethane and 1,2-Dichloroethane to Degradable Products by Pure Cultures of Desulfobacterium Sp and Methanobacterium Sp. FEMS Microbiol Lett 43: 257-261.

Elhafi, G., Naylor, C.J., Savage, C.E., and Jones, R.C. (2004) Microwave or autoclave treatments destroy the infectivity of infectious bronchitis virus and avian pneumovirus but allow detection by reverse transcriptase-polymerase chain reaction. Avian Pathol 33: 303-306.

145

ESTCP (2006). Bioaugmentation for remediation of chlorinated ethenes: technology development, status, and research needs. URL http://www.estcp.org/Technology/upload/BioaugChlorinatedSol.pdf

Ettwig, K.F., Butler, M.K., Le Paslier, D., Pelletier, E., Mangenot, S., Kuypers, M.M. et al. (2010) Nitrite-driven anaerobic methane oxidation by oxygenic bacteria. Nature 464: 543-548.

Fletcher, K.E., Löffler, F.E., Richnow, H.H., and Nijenhuis, I. (2009) Stable carbon isotope fractionation of 1,2-dichloropropane during dichloroelimination by Dehalococcoides populations. Environ Sci Technol 43: 6915-6919.

Freeman, D., Lasecki, M., Hasahsham, S., and Scholze, R. (1995) Accelerated Biotransformation of Carbon Tetrachloride and Chloroform by Sulfate-Reducing Enrichment Cultures. In Bioremediation of Chlorinated Solvents. Hinchee, R.E., Leeson, A., and Semprini, L. (eds). Columbus, OH, USA: Batelle Press, pp. 123-138.

Futagami, T., Goto, M., and Furukawa, K. (2008) Biochemical and genetic bases of dehalorespiration. Chem Rec 8: 1-12.

Galli, R., and Mccarty, P.L. (1989) Biotransformation of 1,1,1-Trichloroethane, Trichloromethane, and Tetrachloromethane by a Clostridium Sp. Appl Environ Microbiol 55: 837-844.

García Martín, H., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C. et al. (2006) Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24: 1263-1269.

Gauthier, A., Beaudet, R., Lepine, F., Juteau, P., and Villemur, R. (2006) Occurrence and expression of crdA and cprA5 encoding chloroaromatic reductive dehalogenases in Desulfitobacterium strains. Can J Microbiol 52: 47-55.

Gerkens, R.R., and Franklin, J.A. (1989) The Rate of Degradation of 1,1,1-Trichloroethane in Water by Hydrolysis and Dehydrochlorination. Chemosphere 19: 1929-1937.

Gordon, D., Desmarais, C., and Green, P. (2001) Automated finishing with Autofinish. Genome Res 11: 614-625.

Gronow, S., Held, B., Lucas, S., Lapidus, A., Del Rio, T.G., Nolan, M. et al. (2011) Complete genome sequence of Bacteroides salanitronis type strain (BL78). Stand Genomic Sci 4: 191-199.

Grostern, A. (2009) Investigation of community dynamics and dechlorination processes in chlorinated ethane-degrading microbial cultures. In: University of Toronto, 2009., p. 2 leaves.

Grostern, A., and Edwards, E.A. (2006a) Growth of Dehalobacter and Dehalococcoides spp. during degradation of chlorinated ethanes. Appl Environ Microbiol 72: 428-436.

146

Grostern, A., and Edwards, E.A. (2006b) A 1,1,1-trichloroethane-degrading anaerobic mixed microbial culture enhances biotransformation of mixtures of chlorinated ethenes and ethanes. Appl Environ Microbiol 72: 7849-7856.

Grostern, A., and Edwards, E.A. (2009) Characterization of a Dehalobacter coculture that dechlorinates 1,2-dichloroethane to ethene and identification of the putative reductive dehalogenase gene. Appl Environ Microbiol 75: 2684-2693.

Grostern, A., Duhamel, M., Dworatzek, S., and Edwards, E.A. (2010) Chloroform respiration to dichloromethane by a Dehalobacter population. Environ Microbiol 12: 1053-1060.

Guerrero-Barajas, C., and Field, J.A. (2005) Riboflavin- and cobalamin-mediated biodegradation of chloroform in a methanogenic consortium. Biotechnol Bioeng 89: 539-550.

Guindon, S., and Gascuel, O. (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52: 696-704.

Häggblom, M.M., and Bossert, I.D. (2003) Dehalogenation: microbial processes and environmental applications. Boston: Kluwer Academic Publishers.

Harper, D.B. (2000) The global chloromethane cycle: biosynthesis, biodegradation and metabolic role. Nat Prod Rep 17: 337-348.

He, J., Ritalahti, K.M., Yang, K.L., Koenigsberg, S.S., and Löffler, F.E. (2003) Detoxification of vinyl chloride to ethene coupled to growth of an anaerobic bacterium. Nature 424: 62-65.

He, J., Sung, Y., Krajmalnik-Brown, R., Ritalahti, K.M., and Löffler, F.E. (2005) Isolation and characterization of Dehalococcoides sp. strain FL2, a trichloroethene (TCE)- and 1,2- dichloroethene-respiring anaerobe. Environ Microbiol 7: 1442-1450.

Hill, C.W., and Gray, J.A. (1988) Effects of chromosomal inversion on cell fitness in Escherichia coli K-12. Genetics 119: 771-778.

Hoekstra, N.L., A.; Verheij, T.; Dijkhuis, J.; Slenders, H. (2005) Enhanced microbial degradation of chloroethenes and chloroethanes in a bioscreen. In 1st, International symposium on permeable reactive barriers. Wallingford, England: International Association of Hydrological Sciences, pp. 75-81.

Holliger, C., Schraa, G., Stams, A.J.M., and Zehnder, A.J.B. (1990) Reductive dechlorination of 1,2-dichloroethane and chloroethane by cell suspensions of methanogenic bacteria. Biodegradation 1: 253-261.

Holliger, C., Schraa, G., Stams, A.J., and Zehnder, A.J. (1993) A highly purified enrichment culture couples the reductive dechlorination of tetrachloroethene to growth. Appl Environ Microbiol 59: 2991-2997.

Holliger, C., Hahn, D., Harmsen, H., Ludwig, W., Schumacher, W., Tindall, B. et al. (1998) Dehalobacter restrictus gen. nov. and sp. nov., a strictly anaerobic bacterium that reductively 147

dechlorinates tetra- and trichloroethene in an anaerobic respiration. Arch Microbiol 169: 313- 321.

Hölscher, T., Görisch, H., and Adrian, L. (2003) Reductive dehalogenation of chlorobenzene congeners in cell extracts of Dehalococcoides sp. strain CBDB1. Appl Environ Microbiol 69: 2999-3001.

Huang, W., and Marth, G. (2008) EagleView: a genome assembly viewer for next-generation sequencing technologies. Genome Res 18: 1538-1543.

Hug, L. (2012) A metagenome-based examination of dechlorinating enrichment cultures: Dehalococcoides and the role of the non-dechlorinating organism. In Cell and Systems Biology. Toronto, ON, Canada: University of Toronto.

Hug, L.A., Maphosa, F., Leys, D., Löffler, F.E., Smidt, H., Edwards, E.A., and Adrian, L. (2013) Overview of organohalide-respiring bacteria and a proposal for a classification system for reductive dehalogenases. Philos Trans R Soc Lond B Biol Sci 368: 20120322.

Ishida, T., Yu, L., Akutsu, H., Ozawa, K., Kawanishi, S., Seto, A. et al. (1998) A primitive pathway of porphyrin biosynthesis and enzymology in Desulfovibrio vulgaris. Proc Natl Acad Sci U S A 95: 4853-4858.

Iverson, V., Morris, R.M., Frazar, C.D., Berthiaume, C.T., Morales, R.L., and Armbrust, E.V. (2012) Untangling genomes from metagenomes: revealing an uncultured Class of marine Euryarchaeota. Science 335: 587-590.

Jackson, R.E. (2004) Recognizing emerging environmental problems - The case of chlorinated solvents in groundwater. Technol Cult 49: 55-79.

Justicia-Leon, S.D., Ritalahti, K.M., Mack, E.E., and Löffler, F.E. (2012) Dichloromethane fermentation by a Dehalobacter sp. in an enrichment culture derived from pristine river sediment. Appl Environ Microbiol 78: 1288-1291.

Kellner, H., Jehmlich, N., Benndorf, D., Hoffmann, R., Rühl, M., Hoegger, P.J. et al. (2007) Detection, quantification and identification of fungal extracellular laccases using polyclonal antibody and mass spectrometry. Enzyme Microb Technol 41: 694-701.

Keuning, S., Janssen, D.B., and Witholt, B. (1985) Purification and Characterization of Hydrolytic Haloalkane Dehalogenase from Xanthobacter-Autotrophicus Gj10. J Bacteriol 163: 635-639.

Kim, S.H., Harzman, C., Davis, J.K., Hutcheson, R., Broderick, J.B., Marsh, T.L., and Tiedje, J.M. (2012) Genome sequence of Desulfitobacterium hafniense DCB-2, a Gram-positive anaerobe capable of dehalogenation and metal reduction. BMC Microbiol 12: 21.

Kingsford, C., Schatz, M.C., and Pop, M. (2010) Assembly complexity of prokaryotic genomes using short reads. BMC Bioinf 11: 21.

148

Kisker, C., Schindelin, H., and Rees, D.C. (1997) Molybdenum-cofactor-containing enzymes: structure and mechanism. Annu Rev Biochem 66: 233-267.

Koons, B.W., Baeseman, J.L., and Novak, P.J. (2001) Investigation of cell exudates active in carbon tetrachloride and chloroform degradation. Biotechnol Bioeng 74: 12-17.

Koren, S., Treangen, T.J., and Pop, M. (2011) Bambus 2: scaffolding metagenomes. Bioinformatics 27: 2964-2971.

Krajmalnik-Brown, R., Sung, Y., Ritalahti, K.M., Saunders, F.M., and Löffler, F.E. (2007) Environmental distribution of the trichloroethene reductive dehalogenase gene (tceA) suggests lateral gene transfer among Dehalococcoides. FEMS Microbiol Ecol 59: 206-214.

Krajmalnik-Brown, R., Hölscher, T., Thomson, I.N., Saunders, F.M., Ritalahti, K.M., and Löffler, F.E. (2004) Genetic identification of a putative vinyl chloride reductase in Dehalococcoides sp. strain BAV1. Appl Environ Microbiol 70: 6347-6351.

Krasotkina, J., Walters, T., Maruya, K.A., and Ragsdale, S.W. (2001) Characterization of the B- 12- and iron-sulfur-containing reductive dehalogenase from Desulfitobacterium chlororespirans. J Biol Chem 276: 40991-40997.

Krautler, B., Fieber, W., Ostermann, S., Fasching, M., Ongania, K.H., Gruber, K. et al. (2003) The cofactor of tetrachloroethene reductive dehalogenase of Dehalospirillum multivorans is norpseudo-B-12, a new type of a natural corrinoid. Helvetica Chimica Acta 86: 3698-3716.

Kube, M., Beck, A., Zinder, S.H., Kuhl, H., Reinhardt, R., and Adrian, L. (2005) Genome sequence of the chlorinated compound-respiring bacterium Dehalococcoides species strain CBDB1. Nat Biotechnol 23: 1269-1273.

Lapidus, A., Labutti, K., Foster, B., Lowry, S., Trong, S., and Goltsman, E. (2008) POLISHER: An effective tool for using ultra short reads in microbial genome assembly and finishing. In Advances in Genome Biology and Technology. Marco Island, FL.

Laserson, J., Jojic, V., and Koller, D. (2011) Genovo: De novo assembly for metagenomes. J Comput Biol 18: 429-443.

Laturnus, F., Haselmann, K.F., Borch, H., and Grøn, C. (2002) Terrestrial natural sources of trichloromethane (chloroform, CHCl3) - An overview. Biogeochemistry 60: 19.

Laughton, P.M., and Robertson, R.E. (1959) Solvolysis in Hydrogen and Deuterium Oxide: 3. Alkyl Halides. Can J Chem 37: 1491-1497.

Layer, G., Reichelt, J., Jahn, D., and Heinz, D.W. (2010) Structure and function of enzymes in heme biosynthesis. Protein Sci 19: 1137-1161.

Lee, M., Low, A., Zemb, O., Koenig, J., Michaelsen, A., and Manefield, M. (2012) Complete chloroform dechlorination by organochlorine respiration and fermentation. Environ Microbiol 14: 883-894.

Lendvay, J.M., Löffler, F.E., Dollhopf, M., Aiello, M.R., Daniels, G., Fathepure, B.Z. et al. (2003) Bioreactive barriers: A comparison of bioaugmentation and biostimulation for chlorinated solvent remediation. Environ Sci Technol 37: 1422-1431.

Löffler, F.E., and Edwards, E.A. (2006) Harnessing microbial activities for environmental cleanup. Curr Opin Biotechnol 17: 274-284.

Löffler, F.E., Yan, J., Ritalahti, K.M., Adrian, L., Edwards, E.A., Konstantinidis, K.T. et al. (2012) Dehalococcoides mccartyi gen. nov., sp. nov., obligate organohalide-respiring anaerobic bacteria, relevant to halogen cycling and bioremediation, belong to a novel bacterial class, Dehalococcoidetes classis nov., within the phylum Chloroflexi. Int J Syst Evol Microbiol.

Mabey, W., and Mill, T. (1978) Critical review of hydrolysis of organic compounds in water under environmental conditions. J Phys Chem Ref Data 7: 383-415.

Magnuson, J.K., Romine, M.F., Burris, D.R., and Kingsley, M.T. (2000) Trichloroethene reductive dehalogenase from Dehalococcoides ethenogenes: sequence of tceA and substrate range characterization. Appl Environ Microbiol 66: 5141-5147.

Magnuson, J.K., Stern, R.V., Gossett, J.M., Zinder, S.H., and Burris, D.R. (1998) Reductive dechlorination of tetrachloroethene to ethene by a two-component enzyme pathway. Appl Environ Microbiol 64: 1270-1275.

Maillard, J., Regeard, C., and Holliger, C. (2005) Isolation and characterization of Tn-Dha1, a transposon containing the tetrachloroethene reductive dehalogenase of Desulfitobacterium hafniense strain TCE1. Environ Microbiol 7: 107-117.

Maillard, J., Schumacher, W., Vazquez, F., Regeard, C., Hagen, W.R., and Holliger, C. (2003) Characterization of the corrinoid iron-sulfur protein tetrachloroethene reductive dehalogenase of Dehalobacter restrictus. Appl Environ Microbiol 69: 4628-4638.

Major, D.W., McMaster, M.L., Cox, E.E., Edwards, E.A., Dworatzek, S.M., Hendrickson, E.R. et al. (2002) Field demonstration of successful bioaugmentation to achieve dechlorination of tetrachloroethene to ethene. Environ Sci Technol 36: 5106-5116.

Maphosa, F., van Passel, M.W., de Vos, W.M., and Smidt, H. (2012) Metagenome analysis reveals yet unexplored reductive dechlorinating potential of Dehalobacter sp. E1 growing in co-culture with Sedimentibacter sp. Environ Microbiol Rep 4: 604-616.

Marco-Urrea, E., Paul, S., Khodaverdi, V., Seifert, J., von Bergen, M., Kretzschmar, U., and Adrian, L. (2011) Identification and characterization of a re-citrate synthase in Dehalococcoides strain CBDB1. J Bacteriol 193: 5171-5178.

Mardis, E., McPherson, J., Martienssen, R., Wilson, R.K., and McCombie, W.R. (2002) What is finished, and why does it matter. Genome Res 12: 669-671.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376-380.

Markowitz, V.M., Mavromatis, K., Ivanova, N.N., Chen, I.M., Chu, K., and Kyrpides, N.C. (2009) IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25: 2271-2278.

Marzorati, M., de Ferra, F., Van Raemdonck, H., Borin, S., Allifranchini, E., Carpani, G. et al. (2007) A novel reductive dehalogenase, identified in a contaminated groundwater enrichment culture and in Desulfitobacterium dichloroeliminans strain DCA1, is linked to dehalogenation of 1,2-dichloroethane. Appl Environ Microbiol 73: 2990-2999.

Maymó-Gatell, X., Anguish, T., and Zinder, S.H. (1999) Reductive dechlorination of chlorinated ethenes and 1, 2-dichloroethane by "Dehalococcoides ethenogenes" 195. Appl Environ Microbiol 65: 3108-3113.

McMurdie, P.J., Behrens, S.F., Holmes, S., and Spormann, A.M. (2007) Unusual codon bias in vinyl chloride reductase genes of Dehalococcoides species. Appl Environ Microbiol 73: 2744-2747.

McMurdie, P.J., Hug, L.A., Edwards, E.A., Holmes, S., and Spormann, A.M. (2011) Site-specific mobilization of vinyl chloride respiration islands by a mechanism common in Dehalococcoides. BMC Genomics 12: 287.

McMurdie, P.J., Behrens, S.F., Muller, J.A., Goke, J., Ritalahti, K.M., Wagner, R. et al. (2009) Localized plasticity in the streamlined genomes of vinyl chloride respiring Dehalococcoides. PLoS Genet 5: e1000714.

Meek, M.E., Beauchamp, R., Long, G., Moir, D., Turner, L., and Walker, M. (2002) Chloroform: exposure estimation, hazard characterization, and exposure-response analysis. J Toxicol Environ Health B Crit Rev 5: 283-334.

Miller, E., Wohlfarth, G., and Diekert, G. (1998) Purification and characterization of the tetrachloroethene reductive dehalogenase of strain PCE-S. Arch Microbiol 169: 497-502.

Mohn, W.W., and Tiedje, J.M. (1990) Strain Dcb-1 Conserves Energy for Growth from Reductive Dechlorination Coupled to Formate Oxidation. Arch Microbiol 153: 267-271.

Morris, R.M., Sowell, S., Barofsky, D., Zinder, S., and Richardson, R. (2006) Transcription and mass-spectroscopic proteomic studies of electron transport oxidoreductases in Dehalococcoides ethenogenes. Environ Microbiol 8: 1499-1509.

Morris, R.M., Fung, J.M., Rahm, B.G., Zhang, S., Freedman, D.L., Zinder, S.H., and Richardson, R.E. (2007) Comparative proteomics of Dehalococcoides spp. reveals strain-specific peptides associated with activity. Appl Environ Microbiol 73: 320-326.

Müller, J.A., Rosner, B.M., von Abendroth, G., Meshulam-Simon, G., McCarty, P.L., and Spormann, A.M. (2004) Molecular identification of the catabolic vinyl chloride reductase from Dehalococcoides sp. strain VS and its environmental distribution. Appl Environ Microbiol 70: 4880-4888.

Nelson, J.L., Fung, J.M., Cadillo-Quiroz, H., Cheng, X., and Zinder, S.H. (2011) A role for Dehalobacter spp. in the reductive dehalogenation of dichlorobenzenes and monochlorobenzene. Environ Sci Technol 45: 6806-6813.

Nesterenko, M.V., Tilley, M., and Upton, S.J. (1994) A simple modification of Blum's silver stain method allows for 30 minute detection of proteins in polyacrylamide gels. J Biochem Biophys Methods 28: 239-242.

Neumann, A., Wohlfarth, G., and Diekert, G. (1996) Purification and characterization of tetrachloroethene reductive dehalogenase from Dehalospirillum multivorans. J Biol Chem 271: 16515-16519.

Neumann, A., Wohlfarth, G., and Diekert, G. (1998) Tetrachloroethene dehalogenase from Dehalospirillum multivorans: cloning, sequencing of the encoding genes, and expression of the pceA gene in Escherichia coli. J Bacteriol 180: 4140-4145.

Ni, S.S., Fredrickson, J.K., and Xun, L.Y. (1995) Purification and Characterization of a Novel 3-Chlorobenzoate-Reductive Dehalogenase from the Cytoplasmic Membrane of Desulfomonile tiedjei DCB-1. J Bacteriol 177: 5135-5139.

Nonaka, H., Keresztes, G., Shinoda, Y., Ikenaga, Y., Abe, M., Naito, K. et al. (2006) Complete genome sequence of the dehalorespiring bacterium Desulfitobacterium hafniense Y51 and comparison with Dehalococcoides ethenogenes 195. J Bacteriol 188: 2262-2274.

Okeke, B.C., Chang, Y.C., Hatsu, M., Suzuki, T., and Takamizawa, K. (2001) Purification, cloning, and sequencing of an enzyme mediating the reductive dechlorination of tetrachloroethylene (PCE) from Clostridium bifermentans DPH-1. Can J Microbiol 47: 448-456.

Olivas, Y., Dolfing, J., and Smith, G.B. (2002) The influence of redox potential on the degradation of halogenated methanes. Environ Toxicol Chem 21: 493-499.

Osburn, M.R., Sessions, A.L., Pepe-Ranney, C., and Spear, J.R. (2011) Hydrogen-isotopic variability in fatty acids from Yellowstone National Park hot spring microbial communities. Geochimica et Cosmochimica Acta 75: 16.

Parsons, F., and Lage, G.B. (1985) Chlorinated Organics in Simulated Groundwater Environments. J Am Water Works Ass 77: 52-59.

Peng, Y., Leung, H.C.M., Yiu, S.M., and Chin, F.Y.L. (2011) Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 27: I94-I101.

Pop, S.M., Kolarik, R.J., and Ragsdale, S.W. (2004) Regulation of anaerobic dehalorespiration by the transcriptional activator CprK. J Biol Chem 279: 49910-49918.

Prestridge, D.S. (1991) SIGNAL SCAN: a computer program that scans DNA sequences for eukaryotic transcriptional elements. Comput Appl Biosci 7: 203-206.

Rahm, B.G., and Richardson, R.E. (2008) Correlation of respiratory gene expression levels and pseudo-steady-state PCE respiration rates in Dehalococcoides ethenogenes. Environ Sci Technol 42: 416-421.

Rahm, B.G., Morris, R.M., and Richardson, R.E. (2006) Temporal expression of respiratory genes in an enrichment culture containing Dehalococcoides ethenogenes. Appl Environ Microbiol 72: 5486-5491.

Rebollo, R., Romanish, M.T., and Mager, D.L. (2012) Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu Rev Genet 46: 21-42.

Reese, M.G. (2001) Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput Chem 26: 51-56.

Ritalahti, K.M., and Löffler, F.E. (2004) Populations implicated in anaerobic reductive dechlorination of 1,2-dichloropropane in highly enriched bacterial communities. Appl Environ Microbiol 70: 4088-4095.

Rosenthal, S.L. (1987) A review of the mutagenicity of chloroform. Environ Mol Mutagen 10: 211-226.

Rupakula, A., Kruse, T., Boeren, S., Holliger, C., Smidt, H., and Maillard, J. (2013) The restricted metabolism of the obligate organohalide respiring bacterium Dehalobacter restrictus: lessons from tiered functional genomics. Philos Trans R Soc Lond B Biol Sci 368: 20120325.

Schlotelburg, C., Wintzingerode, C., Hauck, R., Wintzingerode, F., Hegemann, W., and Gobel, U.B. (2002) Microbial structure of an anaerobic bioreactor population that continuously dechlorinates 1,2-dichloropropane. FEMS Microbiol Ecol 39: 229-237.

Scholtz, R., Schmuckle, A., Cook, A.M., and Leisinger, T. (1987) Degradation of 18 1-Monohaloalkanes by Arthrobacter sp. Strain HA1. J Gen Microbiol 133: 267-274.

Schumacher, W., and Holliger, C. (1996) The proton/electron ratio of the menaquinone-dependent electron transport from dihydrogen to tetrachloroethene in "Dehalobacter restrictus". J Bacteriol 178: 2328-2333.

Seshadri, R., Adrian, L., Fouts, D.E., Eisen, J.A., Phillippy, A.M., Methe, B.A. et al. (2005) Genome sequence of the PCE-dechlorinating bacterium Dehalococcoides ethenogenes. Science 307: 105-108.

Siddaramappa, S., Challacombe, J.F., Delano, S.F., Green, L.D., Daligault, H., Bruce, D. et al. (2012) Complete genome sequence of Dehalogenimonas lykanthroporepellens type strain (BL-DC-9(T)) and comparison to "Dehalococcoides" strains. Stand Genomic Sci 6: 251-264.

Simmons, S.L., Dibartolo, G., Denef, V.J., Goltsman, D.S., Thelen, M.P., and Banfield, J.F. (2008) Population genomic analysis of strain variation in Leptospirillum group II bacteria involved in acid mine drainage formation. PLoS Biol 6: e177.

Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., and Birol, I. (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19: 1117-1123.

Sjuts, H., Fisher, K., Dunstan, M.S., Rigby, S.E., and Leys, D. (2012) Heterologous expression, purification and cofactor reconstitution of the reductive dehalogenase PceA from Dehalobacter restrictus. Protein Expr Purif 85: 224-229.

Skovran, E., and Downs, D.M. (2003) Lack of the ApbC or ApbE protein results in a defect in Fe-S cluster metabolism in Salmonella enterica serovar Typhimurium. J Bacteriol 185: 98-106.

Smidt, H., and de Vos, W.M. (2004) Anaerobic microbial dehalogenation. Annu Rev Microbiol 58: 43-73.

Smidt, H., van Leest, M., van der Oost, J., and de Vos, W.M. (2000) Transcriptional regulation of the cpr gene cluster in ortho-chlorophenol-respiring Desulfitobacterium dehalogenans. J Bacteriol 182: 5683-5691.

Suidan, M.T., Wuellner, A.M., and Boyer, T.K. (1991) Anaerobic Treatment of a High-Strength Industrial-Waste Bearing Inhibitory Concentrations of 1,1,1-Trichloroethane. Water Sci Technol 23: 1385-1393.

Sun, B.L., Griffin, B.M., Ayala-del-Rio, H.L., Hashsham, S.A., and Tiedje, J.M. (2002) Microbial dehalorespiration with 1,1,1-trichloroethane. Science 298: 1023-1025.

Sung, Y., Ritalahti, K.M., Apkarian, R.P., and Löffler, F.E. (2006a) Quantitative PCR confirms purity of strain GT, a novel trichloroethene-to-ethene-respiring Dehalococcoides isolate. Appl Environ Microbiol 72: 1980-1987.

Sung, Y., Fletcher, K.E., Ritalahti, K.M., Apkarian, R.P., Ramos-Hernandez, N., Sanford, R.A. et al. (2006b) Geobacter lovleyi sp. nov. strain SZ, a novel metal-reducing and tetrachloroethene-dechlorinating bacterium. Appl Environ Microbiol 72: 2775-2782.

Suyama, A., Yamashita, M., Yoshino, S., and Furukawa, K. (2002) Molecular characterization of the PceA reductive dehalogenase of Desulfitobacterium sp. strain Y51. J Bacteriol 184: 3419-3425.

Tang, S., and Edwards, E.A. (2013) Identification of Dehalobacter reductive dehalogenases that catalyse dechlorination of chloroform, 1,1,1-trichloroethane and 1,1-dichloroethane. Philos Trans R Soc Lond B Biol Sci 368: 20120318.

Tang, S., Gong, Y., and Edwards, E.A. (2012) Semi-automatic in silico gap closure enabled de novo assembly of two Dehalobacter genomes from metagenomic data. PLoS One 7: e52038.

Tang, S., Chan, W.W., Fletcher, K.E., Seifert, J., Liang, X., Löffler, F.E. et al. (2013) Functional characterization of reductive dehalogenases by using blue native polyacrylamide gel electrophoresis. Appl Environ Microbiol 79: 974-981.

Thibodeau, J., Gauthier, A., Duguay, M., Villemur, R., Lepine, F., Juteau, P., and Beaudet, R. (2004) Purification, cloning, and sequencing of a 3,5-dichlorophenol reductive dehalogenase from Desulfitobacterium frappieri PCP-1. Appl Environ Microbiol 70: 4532-4537.

Tran, B.Q., Hernandez, C., Waridel, P., Potts, A., Barblan, J., Lisacek, F., and Quadroni, M. (2011) Addressing trypsin bias in large scale (phospho)proteome analysis by size exclusion chromatography and secondary digestion of large post-trypsin peptides. J Proteome Res 10: 800-811.

Treangen, T.J., Abraham, A.L., Touchon, M., and Rocha, E.P. (2009) Genesis, effects and fates of repeats in prokaryotic genomes. FEMS Microbiol Rev 33: 539-571.

Tsai, I.J., Otto, T.D., and Berriman, M. (2010) Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol 11.

Tsukagoshi, N., Ezaki, S., Uenaka, T., Suzuki, N., and Kurane, R. (2006) Isolation and transcriptional analysis of novel tetrachloroethene reductive dehalogenase gene from Desulfitobacterium sp. strain KBC1. Appl Microbiol Biotechnol 69: 543-553.

US-EPA (2010). Drinking water contaminants. URL http://www.epa.gov/safewater/contaminants/index.html

van de Pas, B.A., Gerritse, J., de Vos, W.M., Schraa, G., and Stams, A.J.M. (2001) Two distinct enzyme systems are responsible for tetrachloroethene and chlorophenol reductive dehalogenation in Desulfitobacterium strain PCE1. Arch Microbiol 176: 165-169.

van de Pas, B.A., Smidt, H., Hagen, W.R., van der Oost, J., Schraa, G., Stams, A.J.M., and de Vos, W.M. (1999) Purification and molecular characterization of ortho-chlorophenol reductive dehalogenase, a key enzyme of halorespiration in Desulfitobacterium dehalogenans. J Biol Chem 274: 20287-20292.

van Doesburg, W., van Eekert, M.H., Middeldorp, P.J., Balk, M., Schraa, G., and Stams, A.J. (2005) Reductive dechlorination of beta-hexachlorocyclohexane (beta-HCH) by a Dehalobacter species in coculture with a Sedimentibacter sp. FEMS Microbiol Ecol 54: 87-95.

Van Domselaar, G.H., Stothard, P., Shrivastava, S., Cruz, J.A., Guo, A., Dong, X. et al. (2005) BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res 33: W455-459.

Villemur, R., Lanthier, M., Beaudet, R., and Lepine, F. (2006) The Desulfitobacterium genus. FEMS Microbiol Rev 30: 706-733.

Villemur, R., Constant, P., Gauthier, A., Shareck, M., and Beaudet, R. (2007) Heterogeneity between 16S ribosomal RNA gene copies borne by one Desulfitobacterium strain is caused by different 100-200 bp insertions in the 5' region. Can J Microbiol 53: 116-128.

Vogel, T.M., Criddle, C.S., and McCarty, P.L. (1987) ES Critical Reviews: Transformations of halogenated aliphatic compounds. Environ Sci Technol 21: 722-736.

Vogel, T.M., and McCarty, P.L. (1987) Abiotic and Biotic Transformations of 1,1,1-Trichloroethane under Methanogenic Conditions. Environ Sci Technol 21: 1208-1213.

Wagner, D.D., Hug, L.A., Hatt, J.K., Spitzmiller, M.R., Padilla-Crespo, E., Ritalahti, K.M. et al. (2012) Genomic determinants of organohalide-respiration in Geobacter lovleyi, an unusual member of the Geobacteraceae. BMC Genomics 13: 200 [Epub ahead of print].

Waller, A.S. (2009) Molecular investigation of chloroethene reductive dehalogenation by the mixed microbial community KB1. In Chemical Engineering and Applied Chemistry. Toronto, ON, Canada: University of Toronto, p. 167.

Waller, A.S., Krajmalnik-Brown, R., Löffler, F.E., and Edwards, E.A. (2005) Multiple reductive-dehalogenase-homologous genes are simultaneously transcribed during dechlorination by Dehalococcoides-containing cultures. Appl Environ Microbiol 71: 8257-8264.

Warburton, P.E., Giordano, J., Cheung, F., Gelfand, Y., and Benson, G. (2004) Inverted repeat structure of the human genome: the X-chromosome contains a preponderance of large, highly homologous inverted repeats that contain testes genes. Genome Res 14: 1861-1869.

Ward, C.H., and Stroo, H.F. (2010) In situ remediation of chlorinated solvent plumes. SERDP and ESTCP remediation technology monograph series. New York: Springer.

Weathers, L.J., and Parkin, G.F. (1995) Metallic iron-enhanced biotransformation of carbon tetrachloride and chloroform under methanogenic conditions. Biorem Chlorinated Solvents 3: 117-122.

Weathers, L.J., and Parkin, G.F. (2000) Toxicity of chloroform biotransformation to methanogenic bacteria. Environ Sci Technol 34: 2764-2767.

Wild, A., Hermann, R., and Leisinger, T. (1996) Isolation of an anaerobic bacterium which reductively dechlorinates tetrachloroethene and trichloroethene. Biodegradation 7: 507-511.

Wittig, I., and Schägger, H. (2008) Features and applications of blue-native and clear-native electrophoresis. Proteomics 8: 3974-3990.

Wittig, I., Braun, H.P., and Schägger, H. (2006) Blue native PAGE. Nat Protoc 1: 418-428.

Yan, J., Ritalahti, K.M., Wagner, D.D., and Löffler, F.E. (2012) Unexpected specificity of interspecies cobamide transfer from Geobacter spp. to organohalide-respiring Dehalococcoides mccartyi strains. Appl Environ Microbiol 78: 6630-6636.

Yan, J., Im, J., Yang, Y., and Löffler, F.E. (2013) Guided cobalamin biosynthesis supports Dehalococcoides mccartyi reductive dechlorination activity. Philos Trans R Soc Lond B Biol Sci 368: 20120320.

Yang, C.-H.J. (1981) The effects of cyanide and chloroform toxicity on methane fermentation. In. Philadelphia, PA, USA: Drexel University, pp. xiii, 230 leaves.

Yoshida, N., Ye, L., Baba, D., and Katayama, A. (2009a) Reductive Dechlorination of Polychlorinated Biphenyls and Dibenzo-p-Dioxins in an Enrichment Culture Containing Dehalobacter Species. Microbes Environ 24: 343-346.

Yoshida, N., Ye, L., Baba, D., and Katayama, A. (2009b) A novel Dehalobacter species is involved in extensive 4,5,6,7-tetrachlorophthalide dechlorination. Appl Environ Microbiol 75: 2400-2405.

Yu, Z.T., and Smith, G.B. (2000) Inhibition of methanogenesis by C-1- and C-2-polychlorinated aliphatic hydrocarbons. Environ Toxicol Chem 19: 2212-2217.

Zerbino, D.R., and Birney, E. (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18: 821-829.

Appendix

Appendix A: Supplemental Information for Chapter 2

Figure A1 Distribution of 1,2-DCA dechlorinating activity on a BN-PAGE gel lane using protein extracts from a BAV1 culture grown on 1,2-DCA. Horizontal lines in the gel indicate positions where the gel was cut. The numbers show nmoles of ethene produced from 1,2-DCA in the activity assay. “+”: positive control with 10 µL crude protein extract. Figure A1 reports data similar to those in Figure A2, but this experiment yielded a more comprehensive distribution of activity along the lane. However, gel slices from the samples shown in Figure A2 were sent for LC-MS/MS analysis; therefore, those data were kept in the body of the paper.

Figure A2 Dechlorination assays with protein extracts and gel slices from the KB-1 culture. (A) VC-induced KB-1 culture; (B) TCE-induced KB-1 culture. “+”: positive control with 25 µL crude protein extract; “-”: negative control with a gel slice from a region without visible bands; “Buffer”: negative control with only the assay buffer and no cell extract.

Figure A3 The distribution of dechlorination activity on BN-PAGE gel lanes using protein extracts of the 1,2-DCA KB-1 subculture. Horizontal lines in the gel indicate positions where the gel was cut. The numbers show amounts of product(s) formed from different electron acceptors in the activity assay. “+”: positive control with 20 µL crude protein extract.

Figure A4 Enrichment of RDase activity after separation by BN-PAGE. The crude protein extract from a KB-1 culture grown on TCE was applied to the gel. The results of dechlorination activity assays are shown as nmoles of dechlorination product(s). “+”: positive control with 20 µL crude protein extract (7.4 µg total protein); “Buffer”: negative control with assay buffer but no cell extract. The specific activity and fold purification were determined by dividing the nmoles dechlorinated in 24 hours by the concentration of protein in the respective samples.

Table A1 Specific dechlorination rates determined by methyl viologen assays with crude protein extracts.

                  Estimated Specific Dechlorinating Activity (a)
                  (nmol · min^-1 · mg protein^-1)

Enzyme assay      Culture (growth substrate)
substrate         BAV1 (cis-DCE)    KB-1 (TCE)      KB-1 subculture (1,2-DCA)
PCE               0                 0.47 ± 0.02     0.14 ± 0.03
TCE               1.2 ± 0.1         7.0 ± 1.0       6.6 ± 1.2
cis-DCE           23 ± 4            16 ± 2          11 ± 2
trans-DCE         12 ± 1            0.57 ± 0.04     0.24 ± 0.04
1,1-DCE           32 ± 3            N/A             N/A
VC                0.94 ± 0.08       3.6 ± 0.8       0.56 ± 0.10
1,2-DCA           4.7 ± 0.5         6.0 ± 1.1       2.9 ± 0.5

(a) Calculated as nmol dechlorination products formed after 24 hours, divided by the time (1440 min) and by the protein content. Shown are results of triplicate samples ± SD. N/A: not measured.
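
The calculation described in the footnote to Table A1 can be sketched as follows. This is a minimal illustration and not the script used in this work: the function names are hypothetical, the input numbers are invented for the example, and the use of the sample standard deviation (n - 1) for triplicates is an assumption.

```python
from statistics import mean, stdev


def specific_activity(product_nmol: float, protein_mg: float,
                      minutes: float = 1440.0) -> float:
    """Return activity in nmol product per minute per mg protein.

    Follows the footnote: nmol of product formed, divided by the
    incubation time (1440 min = 24 h) and by the protein content.
    """
    if protein_mg <= 0 or minutes <= 0:
        raise ValueError("protein_mg and minutes must be positive")
    return product_nmol / minutes / protein_mg


def summarize(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample SD) of replicate activities (assumes n - 1)."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical triplicate: nmol product after 24 h, 0.01 mg protein each.
    replicates = [specific_activity(n, 0.01) for n in (70.0, 72.0, 74.0)]
    avg, sd = summarize(replicates)
    print(f"{avg:.2f} ± {sd:.2f} nmol/min/mg protein")
```

Keeping the time and protein normalization in one function makes it easy to apply the same definition to crude extracts and to gel slices.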

Table A2 Peptide hits and coverage of the RDases identified from the three KB-1 cultures (c).

Culture: TCE-induced KB-1
                                     Peptide Hits          Coverage (%)
RDase         Accession Number       TCE1  TCE2  TCE3      TCE1  TCE2  TCE3
KB1_VcrA      DCKB1_96900 (a)        39    59    61        36    36    42
KB1_GeobRD    393716494 (b)          8     15    17        17    28    26
KB1_BvcA      77176863 (b)           5     9     19        5.4   15    30
KB1_TceA      DCKB1_14890 (a)        3     3     0         5.4   5.5   0
KB1_RdhA5     DCKB1_110110 (a)       0     3     2         0     7.9   5.8

Culture: VC-induced KB-1
                                     Peptide Hits          Coverage (%)
RDase         Accession Number       VC1   VC2   VC3       VC1   VC2   VC3
KB1_VcrA      DCKB1_96900 (a)        48    64    108       42    45    48
KB1_BvcA      77176863 (b)           24    14    13        36    24    27
KB1_TceA      DCKB1_14890 (a)        4     6     9         7.9   7.9   8
KB1_GeobRD    393716494 (b)          3     4     9         8.2   11    18
KB1_RdhA5     DCKB1_110110 (a)       9     6     5         17    14    11
KB1_RdhA1     DCKB1_110270 (a)       3     3     0         7.3   7.3   0

Culture: 1,2-DCA KB-1 subculture
                                     Peptide Hits          Coverage (%)
RDase         Accession Number       DCA1  DCA2  DCA3      DCA1  DCA2  DCA3
KB1_VcrA      DCKB1_96900 (a)        8     16    9         24    33    29
KB1_TceA      DCKB1_14890 (a)        10    9     7         21    18    13
KB1_BvcA      77176863 (b)           2     2     0         5.2   4.7   0
KB1_RdhA5     DCKB1_110110 (a)       0     2     0         0     7.4   0

(a) IMG gene locus tag. (b) NCBI GI number. (c) For each culture, three adjacent BN-PAGE gel slices from the region of high dechlorinating activity were analyzed by LC-MS/MS. For the TCE-induced KB-1 culture, the slices were labeled TCE1, TCE2 and TCE3; for the VC-induced KB-1 culture, they were labeled VC1, VC2 and VC3; for the 1,2-DCA KB-1 subculture, they were labeled DCA1, DCA2 and DCA3.

Table A3 Proteins identified from the active BN-PAGE gel slices from the three KB-1 cultures. The MS spectra were searched against the KB-1 metagenome. For each culture, the results were produced from three consecutive gel slices covering the active region and each slice was analyzed by LC-MS/MS separately. RDase-related entries are highlighted in red.

Online link: http://aem.asm.org/content/suppl/2013/01/11/AEM.01873-12.DCSupplemental/zam003134081so1.pdf

Table A4 The custom RDase database (amino acid sequences) used as the reference database for RDase identification from the samples of the KB-1 cultures. The first 36 sequences belong to the RDases identified from the KB-1 cultures. The remaining sequences include 117 putative RDases from other Dehalococcoides strains and 65 putative RDases from other organisms. The IMG gene locus tag, NCBI gene ID or NCBI GI number was attached to the end of each sequence identifier. For example, the identifier “KB1_RdhA1 77176848” indicates that the RDase called KB1_RdhA1 has the NCBI GI number of 77176848.

Online link: http://aem.asm.org/content/suppl/2013/01/11/AEM.01873-12.DCSupplemental/zam003134081so1.pdf

Appendix B: Supplemental Information for Chapter 3

Figure B1 Microbial composition of each of the three mixed cultures determined by 16S rRNA gene pyrotag sequencing. The 16S rRNA genes were amplified with the primers 926f (AAACTYAAAKGAATTGACGG) and 1392r (ACGGGCGGTGTGTRC). The pretreatment of DNA samples, PCR amplification, 454 sequencing and phylogenetic classification of sequenced reads were all performed by the JGI. The proportions in this figure were determined based on the counts of the phylogenetic assignments of sequenced read clusters, which are shown in detail in Table B2.
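
The two pyrotag primers contain degenerate IUPAC bases (Y, K and R). As an illustration only, the sketch below shows one way to expand such a primer into a regular expression and to count the number of distinct sequences it represents. The helper names are hypothetical and this is not the JGI pipeline.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def degeneracy(primer: str) -> int:
    """Count the distinct non-degenerate sequences a primer represents."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base].strip("[]"))
    return total


if __name__ == "__main__":
    for name, seq in (("926f", "AAACTYAAAKGAATTGACGG"),
                      ("1392r", "ACGGGCGGTGTGTRC")):
        print(name, primer_to_regex(seq).pattern, degeneracy(seq))
```

Note that a reverse primer such as 1392r anneals to the opposite strand, so its reverse complement would be the sequence to search for on the forward strand.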

Figure B2 The re-construction of cfrA and dcrA genes. (a) The alignment of contig00534R (IMG locus tag, DHTCA2_contig00534) and contig12843R (IMG locus tag, DHTCA2_contig12843). (b) The alignment of contig11240R (IMG locus tag, DHTCA2_contig11240) and contig02325 (IMG locus tag, DHTCA2_contig02325). The highlighted variations were likely caused by the common homopolymer-induced sequencing error inherent to 454 technology. (c) The alignment of consensus 1 and consensus 2, revealing the presence of two clearly distinct rdhA genes. To confirm the presence of these two rdhA genes, PCR reactions with the primers rdhA_23f (AAGAGATTGTAGAAGCAGCGG) and rdhA_1383r (CTTAGTAAATGGGCAAGCAGC) were designed to distinguish the two genes. With DNA samples from the ACT-3 culture as templates, the resulting amplicons were cloned (TOPO TA cloning kit, Invitrogen) and sequenced. With DNA samples from the CF and DCA subcultures, the amplicons were sequenced directly without cloning. Two rdhA genes were recovered from the ACT-3 culture; one of them was found in the CF subculture and the other was found in the DCA subculture. (d) The alignment of the two polished rdhA genes. The PCR amplicons using rdhA_23f and rdhA_1383r did not cover the whole coding region as shown in (c); one additional variation at the 5’ edge was identified by mapping raw 454 sequencing reads against the two rdhA genes using the Geneious software.
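
Differences that arise only from 454 homopolymer errors change the length of a run of identical bases but not the order of the runs. The following sketch illustrates a simple heuristic for flagging such cases. It is an assumption-laden illustration with hypothetical function names, not the procedure used to polish the genes in Figure B2, and it cannot by itself prove that a difference is a sequencing error.

```python
from itertools import groupby


def collapse_homopolymers(seq: str) -> str:
    """Reduce every run of identical bases to a single base."""
    return "".join(base for base, _ in groupby(seq.upper()))


def differ_only_by_homopolymer_length(a: str, b: str) -> bool:
    """Return True if a and b differ, but only in homopolymer run lengths."""
    return a.upper() != b.upper() and (
        collapse_homopolymers(a) == collapse_homopolymers(b)
    )


if __name__ == "__main__":
    # Hypothetical fragments: the second has one extra G in a G-run.
    print(differ_only_by_homopolymer_length("ACGGGTA", "ACGGGGTA"))
    # Hypothetical fragments with a true substitution (G -> T).
    print(differ_only_by_homopolymer_length("ACGTA", "ACTTA"))
```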

Figure B3 The re-construction of cfrB and dcrB genes. (a) The alignment of three relevant contigs, contig02325F (IMG locus tag, DHTCA2_contig02325), contig18901R (IMG locus tag, DHTCA2_contig18901) and contig17912F (IMG locus tag, DHTCA2_contig17912). (b) The alignment of two re-constructed rdhB genes.

Figure B4 SDS-PAGE separation of protein samples extracted from BN-PAGE gel slices. This picture shows protein extracts from the CF subculture. The gel was stained by silver staining.

Figure B5 Gel showing PCR reaction results with primers distinguishing cfrA and dcrA genes in the three mixed cultures. Duplicate reactions were performed for each culture. “ACT-3”, template DNA from ACT-3 culture; “CF sub.”, template DNA from the CF subculture; “DCA sub.”, template DNA from DCA subculture; “Neg.”, negative control with no template DNA added.

Figure B6 Maximum likelihood phylogenetic tree of (putative) RDases retrieved from the ACT-3 metagenome. The alignment was generated using the MUSCLE algorithm, and the tree was generated using the PhyML plugin in Geneious under the WAG model of evolution. Bootstrap support values (from 100 bootstrap iterations) are indicated where greater than 50%. The scale bar represents the average number of substitutions per site. The accession numbers starting with “JX” are GenBank accession numbers; the others are IMG gene locus tags, which can be accessed through the IMG/M platform (http://img.jgi.doe.gov/cgi-bin/m/main.cgi).

Figure B7 Potential trypsin digestion sites of six putative RdhB proteins. The basic residues are highlighted in bold. The two peptides from CfrB and DcrB that were detected by LC-MS/MS are underlined.
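
Potential trypsin sites such as those marked in Figure B7 can be located in silico. The sketch below assumes the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P); the proline exception, the function names and the example sequence are assumptions for illustration and are not taken from this work.

```python
def trypsin_sites(protein: str) -> list[int]:
    """Return cleavage positions (number of residues before each cut)."""
    seq = protein.upper()
    return [
        i + 1
        for i, residue in enumerate(seq[:-1])
        if residue in "KR" and seq[i + 1] != "P"
    ]


def digest(protein: str) -> list[str]:
    """Split a protein into peptides at the predicted trypsin sites."""
    seq = protein.upper()
    peptides, start = [], 0
    for site in trypsin_sites(seq):
        peptides.append(seq[start:site])
        start = site
    peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence, not one of the RdhB proteins in Figure B7.
    print(digest("MKAARPLKG"))
```

Proteins with few basic residues yield few and long tryptic peptides, which is one reason such proteins can be hard to detect by LC-MS/MS.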

Table B1 Tabulated results of LC-MS/MS analyses in all gel slices analyzed. Excel file (filename: “Table S1 (LC-MSMS).xlsx”).

Online link: http://rstb.royalsocietypublishing.org/content/suppl/2013/02/27/rstb.2012.0318.DC1/rstb20120318supp2.xlsx

Table B2 Clustering and phylogenetic assignments of 454 pyrotag sequences (16S rRNA tags) from the three mixed cultures. Excel file (filename: “Table S2 (Pyrotag).xlsx”).

Online link: http://rstb.royalsocietypublishing.org/content/suppl/2013/02/27/rstb.2012.0318.DC1/rstb20120318supp3.xlsx

Table B3 The DNA sequences of all rdhA and rdhB genes identified from the ACT-3 metagenome. Text file (filename: “Table S3 (ACT3-rdhAB).fasta”).

Online link: http://rstb.royalsocietypublishing.org/content/suppl/2013/02/27/rstb.2012.0318.DC1/rstb20120318supp4.zip

Appendix C: Supplemental Information for Chapter 4

Figure C1 Visualization of raw reads suppressed at the 5’ edge of contig00270. Each row is a raw read. The raw reads that were suppressed (but match each other) are highlighted in red.

Figure C2 Detection of tandem repeats by read mapping. The vertical line in the middle indicates the region of poly-N sequence (50 bp) that is inserted between the tandem copies of the transposase gene related to contig01504. Although the coverage in this region is zero, many read pairs spanning this region were identified, which proves the existence of the tandem copies.
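
The evidence described for Figure C2, read pairs whose two mates map on opposite sides of the poly-N region, can be counted with a few lines of code. The sketch below is only an illustration under stated assumptions: the `ReadPair` structure, the half-open coordinate convention and the example coordinates are hypothetical, and the mapping itself is assumed to have been done elsewhere.

```python
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    """Mapped coordinates of both mates (half-open, left mate first)."""
    left_start: int
    left_end: int
    right_start: int
    right_end: int


def spans_region(pair: ReadPair, region_start: int, region_end: int) -> bool:
    """True if the mates lie entirely on opposite sides of the region."""
    return pair.left_end <= region_start and pair.right_start >= region_end


def count_spanning_pairs(pairs: Iterable[ReadPair],
                         region_start: int, region_end: int) -> int:
    """Count read pairs that bridge the region without overlapping it."""
    return sum(spans_region(p, region_start, region_end) for p in pairs)


if __name__ == "__main__":
    # Hypothetical mappings around a 50 bp poly-N region at 220..270.
    pairs = [
        ReadPair(100, 200, 300, 400),  # bridges the region
        ReadPair(100, 260, 300, 400),  # left mate overlaps the region
        ReadPair(0, 50, 60, 100),      # both mates upstream
    ]
    print(count_spanning_pairs(pairs, 220, 270))
```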

172

Table C1 Contigs that contain reads whose sequences were suppressed.

Contig ID    Suppressed Sequences (5’ Edge, 3’ Edge): Representative Read Name, No. (1)
00230        GJDNVXK02FK38E 8
00241        GJDNVXK02I4DG4 6
00242        GJDNVXK02GH51R 13
00243        GJDNVXK02GYCR5 7
00244        GJDNVXK01BIEUN 13
00245        GQIUW4001DHH63_left 10
00246        GQIUW4001D6TGY_right 10
00247        GJDNVXK01COW8I 12
00248        GJDNVXK01DDO4A 12
00249        GJDNVXK02I9OPG 6
00252        GQIUW4002HJYIK 11
00253        GQIUW4002IKBAU 12
00255        GJDNVXK01A2VBS 12
00259        GJDNVXK01EQO8R 11; GJDNVXK02H78LB 6
00260        GJDNVXK01C8EDU 11
00265        GJDNVXK01BT6ZY 6
00269        GJDNVXK01EVGRK 8
00270        GJDNVXK01E3JVP 12
00272        GQIUW4002GKUZE 7
00274        GJDNVXK02G4J27 22
00279        GJDNVXK01AC3MK 8
00284        GJDNVXK02FVHEJ 2
00288        GJDNVXK01CTGJ1 16
00289        GJDNVXK02I3ZUD 3; GJDNVXK02IAXE9 9
00290        GJDNVXK02F6K62 12
00299        GQIUW4001BSRMU 13
00302        GJDNVXK01EO1PM 24
00309        GQIUW4002IEBPH 12
00530        GJDNVXK02FN2A2 16
00539        GQIUW4002GSV6F_right 9
00540        GJDNVXK02GPLN6 14
01315R       GJDNVXK02FIYJF 23; GJDNVXK02J2VE2 25; GJDNVXK01ANVGN 34
05122        GJDNVXK01DTW10 19; GJDNVXK02IJB8Z 23
01997        GJDNVXK01AYWWU 23

(1) The number of raw reads that were suppressed but had sequences homologous to the representative suppressed read listed.

173

Table C2 Experimental verification of the resolution of 22 assembly gaps.

                                                                                    Amplicon size               Sanger
Gap              Repetitive contigs  PCR primer (F)           PCR primer (R)           Predicted¹   Experimental²   sequencing³
00253-G-00254                        TCCGCTCCATAGGCACCTCGT    CCGTCCGACAAGCTCAAAACGG   2470         2400            +
00268-G-00269                        AGTGGCACGTAGCGTTGAACA    TCGACGATTTCCGCTTGTTGCT   2282         2200            +
00278-G-00279                        AGCCTGCCCTTTGGAGAAAGACA  CGGCAACAGGCTTGTCGGCAT    2429         2400            +
00538-G-00539    01388               TGGGCTATTATAGTCAGCGGCGT  TTGGACTCGTGGGGTGGAACT    2251         2200            +
00232-G-00233                        TCGTCTCCACTCATTATCCCGGC  GCGGAAAAATTGTCCTGGCCACA  3252         3100            +
00251-G-00252                        AGGAGATTGCTGATGCGGTGGGA  AAGCCGTTAGGTGTGCCCGC     2415         2500            -
00310-G-00311                        TCCAGTTCAACATCGGCTGTGCT  GTTTGAGTGTCCTCCTGGGCTGA  2970         2900            +
00265-G-00266                        ATGTTCATTTTCGGGGCCGTTGA  GCGAGCGCCTTCGACCAACT     1746         1750            +
00266-G-00267                        TCATGCCCCTGAAGTCGCGG     TGCACTCCTCCTGTCTGGTACG   2005         2000            +
00267-G-00268                        TGGCATTTTGCGCTGGCTGG     TGGTCTTGGCCCTTGCGAGC     1852         1750            +
00285-G-00286⁴   01504               GCAATGTTTTGCGCCTTGGTGA   TGGCAGGGGATGAAGGAGTTGA   1910/3384    1900/3400       +
00313-G-00314                        CGGTGGTTTGGCAGCAGGA      GGGTCCAAGAAATGGCGGAAG    2264         2200            +
00242-G-00243                        ACCCCAACAGCTTTCAGACGGGT  CGGACGCAATCTCTCAGCATTCG  1774         1800            +
00314-G-00228                        TCCCGCCCCGAGACGCTTTA     CGGGTCAATCATCGGCGGAGT    1803         1800            +
00268-G-00269                        TCCTGCCTTCGATAAAAGCCTGT  AGGCGAAAGAACCGGCGTACA    2017         2000            +
00269-G-00270                        AAGTTGCCGCTGCTGTCGCT     TGTCCAACCTAAATGCCGCCGA   2461         2300            +
00274-G-00275    01532               GGAACCCATCCTGTGACCGT     GCGCGAACGATATGCCAAAGGG   2238         2100            +
00279-G-00280                        GCGCTGACGTTGTGCCTGAA     AGTAGTGCCGGGGGTTAGTGT    2391         2500            +
00239-G-00240⁴                       ACGGCATTTGAACCTGAAGGCCA  GCCAAGGGAATCCGGCGGTC     2579/4476    2500/4200       +
00277-G-00278    Group A Gaps        GAAAGGAGGCCGCAGTCCG      GCCTGACGAACCGTGGATTGAT   968          1000            +
00284-G-00285                        TCCGCTCCCCGTATGCCCT      TGCCCGAAAAGCGAAAAGCGT    1006         1000            +
00298-G-00299                        GCCCATAGGTGGCGTCGATGA    TGGCAGAGGGGAAGCTCAGGG    1065         1100            +

1 The predicted amplicon size was calculated from the solution produced by the current gap-resolution strategy.

2 The experimental amplicon size was determined by performing DNA electrophoresis and comparing the DNA bands with DNA ladders.

3 Partial sequencing of the amplicons using Sanger sequencing: “+” indicates that the DNA sequence determined by Sanger sequencing matches the expected sequence of the amplicon determined by in silico sequence assembly; “-” indicates that Sanger sequencing failed for an unknown reason.

4 For gaps 00285-G-00286 and 00239-G-00240, in which tandem repeats were expected, two PCR products of different sizes were amplified simultaneously.
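The agreement between predicted and experimental amplicon sizes in Table C2 can be summarized as a signed percentage deviation. The following is a minimal sketch, not part of the original analysis: the function name `relative_deviation` is hypothetical, and the four size pairs are copied from Table C2 for illustration. Because the experimental sizes were estimated against DNA ladders, small deviations are expected.

```python
# Illustrative only: compares predicted and gel-estimated amplicon sizes.
# The function name is hypothetical; the size pairs are copied from Table C2.

def relative_deviation(predicted, experimental):
    """Return the signed deviation of the experimental size from the
    predicted size, as a percentage of the predicted size."""
    return (experimental - predicted) / predicted * 100.0

# (gap, predicted size, experimental size) for a few single-product gaps
amplicons = [
    ("00253-G-00254", 2470, 2400),
    ("00232-G-00233", 3252, 3100),
    ("00265-G-00266", 1746, 1750),
    ("00277-G-00278", 968, 1000),
]

for gap, predicted, experimental in amplicons:
    print(f"{gap}: {relative_deviation(predicted, experimental):+.1f}%")
```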


Appendix D: Supplemental Information for Chapter 5

Figure D1 Multiple sequence alignment of 16S rRNA genes from strains CF, PER-K23 and E1. The 16S rRNA genes in strain DCA are identical to the corresponding ones in strain CF.


Figure D2 Three potential genome rearrangement events between strain CF and PER-K23. (a) Mauve whole-genome alignment of the genomes of strain CF and strain PER-K23. (b) Mauve alignment after modifying the genome of strain CF by inverting the sequence between the two inverted rRNA operons highlighted in part (a). (c) Mauve alignment after modifying the genome of strain PER-K23 by inverting the sequence between the two inverted repeats of an insertion sequence (IS) highlighted in part (b). (d) Mauve alignment after further modifying the genome of strain PER-K23 by reversing a potential DNA translocation event, likely catalyzed by a DNA recombinase, highlighted in part (c). Here, we reversed the three potential genome rearrangement events so that the genomes of strain CF and strain PER-K23 have global synteny. The two rearrangement events implied by the last two steps, (c) and (d), could only have occurred in the genome of strain PER-K23, since the IS and the DNA recombinase are specific to strain PER-K23, i.e., they were not found in strain CF or DCA. However, it is uncertain in which genome the inversion event mediated by the two inverted rRNA operons, step (b), occurred; here we assume it occurred in the genome of strain CF because reversing this event transforms the genome of strain CF into a state with two more evenly divided GC-skew arms, a state that might be more stable.
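Reversing an inversion in silico, as in steps (b) and (c), amounts to replacing the segment between the two repeats with its reverse complement. The sketch below illustrates only that operation on a toy sequence. It is not the procedure used to prepare the figure; the function names and coordinates are hypothetical, and the Mauve alignment itself is not reproduced.

```python
# Illustrative only: reverse-complement a segment of a sequence in place.
# Function names and coordinates are hypothetical; a real genome would be
# read from a FASTA file and the repeat boundaries taken from an alignment.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def invert_segment(genome, start, end):
    """Return the genome with genome[start:end] replaced by its reverse
    complement (0-based, end-exclusive coordinates)."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]

toy = "AAACCGTTT"
inverted = invert_segment(toy, 3, 6)        # inverts the middle "CCG"
restored = invert_segment(inverted, 3, 6)   # a second inversion undoes the first
print(inverted, restored)
```

Applying the same inversion twice restores the original sequence, which is why an inferred inversion can be undone by inverting the same interval again.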

Figure D3 Genome circular map of Dehalobacter restrictus strain PER-K23.


Figure D4 A maximum-likelihood phylogenetic tree of all RDases (characterized or putative) from the four Dehalobacter genomes together with 15 RDases with known functions from other organisms. The alignment was generated using the MUSCLE algorithm, and the tree was generated using the PhyML plugin in Geneious under the WAG model of evolution. Bootstrap support values (from 100 bootstrap iterations) are indicated where greater than 50%. The scale bar represents the average number of substitutions per site.

Figure D5 Comparison of the gene neighborhood of pceABCT in strains CF, PER-K23 and E1. A concatenation of two contigs from the draft genome of strain E1 was used to represent strain E1; the two contigs were connected with a polyN connector as indicated. All CDSs are shown as directional blocks in different colors: rdhA genes (yellow), rdhB genes (green), pceC-like genes (purple), crp/fnr transcriptional regulators (red), pceT-like genes (dark blue), ISs (light blue) and others (grey). Pairs of undirectional blocks connected by straight lines represent regions of direct repeats (green) and inverted repeats (blue); size and nucleotide identity are given in braces.

Table D1 Insertion sequences in strains CF, DCA and PER-K23.

This table was submitted as an electronic supplemental file named Tang_Shuiquan_201406_PhD_thesis_TableD1.xls.

Table D2 Genes in Dehalobacter strain CF involved in selected metabolic pathways or categories. If corresponding orthologs in Dehalobacter sp. strain DCA, Dehalobacter restrictus strain PER-K23, Dehalobacter sp. strain E1, Desulfitobacterium hafniense strain Y51 and Dehalococcoides mccartyi strain 195 were present, their locus_tags were listed; otherwise, they were labeled “No hit”. If some highlighted genes were not present in strain CF, they were labeled “NF” (not found). If the expression of the orthologs in strain PER-K23 was detected in a previous proteomic study (Rupakula et al., 2013), they were labeled “Y” in the “Expression” column; if not detected, they were labeled “N”; if the corresponding orthologs are not present in strain PER-K23, they were labeled “NA” (not available).

This table was submitted as an electronic supplemental file named Tang_Shuiquan_201406_PhD_thesis_TableD2.xls.
