A Systems-Level Investigation of the Metabolism of Dehalococcoides mccartyi and the Associated Microbial Community
by
Mohammad Ahsanul Islam
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Chemical Engineering and Applied Chemistry University of Toronto
© Copyright by Mohammad Ahsanul Islam 2014
Abstract
Dehalococcoides mccartyi are a group of strictly anaerobic bacteria important for the detoxification of man-made chloro-organic solvents, most of which are ubiquitous, persistent, and often carcinogenic groundwater pollutants. These bacteria conserve energy for growth exclusively from a pollutant detoxification reaction, through a novel metabolic process termed organohalide respiration. However, this energy harnessing process is not well elucidated at the level of D. mccartyi metabolism. Also, the reasons behind their robust and rapid growth in mixed consortia, as compared to their slow and inefficient growth in pure isolates, are unknown. To obtain better insight into D. mccartyi physiology and metabolism, a detailed pan-genome-scale constraint-based mathematical model of metabolism was developed. The model highlighted the energy-starved nature of these bacteria, which is probably linked to their slow growth in isolates. The model also provided a useful framework for the subsequent analysis and visualization of high-throughput transcriptomic data of D. mccartyi. Apart from confirming the expression of the majority of genes of these bacteria, this analysis helped review the annotations of metabolic genes. Revised annotations of two such metabolic genes, encoding NADP+-isocitrate dehydrogenase and phosphomannose isomerase, were then experimentally verified. Finally, growth experiments were performed with a D. mccartyi-containing anaerobic mixed enrichment culture to explore the effects of omitting exogenous vitamins from the growth medium on D. mccartyi and the associated microbial community. The experiments showed how the nutritional requirements of these bacteria changed the composition and dynamics of their associated microbial community. Overall, a systems-level approach was used in this research to obtain a fundamental and critical understanding of the metabolism and physiology of D. mccartyi in isolates, as well as in the microbial communities they naturally inhabit. The results presented in this thesis will therefore help design effective strategies for future bioremediation efforts using D. mccartyi.
Acknowledgements
I would like to take this opportunity to thank the many fascinating people who helped me complete this very long but exciting journey! It was, indeed, a life-changing experience!
First of all, my sincere thanks go to my supervisors, Dr. Radhakrishnan (Krishna) Mahadevan and Dr. Elizabeth A. Edwards, both of whom are extremely caring mentors, knowledgeable scholars, and great personalities. It was their contagious enthusiasm and passion for any scientific matter, their tireless inquisitive minds, and their critical thinking that shaped me as a researcher during my stay at UofT. They provided me with all kinds of support and help over these many years, without which this thesis would not have been possible. I’m eternally grateful to both Krishna and Elizabeth for giving me the rare opportunity to work in the amazing world of microbial metabolism.
I would like to thank all past and present members of both LMSE (Laboratory for Metabolic Systems Engineering) and EdLab, including Jiao, Karthik, Bahareh, Nadeera, Nick B, Peter, Kai, Laurence, Nik, Fahime, Srinath, Pratish, Sarat, Victor, Chris, Ariel, Alison, Eve, Roya, Anna, Winnie, Max, Jennifer, Laura, Marie, Alfredo, Torsten, Jine Jine, Cheryl, Ivy, Shuiquan, Fei, Wendy, Luz, Sarah, and Olivia. These are the people with whom I shared some of the most treasured moments of my life. You are truly the most talented, enthusiastic, and beloved coworkers that I have ever worked with.
I also want to express my humble gratitude to my committee members, Dr. Nicholas J. Provart and Dr. Emma R. Master. Their guidance, support, critical comments, and careful directions were among the most influential factors behind this thesis. I feel fortunate to have had these powerful scientific minds and fantastic personalities as my committee members. Likewise, I express my gratitude to Dr. David S. Guttman and Dr. Boris Steipe for teaching me bioinformatics, without which I would not have been able to even start my research.
Thanks to Dr. Alexander F. Yakunin and Dr. Alexei Savchenko for sharing your expertise in enzymology and proteomics with someone like me who is so naïve in those areas. Thanks also to Dr. Melanie Duhamel, the BioZone Assistant Director and Project Manager, for being a friend and an “oracle” on KB-1-related challenges, as well as on monetary issues. Also, thank you to Endang Susilawati (Susie) and Angelika Duffy, the BioZone Lab Manager and Assistant Lab Manager. Susie’s motherly touch made my office stay feel like a home stay, and Angelika’s warm support helped me steer through difficult times more easily. I am also very grateful to Anatoli Tchigvintsev and Greg Brown, the two knowledgeable technicians and biochemists without whom my enzyme work would only have been a dream!
Thank you to Leticia Gutierrez and Gorette Silva for being so supportive in solving administrative matters, and for always greeting me with cordial smiles. Thanks to Pauline Martini and Joan Chen at the Chemical Engineering Graduate office for making the “bureaucratic part” of my graduate life easy, and for providing essential information on all scholarship and funding matters. Thanks so much to Julie Mendonca for keeping my head straight about payroll and tax issues. Thanks also to Daniel Tomchyshyn and Weijun Gao, the Departmental and BioZone computer geeks, for providing IT support and solving all IT problems.
I am truly indebted to my parents, Shamsun Nahar and Shamsul Alam, for bringing me into this spectacular world of microbes. It was their lifelong teaching, their simple yet powerful philosophies about life in general, and their intrinsic novel qualities that shaped me as a human being. Without their love, devotion, encouragement, and continuous prayers, I could not have imagined coming this far. I am also extremely grateful to my parents-in-law, Shamima Ali and Mazed Ali, for their love and prayers, and most importantly, for giving me their daughter, Rutba, my wife and my eternal love. Rutba is an inspirational teacher, guide, and philosopher without whom I could not have even imagined embarking on this PhD journey. So, equal credit goes to Rutba for the successful completion of this journey, and I dedicate this thesis to her, my best friend, with whom I’m looking forward to spending many more fun-filled years. These acknowledgements would remain incomplete if I did not mention the most precious gift of my life, Manar, my daughter. Her smile and the warmth with which she calls me “Baba” refresh me every single day.
Finally, I want to thank my funding agencies for their generous support during my graduate study: Government of Ontario, Genome Canada, Ontario Genomics Institute, Natural Sciences and Engineering Research Council of Canada, US Department of Defense Strategic Environmental Research and Development Program, and the University of Toronto.
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Appendices
List of Non-Standard Abbreviations Used
Chapter 1: Introduction
1.1. Motivation
1.2. Research objectives
1.3. Thesis outline
1.4. Statement of authorship and publication status
Chapter 2: General overview
2.1. Systems biology
2.2. Modeling microbial metabolism
2.3. Constraint-based reconstruction and analysis (COBRA) approach
2.4. Metabolic network reconstruction procedures
2.5. Determination of biomass composition and maintenance energy
2.6. Model validation and refinement
2.7. Flux balance analysis
2.8. Energy conservation in microbes
2.9. Chlorinated xenobiotics and reductive dechlorination
2.10. Dehalococcoides bacteria
2.11. The KB-1 microbial community
Chapter 3: Characterizing the metabolism of Dehalococcoides with a constraint-based model
3.1. Abstract
3.2. Introduction
3.3. Materials and methods
3.3.1. Dehalococcoides pan-genome
3.3.2. Reconstructing the metabolic network of Dehalococcoides
3.3.3. Estimation of biomass composition and maintenance energy requirements
3.3.4. In silico analysis of Dehalococcoides metabolism
3.4. Results and discussion
3.4.1. Dehalococcoides metabolic network
3.4.1.1. Pan-metabolic-genes of Dehalococcoides
3.4.1.2. Features of the Reconstructed Metabolic Network of Dehalococcoides
3.4.2. Model-based simulations of Dehalococcoides physiology
3.4.2.1. Exploring the central metabolism of Dehalococcoides
3.4.2.2. CO2-fixation by Dehalococcoides
3.4.3. Energy conservation process of Dehalococcoides
3.4.4. Implications of the incomplete cobalamin synthesis pathway in Dehalococcoides
3.4.5. Does carbon or energy limit the in silico growth of Dehalococcoides?
3.5. Conclusions
Chapter 4: New insight into Dehalococcoides mccartyi metabolism from a model-integrated systems-level analysis of D. mccartyi transcriptomes
4.1. Abstract
4.2. Introduction
4.3. Materials and methods
4.3.1. Identification of D. mccartyi genes from KB-1 shotgun microarray data
4.3.2. Dehalococcoides mccartyi strain 195 microarray data
4.3.3. Operon prediction for Dehalococcoides mccartyi genomes
4.3.4. Microarray data analysis and visualization
4.4. Results and discussion
4.4.1. Principal component analysis of strain 195 and KB-1 Dhc microarray data
4.4.2. Improved identification and confirmation of D. mccartyi genes
4.4.3. Confirmation of hypothetical proteins in strain 195 and KB-1 Dhc genomes
4.4.4. Confirmation of metabolic genes in strain 195 and KB-1 Dhc genomes
4.4.5. Clustering of microarray data and operon predictions
4.4.7. Analysis of strain 195 QT cluster 2
4.4.8. Analysis of strain 195 QT cluster 6
4.5. Conclusions
Chapter 5: Model-assisted prediction and experimental characterization of isocitrate dehydrogenase and phosphomannose isomerase from Dehalococcoides mccartyi strain KB-1
5.1. Abstract
5.2. Introduction
5.3. Materials and methods
5.3.1. Bacterial culture, reagents and chemicals
5.3.2. Gene cloning and overexpression of the selected genes in E. coli
5.3.3. Purification of the overexpressed recombinant proteins
5.3.4. Enzymatic assays for the purified recombinant proteins
5.4. Results and discussion
5.4.1. Biochemical activities and kinetic parameters of KB1_0495 (DmIDH) and KB1_0553 (DmPMI)
5.4.2. Sequence homology and phylogenetic analyses of DmIDH and DmPMI sequences
5.4.3. Structure based analysis of DmIDH and DmPMI sequences
5.5. Conclusions
Chapter 6: Role of exogenous vitamin omission on the growth and community dynamics of a Dehalococcoides mccartyi-containing anaerobic mixed microbial community
6.1. Abstract
6.2. Introduction
6.3. Materials and methods
6.3.1. Chemicals and analytical procedures
6.3.2. Preparation of exogenous vitamins and resazurin free KB-1 cultures
6.3.3. Preparation of diluted KB-1 cultures for the time-course experiment
6.3.4. DNA collection and extraction
6.3.5. Quantitative PCR (qPCR) primers and method
6.3.6. Analysis of qPCR data
6.4. Results
6.4.1. Dechlorination and growth of washed KB-1 cultures cultivated in different growth media
6.4.2. Community composition of washed KB-1 cultures cultivated in different growth media
6.4.3. Dechlorination and growth of diluted KB-1 cultures cultivated in different growth media
6.5. Discussion
6.6. Conclusions
Chapter 7: Summary, conclusions, and future work
7.1. Summary
7.2. Conclusions
7.3. Future work
References
Appendices
List of Tables
Table 3.1. General features of Dehalococcoides metabolic network (iAI549)
Table 3.2. Composition of the in silico minimal medium of Dehalococcoides
Table 3.3. Comparison of various in silico genome-scale models with iAI549
Table 4.1. Strain 195 genes identified in functionally enriched clusters and associated inferred annotations
Table 5.1. Kinetic parameters of DmIDH and DmPMI from D. mccartyi strain KB-1
Table 6.1. Composition of different mineral media used in this study
Table 6.2. Description of different KB-1 cultures used in this study
Table 6.3. Estimated TCE dechlorination and ethene production rates of different washed KB-1 cultures
List of Figures
Figure 1.1. Schematic representation of the relationship between different thesis objectives.
Figure 2.1. Steps involved in developing a genome-scale constraint-based metabolic model by COBRA approach.
Figure 2.2. Metabolic network reconstruction procedure.
Figure 2.3. Anaerobic reductive dechlorination of chlorinated ethenes to benign ethene and higher chlorinated benzenes to less toxic lower chlorinated benzenes.
Figure 2.4. Schematic representation of the microbial interactions between different community members in the KB-1 community.
Figure 3.1. Composition of the Dehalococcoides pan-genome.
Figure 3.2. Distribution of dispensable and unique metabolic genes in different Dehalococcoides strains.
Figure 3.3. The reconstructed TCA-cycle and CO2 fixation pathway of Dehalococcoides.
Figure 3.4. Analysis of the citrate synthase (CS) reaction on Dehalococcoides growth.
Figure 3.5. A Tentative Scheme for D. mccartyi electron transport chain (ETC).
Figure 3.6. Reconstructed cobalamin biosynthesis pathway of Dehalococcoides.
Figure 3.7. Influence of cobalamin on the growth rate and yield of Dehalococcoides.
Figure 3.8. Effect of carbon and energy sources on the growth yield of Dehalococcoides.
Figure 4.1. Principal component analysis (PCA) of the array data for strain 195 and KB-1 Dhc samples.
Figure 4.2. Hypothetical proteins of (A) strain 195 and (B) KB-1 Dhc with proteomic and transcriptomic evidence.
Figure 4.3. Proteomic and transcriptomic evidence for the hypothetical proteins of strain 195 reannotated in the D. mccartyi metabolic model.
Figure 4.4. Proteomic and transcriptomic evidence for the hypothetical proteins of KB-1 Dhc reannotated in the D. mccartyi metabolic model.
Figure 4.5. Expression of reductive dehalogenase homologous (rdhA) genes.
Figure 4.6. Functional enrichment analysis of QT clusters for (A) strain 195 and (B) KB-1 Dhc array data.
Figure 4.7. Analysis of two functionally enriched strain 195 QT clusters.
Figure 5.1. Effects of pH and substrate concentrations on the rate of DmIDH.
Figure 5.2. Effects of pH and substrate concentrations on the rate of DmPMI.
Figure 5.3. Mannosylglycerate (MG) biosynthesis pathway in D. mccartyi.
Figure 5.4. Sequence homology network for DmIDH (KB1_0495) and DmPMI (KB1_0553).
Figure 5.5. Phylogenetic analysis of DmIDH protein sequence.
Figure 5.6. Phylogenetic analysis of DmPMI protein sequence.
Figure 5.7. Structure-based multiple sequence alignment (MSA) of DmIDH.
Figure 5.8. Structure-based multiple sequence alignment (MSA) of DmPMI.
Figure 6.1. Genealogy of KB-1 cultures used in this study.
Figure 6.2. TCE dechlorination profiles of different washed KB-1 cultures.
Figure 6.3. Community composition of different washed KB-1 cultures.
Figure 6.4. Community composition of different washed KB-1 cultures in terms of absolute cell numbers.
Figure 6.5. Dechlorination profiles for the time-course experiment of diluted KB-1 cultures.
Figure 6.6. TCE dechlorination rates, ethene production rates, and D. mccartyi cell numbers for diluted KB-1 cultures.
List of Appendices
Appendix A: Supplemental information for Chapter 3
Table A1. Overall Macromolecular Composition of a Dehalococcoides Cell
Table A2. Protein Composition of 1 Gram of Dehalococcoides Cell
Table A3. DNA Composition of 1 Gram of Dehalococcoides Cell
Table A4. RNA Composition of 1 Gram of Dehalococcoides Cell
Table A5. Lipid Composition of 1 Gram of Dehalococcoides Cell
Table A6. Composition of Cofactors and Other Soluble Pools of 1 Gram of Dehalococcoides Cell
Table A7. Experimental Growth Yields of Various Dehalococcoides Cultures
Table A8. Experimental Growth Rates of Various Dehalococcoides Cultures
Table A9. Experimental Decay Rates of Different Anaerobes
Table A10. Energy Cost for Processing and Polymerization of Macromolecules (GAM) of a Typical Bacterial Cell
Table A11. Standard Gibbs Free Energies for Different Dechlorination Reactions
Table A12. Theoretical ATP/e- and H+/e- Ratios of Reductive Dechlorination by Dehalococcoides
Table A13. Experimental Values of Corrinoid Content of Various Anaerobes
Table A14. Growth Rate Simulations with and without the Citrate Synthase (CS) Reaction in the TCA-cycle
Table A15. List of tables containing information for Dehalococcoides metabolic model, iAI549
Supplemental Text
Dehalococcoides Biomass Synthesis Reaction
Calculation of Dehalococcoides Cell Composition
Calculation of NGAM and GAM Parameters of iAI549
Calculation of Theoretical Maximum Energy Transfer Efficiency (ATP/e-) and Proton Translocation Stoichiometry (H+/e- ratio) of Dehalococcoides Electron Transport Chain (ETC)
Detailed procedures for developing the Dehalococcoides pan-genome and model
Figure A1. Steps involved in developing the pan-genome
Figure A2. Steps involved in developing the core-genome
Figure A3. Steps involved in developing the unique-genome
Figure A4. Steps involved in developing the dispensable-genome
Figure A5. Reconstructed Wood-Ljungdahl pathway for Dehalococcoides.
Figure A6. Distribution of metabolic genes in different subsystems of iAI549
Figure A7. Distribution of gene-associated model reactions in different subsystems of iAI549
Appendix B: Supplemental information for Chapter 4
Table B1: List of supplemental tables for chapter 4
Figure B1. Workflow for Analyzing Pre-Processed KB-1 Microarray Data.
Figure B2. Workflow for Analyzing Pre-Processed Strain 195 Microarray Data.
Figure B3. Distribution of Strain 195 Gene Expression Intensities for 27 Samples.
Figure B4. Distribution of KB-1 Dhc Gene Expression Intensities for 33 Samples.
Figure B5. Visualization of gene expression data on the Dehalococcoides mccartyi metabolic network.
Appendix C: Supplemental information for Chapter 5
Identification of KB1_0495 (DmIDH) and KB1_0553 (DmPMI)
Figure C1. Orthologous gene neighborhood analysis for DmIDH (KB1_0495) and DmPMI (KB1_0553)
Figure C2. Orthologous gene neighborhood analysis for the 3-isopropylmalate dehydrogenase (IPMDH) from D. mccartyi (cbdbA804, DET0826, DhcVS_730, DehaBAV1_0745, DehalGT_0706).
Appendix D: Supplemental information for Chapter 6
Table D1. Previously developed (Duhamel, 2005, Waller, 2010) qPCR primer-sets used in this study
Figure D1. qPCR results for Acetobacterium OTU for 1:10 dilution of KB-1 samples
Figure D2. Ratios of total bacterial and archaeal cell numbers tracked by individual OTU primers to general bacterial and archaeal primers.
Figure D3. Electron balance profiles for the time-course experiment of diluted KB-1 cultures.
Appendix E: Genome-scale constraint-based metabolic modeling of Moorella thermoacetica
Abstract
Introduction
Materials and methods
Automated generation of a draft metabolic model for Moorella thermoacetica
Determination of biomass compositions
Curation of the draft Moorella thermoacetica metabolic model
Figure E1. Steps involved in curating the M. thermoacetica draft metabolic model.
Results and discussion
General features of the reconstructed metabolic network of M. thermoacetica
Figure E2. Subsystems of the Moorella thermoacetica metabolic model, iAI517.
Figure E3. Reconstructed metabolic network of Moorella thermoacetica.
Figure E4. Reconstructed Wood-Ljungdahl (W-L) pathway of CO2-fixation for M. thermoacetica.
Model-based simulations of M. thermoacetica metabolism
Figure E5. In silico growth profile of Moorella thermoacetica on different substrates using the metabolic model, iAI517.
Figure E6. In silico ATP generation profile of Moorella thermoacetica on different substrates using the metabolic model, iAI517.
Conclusions
Table E1. General features of Moorella thermoacetica metabolic reconstruction
Table E2. Overall cellular composition of a Moorella thermoacetica cell
Table E3. Protein composition of a Moorella thermoacetica cell
Table E4. DNA composition of a Moorella thermoacetica cell
Table E5. RNA composition of a Moorella thermoacetica cell
Table E6. Fatty acid composition of a Moorella thermoacetica cell
Table E7. Lipid composition of a Moorella thermoacetica cell
Table E8. Cell wall composition of a Moorella thermoacetica cell
Table E9. Ions and metabolites of a Moorella thermoacetica cell
Moorella thermoacetica biomass equation
List of Non-Standard Abbreviations Used
BLAST — Basic local alignment search tool
MCA — Metabolic control analysis
HCM — Hybrid cybernetic modeling
COBRA — Constraint-based reconstruction and analysis
FBA — Flux balance analysis
GAM — Growth associated maintenance parameter
NGAM — Non-growth associated maintenance parameter
SLP — Substrate level phosphorylation
ETP — Electron transport phosphorylation
DNAPL — Dense non-aqueous phase liquid
PCE — Tetrachloroethene
TCE — Trichloroethene
cDCE — cis-1,2-Dichloroethene
VC — Vinyl chloride
HCB — Hexachlorobenzene
PeCB — Pentachlorobenzene
TeCB — Tetrachlorobenzene
TCB — Trichlorobenzene
DCB — Dichlorobenzene
TCEM — Trichloroethene and methanol
cDCEM — cis-1,2-Dichloroethene and methanol
VCM — Vinyl chloride and methanol
VCH — Vinyl chloride and hydrogen
NA — Not amended
qPCR — Quantitative polymerase chain reaction
OTU — Operational taxonomic unit
Chapter 1: Introduction
1.1. Motivation
Halogenated organic compounds, or organohalides (i.e., organofluorides, organochlorides, organobromides, and organoiodides), are abundant in nature. They originate from three main sources: geogenic, biogenic, and anthropogenic (Gribble, 1992, Gribble, 2010, Öberg, 2002, Gribble, 2012). Geogenic organohalides are generated mainly by sea-salt spray, volcanic eruptions, forest and grass fires, sediments, and soil, whereas biogenic organohalides are synthesized by bacteria, fungi, plants, marine plants, marine sponges, corals, insects, and mammals, including humans (Gribble, 1992, Gribble, 2010). Thus, both geogenic and biogenic organohalides are naturally produced, and, with a few exceptions such as the dioxins produced by forest fires, the vast majority of them are non-toxic, non-persistent, biodegradable, and often useful compounds (Gribble, 1992, Gribble, 2010, Öberg, 2002).
In contrast, the vast majority of anthropogenic organohalides generated by chemical synthesis are toxic, persistent, and recalcitrant to biodegradation (Öberg, 2002, Leys et al., 2013). Yet it was precisely their toxicity, persistence, low reactivity, low flammability, and low corrosiveness that made synthetic organohalides such as tetrachloroethene (PCE), trichloroethene (TCE), pentachlorobenzene (PeCB), hexachlorobenzene (HCB), dioxins, and polychlorinated biphenyls (PCB) useful and popular for extensive commercial and industrial use (McCarty, 2010, Doherty, 2000a, Doherty, 2000b). They were widely used as degreasers for metal parts, dry-cleaning agents, and solvents; in paints, pharmaceuticals, pesticides, and fungicides; and as ingredients and intermediates in the chemical synthesis of many other compounds (Doherty, 2000a, Doherty, 2000b, Doherty, 2012, Lohman, 2002). However, past uncontrolled disposal and handling practices, together with their toxicity and structural stability, have made them ubiquitous and persistent xenobiotics (ATSDR, 2013, McCarty, 2001, McCarty, 2010, Petrisor and Wells, 2008). Because they contaminate soil, sediments, and groundwater aquifers, they pose a significant threat to human health and the environment.
Despite being persistent and recalcitrant to biodegradation, chlorinated xenobiotics undergo both abiotic and biotic transformations (McCarty, 2001, McCarty, 2010, Leys et al., 2013). However, partial transformation of these compounds by abiotic and biotic processes can produce intermediates that are even more toxic than the parent compounds (Öberg, 2002, McCarty, 2010); for instance, partial degradation of PCE and TCE generates vinyl chloride (VC), a known human carcinogen that causes a rare form of liver cancer (Vianna et al., 1981). Only an anaerobic biological process, termed organohalide respiration, is capable of completely converting PCE and TCE to non-toxic ethene through reductive dechlorination (Freedman and Gossett, 1989, Holliger et al., 1993). This novel, natural process is catalyzed by reductive dehalogenase enzymes, mostly those of Dehalococcoides mccartyi, a group of tiny, strictly anaerobic, organohalide-respiring bacteria (Smidt and de Vos, 2004, Tas et al., 2010, Leys et al., 2013). Because D. mccartyi harness energy for growth only from reductive dechlorination during organohalide respiration, these bacteria are unique and immensely important for bioremediation (Holliger et al., 1993, Holliger et al., 1998b, Adrian, 2009). In other words, faster growth of D. mccartyi translates directly into faster removal of chlorinated xenobiotics from the environment. However, these specialized bacteria grow significantly more slowly in pure cultures than in mixed cultures (McCarty, 2010, Adrian et al., 1998, Adrian et al., 2000a, Adrian et al., 2000b), and their dechlorination activity is also more robust in consortia than in isolates (Duhamel et al., 2002, Duhamel, 2005). Thus, to better apply organohalide respiration for the bioremediation of chlorinated pollutants, we need a detailed understanding of the unusual metabolism of D. mccartyi and of the microbial community to which they are intricately linked.
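The stepwise nature of reductive dechlorination can be made concrete with simple electron bookkeeping. The sketch below is illustrative and not taken from this thesis: it assumes the sequential chlorinated-ethene pathway PCE → TCE → cDCE → VC → ethene and that each step replaces one chlorine with hydrogen using H2 as the electron donor (R-Cl + H2 → R-H + H+ + Cl−), so every chlorine removed costs one H2, or two electrons. The helper name `reductive_dechlorination_demand` is hypothetical.

```python
# Illustrative electron bookkeeping for sequential reductive dechlorination.
# Assumes each step is R-Cl + H2 -> R-H + H+ + Cl- (one H2, two electrons
# per chlorine removed); a stoichiometric sketch, not a kinetic model.

# Chlorine atoms per molecule along the chlorinated-ethene pathway.
CHLORINE_ATOMS = {"PCE": 4, "TCE": 3, "cDCE": 2, "VC": 1, "ethene": 0}


def reductive_dechlorination_demand(start: str, end: str) -> dict:
    """Return mol H2 consumed, mol electrons transferred, and mol chloride
    released per mol of `start` reduced to `end` (hypothetical helper)."""
    removed = CHLORINE_ATOMS[start] - CHLORINE_ATOMS[end]
    if removed < 0:
        raise ValueError(f"{end} is more chlorinated than {start}")
    return {"H2": removed, "electrons": 2 * removed, "chloride": removed}


if __name__ == "__main__":
    # Complete dechlorination of PCE to ethene: four sequential steps.
    print(reductive_dechlorination_demand("PCE", "ethene"))
    # Partial dechlorination of TCE that stalls at vinyl chloride.
    print(reductive_dechlorination_demand("TCE", "VC"))
```

Under these assumptions, reducing one mole of PCE to ethene consumes 4 mol H2 (8 electrons) and releases 4 mol chloride, whereas TCE that stalls at VC has used only 2 mol H2; the final VC-to-ethene step, which removes the carcinogen, has not taken place.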
Recent advances in high-throughput experimental technologies such as genome sequencing have facilitated detailed, systems-level studies of microbial metabolism (Heinemann and Sauer, 2010, Pagani et al., 2012). Such studies usually involve in silico mathematical modeling of cells and simulation of their integrated behavior in the context of cellular physiology (Covert et al., 2001, Di Ventura et al., 2006). A pivotal step in such systematic analysis and interpretation of data is the reconstruction of cellular networks, that is, the collection and visualization of all physiologically relevant cellular processes (Ideker et al., 2001, Chuang et al., 2010). Unlike other cellular networks, such as regulatory, signaling, and protein-protein interaction networks, metabolic networks have a well-established interaction topology, because metabolism is fairly conserved across different branches of life (Peregrin-Alvarez et al., 2009, Fani and Fondi, 2009). Most importantly, metabolic interactions can be described quantitatively by the stoichiometry of cellular biochemical reactions, which essentially represents the inviolable physicochemical constraints of a cell (Reed et al., 2006a, Price et al., 2004). Hence, a metabolic network can be interrogated with mathematical tools to predict the metabolic fate and phenotypic behavior of a cell (Price et al., 2004, Lewis et al., 2012, Pfau et al., 2011). These useful properties have led to a rapidly growing body of research on modeling microbial metabolism (Baumann, 2011, Patil et al., 2004, Mardinoglu et al., 2013, Thiele et al., 2013, Kim et al., 2012). Systems-level analyses of microbial metabolism have given researchers an improved understanding of such critical aspects of microbial physiology as gene transcription, protein translation, enzyme kinetics, and metabolic regulation (Liu et al., 2010, Oberhardt et al., 2009). This enhanced understanding, in turn, allows researchers to manipulate the metabolic capabilities of microbes to achieve desired goals in the medical, industrial, food, and environmental biotechnology sectors (McCloskey et al., 2013, Oberhardt et al., 2009, Patil et al., 2004, Lovley, 2003, Mahadevan et al., 2011).
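To make the idea of stoichiometric constraints concrete, the following sketch checks the two constraints on which constraint-based methods such as flux balance analysis rest: steady-state mass balance (S·v = 0) and lower/upper bounds on each reaction flux. The three-metabolite network, its reaction names, and its bounds are hypothetical and chosen only for illustration; they are not part of any model in this thesis.

```python
# Toy illustration of the physicochemical constraints used in
# constraint-based modeling: steady-state mass balance (S v = 0) and
# flux bounds. The network is hypothetical and deliberately tiny:
#   EX_A: -> A          (substrate uptake, capped at 10 units)
#   R1:   A -> B
#   R2:   A -> C
#   BIO:  B + C ->      (stand-in for a biomass reaction)

METABOLITES = ["A", "B", "C"]
REACTIONS = ["EX_A", "R1", "R2", "BIO"]

# Stoichiometric matrix: rows are metabolites, columns are reactions.
S = [
    [1, -1, -1, 0],   # A
    [0, 1, 0, -1],    # B
    [0, 0, 1, -1],    # C
]

BOUNDS = {"EX_A": (0.0, 10.0), "R1": (0.0, 1000.0),
          "R2": (0.0, 1000.0), "BIO": (0.0, 1000.0)}


def mass_balance_residuals(fluxes):
    """Net production rate of each metabolite, i.e. the rows of S v."""
    return {met: sum(coef * v for coef, v in zip(row, fluxes))
            for met, row in zip(METABOLITES, S)}


def is_feasible(fluxes, tol=1e-9):
    """True if the flux vector satisfies S v = 0 and every flux bound."""
    balanced = all(abs(r) <= tol for r in mass_balance_residuals(fluxes).values())
    within_bounds = all(BOUNDS[rxn][0] - tol <= v <= BOUNDS[rxn][1] + tol
                        for rxn, v in zip(REACTIONS, fluxes))
    return balanced and within_bounds


print(is_feasible([10, 5, 5, 5]))   # balanced and within bounds
print(is_feasible([10, 6, 5, 5]))   # A is over-consumed and B accumulates
print(is_feasible([12, 6, 6, 6]))   # balanced, but uptake exceeds its bound
```

Flux balance analysis then selects, from all feasible flux vectors, the one that maximizes an objective such as the biomass flux. In this toy network the balances force v_R1 = v_R2 = v_BIO and v_EX_A = 2·v_BIO, so the maximum biomass flux is 5, reached at the flux vector [10, 5, 5, 5].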
1.2. Research objectives
The overall goal of this research is to investigate the fundamental characteristics of the unusual metabolism of the specialist bacteria Dehalococcoides mccartyi, both in isolates and in microbial communities. This goal was pursued through a systems-level approach combining computational and experimental studies of D. mccartyi physiology and metabolism. The in silico studies comprised the construction of a detailed mathematical model of D. mccartyi metabolism, followed by the integration of this model with high-throughput transcriptomic data from gene-expression microarray experiments on these bacteria. The experimental studies comprised the biochemical characterization of two metabolic genes, isocitrate dehydrogenase and phosphomannose isomerase, from D. mccartyi strain KB-1, and growth experiments with KB-1, an in-house D. mccartyi-containing anaerobic mixed enrichment culture. Thus, the overall goal of this research was accomplished by pursuing four specific objectives:
1. Construction of a detailed constraint-based systemic mathematical model of D. mccartyi metabolism;
2. Improved annotation and elucidation of D. mccartyi genes and metabolism through the model-integrated analysis of high-throughput transcriptomic data;
3. Biochemical characterization of isocitrate dehydrogenase and phosphomannose isomerase genes of D. mccartyi; and
4. Exploring the influence of exogenous vitamin B12 (cobalamin) removal from the growth medium on a D. mccartyi-containing anaerobic dechlorinating microbial community.
1.3. Thesis outline
Chapter 1: Introduction
This chapter describes the motivation behind the research by defining the problems and challenges, states research objectives, and gives an outline of the contents of all chapters included in this thesis.
Chapter 2: General overview
This chapter presents a general overview of, and background information on, the most important concepts and topics underlying the research projects described throughout this thesis. These include different concepts of systems-level modeling of microbial metabolism, essential concepts of genome-scale constraint-based modeling, the theory of microbial energy conservation, and a general introduction to contamination by chloro-organic xenobiotics and its remediation by D. mccartyi metabolism in pure isolates and in mixed microbial communities.
Chapter 3: Characterizing the metabolism of Dehalococcoides with a constraint-based model
There are four results chapters in this thesis, and chapter 3 describes the in silico experiments conducted to achieve the first research objective. In particular, this chapter details the construction of a pan-genome-scale constraint-based mathematical model of D. mccartyi metabolism. The D. mccartyi pan-genome and subsequent metabolic network were developed from the publicly accessible genome sequences of D. mccartyi strains 195, CBDB1, BAV1, and VS, as well as from relevant information on the organisms’ physiology and metabolism in the published literature and biological databases. In silico growth and metabolism of D. mccartyi were simulated with the flux balance analysis approach, using the reconstructed metabolic network as a modeling framework. The model was used to predict the influence of carbon and energy sources, as well as of cobalamin availability in the growth medium, on D. mccartyi growth rate and yield. The annotations of all genes in the D. mccartyi genomes were also reviewed and, where necessary, corrected bioinformatically during the construction of the pan-genome-scale metabolic network.
Chapter 4: New insight into Dehalococcoides mccartyi metabolism from a model-integrated systems-level analysis of D. mccartyi transcriptomes
This chapter also describes in silico experiments and analyses, which were performed to achieve the second research objective of this thesis. Using the pan-genome-scale metabolic network and model developed in chapter 3 as a common platform, transcriptomic data from gene-expression microarray experiments with pure and mixed cultures of D. mccartyi were analyzed. Various bioinformatic and statistical analyses of these data were performed to shed light on different metabolic processes, including the poorly understood mechanism of energy conservation in D. mccartyi. Transcriptomic data, together with available proteomic data, were also used to confirm the transcription and expression of the majority of genes in D. mccartyi genomes, as well as to review or improve their annotations. Finally, this meta-analysis generated experimentally testable hypotheses regarding the function of some hypothetical proteins and metabolic genes of these environmentally important bacteria.
Chapter 5: Model-assisted prediction and experimental characterization of isocitrate dehydrogenase and phosphomannose isomerase from Dehalococcoides mccartyi strain KB-1
This chapter is one of two experimental chapters in this thesis, and it describes both the computational and the enzymology techniques used to achieve the third research objective. In particular, chapter 5 illustrates one of many applications of systems-level modeling of microbial metabolism by describing the detailed experimental characterization of two metabolic genes (isocitrate dehydrogenase and phosphomannose isomerase) of D. mccartyi strain KB-1. The annotations of these two putative metabolic genes, one of which was originally annotated as a hypothetical protein, were reviewed in detail during the construction of the D. mccartyi metabolic network and model, as well as during the model-integrated bioinformatic analyses of transcriptomic data obtained from microarray experiments. The genes were heterologously expressed in E. coli, the overexpressed recombinant proteins were purified, and the biochemical activity of the purified proteins was tested with appropriate enzymatic assays. The results confirmed the presence of two novel metabolic genes in D. mccartyi and highlighted the importance of the revised gene annotations presented during the construction of the D. mccartyi metabolic model.
Chapter 6: Role of exogenous vitamin omission on the growth and community dynamics of a Dehalococcoides mccartyi-containing anaerobic mixed microbial community
This is the other experimental chapter of this thesis, and it describes the microbiological techniques used to pursue the fourth and final research objective. This chapter explores whether D. mccartyi, which are vitamin B12-auxotrophic bacteria, can survive without the addition of exogenous vitamin mixtures, including vitamin B12, to their growth medium. The growth experiments were conducted with KB-1, a D. mccartyi-containing anaerobic dechlorinating mixed microbial enrichment culture. KB-1 growth on trichloroethene and methanol was monitored in different growth media with various combinations of exogenous vitamins and with no exogenous vitamins. D. mccartyi growth was inferred from the gas chromatographic analysis of degradation products, as well as from calculated dechlorination rates. The influence of the different growth media on KB-1 community composition was also determined using quantitative PCR (qPCR). Finally, differences in dechlorination and growth rates were analyzed by conducting growth experiments with diluted KB-1 cultures in media with and without exogenous vitamins. D. mccartyi growth in these diluted cultures was also verified using qPCR.
Chapter 7: Summary, conclusions, and future work
This chapter summarizes the main findings of the research projects presented in this thesis and describes their novelty, significance, and overall impact on our understanding of D. mccartyi metabolism and physiology, as well as on the bioremediation applications of these organisms. New prospects for future research based on this thesis are also briefly described.
1.4. Statement of authorship and publication status
Chapter 3: Characterizing the metabolism of Dehalococcoides with a constraint-based model
Authors: M. Ahsanul Islam1, Elizabeth A. Edwards1, and Radhakrishnan Mahadevan1
Contributions: EAE and RM conceived of the ideas and designed the experiments. MAI performed the experiments and analyzed the data. MAI wrote the manuscript with input from all co-authors.
Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
Reference to publication: PLoS Computational Biology 2010, 6(8): e1000887. doi:10.1371/journal.pcbi.1000887
Chapter 4: New insight into Dehalococcoides mccartyi metabolism from a model-integrated systems-level analysis of D. mccartyi transcriptomes
Authors: M. Ahsanul Islam1, Alison S. Waller2, Laura A. Hug3, Nicholas J. Provart4, Elizabeth A. Edwards1, and Radhakrishnan Mahadevan1
Contributions: MAI conceived of the ideas and designed the experiments in consultation with NJP, EAE, and RM. ASW generated the KB-1 transcriptomic data. LAH generated the draft genome of D. mccartyi in KB-1. MAI analyzed the data and wrote the manuscript with input from all co-authors.
Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada, 2-European Molecular Biology Laboratory (EMBL), Heidelberg, Germany, 3-Department of Earth and Planetary Science, University of California, Berkeley, USA, 4-Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
Conditional acceptance: PLOS ONE
Chapter 5: Model-assisted prediction and experimental characterization of isocitrate dehydrogenase and phosphomannose isomerase from Dehalococcoides mccartyi strain KB-1
Authors: M. Ahsanul Islam, Anatoli Tchigvintsev, Veronica Yim, Alexei Savchenko, Alexander F. Yakunin, Elizabeth A. Edwards, and Radhakrishnan Mahadevan
Contributions: MAI, AS, AFY, EAE, and RM conceived of the ideas. MAI and AFY designed the experiments. VY prepared the clones, and MAI and AT performed the experiments. MAI characterized the enzymes and wrote the manuscript with input from all co-authors.
Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
In preparation for: Journal of Bacteriology
Chapter 6: Role of exogenous vitamin omission on the growth and community dynamics of a Dehalococcoides mccartyi-containing anaerobic mixed microbial community
Authors: M. Ahsanul Islam1, Radhakrishnan Mahadevan1, and Elizabeth A. Edwards1
Contributions: MAI, RM, and EAE designed the experiments. MAI performed the experiments, analyzed the data, and wrote the manuscript with input from all co-authors.
Affiliations: 1-Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada
In preparation for: Applied and Environmental Microbiology
Figure 1.1. Schematic representation of the relationship between different thesis objectives.
Chapter 2: General overview
2.1. Systems biology
A “systems” concept is relatively new in biology, at least in comparison to the physical sciences or engineering (Wiki_System, 2013). In 1824, the French physicist Nicolas Léonard Sadi Carnot first used the “system” concept for studying thermodynamics (Wiki_Sadi_Carnot, 2013), while in biology it was not until 1934 that the Austrian biologist Karl Ludwig von Bertalanffy used “systems” theory to describe an organism’s growth over time with a simple mathematical model (Wiki_Bertalanffy, 2013, Wiki_System, 2013). More recently, the advent of various “omics” data, such as genomics, transcriptomics, proteomics, and metabolomics data, generated from high-throughput biological experiments has led to a new era in modern biology: a data-rich environment that is completely unfamiliar to data-poor classical biology. High-throughput experimental technologies, including genome sequencing, microarrays, and mass spectrometry, have not only generated a plethora of data but have also significantly broadened our knowledge of biological systems. However, studying this data-rich modern biology requires a more holistic approach than the reductionist approach of classical biology, and this requirement ultimately gave rise to systems biology (Chuang et al., 2010). Thus, systems biology is the integrated study of a biological system, all of its components, and their intricate relationships, rather than the study of individual system components (Chuang et al., 2010, Ideker et al., 2001, Peitsch and de Graaf, 2013).
The new paradigm of systems biology uses genome-, proteome-, and metabolome-scale data, as well as model systems, to accelerate predictive and hypothesis-driven research; however, such model systems must first be validated by detailed single-component experiments and by the literature of classical biology. Nevertheless, the ability of systems biology to integrate and study myriad biological data generated from both innovative system-wide experiments and computational approaches has shown promise in many research areas, including gene expression analysis, network biology, signal transduction, pathway-based biomarkers and analysis, and metabolic pathways (Chuang et al., 2010, Peitsch and de Graaf, 2013). Systems biology research on metabolic pathways has been especially facilitated by the exponential growth of biological databases of microbial genome sequences, both in size and in number (Pagani et al., 2012, Robbins, 1994, Stein, 2003). In silico mathematical modeling of microbial cells, and simulation of their integrated behavior in the context of cellular physiology, is an organized and useful way of interpreting these biological data (Karr et al., 2012, Durot et al., 2009, Hyduke et al., 2013, McCloskey et al., 2013).
2.2. Modeling microbial metabolism
From a biochemical engineering point of view, microbial cells, or cells in general, can be regarded as biological chemical process plants in which hundreds of enzyme-catalyzed biochemical reactions operate to achieve a specific cellular objective, such as cell growth. All of these biochemical reactions, categorized as energy-producing (catabolic) reactions and energy-consuming or biosynthetic precursor-producing (anabolic) reactions, jointly constitute the cellular process called metabolism (Madigan et al., 2010, Nelson and Cox, 2006, Todar, 2012). Thus, metabolism is considered the “driving engine” of a cell (Buchakjian and Kornbluth, 2010), and efforts to model microbial metabolism and cell growth are fairly old. As mentioned before, the earliest mathematical model of cell growth as a function of time was described by Karl Ludwig von Bertalanffy in 1934 (Wiki_Bertalanffy, 2013). Then, in 1949, Jacques Monod presented a more systematic, empirically based model of cellular growth, in which he described the relationship among substrates, products, and biomass by a single hyperbolic equation (Monod, 1949).
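Monod's hyperbolic relation is commonly written as μ = μmax·S/(Ks + S), where μ is the specific growth rate, S is the concentration of the growth-limiting substrate, μmax is the maximum specific growth rate, and Ks is the half-saturation constant. The short sketch below evaluates this standard form; the parameter values are hypothetical and are not fitted to any organism discussed in this thesis.

```python
def monod_growth_rate(s, mu_max, k_s):
    """Specific growth rate (1/h) from the Monod equation mu = mu_max * S / (K_s + S).

    s:      concentration of the growth-limiting substrate (e.g. mM)
    mu_max: maximum specific growth rate (1/h)
    k_s:    half-saturation constant, in the same units as s
    """
    if s < 0 or k_s <= 0:
        raise ValueError("substrate concentration must be >= 0 and k_s > 0")
    return mu_max * s / (k_s + s)


# Hypothetical parameters, for illustration only.
MU_MAX, K_S = 0.5, 2.0
for s in (0.5, 2.0, 20.0):
    print(f"S = {s:5.1f} -> mu = {monod_growth_rate(s, MU_MAX, K_S):.3f} 1/h")
```

At S = Ks the growth rate is exactly half of μmax, and for S much larger than Ks it approaches μmax; this saturating shape is what makes the curve hyperbolic.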
Because these simplified models of cell growth were more like correlations than real models, the first attempt at detailed mathematical modeling of an individual cell was described by Shuler and coworkers (Shuler et al., 1979). They developed a single-cell computer model of Escherichia coli by incorporating the formation of biomass precursors, cellular macromolecules, and intracellular metabolites as lumped reactions, called “pseudo-chemical” reactions (Shuler et al., 1979). The model was later refined to include cellular energy requirements, and model predictions were validated with experimental data under glucose-limiting conditions (Domach et al., 1984). However, these relatively simple single-cell models failed to capture the complexity of metabolic networks that arises from the multitude of metabolic reactions in a cell and their regulation. These limitations were addressed by more systematic and detailed modeling approaches, such as “metabolic control analysis” (MCA) (Kacser and Burns, 1973, Heinrich and Rapoport, 1974) and “cybernetic modeling” (Ramkrishna, 1983).
Developed in the 1970s (Kacser and Burns, 1973, Heinrich and Rapoport, 1974), MCA is a mathematical modeling approach for quantifying how the enzymes of a particular metabolic pathway control metabolic fluxes and intermediate metabolite concentrations (Wildermuth, 2000). It is essentially a sensitivity analysis of a cell's metabolic network: the responses of the two system variables, metabolic fluxes and metabolite concentrations, to perturbations in enzyme activities are used to identify rate-limiting enzymatic reactions in a pathway (Fell, 1992). Although MCA quantifies metabolic regulation to some extent, its sensitivity coefficients are more representative of kinetic competition among microbial enzymes than of cellular regulation (Patnaik, 2001). Cellular regulatory effects on metabolic networks were addressed by a goal-oriented modeling approach called “cybernetic modeling”. Developed by Ramkrishna and coworkers (Ramkrishna and Song, 2012, Ramkrishna, 1983), the cybernetic approach is based on the premise that biological systems manipulate their metabolism in response to environmental changes, such as nutrient availability, in order to maximize a particular objective. This modeling framework was later extended to hybrid cybernetic models (HCM) (Kim et al., 2008) and lumped hybrid cybernetic models (L-HCM) (Song and Ramkrishna, 2010), which include large numbers of metabolic reactions. However, none of these frameworks is suitable for incorporating genome-scale information, because estimating the parameters of such models is a formidable challenge, even for a small-scale model such as that of E. coli central metabolism (Ramkrishna and Song, 2012). Moreover, both MCA and cybernetic approaches require a large amount of experimentally measured information, such as enzyme kinetic data, which makes them especially inconvenient to use with systems-level data.
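To make the idea of MCA as a sensitivity analysis concrete, the sketch below computes flux control coefficients, C = (∂J/∂e)(e/J) = ∂lnJ/∂ln e, for a hypothetical two-enzyme pathway S → X → P with a fixed external substrate. The rate laws (v1 = e1(s − x), v2 = e2·x, all other constants set to 1) are assumptions chosen so that the steady state has a closed form; real MCA studies use measured or fitted kinetics. The coefficients should sum to 1, as required by the flux summation theorem of MCA.

```python
import math


def steady_state_flux(e1, e2, s=1.0):
    """Steady-state flux J through a hypothetical pathway S -> X -> P.

    Assumed rate laws: v1 = e1 * (s - x) and v2 = e2 * x.
    Setting v1 = v2 gives x = e1 * s / (e1 + e2), so J = e2 * x.
    """
    x = e1 * s / (e1 + e2)
    return e2 * x


def flux_control_coefficient(enzyme, e1, e2, h=1e-6):
    """Scaled sensitivity C = d ln J / d ln e, estimated by a central finite difference."""
    def ln_flux(log_scale):
        factor = math.exp(log_scale)
        if enzyme == 1:
            return math.log(steady_state_flux(e1 * factor, e2))
        return math.log(steady_state_flux(e1, e2 * factor))

    return (ln_flux(h) - ln_flux(-h)) / (2 * h)


e1, e2 = 1.0, 3.0  # hypothetical enzyme activities; enzyme 1 is the slower step
c1 = flux_control_coefficient(1, e1, e2)
c2 = flux_control_coefficient(2, e1, e2)
print(f"C1 = {c1:.4f}, C2 = {c2:.4f}, sum = {c1 + c2:.4f}")
```

For these rate laws the analytical values are C1 = e2/(e1 + e2) = 0.75 and C2 = 0.25, so most of the control over the flux resides in the slower enzyme, which is the kind of "rate-limiting" information MCA is designed to expose.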
The linear programming-based approach to mathematical modeling of metabolism, called constraint-based reconstruction and analysis (COBRA) (Becker et al., 2007, Schellenberger et al., 2011, Thiele and Palsson, 2010a), is a simple yet powerful method that can incorporate genome-scale information and identify optimal flux distributions of a large metabolic network with minimal experimental information (Lewis et al., 2012). The COBRA approach is mainly based on flux balance analysis (FBA), which was primarily developed by Varma and Palsson in the 1990s (Varma and Palsson, 1994). Since the COBRA framework was used extensively to develop the models of microbial metabolism in this thesis, all of the steps involved are briefly described in the following sections.
2.3. Constraint-based reconstruction and analysis (COBRA) approach
Genome-scale constraint-based modeling of microbial metabolism, also known as the COBRA approach, is an iterative model-building process that requires extensive genomic, biochemical, and physiological information about an organism (Reed et al., 2006a, Thiele and Palsson, 2010a, Thiele and Palsson, 2010b, Feist et al., 2009, Palsson, 2006). Such a model is never complete; rather, it evolves as knowledge about the organism of interest grows. Figure 2.1 is a schematic illustration of the steps involved in constructing a genome-scale model. The first step is the reconstruction of a genetically, genomically, and biochemically characterized metabolic network, which forms the basis for mathematical analysis of the model. Once a highly curated metabolic network is constructed, the biomass composition of the organism has to be determined, and all biomass components are then represented in the form of a biomass synthesis reaction. Next, additional physiological information, such as the cellular maintenance energy in the form of ATP requirements, has to be incorporated in the model, so that the model can be used to predict the growth rate and by-product secretion patterns of the organism (Palsson, 2006, Becker et al., 2007, Feist et al., 2009, Lewis et al., 2012, Schellenberger et al., 2011, Thiele and Palsson, 2010a). Finally, the model has to be validated against experimental data, such as the measured growth rate or substrate uptake rate of the model organism, by simulating its metabolism using flux balance analysis (FBA), a mathematical modeling technique for quantitatively simulating microbial metabolism (Thiele and Palsson, 2010a, Varma and Palsson, 1994).
Figure 2.1. Steps involved in developing a genome-scale constraint-based metabolic model by the COBRA approach. After reconstructing the metabolic network, and estimating the biomass composition and ATP requirements for cellular maintenance of the organism of interest, FBA is used to integrate all of this information to simulate the metabolism of the organism.
2.4. Metabolic network reconstruction procedures
Metabolism is the collective process of all cellular biochemical reactions, and it drives the physiological processes of a cell (Madigan et al., 2010, Nelson and Cox, 2006, Todar, 2012). Thus, the backbone of a genome-wide in silico metabolic model of a microorganism is the reconstructed network of all biochemical reactions taking place during its metabolism. Such an integrated representation of genes, proteins, and their interactions has to be well curated in order to enhance the power of the model to predict the metabolic phenotype of the microbe (Covert et al., 2001, Feist et al., 2009, Francke et al., 2005). In recent years, extensive research in this field has generated a number of algorithms for automatically reconstructing the metabolic network of an organism from its annotated genome sequence (Arakawa et al., 2006, DeJongh, 2007, Karp et al., 2002, Pinney et al., 2005, Sun and Zeng, 2004, Henry et al., 2010, Hung et al., 2010). However, automatically generated networks are not free from inconsistencies, such as missing reactions required to generate essential biomass precursors, or unwanted reactions that are not supported by the organism’s genome. Hence, they require extensive and laborious manual curation before they can be incorporated into a genome-scale model. The steps involved in the network reconstruction process are briefly shown in Figure 2.2.
The network reconstruction process starts with the annotation of a sequenced genome of the organism to be modeled, a step that has been facilitated by recent advances in high-throughput genome sequencing technology (Mohamed and Syed, 2013, Metzker, 2010, MacLean et al., 2009). Currently, several thousand completely sequenced and annotated genomes are available on the World Wide Web and can be downloaded from the websites of a number of biological databases, such as GOLD (http://www.genomesonline.org/cgi-bin/GOLD/index.cgi), KEGG (http://www.genome.jp/kegg), JGI-IMG (http://img.jgi.doe.gov), JCVI-CMR (http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi), EMBL-EBI (http://www.ebi.ac.uk/genomes/), and NCBI (http://www.ncbi.nlm.nih.gov/genome). The annotation of a sequenced genome is the assignment of functions to genes or gene products based on sequence similarity or homology with molecules or proteins of known function in biological databases (Koonin and Galperin, 2003). Only rarely are gene annotations verified with mRNA-, protein-, or enzyme-level experimental results.
Figure 2.2. Metabolic network reconstruction procedure. Steps involved in constructing and curating a reconstructed metabolic network of an organism from its annotated genome sequence, and available biochemical and experimental data are illustrated.
The next step in the reconstruction process is the identification of genes with defined metabolic functions, which are then verified by identifying their homologs in other well-characterized and extensively studied organisms, such as Escherichia coli, Bacillus subtilis, and Saccharomyces cerevisiae, with the sequence alignment tool BLAST (Altschul et al., 1997). Subsequently, confidence levels are assigned based on the degree of sequence identity or on bidirectional best BLAST hits. In addition, these genes are evaluated on the basis of gene order (conserved synteny) and phylogenetic analysis, using updated versions of biochemical databases, including KEGG (Kanehisa and Goto, 2000), UniProt (Apweiler et al., 2012), SWISSPROT (Boeckmann et al., 2003), IMG (Markowitz et al., 2007), and PDB (Berman et al., 2000).
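The bidirectional (reciprocal) best-hit criterion mentioned above can be expressed in a few lines. The sketch assumes that BLAST searches in both directions have already been run and parsed into dictionaries mapping each query to (subject, bit score) pairs; the gene identifiers are hypothetical placeholders, and a real pipeline would also apply e-value, identity, or coverage cut-offs.

```python
def best_hits(hits):
    """Return {query: best-scoring subject}, where hits = {query: [(subject, bit_score), ...]}."""
    return {
        query: max(subjects, key=lambda pair: pair[1])[0]
        for query, subjects in hits.items()
        if subjects
    }


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical parsed BLAST results (bit scores), not real data.
target_vs_ref = {
    "target_g1": [("ref_g1", 310.0), ("ref_g9", 120.0)],
    "target_g2": [("ref_g2", 95.0)],
}
ref_vs_target = {
    "ref_g1": [("target_g1", 305.0)],
    "ref_g2": [("target_g3", 110.0), ("target_g2", 90.0)],
}
print(reciprocal_best_hits(target_vs_ref, ref_vs_target))
```

Only the pair (target_g1, ref_g1) is reported: target_g2's best hit is ref_g2, but in the reverse search ref_g2 prefers target_g3, so that assignment would receive a lower confidence level.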
After the genes have been identified in terms of the organism’s metabolic context, the next step is to define the metabolic reactions, which requires different levels of detailed biochemical information (Feist et al., 2009, Reed et al., 2006a, Thiele and Palsson, 2010a). First, the substrate specificity of each enzyme has to be determined from the biochemical literature on the organism to be modeled, as well as from updated biochemical databases such as BRENDA (Schomburg et al., 2004) and ENZYME (Gasteiger et al., 2003) and the links to relevant publications therein. Second, after the molecular and charged formulas of the participating metabolites have been determined, the stoichiometric coefficients of the biochemical reactions to be incorporated in the network are obtained by balancing the substrates and products on both sides of each reaction. Afterwards, information on thermodynamic considerations, or reaction directionality, has to be incorporated, followed by the localization of reactions and proteins in cellular compartments (Feist et al., 2009, Reed et al., 2006a, Thiele and Palsson, 2010a). All of the aforementioned information is available in metabolic databases, including MetaCyc (Caspi et al., 2012), BioCyc (Caspi et al., 2012), KEGG (Kanehisa et al., 2011), SEED (DeJongh et al., 2007), and PubChem (Bolton et al., 2008, Wang et al., 2012). Although the cellular localization of reactions is very important and challenging for eukaryotic organisms, the task is relatively easy for prokaryotes containing only one compartment.
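Checking that a reaction is elementally and charge balanced is easy to automate once the formula and charge of every metabolite are known. The sketch below uses a deliberately simple formula parser (no parentheses or hydrates) and, as an example, the hexokinase reaction glucose + ATP → glucose 6-phosphate + ADP + H+, with metabolites in their commonly used charged forms near pH 7; the metabolite identifiers are illustrative only.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C10H12N5O13P3' (no parentheses)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def is_balanced(reaction, metabolites):
    """reaction: {met_id: coefficient} (negative = substrate, positive = product);
    metabolites: {met_id: (charged formula, charge)}."""
    net_atoms = Counter()
    net_charge = 0
    for met, coeff in reaction.items():
        formula, charge = metabolites[met]
        for element, n in parse_formula(formula).items():
            net_atoms[element] += coeff * n
        net_charge += coeff * charge
    return all(v == 0 for v in net_atoms.values()) and net_charge == 0


metabolites = {
    "glc": ("C6H12O6", 0),         # glucose
    "atp": ("C10H12N5O13P3", -4),  # ATP(4-)
    "g6p": ("C6H11O9P", -2),       # glucose 6-phosphate(2-)
    "adp": ("C10H12N5O10P2", -3),  # ADP(3-)
    "h": ("H", 1),                 # proton
}
hexokinase = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
print(is_balanced(hexokinase, metabolites))
```

Dropping the proton from the product side leaves both hydrogen and charge unbalanced, which is exactly the kind of error this check is meant to catch before a reaction enters the network.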
Once the metabolic reactions are defined, the next and most critical step in the network reconstruction process is pathway analysis to find and fill gaps in the metabolic network. This process is known as network debugging or network curation (Feist et al., 2009, Thiele and Palsson, 2010a). Although a number of algorithms (Green and Karp, 2004, Herrgård et al., 2006, Hung et al., 2010, Notebaart et al., 2006, Reed et al., 2006b, Kumar et al., 2007) are available for automated curation of a metabolic network, manual curation is still necessary. The manual curation step is laborious and cumbersome yet essential for generating an accurate reconstruction, which can then be used as a scaffold for developing a genome-scale in silico model. A metabolic reconstruction generated from a genome annotation alone has several pitfalls, including incorrect substrate specificity, reaction reversibility, and cofactor usage, the treatment of enzyme subunits as separate enzymes, and missing reactions that have no assigned ORFs (Durot et al., 2009, Feist et al., 2009, Reed et al., 2006a, Thiele and Palsson, 2010a). This is because genome annotations are usually based on sequence similarity or homology with known database proteins, without any experimental or biochemical evidence for the gene product, reaction, or predicted function (Devos and Valencia, 2001, Reed et al., 2006a). In addition, the metabolic capabilities of the reconstructed network have to be consistent with the known physiology of the organism. Hence, knowledge of the organism’s physiology, especially of its metabolic capabilities, is crucial for obtaining a curated reconstructed network.
Typically, gaps in a metabolic network are revealed by so-called “dead metabolites” or “blocked metabolites”, i.e., metabolites that are only consumed or only produced in the network (Feist et al., 2009, Schellenberger et al., 2011, Thiele and Palsson, 2010a). Dead metabolites usually arise from the presence or absence of certain pathways or reactions that affect the availability of substrates for other reactions (Francke et al., 2005, Kumar et al., 2007). Thus, the missing reactions associated with such metabolites have to be identified from the previously mentioned reaction databases, as well as from the analysis of major metabolic pathways, such as glycolysis, the TCA cycle, or amino acid biosynthesis pathways, that are essential for generating the biomass precursors of the organism. Once the reactions are identified and the relevant gaps are filled, the next step is to look for the genes encoding the enzymes that catalyze the missing reactions in other organisms. Subsequently, the homologs of these genes in the organism to be modeled have to be identified using reciprocal BLAST analyses. This process, in turn, leads to the reannotation of existing ORFs or the addition of new genes to the reconstructed network.
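Dead metabolites can be flagged directly from the stoichiometric matrix S and the reaction reversibilities. The sketch below assumes a dense NumPy matrix and a hypothetical toy network; it marks a metabolite as dead when it cannot be both produced and consumed (a reversible reaction counts in both directions) or when it participates in fewer than two reactions. This is one common heuristic, not a substitute for the gap-analysis algorithms cited above.

```python
import numpy as np


def find_dead_metabolites(S, reversible, met_ids):
    """Return metabolites that are only produced, only consumed, or in < 2 reactions."""
    S = np.asarray(S, dtype=float)
    rev = np.asarray(reversible, dtype=bool)
    dead = []
    for i, met in enumerate(met_ids):
        row = S[i]
        # Producible: positive coefficient, or negative coefficient in a reversible reaction.
        producible = np.any(row > 0) or np.any((row < 0) & rev)
        # Consumable: negative coefficient, or positive coefficient in a reversible reaction.
        consumable = np.any(row < 0) or np.any((row > 0) & rev)
        # A metabolite in a single reaction cannot carry steady-state flux.
        n_reactions = np.count_nonzero(row)
        if n_reactions < 2 or not (producible and consumable):
            dead.append(met)
    return dead


# Hypothetical network. Columns: R1: -> A, R2: A -> B, R3: B ->, R4: A -> C, R5: D <-> B
met_ids = ["A", "B", "C", "D"]
S = [
    [1, -1, 0, -1, 0],   # A
    [0, 1, -1, 0, 1],    # B
    [0, 0, 0, 1, 0],     # C: only produced
    [0, 0, 0, 0, -1],    # D: appears in a single (reversible) reaction
]
reversible = [False, False, False, False, True]
print(find_dead_metabolites(S, reversible, met_ids))
```

Here C and D are flagged; in a real reconstruction each of them would point to a candidate missing reaction (a consumer of C, a source or sink for D) to be searched for in the reaction databases.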
2.5. Determination of biomass composition and maintenance energy
Once a highly curated metabolic network of the organism to be modeled is reconstructed, it is necessary to know the demands on the metabolic system in terms of biomass synthesis and maintenance energy. Hence, in the next step of the model-building process, this information is incorporated into the model as a biomass synthesis reaction (Becker et al., 2007, Feist et al., 2009, Schellenberger et al., 2011, Thiele and Palsson, 2010a, Varma and Palsson, 1994). Usually, the composition of cellular macromolecules, such as proteins, nucleic acids, polysaccharides, lipids, fatty acids, and cofactors, has to be estimated from detailed physiological experiments with the organism of interest. However, amino acid, DNA, and RNA compositions can also be estimated from the organism’s genome sequence (Roberts et al., 2010), and detailed experimental biomass compositions of several well-studied organisms from Eukarya, Bacteria, and Archaea are available (Duarte et al., 2004, Feist et al., 2006, Mahadevan et al., 2006, Neidhardt et al., 1990, Oh et al., 2007). Thus, in the absence of organism-specific experimental data, the distribution and composition of the remaining biomass precursors can be approximated from other organisms’ data and adjusted to the physiology and morphology of the organism to be modeled.
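As an illustration of estimating composition from the genome, the amino acid part of a biomass reaction can be derived from predicted protein sequences. The sketch assumes a given protein mass fraction (g protein per g dry weight, gDW), weights each amino acid by its frequency across the sequences (not by expression level), and converts to mmol per gDW using approximate residue masses (free amino acid mass minus one water). Only five residue masses are included, so the sequences and the protein fraction are hypothetical.

```python
from collections import Counter

# Approximate residue masses (g/mol) = free amino acid mass - 18.02 (water); subset only.
RESIDUE_MW = {"A": 71.08, "G": 57.05, "L": 113.16, "K": 128.17, "E": 129.12}


def amino_acid_coefficients(protein_seqs, protein_fraction):
    """Return {amino acid: mmol per gDW} for a biomass reaction.

    protein_seqs:     iterable of one-letter protein sequences
    protein_fraction: protein mass fraction of dry weight (g protein / gDW)
    """
    counts = Counter()
    for seq in protein_seqs:
        counts.update(seq)
    total = sum(counts.values())
    mole_frac = {aa: n / total for aa, n in counts.items()}
    # Composition-weighted mass of one polymerized residue (mg per mmol).
    mean_residue_mw = sum(x * RESIDUE_MW[aa] for aa, x in mole_frac.items())
    # mg protein per gDW divided by mg per mmol residue -> mmol residues per gDW.
    total_mmol = protein_fraction * 1000.0 / mean_residue_mw
    return {aa: x * total_mmol for aa, x in mole_frac.items()}


coeffs = amino_acid_coefficients(["GALKE", "AALE"], protein_fraction=0.55)
for aa, mmol in sorted(coeffs.items()):
    print(f"{aa}: {mmol:.3f} mmol/gDW")
```

By construction, the coefficients multiplied by their residue masses add back up to the assumed 550 mg of protein per gDW, which is a useful sanity check when assembling a full biomass reaction.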
In addition to the biomass composition, the other key component of the biomass synthesis reaction is the maintenance energy requirement of a microbial cell. Maintenance energy refers to the energy a microbial cell requires to perform functions that are not directly growth related, i.e., not involved in synthesizing new cellular material (Pirt, 1965, Pirt, 1982, Russell and Cook, 1995). Cells produce energy in the form of ATP from catabolic reactions and use this ATP for biosynthetic (anabolic) reactions. However, not all of the ATP from catabolism is consumed by anabolic reactions: cells also require ATP for many functions not otherwise captured in the metabolic model, such as the polymerization of macromolecules, turnover of cellular amino acid pools, motility, and active substrate and ion transport; this cellular ATP requirement is termed maintenance energy (Neidhardt et al., 1990). Cellular maintenance energy requirements are of two types: growth-associated maintenance energy (GAM) and non-growth-associated maintenance energy (NGAM) (Neidhardt et al., 1990). GAM varies with growth and accounts for the energy needed to assemble and polymerize macromolecules (i.e., proteins, DNA, RNA, lipids, and polysaccharides), whereas NGAM is constant and corresponds to the ATP required to maintain the integrity of a microbial cell (Pirt, 1965, Pirt, 1982, Russell and Cook, 1995). Methods for estimating both types of maintenance energy have been developed and described in the literature (Neijssel et al., 1996, Pirt, 1965, Pirt, 1982).
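One widely used way of separating the two terms follows the spirit of Pirt's linear relation: the specific ATP consumption rate is plotted against the specific growth rate, so that the slope estimates GAM (mmol ATP per gDW) and the intercept estimates NGAM (mmol ATP per gDW per hour). The sketch fits that line with NumPy. The data points are synthetic, generated from assumed values GAM = 40 and NGAM = 5, not measurements; in practice the ATP rates would be inferred from measured uptake and secretion rates through the metabolic network.

```python
import numpy as np

# Synthetic chemostat-style data generated from assumed GAM = 40 and NGAM = 5.
mu = np.array([0.05, 0.10, 0.20, 0.30])   # specific growth rate (1/h)
q_atp = 40.0 * mu + 5.0                    # specific ATP consumption rate (mmol ATP/gDW/h)

# Linear fit q_atp = GAM * mu + NGAM; for degree 1, np.polyfit returns [slope, intercept].
gam, ngam = np.polyfit(mu, q_atp, 1)
print(f"GAM  ~ {gam:.1f} mmol ATP/gDW")
print(f"NGAM ~ {ngam:.1f} mmol ATP/gDW/h")
```

With real data the points scatter around the line, and the fitted intercept (NGAM) is usually the less certain of the two estimates because it is extrapolated to zero growth.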
2.6. Model validation and refinement
The final step in developing a genome-scale in silico model of microbial metabolism is validating the model with organism-specific experimental data and analyzing the model predictions for possible refinement of the metabolic network and of the overall model. This can be accomplished by predicting growth and by-product secretion patterns of the organism under conditions that were not used to estimate the model parameters, and comparing these predictions with experimental data. Such data can be obtained from large-scale, high-throughput physiological techniques called phenotype microarrays (Bochner et al., 2001, Atanasova and Druzhinina, 2010, Oh et al., 2007). In addition, isotope labeling experiments, such as 13C-based metabolic flux analysis, can be used to evaluate metabolic networks and validate genome-scale models (Tang et al., 2009a, Tang et al., 2012, Sauer, 2006). Moreover, using the flux balance analysis (FBA) technique, the developed model can generate experimentally testable hypotheses about the cellular physiology and metabolic capabilities of an organism in terms of its genotype-phenotype relationship. Because FBA is at the core of constraint-based genome-scale modeling, it is discussed in detail in the following section.
2.7. Flux balance analysis
Flux balance analysis (FBA) is an established mathematical modeling approach that has been used extensively to simulate cellular metabolism quantitatively with genome-scale metabolic models. Essentially, FBA is a mathematical framework used to interrogate the reconstructed metabolic network of an organism in order to predict its cellular behavior, or metabolic phenotype, under certain physicochemical constraints, such as mass balance constraints, energy balance constraints, and flux limitations or bounds (Price et al., 2004, Orth et al., 2010, Edwards et al., 2002, Bonarius et al., 1997, Kauffman et al., 2003). Thus, FBA calculates the flow of metabolites through a metabolic network and predicts the growth or by-product secretion pattern of an organism. The fundamental principle on which FBA is based is the law of conservation of mass (Edwards et al., 2002, Kauffman et al., 2003, Palsson, 2006).
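In its simplest form, FBA is a linear program: maximize an objective flux subject to steady-state mass balances (S·v = 0) and lower and upper bounds on each flux. The sketch below solves that problem for a hypothetical three-reaction network with SciPy's `linprog`, which minimizes, so the objective is negated. The network, the bounds, and the requirement of 2 B per unit of "biomass" are illustrative assumptions and are not part of any D. mccartyi model.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: metabolites A, B. Columns: reactions
#   EX_A : -> A      (substrate uptake)
#   R1   : A -> B
#   BIO  : 2 B ->    (lumped "biomass" drain)
S = np.array([
    [1.0, -1.0,  0.0],   # A
    [0.0,  1.0, -2.0],   # B
])
bounds = [(0.0, 10.0),    # uptake limited to 10 mmol/gDW/h
          (0.0, 1000.0),
          (0.0, 1000.0)]
c = np.array([0.0, 0.0, -1.0])  # maximize BIO  <=>  minimize -BIO

res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
if res.status != 0:
    raise RuntimeError(res.message)
print("fluxes:", res.x)              # steady-state flux distribution
print("max biomass flux:", -res.fun)
```

Because the uptake bound is the only active constraint, the optimum routes all 10 units of A into B and yields a biomass flux of 5. In a genome-scale model the same formulation applies with thousands of columns, and the optimal flux distribution is often not unique.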
An extensive literature describes the formulation of FBA and its implementation in metabolic modeling (Price et al., 2004, Orth et al., 2010, Edwards et al., 2002, Bonarius et al., 1997, Kauffman et al., 2003, Gianchandani et al., 2010, Raman and Chandra, 2009, Palsson, 2006, Varma and Palsson, 1994, Lewis et al., 2012); the formulation is briefly described below.
Let us assume that the concentration of a particular metabolite ($x_i$) in a metabolic reaction network is influenced by various reaction fluxes ($v_j$). A mass balance around each metabolite in the network results in a dynamic mass balance equation of the form:
$$\frac{dx_i}{dt} = v_{syn} - v_{deg} - \left(v_{use} \pm v_{trans}\right) \qquad (1)$$
where $v_{syn}$ and $v_{deg}$ are the synthesis and degradation fluxes of metabolite $x_i$, $v_{use}$ is the cellular maintenance flux, and $v_{trans}$ is the uptake or secretion flux of $x_i$; the former can be determined from cellular composition, as discussed previously, while the latter can be measured experimentally. Combining these two known terms into a single output term, equation (1) takes the form:
$$\frac{dx_i}{dt} = v_{syn} - v_{deg} - b_i \qquad (2)$$
where $b_i = v_{use} \pm v_{trans}$ is the total output of $x_i$ from the defined metabolic system. Writing equation (2) for every metabolite in the network gives a single matrix equation:
$$\frac{dX}{dt} = S \cdot v - b \qquad (3)$$
where $X$ is the $m \times 1$ vector of the concentrations of the $m$ metabolites in the cell, $v$ is the $n \times 1$ vector of fluxes through the $n$ metabolic reactions, $S$ is the $m \times n$ stoichiometric matrix representing the entire metabolic network, and $b$ is the $m \times 1$ vector of known metabolic demands or exchange fluxes.
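As a minimal worked illustration of equation (3), the sketch below evaluates $dX/dt = S \cdot v - b$ for a hypothetical network with two metabolites, A and B, and two internal reactions, R1: A → B and R2: A → 2B. The network, the flux values, and the helper names `mat_vec` and `dxdt` are illustrative assumptions, not part of any Dehalococcoides model. Because $b$ is an output term, an uptake appears as a negative entry.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(S: Matrix, v: Vector) -> Vector:
    """Multiply an m x n stoichiometric matrix S by an n x 1 flux vector v."""
    if any(len(row) != len(v) for row in S):
        raise ValueError("every row of S must have one entry per flux in v")
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def dxdt(S: Matrix, v: Vector, b: Vector) -> Vector:
    """Dynamic mass balance of equation (3): dX/dt = S.v - b."""
    Sv = mat_vec(S, v)
    if len(Sv) != len(b):
        raise ValueError("b must have one entry per metabolite")
    return [sv_i - b_i for sv_i, b_i in zip(Sv, b)]


# Hypothetical toy network (illustration only)
# rows: metabolites A, B; columns: R1 (A -> B), R2 (A -> 2 B)
S = [[-1.0, -1.0],
     [1.0, 2.0]]
v = [3.0, 1.0]             # internal fluxes, arbitrary units
b_balanced = [-4.0, 5.0]   # A taken up at 4 (negative output), B secreted at 5
b_excess = [-5.0, 5.0]     # A taken up faster than R1 and R2 consume it

print(dxdt(S, v, b_balanced))  # [0.0, 0.0]
print(dxdt(S, v, b_excess))    # [1.0, 0.0]
```

When the uptake of A exactly matches its consumption by R1 and R2, both derivatives vanish; increasing the uptake without changing $v$ makes A accumulate.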
Because metabolic transients are typically much faster than cellular growth and process dynamics (Vallino and Stephanopoulos, 1993), a steady state can be assumed, and equation (3) reduces to:
$$S \cdot v = b \qquad (4)$$
If the known exchange fluxes are also incorporated into the flux vector $v$ (with corresponding columns in $S$), equation (4) can be written as:
$$S \cdot v = 0 \qquad (5)$$
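To make the steady-state condition of equation (5) concrete, the sketch below checks it for a hypothetical network in which the exchange reactions are part of $v$: EX_A (uptake of A), two parallel reactions R1 and R2 that both convert A to B, and EX_B (secretion of B). The function names and the numerical tolerance are assumptions for illustration. With four fluxes but only two independent mass balances, several distinct flux vectors satisfy $S \cdot v = 0$.

```python
from typing import List


def residual(S: List[List[float]], v: List[float]) -> List[float]:
    """Return S.v, i.e. the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S: List[List[float]], v: List[float], tol: float = 1e-9) -> bool:
    """True if v satisfies equation (5), S.v = 0, within a numerical tolerance."""
    return all(abs(r) <= tol for r in residual(S, v))


# Hypothetical network; columns: EX_A (-> A), R1 (A -> B), R2 (A -> B), EX_B (B ->)
S = [[1.0, -1.0, -1.0, 0.0],   # mass balance on A
     [0.0, 1.0, 1.0, -1.0]]    # mass balance on B

candidates = {
    "all through R1": [2.0, 2.0, 0.0, 2.0],
    "all through R2": [2.0, 0.0, 2.0, 2.0],
    "split R1/R2": [2.0, 1.5, 0.5, 2.0],
    "unbalanced": [2.0, 2.0, 2.0, 2.0],  # consumes A faster than it is taken up
}
for name, flux in candidates.items():
    print(f"{name:15s} S.v = {residual(S, flux)}  steady state: {is_steady_state(S, flux)}")
```

The first three flux vectors are all feasible, which is why an additional objective is needed to select one of them.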
Equation (5) simply states that, over long time intervals, the synthesis fluxes of each metabolite must be balanced by its degradation fluxes. Because the number of reaction fluxes usually exceeds the number of metabolites (mass balances), equation (5) is an underdetermined system with many solutions, or feasible flux distributions. Although these solutions are infinite in number, they are constrained by the mass balances encoded in the S matrix and, together with flux bounds, form a bounded solution space. A linear programming (LP) problem can therefore be formulated and solved for a particular objective, such as the maximization of cellular growth. In addition to the stoichiometric constraints of equation (5), solving the LP problem requires constraints of the form $a_i \le v_i \le b_i$, where $a_i$ and $b_i$ are the lower and upper bounds of the corresponding reaction flux $v_i$. The optimal flux distribution given by the solution of the LP problem is, in essence, the metabolic phenotype of the microbe under the conditions described by the constraints.
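The LP formulation can be sketched with SciPy's `scipy.optimize.linprog` using the HiGHS solver, assuming NumPy and SciPy are installed. The network is hypothetical: an uptake reaction EX_A bounded at 10 units, two reactions converting A into precursors B and C, and a drain BIO that consumes one B and one C and stands in for the growth objective. Because `linprog` minimizes, the objective coefficient is negated to maximize BIO. The helper name `fba` is illustrative and not an API of any FBA toolbox.

```python
import numpy as np
from scipy.optimize import linprog


def fba(S, bounds, objective_index):
    """Maximize v[objective_index] subject to S.v = 0 and a_i <= v_i <= b_i."""
    S = np.asarray(S, dtype=float)
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes c.v, so negate to maximize
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(f"LP failed: {res.message}")
    return -res.fun, res.x


# Hypothetical network; columns: EX_A (-> A), R1 (A -> B), R2 (A -> C), BIO (B + C ->)
S = [[1, -1, -1, 0],   # A
     [0, 1, 0, -1],    # B
     [0, 0, 1, -1]]    # C
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]  # (a_i, b_i) for each flux

growth, v_opt = fba(S, bounds, objective_index=3)
print(growth, np.round(v_opt, 6))

# Tightening the capacity of R1 changes the predicted phenotype
bounds_limited = [(0, 10), (0, 3), (0, 1000), (0, 1000)]
growth_limited, _ = fba(S, bounds_limited, objective_index=3)
print(growth_limited)
```

With uptake limited to 10 units, the optimum routes 5 units through each of R1 and R2; capping R1 at 3 lowers the maximal BIO flux to 3. Genome-scale models are usually solved with dedicated toolboxes such as the COBRA Toolbox or COBRApy; the plain LP here only exposes the structure of the problem.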
2.8. Energy conservation in microbes
The metabolic diversity of microorganisms is immense, and this is especially true of their energy metabolism. In biological systems, adenosine triphosphate (ATP) is the universal molecular currency for storing and exchanging biological energy (Madigan et al., 2010, Nelson and Cox, 2006). Irrespective of the substrates used for metabolism, energy is transformed and ultimately conserved in the energy-rich pyrophosphate bonds of ATP in all forms of life (Thauer et al., 1977). Chemotrophic microbes usually produce ATP during catabolism by one of two energy-conserving mechanisms: substrate-level phosphorylation (SLP) and electron transport phosphorylation (ETP), both of which involve a combination of electron-donating and electron-accepting redox reactions (Kröger et al., 2002, Kröger et al., 1992, Thauer et al., 1977). In SLP, or fermentation, ATP is synthesized from adenosine diphosphate (ADP) and inorganic phosphate (Pi) by coupling to an exergonic reaction in the conversion of an organic compound, in the absence of a terminal electron acceptor (Thauer et al., 1977, Madigan et al., 2010, http://textbookofbacteriology.net/index.html, 2008, Unden et al., 2013). ATP synthesis through ETP is evolutionarily more advanced and is widespread in aerobic, anaerobic, and photosynthetic microorganisms (Thauer et al., 1977, Unden et al., 2013). In ETP, or respiration, ATP synthesis is coupled through a “chemiosmotic mechanism” to redox reactions mediated by electron carriers in an electron transport chain, in the presence of a terminal electron acceptor (Kröger et al., 2002, Kröger et al., 1992, Thauer et al., 1977, Unden et al., 2013).
The chemiosmotic mechanism, proposed by the British biochemist Peter Mitchell (Mitchell, 1961, Mitchell, 1972), involves a proton motive force, or PMF (Δp), which consists of a pH gradient (ΔpH) and an electrical potential difference (Δψ) across the membrane and is generated by the flow of electrons from an electron donor to an electron acceptor through a membrane-bound electron transport chain. The PMF is then used to synthesize ATP from ADP and Pi by a PMF-driven membrane enzyme complex called ATP synthase, or ATPase (Thauer et al., 1977, Mitchell, 1961, Mitchell, 1972, Unden et al., 2013). Mitchell originally described the mechanism for aerobic bacteria that use oxygen as the terminal electron acceptor, a process known as oxidative phosphorylation. In contrast, ATP generation via the chemiosmotic mechanism in chemotrophic anaerobes is known as electron transport phosphorylation (ETP), or anaerobic respiration, in which fumarate, nitrate, nitrite, sulfate, polysulfide, organohalides, carbon dioxide, and metal oxides are used instead of oxygen as terminal electron acceptors (Unden et al., 2013, Thauer et al., 1977, Kröger et al., 2002, Kröger et al., 1992).
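The two PMF components are commonly combined as $\Delta p = \Delta\psi - (2.303RT/F)\,\Delta pH$. In the convention assumed here, $\Delta\psi$ and $\Delta pH$ are inside-minus-outside differences and $\Delta p$ is in mV; sign conventions differ between textbooks. The sketch below evaluates this expression. The membrane potential and pH values are hypothetical and are chosen only to show that each pH unit contributes about 59 mV at 25 °C.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def proton_motive_force_mV(delta_psi_mV: float, pH_in: float, pH_out: float,
                           temperature_K: float = 298.15) -> float:
    """Return Δp = Δψ - (2.303 RT/F)·ΔpH in mV, with ΔpH = pH_in - pH_out.

    Assumed convention: differences are inside minus outside, so a membrane
    that is negative and alkaline inside gives a negative Δp.
    """
    z_mV = math.log(10) * R * temperature_K / F * 1000.0  # ~59.2 mV per pH unit at 25 °C
    return delta_psi_mV - z_mV * (pH_in - pH_out)


# Hypothetical example: Δψ = -120 mV, cytoplasm at pH 7.5, medium at pH 6.5
dp = proton_motive_force_mV(-120.0, pH_in=7.5, pH_out=6.5)
print(f"Δp = {dp:.1f} mV")
```

Here the pH gradient adds roughly 59 mV to the electrical component, giving a total of about -179 mV.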
2.9. Chlorinated xenobiotics and reductive dechlorination
Chlorinated organic solvents, including both chlorinated aliphatic hydrocarbons (e.g., PCE, TCE, and VC) and chlorinated aromatic hydrocarbons (e.g., HCB, TCB, and MCB), are highly volatile and relatively non-corrosive, unreactive, and non-flammable, and they effectively dissolve a wide range of organic compounds (Doherty, 2000a, Doherty, 2000b, Doherty, 2012, Lohman, 2002). For more than 30 years, these properties have made them popular in commercial and industrial applications as cleaning and degreasing agents, refrigerants, extraction agents, and ingredients of adhesives, industrial paints, paint strippers, varnishes, lubricants, fungicides, herbicides, and pesticides, in sectors ranging from dry cleaning, electronics, defense, and automotive to pharmaceuticals, textiles, and agriculture (Petrisor and Wells, 2008, Doherty, 2000a, Doherty, 2000b, Lohman, 2002). However, such extensive use, careless handling and disposal practices in the past, and a lack of awareness of their harmful effects on human health and the environment have made chlorinated solvents the most widespread xenobiotic (man-made) contaminants of groundwater and soil (Doherty, 2000a, Doherty, 2000b, Doherty, 2012, Lohman, 2002). Trichloroethene (TCE) has been found in at least 852 of 1430, and trichlorobenzene (TCB) in 1699, US Environmental Protection Agency (USEPA) National Priorities List (NPL) Superfund sites (ATSDR, 2013). A 1995 Health Canada study (Canada and Limited, 1995) also found tetrachloroethene (PCE) and TCE to be the most common groundwater contaminants at Canadian sites, and a 1989 USEPA estimate identified TCE as the most prevalent contaminant worldwide, present in 3% of surface water and 19% of groundwater samples (Petrisor and Wells, 2008).
Because they are sparingly soluble in and denser than water, chlorinated solvents tend to sink and form a separate phase, known as dense non-aqueous phase liquid (DNAPL), in soil and the subsurface (McCarty, 2001, McCarty, 2010). Because of their persistence, these DNAPLs ultimately reach the groundwater table and contaminate it by forming DNAPL plumes (McCarty, 2001, McCarty, 2010).
Despite their environmental persistence, chlorinated solvents can be degraded by both biotic and abiotic processes, which may form intermediates that are even more harmful than the original, more highly chlorinated compounds. For instance, the partial biological transformation of TCE to cis-1,2-dichloroethene (cDCE) and vinyl chloride (VC) was first reported in 1985 (Parsons and Lage, 1985, Vogel and McCarty, 1985). VC is a known human carcinogen (Maltoni and Lefemine, 1975, Maltoni and Cotti, 1988), as exposure to it causes angiosarcoma, a rare form of liver cancer (Vianna et al., 1981), and TCE has recently been listed as a carcinogen (ATSDR_TCE_PCE, 2010). High levels of TCE exposure also cause nervous system effects and lung damage, and PCE is a suspected carcinogen that has been linked to leukemia and other defects in children with indirect exposure (ATSDR_TCE_PCE, 2010). Pentachlorobenzene (PeCB) and hexachlorobenzene (HCB) have been reported to show hepatocarcinogenic activity (Thomas et al., 1998), while mono- and dichlorobenzenes are relatively less toxic but can cause skin defects, hyperpigmentation, osteoporosis, and other diseases, primarily in children (Gustafson et al., 2000).
In 1989, Freedman and Gossett first reported that not only PCE and TCE but also VC could be biologically transformed to completely non-hazardous ethene by an anaerobic process termed reductive dechlorination (Freedman and Gossett, 1989). Although TCE had earlier been reported to be degraded by aerobic co-metabolic processes (Wilson and Wilson, 1985, McCarty et al., 1998), anaerobic reductive dechlorination was shown to be linked to the growth of the organisms (Holliger et al., 1993, Freedman and Gossett, 1989). In a co-metabolic process, microbes cannot couple the energy of the dechlorination reaction to their growth, because the transformation of chlorinated compounds is a fortuitous modification by cofactors or enzymes that catalyze other reactions (Haggbloom and Bossert, 2003, El Fantroussi et al., 1998). During anaerobic reductive dechlorination, in contrast, microbes use chlorinated solvents as terminal electron acceptors, conserve the energy of the dechlorination reaction by generating ATP through electron transport phosphorylation and the chemiosmotic mechanism, and couple this energy to their growth (Smidt and de Vos, 2004, Tas et al., 2010, Holliger et al., 1998b, Leys et al., 2013). The process is therefore also known as dehalorespiration, or organohalide respiration, and is catalyzed by the reductive dehalogenase enzymes of some anaerobic bacteria (Figure 2.3) (Smidt and de Vos, 2004, Tas et al., 2010, Holliger et al., 1998b, Leys et al., 2013, Futagami et al., 2008).
Figure 2.3. Anaerobic reductive dechlorination of chlorinated ethenes to benign ethene and higher chlorinated benzenes to less toxic lower chlorinated benzenes
Although organohalide respiration is catalyzed by reductive dehalogenase (RDase) enzymes encoded by reductive dehalogenase homologous (rdh) genes, the exact mechanism of the reductive dechlorination reaction remains unknown because the structure of a purified RDase protein has not yet been determined. However, the RDase enzymes purified and characterized to date contain a corrinoid cofactor such as cobalamin and two iron-sulfur clusters (Schumacher et al., 1997, Miller et al., 1997, Miller et al., 1998, Neumann et al., 1996, Neumann et al., 2002, Magnuson et al., 2000, Magnuson et al., 1998, Adrian et al., 2007b, Krajmalnik-Brown et al., 2004, Müller et al., 2004), except the 3-chlorobenzoate dehalogenase of Desulfomonile tiedjei, which contains a heme group instead of a corrinoid (Ni et al., 1995). The involvement of a corrinoid cofactor makes reductive dechlorination a novel type of corrinoid-dependent reaction (Banerjee and Ragsdale, 2003, Holliger et al., 1998b). Krasotkina and coworkers (Krasotkina et al., 2001) proposed two working models for the mechanism of the reductive dechlorination of chloro-aromatic compounds. The first model described the formation of different intermediates with the cob(I)alamin of the corrinoid as addition reactions, while the second classified the formation of intermediates as radical reactions (Banerjee and Ragsdale, 2003, Krasotkina et al., 2001). The first mechanism was purely theoretical, whereas there was some experimental evidence for the second (Glod et al., 1997, Holliger et al., 1998b).
2.10. Dehalococcoides bacteria
The biological transformation of VC, a known human carcinogen (Vianna et al., 1981), to completely non-toxic ethene by anaerobic reductive dechlorination was first reported in 1989 (Freedman and Gossett, 1989). This finding renewed research interest in the topic and led to the isolation of many anaerobic bacteria capable of partially transforming PCE and TCE to VC, but not to ethene, by reductive dechlorination, including Desulfitobacterium sp. strain PCE 1 (Gerritse et al., 1996), Sulfurospirillum multivorans (Scholzmuramatsu et al., 1995), and Dehalobacter restrictus (Holliger et al., 1998a). In 1997, the anaerobic bacterium Dehalococcoides ethenogenes strain 195, capable of dechlorinating PCE completely to ethene, was first isolated (Maymó-Gatell et al., 1997). Since then, a number of Dehalococcoides strains, including strains CBDB1 (Adrian et al., 2000b), BAV1 (He et al., 2003), FL2 (He et al., 2005), GT (Sung et al., 2006), VS (Cupples et al., 2003), MB (Cheng and He, 2009), and ANAS1 and ANAS2 (Lee et al., 2011), have been isolated from geographically diverse contaminated sites. Recently, the genus and species of Dehalococcoides were formally described, and all isolates are now known as strains of Dehalococcoides mccartyi (Löffler et al., 2012). Phylogenetically, D. mccartyi belongs to the Chloroflexi, a poorly characterized bacterial phylum of green non-sulphur bacteria (Löffler et al., 2012). Despite their name, these bacteria are not spherical cocci; they are small, flattened, disc-shaped cells approximately 0.3 to 1 µm in diameter and 0.1 to 0.2 µm thick, with occasional cellular appendages and biconcave indentations (Adrian et al., 2000b, Löffler et al., 2012). Instead of a typical bacterial cell wall, D. mccartyi has a cell envelope that closely resembles archaeal S-layer-like proteins (Maymó-Gatell et al., 1997, Adrian et al., 2000b, Löffler et al., 2012); hence, these bacteria are neither Gram-positive nor Gram-negative.
Because their 16S rRNA gene sequences share more than 98% identity, the strains of D. mccartyi are strikingly similar, which makes their isolation very difficult (Ritalahti et al., 2006, Löffler et al., 2012); nonetheless, they are very diverse in metabolic capabilities, as shown by the wide range of chloro-organics they can respire. Strains 195 and CBDB1 can dechlorinate both chloro-aliphatics and chloro-aromatics, including the potent human carcinogens VC, TCE, and dioxins (Adrian et al., 2007a, Bunge et al., 2003), while the remaining strains respire only chloro-aliphatics (Löffler et al., 2012). During organohalide respiration, all strains use chlorinated organics as terminal electron acceptors, hydrogen as the only electron donor (energy source), and acetate as the carbon source (Löffler et al., 2012, Adrian, 2009). Notably, not all steps in the dechlorination of chloro-aliphatics by D. mccartyi conserve energy. For example, during the dechlorination of higher chlorinated ethenes by strains 195 and FL2, the vinyl chloride (VC)-to-ethene step is co-metabolic (He et al., 2005, Maymó-Gatell et al., 1997), whereas for strain BAV1 the PCE and TCE dechlorination steps are co-metabolic but the cDCE and VC steps are growth-linked (He et al., 2003). Based on the 16S rRNA gene sequence similarity of D. mccartyi isolates and of other mixed-culture and environmental samples, Hendrickson et al. (2002) divided the Dehalococcoides phylotype into three subgroups: Cornell, Pinellas, and Victoria. Strains 195, MB, ANAS1, and ANAS2 belong to the Cornell subgroup, strain VS is the lone isolate of the Victoria subgroup, and all other isolates belong to the Pinellas subgroup (Hendrickson et al., 2002, Löffler et al., 2012).
Organohalide respiration, or reductive dechlorination, is the only known metabolic process by which D. mccartyi conserve energy for growth (Löffler et al., 2012, Futagami et al., 2008, Adrian, 2009, Leys et al., 2013). Because the RDase enzymes encoded by rdh genes catalyze the reductive dechlorination reaction in D. mccartyi, the differences among strains in metabolizing various organohalides can be attributed to the presence of multiple copies of non-identical but homologous rdh genes (Hölscher et al., 2004). Genome sequences of multiple D. mccartyi strains (Kube et al., 2005, McMurdie et al., 2009, Seshadri et al., 2005) revealed an unusually large number of rdh genes in each strain, ranging from 10 in strain BAV1 to 36 in strain VS (Hug et al., 2013, McMurdie et al., 2009). Interestingly, the majority of these rdh genes are located, along with other insertion sequences and repeated elements, in two variable genomic regions called high plasticity regions (McMurdie et al., 2009, Kube et al., 2005, Seshadri et al., 2005). Apart from these differences, the core genomes of the sequenced D. mccartyi strains are remarkably similar and conserved (Ahsanul Islam et al., 2010, McMurdie et al., 2009). Although the substrate ranges of only five RDase enzymes have been experimentally characterized so far (Adrian et al., 2007b, Krajmalnik-Brown et al., 2004, Magnuson et al., 2000, Magnuson et al., 1998, Müller et al., 2004), these bacteria are hypothesized to degrade a wide variety of chlorinated pollutants as growth-supporting terminal electron acceptors because of the large number of rdh genes in their genomes (Kube et al., 2005, Seshadri et al., 2005).
2.11. The KB-1 microbial community
KB-1 is an anaerobic dechlorinating mixed microbial culture that originated from the soil and groundwater of a TCE-contaminated site in Southern Ontario and has been maintained and enriched in the Edwards lab for more than 16 years (Duhamel et al., 2002, Edwards and Cox, 1997). Using methanol as the sole electron donor, KB-1 dechlorinates PCE, TCE, cDCE, and VC, used as electron acceptors, to ethene. Although KB-1 is a mixed microbial community, past studies (Duhamel et al., 2004, Duhamel, 2005, Waller, 2010) have identified its dominant organisms as belonging to four major phylotypes: dechlorinators, acetogens, methanogens, and fermenters (Figure 2.4). Of these, dechlorinators such as Dehalococcoides and Geobacter are the largest microbial populations, comprising more than 60% of the KB-1 community (Duhamel et al., 2004, Duhamel, 2005, Waller, 2010). Both Dehalococcoides and Geobacter actively dechlorinate in the community, while other members mainly play supporting roles by providing essential substrates for the dechlorinators; for instance, acetogens convert methanol to acetate and hydrogen (Duhamel and Edwards, 2006, Duhamel et al., 2002, Duhamel, 2005), which are the carbon and energy sources, respectively, for Dehalococcoides. Growth and dechlorination by Dehalococcoides are reported to be faster and more robust in the KB-1 community than in pure cultures (Duhamel, 2005, Hug, 2012, Waller, 2010), which indicates beneficial interactions among the community members. Moreover, the presence of multiple organisms provides functional redundancy in KB-1, which likely plays a crucial role in the robustness of dechlorination by this consortium compared with pure cultures of Dehalococcoides (Duhamel and Edwards, 2006, Duhamel et al., 2002). A metagenome of the KB-1 community has recently been sequenced, and a composite genome of two very similar Dehalococcoides strains was identified (Hug, 2012). A version of the KB-1 culture is also used in commercial bioremediation, particularly for bioaugmentation, at more than 180 contaminated sites around the world (Nicholson, 2010).
Figure 2.4. Schematic representation of the microbial interactions between different community members in the KB-1 community. Only major KB-1 phylotypes are shown in the figure.
Chapter 3: Characterizing the metabolism of Dehalococcoides with a constraint-based model
3.1. Abstract
Dehalococcoides strains respire a wide variety of chloro-organic compounds and are important for the bioremediation of toxic, persistent, carcinogenic, and ubiquitous groundwater pollutants. To better understand their metabolism and optimize their application, we developed a pan-genome-scale metabolic network and constraint-based metabolic model of Dehalococcoides. The pan-genome was constructed from the publicly available complete genome sequences of Dehalococcoides mccartyi strains CBDB1, 195, BAV1, and VS. We found that the Dehalococcoides pan-genome consisted of 1118 core genes (shared by all genomes), 457 dispensable genes (shared by some), and 486 unique genes (found in only one genome). The model included 549 metabolic genes encoding 356 proteins that catalyze 497 gene-associated model reactions. Of these 497 reactions, 477 were associated with core metabolic genes, 18 with dispensable genes, and 2 with unique genes. In addition to analyzing the metabolism of an environmentally important phylogenetic group on a pan-genome scale, this study provides valuable insights into Dehalococcoides metabolic limitations, low growth yields, and energy conservation. The model also provides a framework for anchoring and comparing disparate experimental data and for gaining insights into the physiological impact of “incomplete” pathways, such as the TCA cycle, CO2 fixation, and cobalamin biosynthesis pathways. The model, referred to as iAI549, highlights the specialized and highly conserved nature of Dehalococcoides metabolism and suggests that the evolution of Dehalococcoides species is driven by electron acceptor availability.
3.2. Introduction
Genome sequencing has enabled a more comprehensive characterization of biological systems. Recent research in bioinformatics and systems biology has produced numerous systematic approaches for analyzing cellular physiology, which have been reviewed elsewhere (Covert et al., 2008, Medini et al., 2008, Reed et al., 2006a, Young et al., 2008). Among these, constraint-based reconstruction and analysis (COBRA), a mathematical framework for integrating sequence data with a plethora of experimental ‘omics’ data, has proven successful in the genome-wide analysis of cellular physiology (Becker et al., 2007, Becker and Palsson, 2008, Feist et al., 2009). This approach has also been used to explore the metabolic potential and gene essentiality of several organisms across different kingdoms of life (Heinemann et al., 2005, Joyce and Palsson, 2008, Kim and Lee, 1999, Nookaew et al., 2008, Schilling et al., 2002, Teusink et al., 2006); however, the COBRA approach has not yet been applied to Dehalococcoides or to any other known dechlorinating bacterium.
The small, disc-shaped anaerobic bacteria Dehalococcoides use acetate as a carbon source and hydrogen as an electron donor, and dehalogenate a variety of halogenated organic compounds, many of them problematic groundwater pollutants, as electron acceptors (El Fantroussi et al., 1998, Haggbloom and Bossert, 2003, Holliger et al., 1998b, Smidt and de Vos, 2004). Dehalococcoides mccartyi strain 195 (strain 195) was the first member of this phylogenetic branch to be grown as an isolate (Maymó-Gatell et al., 1997). A number of Dehalococcoides strains were subsequently isolated: strain CBDB1 (Adrian et al., 2000b), strain BAV1 (He et al., 2003), strain FL2 (He et al., 2005), strain GT (Sung et al., 2006), and strain VS (Cupples et al., 2003). The strains respire through a membrane-bound electron transport chain (ETC) (Hölscher et al., 2003, Jayachandran et al., 2004, Nijenhuis and Zinder, 2005), which is incompletely defined. Reductive dehalogenases (RDases), encoded by reductive dehalogenase homologous (rdh) genes, are pivotal membrane-associated enzymes of the ETC (Hölscher et al., 2003, Jayachandran et al., 2004, Nijenhuis and Zinder, 2005). Genome sequencing has revealed multiple non-identical putative rdh genes in each strain (Hölscher et al., 2004, Kube et al., 2005, Seshadri et al., 2005, Waller et al., 2005). Because these microbes respire chlorinated pollutants by RDase-catalyzed reductive dechlorination, rdh genes determine a significant part of the Dehalococcoides phenotype. Functional characterization of only 5 of more than 190 rdh genes has revealed that cobalamin, a corrinoid compound, is an essential cofactor for the corresponding RDases (Adrian et al., 2007b, Cupples et al., 2004, Krajmalnik-Brown et al., 2004, Magnuson et al., 2000, Magnuson et al., 1998). Hydrogenase (H2ase) is another key enzyme of the Dehalococcoides ETC (Jayachandran, 2004, Kube et al., 2005, Nijenhuis and Zinder, 2005, Seshadri et al., 2005). Interestingly, the genomes of Dehalococcoides strains encode five different types of H2ases: the membrane-bound hup, ech, hyc, and hym, and the cytoplasmic vhu (Kube et al., 2005, Morris et al., 2007, Morris et al., 2006, Seshadri et al., 2005). The presence of multiple types of H2ases emphasizes the importance of H2 in their energy metabolism (Adrian et al., 2000a, Adrian et al., 2000b, He et al., 2005, He et al., 2003, Maymó-Gatell et al., 1997). This multiplicity of H2ases and RDases further highlights a redundancy in the organisms’ energy conservation process that may ensure a rapid and efficient response of their energy metabolism to changing growth conditions (Meyer, 2007, Vignais et al., 2001).
In addition to RDases and H2ases, the ETC likely requires an in vivo electron carrier to mediate electron transport between these enzymes. Previous studies have shown that the reductive dechlorination reaction requires an in vivo electron donor with a redox potential ($E_0'$) of ≤ -360 mV (Hölscher et al., 2003, Nijenhuis and Zinder, 2005), similar to other dechlorinating bacteria (Holliger et al., 1998b, Krasotkina et al., 2001, Miller et al., 1997). During the reaction, the cob(II)alamin of the corrinoid cofactor in the RDase enzyme is reduced to cob(I)alamin; this necessitates a low-potential donor because the redox potential ($E_0'$) of the Co(II)/Co(I) couple is between -500 and -600 mV (Banerjee and Ragsdale, 2003, Holliger et al., 1998b, Krasotkina et al., 2001). While quinones such as menaquinone or ubiquinone could act as electron carriers in anaerobes (Kröger et al., 2002, Louie and Mohn, 1999, Schumacher and Holliger, 1996), experimental evidence suggests that this is not the case in Dehalococcoides (Jayachandran et al., 2004, Nijenhuis and Zinder, 2005). Moreover, the redox potentials of quinones (menaquinone ox/red, $E_0'$ = -70 mV; ubiquinone ox/red, $E_0'$ = +113 mV (Thauer et al., 1977)) are not compatible with the RDases’ requirement for a low-potential donor. Furthermore, cytochrome b, a typical donor that allows quinones to participate in the redox reactions of anaerobic ETCs (Dross et al., 1992, Menon, 1992), appears to be absent from the genomes of Dehalococcoides (Kube et al., 2005, Seshadri et al., 2005). However, the genomes encode ferredoxin, an iron-sulphur protein that can act as the low-potential donor for RDases, because ferredoxin is the most electronegative electron carrier yet found in bacterial ETCs (Bruschi and Guerlesquin, 1988, Eisenstein and Wang, 1969, Miller et al., 1997, Sterner, 2001, Thauer et al., 1977, Valentine and Wolfe, 1963, Valentine, 1964).
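The incompatibility argument can be made quantitative with $\Delta G_0' = -nF\,\Delta E_0'$, where $\Delta E_0' = E_0'(\text{acceptor}) - E_0'(\text{donor})$. The sketch below applies this relation to one-electron transfer from the quinone couples quoted above to the Co(II)/Co(I) couple. It assumes -500 mV for Co(II)/Co(I), the least negative end of the quoted range, which is the most favorable case for the quinones. A positive $\Delta G_0'$ means the transfer is endergonic under standard conditions. The function and constant names are illustrative.

```python
F = 96485.33212  # Faraday constant, C per mol of electrons


def delta_g0_kJ(n_electrons: int, e_donor_mV: float, e_acceptor_mV: float) -> float:
    """Standard free energy change ΔG0' = -n·F·ΔE0' in kJ/mol,
    with ΔE0' = E0'(acceptor) - E0'(donor)."""
    delta_e_V = (e_acceptor_mV - e_donor_mV) / 1000.0
    return -n_electrons * F * delta_e_V / 1000.0


QUINONE_E0_MV = {"menaquinone": -70.0, "ubiquinone": 113.0}  # values quoted in the text
CO_II_I_E0_MV = -500.0          # assumed: least negative end of the -500 to -600 mV range
REQUIRED_DONOR_MAX_MV = -360.0  # donor requirement quoted in the text

for name, e0 in QUINONE_E0_MV.items():
    dg = delta_g0_kJ(1, e0, CO_II_I_E0_MV)
    meets = e0 <= REQUIRED_DONOR_MAX_MV
    print(f"{name}: dG0' = {dg:+.1f} kJ/mol per electron; meets <= -360 mV requirement: {meets}")
```

Both quinone couples would have to drive the reduction uphill by roughly 40 to 60 kJ/mol per electron, consistent with the conclusion that they cannot serve as the low-potential donor.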
Although Dehalococcoides can harness free energy from the exergonic, RDase-catalyzed reductive dechlorination reactions by coupling them to ATP generation for growth (Holliger et al., 1998b, Smidt and de Vos, 2004), their growth in pure culture is much less robust than in mixed cultures (Adrian et al., 2000a, Cupples et al., 2004, Duhamel and Edwards, 2007); even in mixed cultures, their growth yield is lower than that predicted from the free energy of reductive dechlorination (Jayachandran, 2004, Jayachandran et al., 2004). Thus, to better understand dechlorination metabolism, and given that the Dehalococcoides genomes sequenced to date are more than 85% identical at the amino acid level (Krajmalnik-Brown et al., 2006, Morris et al., 2006), we developed a pan-genome-scale, constraint-based in silico metabolic model of Dehalococcoides. The model was constructed from the complete genome sequences of four geographically distinct strains: strain CBDB1 from the Saale river near Jena, Germany (Adrian et al., 1998, Nowak et al., 1996), strain BAV1 from Oscoda, Michigan, USA (He et al., 2002, Lendvay et al., 2003), strain 195 from a wastewater treatment plant in Ithaca, New York, USA (Distefano et al., 1991, Freedman and Gossett, 1989, Maymó-Gatell et al., 1997), and strain VS from Victoria, Texas, USA (Cupples et al., 2004, Cupples et al., 2003). Although the model draws on multiple genomes, it considers only metabolic genes. It also does not include cellular regulation, owing to the lack of adequate knowledge about Dehalococcoides regulatory networks. Nonetheless, the model was used primarily to investigate intrinsic metabolic limitations and to address open questions about Dehalococcoides physiology, such as the incomplete nature of various metabolic pathways and the attendant implications for metabolism and growth.
Using model simulations, we also identified environmental conditions that resulted in faster in silico growth of Dehalococcoides. Furthermore, the constraint-based model, together with the comparative analysis of the four genomes, clarifies both the similarities and the differences among the strains in their core metabolism and other biosynthetic processes, leading to an improved understanding of metabolism and evolution in Dehalococcoides.
3.3. Materials and methods
3.3.1. Dehalococcoides pan-genome
To develop the Dehalococcoides pan-genome, we obtained the strain CBDB1 genome sequence from JCVI (http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi) and the strain 195 and strain BAV1 genome sequences from the IMG database (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi). The strain VS genome sequence was obtained from Alfred Spormann at Stanford University, CA. The genome sequences were compared using OrthoMCL (Li et al., 2003), a widely accepted method for finding orthologs across different genomes (Chen et al., 2007). OrthoMCL is based on reciprocal best BLAST hits (RBH) but recognizes co-orthologous groups using the Markov Cluster (MCL) algorithm (Van Dongen, 2000). The Dehalococcoides pan-genome was developed following a previously described approach (Tettelin et al., 2005, Tettelin et al., 2008), as outlined in Figures A1-A4 in Appendix A and below.
First, we identified putative orthologs between a reference genome and a subject genome, both selected arbitrarily from the 4 genomes compared. The analysis was conducted with OrthoMCL using its default parameter settings. Subsequently, the genes present only in subject genome 1 were identified and combined with the reference genome to create augmented genome 1 (Figure A1 in Appendix A). Augmented genome 1 was then compared with subject genome 2 in the same way to construct augmented genome 2, and the pan-genome was obtained by comparing augmented genome 2 with subject genome 3. Because the number of genes in a pan-genome has been reported to depend on both the order in which the genomes are analyzed and the choice of reference genome (Tettelin et al., 2005), we constructed 6 pan-genomes for 6 different genome-order combinations. Of these 6 pan-genomes, we selected the one with the highest number of genes (2061) as the Dehalococcoides pan-genome in order to capture the entire gene repertoire of Dehalococcoides species (Muzzi et al., 2007). We also identified the core, dispensable, and unique genomes of the Dehalococcoides pan-genome by modifying previously described methods (Medini et al., 2008, Tettelin et al., 2005), as detailed in the supplemental text in Appendix A.
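The iterative augmentation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OrthoMCL pipeline used in this work: ortholog calls are assumed to be available as a predicate (`is_ortholog`, hypothetical, standing in for OrthoMCL output), genomes are plain lists of gene identifiers, and all function names and toy data are hypothetical. By default the sketch tries every genome order; the 6 genome-order combinations used here could instead be passed explicitly through `orders`.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for OrthoMCL output: True if two genes fall in
# the same ortholog group.
OrthologFn = Callable[[str, str], bool]


def augment(reference: List[str], subject: Sequence[str],
            is_ortholog: OrthologFn) -> List[str]:
    """Append every subject gene that has no ortholog in the reference."""
    novel = [g for g in subject
             if not any(is_ortholog(g, r) for r in reference)]
    return reference + novel


def build_pan_genome(order: Sequence[str], genomes: Dict[str, List[str]],
                     is_ortholog: OrthologFn) -> List[str]:
    """First genome in `order` is the reference; the rest are added in turn."""
    pan = list(genomes[order[0]])
    for name in order[1:]:
        pan = augment(pan, genomes[name], is_ortholog)
    return pan


def largest_pan_genome(genomes: Dict[str, List[str]], is_ortholog: OrthologFn,
                       orders: Optional[List[Tuple[str, ...]]] = None
                       ) -> Tuple[Tuple[str, ...], List[str]]:
    """Build one pan-genome per genome order and keep the largest."""
    if orders is None:
        orders = list(permutations(genomes))  # all 24 orders for 4 genomes
    candidates = [(tuple(o), build_pan_genome(o, genomes, is_ortholog))
                  for o in orders]
    return max(candidates, key=lambda c: len(c[1]))


# Toy data (hypothetical): the gene-family prefix plays the role of an
# ortholog group, so "f1_cbdb1" and "f1_195" count as orthologs.
genomes = {
    "CBDB1": ["f1_cbdb1", "f2_cbdb1", "f3_cbdb1"],
    "195": ["f1_195", "f4_195"],
    "BAV1": ["f1_bav1", "f2_bav1", "f5_bav1"],
    "VS": ["f1_vs", "f6_vs"],
}


def same_family(a: str, b: str) -> bool:
    return a.split("_")[0] == b.split("_")[0]


order, pan = largest_pan_genome(genomes, same_family)
print(order, len(pan))
```

Because the toy orthology relation is transitive, every order gives the same size here; with real OrthoMCL groups, co-orthologs and paralogs can make the result order dependent, which is why several orders were compared.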
3.3.2. Reconstructing the metabolic network of Dehalococcoides
The pan-genome was used to reconstruct the pan-genome-scale metabolic network, and the constraint-based model of Dehalococcoides metabolism was developed from this reconstruction. Since the strains of Dehalococcoides share a high degree of sequence identity, we arbitrarily chose the strain CBDB1 genome as a reference and constructed the metabolic network from its annotated genome sequence (Kube et al., 2005), publications on its physiology, and various genomic and biochemical databases (Feist et al., 2009). We then added to the reconstructed network the metabolic genes from the pan-genome that were missing from the strain CBDB1 genome. Five gene correspondence tables for the four genomes were prepared (Tables S3-S7 in Table A15 in Appendix A) to facilitate gene identification and cross-referencing regardless of the genome of interest. We developed and manually curated the reconstructed network using previously described procedures (Covert et al., 2001, Feist et al., 2009, Francke et al., 2005, Reed et al., 2006a, Thiele and Palsson, 2010a) with the SimPheny platform (Genomatica Inc., San Diego, CA). Since genome annotations are error prone (Devos and Valencia, 2001), the annotated genes of strain CBDB1, as well as the pan-genome genes with defined metabolic functions, were verified by identifying their homologs in other well-characterized organisms, including Escherichia coli, Bacillus subtilis, Geobacter sulfurreducens, and Saccharomyces cerevisiae, with BLAST (Altschul et al., 1997). Confidence levels were then assigned based on the degree of sequence identity or on reciprocal best BLAST hits.
For instance, Dehalococcoides genes having > 40% amino acid sequence identity with homologs in the protein databases (SWISSPROT (Boeckmann et al., 2003), IMG (Markowitz et al., 2012), PDB (Berman et al., 2000), GO (Ashburner et al., 2000)) were given a confidence level of 3, while genes with 30-40% and < 30% identity were assigned confidence levels of 2 and 1, respectively. In addition, these genes were evaluated on the basis of gene order or conserved synteny (Markowitz et al., 2012), along with phylogenetic analysis using updated versions of biological databases such as UniProt (Apweiler et al., 2012), IMG, GO, and PDB. Afterwards, elementally and charge-balanced biochemical reactions were assigned to the genes to create the gene-protein-reaction (GPR) associations (Reed et al., 2006a). These reactions were further verified against the biochemical literature and enzyme databases such as KEGG (Kanehisa et al., 2011), BRENDA (Chang et al., 2009), MetaCyc (Caspi et al., 2012), and ENZYME (Bairoch, 2000). In some instances, genes for biosynthetic reactions essential for producing all the precursor metabolites of cell biomass were not identified. Such reactions (21 in total; detailed in Table S1 in Table A15 in Appendix A) were added to the reconstructed network as non-gene associated reactions.
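The two curation checks described above, identity-based confidence levels and elemental/charge balance of the assigned reactions, can be expressed as small functions. This is a sketch, not the SimPheny curation workflow: the function names and the example metabolite data are hypothetical, and the CO2 hydration reaction is used only because its formulas are easy to verify by hand.

```python
def confidence_level(percent_identity: float) -> int:
    """Map amino acid sequence identity (%) to a curation confidence level.

    Thresholds follow the text (>40% -> 3, >30% -> 2, otherwise 1); placing
    exactly 30% or 40% in the lower level is an assumption of this sketch.
    """
    if percent_identity > 40:
        return 3
    if percent_identity > 30:
        return 2
    return 1


def is_balanced(stoich, formulas, charges, tol=1e-9) -> bool:
    """Check elemental and charge balance of one reaction.

    stoich:   {metabolite: coefficient}, negative for substrates
    formulas: {metabolite: {element: count}}
    charges:  {metabolite: net charge}
    """
    totals = {}
    charge = 0.0
    for met, coeff in stoich.items():
        for element, count in formulas[met].items():
            totals[element] = totals.get(element, 0.0) + coeff * count
        charge += coeff * charges[met]
    return all(abs(v) < tol for v in totals.values()) and abs(charge) < tol


# Illustrative metabolite data (not taken from iAI549):
# CO2 + H2O <=> HCO3- + H+
formulas = {"co2": {"C": 1, "O": 2}, "h2o": {"H": 2, "O": 1},
            "hco3": {"C": 1, "H": 1, "O": 3}, "h": {"H": 1}}
charges = {"co2": 0, "h2o": 0, "hco3": -1, "h": 1}

balanced = is_balanced({"co2": -1, "h2o": -1, "hco3": 1, "h": 1},
                       formulas, charges)
missing_proton = is_balanced({"co2": -1, "h2o": -1, "hco3": 1},
                             formulas, charges)
print(confidence_level(45), confidence_level(35), confidence_level(26))
print(balanced, missing_proton)
```

Dropping the proton leaves both hydrogen and charge unbalanced, which is the typical error such a check catches.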
3.3.3. Estimation of biomass composition and maintenance energy requirements
The biomass composition (dry basis) of 1 gram of Dehalococcoides cells was calculated from various published and experimental data and expressed in mmol (millimoles)/g DCW (dry cell weight) (Tables A1-A6 in Appendix A). Due to the lack of detailed experimental data on the cellular composition of Dehalococcoides, the weight fractions of protein, lipid, carbohydrate, soluble pools and ions of the cell were estimated from the published genome-scale model of Methanosarcina barkeri (Feist et al., 2006). We chose data from the model of M. barkeri, an archaeon, because Dehalococcoides cells are enclosed by an archaeal S-layer-like protein instead of a typical bacterial cell wall (Adrian et al., 2000b, He et al., 2003, Maymó-Gatell et al., 1997). The weight percent of DNA was estimated from the cell morphology, the length of the genome sequence (Borodina et al., 2005), and the molar mass of DNA, while the weight percent of RNA was calculated from experimental data on a Dehalococcoides-containing mixed microbial culture (see supplemental text in Appendix A for details). The detailed composition of each macromolecule, as well as the composition of cofactors and other soluble pools and ions, is presented in Tables A1-A6 in Appendix A. The distributions of amino acids, nucleotides and cofactors in the biomass were calculated from previously reported data (Neidhardt et al., 1990, Pramanik and Keasling, 1997), while the weight fractions of different fatty acids were estimated from White et al. (2005). These compositions were then integrated into the model as a biomass synthesis reaction, BIO_DHC_DM_61 (see the supplemental text in Appendix A for additional details).
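Converting a composition into the coefficients of a biomass reaction such as BIO_DHC_DM_61 amounts to dividing each mass fraction (g per g DCW) by the constituent's molar mass. The sketch below assumes a hypothetical three-component composition rather than the values in Tables A1-A6, and the function names are illustrative; it also checks that the mass fractions sum to 1 before converting.

```python
def mmol_per_gdcw(mass_fraction: float, molar_mass: float) -> float:
    """Convert g component per g DCW into mmol component per g DCW.

    molar_mass is in g/mol, which equals mg/mmol, hence the factor 1000.
    """
    return mass_fraction / molar_mass * 1000.0


def biomass_coefficients(components, tol=1e-6):
    """components: {name: (mass fraction of DCW, molar mass in g/mol)}.

    Returns {name: mmol/gDCW}, after checking the mass fractions sum to 1.
    """
    total = sum(fraction for fraction, _ in components.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"mass fractions sum to {total:.4f}, expected 1")
    return {name: mmol_per_gdcw(f, mw) for name, (f, mw) in components.items()}


# Hypothetical composition, for illustration only (not the thesis values):
# two amino acid residues plus one lumped pseudo-component.
toy_biomass = {
    "ala_residue": (0.05, 71.08),   # alanine residue (alanine minus H2O)
    "gly_residue": (0.05, 57.05),   # glycine residue
    "rest_lumped": (0.90, 100.0),   # everything else, lumped
}
coeffs = biomass_coefficients(toy_biomass)
for name, value in coeffs.items():
    print(f"{name}: {value:.3f} mmol/gDCW")
```

Residue (dehydrated) molar masses are used for polymer building blocks, because the water released during polymerization is not part of the dry biomass.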
Maintenance energy accounts for the ATP requirements of cellular processes, such as turnover of the amino acid pools, polymerization of cellular macromolecules, and ion transport, that are not included in the biomass synthesis reaction (Pirt, 1965, Pirt, 1982, Russell and Cook, 1995). These ATP requirements can be either growth associated (GAM), i.e., related to the assembly and polymerization of macromolecules (e.g., proteins and DNA), or non-growth associated (NGAM), i.e., related to maintaining the membrane potential needed for cellular integrity (Neijssel et al., 1996, Pirt, 1965, Pirt, 1982). Because the experimental chemostat data required for calculating both maintenance parameters were lacking (Varma and Palsson, 1994), the NGAM for a Dehalococcoides cell (1.8 mmol ATP.gDCW-1.h-1) was calculated from the experimental decay rate (0.09 day-1) (Cupples et al., 2003) and the average of pure-culture experimental growth yields (0.69 g DCW/eeq; Table A7 in Appendix A), following previously described procedures (Pirt, 1965, Russell and Cook, 1995). The GAM was estimated by regression analysis, starting from an initial estimate of 26 mmol ATP/g DCW for a typical bacterial cell (Table A9 in Appendix A) (Neidhardt et al., 1990). The initial GAM estimate and the calculated NGAM were used to simulate (using flux balance analysis, described below) the average of reported pure-culture experimental growth rates (0.014 h-1; Table A8 in Appendix A). A GAM of 61 mmol ATP/g DCW gave the best prediction of the experimental growth rate.
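One way to reproduce the order of magnitude of the NGAM is sketched below, assuming the Pirt-type relation in which the maintenance demand for the energy source equals the decay rate divided by the growth yield. The conversion from electron equivalents to ATP is the key assumption: with a hypothetical 1/3 mol ATP per eeq, the sketch gives about 1.81 mmol ATP/gDCW/h, close to the value above, but that agreement depends entirely on the assumed factor, which is not stated in this section. The function name is illustrative.

```python
def ngam_from_decay(decay_rate_per_day: float, yield_gdcw_per_eeq: float,
                    atp_per_eeq: float) -> float:
    """Non-growth-associated maintenance, in mmol ATP / gDCW / h.

    Pirt-type relation: maintenance demand for the energy source equals
    decay rate / growth yield. The ATP-per-electron-equivalent factor is
    an input assumption, not a value stated in this section.
    """
    decay_per_hour = decay_rate_per_day / 24.0
    eeq_per_gdcw_h = decay_per_hour / yield_gdcw_per_eeq
    # eeq -> meq (x 1000), then meq x (mol ATP / eeq) = mmol ATP
    return eeq_per_gdcw_h * 1000.0 * atp_per_eeq


# Decay rate (0.09 day^-1) and yield (0.69 gDCW/eeq) are from the text;
# 1/3 mol ATP per eeq is an assumption (e.g. 1 H+ per electron, 3 H+ per ATP).
ngam = ngam_from_decay(0.09, 0.69, atp_per_eeq=1 / 3)
print(f"NGAM ~ {ngam:.2f} mmol ATP/gDCW/h")
```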
3.3.4. In silico analysis of Dehalococcoides metabolism
Flux balance analysis (FBA) relies on the imposition of a series of constraints, including stoichiometric mass balance constraints derived from the metabolic network, thermodynamic reversibility constraints, and any available enzyme capacity constraints (Price et al., 2004, Reed et al., 2003, Reed et al., 2006a). Imposing these constraints leads to a linear programming (LP) problem formulated to maximize a cellular objective function such as the growth rate. Here, the biomass synthesis reaction was used as the objective function maximized when solving the LP problem in SimPheny. In addition, a number of reversible exchange reactions were added to the network for external metabolites such as acetate (ac), chloride (Cl⁻), carbon dioxide (CO2), and sulphate (SO4²⁻) to represent the in silico minimal medium (Table 3.2) for Dehalococcoides. Cobalamin is essential for Dehalococcoides growth, but these bacteria are unable to synthesize it de novo; hence, they salvage cobalamin from the medium. To analyze whether cobalamin flux can limit Dehalococcoides growth, we performed a robustness analysis on the cobalamin exchange reaction for different weight fractions of cobalamin in the biomass. We also simulated growth rates after incorporating into iAI549 all the pathways required for de novo cobalamin synthesis, to analyze the cost of cobalamin synthesis and its effect on Dehalococcoides growth. Finally, to identify whether Dehalococcoides growth was carbon or energy limited, growth simulations were conducted by varying acetate fluxes and energy transfer efficiencies, since acetate and H2 are the carbon and energy sources of these microbes, respectively (Adrian et al., 2000b, He et al., 2003, Maymó-Gatell et al., 1997). Energy transfer efficiencies were calculated by normalizing the ATP fluxes to the maximum ATP that could be generated from H2 (mol ATP/mol H2), based on the Gibbs free energy of H2 oxidation and the energetic cost of ATP synthesis (see Table A12 and supplemental text in Appendix A for additional details). The constraint set used to simulate Dehalococcoides growth is listed in Table S18 in Table A14 in Appendix A, and the SBML file for the reconstructed network (iAI549) is presented in Table A14.
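The LP at the heart of FBA, and the kind of scan used to ask whether growth is carbon or energy limited, can be illustrated on a toy network. This sketch swaps in SciPy's `linprog` for SimPheny; the five-reaction network, the acetate requirement `C_AC`, the ATP yield `Y_ATP` and the function name are hypothetical, while `GAM` and `NGAM` take the values reported above. In this toy, growth increases with the acetate bound until it plateaus, at which point ATP from H2, not carbon, limits growth.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network for illustration only (NOT iAI549). Columns are reactions:
#   0 EX_ac : -> ac                  acetate uptake (carbon source)
#   1 EX_h2 : -> h2                  H2 uptake (electron donor)
#   2 ETC   : h2 -> Y_ATP atp        lumped respiration
#   3 NGAM  : atp ->                 fixed maintenance drain
#   4 BIO   : C_AC ac + GAM atp ->   biomass (flux = growth rate, h^-1)
Y_ATP = 2.0 / 3.0   # mol ATP per mol H2 (assumed)
GAM = 61.0          # mmol ATP/gDCW (value from the text)
NGAM = 1.8          # mmol ATP/gDCW/h (value from the text)
C_AC = 20.0         # mmol acetate per gDCW (hypothetical)

# Stoichiometric matrix S (rows: ac, h2, atp); steady state requires S v = 0.
S = np.array([
    [1.0, 0.0, 0.0, 0.0, -C_AC],
    [0.0, 1.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, Y_ATP, -1.0, -GAM],
])


def max_growth(ac_max: float, h2_max: float):
    """Maximize the BIO flux; return the growth rate (h^-1) or None if infeasible."""
    c = np.zeros(S.shape[1])
    c[4] = -1.0  # linprog minimizes, so minimize -BIO
    bounds = [(0.0, ac_max), (0.0, h2_max), (0.0, None), (NGAM, NGAM), (0.0, None)]
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds,
                  method="highs")
    if res.status != 0:
        return None  # e.g. the H2 supply cannot even cover NGAM
    return -res.fun


# Robustness-style scan: vary the acetate uptake bound at fixed H2 uptake.
for ac_max in (0.1, 0.2, 0.5, 1.0, 2.0):
    mu = max_growth(ac_max, h2_max=5.0)
    print(f"acetate uptake <= {ac_max:4.1f}: mu = {mu:.4f} h^-1")
```

A plateau of this kind under increasing acetate is what an energy-limited phenotype looks like in FBA; the same loop applied to another exchange bound, such as cobalamin, gives a robustness analysis of that exchange.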
3.4. Results and discussion
3.4.1. Dehalococcoides metabolic network
3.4.1.1. Pan-metabolic-genes of Dehalococcoides
The concept of a pan-genome was first investigated by Tettelin and colleagues for 8 isolates of the common human pathogen Streptococcus agalactiae (Tettelin et al., 2005). While pan-genome analyses of other organisms have since been reported (Tettelin et al., 2008), no such analysis has been performed to date for any dechlorinating bacterium or any other microbe of bioremediation importance. Moreover, most reported pan-genome analyses were conducted on pathogenic isolates to design vaccines by assessing their virulence evolution and diversity (Tettelin, 2009). Here, we developed the Dehalococcoides pan-genome from the completely sequenced genomes of four Dehalococcoides strains. Method details are provided in Materials and Methods and in the supplemental text (Figures A1-A4) in Appendix A. The pan-genome comprises 2061 genes, of which 1118 are in the core, 457 are dispensable, and 486 are unique (Figure 3.1). The genes are further classified as metabolic, non-metabolic, and hypothetical based on information obtained from the literature and various biochemical databases, such as SWISSPROT, UniProt, IMG, and PDB. We defined metabolic genes as those exclusively related to metabolic processes such as the carbon and energy metabolism of Dehalococcoides. Genes involved in DNA repair and metabolism, as well as genes encoding putative transposable elements and insertion elements (Kube et al., 2005, Seshadri et al., 2005), are classified as non-metabolic. Putative genes with a non-specific metabolic function, or genes without any function or annotation, are categorized as hypothetical (Figure 3.1).
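The core/dispensable/unique split reported above can be sketched with the usual presence/absence rules (core: present in all strains; unique: present in exactly one; dispensable: present in more than one but not all). This work modified these methods (Appendix A), so the sketch approximates rather than reproduces the procedure; the function name and the ortholog groups in the example are hypothetical.

```python
from typing import Dict, Set, Tuple


def classify(presence: Dict[str, Set[str]], strains: Set[str]
             ) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split ortholog groups into core, dispensable and unique sets.

    presence maps an ortholog group to the strains that carry it.
    core: in every strain; unique: in exactly one; dispensable: in between.
    """
    core, dispensable, unique = set(), set(), set()
    for group, carriers in presence.items():
        carriers = carriers & strains
        if carriers == strains:
            core.add(group)
        elif len(carriers) == 1:
            unique.add(group)
        elif len(carriers) > 1:
            dispensable.add(group)
    return core, dispensable, unique


STRAINS = {"CBDB1", "195", "BAV1", "VS"}
# Hypothetical ortholog groups, for illustration only.
presence = {
    "g_core": {"CBDB1", "195", "BAV1", "VS"},
    "g_rdh_shared": {"CBDB1", "BAV1"},
    "g_nif": {"195"},
}
core, dispensable, unique = classify(presence, STRAINS)
print(sorted(core), sorted(dispensable), sorted(unique))

# The counts reported in the text partition the 2061-gene pan-genome exactly.
assert 1118 + 457 + 486 == 2061
```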
Figure 3.1. Composition of the Dehalococcoides pan-genome. Core, dispensable, and unique genomes are represented by blue, green, and orange, respectively. Genes in these genomes are also categorized as metabolic (spotted pattern), non-metabolic (plain), and hypothetical (grid pattern) on the basis of various bioinformatic analyses (see text for details).
Most of the metabolic genes (413 out of 549) were found in the core genome, while only small numbers were identified in the dispensable (75) and unique (61) genomes. The abundance of core metabolic genes in the pan-genome indicates that the central metabolism of Dehalococcoides is very well conserved across strains, since core genes are shared by all strains. We further categorized the metabolic genes in the dispensable and unique genomes by both function and strain (Figure 3.2). Clearly, the majority of differences among the strains (45 out of the 75 dispensable genes and 47 out of the 61 unique genes) are due to the rdh genes (Figure 3.2). In addition, only strain 195 has nitrogen-fixing genes and the associated transporters. Because of these genes, together with its unique rdhs, strain 195 has the most unique genes of the 4 genomes compared. Because each strain carries a suite of multiple non-identical rdh genes, each metabolizes a unique set of specific chlorinated substrates (Adrian et al., 2000b, Bunge et al., 2003, Morris et al., 2007). Hence, the differences in rdh genes largely define the strain-specific phenotypes of Dehalococcoides.
Figure 3.2. Distribution of dispensable and unique metabolic genes in different Dehalococcoides strains. Colors further categorize the genes according to their function, identified from annotation and verified by different bioinformatic analyses. Each color except black signifies the presence of a corresponding metabolic gene, while black indicates its absence. Genes belonging to amino acid, lipid and nucleotide metabolism are few in number and are therefore included in the ‘other’ category. This heat map essentially describes the differences among Dehalococcoides strains in terms of their metabolic genes.
Though there were differences in rdh genes, most of these were found in the dispensable genome (Figure 3.2 and Table A15 in Appendix A) while only 9 rdhs (5 rdhA and 4 rdhB genes with >35% amino acid sequence identity) were shared by all strains and found in the core genome (Table A15 in Appendix A). Presence of the majority of rdh genes in the dispensable genome further supports the hypothesis that they were acquired through lateral gene transfer events (Krajmalnik-Brown et al., 2006, Kube et al., 2005, McMurdie et al., 2009).
3.4.1.2. Features of the Reconstructed Metabolic Network of Dehalococcoides
The reconstructed metabolic network of Dehalococcoides, denoted iAI549 according to the established naming convention (Reed et al., 2003), accounted for 549 open reading frames (ORFs), or protein-coding genes (27% of the total 2061 genes). Metabolic genes were identified from the genome annotations, which were verified with various bioinformatic analyses (see Materials and Methods). In addition, we annotated or revised the annotation of 70 ORFs based on information obtained from different biochemical databases (Table S2 in Table A15 in Appendix A provides a full list of reannotated genes). General features of the Dehalococcoides metabolic network (iAI549) are provided in Table 3.1. iAI549 includes 518 model reactions and 549 metabolites; 497 reactions are gene associated and 21 (4%) are non-gene associated (Table 3.1). The non-gene associated reactions (Table A15 in Appendix A) were added, based on simulations, to fill gaps in the reconstructed network. Although no gene associations were identified for these reactions, we provide a list of core hypothetical genes (Table A15 in Appendix A) that could potentially contain genes associated with these reactions and are prime candidates for further biochemical testing. The network also comprises 36 exchange reactions, including one demand reaction, the biomass synthesis reaction (BIO_DHC_DM_61), to facilitate the transport of various metabolites into and out of the cell. The composition of the in silico minimal medium is shown in Table 3.2, while the detailed composition of BIO_DHC_DM_61 is available in the supplemental text in Appendix A. We further categorized the genes and reactions of iAI549 into 7 functional categories, or subsystems, based on the associated metabolic pathways (Figures A6 and A7 in Appendix A). The differences among the strains are observed mainly in the energy metabolism category, which includes 51 dispensable and 54 unique metabolic genes, most of which are rdhs (Figure A6 in Appendix A). However, almost all the reactions of iAI549 (96% of the total 518) are core, which again indicates that the basic central metabolism of Dehalococcoides is strictly conserved (Figure A7 in Appendix A). Although a number of dispensable metabolic genes are found in different subsystems, most of these genes are actually paralogs of core metabolic genes. This relationship explains why, for example, there are 13 dispensable genes in the transport subsystem and 3 genes each in lipid and nucleotide metabolism, but no corresponding dispensable reactions (Figure A7 in Appendix A). Since rdhs were found in the core, dispensable, and unique genomes, we assigned the reductive dechlorination reaction as a core reaction. Therefore, the truly unique metabolic reactions of iAI549 are the nitrogen-fixing reaction (EC 1.18.6.1) and the transport reaction for molybdate, which is required to synthesize the nitrogenase cofactor (TC 3.A.1.8), both belonging to strain 195 only.
Table 3.1. General features of Dehalococcoides metabolic network (iAI549)
Genes
Total number of genes 2061
Number of included genes 549
Number of excluded genes 1512
Proteins
Total number of proteins 356
Intra-system Reactions
Total number of model reactions 518
Gene associated model reactions 497
Non-gene associated model reactions 21
Exchange Reactions
Total number of exchange reactions 36
Input-output reactions 35
Demand reactions 1
Metabolites
Total number of metabolites 549
Number of extracellular metabolites 31
Number of intracellular biomass metabolites 110
Table 3.2. Composition of the in silico minimal medium of Dehalococcoides
Exchange reaction                    Abbreviation    Equation
Acetate exchange                     EX_ac(e)        ac <==>
Vitamin B12 or cobalamin exchange    EX_cbl1(e)      cbl1 <==>
Chloride exchange                    EX_cl(e)        cl <==>
Carbon dioxide exchange              EX_co2(e)       co2 <==>
Proton exchange                      EX_h(e)         h <==>
Hydrogen exchange                    EX_h2(e)        h2 <==>
Water exchange                       EX_h2o(e)       h2o <==>
Dichlorobenzene exchange             EX_dcb(e)       dcb <==>
Ethene exchange                      EX_etl(e)       etl <==>
Tetrachloroethene exchange           EX_pce(e)       pce <==>
Hexachlorobenzene exchange           EX_hcb(e)       hcb <==>
Ammonium exchange                    EX_nh4(e)       nh4 <==>
Inorganic phosphate exchange         EX_pi(e)        pi <==>
Sulphate exchange                    EX_so4(e)       so4 <==>
Table 3.3. Comparison of various in silico genome-scale models with iAI549
In silico model (organism)                    iAI549 (D. mccartyi)   iRM588 (G. sulfurreducens)   iAF692 (M. barkeri)   iAF1260 (E. coli)   iYO844 (B. subtilis)
Total reactions                               518                    522                          619                   2077                1020
Amino acid metabolism                         139                    119                          150                   198                 207
Cofactor and prosthetic group biosynthesis    102                    100                          153                   162                 83
Nucleotide metabolism                         83                     58                           75                    155                 123
Lipid metabolism                              81                     93                           46                    522                 126
Central carbon metabolism                     41                     64                           72                    252                 196
Energy metabolism                             40                     37                           41                    90                  41
Transport                                     32                     51                           82                    698                 244
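Because each column of Table 3.3 partitions a model's reactions into subsystems, the category counts should add up to the totals. The sketch below checks this and expresses the transport comparison as a fraction of each model's reactions; the numbers are transcribed from Table 3.3, and the helper names are illustrative.

```python
# Reaction counts transcribed from Table 3.3; category order:
# amino acid, cofactor/prosthetic group, nucleotide, lipid,
# central carbon, energy, transport.
TABLE_3_3 = {
    "iAI549": (518, [139, 102, 83, 81, 41, 40, 32]),
    "iRM588": (522, [119, 100, 58, 93, 64, 37, 51]),
    "iAF692": (619, [150, 153, 75, 46, 72, 41, 82]),
    "iAF1260": (2077, [198, 162, 155, 522, 252, 90, 698]),
    "iYO844": (1020, [207, 83, 123, 126, 196, 41, 244]),
}


def inconsistent_models(table):
    """Return the models whose category counts do not add up to the total."""
    return [m for m, (total, parts) in table.items() if sum(parts) != total]


def transport_share(table):
    """Fraction of each model's reactions that are transport reactions."""
    return {m: parts[-1] / total for m, (total, parts) in table.items()}


print(inconsistent_models(TABLE_3_3))
for model, share in transport_share(TABLE_3_3).items():
    print(f"{model}: {share:.1%} transport")
```

With these counts, transport reactions make up about 6% of iAI549 but about 34% of iAF1260, which puts the comparison of transport capacity on a relative footing.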
Furthermore, we compared iAI549 with a number of in silico genome-scale models of other Bacteria and Archaea (Table 3.3): iAF1260 for Escherichia coli (Feist et al., 2007), iYO844 for Bacillus subtilis (Oh et al., 2007), iRM588 for Geobacter sulfurreducens (Mahadevan et al., 2006), and iAF692 for Methanosarcina barkeri (Feist et al., 2006). iAI549 has the lowest number of total reactions, reflecting the limited scope of Dehalococcoides metabolism. These numbers also suggest that facultative anaerobes (E. coli and B. subtilis) are more versatile in their lifestyle and metabolism than obligate anaerobes (Dehalococcoides, Geobacter and Methanosarcina). This difference is further supported by the high number of transport reactions in iAF1260 and iYO844, compared with only 32 in iAI549 (Table 3.3). A large number of iAI549 reactions are involved in amino acid metabolism, since genes for the de novo synthesis of all amino acids except methionine were identified in the genomes (Kube et al., 2005, Seshadri et al., 2005). In contrast, iAI549 comprises only 41 reactions for central carbon metabolism (glycolysis, gluconeogenesis, the TCA-cycle, the pentose phosphate pathway, and carbohydrate metabolism), compared with 252 reactions in iAF1260; an incomplete TCA-cycle and an inactive glycolysis pathway explain this low number for iAI549. Since Dehalococcoides lack a typical bacterial cell wall (Adrian et al., 2000b, He et al., 2003, Maymó-Gatell et al., 1997), iAI549 has only 81 reactions in the lipid metabolism category. Furthermore, the cofactor and prosthetic group biosynthesis category comprises 102 reactions in iAI549, compared with 162 in iAF1260, because the pathways for synthesizing vitamin B12 and quinones are predicted to be incomplete in Dehalococcoides (Kube et al., 2005, Seshadri et al., 2005).
3.4.2. Model-based simulations of Dehalococcoides physiology
3.4.2.1. Exploring the central metabolism of Dehalococcoides
The reconstructed network for glycolysis, gluconeogenesis, the TCA-cycle and the pentose phosphate pathway of iAI549 highlighted some of the key limitations of Dehalococcoides central metabolism. Although putative genes for glycolysis and gluconeogenesis were identified, no gene for a glucose or fumarate transporter was found in any of the genomes, explaining the inability of Dehalococcoides to use glucose or fumarate as a carbon source. The TCA-cycle of Dehalococcoides (Figure 3.3A) is incomplete, as previously reported (Kube et al., 2005, Seshadri et al., 2005). We identified putative genes for 2-oxoglutarate synthase and succinyl-CoA synthetase (with 26% amino acid sequence identity to the Methanococcus jannaschii gene) and for fumarate reductase/succinate dehydrogenase (with 31-33% amino acid sequence identity to the E. coli gene), but could not find a gene encoding citrate synthase (CS) in Dehalococcoides. Without CS, carbon assimilation could occur via a reductive TCA-cycle. However, the biosynthetic formation of citrate by Dehalococcoides ethenogenes strain 195 was recently demonstrated using 13C-labeled isotopomer experiments, although the gene encoding the putative Re-type CS enzyme was not identified (Tang et al., 2009b). The two Dehalococcoides genes most similar to the only biochemically characterized Re-type CS gene, from Clostridium kluyveri DSM555 (Li et al., 2007), are annotated as isopropylmalate synthase and homocitrate synthase; however, these genes share only 27% amino acid sequence identity with the C. kluyveri CS gene. Hence, further experiments are required to establish the role of these genes, as well as of the aforementioned putative TCA-cycle genes, in Dehalococcoides.
Nonetheless, these isotope labeling studies suggest the formation of 2-oxoglutarate from citrate through the oxidative branch of the TCA-cycle.
To analyze the effect of the CS reaction on Dehalococcoides growth, we conducted growth simulations with and without this reaction in iAI549 (Figure 3.4). Only subtle differences in the growth rate (0.0137 h-1 vs. 0.014 h-1) and yield (0.72 gDCW/eeq vs. 0.71 gDCW/eeq) were observed (Figures 3.4A, 3.4B, and Table A14 in Appendix A). Hence, regardless of whether the TCA-cycle is oxidative (Figure 3.4B) or reductive (Figure 3.4A), the fact that it is incomplete explains why Dehalococcoides are unable to use acetate as their energy source. Interestingly, iAI549 has one anaplerotic reaction, pyruvate carboxylase (PC), which produces oxaloacetate from pyruvate (Figures 3.3A, 3.4A, and 3.4B). Anaplerotic reactions generally replenish TCA-cycle intermediates, but in the absence of a CS reaction, PC is essentially the sole route to oxaloacetate in the TCA-cycle of iAI549.
3.4.2.2. CO2-fixation by Dehalococcoides
Analysis of iAI549 also revealed a carbon fixation step via the pyruvate-ferredoxin oxidoreductase or pyruvate synthase (POR) enzyme, encoded by 4 putative Dehalococcoides genes (gene numbers 181, 182, 183, and 184; Table S13 in Table A15 in Appendix A). Anaerobes such as Geobacter sulfurreducens and Methanosarcina barkeri have also been reported to use this step in their central metabolism (Bock et al., 1996, Mahadevan et al., 2006). POR is essential for the in silico growth of Dehalococcoides using iAI549, since it is the only pathway for producing pyruvate from acetate (Figures 3.3A, 3.4A, and 3.4B). Growth simulations of iAI549 further predict that 33% of the total moles of carbon fixed into biomass comes from extracellular CO2 via POR, and the balance (67%) from extracellular acetate through acetyl-CoA synthetase (Figure 3.3B), clearly highlighting the requirement for extracellular CO2 in addition to acetate as a carbon source for Dehalococcoides.
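The 67%/33% split follows from the carbon bookkeeping of the POR reaction, shown here in its carboxylating direction with protons omitted: each pyruvate receives two carbons from acetyl-CoA, i.e., from acetate via acetyl-CoA synthetase, and one carbon from CO2. This worked equation only explains the per-pyruvate ratio stated in Figure 3.3B; the biomass-level figure in the text is a result of the full simulation.

```latex
\begin{aligned}
&\text{acetyl-CoA} + \mathrm{CO_2} + 2\,\mathrm{Fd_{red}}
  \;\longrightarrow\; \text{pyruvate} + \text{CoA} + 2\,\mathrm{Fd_{ox}}
  \qquad (\text{protons omitted})\\[4pt]
&f_{\text{acetate}} = \frac{2\ \mathrm{C}}{2\ \mathrm{C} + 1\ \mathrm{C}} = \frac{2}{3} \approx 67\%,
\qquad
f_{\mathrm{CO_2}} = \frac{1\ \mathrm{C}}{2\ \mathrm{C} + 1\ \mathrm{C}} = \frac{1}{3} \approx 33\%
\end{aligned}
```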
Moreover, the presence of both POR and carbon monoxide dehydrogenase (CODHr), the latter encoded by 4 putative genes of iAI549 (gene numbers 170, 171, 172, and 174; Table S13 in Table A15 in Appendix A), initially suggested that the Wood-Ljungdahl pathway (Wood and Ljungdahl, 1991) of CO2 fixation might be active in Dehalococcoides. However, the absence of several genes encoding key enzymes, such as methylenetetrahydrofolate reductase and a methyltransferase in the folate-dependent branch of the Wood-Ljungdahl pathway (Drake et al., 2008, Müller, 2003, Ragsdale, 2008), indicated that the pathway is incomplete in Dehalococcoides (Figure A5 in Appendix A). All of these observations are consistent with the carbon labeling studies by Tang et al. (2009b).
Figure 3.3. The reconstructed TCA-cycle and CO2 fixation pathway of Dehalococcoides. The arrows show the directionality of the reactions. (A) Grey: citrate synthase, whose gene is currently not identified in iAI549 but whose pathway was suggested to be present by the carbon isotope labeling study (Tang et al., 2009b); Orange: pathways for which homologous putative genes (~30% amino acid sequence identity) were tentatively identified in Dehalococcoides, but which the carbon isotope labeling study suggests are absent (Tang et al., 2009b); Red: pathways whose putative genes are confirmed to be present by both iAI549 and the carbon isotope labeling study (Tang et al., 2009b). In all cases, the TCA-cycle of Dehalococcoides is not closed, which explains their inability to use acetate as an energy source. (B) The requirement of Dehalococcoides for CO2, in addition to acetate, for in silico growth. The numbers are flux values in mmol.gDCW-1.h-1. During pyruvate synthesis, Dehalococcoides derive 67% of the carbon (molar basis) from acetate and 33% (molar basis) from CO2; thus, Dehalococcoides fix carbon via the pyruvate-ferredoxin oxidoreductase or pyruvate synthase (POR) pathway.
Figure 3.4. Analysis of the effect of the citrate synthase (CS) reaction on Dehalococcoides growth. (A) In the absence of the CS reaction, the TCA-cycle operates reductively via succinyl-CoA synthetase and 2-oxoglutarate synthase to produce biomass precursors for Dehalococcoides growth. (B) The oxidative TCA-cycle operates when the CS reaction is present but succinyl-CoA synthetase and 2-oxoglutarate synthase are absent, as suggested by the carbon isotope labeling experiment (Tang et al., 2009b). However, Dehalococcoides growth remains almost unchanged with and without the CS reaction (0.0137 h-1 vs. 0.014 h-1), as shown by the flux values obtained from the growth simulations of iAI549.
3.4.3. Energy conservation process of Dehalococcoides
Dehalococcoides strains respire through a membrane-bound electron transport chain (ETC) (Hölscher et al., 2003, Jayachandran et al., 2004, Nijenhuis and Zinder, 2005), which is incompletely defined. In addition to RDase and hydrogenase (H2ase) enzymes, the ETC of Dehalococcoides requires an in vivo electron carrier to mediate electron transport between H2ase and RDase. As in other dechlorinating bacteria (Holliger et al., 1998b, Krasotkina et al., 2001, Miller et al., 1997), the reductive dechlorination reaction requires an in vivo electron donor with a redox potential (E0′) ≤ -360 mV (Hölscher et al., 2003, Jayachandran et al., 2004, Nijenhuis and Zinder, 2005). During the reductive dechlorination reaction, the cob(II)alamin of the corrinoid cofactor in the RDase enzyme is reduced to cob(I)alamin, which necessitates a low-potential donor because the redox potential (E0′) of the Co(II)/Co(I) couple is between -500 and -600 mV (Banerjee and Ragsdale, 2003, Holliger et al., 1998b, Krasotkina et al., 2001). While quinones such as menaquinone or ubiquinone can act as electron carriers in anaerobes (Kröger et al., 2002, Louie and Mohn, 1999, Schumacher and Holliger, 1996), experimental evidence suggests this is not the case in Dehalococcoides (Jayachandran et al., 2004, Nijenhuis and Zinder, 2005). Moreover, the half-reaction potentials of quinones (menaquinone ox/red E0′ = -70 mV; ubiquinone ox/red E0′ = +113 mV (Thauer et al., 1977)) are not compatible with the RDase requirement for a donor with E0′ ≤ -360 mV.
Therefore, we hypothesize that ferredoxin could be a low-potential electron donor for the RDase of Dehalococcoides, because it is the most electronegative electron carrier yet found in bacterial ETCs (Bruschi and Guerlesquin, 1988, Valentine, 1964). Various redox potentials have been reported for bacterial ferredoxins, including -417 mV at pH 7.55 for Clostridium pasteurianum (Valentine and Wolfe, 1963), -398 and -367 mV in the range of pH 6.13 to 7.41 for C. pasteurianum (Eisenstein and Wang, 1969, Thauer et al., 1977), -445 mV at pH 7 for Dehalospirillum multivorans (Miller et al., 1997), and -453 mV at pH 8 for Thermotoga maritima (Sterner, 2001). While these experimental data illustrate the differences in ferredoxin potential across microbes, they also support a putative role for ferredoxin as a low-potential electron carrier in the Dehalococcoides ETC. Furthermore, there is strong genomic evidence that the sequences of rdh genes contain two iron-sulfur cluster binding motifs, which are characteristic of bacterial ferredoxins (Hölscher et al., 2004). So far, 6 putative ferredoxin-encoding genes, but no gene for a b-type cytochrome, have been identified in the Dehalococcoides genomes. Miller and colleagues (Miller et al., 1997) described a mechanism for the ETC of D. multivorans involving both H2ase and RDase enzymes, in which they proposed "reverse electron transport" and the requirement for both a low-potential and a high-potential electron carrier. More recently, Thauer et al. (2008) suggested that the energy conservation process of methanogens without cytochromes (a system similar to Dehalococcoides) uses a flavin-based "electron bifurcation" system, in which an endergonic reaction is driven by the energy from a simultaneous exergonic reaction. A similar bifurcation mechanism was also proposed for the trimeric [Fe]-only H2ase of T. maritima (Schut and Adams, 2009). Based on the literature, and considering the lack of information on the Dehalococcoides ETC, we propose the following simplified mechanism of energy conservation for its ETC (Figure 3.5).
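The redox argument can be checked numerically: compare each carrier's E0′ with the ≤ -360 mV requirement, and compute the standard free energy of reducing the carrier with H2 from ΔG0′ = -nFΔE0′. This sketch uses the potentials quoted above plus the textbook E0′ of the 2H+/H2 couple (-414 mV at pH 7), which is not stated in this section; the function names are illustrative. Standard-state values ignore actual concentrations and the different pH of each measurement, so the output is indicative only.

```python
FARADAY_KJ = 96.485   # kJ per (V * mol e-)
E_H2 = -0.414         # V, E0' of 2H+/H2 at pH 7 (textbook value, not from this thesis)
RDASE_MAX = -0.360    # V, donor requirement for RDase stated in the text

# E0' values (V) quoted in the text; pH differs between entries.
carriers = {
    "menaquinone": -0.070,
    "ubiquinone": +0.113,
    "Fd C. pasteurianum (pH 7.55)": -0.417,
    "Fd C. pasteurianum, pH 6.13-7.41 (a)": -0.398,
    "Fd C. pasteurianum, pH 6.13-7.41 (b)": -0.367,
    "Fd D. multivorans (pH 7)": -0.445,
    "Fd T. maritima (pH 8)": -0.453,
}


def can_reduce_rdase(e0_volts: float) -> bool:
    """A carrier qualifies if its E0' is at or below the RDase requirement."""
    return e0_volts <= RDASE_MAX


def delta_g_kj(n_electrons: int, e_acceptor: float, e_donor: float) -> float:
    """Delta G0' = -n F (E0'_acceptor - E0'_donor), in kJ per mol of reaction."""
    return -n_electrons * FARADAY_KJ * (e_acceptor - e_donor)


for name, e0 in carriers.items():
    dg = delta_g_kj(2, e0, E_H2)  # H2 (2 e-) reducing the oxidized carrier
    print(f"{name:40s} E0'={e0:+.3f} V  RDase-capable={can_reduce_rdase(e0)}  "
          f"dG(H2 -> carrier)={dg:+.1f} kJ/mol")
```

With these numbers, reduction of the low-potential ferredoxins by H2 comes out close to equilibrium or slightly endergonic, while quinone reduction is strongly exergonic but yields carriers too positive for RDase; this is consistent with the electron bifurcation idea discussed above.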
Figure 3.5. A tentative scheme for the D. mccartyi electron transport chain (ETC). Dehalococcoides grow by conserving the free energy of the reductive dechlorination reaction (RCl + FdRed + 2H⁺ → RH + FdOx + Cl⁻ + H⁺) through the membrane-bound ETC. During this process, the donor H2 likely reduces the putative electron carrier, oxidized ferredoxin (FdOx), and the reduced ferredoxin (FdRed) transfers 2e⁻ to the terminal electron acceptors, chlorinated ethenes or benzenes (RCl), via cob(II)alamin to produce lower chlorinated compounds or ethene (RH) and hydrogen chloride (HCl). Reduction of ferredoxin and of the electron acceptors is catalyzed by the H2ase and RDase enzymes, respectively, and 2 protons (H⁺) are consumed from the cytoplasm during the reductive dechlorination reaction. Hence, the proton translocation stoichiometry of the Dehalococcoides ETC is 2H⁺/2e⁻, or 1 H⁺/e⁻.
We assumed that the H2ase of Dehalococcoides reduces ferredoxin (FdOx) by a process similar to that described for M. barkeri (Deppenmeier, 2002, Deppenmeier, 2004, Hedderich, 2004, Thauer et al., 2008). The reduced ferredoxin (FdRed) then transfers 2 electrons to the terminal electron acceptors, such as chloroethenes or chlorobenzenes (RCl), via cob(II)alamin: cob(II)alamin is reduced to cob(I)alamin, and RCl is reduced to less chlorinated compounds or ethene (RH). Alternatively, the endergonic reduction of ferredoxin (FdOx) with H2 could be coupled to the exergonic reduction of RCl with reduced ferredoxin (FdRed), the latter reaction being catalyzed by the RDase in a manner analogous to an electron bifurcation scheme. This might be possible because a corrinoid protein, like a flavoprotein, could also be a site of electron bifurcation (R. K. Thauer, personal communication). In either case, we assumed the uptake of two protons (2H+) from the cytoplasm during the transfer of two electrons (2e-) from the donor H2 to the acceptor RCl, resulting in a net proton translocation stoichiometry of 1 H+ per e- (Figure 3.5).
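As a quick consistency check on the scheme in Figure 3.5, the overall reaction can be tested for hydrogen, chlorine, and charge balance, and the proton-to-electron ratio computed. The sketch below is illustrative only and rests on two assumptions not stated in the text: R is treated as a generic fragment that cancels between RCl and RH, and FdRed is taken to carry two electrons (charge -2 relative to FdOx).

```python
from collections import Counter

# Overall reaction of Figure 3.5:  RCl + Fd_red + 2 H+ -> RH + Fd_ox + Cl- + H+
# Assumptions: R is a generic fragment (cancels out), so only H and Cl are
# tracked; Fd_red carries two electrons (charge -2 relative to Fd_ox).
SPECIES = {
    "RCl":    {"atoms": {"Cl": 1}, "charge": 0},
    "RH":     {"atoms": {"H": 1},  "charge": 0},
    "Fd_red": {"atoms": {},        "charge": -2},
    "Fd_ox":  {"atoms": {},        "charge": 0},
    "H+":     {"atoms": {"H": 1},  "charge": 1},
    "Cl-":    {"atoms": {"Cl": 1}, "charge": -1},
}


def side_totals(side):
    """Sum atoms and net charge over (coefficient, species) pairs."""
    atoms, charge = Counter(), 0
    for coeff, name in side:
        for element, n in SPECIES[name]["atoms"].items():
            atoms[element] += coeff * n
        charge += coeff * SPECIES[name]["charge"]
    return atoms, charge


left = [(1, "RCl"), (1, "Fd_red"), (2, "H+")]
right = [(1, "RH"), (1, "Fd_ox"), (1, "Cl-"), (1, "H+")]

balanced = side_totals(left) == side_totals(right)
electrons_transferred = 2   # carried by one Fd_red
protons_consumed = 2        # taken up from the cytoplasm
h_per_e = protons_consumed / electrons_transferred
print(balanced, h_per_e)    # True 1.0
```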
3.4.4. Implications of the incomplete cobalamin synthesis pathway in Dehalococcoides
Cobalamin (vitamin B12) is essential for RDase activity; however, the cobalamin biosynthesis pathway is incomplete in Dehalococcoides (Kube et al., 2005, Seshadri et al., 2005) (Figure 3.6). Complete de novo biosynthesis of vitamin B12 (aerobic or anaerobic) requires around 30 genes (Warren et al., 2002), of which only 18 have been identified in Dehalococcoides. Seven of these genes belong to the “anaerobic” pathway, while 2 are involved in the “aerobic” pathway of cobalamin biosynthesis. Several key enzyme-encoding genes required for precorrin ring formation, cobalt insertion, and methylation were not found in Dehalococcoides genomes (Figure 3.6). However, 7 genes of iAI549 (3 core, 1 dispensable, and 3 unique genes: 161, 162, 163, 433, 524, 525, 526; Tables S3-S6 in Table A15 in Appendix A) that encode a putative cobalamin transporter were identified, indicating that Dehalococcoides could take up vitamin B12 from the medium in the form of either cobinamide or cobalamin (Escalante-Semerena, 2007). In fact, vitamin B12 has been shown to be required for the growth of pure cultures, and its addition to the medium has been reported to enhance the growth rate of Dehalococcoides (He et al., 2007).
Figure 3.6. Reconstructed cobalamin biosynthesis pathway of Dehalococcoides. Dashed orange lines indicate the cell membrane, grey lines indicate missing pathways, and red lines indicate existing pathways for which putative genes were identified in Dehalococcoides during the reconstruction of iAI549. Arrows denote the directionality of the reactions. Since the genomes encode a putative cobalamin transporter, Dehalococcoides may salvage vitamin B12 from the environment in the form of either cobinamide or cobalamin, as indicated by the ‘cobinamide transport’ and ‘cobalamin transport’ reactions in the figure. Adenosylcobalamin, the end product of the entire pathway, is a biomass constituent and is assumed to take part in Dehalococcoides cell formation.
Therefore, to examine the influence of cobalamin on the growth of Dehalococcoides, we conducted growth simulations with iAI549 for two scenarios (Figure 3.7): 1) Dehalococcoides growth rate as a function of the weight fraction of cobalamin in the biomass and the cobalamin salvage rate from the medium (Figure 3.7A), and 2) Dehalococcoides growth yield assuming it could synthesize its own cobalamin (i.e., adding to iAI549 all the reactions required for de novo cobalamin synthesis), compared with the yield when B12 is salvaged from the medium (Figure 3.7B). As expected, the growth rate decreases to zero at low cobalamin salvage rates (Figure 3.7A). Moreover, the cobalamin salvage rate at which metabolism becomes limited by vitamin B12 depends strongly on the cobalamin fraction in the biomass, which has never been experimentally measured for Dehalococcoides (Figure 3.7A). The second simulation shows that the energetic cost of synthesizing cobalamin de novo is small, since the predicted yields with and without a cobalamin synthesis pathway are almost identical (Figure 3.7B). Only when the biomass cobalamin fraction is assumed to be 10 times higher than the maximum reported value is a small (4%) reduction in growth yield (from 0.72 gDCW/eeq to 0.69 gDCW/eeq) predicted as the penalty for synthesizing cobalamin de novo (Figure 3.7B and Table A13 in Appendix A). This low synthesis cost, together with the facts that cobalamin is essential and that its synthesis pathway is incomplete in Dehalococcoides, suggests that Dehalococcoides may have evolved syntrophically with cobalamin-secreting organisms and never faced significant evolutionary pressure to acquire a complete cobalamin synthesis pathway.
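The dependence of the limiting salvage rate on the biomass cobalamin fraction in Figure 3.7A follows from the biomass equation: if cobalamin can only be salvaged, each gram of new biomass consumes a fixed amount of it, so the growth rate cannot exceed the salvage rate divided by that amount. The toy sketch below illustrates this with a two-constraint linear program solved in closed form. It is not iAI549; the energy-limited growth rate and the use of the adenosylcobalamin molar mass (about 1579.6 g/mol) to convert a weight fraction to mmol/gDCW are assumptions.

```python
# Toy model, not iAI549. Assumed constant: approximate molar mass of
# adenosylcobalamin, used to convert a weight fraction (g/gDCW) to mmol/gDCW.
ADOCBL_MW = 1579.6  # g/mol


def biomass_coefficient(weight_fraction, mw=ADOCBL_MW):
    """mmol cobalamin consumed per gDCW of new biomass."""
    return weight_fraction * 1000.0 / mw


def max_growth_rate(salvage_rate, weight_fraction, mu_energy):
    """Optimum of: maximize mu s.t. mu * coef <= salvage_rate and mu <= mu_energy.

    salvage_rate in mmol/gDCW/h; mu_energy (1/h) is a hypothetical
    energy-limited growth rate standing in for all other constraints.
    """
    coef = biomass_coefficient(weight_fraction)
    return min(mu_energy, salvage_rate / coef)


def limiting_salvage_rate(weight_fraction, mu_energy):
    """Salvage rate below which cobalamin, not energy, sets the growth rate."""
    return mu_energy * biomass_coefficient(weight_fraction)


mu_E = 0.02  # hypothetical energy-limited growth rate (1/h)
# A 10-fold higher biomass cobalamin fraction raises the limiting salvage
# rate 10-fold (up to floating-point rounding).
ratio = limiting_salvage_rate(1e-4, mu_E) / limiting_salvage_rate(1e-5, mu_E)
```

The linear scaling is the point of the sketch: because the biomass cobalamin fraction has not been measured, the salvage rate at which growth becomes B12-limited is correspondingly uncertain.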
Figure 3.7. Influence of cobalamin on the growth rate and yield of Dehalococcoides. (A) Growth rate of Dehalococcoides simulated as a function of both the cobalamin salvage rate and the cobalamin fraction in the biomass equation, showing the role of cobalamin in limiting the growth rate. The cobalamin uptake (salvage) rate below which Dehalococcoides growth becomes limited increases with the cobalamin fraction in the biomass. (B) Cost of de novo cobalamin synthesis in terms of Dehalococcoides growth yield (see text for details). The predicted yields with and without the de novo cobalamin synthesis pathway remain almost identical at the maximum reported cobalamin fraction in the biomass. Even with a 10-fold increase in the biomass cobalamin fraction, the predicted yield decreased by only 4% (from 0.72 gDCW/eeq to 0.69 gDCW/eeq), indicating the low cost of de novo cobalamin synthesis.
3.4.5. Does carbon or energy limit the in silico growth of Dehalococcoides?
Growth of Dehalococcoides is more rapid in mixed microbial communities than in pure cultures (Adrian et al., 1998, Cupples et al., 2003, Duhamel and Edwards, 2007, Duhamel et al., 2002), although the reasons for this discrepancy are not entirely clear. The difference in reported growth yields between pure and mixed cultures is more significant (p = 0.0005) than the difference in reported growth rates (p = 0.05), at a 95% confidence level (Tables A7 and A8 in Appendix A). Thus, to examine the growth-limiting conditions, we simulated Dehalococcoides growth yields (Figure 3.8A) under two conditions: 1) allowing unlimited amino acid flux from the medium at a hydrogen flux of 10 mmol.gDCW-1.h-1 (equivalent to the dechlorination rate obtained from average pure-culture growth yields and rates; Tables A7 and A8 in Appendix A), and 2) doubling the hydrogen flux (20 mmol.gDCW-1.h-1) without allowing any amino acid flux from the medium. The first condition mimics a carbon-rich environment, whereas the second represents an energy-rich situation. The model predicts that adding unlimited amounts of any or all of the amino acids to the growth medium (obviating the need for the cell to synthesize them) increases the growth yield by at most 55% (to 1.13 gDCW/eeq) compared with the case with no amino acids in the medium (0.72 gDCW/eeq) (Figure 3.8A). In contrast, doubling only the hydrogen flux increases the growth yield by 65% (from 0.72 gDCW/eeq to 1.19 gDCW/eeq) (Figure 3.8A).
To further analyze this energy limitation, we simulated in silico growth yields of Dehalococcoides as a function of both acetate flux (carbon availability) and energy flux, represented by the energy transfer efficiency (Figure 3.8B). This analysis shows that the growth of Dehalococcoides is energy-limited rather than carbon-limited, since the growth yield increases in proportion to the energy transfer efficiency regardless of the acetate flux. The simulations also indicate that the growth yield of Dehalococcoides in pure culture corresponds to an energy transfer efficiency of only 30% (green arrow), compared with 65% in mixed culture (red arrow) (Figure 3.8B). These simulations point to the electron flux from hydrogen to the RDase as the rate-limiting step, which is somehow more efficient in mixed cultures. It is possible that interspecies hydrogen transfer in a mixed culture delivers electrons more directly than hydrogen supplied in the medium (as for pure cultures). Electrons supplied from an electrode polarized to a very low potential were shown to stimulate Dehalococcoides metabolism (Aulenta et al., 2009), possibly illustrating such an effect; if so, these results suggest a mechanism for enhancing the growth of Dehalococcoides and, consequently, the dechlorination of pollutants.
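The qualitative pattern in Figure 3.8B, where yield tracks energy transfer efficiency but not acetate flux, is what a Liebig-style minimum of an energy-limited and a carbon-limited yield produces whenever carbon is in excess. The sketch below is a deliberately simplified stand-in for the iAI549 simulations, not the model itself; every parameter is hypothetical and in arbitrary units.

```python
def toy_yield(eta, acetate_per_eeq, atp_max_per_eeq=1.0,
              atp_cost_per_gdcw=1.0, gdcw_per_acetate=1.0):
    """Toy growth yield per electron equivalent (arbitrary units).

    eta: energy transfer efficiency (fraction of the maximum ATP per eeq)
    acetate_per_eeq: carbon supplied per eeq
    All other parameters are hypothetical placeholders.
    """
    energy_limited = eta * atp_max_per_eeq / atp_cost_per_gdcw
    carbon_limited = acetate_per_eeq * gdcw_per_acetate
    # Growth is set by whichever resource runs out first.
    return min(energy_limited, carbon_limited)


# With ample acetate, yield scales with eta and ignores extra acetate.
y_30 = toy_yield(0.30, acetate_per_eeq=5.0)
y_65 = toy_yield(0.65, acetate_per_eeq=5.0)
y_30_more_acetate = toy_yield(0.30, acetate_per_eeq=10.0)
```

In this toy, acetate only matters once the carbon-limited term drops below the energy-limited one, which is the regime the iAI549 simulations did not reach.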
Figure 3.8. Effect of carbon and energy sources on the growth yield of Dehalococcoides. (A) The experimental growth yield of Dehalococcoides in the minimal medium (0.69 gDCW/eeq) is compared with the increased growth yields achieved by allowing unlimited fluxes of all amino acids at an H2 flux of 10 mmol.gDCW-1.h-1 (corresponding to the experimental dechlorination rate), as well as by doubling the H2 flux (20 mmol.gDCW-1.h-1). Unlimited flux of amino acids (carbon source) increased the in silico growth yield of Dehalococcoides by 55%, whereas doubling the H2 flux (electron donor or energy source) alone increased the yield by 65%. (B) Analysis of the energy-limited growth of Dehalococcoides. Since the growth yield of Dehalococcoides varies linearly with the energy transfer efficiency, their yield can be improved by increasing the flux of their energy source or electron donor to generate more ATP per electron. In contrast, variation in acetate flux has no effect on growth yield. Red and green arrows show growth yields and corresponding efficiencies for Dehalococcoides growth in mixed and pure cultures, respectively. ‘MM’ = minimal medium; ‘Tyr’ = tyrosine; ‘Glu’ = glutamate; ‘Gln’ = glutamine; ‘Gly’ = glycine; ‘Ala’ = alanine; ‘Thr’ = threonine; ‘Asp’ = aspartate; ‘All AA’ = all amino acids; ‘2X H2 flux’ = 20 mmol H2.gDCW-1.h-1.
As described earlier, experimental studies clearly show more favorable growth of Dehalococcoides in syntrophic microbial consortia than in isolated pure cultures. This points towards the existence of as yet undefined beneficial metabolic interactions among consortium members. Although iAI549 simulations suggested more efficient electron transfer and energy utilization in a mixed culture, this result requires experimental validation. Because these microbes harness energy for growth from reductive dechlorination reactions, their increased growth should accelerate the bioremediation process. Hence, the current challenge is to understand the reason for their favorable growth in the mixed microbial communities of their natural habitat. A genome-scale metabolic model of a syntrophic community of dechlorinating bacteria in which Dehalococcoides are the dominant members could therefore help identify the factors influencing their growth. This information may also help to develop a defined bacterial community with enhanced bioremediation capability, and to devise strategies for exploiting these microbes to remediate contaminated sites around the world.
3.5. Conclusions
Although genome-scale constraint-based models are available for several microbes from all three domains of life, iAI549 is the first such model for dechlorinating bacteria. This constraint-based flux balance model is consistent with the specialized nature of Dehalococcoides metabolism. The model supports the idea that evolution of the rdh genes specific to chlorinated compounds conferred the strain-specific metabolic phenotypes of Dehalococcoides. In addition to cataloguing significant metabolic similarities among Dehalococcoides strains, the model provides valuable insights into the physiological and metabolic bottlenecks of these microbes. Reconstructed central metabolic pathways, for example, identified the underlying reasons for the Dehalococcoides requirement for a separate energy source in addition to a carbon source for growth, as well as a carbon fixation step. Growth simulations also revealed that the growth of these organisms is energy-limited rather than carbon- or cobalamin-limited. In developing the model, we compiled detailed tables of metabolic gene correspondences among 4 genomes, reannotations based on pathway analysis, and intrinsic kinetic and stoichiometric parameters for the user community. We also created lists of core hypothetical genes and non-gene-associated model reactions; these lists will be useful for designing enzyme assays for the functional annotation of the hypothetical genes. Finally and most importantly, this pan-genome-scale metabolic model provides a common and scalable framework, as well as a knowledge base, that can be used to visualize and interpret omics-scale data from transcriptomics, proteomics, and metabolomics for any Dehalococcoides strain; such analyses will further our understanding of these environmentally important organisms and help improve bioremediation outcomes.
Chapter 4: New insight into Dehalococcoides mccartyi metabolism from a model-integrated systems-level analysis of D. mccartyi transcriptomes
4.1. Abstract
Organohalide respiration by Dehalococcoides mccartyi is a useful bioremediation process that transforms groundwater pollutants and known human carcinogens such as trichloroethene and vinyl chloride into benign ethene. Successful application of this process depends on a fundamental understanding of the respiration and metabolism of D. mccartyi. To better elucidate D. mccartyi metabolism and physiology, we analyzed available transcriptomic data for a pure isolate (Dehalococcoides mccartyi strain 195) and a mixed microbial consortium (KB-1) using the pan-genome-scale metabolic model for D. mccartyi developed in Chapter 3. The transcriptomic data, together with available proteomic data, helped confirm the transcription and expression of the majority of D. mccartyi genes. Using the quality threshold clustering algorithm and information from the model, we also identified functionally enriched clusters of co-expressed metabolic genes (13 for strain 195 and 11 for KB-1). A composite genome of two highly similar D. mccartyi strains was constructed from the KB-1 metagenome sequence, and operon prediction was conducted for this composite genome and for other single genomes. This operon analysis, together with clustering analysis of transcriptomic data, helped generate experimentally testable hypotheses regarding the functions of a number of hypothetical proteins and the poorly understood mechanism of energy conservation in D. mccartyi. Overall, this study shows how an organism’s metabolic model can be used as a platform to analyze and visualize transcriptomic data and thereby improve understanding of the unusual metabolism of an environmentally important but difficult-to-grow microorganism.
4.2. Introduction
Obligate anaerobes such as Dehalococcoides mccartyi support growth and metabolism by conserving energy from an unusual respiratory process termed organohalide respiration (Holliger et al., 1998b, Smidt and de Vos, 2004, Tas et al., 2010). The hallmark of this important biological process lies in the detoxification of halogenated xenobiotics such as trichloroethene and vinyl chloride (known human carcinogens and groundwater pollutants), as well as tetrachloroethene, chlorobenzenes, dioxins, and polychlorinated biphenyls (Adrian et al., 2000b, Bunge et al., 2003, He et al., 2003, Maymó-Gatell et al., 1997). However, optimized use of this natural and effective bioremediation process is hampered by the lack of detailed knowledge about D. mccartyi metabolism, both in pure cultures and in the mixed microbial communities they normally inhabit. Although some of the genes and enzymes involved in organohalide respiration have been identified and characterized (Adrian et al., 2007b, Jayachandran et al., 2004, Müller et al., 2004, Nijenhuis and Zinder, 2005), the mechanism of the respiratory chain and its components, as well as the functions of ~50% of D. mccartyi genes, remain to be determined (Kube et al., 2005, Seshadri et al., 2005). Because heterologous gene expression is difficult and no genetic system exists for D. mccartyi (Löffler et al., 2012), experimental characterization and manipulation of the genes and enzymes of these bacteria are challenging. Hence, most studies to date have focused on the identification and characterization of reductive dehalogenase homologous (rdh) genes and the cofactors and substrate ranges of the corresponding enzymes (Adrian et al., 2007b, Krajmalnik-Brown et al., 2004, Lee et al., 2006, Magnuson et al., 2000, Magnuson et al., 1998, Müller et al., 2004).
Recently, a number of isotope labeling studies of D. mccartyi metabolism have examined the genes and enzymes of some key metabolic processes, including the TCA cycle and amino acid transport and metabolism (Marco-Urrea et al., 2011, Tang et al., 2009b, Zhuang et al., 2011). In addition, sequencing of multiple D. mccartyi genomes (Kube et al., 2005, McMurdie et al., 2009, Seshadri et al., 2005) enabled the construction of a detailed pan-genome-scale constraint-based model of metabolism, which revealed their energy-starved nature and depicted the overall metabolic landscape of D. mccartyi (Ahsanul Islam et al., 2010). A number of proteomic studies (Lee et al., 2012, Morris et al., 2007, Morris et al., 2006, Tang et al., 2013) have also provided important information on some metabolic genes and processes, including nitrogen fixation and carbon metabolism of D. mccartyi. Apart from these metabolic studies, data from systems-wide high-throughput experiments such as whole-genome microarrays are available for Dehalococcoides mccartyi strain 195 (formerly Dehalococcoides ethenogenes strain 195) (Johnson et al., 2008, Johnson et al., 2009, Lee et al., 2011). A shotgun metagenome microarray study of KB-1, a D. mccartyi-containing anaerobic mixed microbial community, has also been published recently (Waller et al., 2012, Waller, 2010). Although these studies obtained expression data for all genes, each focused on specific genes, for instance those involved in reductive dechlorination and energy conservation, in the cobalamin (vitamin B12) biosynthesis pathway, or in phage-related functions. None of these studies analyzed overall D. mccartyi metabolism using genome-wide transcriptomic data, and no integrated analysis of the available transcriptomic and proteomic data with the published pan-genome-scale metabolic model of these bacteria has been conducted yet. Such a systematic analysis of the “omics” data can provide a more comprehensive understanding of the metabolic processes of D. mccartyi, and can confirm the expression of sequenced genes, most of which are supported only by weak bioinformatic evidence.
Here, we analyzed the published transcriptomic data for a pure culture, Dehalococcoides mccartyi strain 195 (hereafter, strain 195) (Johnson et al., 2008, Johnson et al., 2009), and a mixed culture, KB-1 (Waller, 2010, Waller et al., 2012), using our previously developed pan-genome-scale D. mccartyi metabolic model (Ahsanul Islam et al., 2010) as a guide. A composite genome of two highly similar D. mccartyi strains in KB-1 (hereafter, KB-1 Dhc) was constructed from the publicly available KB-1 metagenome sequences (http://img.jgi.doe.gov/cgi-bin/m/main.cgi) and subsequently used to analyze D. mccartyi-specific transcriptomic data from the KB-1 community arrays (Waller, 2010, Waller et al., 2012). This model-guided analysis of transcriptomic data, together with available proteomic data, confirmed the transcription and expression of the majority of genes in the strain 195 and KB-1 Dhc genomes. In addition, we specifically examined and visualized the expression of some metabolic genes and hypothetical proteins, as well as their putative annotations proposed during the metabolic modeling study. Operon analysis was then conducted for the KB-1 Dhc genome and for other single-strain genomes of D. mccartyi, including strains 195, CBDB1, and GT. The transcriptomic data were further analyzed with the quality threshold (QT) clustering algorithm and functional enrichment analysis, which provided valuable insight into the poorly understood mechanism of energy conservation in these bacteria. Moreover, these bioinformatic analyses of transcriptomic data, along with operon analysis, helped suggest putative functions for at least five hypothetical proteins of strain 195. Thus, our analysis provides a guide for selecting and screening some of the hypothetical proteins in D. mccartyi genomes, which can aid future targeted proteomic work to increase our knowledge of the physiology and biochemistry of these useful bacteria.
4.3. Materials and methods
4.3.1. Identification of D. mccartyi genes from KB-1 shotgun microarray data
Pre-processed and normalized transcriptomic data for the KB-1 community were collected from a shotgun microarray study of 33 KB-1 samples (Waller, 2010, Waller et al., 2012). Details of array construction methods, experimental conditions, and array data normalization techniques are described elsewhere (Waller, 2010, Waller et al., 2012). RNA was collected from KB-1 cultures amended pairwise with and without chlorinated acceptors for a given donor (methanol or hydrogen). These RNA samples included cultures amended with methanol only (M), compared with the same cultures amended with trichloroethene and methanol (TCEM), cis-1,2-dichloroethene and methanol (cDCEM), vinyl chloride and methanol (VCM), and vinyl chloride and hydrogen (VCH) (Waller, 2010, Waller et al., 2012). In these experiments, a TCE-grown culture was first purged of all chlorinated substrates and starved for 4 days before amendment with electron donors and acceptors. In addition, arrays were interrogated with RNA from cultures starved (i.e., not amended) for 4 days (NA) and with one sample from a culture kept anaerobic but starved for 1 year (“Starved”) (Waller, 2010, Waller et al., 2012). In total, 33 independent RNA samples and the corresponding array data were used for principal component analysis (PCA). Although the KB-1 mixed microbial community mainly comprises dechlorinators, methanogens, acetogens, and fermenters (Duhamel and Edwards, 2006, Duhamel et al., 2004, Edwards and Cox, 1997, Waller, 2010), D. mccartyi are the dominant members that detoxify the chlorinated solvents (Duhamel and Edwards, 2006, Duhamel et al., 2004, Edwards and Cox, 1997, Waller, 2010). In addition, only D. mccartyi-specific array data can be integrated with the pan-genome-scale metabolic model (Ahsanul Islam et al., 2010); hence, only those genes and the corresponding array data were analyzed in this study. The data were extracted from KB-1 arrays and nucleotide sequences following a simple workflow (Figure B1 in Appendix B). First, all array sequences were aligned against the NCBI non-redundant nucleotide database (“nt”) (http://www.ncbi.nlm.nih.gov/nuccore) with BLAST (blastn) (Altschul et al., 1997) to determine their species-level identity. Sequences whose best hit was a D. mccartyi genome with > 85% nucleotide identity were selected as D. mccartyi genes. Next, all array sequences were compared to the NCBI non-redundant protein database (“nr”) (http://www.ncbi.nlm.nih.gov/protein) with BLAST (blastx) (Altschul et al., 1997) to identify their annotations. Since D. mccartyi genomes are very similar (Ahsanul Islam et al., 2010, Hug et al., 2011, Kube et al., 2005, Seshadri et al., 2005), only sequences that matched D. mccartyi genes in the database with > 95% amino acid identity were retained for subsequent analyses. Finally, KB-1 array nucleotide sequences were compared to the draft composite genome of KB-1 Dhc constructed from the KB-1 metagenome (Hug, 2012). The results of all three analyses were then compared, and only consensus array sequences and the corresponding intensity data were selected as KB-1 Dhc array data (Figure B1 in Appendix B). Of the 26,186 sequences on the shotgun array, 1,162 consensus sequences were identified as D. mccartyi. The data were subsequently analyzed with the QT clustering algorithm (Heyer et al., 1999) and mapped to the D. mccartyi metabolic model (Ahsanul Islam et al., 2010) for functional enrichment analysis of the clusters (Mahadevan et al., 2008, Tavazoie et al., 1999, Huang et al., 2009) (Figure B1 in Appendix B).
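The three-way consensus step amounts to a set intersection over array sequence IDs. In this minimal sketch, BLAST results are assumed to be already parsed into best-hit records. The BestHit record, the name-based Dehalococcoides test, and the representation of the composite-genome comparison as a set of matching IDs are illustrative assumptions, since the matching criterion for that third step is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BestHit:
    query: str        # array sequence ID
    taxon: str        # organism name of the best database hit
    identity: float   # percent identity of that best hit


def is_dhc(taxon):
    # Assumption: older records use names such as "Dehalococcoides
    # ethenogenes", so any taxon name containing "Dehalococcoides" is accepted.
    return "Dehalococcoides" in taxon


def passing(best_hits, min_identity):
    """IDs whose best hit is Dehalococcoides with identity > min_identity (%)."""
    return {h.query for h in best_hits
            if is_dhc(h.taxon) and h.identity > min_identity}


def consensus_dhc(blastn_hits, blastx_hits, composite_genome_ids):
    nt_ok = passing(blastn_hits, 85.0)   # step 1: blastn vs "nt", > 85% nt identity
    aa_ok = passing(blastx_hits, 95.0)   # step 2: blastx vs "nr", > 95% aa identity
    # Step 3 (match to the KB-1 Dhc composite genome) is taken as a given ID set.
    return nt_ok & aa_ok & set(composite_genome_ids)


# Hypothetical records for illustration only.
blastn = [BestHit("s1", "Dehalococcoides mccartyi 195", 99.0),
          BestHit("s2", "Geobacter lovleyi", 97.0),
          BestHit("s3", "Dehalococcoides mccartyi CBDB1", 90.0)]
blastx = [BestHit("s1", "Dehalococcoides mccartyi 195", 98.5),
          BestHit("s3", "Dehalococcoides mccartyi CBDB1", 92.0)]
print(consensus_dhc(blastn, blastx, {"s1", "s3"}))  # {'s1'}
```

Here s3 passes the nucleotide filter but fails the stricter amino acid filter, so only s1 survives all three checks.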
4.3.2. Dehalococcoides mccartyi strain 195 microarray data
Pre-processed and normalized transcriptomic data for Dehalococcoides mccartyi (formerly Dehalococcoides ethenogenes) strain 195 were obtained from the published literature (Johnson et al., 2008, Johnson et al., 2009) and the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo/). In total, microarray data for 9 experimental conditions and 27 samples were analyzed, with 3 biological replicates per condition. The experimental conditions included five phases of the strain 195 growth curve: early exponential (EE), late exponential (LE), transition (TR), early stationary (ES), and late stationary (LS) (Johnson et al., 2008). Arrays were also generated for RNA samples collected from cultures grown under four further conditions: strain 195 medium with high or low concentrations of vitamin B12 (HighB12 and LowB12), filter-sterilized supernatant of ANAS (ANASspent), and ANAS mineral medium (ANASmedium) (Johnson et al., 2009). ANAS is an anaerobic, TCE-dechlorinating, methanogenic mixed microbial community that was enriched from contaminated sites at Alameda Naval Air Station (Richardson et al., 2002). Of the 1,579 array sequences, 1,560 non-duplicate sequences and the corresponding array data from 27 samples were further analyzed following a workflow (Figure B2 in Appendix B). After PCA, array data for all samples were mapped to the D. mccartyi metabolic model to identify metabolic genes, and the genes were clustered with the QT clustering algorithm (Heyer et al., 1999). Functional enrichment analysis was then performed by calculating enrichment p-values for the metabolic genes in each cluster using the hypergeometric distribution (Figure B2 in Appendix B).
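The enrichment p-value for a cluster is the upper tail of the hypergeometric distribution: the probability of drawing at least k genes from a given subsystem when n genes are sampled at random from N metabolic genes, K of which belong to that subsystem. A minimal standard-library sketch, with parameter names chosen here for illustration:

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: metabolic genes in the analysis (population)
    K: genes assigned to one model subsystem
    n: genes in the QT cluster
    k: cluster genes that belong to the subsystem
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Toy check: all 5 genes of a 5-gene cluster drawn from a 5-gene subsystem
# among 10 genes; the p-value equals 1/252.
p = hypergeom_enrichment_p(10, 5, 5, 5)
enriched = p <= 0.05
```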
4.3.3. Operon prediction for Dehalococcoides mccartyi genomes
Operon predictions for both KB-1 Dhc and strain 195 were performed using the procedure described by Bergman et al. (2007). Following this procedure, we randomly chose 27 diverse bacterial genomes (Table S17 in Table B1 in Appendix B) from different branches of the bacterial phylogeny to construct the barcode, which was generated by identifying homologs of strain 195 and KB-1 Dhc genes in the chosen genomes. Next, the intergenic distance for each gene was calculated from the positional information of genes in the genome. Intergenic distance, strand location, and barcode information were then used to calculate the posterior probability that each gene is operonic. Genes with a posterior probability ≥ 0.5 were classified as operonic; genes with lower probabilities were classified as non-operonic (Appendix B: Table S15 in Table B1). A similar procedure was followed to identify the operon structures of strains CBDB1 and GT (Appendix B: Table S15 in Table B1).
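The intergenic-distance input and the final 0.5 cutoff are simple to state in code. The Bayesian step that combines distance, strand, and barcode evidence into a posterior (Bergman et al., 2007) is not reproduced here, so posterior probabilities are taken as given. The Gene record and the convention that overlapping genes yield negative distances are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    locus: str
    start: int    # 1-based leftmost coordinate
    end: int      # 1-based rightmost coordinate
    strand: str   # "+" or "-"


def intergenic_distances(genes):
    """Bases between adjacent genes (sorted by start) on the same strand.

    Overlapping genes give negative values; pairs on opposite strands get
    None because they cannot share an operon.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    out = {}
    for a, b in zip(ordered, ordered[1:]):
        same_strand = a.strand == b.strand
        out[(a.locus, b.locus)] = b.start - a.end - 1 if same_strand else None
    return out


def call_operonic(posteriors, cutoff=0.5):
    """Label a gene operonic when its posterior probability is >= cutoff."""
    return {locus: p >= cutoff for locus, p in posteriors.items()}


# Hypothetical coordinates and posteriors for illustration only.
genes = [Gene("A", 1, 100, "+"), Gene("B", 111, 200, "+"), Gene("C", 250, 300, "-")]
distances = intergenic_distances(genes)          # {('A', 'B'): 10, ('B', 'C'): None}
labels = call_operonic({"A": 0.5, "B": 0.49})    # {'A': True, 'B': False}
```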
4.3.4. Microarray data analysis and visualization
Clustering analysis and heat map visualization of transcriptomic data were conducted with MeV: MultiExperiment Viewer (Saeed et al., 2003), open-source software for analyzing and visualizing microarray gene expression data. First, the array data were mapped to the D. mccartyi metabolic model to identify metabolic genes and classify them according to the model subsystems. Next, the quality threshold (QT) clustering algorithm (Heyer et al., 1999), with Spearman’s rank correlation coefficient as the distance metric (Usadel et al., 2009), was used to cluster the gene expression data. The number of clusters generated by QT clustering depends on two parameters, the cluster diameter and the minimum cluster size; these were set to 0.06 and 7, respectively, to obtain very stringent QT clusters. These stringent cutoffs also ensured that the co-expressed or co-transcribed clusters formed were not very large and were potentially more meaningful. Using both subsystem and clustering information, hypergeometric p-values were calculated for each QT cluster to identify functionally enriched, i.e., overrepresented (p ≤ 0.05), clusters (Mahadevan et al., 2008, Tavazoie et al., 1999, Huang et al., 2009). Hierarchical clustering (Eisen et al., 1998) was subsequently used to further analyze some functionally enriched QT clusters of interest. In heat maps, absolute intensity values were used to represent whether a gene was highly expressed or transcribed (“on”) or not highly expressed or not transcribed (“off”). The frequency distributions of intensity values (Figures B3 and B4 in Appendix B) showed that the majority of strain 195 and KB-1 Dhc genes were expressed above intensity values of 800 and 100, respectively. Hence, for the heat maps we set threshold intensities of 800 (< 800 = “off”, > 800 = “on”) for strain 195 data and 100 (< 100 = “off”, > 100 = “on”) for KB-1 Dhc arrays.
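QT clustering (Heyer et al., 1999) grows a candidate cluster around every unassigned gene, keeps the largest candidate whose diameter (maximum pairwise distance) stays within the threshold, removes its genes, and repeats. The sketch below follows that outline with the diameter and minimum-size settings given above. It is not MeV’s implementation; it assumes the Spearman distance is 1 - rho and that clustering stops once the largest remaining candidate is smaller than the minimum size.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: constant profiles are treated as uncorrelated
    return cov / (vx * vy) ** 0.5


def qt_cluster(profiles, diameter=0.06, min_size=7):
    """QT clustering of {gene: expression profile} with distance 1 - rho."""
    genes = sorted(profiles)
    dist = {(a, b): 1.0 - spearman_rho(profiles[a], profiles[b])
            for a in genes for b in genes}
    remaining = set(genes)
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            candidate, pool = [seed], sorted(remaining - {seed})
            while pool:
                # Gene whose addition gives the smallest new cluster diameter.
                nxt = min(pool, key=lambda g: max(dist[(g, m)] for m in candidate))
                if max(dist[(nxt, m)] for m in candidate) > diameter:
                    break
                candidate.append(nxt)
                pool.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break  # leftover genes stay unclustered
        clusters.append(best)
        remaining -= set(best)
    return clusters


# Hypothetical profiles: 7 rising genes, 7 falling genes, and one outlier.
profiles = {f"a{i}": [v * (i + 1) for v in (1, 2, 3, 4, 5)] for i in range(7)}
profiles.update({f"b{i}": [v * (i + 1) for v in (5, 4, 3, 2, 1)] for i in range(7)})
profiles["x"] = [1, 3, 2, 5, 4]
clusters = qt_cluster(profiles)
```

With these toy profiles the rising and falling genes form two clusters, and the outlier (rho = 0.8 with the rising genes, distance 0.2) stays unclustered.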
Relative (normalized) gene expression intensities were calculated row-wise, i.e., for each gene across all samples, as $z_{ij} = (x_{ij} - \bar{x}_i)/s_i$, where $x_{ij}$ is the absolute intensity of gene $i$ in sample $j$, and $\bar{x}_i$ and $s_i$ are the mean and standard deviation of the absolute intensities in row $i$. Thus, normalized intensities highlight the highest and lowest expression of each gene across all samples. Principal component analysis of all array data was performed in MATLAB (The Mathworks Inc.), and the metabolic network of D. mccartyi was visualized with Cytoscape (Smoot et al., 2011).
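The row-wise normalization and the on/off thresholding used for the heat maps can be sketched as follows. Whether the sample or population standard deviation was used is not stated, so the sample standard deviation is assumed here, and an intensity exactly equal to the threshold is treated as “off”, a case the text leaves unspecified.

```python
from statistics import mean, stdev


def row_zscores(row):
    """(x - row mean) / row standard deviation, using the sample SD (assumed)."""
    m, s = mean(row), stdev(row)
    return [(x - m) / s for x in row]


def on_off(row, threshold):
    """Heat-map state per sample; a value equal to the threshold counts as 'off'."""
    return ["on" if x > threshold else "off" for x in row]


z = row_zscores([1, 2, 3])              # [-1.0, 0.0, 1.0]
states = on_off([650, 1200, 800], 800)  # strain 195 cutoff: ['off', 'on', 'off']
```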
4.4. Results and discussion
4.4.1. Principal component analysis of strain 195 and KB-1 Dhc microarray data
Principal component analysis (PCA) is a useful statistical method for identifying underlying trends in a high-dimensional data set, such as microarray data, by reducing its dimensionality and extracting important information (Clark and Ma’ayan, 2011, Gehlenborg et al., 2010, Hotelling, 1933). PCA was performed on the strain 195 and KB-1 Dhc array data to analyze their dimensionality and variability (Figure 4.1). In total, data for 27 strain 195 samples under 9 conditions (Figure 4.1A) and 33 KB-1 Dhc samples under 7 conditions (Figure 4.1B) (Johnson et al., 2008, Johnson et al., 2009, Waller, 2010, Waller et al., 2012) were analyzed. Strain 195 samples (Figure 4.1A) were collected from parallel triplicate cultures at 5 time points during sequential dechlorination of trichloroethene (TCE), namely Early Exponential (EE), Late Exponential (LE), Transition (TR), Early Stationary (ES), and Late Stationary (LS), as well as from cultures grown at high and low vitamin B12 concentrations (HighB12 and LowB12) and in two different growth media with higher nutrient contents (ANASmedium and ANASspent) (Johnson et al., 2008, Johnson et al., 2009). ANAS is an enrichment culture of a D. mccartyi-containing methanogenic mixed microbial community (Richardson et al., 2002, West et al., 2008), and array experiments were conducted with strain 195 in the ANAS mineral medium (ANASmedium) as well as in the filter-sterilized supernatant of the ANAS culture (ANASspent) (Johnson et al., 2009). The PCA plot (Figure 4.1A) shows good agreement between triplicate samples for each condition, indicating that the biological replicates behaved consistently in the array experiments.
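To make the PCA step concrete, the sketch below finds the first principal component of a small samples-by-genes matrix by power iteration on the covariance matrix and reports the fraction of total variance it explains. It is a toy stand-in for the MATLAB analysis: it forms the full gene-by-gene covariance matrix, which is impractical for 1,560 genes, and it assumes the starting vector is not orthogonal to the leading eigenvector.

```python
def first_principal_component(samples, iters=200):
    """Return (loading vector, explained variance fraction, sample scores).

    samples: list of rows (one per array), each a list of gene intensities.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (p x p) of the genes.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the leading eigenvalue.
    eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, eigval / total_variance, scores


# Toy data: four "arrays" whose two "genes" vary together perfectly,
# so PC1 should explain essentially all of the variance.
loading, explained, scores = first_principal_component(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
```

Plotting the scores on the first two components for each array is what produces a PCA plot like Figure 4.1, where tightly grouped replicates indicate consistent biological replicates.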
The RNA samples used to interrogate the KB-1 Dhc arrays mainly compared two growth conditions, one with and one without a chlorinated electron acceptor (Waller, 2010, Waller et al., 2012); specifically, KB-1 cultures grown with trichloroethene and methanol (TCEM) were compared to cultures grown with methanol (M) only. Other conditions tested included cis-1,2-dichloroethene and methanol (cDCEM), vinyl chloride and methanol (VCM), and vinyl chloride and hydrogen (VCH). These samples were also compared with samples from cultures that were not amended with any substrate for 4 days (NA) or for 1 year (“Starved”) (Figure 4.1B).
Although methanol is supplied to the KB-1 community as the electron donor, it is fermented to H2, which is the direct electron donor for D. mccartyi strains in KB-1 (Duhamel and Edwards, 2006, Waller, 2010). RNA for the cDCEM and Starved conditions was arrayed only once, whereas multiple biological replicates were analyzed for the other conditions (TCEM: 3 samples, VCM: 10 samples, VCH: 2 samples, M: 11 samples, and NA: 5 samples). PCA showed high variability in the KB-1 Dhc array data (Figure 4.1B), which primarily stemmed from the type of array technology (shotgun “spotted” DNA arrays) and the experimental approach (samples collected at only one time point, 4 hours after substrate addition), as well as from the inherent variability of working with a mixed microbial culture.
Strain 195 arrays were short-oligonucleotide Affymetrix microarrays (Johnson et al., 2008, Johnson et al., 2009), whereas KB-1 Dhc arrays were shotgun “spotted” DNA microarrays whose probes were generated from PCR-amplified shotgun clones of mixed-culture DNA (Waller, 2010, Waller et al., 2012). Spotted arrays are, in general, more variable and noisier than oligonucleotide arrays because of the spotting procedure and the use of two fluorescent dyes (Cy3 and Cy5) in target preparation. These factors can also contribute to cross-hybridization of transcripts from different cell populations on the same array (Allison et al., 2006, Lee and Saeed, 2007, Schulze and Downward, 2001). Moreover, transcriptomic data for KB-1 Dhc were extracted from the shotgun metagenome microarray experiments of KB-1 (see Materials and methods for details), an anaerobic mixed microbial community that mainly includes dechlorinators, acetogens, methanogens, and fermenters (Duhamel and Edwards, 2006, Duhamel et al., 2004, Edwards and Cox, 1997, Waller, 2010). Hence, unlike in studies with pure cultures such as strain 195, growth of D. mccartyi strains in KB-1 is always influenced by interactions with other organisms in the consortium. These interactions are further complicated by the fact that KB-1 contains at least two D. mccartyi strains: one grows preferentially on TCE, and the other on cDCE and VC (Duhamel et al., 2004). Also, the KB-1 biological replicates were sampled from separate experiments conducted at different times, although always from transfers of the same parent culture. Thus, in addition to the array type and experimental design, the intricate and subtle interactions among microbes in the mixed community likely contributed to the high variability observed in the KB-1 Dhc array data (Figure 4.1B).
Figure 4.1. Principal component analysis (PCA) of the array data for strain 195 and KB-1 Dhc samples. (A) For pure-culture strain 195, the triplicate biological replicates of each experimental condition clustered together. All samples were used for subsequent data analysis. (B) D. mccartyi-specific array data for biological replicates of the KB-1 mixed culture showed variability owing to array type, experimental design, and complex interactions among organisms in the community. Subsequent data analyses were conducted with the expression values of all 33 biological replicates. “EE” = early exponential phase, “LE” = late exponential phase, “TR” = transition phase, “ES” = early stationary phase, “LS” = late stationary phase, “HighB12” = higher concentration of vitamin B12 in the medium, “LowB12” = lower concentration of vitamin B12 in the medium, “ANASspent” = medium amended with ANAS culture supernatant, “ANASmedium” = growth medium of ANAS cultures, “TCEM” = trichloroethene and methanol, “cDCEM” = cis-1,2-dichloroethene and methanol, “VCM” = vinyl chloride and methanol, “VCH” = vinyl chloride and hydrogen, “M” = methanol only, “NA” = not amended.
4.4.2. Improved identification and confirmation of D. mccartyi genes
Of the 1560 putative genes in the strain 195 genome, only 3 had been biochemically characterized: DET0079 (tceA) (Magnuson et al., 2000), DET0318 (pceA) (Magnuson et al., 1998), and DET1363 (mgsD) (Empadinhas et al., 2004). None of the 1162 putative genes of KB-1 Dhc had been biochemically characterized. Given the lack of biochemical evidence for most genes in the strain 195 and KB-1 Dhc genomes, available high-throughput data, such as proteomic (Lee et al., 2012, Morris et al., 2007, Tang et al., 2013) and transcriptomic (Johnson et al., 2008, Johnson et al., 2009, Waller et al., 2012) data, can be used to support the existence of these putative genes, if not their functions. Previous proteomic studies (Lee et al., 2012, Morris et al., 2007, Tang et al., 2013) identified only 718 strain 195 and 20 KB-1 Dhc genes (Appendix B: Tables S1 and S2 in Table B1). In contrast, the transcriptomic data analyzed in this study showed high expression (see Materials and methods for how the cut-off values defining “on” and “off” genes were chosen) of 925 strain 195 genes and 257 KB-1 Dhc genes in all samples. Only 229 strain 195 genes and 34 KB-1 Dhc genes were “off” (not transcribed) in every sample, and the remaining genes (406 of strain 195 and 871 of KB-1 Dhc) showed high expression in at least one sample (Appendix B: Tables S1 and S2 in Table B1). Thus, the majority (~60%) of strain 195 genes were transcribed in all samples, while the majority (~75%) of KB-1 Dhc genes showed high expression in at least one, but not all, samples. Overall, the existence of more strain 195 genes than KB-1 Dhc genes was supported by both proteomic and transcriptomic evidence. The proteomic and transcriptomic evidence for hypothetical proteins and metabolic genes is discussed further in the following sections.
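The three-way gene classification used above can be sketched as follows, assuming a genes × samples intensity matrix and a single global cut-off applied with a greater-than-or-equal test. The actual cut-off selection is described in Materials and methods; the function name, gene names, intensities, and threshold below are hypothetical.

```python
import numpy as np


def classify_genes(intensity, genes, cutoff):
    """Split genes into "on" in all samples, "on" in some, and "off" in all.

    intensity: array-like (n_genes, n_samples) of expression intensities.
    cutoff: intensity at or above which a gene is called "on" (hypothetical rule).
    """
    on = np.asarray(intensity, dtype=float) >= cutoff   # per-gene, per-sample calls
    on_all = on.all(axis=1)
    off_all = ~on.any(axis=1)
    on_some = ~on_all & ~off_all                        # on in >= 1 but not all samples

    def names(mask):
        return [g for g, keep in zip(genes, mask) if keep]

    return {"on_all": names(on_all), "on_some": names(on_some), "off_all": names(off_all)}


# Hypothetical intensities for three genes in three samples
genes = ["geneA", "geneB", "geneC"]
intensity = [[900, 1200, 1100],   # above the cut-off in every sample
             [150,  800,  120],   # above the cut-off in one sample
             [ 90,   60,   75]]   # never above the cut-off
calls = classify_genes(intensity, genes, cutoff=500)
print(calls)
```

Applied to strain 195, the three lists would hold 925, 406, and 229 genes, and 925 / 1560 gives the roughly 60% of genes transcribed in all samples.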
4.4.3. Confirmation of hypothetical proteins in strain 195 and KB-1 Dhc genomes
Hypothetical proteins, i.e., genes with unknown functions, constitute ~33% (523) of the strain 195 genome and ~22% (264) of the KB-1 Dhc genome, the latter being a draft genome. Analysis of the transcriptomic data (Figure 4.2) revealed high expression of 243 strain 195 (Appendix B: Table S3 in Table B1) and 56 KB-1 Dhc (Appendix B: Table S5 in Table B1) hypothetical proteins in all samples. Notably, 96 of the 243 strain 195 hypotheticals were also detected in previous proteomic studies (Figure 4.2A and Appendix B: Table S3 in Table B1), whereas none of the 56 KB-1 Dhc hypotheticals has proteomic evidence (Figure 4.2B and Appendix B: Table S5 in Table B1). The existence of these hypothetical proteins is thus supported by transcriptomic data, proteomic data, or both. However, the majority of hypothetical proteins in both genomes (280 of strain 195 and 208 of KB-1 Dhc) were not “on” in every sample; these included 14 strain 195 (Figure 4.2A and Appendix B: Table S4 in Table B1) and 2 KB-1 Dhc (Figure 4.2B and Appendix B: Table S6 in Table B1) hypotheticals detected in the proteomic studies. Of these, only 116 strain 195 (Appendix B: Table S4 in Table B1) and 6 KB-1 Dhc (Appendix B: Table S6 in Table B1) hypothetical proteins were “off” in all samples, and the remaining hypotheticals (164 of strain 195 and 202 of KB-1 Dhc) showed high expression in at least one sample; these can therefore probably be considered true, expressed hypothetical proteins.
Figure 4.2. Hypothetical proteins of (A) strain 195 and (B) KB-1 Dhc with proteomic and transcriptomic evidence. Hypothetical proteins that are highly expressed (“on”) in all samples are shown in blue, and those that are not “on” in every sample are shown in orange. Solid blue and orange indicate hypothetical proteins with both proteomic and transcriptomic evidence, whereas grid-patterned blue and orange indicate those with only transcriptomic evidence.
4.4.4. Confirmation of metabolic genes in strain 195 and KB-1 Dhc genomes
Metabolic genes in the transcriptomic data were identified by mapping them to the manually curated pan-genome-scale metabolic model of D. mccartyi (Ahsanul Islam et al., 2010) (see Materials and methods, and Appendix B: Figures B1 and B2). As expected, more metabolic genes were identified for strain 195 (467) than for the composite genome of KB-1 Dhc (429) (Appendix B: Tables S7 and S8 in Table B1), because the latter is a draft genome. Of the 467 putative metabolic genes of strain 195, 314 were highly expressed (“on”) in all samples, 93 were “on” in at least one, but not all, samples, and 60 were “off” (not transcribed) in all samples. In addition, most (305) of these metabolic genes (Appendix B: Table S7 in Table B1) were detected in previous proteomic studies (Lee et al., 2012, Morris et al., 2007). Thus, the presence of at least 412 metabolic genes in the strain 195 genome is supported by proteomic evidence, transcriptomic evidence, or both. In contrast, only 10 of the 429 metabolic genes of KB-1 Dhc were identified in proteomic studies (Morris et al., 2007, Tang et al., 2013) (Appendix B: Table S8 in Table B1); however, 101 of these genes were “on” in all samples, 317 were “on” in at least one, but not all, samples, and only 11 putative metabolic genes were “off” in all 33 KB-1 Dhc samples (Appendix B: Table S8 in Table B1). Thus, the presence of 418 KB-1 Dhc metabolic genes, including 10 with proteomic evidence, is supported by transcriptomic data. Most importantly, transcriptomic data for the hypothetical proteins reannotated during the metabolic modeling study (Ahsanul Islam et al., 2010) showed high expression of 13 strain 195 (Figure 4.3) and 11 KB-1 Dhc (Figure 4.4) hypothetical proteins in at least one sample. Because their presence is supported by proteomic evidence, transcriptomic evidence, or both, these hypothetical proteins are good candidates for future biochemical experiments, and their proposed functions can serve as valuable hypotheses to be tested experimentally.
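The “supported by proteomic evidence, transcriptomic evidence, or both” counts amount to a set union. The sketch below uses hypothetical gene identifiers rather than the actual strain 195 or KB-1 Dhc lists, and the function name is illustrative.

```python
def evidence_summary(on_all, on_some, proteomic):
    """Count genes supported by transcriptomic evidence, proteomic evidence, or both.

    on_all / on_some: genes "on" in every sample / in at least one sample.
    proteomic: genes whose products were detected in proteomic studies.
    """
    transcribed = set(on_all) | set(on_some)
    supported = transcribed | set(proteomic)
    return {
        "transcribed": len(transcribed),
        "proteomic_only": len(set(proteomic) - transcribed),
        "supported": len(supported),
    }


# Hypothetical identifiers, not actual D. mccartyi locus tags
summary = evidence_summary(on_all={"g1", "g2"}, on_some={"g3"}, proteomic={"g2", "g4"})
print(summary)
```

For strain 195, the 314 + 93 = 407 transcribed metabolic genes account for all but 5 of the 412 supported genes; under this union logic, those 5 would be “off” genes with proteomic evidence.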
Figure 4.3. Proteomic and transcriptomic evidence for the hypothetical proteins of strain 195 reannotated in the D. mccartyi metabolic model. Transcriptomic evidence for the reannotated hypothetical proteins is presented as heat maps, while proteomic evidence was obtained from the literature (Lee et al., 2012, Morris et al., 2007). The table also shows the proposed functions of the hypothetical proteins and the metabolic pathways to which they were assigned in the metabolic model.
Further analysis of the transcriptomic data for metabolic genes identified more rdhA genes, which are involved in the energy-conserving reductive dechlorination reaction, in KB-1 Dhc (20 rdhAs) than in strain 195 (17 rdhAs) (Figure 4.5, and Appendix B: Tables S9 and S10 in Table B1); 7 of the KB-1 Dhc rdhAs were homologous to strain 195 rdhA genes (Figure 4.5A). The KB-1 rdhA genes included homologs of the characterized pceA (Magnuson et al., 1998) and vcrA (Hug, 2012, Waller, 2010, Müller et al., 2004) genes; however, the KB-1 Dhc shotgun arrays contained no probes for homologs of other characterized rdhAs such as bvcA (Krajmalnik-Brown et al., 2004) and tceA (Magnuson et al., 2000). A recent proteomic study of KB-1 (Tang et al., 2013) identified the products of 5 rdhAs: vcrA (KB1_1502), bvcA (KB1_6), tceA (KB1_1037), RdhA5 (KB1_0072), and RdhA1 (KB1_0054). In total, 6 of the 20 KB-1 rdhAs were highly expressed in all samples, while only one rdhA gene (KB1_1570) was “off” in all samples (Figures 4.5A and 4.5B). Most importantly, 12 KB-1 rdhAs were transcribed even in the Starved condition (Figures 4.5A and 4.5B), indicating that these genes were not strictly regulated by the presence of chlorinated substrates. This interpretation is further supported by the rdhA expression profiles (Figures 4.5A and 4.5B), which show no major difference between samples with and without chlorinated solvents; all M and NA samples showed very similar expression patterns.
Among the 17 strain 195 rdhA genes, only 2 (DET1559 and DET0079, tceA) were highly transcribed (“on”) in all samples (Figures 4.5A and 4.5B). The tceA gene was transcribed because TCE was the electron acceptor in all samples, whereas the expression of DET1559 appeared to be constitutive, as noted previously (Adrian et al., 2007b, Morris et al., 2007, Morris et al., 2006). In addition, consistent with previous studies (Rahm and Richardson, 2008a, Rahm and Richardson, 2008b), DET1545 was highly transcribed even in the stationary phase, when the substrate concentration was low (Figure 4.5A).
Figure 4.4. Proteomic and transcriptomic evidence for the hypothetical proteins of KB-1 Dhc reannotated in the D. mccartyi metabolic model. Transcriptomic evidence for the reannotated hypothetical proteins is presented as heat maps, while proteomic evidence was obtained from the literature (Lee et al., 2012, Morris et al., 2007). The table also shows the proposed functions of the hypothetical proteins and the metabolic pathways to which they were assigned in the metabolic model.
Figure 4.5. Expression of reductive dehalogenase homologous (rdhA) genes. Absolute intensities of (A) homologous and (B) non-homologous rdhA genes of strain 195 and KB-1 Dhc are shown as heat maps. In the strain 195 data, the characterized genes tceA and pceA (Magnuson et al., 2000, Magnuson et al., 1998) and DET1559 were highly expressed, as previously reported (Rahm and Richardson, 2008a, Rahm and Richardson, 2008b). DET1545 and its KB-1 Dhc homolog, KB1_0072, were expressed at their highest levels in late stationary or unamended conditions (this is clearer from the absolute intensity values provided in Appendix B: Tables S9 and S10 in Table B1). For KB-1 Dhc rdhA genes, identifiers used in other studies (Morris et al., 2006, Rahm and Richardson, 2008a, Waller, 2010) are given in parentheses for cross-referencing. Although vcrA and pceA homologs were found, the KB-1 Dhc shotgun arrays contained no probes for bvcA or tceA homologs. Note that 12 of the 20 KB-1 Dhc rdhAs were “on” even in the “Starved” condition.
We further visualized the expression of all metabolic genes by overlaying both data sets on the D. mccartyi metabolic network (Appendix B: Figure B5). The reconstructed network (genes, reactions, and metabolites) was arranged with the organic layout algorithm (http://docs.yworks.com/yfiles/doc/developers-guide/smart_organic_layouter.html) in Cytoscape (Smoot et al., 2011), and genes and reactions were colored according to their functional categories in the D. mccartyi model (Ahsanul Islam et al., 2010) (Appendix B: Figure B5A) to obtain a clearer topological view of the network. Genes and reactions in the energy metabolism category formed three distinct clusters in the network (orange nodes), underscoring their prominence in D. mccartyi metabolism. Comparison of the absolute intensities of metabolic genes alone revealed that the highest numbers of highly transcribed (“on”) genes occurred in the “ANASspent” and “TCEM” samples (Appendix B: Figures B5B and B5D), while the lowest numbers occurred in the “LS” and “Starved” conditions (Appendix B: Figures B5C and B5E) for strain 195 and KB-1 Dhc, respectively. Interestingly, the high expression of most metabolic genes (358 or 77% in strain 195 and 209 or 61% in KB-1 Dhc), including 6 and 12 rdhA genes (Figures 4.5A and 4.5B), in the “LS” and “Starved” conditions (Appendix B: Tables S11 and S12 in Table B1) suggests that D. mccartyi metabolism remains active even when the organisms are not growing. This constitutive expression under non-growth conditions further suggests that many D. mccartyi metabolic genes may be essential or housekeeping genes.
4.4.5. Clustering of microarray data and operon predictions
In addition to confirming the existence of sequenced genes in the strain 195 and KB-1 Dhc genomes, we analyzed both transcriptomic data sets with the quality threshold (QT) clustering algorithm (Heyer et al., 1999) to identify clusters of co-expressed or co-transcribed genes (Hanson et al., 2009). QT clustering is an unsupervised algorithm that, in addition to finding co-expressed gene clusters, guarantees the quality of the clusters it forms by applying quality thresholds such as a maximum cluster diameter and a minimum cluster size (Heyer et al., 1999). Using very stringent QT clustering cut-offs (see Materials and methods), we obtained 30 QT clusters of 7–31 genes for strain 195 and 26 QT clusters of 7–35 genes for KB-1 Dhc (Appendix B: Tables S13 and S14 in Table B1). In the previously developed metabolic model (Ahsanul Islam et al., 2010), D. mccartyi genes were assigned to 7 model subsystems (i.e., functional categories) based on their involvement in different metabolic pathways. We used these functional classifications to identify the metabolic genes in each QT cluster (see Materials and methods, and Appendix B: Figures B1 and B2). Furthermore, hypothetical proteins and genes without specific annotations or predicted functions were categorized as “unknown function”, while genes involved in regulation, DNA repair, replication, and recombination were classified as “non-metabolic function”. Functional enrichment analysis (Huang et al., 2009) was then performed for all QT clusters (Figure 4.6), with enrichment p-values calculated from the hypergeometric distribution (Huang et al., 2009). Enrichment analysis tests whether genes from a particular functional category, such as genes with similar metabolic functions or genes in the same metabolic pathway, occur in a cluster more often than would be expected by chance given their frequency in the full gene list (Huang et al., 2009, Mahadevan et al., 2008, Tavazoie et al., 1999). We obtained 13 and 11 functionally enriched, i.e., overrepresented, clusters (p < 0.05) for strain 195 and KB-1 Dhc, respectively (Figures 4.6A and 4.6B).
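The QT procedure referred to above can be sketched as follows, loosely after Heyer et al. (1999): for every candidate seed gene, genes are added greedily while the cluster diameter stays within the threshold, the largest candidate cluster is kept and its genes are removed, and the process repeats until no candidate reaches the minimum size. The distance measure (1 − Pearson correlation), the function name `qt_cluster`, and the toy profiles are assumptions for illustration, not the stringent cut-offs or similarity measure described in Materials and methods.

```python
import numpy as np


def qt_cluster(profiles, max_diameter, min_size):
    """Quality threshold (QT) clustering sketch, loosely after Heyer et al. (1999).

    profiles: array (n_genes, n_samples); rows must not be constant.
    Distance is 1 - Pearson correlation (an assumption of this sketch).
    Returns a list of clusters, each a sorted list of row indices.
    """
    X = np.asarray(profiles, dtype=float)
    dist = 1.0 - np.corrcoef(X)
    remaining = set(range(len(X)))
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            candidate = [seed]
            pool = remaining - {seed}
            while pool:
                # New diameter = max(current diameter, largest distance from g to members)
                growth = {g: dist[g, candidate].max() for g in pool}
                g_min = min(growth, key=growth.get)
                if growth[g_min] > max_diameter:
                    break  # any further addition would exceed the quality threshold
                candidate.append(g_min)
                pool.remove(g_min)
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break  # no remaining cluster of acceptable size
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters


# Hypothetical profiles: two co-expressed groups plus one unrelated gene
profiles = [
    [1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5],   # rising pattern
    [4, 3, 2, 1], [8, 6, 4, 2], [5, 3, 2, 1],   # falling pattern
    [1, 4, 1, 4],                               # oscillating, unrelated
]
clusters = qt_cluster(profiles, max_diameter=0.1, min_size=2)
print(clusters)
```

The unrelated gene is left unclustered because no cluster containing it satisfies both thresholds, which is the quality guarantee that distinguishes QT from methods that assign every gene to a cluster.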
We further predicted the operon structures of the strain 195 genome and the composite genome of KB-1 Dhc with a published operon prediction algorithm (Bergman et al., 2007). This algorithm was chosen because of its improved prediction capability for newly sequenced genomes and its ease of implementation, as it does not require any experimental data (Bergman et al., 2007). Because operons are sets of co-transcribed genes expressed as a single mRNA (Jacob and Monod, 1961), they often encode proteins with related metabolic or regulatory functions; hence, operon information, together with co-expressed gene clusters, can be used to infer functions for hypothetical proteins and proteins of unknown function (Aravind, 2000, Hanson et al., 2009, Overbeek et al., 1999). Of the 1589 genes in the strain 195 genome and the 1614 genes in the contigs from D. mccartyi strains in KB-1, 1251 (79%) and 984 (61%), respectively, were predicted to be part of an operon (i.e., operonic), comprising 348 and 318 multigene operon pairs (Appendix B: Table S15 in Table B1). Because of the low proportion (61%) of predicted operonic genes for KB-1 Dhc, we tested the algorithm on two other publicly available, complete D. mccartyi genomes, strains CBDB1 and GT, which share high nucleotide similarity and gene synteny with KB-1 Dhc (Hug et al., 2011). In strain CBDB1, 79% (1150 of 1457) of genes are operonic, forming 333 multigene operon pairs, while strain GT has 295 such operon pairs comprising 78% (1119 of 1432) of its genes (Appendix B: Table S15 in Table B1). Our operon predictions for strains 195 and CBDB1 (79% each) are comparable to the publicly available predictions for these genomes (71% and 76%, respectively) in the DOOR database (Mao et al., 2009) (Appendix B: Table S15 in Table B1). The operon prediction for the composite genome of D. mccartyi strains in KB-1 was lower because only a draft genome assembled from the KB-1 metagenome is available, and contig breaks can disrupt operons.
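The algorithm of Bergman et al. (2007) is not reproduced here. As a deliberately simplified stand-in, the sketch below joins adjacent genes on the same strand when their intergenic distance is at or below a hypothetical cut-off, and then computes the fraction of genes in multigene operons, the statistic reported above. The 50 bp cut-off, the coordinates, and the function names are illustrative assumptions.

```python
def predict_operons(genes, max_gap=50):
    """Group genes into putative operons with a simple distance heuristic.

    genes: non-empty list of (name, start, end, strand) sorted by start on one contig.
    Adjacent same-strand genes separated by <= max_gap bp are joined (assumption;
    this is NOT the Bergman et al. (2007) method).
    """
    operons = []
    current = [genes[0]]
    for prev, gene in zip(genes, genes[1:]):
        gap = gene[1] - prev[2] - 1   # bases between genes; negative means overlap
        if gene[3] == prev[3] and gap <= max_gap:
            current.append(gene)
        else:
            operons.append([g[0] for g in current])
            current = [gene]
    operons.append([g[0] for g in current])
    return operons


def operonic_fraction(operons):
    """Fraction of genes that belong to multigene operons."""
    total = sum(len(op) for op in operons)
    operonic = sum(len(op) for op in operons if len(op) > 1)
    return operonic / total


# Hypothetical coordinates for five genes on one contig
genes = [("g1", 100, 400, "+"), ("g2", 420, 900, "+"), ("g3", 930, 1200, "+"),
         ("g4", 1500, 1800, "+"), ("g5", 1820, 2100, "-")]
ops = predict_operons(genes)
print(ops, operonic_fraction(ops))
```

Because the heuristic only joins genes within one sorted coordinate list, running it separately on each contig of a fragmented draft assembly splits any operon that spans a contig break, which illustrates how a draft genome can lower the operonic fraction.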
Figure 4.6. Functional enrichment analysis of QT clusters for (A) strain 195 and (B) KB-1 Dhc array data. Genes in each QT cluster were categorized according to the subsystems (functional categories) of the D. mccartyi metabolic model. Enrichment p-values were then calculated from the hypergeometric distribution for each QT cluster to identify clusters enriched with genes from a particular subsystem. This analysis identified 13 and 11 clusters of co-expressed genes for strain 195 and KB-1 Dhc, respectively, that were significantly overrepresented by genes from specific functional categories. Functionally enriched clusters are shaded in red (p ≤ 0.05), black (No gene) indicates that the cluster contains no gene from the corresponding subsystem, and green indicates non-significant p-values (p > 0.05).
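The enrichment p-value described in this caption can be sketched as the upper tail of the hypergeometric distribution, P(X ≥ k): the probability of drawing at least k genes of a category into a cluster of n genes from a background of N genes that contains K genes of that category. This is one common formulation; the function name and the numbers in the example are hypothetical, and the exact background gene list used in this study is not specified here.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: genes in the background list; K: background genes in the category;
    n: genes in the cluster; k: cluster genes in the category.
    """
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ... category genes in the cluster
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper / total


# Hypothetical example: 3 of 4 cluster genes fall in a category
# that covers 5 of 20 background genes
p = enrichment_pvalue(k=3, n=4, K=5, N=20)
print(round(p, 4))
```

Here the cluster would count as enriched at the p ≤ 0.05 level, since drawing three or more category genes into a four-gene cluster is unlikely when only a quarter of the background belongs to that category.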
4.4.6. Functionally enriched QT clusters
Functional enrichment analysis (Mahadevan et al., 2008, Tavazoie et al., 1999, Huang et al., 2009) of the QT clusters of strain 195 and KB-1 Dhc was performed to gain better insight into the contents of each co-expressed cluster. Although every QT cluster contains useful information, a functionally enriched cluster indicates that the overrepresentation of genes from a particular functional category is statistically significant, so that many genes in the cluster may have related functions or participate in related metabolic pathways (see Appendix B: Tables S13 and S14 in Table B1 for a list of all QT clusters and genes). Such clusters are therefore useful for predicting and analyzing the functions of the hypothetical proteins they contain. Some of the 13 and 11 functionally enriched QT clusters of strain 195 and KB-1 Dhc are enriched for more than one functional category (Figures 4.6A and 4.6B). Such multiple enrichment suggests that genes in the enriched categories are probably functionally related or regulated by common regulators. Because D. mccartyi are organohalide-respiring microbes, QT clusters enriched for energy metabolism genes, such as hydrogenases, reductive dehalogenases, and proton-translocating NADH dehydrogenases, are particularly important. QT clusters enriched for genes from multiple functional categories, including genes of unknown function, are also of interest, as the co-expressed metabolic genes may help annotate the hypothetical proteins. Two such functionally enriched QT clusters (Figure 4.7) are therefore analyzed further in the following sections and summarized in Appendix B: Table S16 in Table B1.
4.4.7. Analysis of strain 195 QT cluster 2
Cluster 2 of strain 195 comprises 25 genes and is overrepresented by genes from the central carbon metabolism, nucleotide metabolism, and unknown function categories (Figure 4.6A). The absolute gene-expression profile (Figure 4.7A) shows that genes in this cluster have similar expression patterns, with higher expression in the “HighB12” and “ANASspent” conditions. The relative gene-expression profile (Figure 4.7B), however, indicates that the genes were most highly transcribed in the “ANASspent” condition and least transcribed in the “LS” condition. Because genes in this cluster are mostly growth-related, as suggested by the enrichment of central carbon and nucleotide metabolism genes, higher expression likely indicates faster growth of strain 195. Moreover, the growth medium amended with the filter-sterilized supernatant of ANAS cultures (ANASspent) probably had the highest nutrient content of all conditions (Richardson et al., 2002, Johnson et al., 2009); hence, the higher transcription of genes in the “ANASspent” condition (Figure 4.7B) was likely due to favorable growth of strain 195. The low substrate and nutrient concentrations in the “LS” condition caused slow growth of strain 195 (Johnson et al., 2008) and were possibly responsible for the lowest gene transcription (Figure 4.7B). In the metabolic modeling study (Ahsanul Islam et al., 2010), the central metabolic genes of this cluster (DET0509 and DET0742) were proposed to be involved in glycolysis/gluconeogenesis and sugar metabolism, producing precursors for cell membrane biogenesis (Kanehisa et al., 2011, Markowitz et al., 2009, Nelson and Cox, 2006). DET0509 (hypothetical protein) was annotated as a putative bifunctional phosphoglucose isomerase (EC: 5.3.1.9)/phosphomannose isomerase (EC: 5.3.1.8) during extensive curation of the D. mccartyi metabolic model (Ahsanul Islam et al., 2010) (Table 4.1 and Appendix B: Table S16 in Table B1).
Its inclusion in a cluster enriched for central carbon metabolism genes thus further supports this annotation. Similarly, two other operonic hypothetical proteins in this cluster, DET0591 and DET0592 (Figure 4.7B and Table 4.1), are probably involved in sugar or carbohydrate metabolism, because they clustered close to the central metabolic genes (DET0509 and DET0742) during hierarchical clustering (Figure 4.7B). Moreover, two other genes of this operon (Markowitz et al., 2009), DET0590 (glyceraldehyde-3-phosphate dehydrogenase) and DET0593 (enolase), are also involved in sugar metabolism (Kanehisa et al., 2011). In fact, DET0592 is 58% identical at the amino acid level to the biochemically characterized maltose-6-phosphate glucosidase (EC: 3.2.1.122) of Fusobacterium mortiferum (Thompson et al., 1995) in SWISSPROT (Boeckmann et al., 2003) and PDB (Berman et al., 2000); it was therefore annotated as a putative maltose-6-phosphate glucosidase involved in carbohydrate metabolism (Kanehisa et al., 2011) (Table 4.1 and Appendix B: Table S16 in Table B1).
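Identity values such as the 58% quoted above depend on how positions are counted. A minimal sketch of one common convention (identical, non-gap columns divided by the total alignment length, similar to the identity fraction reported by BLAST) is shown below; it assumes a pairwise alignment has already been computed, and the two aligned fragments and the function name are invented for illustration.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity of two pre-aligned sequences, with '-' marking gaps.

    Identical non-gap columns divided by the total alignment length (one convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


# Invented aligned fragments, not DET0592 or the F. mortiferum enzyme
pid = percent_identity("MKTAYIAK-QR", "MKSAYLAKEQR")
print(round(pid, 1))
```

Other conventions divide by the length of the shorter sequence or exclude gap columns, so identity values are only comparable when computed the same way.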
The cluster also includes three putative lipid metabolism genes from the same operon: DET0369, DET0371, and DET0372 (Table 4.1). DET0369 (EC: 1.17.7.1) and DET0371 (EC: 1.1.1.267) are involved in isoprenoid biosynthesis via the non-mevalonate pathway (Brammer et al., 2011, Kanehisa et al., 2011, Kemp et al., 2002, Ramsden et al., 2009), while DET0372 (phosphatidate cytidylyltransferase, EC: 2.7.7.41) takes part in the metabolism of glycerophospholipids (Kanehisa et al., 2011, Markowitz et al., 2009), the main structural components of biological cell membranes (Nelson and Cox, 2006) (Figure 4.7B and Table 4.1). Two operonic transporter genes (DET0417 and DET0418) were proposed to be putative L-glutamine transporters in the previous modeling study (Ahsanul Islam et al., 2010); however, the close clustering of DET0418 with DET0518 (Figure 4.7B) suggests that both DET0417 and DET0418 are probably methionine transporters. This is because DET0518 was annotated in the modeling study (Ahsanul Islam et al., 2010) as a putative methylthioribose-1-phosphate isomerase (EC: 5.3.1.23) involved in methionine metabolism (Kanehisa et al., 2011, Markowitz et al., 2009). Intriguingly, the close hierarchical clustering of the putative methionine transporter DET0418 with a glycerophospholipid metabolism gene (DET0372) (Figure 4.7B and Table 4.1) suggests a potential relationship between amino acid transport and lipid metabolism. Indeed, a recent isotope-labelling study (Zhuang et al., 2011) showed that strain 195 incorporated methionine from the external medium during growth and dechlorination. Thus, QT clustering of transcriptomic data, together with functional enrichment analysis and operon predictions, helped annotate hypothetical proteins and propose new annotations for previously annotated genes of strain 195.
Figure 4.7. Analysis of two functionally enriched strain 195 QT clusters. Two functionally enriched QT clusters of interest (clusters 2 and 6) from the strain 195 transcriptomic data were further analyzed by hierarchical clustering, as represented by the dendrograms in (B) and (D). Absolute gene expression intensities of the clusters are plotted in (A) and (C), while relative (normalized) gene expression intensities (Materials and methods) are presented as heat maps in (B) and (D). The height of the dendrograms represents the similarity of gene expression patterns, measured by Spearman’s rank correlation coefficient (SCC). Gene names in green or orange denote members of an operon; orange further indicates that multiple genes from the same operon are present in the cluster.
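A minimal sketch of the hierarchical clustering step is given below. Spearman’s rank correlation is computed as the Pearson correlation of ranks (ties receive average ranks), converted to a distance of 1 − ρ, and genes are merged by average linkage (UPGMA). The distance transform, the linkage method, the function names, and the toy profiles are assumptions; the caption specifies only that SCC was used.

```python
import numpy as np


def rank(values):
    """Ranks starting at 1; tied values receive their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_distance(profiles):
    """Pairwise 1 - Spearman correlation between rows (genes)."""
    R = np.array([rank(row) for row in profiles])
    return 1.0 - np.corrcoef(R)   # Spearman rho = Pearson correlation of ranks


def average_linkage(dist):
    """Naive average-linkage (UPGMA); returns (members_a, members_b, height) merges."""
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        best = None
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Mean of all pairwise distances between members of the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical relative expression of three genes across five conditions
profiles = [[1, 5, 2, 8, 3],    # gene 0
            [2, 6, 3, 9, 4],    # gene 1: same ordering as gene 0
            [9, 1, 7, 2, 6]]    # gene 2: roughly opposite ordering
merges = average_linkage(spearman_distance(profiles))
print(merges)
```

Because SCC depends only on ranks, genes 0 and 1 merge at height zero even though their absolute values differ, which is why rank correlation suits comparing expression patterns rather than intensities.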
4.4.8. Analysis of strain 195 QT cluster 6
Another important QT cluster of strain 195, overrepresented by energy metabolism genes (Figure 4.6A), is cluster 6, which comprises 15 genes. The absolute (Figure 4.7C) and relative (Figure 4.7D) gene expression profiles of this cluster showed high transcription in the “LS” condition and low transcription in the “ANASspent” condition, the opposite of the pattern in QT cluster 2 described above. This contrast suggests that strain 195 needs to generate energy by reductive dechlorination to maintain cellular integrity (Pirt, 1965, Pirt, 1982, Russell and Cook, 1995) even when the cells are not growing, as in the “LS” condition. It also supports the notion of growth-decoupled reductive dechlorination by strain 195 (Maymó-Gatell et al., 1997, Seshadri et al., 2005). Genes in this cluster are mainly involved in energy metabolism, specifically in the respiratory chain of strain 195, and include 2 rdhA genes (DET0318, pceA, and DET1559) and 2 rdhB genes (DET0319 and DET1558) (Table 4.1 and Figure 4.7D). Interestingly, DET0318, the biochemically characterized tetrachloroethene (PCE) rdhA gene pceA (Magnuson et al., 1998), was not transcribed in the “ANASspent” and “ANASmedium” conditions, although it was the most highly transcribed gene during growth of strain 195 in its own medium (Figure 4.7C). ANAS cultures have not been reported to degrade PCE (Lee et al., 2006, Richardson et al., 2002), and the ANAS supernatant and growth medium might contain components that inhibited pceA expression.
The cluster also contains a putative flavodoxin gene (DET1501) that is 33% identical at the amino acid level to the biochemically characterized flavodoxin of Desulfovibrio vulgaris strain Hildenborough (Curley and Voordouw, 1988) in SWISSPROT and PDB (Table 4.1 and Figure 4.7D). Flavodoxins are small electron transfer proteins containing a single flavin mononucleotide (FMN) cofactor that usually participate in low-potential redox reactions (Biel et al., 1996, Sancho, 2006). The presence of a putative flavodoxin (DET1501) together with rdh genes in a co-expressed cluster enriched for energy metabolism genes thus indicates its potential involvement in the reductive dechlorination process and in D. mccartyi respiration (Figure 4.7D, Table 4.1 and Appendix B: Table S16 in Table B1). This hypothesis is consistent with the requirement for a low-potential electron donor to sustain reductive dechlorination (Holliger et al., 1998b, Hölscher et al., 2003). Recently, a flavin-based “electron bifurcation” mechanism has been reported for anaerobic microorganisms (Herrmann et al., 2008, Thauer et al., 2008), in which an endergonic reaction is driven by the energy released by a simultaneously occurring exergonic reaction. The mechanism of the D. mccartyi electron transport chain (ETC) is still unknown; however, the probable involvement of a flavodoxin together with reductive dehalogenases in the ETC raises the possibility of electron bifurcation during reductive dechlorination. In addition, the cluster includes DET0320 and DET1500, two putative transcriptional regulators identified by homology (46% sequence identity with E. coli K12) in the SWISSPROT, IMG, PDB, and EBI InterProScan (Quevillon et al., 2005) databases, suggesting their likely involvement in regulating energy conservation and reductive dehalogenation, as proposed previously (Kube et al., 2005, Seshadri et al., 2005) (Table 4.1 and Appendix B: Table S16 in Table B1). Similar clustering of energy metabolism genes was also observed in the KB-1 Dhc transcriptomic data (Appendix B: Table S16 in Table B1).
Table 4.1. Strain 195 genes identified in functionally enriched clusters and associated inferred annotations
Locus Tag | Operon ID | Cluster No | Model Gene No | Primary Annotation | Revised Annotation in the Model | Suggested New Annotation from This Study | Subsystem
DET0509 | 106 | 2 | 113 | hypothetical protein | putative bifunctional phosphoglucose/phosphomannose isomerase | Retain previous annotation | Central Carbon Metabolism
DET0742 | 160 | 2 | 192 | triosephosphate isomerase | Retain previous annotation | Retain previous annotation | Central Carbon Metabolism
DET0369 | 84 | 2 | 57 | 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase | Retain previous annotation | Retain previous annotation | Lipid Metabolism
DET0371 | 84 | 2 | 58 | 1-deoxy-D-xylulose 5-phosphate reductoisomerase | Retain previous annotation | Retain previous annotation | Lipid Metabolism
DET0372 | 84 | 2 | 59 | phosphatidate cytidylyltransferase | Retain previous annotation | Retain previous annotation | Lipid Metabolism
DET0417 | 91 | 2 | 79 | amino acid ABC transporter; ATP-binding protein | putative glutamine transporter | putative methionine transporter | Transport
DET0418 | 91 | 2 | 80 | amino acid ABC transporter; permease protein | putative glutamine transporter | putative methionine transporter | Transport
DET0518 | 108 | 2 | 118 | translation initiation factor, putative | methylthioribose-1-phosphate isomerase | Retain previous annotation | Amino Acid Metabolism
DET0591 | 125 | 2 | – | hypothetical protein | hypothetical protein | putative carbohydrate esterase | Central Carbon Metabolism
DET0592 | 125 | 2 | – | hypothetical protein | hypothetical protein | […]phosphate glucosidase | Central Carbon Metabolism
DET0318 | 71 | 6 | 19 | reductive dehalogenase, putative | tetrachloroethene reductive dehalogenase | Retain previous annotation | Energy Metabolism
DET0319 | 71 | 6 | 446 | reductive dehalogenase anchoring protein, putative | tetrachloroethene reductive dehalogenase anchoring protein | Retain previous annotation | Energy Metabolism
DET0320 | 71 | 6 | – | hypothetical protein | hypothetical protein | putative transcriptional regulator/activator | Non-metabolic
DET1558 | 326 | 8 | 523 | reductive dehalogenase anchoring protein, putative | Retain previous annotation | Retain previous annotation | Energy Metabolism
DET1559 | 326 | 8 | 425 | reductive dehalogenase, putative | Retain previous annotation | Retain previous annotation | Energy Metabolism
DET1500 | 310 | 8 | – | hypothetical protein | hypothetical protein | putative transcriptional regulator/activator | Non-metabolic
DET1501 | 310 | 8 | – | flavodoxin | flavodoxin | Retain previous annotation | Energy Metabolism
Although gene expression microarrays are genome-wide, high-throughput experiments that catalogue the global transcriptional changes of an organism, they cannot directly reveal the activity of genes and enzymes or their involvement in specific metabolic processes. Expression data alone, therefore, cannot resolve the activity of metabolic genes or the metabolism of an organism. However, when transcriptomic data are analyzed together with detailed metabolic information, such as the pan-genome-scale metabolic model used in this study, they can provide useful insight into the function of metabolic genes, as well as hypothetical proteins. Such integrated analysis can also shed light on poorly understood physiological processes of difficult-to-culture organisms like D. mccartyi. That said, the transcriptomic experiments and data analyzed in this study were not designed specifically to capture changes in the expression of metabolic genes; for instance, D. mccartyi were simply either growing or not growing in each experimental condition, and no specific metabolic perturbation, such as the omission of an essential nutrient or vitamin, was imposed on them during growth. Moreover, because array designs and data sources varied, absolute expression intensities of the array data, rather than differential expression, were used in our analysis. Hence, future microarray experiments designed to perturb and catalogue metabolic changes in D. mccartyi will be useful for advancing our fundamental understanding of the physiology and metabolism of these environmentally important and difficult-to-culture microbes.
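Because absolute intensities rather than fold changes were clustered, a gene's expression profile here is simply its intensity across samples. The clustering parameters are not given in this section; the sketch below illustrates quality threshold (QT) clustering on such profiles, assuming 1 minus the Pearson correlation as the distance and a user-chosen maximum cluster diameter. The names `qt_cluster`, `max_diameter`, and `min_size` are hypothetical, and the distance measure and thresholds actually used in this study may differ.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - sxy / sqrt(sxx * syy)


def qt_cluster(profiles, max_diameter, min_size=2):
    """Quality threshold clustering of gene expression profiles.

    profiles: dict mapping gene ID -> list of absolute intensities, one per sample
    max_diameter: largest pairwise distance allowed within a cluster
    min_size: stop once the best candidate cluster is smaller than this
    """
    remaining = set(profiles)
    clusters = []
    while len(remaining) >= min_size:
        best = []
        for seed in sorted(remaining):
            candidate = [seed]
            # For each unassigned gene: its largest distance to any candidate member
            far = {g: pearson_distance(profiles[g], profiles[seed])
                   for g in remaining if g != seed}
            while far:
                # Add the gene that keeps the cluster diameter smallest
                g = min(far, key=lambda h: (far[h], h))
                if far[g] > max_diameter:
                    break
                candidate.append(g)
                del far[g]
                for h in far:
                    far[h] = max(far[h], pearson_distance(profiles[h], profiles[g]))
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break
        clusters.append(sorted(best))  # keep the largest candidate cluster
        remaining -= set(best)
    return clusters


# Hypothetical intensity profiles across four samples
example = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1],
           "D": [8, 6, 4, 2], "E": [1, 3, 2, 4]}
print(qt_cluster(example, max_diameter=0.5))  # prints [['A', 'B', 'E'], ['C', 'D']]
```

QT clustering does not require the number of clusters in advance; instead, the diameter threshold controls how tightly co-expressed the members of each cluster must be, and genes that fit no sufficiently large cluster are left unassigned.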
4.5. Conclusions
Due to the lack of a genetic system and the associated challenges of growing pure isolates of D. mccartyi in defined mineral media, detailed biochemical studies of their physiology and metabolism are limited. This study analyzed and visualized curated transcriptomic data from various experiments for strain 195 and for the D. mccartyi strains in KB-1 (KB-1 Dhc), leveraging our previously developed D. mccartyi metabolic model. Using available transcriptomic and proteomic data from previous studies, we confirmed the expression of the majority of hypothetical proteins and metabolic genes in the strain 195 and KB-1 Dhc genomes. We identified a number of high-quality clusters in both data sets that improved our understanding of the genes (such as flavodoxin and rdhs) involved in the as yet unknown mechanism of the energy-conserving respiratory chain of these organisms. Clustering and functional enrichment analyses of the transcriptomic data highlighted the importance of lipid metabolism, more specifically cell membrane biogenesis, and of transporter function for D. mccartyi. Operon analysis, together with quality threshold clustering of the transcriptomic data, provided additional confidence in prior reannotations and new function predictions for a number of genes, including 5 hypothetical proteins. Since hypothetical proteins constitute a major portion of any sequenced genome, predicting their function is a significant challenge, and all relevant clues are welcome. The predicted annotations for these hypothetical proteins can also guide the design of future experiments for their biochemical characterization. Finally, this meta-analysis shows that the integrated study of high-throughput transcriptomic data with the pan-genome-scale metabolic model for D. mccartyi can advance our knowledge of the fundamental details of the physiology and metabolism of these difficult-to-grow yet environmentally important anaerobes. This enhanced knowledge of metabolism, in turn, will be beneficial for the optimal use of these bacteria in elucidating global halogen cycles and in developing effective strategies for the bioremediation of sites contaminated with chlorinated pollutants around the world.
Chapter 5: Model-assisted prediction and experimental characterization of isocitrate dehydrogenase and phosphomannose isomerase from Dehalococcoides mccartyi strain KB-1
5.1. Abstract
Proteins of unknown or non-specific function and hypothetical proteins constitute ~50% of the genomes of Dehalococcoides mccartyi, a group of environmentally important, strictly anaerobic bacteria. Previous genome-scale modeling and transcriptomic studies of D. mccartyi metabolism led to the review or reannotation of over 80 genes, including some hypothetical proteins. These reannotations were based on various bioinformatic analyses, such as sequence homology analysis, operon analysis, and phylogenetic profiling. Here, we experimentally characterized two of those reannotated genes to verify the proposed annotations: 1) an NADP+-dependent isocitrate dehydrogenase (KB1_0495), and 2) a bifunctional phosphoglucose/phosphomannose isomerase (KB1_0553) from D. mccartyi strain KB-1. KB1_0495 was originally annotated as an NAD+-dependent isocitrate dehydrogenase, and KB1_0553 as a hypothetical protein/SIS domain protein. KB1_0495, also denoted DmIDH, showed activity primarily with NADP+ as a cofactor, while only phosphomannose isomerase activity was identified and confirmed for KB1_0553, also denoted DmPMI. Bioinformatic analysis of their sequences suggested that they belong to novel enzyme families within their respective enzyme superfamilies. Thus, the biochemical characterization of KB1_0495 (DmIDH) and KB1_0553 (DmPMI) highlights the value of gene reannotations from metabolic modeling and transcriptomic studies as hypotheses that, if tested, can enhance our knowledge of the physiology and biochemistry of the organism of interest.
5.2. Introduction
Dehalococcoides mccartyi, a group of bacteria that are among the smallest free-living organisms, are important for their unique niche specialty: the detoxification of ubiquitous and stable ground water pollutants, such as anthropogenic chlorinated ethenes and benzenes, into benign or less toxic compounds (Adrian et al., 2007a, Adrian et al., 2000b, He et al., 2003, Löffler et al., 2012, Maymó-Gatell et al., 1997). Only these strictly anaerobic bacteria are capable of harnessing energy for growth from the complete detoxification of the known human carcinogens trichloroethene (TCE) and vinyl chloride (VC) (Guha et al., 2012, TEACH, 2011) to benign ethene. This energy-conserving metabolic process, termed organohalide respiration, is catalyzed by reductive dehalogenases, the respiratory enzyme system of D. mccartyi. Although organohalide respiration is useful for the bioremediation of toxic chloro-organic solvents, the process is slow because D. mccartyi grow more slowly in pure isolates than in mixed microbial communities, their natural habitats (Adrian et al., 1998, Adrian et al., 2000b, Ahsanul Islam et al., 2010). Thus, a fundamental understanding of D. mccartyi metabolism, including the genes and enzymes involved in metabolic processes, is essential for their successful application in bioremediation. So far, numerous systems-level studies of D. mccartyi metabolism, including the construction of a pan-genome-scale metabolic model and various transcriptomic and proteomic analyses (Johnson et al., 2008, Johnson et al., 2009, Lee et al., 2012, Morris et al., 2007, Morris et al., 2006), have shed light on key metabolic processes and the genes involved.
Although D. mccartyi metabolism is well studied, only a few of its metabolic genes have been experimentally characterized. These include a Re-citrate synthase (Marco-Urrea et al., 2011) involved in the TCA cycle, a bifunctional mannosylglycerate synthase/phosphatase with a potential role in osmotic stress adaptation (Empadinhas et al., 2004), and five reductive dehalogenases involved in respiration and energy conservation (Adrian et al., 2007b, Krajmalnik-Brown et al., 2004, Magnuson et al., 2000, Magnuson et al., 1998, Müller et al., 2004, Tang et al., 2013). In addition, hydrogenase activity was experimentally characterized in D. mccartyi strains 195 (Nijenhuis and Zinder, 2005) and CBDB1 (Jayachandran et al., 2004). Apart from these functionally characterized genes, experimental studies of D. mccartyi genes are largely lacking. Moreover, the genome sequences of these bacteria revealed that ~50% of their genes encode hypothetical proteins or proteins with unknown or non-specific functions (Kube et al., 2005, Seshadri et al., 2005). However, the primary gene annotations, including those of more than 80 metabolic genes, were reviewed or corrected during the construction and manual curation of the D. mccartyi metabolic model (Chapter 3) (Ahsanul Islam et al., 2010). Also, a recent system-wide study (Ahsanul Islam et al., 2013) of D. mccartyi transcriptomes (Chapter 4) led to the proposal of putative functions for 5 hypothetical proteins, in addition to providing additional confidence in genes reannotated in the modeling study. One of these 5 hypothetical proteins was proposed to be a putative bifunctional phosphoglucose isomerase (PGI; EC 5.3.1.9)/phosphomannose isomerase (PMI; EC 5.3.1.8) because it belonged to a co-expressed gene cluster enriched for central carbon metabolic genes (Ahsanul Islam et al., 2013). The same annotation had been proposed during the construction of the D. mccartyi metabolic model (Ahsanul Islam et al., 2010). Another metabolic gene, primarily annotated as a putative NAD+-dependent isocitrate dehydrogenase (IDH) (EC 1.1.1.41), was reannotated as a putative NADP+-dependent IDH (EC 1.1.1.42) during the modeling study (Ahsanul Islam et al., 2010). However, neither of these proposed annotations was supported by experimental evidence.
In this study, we report the biochemical characterization of the aforementioned putative IDH (KB1_0495) and PGI/PMI (KB1_0553) from D. mccartyi strain KB-1. Although IDH is an important TCA-cycle enzyme catalyzing the formation of 2-oxoglutarate and CO2 with NAD+ or NADP+ as a cofactor (Kanehisa et al., 2011, Madigan et al., 2010, Nelson and Cox, 2006), the physiological role of a bifunctional PGI/PMI in D. mccartyi is unclear. In general, PGI plays a central role in sugar metabolism via glycolysis and gluconeogenesis in all three domains of life (Hansen et al., 2004, Hansen, 2004, Nelson and Cox, 2006), while PMI helps to produce precursors for cell wall components, glycoproteins, glycolipids, and storage polysaccharides (Quevillon et al., 2005, Rajesh et al., 2012, Hansen, 2004). However, glycolysis is inactive in D. mccartyi, and they also lack a typical bacterial cell wall (Löffler et al., 2012). Moreover, all metabolic genes in the D. mccartyi metabolic model (Ahsanul Islam et al., 2010) were categorized into three classes (low, medium, and high confidence) based on the quality and type (i.e., biochemical or bioinformatic) of the evidence supporting their annotations. We chose to functionally characterize a medium-confidence gene (KB1_0495) and a low-confidence gene (KB1_0553) because (1) biochemical evidence for these genes is non-existent, and annotations without experimental evidence are simply hypotheses; (2) bioinformatic evidence for the genes is either limited (medium confidence for KB1_0495) or insufficient (low confidence for KB1_0553) in the model; and (3) confirming their proposed annotations would support, or at least add confidence to, the other gene reannotations and hypotheses presented in the modeling and transcriptomic studies (Ahsanul Islam et al., 2010, Ahsanul Islam et al., 2013). We therefore heterologously expressed KB1_0495 and KB1_0553 in E. coli and tested the biochemical activities of the purified recombinant proteins. We characterized IDH activity in KB1_0495 (also denoted DmIDH) with NADP+ as the main cofactor, while only PMI activity was identified and confirmed for KB1_0553 (also denoted DmPMI). Analyses of their physiological roles indicated potential involvement of DmIDH in energy metabolism and of DmPMI in compatible solute production to help D. mccartyi adapt to osmotic stress. We also compared the predicted secondary structures of both enzymes with the crystal structures of their closest homologs and conducted bioinformatic analyses of their sequences. These analyses revealed their novel features and suggested that they might be part of new enzyme families within their respective enzyme superfamilies.
5.3. Materials and methods
5.3.1. Bacterial culture, reagents and chemicals
Genomic DNA (gDNA) was extracted from KB-1, a D. mccartyi-containing anaerobic mixed culture growing on TCE and methanol, following the procedure described previously (Duhamel and Edwards, 2006, Duhamel, 2005, Waller, 2010). The PCR primers for amplifying strain KB-1 gDNA were synthesized by Integrated DNA Technologies (Coralville, IA, USA). Luria Broth (LB) and Terrific Broth (TB) powders were purchased from EMD Chemicals (Gibbstown, NJ, USA), and the Bradford assay reagent from Bio-Rad (Hercules, CA, USA). Lysozyme, proteinase K, agarose, glycerol, ampicillin, kanamycin, SDS, and IPTG were obtained from BioShop (Burlington, ON, Canada); all other chemicals were purchased from Sigma-Aldrich (St. Louis, MO, USA) at greater than 98% purity. Nickel-nitrilotriacetic acid (Ni-NTA) resin and the QIAquick PCR purification kit were purchased from Qiagen (Mississauga, ON, Canada), and the In-Fusion PCR cloning kit from Clontech (Palo Alto, CA, USA). All commercial kits were used according to the manufacturers’ instructions.
5.3.2. Gene cloning and overexpression of the selected genes in E. coli
The selected genes (KB1_0495 and KB1_0553) were PCR-amplified from KB-1 gDNA using PCR primers containing the restriction sites for BamHI and NdeI, and were cloned into a modified pET-15b vector (Novagen, Madison, WI, USA) encoding an N-terminal hexahistidine tag (6xHis-tag) and an ampicillin resistance gene, as described previously (Zhang et al., 2001). In the modified vector, the thrombin cleavage site was replaced with a tobacco etch virus protease cleavage site, and a double stop codon was introduced downstream of the BamHI site (Zhang et al., 2001). The resulting constructs were transformed into E. coli BL21 (DE3) Gold (Stratagene, La Jolla, CA, USA) for overexpression of the His-tagged fusion proteins. The cells were grown aerobically in 1 liter flasks containing LB medium at 37°C and ~220 rpm until the OD600 reached approximately 1.0 (after about 3 hours). Expression of the cloned genes was then induced by adding 100 mg of IPTG, and the cells were harvested the following day by centrifugation at 7,500 rpm and 4°C for 20 min.
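The IPTG dose above is given as a mass, whereas induction conditions are usually reported as a final concentration. The medium volume per flask is not stated in this section; assuming the 100 mg was added to 1 L of culture, and using the molar mass of IPTG (C9H18O5S, about 238.3 g/mol), the dose works out to roughly 0.42 mM. A minimal conversion sketch (the function name is hypothetical):

```python
IPTG_MOLAR_MASS = 238.30  # g/mol, isopropyl beta-D-1-thiogalactopyranoside (C9H18O5S)


def iptg_final_mM(mass_mg: float, culture_volume_L: float) -> float:
    """Final IPTG concentration (mM) for a given mass added to a given culture volume."""
    moles = (mass_mg / 1000.0) / IPTG_MOLAR_MASS  # mg -> g -> mol
    return moles / culture_volume_L * 1000.0      # mol/L -> mmol/L


# Assumption: 100 mg added to 1 L of culture
print(f"{iptg_final_mM(100, 1.0):.2f} mM")  # prints "0.42 mM"
```

If the flasks held less than 1 L of medium, the final concentration would be proportionally higher.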
5.3.3. Purification of the overexpressed recombinant proteins
The overexpressed 6xHis-tagged fusion proteins were purified by immobilized metal ion affinity chromatography (IMAC) (Hochuli, 1990, Hochuli et al., 1988, Porath, 1992, Porath et al., 1975) as described previously (Zhang et al., 2001). Briefly, the harvested E. coli cells were suspended in binding buffer and disrupted by sonication. The lysates were clarified by centrifugation at 23,500 rpm and 4°C for 45 minutes. The supernatant was then loaded onto a column containing Ni-NTA resin and washed with 200 mL of wash buffer to remove proteins bound non-specifically to the resin. The 6xHis-tagged proteins were eluted with elution buffer containing increasing concentrations of imidazole. Purified proteins were frozen in liquid nitrogen if they were obtained at high yield (≥ 50 mg/liter of culture) and homogeneity (≥ 95%), as described previously (Zhang et al., 2001). SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis) followed by staining with Coomassie Brilliant Blue R-250 was performed according to standard procedures (Laemmli, 1970) to check the expression level and purity of the target proteins.
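Centrifugation speeds in this section are given in rpm; the corresponding relative centrifugal force (RCF, in multiples of g) depends on the rotor radius, which is not reported here. The standard conversion is RCF = 1.118 × 10^-5 · r · N², with r the rotor radius in cm and N the speed in rpm. The sketch below applies it in both directions; the function names are hypothetical, and the 10 cm radius in the example is an assumed value, not the rotor actually used.

```python
def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a rotor speed (rpm) and radius (cm)."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at a given radius (cm)."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


# Assumed 10 cm radius, for illustration only
for speed in (7_500, 23_500):
    print(f"{speed} rpm -> {rpm_to_rcf(speed, 10.0):,.0f} x g")
```

Reporting RCF rather than rpm makes the harvesting and clarification steps reproducible on a different centrifuge.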
5.3.4. Enzymatic assays for the purified recombinant proteins
The isocitrate dehydrogenase (IDH) activity of KB1_0495 was determined by a standard assay (Steen et al., 1998), in which the enzyme converts D-isocitrate to 2-oxoglutarate and CO2 using NADP+ or NAD+ as a cofactor, according to the following equation: