EXAMINATION OF METABOLIC AND REGULATORY NETWORKS OF

DESULFOVIBRIO

A Dissertation

Presented to

The Faculty of the Graduate School

University of Missouri-Columbia

In Partial Fulfillment

Of the Requirement for the Degree

Doctor of Philosophy

by

CHRISTOPHER LEE HEMME

Judy D. Wall, Dissertation Supervisor

DECEMBER 2004

Acknowledgements

I would like to thank my dissertation advisor Dr. Judy Wall for giving me the opportunity to work in her laboratory and for being patient with me during the difficult times. I would also like to thank my colleagues and friends in the Wall laboratory: Bill Yen, Brett Emo, Grant Zane, Joe Ringbauer, Laurie Casalot, Kate Hart, Suzanne Miller, Leena Pattarkine, Kelly Bender, Elliot Drury and especially Barbara Giles. Thank you all for your help and friendship.

I would like to acknowledge the members of my dissertation committee, Drs. Dave Emerich, David Eide, Arun Chatterjee, Toni Kazic and Peter Tipton, for the advice and aid they offered over the course of this project.

Finally, I would like to thank my parents for always believing in me, and my brother for giving me three wonderful nephews who helped keep my spirits up.

EXAMINATION OF METABOLIC AND REGULATORY NETWORKS OF DESULFOVIBRIO SPECIES

Christopher L. Hemme

Dr. Judy D. Wall, Dissertation Supervisor

Abstract

The sulfate-reducing bacteria are a morphologically diverse group of organisms characterized by the ability to couple the enzymatic reduction of sulfate to energy production and growth. This metabolic activity has profound economic and environmental consequences such as the corrosion of metal structures and the souring of petroleum reserves. The sulfate-reducing bacteria are also among a select group of organisms that may be used as tools for the bioremediation of toxic heavy metal contaminants from the environment. To understand the mechanisms through which these bacteria impact our environment both positively and negatively, genomic studies have been undertaken to predict the metabolic and regulatory networks of two species of the genus Desulfovibrio. Studies have focused on the elucidation of carbon metabolic pathways, the role of CRP-FNR proteins in the regulation of Desulfovibrio metabolic pathways, and the prediction of global regulatory networks using bioinformatics techniques.

Surprisingly, several hexose metabolic genes were found despite the fact that biochemical evidence suggests that these bacteria do not use hexose sugars as growth substrates. This physiological paradox was explored. Secondary pathways for the metabolism of galactose and the synthesis of α,α-trehalose were observed in Desulfovibrio desulfuricans G20 but not Desulfovibrio vulgaris Hildenborough.

Physiological experiments showed that despite the presence of a complete set of galactose metabolism genes, Dv. desulfuricans was unable to utilize galactose as the sole carbon source for growth. Growth experiments using [14C]-labeled galactose in the presence of lactate suggested that galactose was incorporated into the cell. This result, coupled with the published observation that the extracellular polymeric substances (EPS) of Dv. desulfuricans strains contained detectable levels of galactose, suggests that metabolism of galactose occurs for the purposes of production of EPS and possibly lipopolysaccharide (LPS).

To explore the possible hierarchical regulation of substrate utilization, global regulators responding to redox signals were sought. Sequencing revealed multiple orthologs of the CRP-FNR genes in Desulfovibrio. A mutant strain of Dv. vulgaris was constructed that was interrupted in a gene encoding a putative CRP-FNR protein. A phenotypic analysis of the mutant strain showed no significant differences in the growth rates and growth yields of the cells compared to wild type when grown on lactate-sulfate, pyruvate-sulfate (respiration), pyruvate alone (fermentation) or formate-sulfate. However, the mutant was shown to be impaired in growth on ethanol-sulfate compared to wild type, suggesting the involvement of this protein in either the cellular ability to use ethanol or maintain cellular integrity.

A computational analysis of the promoter regions of putative transcription units of Desulfovibrio revealed possible regulatory protein binding motifs homologous to the E. coli CRP and GalR binding sites as well as a motif common to genes encoding phosphate homeostasis proteins. Further examination revealed a set of statistically significant motifs not immediately identified as E. coli homologs. This set of motifs may represent unique regulatory motifs of Desulfovibrio.

Table of Contents

Acknowledgements
ABSTRACT
Table of Contents
List of Tables
List of Figures
1. General Background
1.1 Overview
1.2 Introduction to the Sulfate-Reducing Bacteria
1.3 Classical Metabolism of Desulfovibrio
1.3.1 Central Carbon Metabolism
1.3.2 Energetics of the SRB
1.3.2.1 Respiration by Reduction of Sulfate
1.3.2.2 Growth with Alternative Terminal Electron Acceptors
1.3.2.3 Dissimilatory Metal Reduction
1.3.2.4 Oxygen Metabolism
1.4 Computational Methods for Sequence Analysis
1.4.1 Determination of Open Reading Frames
1.4.2 Elucidation of Protein Function by Sequence Homology
1.4.3 Detection of Orthologs
1.4.4 Functional Assignment by Contextual Analysis
1.4.5 Pitfalls of Functional Assignment by Sequence Homology Methods
1.4.6 Constructing Biochemical Networks from Genomic Sequence Data
1.4.6.1 Metabolic Networks
1.4.6.2 Regulatory Networks
1.5 Summary of Project
2. Comparative Carbon Metabolism of Desulfovibrio vulgaris Hildenborough and Desulfovibrio desulfuricans G20
2.1 Introduction/Rationale
2.2 Materials and Methods
2.3 Results
2.3.1 Genomic and Physiological Insights into Carbon Metabolism of Desulfovibrio
2.3.1.1 Determination of BLAST Parameters and Statistical Cutoffs
2.3.1.2 Central Carbon Metabolism
2.3.1.2.1 Glycolysis and Gluconeogenesis
2.3.1.2.2 Organic Acid Metabolism
2.3.1.2.3 TCA Cycle
2.3.1.2.4 Secondary Carbohydrate Metabolism Pathways
2.3.1.3 Galactose Metabolism by Dv. desulfuricans G20
2.3.1.3.1 Genomic Analysis of Galactose Metabolism Genes
2.3.1.3.2 Physiological Studies of Galactose Metabolism by Dv. desulfuricans G20
2.4 Discussion
2.4.1 Computational Prediction of Central Carbon Metabolism
2.4.2 Galactose Metabolism of Desulfovibrio desulfuricans G20
3. Regulatory Mutant of Dv. vulgaris Hildenborough
3.1 Introduction/Rationale
3.2 Materials and Methods
3.3 Results
3.3.1 Genomic Analysis of CRP-FNR Proteins of Desulfovibrio Species
3.3.2 Phenotypic Analysis of a CRP-FNR Mutant Strain of Dv. vulgaris
3.4 Discussion
4. Computational Analysis of Regulatory Motifs of Desulfovibrio Species
4.1 Introduction/Rationale
4.2 Materials and Methods
4.3 Results and Discussion
4.3.1 AlignACE Results
4.3.2 Comparison to E. coli Regulons
4.3.3 Identification of Regulatory Proteins
5. Research Acknowledgements
6. Appendices
7. References
Vita

List of Tables

Table 1-1. δ-proteobacterial species examined in comparative genomics studies
Table 1-2. Common biochemical reactions of the SRB
Table 2-1. Strains and plasmids used in this study
Table 2-2. Growth media used in this study
Table 2-3. Oligonucleotide primers used in PCR analysis
Table 2-4. Comparative carbon metabolism of Desulfovibrio
Table 2-5. Distribution of the galactose enzymes among the δ-
Table 2-6. Total protein assay of Dv. desulfuricans grown on galactose
Table 2-7. Incorporation of [14C]galactose by Dv. desulfuricans G20
Table 2-8. Bacterial name abbreviations for Figure 2-9
Table 3-1. Verification of pHB1 plasmid insertion into Dv. vulgaris by PCR
Table 3-2. CRP-FNR proteins of Desulfovibrio
Table 3-3. Comparisons of the CRP-FNR proteins of the δ-proteobacteria
Table 3-4. Growth phenotypes of Dv. vulgaris HBAR5 mutant strain
Table 3-5. Total protein, sulfate and sulfide determinations
Table 3-6. Descriptions and comparative analysis of the genes of Figure 3-6
Table 4-1. Significant non-palindromic motifs of Desulfovibrio
Table 4-2. Significant palindromic motifs of Desulfovibrio
Table 4-3. Comparison of hypothetical Dv. vulgaris regulons to E. coli
Table 4-4. Comparison of hypothetical Dv. desulfuricans regulons to E. coli
Table 4-5. E. coli regulatory motifs scanned against Desulfovibrio
Appendix A1. BLASTP comparison of control sequences against the Dv. vulgaris proteome
Appendix A2. Comparison of BLAST scoring matrices
Appendix A3. Effect of BLAST low-complexity filter
Appendix B1. Regulon groupings based on metabolic pathways
Appendix B2. Regulon groupings based on conserved operons
Appendix B3. Regulon groupings based on conserved regulons of E. coli

List of Figures

Figure 2-1. Glycolytic, gluconeogenic and organic acid metabolism pathways of Desulfovibrio
Figure 2-2. TCA and reductive carboxylation cycles of Desulfovibrio
Figure 2-3. Secondary carbon metabolism of Dv. desulfuricans G20
Figure 2-4. Predicted galactose metabolism of Dv. desulfuricans G20
Figure 2-5. Predicted gal operon of Dv. desulfuricans G20
Figure 2-6. Phylogeny of GalK
Figure 2-7. Growth phenotype of Dv. desulfuricans G20 on galactose
Figure 2-8. Galactose-dependent expression of galT determined by RT-PCR
Figure 2-9. Arrangement of gal genes in bacteria
Figure 3-1. Plasmid insertion of pHB1 into Dv. vulgaris to produce HBAR5
Figure 3-2. PCR verification of plasmid insertion
Figure 3-3. Phylogenetic grouping of CRP-FNR genes
Figure 3-4. Growth of HBAR5 on lactate, pyruvate and formate
Figure 3-5. Growth of HBAR5 on ethanol
Figure 3-6. Genomic environments of the Desulfovibrio CRP-FNR genes
Figure 4-1. Motifs derived from Desulfovibrio regulons
Figure 4-2. Non-palindromic motifs of Desulfovibrio
Figure 4-3. Palindromic motifs of Desulfovibrio
Figure 4-4. Genes of the PhoB regulon of Desulfovibrio
Figure 4-5. Arginine biosynthesis pathway of Desulfovibrio
Figure 4-6. Glycerol metabolism of Desulfovibrio
Appendix A4. Comparison of Desulfovibrio control data for Dv. vulgaris
Appendix A5. Comparison of non-Desulfovibrio control data for Dv. vulgaris
Appendix C1. Sample output from AlignACE
Appendix C2. Sample output from ScanACE

1. General Background

1.1. Overview

The metabolic and regulatory networks of the sulfate-reducing bacteria Desulfovibrio vulgaris Hildenborough and Desulfovibrio desulfuricans G20 were investigated through computational and molecular biological techniques applied to the complete genome sequences. The text is organized as follows. Chapter 1 provides a general overview of the phylogeny, environmental distribution and economic impact of the sulfate-reducing bacteria as well as an overview of the computational processes used in later chapters. Chapter 2 discusses insights gained into the central carbon metabolism of Desulfovibrio species by examination of the genomic sequences and a comparison of these data to prior biochemical studies. Chapter 3 describes the construction and phenotypic analysis of a Dv. vulgaris mutant strain lacking a gene encoding a putative CRP-FNR regulatory protein. Chapter 4 covers the elucidation of regulatory networks and the prediction of regulatory binding motifs using computational means.

1.2. Introduction to the Sulfate-Reducing Bacteria

The term “sulfate-reducing bacteria” (SRB) refers to a morphologically diverse group of microbes characterized by the ability to utilize sulfate as a terminal electron acceptor for growth. This metabolic process is referred to as dissimilatory sulfate reduction and involves the enzymatic reduction of sulfate to sulfide (8H+ + 8e- + SO4^2- + ATP → 2Pi + AMP + H2S + 3H2O) with the accompanying production of energy. While a few archaeal sulfate-reducers have been identified, most notably Archaeoglobus fulgidus (1), the vast majority of identified sulfate-reducing species are eubacteria. The eubacterial sulfate-reducers comprise both Gram-positive and Gram-negative species, and members of the genus Desulfotomaculum are capable of spore formation. Most characterized SRB species are classified as mesophilic heterotrophs, though several autotrophic (2-4) and/or extremophilic (5-12) species have also been identified.

Historical interest in the SRB has focused on the propensity of certain species to metabolically induce corrosion of metals. This activity has a devastating economic impact on industry, with microbially influenced corrosion (MIC) accounting for millions of dollars in corrosion damage annually in such manmade anaerobic environments as petroleum pipelines and sewer systems (13). In addition to corrosion damage, sulfide produced by the SRB leads to souring of petroleum reserves and a subsequent increase in refining costs. Studies of MIC have traditionally focused on the genus Desulfovibrio, species of which are often isolated from environmental corrosion sites. Current and past efforts have attempted to combine the available chemical, physiological and genetic data into a unified model for Desulfovibrio-induced corrosion of ferrous metals (13, 14). The current model for MIC by Desulfovibrio species attempts to explain the observation of an increased rate of corrosion of ferrous metals by mixed SRB-containing biofilms exposed to oxygen (13). In this model, heterotrophic aerobes colonize the surface of the metal and establish the biofilm matrix. As the biofilm thickens, oxygen is rapidly metabolized at the surface of the biofilm, resulting in an aqueous anaerobic environment at the surface of the metal that is suitable for colonization by Desulfovibrio. In such an environment, hydride produced from the natural hydrolysis of water forms a layer on the metal surface. The localized formation of biofilms thus results in distinct chemical environments on the metal surface, which can lead to the formation of spatially separated anodes and cathodes and thus an electrochemical cell. Once such a cell has been established, electrons are liberated from the cathode and react with the hydride to form molecular hydrogen (H2) while the corresponding anodic reaction involves the dissolution of iron as Fe(II) and the formation of an anodic corrosion pit. The hydrogenase-positive Desulfovibrio is able to utilize H2 as an electron donor for the reduction of sulfate. The removal of H2 by the organism leads to a depolarization of the cathode, which in turn drives the overall electrochemical reaction forward. Furthermore, the metabolic production of sulfide from sulfate results in the precipitation of insoluble metal sulfides, which can have either a positive or negative effect on the corrosion rate depending on the ambient conditions. Further reactions occur at the oxic/anoxic boundary with oxygen serving as the electron acceptor, with Fe(II) being oxidized to Fe(III) and sulfide oxidized back to sulfate or to the diagnostic corrosion product elemental sulfur. Molecular oxygen thus acts as the terminal electron acceptor for the transfer of electrons from the metal surface to the surface of the biofilm, resulting in an increased corrosion rate when compared to purely anaerobic biofilms (13).

Recent interest in the SRB has focused on the potential use of selected species in environmental bioremediation efforts (15-18). Several Desulfovibrio species have been shown to reduce toxic heavy metal ions from contaminated soil and water samples enzymatically (17, 19, 20). For most metals studied, save iron, the metal is reduced from a highly soluble to a highly insoluble oxidation state (15, 16). Precipitation of insoluble metal complexes renders the metal less mobile in the environment and, as a result, less toxic to biological systems. Certain species have also been observed to carry out the enzymatic degradation of toxic hydrocarbons under anaerobic conditions (21) and thus may contribute to bioremediation of contaminated sites.

Studies have been conducted on the SRB for over a hundred years, but for much of that time the bacteria were viewed as a biological curiosity unique only in their ability to reduce sulfate. It is now clear that SRB species represent a significant proportion of bacterial life on earth and play a crucial role in the environmental cycling of sulfur and carbon (22, 23). Strains have been isolated from such diverse environments as the deep soil subsurface, deep-sea hydrothermal vents, marine and fresh water sediments, arctic environments, petroleum reserves, the gastrointestinal tracts of animals and the aforementioned manmade environments. Despite being traditionally classified as strict anaerobes, SRB species are often found in environments that experience transient periods of oxygen exposure. Certain species have been shown to metabolize oxygen at rates similar to facultative aerobes, although accompanying growth has not yet been demonstrated (24). SRB species employ a wide variety of oxygen defense mechanisms not found in traditional strict anaerobes, and it is likely that this oxygen tolerance contributes to the wide ecological distribution of the bacteria (25).

The relative genetic inaccessibility of the SRB has focused previous research on the biochemistry and physiology of the organisms. In recent years, genetic techniques have evolved to the point where comprehensive genetic analyses are now possible. In particular, the sequencing of the genomes of several sulfate-reducing strains, including an archaeal sulfate-reducer (Archaeoglobus fulgidus), two species of Desulfovibrio (Dv. vulgaris Hildenborough and Dv. desulfuricans G20) and two marine SRB species (Desulfotalea psychrophila and Desulfobacterium autotrophicum) (Table 1-1), now permits the first comprehensive analyses of the predicted metabolic and regulatory networks of sulfate-reducing species.

This project was designed to exploit the information available from the genomic sequences to make physiological and regulatory predictions regarding species of Desulfovibrio. The information derived from these analyses was used to form hypotheses regarding the metabolic and regulatory networks of the organisms which were then tested using traditional genetic, biochemical and molecular biology techniques.

Table 1-1. δ-proteobacterial species examined in comparative genomics studies

| Family | Organism Name | Abbrev. | Version (a) | Source (b) | Reference |
|---|---|---|---|---|---|
| Desulfovibrionaceae | Desulfovibrio vulgaris Hildenborough | Dvu | CMR | TIGR/VIMSS | (26) |
| Desulfovibrionaceae | Desulfovibrio desulfuricans G20 | Dde | 22dec03 | JGI/VIMSS | - |
| Desulfobulbaceae | Desulfotalea psychrophila LSv54 | Dps | - | REGX/VIMSS | (8) |
| | Desulfobacterium autotrophicum HRM2 | Dau | - | REGX | (2) |
| Desulfuromonadaceae | Desulfuromonas acetoxidans | Dac | 07apr03 | JGI/VIMSS | (27) |
| Geobacteraceae | Geobacter metallireducens GS-15 | Gme | 12nov03 | JGI/VIMSS | (28) |
| Geobacteraceae | Geobacter sulfurreducens PCA | Gsu | CMR | TIGR/VIMSS | (29) |
| Bdellovibrionaceae | Bacteriovorax marinus | Bma | - | SI | (30) |
| Bdellovibrionaceae | Bdellovibrio bacteriovorus HD100 | Bba | CMR | TIGR/MPI/VIMSS | (31) |

(a) The version number (if available) of the genomic sequence used. CMR indicates the version currently deposited in the TIGR Comprehensive Microbial Resource.

(b) TIGR = The Institute for Genomic Research, VIMSS = Virtual Institute for Microbial Stress and Survival, JGI = Joint Genome Institute, REGX = Real Environmental Genomix, SI = Sanger Institute, MPI = Max Planck Institute.


1.3. Classical Metabolism of Desulfovibrio

1.3.1. Central Carbon Metabolism

Prior to the publication of the first genomic sequences of a sulfate-reducing bacterium, most of the accumulated knowledge of the central carbon metabolism in Desulfovibrio was derived from biochemical and physiological examinations. Most Desulfovibrio species are unable to utilize hexose or pentose sugars as carbon sources, having instead evolved towards the utilization of organic acid end products of glycolysis as the primary sources of carbon (32). One notable exception to this observation is Desulfovibrio fructosovorans, which is able to grow on fructose (33). Despite the inability of most Desulfovibrio species to utilize hexose sugars, activities for each of the glycolytic enzymes were detected in cell extracts of Desulfovibrio gigas (34). Furthermore, several Desulfovibrio species have been shown to synthesize polyglucose molecules (34-37), the degradation of which has been shown to provide reducing potential for the production of NTPs under both fermentative and respiratory conditions in Dv. gigas (34). This metabolism appears to occur via the classical Embden-Meyerhof glycolysis pathway and the methylglyoxal shunt in a 3:2 ratio (34) and has been linked to the rubredoxin-rubredoxin oxidoreductase oxygen stress response mechanism (37). Several SRB species are known to oxidize acetate to CO2, with this activity often contributing to true autotrophic growth (32, 38, 39), but this activity has not been observed in Desulfovibrio species. In general, Desulfovibrio species are characterized as chemolithotrophs that derive energy from the oxidation of short-chain organic acids such as lactate and pyruvate to acetate coupled to the reduction of sulfate.

Characterization of carbon utilization in the SRB is complicated by the observation of mixotrophic growth effects. The classical example of this phenomenon is the “autotrophic” growth of Desulfovibrio strains on CO2 and H2 (32). Energy is generated by the oxidation of H2, but CO2 can only be assimilated if preformed acetate is present in the media. The reductive carboxylation of acetate to pyruvate appears to be responsible for growth on mineral media (32). Another example of mixotrophic growth in the SRB is the growth of strains on butanol in the presence of yeast extract (32). Under these conditions, butanol is oxidized to butyric acid with no incorporation of this carbon into cell material. The electrons donated by butanol appear to stimulate growth of the organisms on the carbon stores contained in the yeast extract. Numerous other mixotrophic effects by SRB species have been observed (32).

Biochemical studies have not conclusively shown the presence of an active TCA cycle, though many of the enzymes of both the oxidative and reductive cycles have been biochemically characterized in various species (32). Biochemical evidence was obtained that indicated some species of Desulfovibrio and Clostridia, including Dv. vulgaris and Dv. desulfuricans, utilized an atypical citrate synthase that produces the R enantiomer of citrate instead of the typical S form (40-42), but such an enzyme has never been completely purified (43). Several species apparently contain enzymes of the reductive TCA cycle, but not all encode a citrate lyase enzyme that would allow for CO2 fixation and acetate oxidation by this pathway (38). The coupling of sulfate reduction to electron donor oxidation via a fumarate-succinate cycle has been demonstrated in Dv. desulfuricans strains (44), but the theoretical mechanisms for such a cycle have not been fully elucidated (32).


1.3.2. Energetics of the SRB

1.3.2.1. Respiration by Reduction of Sulfate

The enzymatic reduction of sulfate to sulfide is the characteristic metabolic reaction of the SRB (45). Unlike most examples of terminal electron acceptor usage, the reduction of sulfate requires the prior activation of the molecule by ATP because sulfate is a relatively stable molecule. This reaction is carried out by the ATP sulfurylase enzyme with the production of adenosine phosphosulfate (APS) and inorganic pyrophosphate (PPi) (Table 1-2). The APS molecule represents the replacement of the two terminal phosphate groups of ATP with a single sulfate and has approximately double the energy of the analogous ADP molecule. Because of the positive free energy change of this reaction, it has been posited that an inorganic pyrophosphatase is required to remove the PPi and thus drive the reaction in the forward direction. Such an enzyme has been identified both from biochemical studies (32) and from the genomic sequence annotations. Once activated, the APS is reduced to sulfite with the release of AMP by APS reductase, with the sulfite ultimately reduced to sulfide by the diagnostic enzyme bisulfite reductase (desulfoviridin) (Table 1-2). The reduction of 1 mole of sulfate requires 8 e-, and this reductant is provided by the oxidation of 2 moles of lactate or 4 moles of pyruvate. Furthermore, reduction of 1 mole of sulfate consumes the equivalent of 2 moles of ATP (ATP → AMP). The oxidation of 1 mole of lactate or pyruvate is predicted to result in the production of 1 mole of ATP via substrate-level phosphorylation (acetyl-P + ADP → acetate + ATP); therefore the oxidation of 2 moles of lactate would exactly balance the ATP requirements for the reduction of 1 mole of sulfate (32). If lactate oxidation provided the only source of ATP under sulfate-reducing conditions, the net ATP yield would be 0 and growth of the cell would not be possible (46). The fact that lactate oxidation is shown to be linked to sulfate reduction with accompanying growth suggests a respiratory mechanism by which ATP is produced through the establishment of a chemiosmotic gradient (46). Such a system would also act to overcome the positive free energy of the conversion of lactate to pyruvate by drawing off hydrogen equivalents produced by the oxidation of lactate to pyruvate (32) (Table 1-2).
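The electron and ATP bookkeeping described above can be written out as simple arithmetic. The following is a minimal sketch, not part of the original analysis: the function name `donor_sulfate_balance` and the constant names are hypothetical, and the per-mole values (8 e- and 2 ATP equivalents per sulfate, 4 e- per lactate, 2 e- per pyruvate, 1 ATP per donor by substrate-level phosphorylation) are simply those stated in the text.

```python
# Stoichiometric bookkeeping for electron donor oxidation coupled to sulfate
# reduction. All per-mole values are taken from the text; names are illustrative.

ELECTRONS_PER_SULFATE = 8   # e- needed to reduce 1 mol sulfate to sulfide
ATP_COST_PER_SULFATE = 2    # ATP -> AMP counts as two ATP equivalents
ELECTRONS_PER_DONOR = {"lactate": 4, "pyruvate": 2}  # 8 e- from 2 lactate or 4 pyruvate
ATP_PER_DONOR = 1           # substrate-level phosphorylation (acetate kinase step)


def donor_sulfate_balance(donor: str, sulfate_mol: float = 1.0) -> dict:
    """Return donor demand and net substrate-level ATP for reducing sulfate_mol sulfate."""
    donor_mol = sulfate_mol * ELECTRONS_PER_SULFATE / ELECTRONS_PER_DONOR[donor]
    atp_made = donor_mol * ATP_PER_DONOR
    atp_spent = sulfate_mol * ATP_COST_PER_SULFATE
    return {
        "donor_mol": donor_mol,
        "atp_made": atp_made,
        "atp_spent": atp_spent,
        "net_atp": atp_made - atp_spent,
    }


if __name__ == "__main__":
    for donor in ("lactate", "pyruvate"):
        print(donor, donor_sulfate_balance(donor))
```

Under these assumptions the lactate case comes out at a net substrate-level ATP yield of zero, which is the observation that motivates the chemiosmotic argument, whereas the same bookkeeping for pyruvate leaves a positive balance.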

The observation of a spike in the initial concentration of molecular hydrogen with cells grown on lactate and sulfate led to the proposal of a model for the establishment of a proton gradient across the cytoplasmic membrane by cycling of internal molecular hydrogen (47). In this hydrogen cycling model, protons and electrons resulting from the oxidation of lactate and/or pyruvate are converted to H2 in the cytoplasm by one or more hypothetical cytoplasmic hydrogenases. The H2 generated in this reaction is transported across the cytoplasmic membrane into the periplasm where it is converted back to protons and electrons by one or more periplasmic hydrogenase enzymes. This results in a proton and charge gradient which can drive traditional chemiosmotic ATP synthesis via the F0-F1 ATP synthase complex. The electrons, on the other hand, enter a periplasmic cytochrome c3 network and are transported back into the cytoplasm where they become available for sulfate reduction.

Table 1-2. Common biochemical reactions of the SRB. ΔG°' values for the reactions were calculated using the ΔGf° values of products and substrates as listed in (48).

| Reaction | Description | Enzyme | EC # | ΔG°' (kJ/mol) |
|---|---|---|---|---|
| L-Lactate → Pyruvate + H+ + e- | Dehydrogenation of lactate | L-Lactate Dehydrogenase | 1.1.1.27 | +43.2 |
| Pyruvate + CoA + Fdox → Acetyl-CoA + CO2 + Fdred | Dehydrogenation of pyruvate | Pyruvate Synthase | 1.2.7.1 | -19.2 |
| Pyruvate + CoA → Acetyl-CoA + Formate | Thiolytic cleavage of pyruvate | Pyruvate Formate-Lyase | 2.3.1.54 | -16.3 |
| Acetyl-CoA + Pi → Acetyl-P + CoA | Formation of Acetyl-P | Phosphate Acetyltransferase | 2.3.1.8 | +9.0 |
| Acetyl-P + ADP → Acetate + ATP | Substrate-level phosphorylation | Acetate Kinase | 2.7.2.1 | -13.0 |
| Formate + H2O → HCO3- + 2H+ + 2e- | Dehydrogenation of formate | Formate Dehydrogenase | 1.2.1.2 | +1.4 |
| SO4^2- + 2H+ + ATP → APS + PPi | Activation of sulfate | ATP Sulfurylase | 2.7.7.4 | +46.0 |
| PPi → 2Pi | Degradation of pyrophosphate | Inorganic Pyrophosphatase | 3.6.1.1 | -21.9 |
| APS + H+ + 2e- → HSO3- + AMP | Reduction of APS to sulfite | APS Reductase | 1.8.4.8 | -68.6 |
| HSO3- + 6H+ + 6e- → HS- + 3H2O | Reduction of sulfite to sulfide | Bisulfite Reductase | 1.8.99.3 | -171.7 |
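The thermodynamic coupling of sulfate activation to the later reductive steps can be illustrated by summing the standard free energy changes listed in Table 1-2. This is a minimal sketch under the assumption that the tabulated values for sequential reactions are simply additive; the dictionary keys and the helper `pathway_dg` are hypothetical names introduced here for illustration.

```python
# Standard free energy changes (kJ/mol) copied from Table 1-2.
# Keys are illustrative labels, not identifiers from the original work.
DG0 = {
    "atp_sulfurylase": 46.0,        # sulfate + ATP -> APS + PPi
    "pyrophosphatase": -21.9,       # PPi -> 2 Pi
    "aps_reductase": -68.6,         # APS -> sulfite + AMP
    "bisulfite_reductase": -171.7,  # sulfite -> sulfide
}


def pathway_dg(steps):
    """Sum the standard free energy changes of the named steps (kJ/mol)."""
    return round(sum(DG0[step] for step in steps), 1)


activation = pathway_dg(["atp_sulfurylase", "pyrophosphatase"])
to_sulfite = pathway_dg(["atp_sulfurylase", "pyrophosphatase", "aps_reductase"])
to_sulfide = pathway_dg(list(DG0))

print(activation, to_sulfite, to_sulfide)
```

With the tabulated values, activation plus pyrophosphate hydrolysis sums to +24.1 kJ/mol, adding APS reduction gives -44.5 kJ/mol, and the full sequence to sulfide gives -216.2 kJ/mol. These are standard-state figures only and do not account for intracellular concentrations or for the electron donor reactions.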

1.3.2.2. Growth with Alternative Terminal Electron Acceptors

While SRB species are characterized by the ability to utilize sulfate as a terminal electron acceptor, most species are capable of utilizing alternative terminal electron acceptors for growth. Reduced sulfur compounds such as sulfite and thiosulfate can serve as electron acceptors in many SRB species. The ability to dismutate pyruvate or fumarate is widespread among SRB species (32). Many strains of Dv. desulfuricans are capable of dissimilatory nitrate reduction (44) and a novel periplasmic nitrate reductase has been isolated from Dv. desulfuricans ATCC 27774 (49). Choline and cysteine fermentation has also been demonstrated in certain species (32). The ability to utilize a diverse range of compounds as terminal electron acceptors is another likely reason for the wide ecological distribution of SRB species.

1.3.2.3. Dissimilatory Metal Reduction

Recent SRB research has focused on the use of the organisms as bioremediation tools for the cleanup of environmental contaminants. The principle behind bioremediation is to stimulate the growth and/or activity of naturally-occurring microbes in order to degrade or immobilize environmental contaminants (16). Physical remediation processes are expensive and often require significant disruptions of the local environment; thus bioremediation methods potentially offer a cheaper and more environmentally-friendly method of remediating contaminated sites (16).

Unlike organic contaminants which can be completely mineralized to harmless byproducts, heavy metals cannot be degraded and therefore present a particular problem for remediation efforts (16). Desulfovibrio species, as well as other environmentally relevant bacteria such as Shewanella and Geobacter, are able to precipitate toxic heavy metal ions through a biotic process referred to as dissimilatory metal reduction (15). In this process, metal ions serve as terminal electron acceptors coupled to the oxidation of an electron donor, usually without accompanying growth. In the case of most non-ferrous metals, the reduction of the metal ion to a lower oxidation state involves the formation of less soluble or insoluble metal complexes. Immobilization of the metal contaminant in this way simplifies traditional remediation efforts or, if further remediation is not feasible, ensures that the metal contaminant is less biologically active. Metals which can be remediated by dissimilatory metal reduction include base metals such as chromium, vanadium, molybdenum and manganese as well as radionuclides such as uranium (17, 19, 20).

Dissimilatory metal reduction by Desulfovibrio has been shown to be a biotic process involving the periplasmic cytochrome c3 protein (17, 50, 51). Cytochrome c3 is a low-potential electron carrier originally isolated from Desulfovibrio species and is the most abundant cytochrome in those organisms (32). The current model suggests that a cytochrome c3 network in the periplasm serves as the primary electron transfer pathway for many metabolic processes, including the oxidation of H2 and dissimilatory metal reduction (26, 50, 51).

1.3.2.4. Oxygen Metabolism

Despite their classification as strict anaerobes, Desulfovibrio species possess a remarkably

complex oxidative stress defense system (52). Several species have been shown to encode superoxide

dismutase (53, 54) and catalase (55), and these genes have been identified from the available

Desulfovibrio genomic sequences. Membrane-bound oxygen-reducing respiratory chains involving

terminal cytochrome oxidase complexes have also been identified (56). A putative O2-sensing chemotaxis protein, DcrA, that has been identified in Dv. vulgaris (57-60) is located near an operon encoding the oxygen defensive genes rubredoxin and rubredoxin oxidoreductase (superoxide reductase) (61, 62). These two genes have been linked to polyglucose metabolism in Dv. gigas (34).

1.4. Computational Methods for Sequence Analysis

Partial or complete genomic sequences of over 1000 organisms were available at the time of

this writing (Aug. 2004). The accompanying growth of the sequence databases increases the utility of

computational methods of function assignment.

1.4.1. Determination of Open Reading Frames

The first step in the annotation of a genome is the determination of the regions encoding

putative functional macromolecules. DNA sequences predicted to encode functional proteins are

referred to as open reading frames (ORFs). The simplest method for the prediction of ORFs is to

search the genome for in-frame start and stop codons separated by an arbitrary length (63). This

produces a set of coding sequences (CDS) that captures most true genes, but randomly occurring instances of start and stop codons result in a significant number of false positives that do

not correspond to legitimate CDS. It has been estimated that these “phantom ORFs” can account for

10-30% of the recognized ORFs of a genome (63). Current ORF-finding algorithms alleviate this

problem to an extent by using statistical methods to predict the true translational start site (64-66) but

are unable to eliminate false positives entirely. Algorithm-independent problems with ORF detection

can include the following: 1) shifting of start or stop codons out of frame due to sequencing errors or

legitimate frameshifts that causes a CDS to not be detected, 2) sequencing error in a start or stop codon

that causes a CDS to not be detected, 3) the fusion of two biochemically distinct genes into a single

putative CDS, 4) the separation of a biochemically relevant multidomain protein into separate CDS,

and 5) phantom ORFs that cannot be easily eliminated by computational means. Lost or misleading

ORFs resulting from sequencing errors can be difficult to resolve using purely computational means

and must often be corrected manually (67). Phantom ORFs, on the other hand, are more amenable to

computational analysis. If a putative ORF results purely from random chance, it can be assumed that

the ORF sequence will not display significant conservation across species and may thus be analyzed by

sequence homology and comparative genomics techniques (63).
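The naive ORF search described above can be illustrated with a minimal sketch. It is not the algorithm used by any particular gene finder; it assumes ATG as the only start codon, scans the forward strand only, and uses a hypothetical `min_codons` parameter as the arbitrary length cutoff.

```python
from typing import List, Tuple

START = "ATG"                      # assumption: ATG is the only start codon considered
STOPS = {"TAA", "TAG", "TGA"}      # standard stop codons


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for naive ORFs on the forward strand.

    start is the 0-based index of the start codon and end is the index just
    past the stop codon. min_codons is the arbitrary length cutoff, counted
    in codons from the start codon up to (not including) the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # first start codon seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

Because any in-frame start and stop pair of sufficient length is reported, a search of this kind will also return the phantom ORFs discussed above; lowering `min_codons` increases their number.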

1.4.2. Elucidation of Protein Function by Sequence Homology

Once the set of ORFs has been defined for a genomic sequence, the next step in the annotation is to assign putative functions to each predicted gene product. Functional assignment of gene products permits the construction of the regulatory and metabolic networks of the organism. A variety of algorithms have been developed for the comparison of unassigned sequences to those available in the publicly accessible sequence databases for the purposes of functional identification.

Functional assignment of a sequence is based on the assumption that proteins of similar

function will maintain a degree of sequence similarity between organisms. Computational sequence

analysis algorithms are designed to quantify the degree of similarity and are evaluated by

computational speed, sensitivity (detection of similarities) and selectivity (rejection of false positives)

(68). A given algorithm represents a tradeoff between these three parameters. Global pairwise

sequence alignment algorithms (69, 70) display a relatively high degree of accuracy at the cost of

speed. Conversely, local pairwise alignment algorithms such as BLAST (71, 72) search for regions of

local similarity. Local alignment is less computationally expensive, but this speed comes at the cost

of lower selectivity (73). Furthermore, the high-scoring segment pairs (HSPs) produced by the

BLAST algorithm represent only local similarity. As such, the presence of repeated fragments,

homologues and pseudogenes can result in redundant HSPs in the BLAST output (74). Despite these

deficiencies, the speed and sensitivity of BLAST analysis make this algorithm the popular choice for

routine sequence comparisons, particularly when one considers the increasing size of the public

sequence databases.

The primary task of any sequence comparison algorithm is to quantify the degree of similarity

between multiple sequences. A simple indicator of sequence similarity is the percent identity between

two sequences, defined as the percentage of aligned positions at which the two sequences share identical residues. However,

functional assignment by this measure is problematic for sequence pairs displaying low identity.

Transition to the “twilight zone” of 20-35% identity comes at the cost of a drastic decrease of

selectivity with an accompanying increase in the number of false positives (75). Statistical methods

for quantifying the degree of similarity have been shown to be more accurate and reliable and a variety

of measures have been developed (68, 71, 72, 74, 76-78). The statistics of ungapped local alignment

by BLAST are well-defined (72) and experiments suggest that the theory remains valid for gapped

alignments as well (72, 79). The raw score S of an HSP is dependent on the scoring matrix used, the

assigned gap penalties, and the lengths of the two sequences compared. Because of this dependence on

sequence length, a high raw score may result from the alignment of long sequences as opposed to significant homology between the sequences (78). The normalized score S’ (in bits) is defined by the equation:

\[ S' = \frac{\lambda S - \ln K}{\ln 2} \]

where K and λ represent statistical parameters determined for each individual analysis by the BLAST algorithm (71). Normalizing the raw score permits the statistical comparison of different scoring systems (72). The number E of distinct HSPs of score ≥ S’ expected to occur by chance is determined by the equation:

\[ E = mn\,2^{-S'} \]

where m and n are the lengths of the sequences being compared (72). The value E is often referred to as the expect score, or E score. The probability P (referred to as the P-score) of at least one HSP occurring by chance with score ≥ S’ is related to E by the equation:

\[ P = 1 - e^{-E} \]

When E < 0.01, E ≈ P.
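The relationship between the raw score, the bit score, the E score and the P-score can be made concrete with a short sketch. The function names are illustrative only, and the values of λ and K in the usage note are assumed example values rather than parameters taken from the analyses in this work.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, m: int, n: int) -> float:
    """E = m * n * 2^(-S'), where m and n are the two sequence lengths."""
    return m * n * 2.0 ** (-bits)


def p_value(e: float) -> float:
    """P = 1 - exp(-E); expm1 keeps precision when E is very small."""
    return -math.expm1(-e)
```

As a usage example, `expect_value(bit_score(100, 0.267, 0.041), 300, 1_000_000)` would give the expected number of chance HSPs for a raw score of 100 under the assumed parameters, for a 300-residue query against a database of one million residues. For small E, `p_value(e)` returns a value nearly identical to `e`, consistent with the approximation E ≈ P.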

Despite the speed advantage of pairwise methods, the amount of information available from these methods is limited (80). For protein sequence pairs sharing 20-30% sequence identity, only half are detected by pairwise methods (76). Alternatives to pairwise methods are based on comparisons against shared characteristics of multiple related sequences (81). A variety of such techniques have been developed, including position-specific-iterated (PSI) BLAST (72), hidden Markov models (82, 83), profile analysis (84) and intermediate sequence searches (85). These methods result in significantly fewer false negatives in the twilight zone of 20-30% sequence identity but come at a higher computational cost. A full-scale annotation effort would nonetheless employ multiple comparison techniques in order to obtain a more complete picture of the functional gene set of the genome.

For the analyses described in this paper, homology between protein sequences was examined using the BLASTP algorithm. Due to the degeneracy of the genetic code and the large number of amino acid residues compared to nucleotide residues (20 compared to 4), a protein sequence would be expected to encode more information than the corresponding nucleotide sequence (86). The BLASTP algorithm was chosen because of its high speed and selectivity, which can allow for the detection of statistically significant homology at bit scores as low as the 55-70 range (86). For analyses where

protein-protein sequence comparisons were not possible because the coding regions of a genomic

sequence were not defined (i.e. an early version of a genomic sequence where the predicted proteome

has not yet been defined), the TBLASTN (protein sequence against translated nucleotide sequence)

algorithm was used instead.

1.4.3. Detection of Orthologs

The computational comparison of protein sequences is based on the assumption that

structurally related sequences in disparate organisms will carry out similar functions. Proteins derived

from a common ancestor, referred to as homologs, can be divided into two categories based on the

nature of their evolutionary relationship: orthologs (proteins in different organisms resulting from

speciation) and paralogs (proteins from the same organism resulting from gene duplication) (87). A

common method for the assignment of orthologs is to use reciprocal best hit (rbh) methods. In this

method, a sequence i from genome I is compared to genome J and the top hit j from genome J is compared again to the original genome using the same conditions. If the original query sequence i is

returned, the two sequences are considered to be orthologs. The primary problem with this method is

that the highest scoring hit for a given sequence does not necessarily represent the closest phylogenetic

neighbor (88). The presence of paralogs can further complicate analysis by causing the exclusion of

authentic orthologous pairs (89). Despite this drawback, rbh remains a fast and convenient method for

the initial determination of orthologs. Other methods such as reciprocal smallest distance (rsd) (89) can alleviate some of these errors at the cost of additional computations.
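The reciprocal best hit procedure can be sketched as follows. This is a minimal illustration and not the implementation used in this work; it assumes that the search results for each direction have already been collected into dictionaries mapping each query to the bit scores of its hits, and the names `Hits`, `best_hit` and `reciprocal_best_hits` are hypothetical.

```python
from typing import Dict, List, Tuple

# query identifier -> {subject identifier: bit score}
Hits = Dict[str, Dict[str, float]]


def best_hit(hits: Hits) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    return {q: max(subjects, key=subjects.get)
            for q, subjects in hits.items() if subjects}


def reciprocal_best_hits(i_vs_j: Hits, j_vs_i: Hits) -> List[Tuple[str, str]]:
    """Return pairs (i, j) where j is the top hit of i and i is the top hit of j."""
    best_ij = best_hit(i_vs_j)
    best_ji = best_hit(j_vs_i)
    return sorted((i, j) for i, j in best_ij.items() if best_ji.get(j) == i)
```

A sequence whose top hit in genome J scores higher against a different sequence in genome I is dropped by this test, which is how paralogs can cause authentic orthologous pairs to be excluded.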

Gene duplication is considered to be one of the largest contributors to the diversity of function in protein superfamilies (90). When a gene is duplicated, it is assumed that one of the genes will continue to encode a protein carrying out the original function while the second would be under less selective pressure to maintain that specific function. The second copy of the gene would therefore have more freedom to evolve novel functions (91). As such, orthologs (speciation) will tend to encode proteins with similar functions while paralogs (duplication) are more likely to encode proteins of divergent function (91). This is not always true, however, as orthologous proteins have been shown to carry out different functions (92). Phylogenetic profiling (functional assignment based on phylogenetic methods, see below) is an increasingly valuable tool in elucidating the functions of homologous proteins (91, 93).

1.4.4. Functional Assignment by Contextual Analysis

Sequence homology alone is often insufficient to determine function (87). For example, it has

been shown that mechanistic diversity does not always require significant divergence of sequence, nor

does high sequence identity guarantee that two proteins will share the same function (87, 94). To aid

in the functional assignment of proteins, investigators often employ ‘guilt by association’ techniques

that attempt to elucidate a protein function based on associations with other macromolecules. The

most general form of contextual analysis is phylogenetic profiling. A phylogenetic profile is defined as the pattern of occurrence of orthologs of a particular gene in a set of genomes under comparison (93, 95, 96) and is the basis for the Clusters of Orthologous Groups (COG) database (97). The underlying assumption of phylogenetic profiling is that functionally linked proteins have homologs in the same set of organisms due to correlated evolution, e.g. genes encoding flagellar proteins should be present in bacteria utilizing

flagella (96).
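The idea of a phylogenetic profile can be illustrated with a small sketch in which each gene is represented by a tuple of presence/absence calls across a fixed, ordered set of genomes. The representation, the function names and the `max_distance` tolerance are assumptions made for illustration; published methods use more elaborate scoring.

```python
from typing import Dict, List, Tuple

# One boolean per genome, in a fixed genome order: True if an ortholog is present.
Profile = Tuple[bool, ...]


def profile_distance(a: Profile, b: Profile) -> int:
    """Number of genomes in which the two presence/absence calls differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(a, b))


def co_occurring(profiles: Dict[str, Profile], query: str,
                 max_distance: int = 0) -> List[str]:
    """Genes whose profile differs from the query's in at most max_distance genomes."""
    target = profiles[query]
    return sorted(gene for gene, prof in profiles.items()
                  if gene != query and profile_distance(target, prof) <= max_distance)
```

A gene of unknown function whose profile matches those of known flagellar genes, for example, would be flagged as a candidate member of the same functional group, subject to verification by other methods.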

A second method of contextual analysis is based on the assumption that in bacteria, functionally related genes will organize into operons and that these operons will be conserved to an extent across species (95, 98, 99). Despite the variability of the composition of operons across species, the observation that a set of genes tends to cluster in operons suggests a functional link between the genes even in organisms where the genes are spatially separated. Furthermore, the observation of unique operon arrangements involving functionally known and unknown genes can be used to imply a function for the unknown genes (98).

Finally, the presence of protein domain fusions can be used to define a functional link between proteins. A multidomain enzyme resulting from a gene fusion of two consecutive enzymes of a pathway that facilitates the kinetic coupling of the two enzymes will likely be maintained by selection (98). When the domains of such a protein fusion are observed to be encoded by separate genes in a second organism, the two spatially separated genes can tentatively be assumed to maintain a functional link. For example, the adhE gene of E. coli encodes a multidomain enzyme catalyzing the

sequential reduction of acetyl-CoA to acetaldehyde to ethanol (100). The protein contains an

aldehyde:NAD+ oxidoreductase domain for the first step of the reaction and an ethanol:NAD+ oxidoreductase domain for the second step. In Salmonella typhimurium, individual homologs of these domains (EutE and EutG) carry out a similar reaction sequence as part of the ethanolamine metabolism pathway and the genes encoding these enzymes are part of a larger eut operon (101). Thus, it is logical to assume that the adhE gene of E. coli is the result of a fusion of genes encoding two originally distinct enzymes.

This also demonstrates that the assignment of function by analysis of fusion events often requires additional assignment methods for verification.

1.4.5. Pitfalls of Functional Assignment by Sequence Homology Methods

Efforts to predict protein function from sequence are complicated by the fact that protein

function is an ill-defined concept (102). Individual proteins rarely carry out a single function but

rather multiple functions based on interactions with other proteins or changing physiological

conditions. Proteins often display different functions at different levels of cellular organization

ranging from the biochemical and cellular to, in the case of multicellular organisms, the developmental

and tissue levels (102). Conversely, similar functions may be mediated by different protein structures that evolved independently over time (103). Efforts such as the Gene Ontology (GO) project (104) have been made to categorize the multifunctional nature of proteins through the establishment of a standardized nomenclature, but the establishment of controlled vocabularies is difficult (102).

An example of such multifunctional proteins that may have relevance to the sulfate-reducers is the phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS). The traditional role of the

PTS is for the phosphorylation and transport of carbohydrates into the cell (105). The

phosphorylation of the carbohydrate occurs through the sequential transfer of phosphate from

phosphoenolpyruvate through the common PTS enzymes (EI and HPr) and the carbohydrate-specific

enzymes/domains (EIIA, EIIB) (PEP→EI→HPr→EIIA→EIIB→Carbohydrate) (105). Carbohydrate transport is not the only function of the system however. The glucose-specific EIIAGlc subunit has

been shown to play a role in the regulation of the usage of non-PTS carbon sources in enteric bacteria (105). EIIAGlc is phosphorylated during the course of phosphoryl transfer to glucose, but the degree of

phosphorylation of the molecular population is determined by the flux through the system. When the

cell is actively transporting glucose, the nonphosphorylated form will predominate and has been shown

to bind to transporters of non-PTS carbon sources. The binding of EIIAGlc inhibits the uptake of these

carbon sources and this process is referred to as inducer exclusion. In cells growing on non-PTS

carbon sources, the phosphorylated EIIAGlc predominates and the binding of this protein to adenylate cyclase increases the production of cAMP, which is needed for the expression of several catabolic enzymes (105). Other

functions attributed to the PTS enzymes are alternative phosphorylation events with TCA cycle

enzymes, a linking of carbon and nitrogen metabolism involving protein-protein interactions with the

σ54 subunit of RNA polymerase, and signal transduction linked to chemotaxis (105). The example of the PTS enzymes shows that it is often difficult to assign complete functions to even well-characterized enzymes.

Sequence homology methods such as BLAST detect structural homology from which functional homology is implied. This is based on the assumption that proteins of similar structure will carry out similar functions. As has been seen, this is not always a safe assumption. It is well established that function tends to diverge more rapidly than structure (106). As such, several efforts have been made to determine the efficiency of functional annotation transfer using sequence homology methods. Pairs of proteins sharing 30% or greater sequence identity are presumed to have similar structures, but the sequence-structure relationship becomes problematic in the twilight zone of <25% sequence identity (75, 106). Several studies have attempted to link structural and functional homology, particularly in this twilight zone of low sequence identity. One method for the assessment of functional transfer is to examine the computational transfer of the Enzyme Commission (EC) number

of a known enzyme to an enzyme of unknown function. Devos, Wilson and Todd conducted all-

against-all pairwise sequence comparisons under various conditions and sequence identity cutoffs and

concluded that 40% sequence identity was a confident threshold for assessing functional conservation

(90% accuracy when transferring all four EC numbers) (103, 107, 108). However, it has been argued

that the ‘gold standard’ databases (e.g. SWISSPROT) which are used to assess functional

characterization represent only a small subset of enzyme function (102). Rost constructed a dataset of

SWISSPROT sequences grouped into sequence families designed to reduce sequence bias and

concluded from his results that below 70% sequence identity, all four EC numbers begin to diverge

(<90% conservation) (102). Conversely, Tian argued that while Rost’s method is closer to the truth,

using only sequence similarity to construct sequence families might result in misassignments (106).

For a simple all-against-all pairwise comparison, Tian showed that the first three EC numbers can be transferred 90% of the time when the global sequence identity is above 30% (106). Furthermore, Tian confirmed Rost’s conclusion that enzyme function conservation is lower than previously anticipated. However, it was shown that by grouping enzyme sequences by both structural and functional similarity, the first

three EC numbers could be transferred 90% of the time when the global sequence identity was greater

than 40% (106).

As described in the introduction, statistical methods such as the E score are superior to

sequence identity scores in detecting remote homology (76). However, these scoring methods must be

used with some caution. For the first iteration of a PSI-BLAST analysis (corresponding to a single

BLASTP analysis), the first three digits of an EC number are transferred 97% of the time when the E

score is less than e-100 and 88% when the E score is between e-60 and e-50 (106). In contrast, by the third

PSI-BLAST iteration the transfer rate when the E score is between e-60 and e-50 falls to 47% (106). Thus, E score alone is insufficient for measuring the transfer of function and should be coupled with other scores such as % identity.

1.4.6. Constructing Biochemical Networks from Genomic Sequence Data

1.4.6.1. Metabolic Networks

Once a genomic sequence set has been functionally annotated, the next step is to use the data for individual genes and gene products to construct putative metabolic networks. Several tools are available for the reconstruction of metabolic networks from sequence data, including the MetaCyc

(109) and KEGG (110) databases. Despite the increasing sophistication of the computational tools, pathways reconstructed from genomic data often contain gaps due to “missing” enzymes (111, 112).

Such gaps may result from several phenomena: 1) inability of computational tools to detect homology with known enzyme sequences, 2) the presence of analogous enzymes that carry out the same cellular function but lack sequence homology (113, 114), and 3) legitimate biochemical gaps in the pathway

(111). For example, it has been biochemically demonstrated that Dv. vulgaris employs a modified pathway to synthesize heme. In this modified pathway, the traditional step catalyzed by HemE (uroporphyrinogen III to coproporphyrinogen III) is replaced by multiple steps (uroporphyrinogen III to precorrin-2 to 12,18-didecarboxyprecorrin-2 to coproporphyrinogen III) (114) and the enzymes associated with these activities are unknown. In such a case, the lack of any sequence or additional biochemical data regarding the enzyme makes detection of the enzyme by computational means impossible (111).

While the computational reconstruction of the metabolic pathways of an organism is a useful tool for predicting the biochemical networks of an organism, these networks must be considered putative until verified by biochemical analysis.

1.4.6.2. Regulatory Networks

Just as metabolic networks can be computationally predicted from genomic data, so too may

regulatory networks be predicted. The set of genes in an organism whose expression is known to be

regulated by a specific regulatory protein is referred to as a regulon, i.e. all of the genes in E. coli known to be regulated by the arginine repressor ArgR are part of the E. coli ArgR regulon. Over short evolutionary distances, conservation of regulons is assumed (115). Putative regulons can be constructed for unknown organisms based on sequence homology of the individual regulon members, but no actual regulatory information is conferred by this method. Contextual analysis can often infer a regulatory function based on the proximity of the gene in question to genes encoding proteins of known function (116). While contextual analysis adds a degree of confidence to regulon prediction, the actual regulatory information inferred is still limited. An additional degree of verification can be obtained by the examination of promoter regions and the analysis of regulatory binding sites to determine biochemically-relevant regulatory information.

Transcriptional regulation of gene expression is based on the idea of a specific regulatory protein binding to a conserved DNA sequence within the promoter region of the target gene and the regulatory response can be either positive or negative depending on cellular conditions. The expression of certain genes can be further modulated through the activities of multiple regulatory proteins binding distinct promoter elements and/or through protein-protein interactions that affect the activity or binding specificity of regulatory proteins. The specificity of regulatory protein binding to

DNA requires a specific interaction between a conserved DNA-binding motif of the protein and a consensus binding site in the nucleotide sequence. Much research has been devoted to the prediction of binding sites within a genome with limited results (117). Current methods instead employ comparisons of multiple genomic sequences to derive regulatory motifs that are conserved between or among the species (115, 118-122).

Regardless of the method used, the regulon prediction requires information on the number of transcription units encoded by the genome. A transcription unit (TU) is defined as the set of one or more genes that are cotranscribed. In most eukaryotes, all of the organism’s genes are monocistronic, that is, each gene corresponds to a single transcription unit. The monocistronic nature of eukaryotes

simplifies the prediction of both TUs and regulatory motifs, as the number of instances of a motif in the genome will be maximal (118). This process is complicated in prokaryotes where genes are often cotranscribed as operons, with an operon being defined in this context as the subset of TUs containing two or more genes (123). Operons complicate the identification of true promoter sequences while simultaneously decreasing the number of instances of a motif within the genome (118). Thus, the comparative genomics methods already described are particularly useful for the prediction of motifs from bacterial genomes.

The simplest method for the prediction of transcription units is to cluster genes based on intergenic sequence length and directionality (123). For this method, an arbitrary basepair length cutoff is determined and two genes separated by an intergenic sequence of length less than this cutoff are considered to be cotranscribed. Directionality is also important as all of the genes of an operon are assumed to be transcribed in the same direction (123). While this method is somewhat simplistic, it has been shown to be a quite accurate prediction method, particularly when coupled with contextual analysis methods (116).
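The clustering of genes into transcription units by intergenic distance and directionality can be sketched as follows. The `Gene` record, the function name and the default cutoff of 50 bp are hypothetical and chosen only for illustration; the cutoff in particular is arbitrary, as noted above, and is not the value used in this project.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based coordinate of the leftmost base
    end: int     # 1-based coordinate of the rightmost base
    strand: str  # "+" or "-"


def predict_transcription_units(genes: List[Gene], max_gap: int = 50) -> List[List[Gene]]:
    """Group adjacent genes that share a strand and lie closer than max_gap bp."""
    units: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if units:
            prev = units[-1][-1]
            gap = gene.start - prev.end - 1  # negative when the genes overlap
            if gene.strand == prev.strand and gap < max_gap:
                units[-1].append(gene)
                continue
        units.append([gene])  # start a new transcription unit
    return units
```

Units containing two or more genes correspond to predicted operons in the sense defined above; single-gene units are predicted monocistronic transcripts.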

For this project, transcription units were determined based on intergenic sequence length and contextual analysis. Orthologs were determined by the reciprocal best hit technique using the

BLASTP algorithm.

1.5. Summary of Project

The accumulated biochemical knowledge of the carbon metabolism of Desulfovibrio species suggests that at least some species encode all of the enzymes of the glycolytic and gluconeogenic pathways despite the inability of these organisms to metabolize hexose sugars as carbon sources. One model suggests that the cells are unable to grow on hexoses due to the lack of hexose transport molecules and that the pathway’s primary purpose is the synthesis of polysaccharides. The genomic sequences of two Desulfovibrio strains were analyzed and the central carbon metabolism pathways were elucidated.

Putative carbon metabolism and carbon transport proteins were identified using the BLASTP algorithm

(72) supplemented by contextual analysis. The pathways constructed from this information were

compared to the biochemical literature to produce a more complete picture of central carbon metabolism by Desulfovibrio.

A seemingly complete galactose metabolism pathway was identified from Dv. desulfuricans

G20 despite the lack of evidence for growth by this organism on galactose. It was hypothesized that the organism carries out the metabolism of galactose for the purposes of production of extracellular polysaccharides and experiments were conducted to test this hypothesis.

Desulfovibrio species are classified as strict anaerobes but are remarkably tolerant to exposure to molecular oxygen. This fact, coupled with the diversity of terminal electron acceptor usage by

Desulfovibrio species, suggested that regulatory proteins of the FNR family might play a significant regulatory role in the organisms. The genomic sequences of two Desulfovibrio species were searched for instances of genes encoding putative FNR proteins. A mutant strain of Dv. vulgaris lacking the gene for an FNR protein common to the δ-proteobacteria was constructed and phenotypic analyses were conducted to determine the function of this regulator. Further computational analyses were conducted to predict other regulatory networks that may be important to the organism.

2. Comparative Carbon Metabolism of Desulfovibrio vulgaris Hildenborough and Desulfovibrio desulfuricans G20

2.1. Introduction/Rationale

Prior to the advent of genomic technology, the accumulated data on carbon metabolism of

sulfate-reducing species were derived primarily from biochemical and physiological studies. With the

completion of the genomic sequences of multiple SRB species and the development of tools for the

genetic manipulation of the organisms, the collected biochemical and physiological data may now be

examined in a genetic context. At the time of this writing, the partial or complete genomic sequences

of the following SRB species are now available: Desulfovibrio vulgaris Hildenborough, Desulfovibrio

desulfuricans G20, Desulfotalea psychrophila, Desulfobacterium autotrophicum and the archaeal

species Archaeoglobus fulgidus (Table 1-1). In addition to the SRB, genomic sequences of several

related δ-proteobacteria are also available. A comparative analysis of the metabolic pathways of these

organisms using the accumulated genomic, biochemical and physiological data has provided insights

into the metabolic and regulatory networks of not only the SRB specifically but the δ-proteobacteria in

general.

Annotation of the Desulfovibrio genomic sequences was carried out by a combination of individual laboratories and sequencing centers. Preliminary annotation of the genomic sequence of

Dv. vulgaris Hildenborough was performed by The Institute for Genomic Research (TIGR) with input from the laboratories of Dr. Judy D. Wall (University of Missouri) and Dr. Gerrit Voordouw (University of Calgary). The annotation of Dv. desulfuricans G20 was carried out by the annotation groups from the United States Department of Energy Joint Genome Institute (DOE-JGI) and Oak

Ridge National Laboratory (ORNL) with input from the laboratory of Dr. Judy Wall. Additional annotations for the δ-proteobacteria and the development of computational tools for genomic

comparisons were provided by the Virtual Institute for Microbial Stress and Survival (VIMSS) through

DOE-JGI.

The genomic sequence data from the Desulfovibrio species were used to elucidate the carbon metabolism pathways of the organisms and these data were compared and contrasted to the accumulated knowledge base of Desulfovibrio metabolism. Based on these analyses, laboratory experiments were designed to determine if insights gleaned from the genomic data translated into predicted physiological phenomena.

2.2. Materials and Methods

BLAST analysis of Desulfovibrio. Carbon metabolism proteins of Desulfovibrio were

identified by sequence homology using stand-alone BLAST v2.2.8 (72). To test the reliability of

BLAST, sets of experimentally-verified control sequences (protein and nucleotide) were constructed

based on the following criteria: 1) sequences from Dv. vulgaris Hildenborough (exact match), 2)

sequences from the related strain Dv. vulgaris Miyazaki and 3) sequences from other Desulfovibrio

strains that were not predicted to be present in Dv. vulgaris Hildenborough. Comparisons were made

between the control sequences and the appropriate predicted sequence set from Dv. vulgaris (protein

vs. protein using BLASTP, nucleotide vs. nucleotide using BLASTN) using default parameters for

each analysis. While identification was possible using nucleotide data, the number of false positives

was significantly larger than for protein-protein comparisons due to the large number of high-scoring

pairs (16+ nucleotides in this case) generated by BLAST. Due to this phenomenon and the fact that

protein sequences are expected to contain more information than corresponding nucleotide sequences

due to the degeneracy of the genetic code, protein-protein comparison was chosen for the full analysis.

BLAST parameters (BLOSUM62 scoring matrix, low-complexity filter off) were determined through

additional comparison. The results of these analyses are listed in Appendices A1-5. BLASTP

parameters and statistical cutoffs of E score <1e-10 were chosen based on this control data (see Results).

Carbon metabolism proteins of Desulfovibrio were determined by comparison of either predicted E. coli proteome members or experimentally-verified sequences obtained from the NCBI database using

BLASTP with the previously determined parameters and statistical cutoffs. The results of these analyses are listed in Table 2-4.
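The cutoff step described above can be sketched as a small filter over tabular BLASTP output. This is a minimal illustration, not the script used in this study: it assumes the common 12-column tab-delimited hit layout (query, subject, % identity, alignment length, mismatches, gap openings, query start/end, subject start/end, E value, bit score), and the names `Hit`, `parse_tabular` and `significant` are hypothetical.

```python
from typing import NamedTuple

E_CUTOFF = 1e-10  # statistical cutoff stated in the text


class Hit(NamedTuple):
    query: str
    subject: str
    pct_identity: float
    evalue: float
    bit_score: float


def parse_tabular(lines):
    """Parse 12-column tab-delimited BLAST hits (assumed column layout)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11])))
    return hits


def significant(hits, cutoff=E_CUTOFF):
    """Keep only hits whose E value is strictly below the cutoff."""
    return [h for h in hits if h.evalue < cutoff]
```

Applied to a results file, `significant(parse_tabular(open(path)))` would return the hits retained for the pathway reconstruction; weaker hits can be inspected separately by passing a looser `cutoff`.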

Bacterial strains and culture methods. Wild-type strains of Desulfovibrio desulfuricans

G20 were used in this study (Table 2-1). Overnight cultures of the bacteria were grown in LS4a medium (Table 2-2), defined as follows: 50 mM sodium sulfate, 60 mM sodium DL lactate, 8 mM

MgCl2, 60 mM CaCl2, 20 mM NH4Cl, 1.0 mM K2HPO4 (dibasic), 1x trace mineral solution as described below, 30 mM PIPES (pH 7.0), 1g/l yeast extract, pH 7.2. After autoclaving, the medium

was supplemented with Thauer’s vitamins solution (described below, final concentration 1x). Media for growth of Dv. desulfuricans G20 were additionally supplemented with nalidixic acid to a final concentration of 200 µg/ml to reduce growth of contaminants. Defined LS4a medium lacking yeast

extract was designated LS4aD.
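For bench preparation, the millimolar concentrations in the LS4a recipe can be converted to grams per liter. The sketch below is illustrative only: it covers three of the listed salts and assumes anhydrous reagents, because the hydration state of the salts actually used is not stated in the text. The molar masses are standard values; the names `MOLAR_MASS`, `LS4A_MM` and `grams_per_liter` are hypothetical.

```python
# Molar masses in g/mol for the ANHYDROUS salts (assumption: the hydration
# state of the reagents is not given in the recipe).
MOLAR_MASS = {"Na2SO4": 142.04, "NH4Cl": 53.49, "K2HPO4": 174.18}

# Final concentrations in mM, taken from the LS4a recipe in the text.
LS4A_MM = {"Na2SO4": 50.0, "NH4Cl": 20.0, "K2HPO4": 1.0}


def grams_per_liter(conc_mm, molar_mass):
    """Convert a millimolar concentration to grams per liter."""
    return conc_mm / 1000.0 * molar_mass


if __name__ == "__main__":
    for salt, conc in LS4A_MM.items():
        print(f"{salt}: {grams_per_liter(conc, MOLAR_MASS[salt]):.3f} g/l")
```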

Trace mineral solution is defined as follows: 12.8 g/l nitrilotriacetic acid (pH 6.5), 1.0 g/l

FeCl2·4H2O, 0.5 g/l MnCl2·4H2O, 0.3 g/l CoCl2·4H2O, 0.2 g/l ZnCl2, 0.05 g/l Na2MoO4·4H2O, 0.02 g/l

H3BO3, 0.1 g/l NiSO4·6H2O, 0.002 g/l CuCl2·2H2O, 0.006 g/l Na2SeO3·5H2O, 0.008 g/l Na2WO4·2H2O.

The solution was prepared by adding nitrilotriacetic acid to dH2O and adjusting the pH to 6.5 with

HCl. The remaining minerals were added in the order listed and the pH was readjusted to 6.5 with

NaOH. The solution was stored at 4ºC until added to the medium prior to autoclaving.

10 x Thauer’s vitamin solution is defined as follows (124): 0.02 g/l biotin, 0.02 g/l folic acid,

0.1 g/l pyridoxine HCl, 0.05 g/l thiamine HCl, 0.05 g/l riboflavin, 0.05 g/l nicotinic acid, 0.05 g/l DL-

pantothenic acid, 0.05 g/l p-aminobenzoic acid, 0.05 g/l lipoic acid (DL-6,8 thioctic acid), 2.0 g/l

choline chloride and 0.01 g/l vitamin B12. The pH of the solution was adjusted to 7.0 with KOH, filter

sterilized and stored at 4ºC in ~50 ml aliquots.

All media were reduced with 30 ml/l titanium citrate prepared as follows: 500 ml of 0.2 M

Na-citrate solution was boiled under N2 for 20 min followed by the addition of 37.5 ml 20% (wt/vol)

Ti-chloride (TiCl3 in 0.2 M HCl) and 100 ml 8% (wt/vol) sodium carbonate. The flask was stoppered and the stopper was secured by wiring and autoclaved. Upon cooling to room temperature, the mixture was distributed into 50 ml aliquots. Sodium carbonate solution (8% wt/vol) was prepared by boiling under CO2 for 20 min followed by filter sterilization. Titanium citrate additions were made in an

anaerobic chamber to 5 ml/l.

Growth of Dv. desulfuricans G20 on secondary carbohydrates. Overnight cultures of Dv. desulfuricans G20 were grown to an OD600 of 0.5-0.7 on LS4a medium as defined above. Basal no-carbon medium (NCS4) for the analysis of growth on carbohydrates consisted of LS4aD medium minus lactate (Table 2-2). For experiments testing fermentative growth on carbon sources, a modified

NCS4 medium lacking high levels of sulfate was used (NC4). Sealed Hungate tubes were autoclaved and flushed with 100% N2 to purge the tubes of oxygen. Carbon source stock solutions (2 M concentrations) were prepared separately and filter sterilized. The carbon sources were added to the appropriately labeled Hungate tubes with the final concentration of the carbon sources in 10 ml medium as follows: lactate (60 mM), galactose (10 mM), glucose (10 mM), fructose (10 mM), or trehalose (10 mM). For experiments involving mixotrophic growth on acetate, the appropriate medium was supplemented with sodium acetate (2 M stock) to 1 mM. The overnight cultures were harvested by centrifugation at 2000 x g for 10 min at 4°C, resuspended in an equal volume of NCS4 and incubated at 37ºC for ~2-3 h prior to inoculation to allow residual carbon stores to be consumed. Carbon-depleted cells were diluted 1:20 into stock solutions of NCS4 or NC4 medium (as appropriate) and 10 ml of inoculum was added to the appropriate tubes by syringe in an anaerobic chamber. Two 1 ml aliquots of the inoculum and uninoculated NCS4 medium were saved for initial time total protein assays (125). Cultures were incubated at 37ºC and growth was followed by optical density measurements at 600 nm using a Genesys 20 Thermospectronic spectrophotometer. Aliquots of 1 ml were periodically removed by syringe to monitor growth by total protein assays.
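The stock volumes implied by these final concentrations follow from C1V1 = C2V2. The sketch below assumes that the 10 ml culture volume is the final volume and that the added stock is counted within it; the function name `stock_volume_ml` is hypothetical.

```python
def stock_volume_ml(final_mm, final_volume_ml, stock_mm=2000.0):
    """Volume of stock (ml) giving final_mm in final_volume_ml, from C1*V1 = C2*V2.

    stock_mm defaults to 2000 mM, i.e. the 2 M stocks described in the text.
    """
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    return final_mm * final_volume_ml / stock_mm


if __name__ == "__main__":
    # Final concentrations (mM) listed in the text for 10 ml cultures.
    targets = {"lactate": 60.0, "galactose": 10.0, "acetate": 1.0}
    for name, conc in targets.items():
        print(f"{name}: {stock_volume_ml(conc, 10.0) * 1000:.0f} µl of 2 M stock")
```

Under these assumptions, 60 mM lactate corresponds to 300 µl of stock, 10 mM of a sugar to 50 µl, and 1 mM acetate to 5 µl per 10 ml tube.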

[1-14C]galactose incorporation assay. Galactose incorporation into cells grown on lactate-sulfate (60 mM/50 mM) was measured using [1-14C]galactose. An overnight culture of Dv. desulfuricans G20 was added to 20 ml NCS4 medium containing 60 mM lactate and 10 mM galactose

(LGS4), to obtain a final OD600 = ~0.1. This culture was divided into two 10-ml portions in heat-sterilized sealed Hungate tubes under prepurified N2. To one 10 ml sample, 100 µl of [1-14C]galactose

(55 mCi/mmol, 0.11 mCi/ml) was added. Then both 10 ml samples were split into duplicate 5 ml cultures. From each [14C]-labeled culture, a 100 µl sample was immediately removed for baseline

counting. The cultures were grown to OD600 ~0.7-0.8 at 37ºC at which point three 1-ml aliquots of each isotopically labeled culture were removed and harvested by centrifugation at 16000 x g for 5 min.

Each supernatant was sampled for counting. The cell pellets were washed twice in unlabeled medium and the final wash supernatant was collected for counting. The cell pellet was resuspended in 1 ml unlabeled medium and aliquots were removed to determine incorporation of label. The collected samples were analyzed using a Beckman-Coulter LS1701 scintillation counter.
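The quantities in this assay can be checked with a short calculation. The sketch below is illustrative: it computes the amount of labeled galactose added from the stated activity and specific activity, and shows one plausible way to express incorporation as the background-corrected fraction of counts recovered in the washed pellet. The function names are hypothetical, and the fraction assumes that pellet and total counts refer to the same culture volume.

```python
def label_added_umol(volume_ml, activity_mci_per_ml, specific_activity_mci_per_mmol):
    """Micromoles of labeled substrate in the added volume."""
    mci = volume_ml * activity_mci_per_ml
    return mci / specific_activity_mci_per_mmol * 1000.0  # mmol -> µmol


def fraction_incorporated(pellet_counts, total_counts, background=0.0):
    """Background-corrected fraction of label recovered in the cell pellet."""
    total = total_counts - background
    if total <= 0:
        raise ValueError("total counts must exceed background")
    return (pellet_counts - background) / total


if __name__ == "__main__":
    # 100 µl of 0.11 mCi/ml label at 55 mCi/mmol, as stated in the text.
    print(label_added_umol(0.1, 0.11, 55.0), "µmol labeled galactose")
```

With the stated values the added label amounts to about 0.2 µmol, small relative to the 100 µmol of unlabeled galactose in 10 ml of 10 mM medium.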

Growth of Dv. desulfuricans G20 under stress conditions and preparation of total cellular RNA. Dv. desulfuricans G20 was grown under stress conditions predicted to induce synthesis of extracellular polysaccharides (EPS). The media used were as follows (Table 2-2): LS4aD (control),

LGS4 (presence of galactose), and LS4bD+NaCl (osmotic stress). Dv. desulfuricans G20 cultures grown under these conditions were used for preparation of total cellular RNA and for growth determination as described for secondary carbohydrate growth of Dv. desulfuricans G20. Total cellular RNA was isolated using the RNAWhiz kit (Ambion, Austin, Texas) and was visualized by gel electrophoresis (1% vol/vol formaldehyde/1.2% wt/vol agarose, 5-10 µg RNA per sample lane).

Growth was measured by following OD600nm of 5 ml cultures inoculated 1:20 in test media in test tubes closed with black rubber stoppers.

RT-PCR. cDNA was constructed from total cellular RNA using the First Strand cDNA

Synthesis Kit (Fermentas, Vilnius, Lithuania) with the appropriate complementary primer. The

resulting cDNA was used as template for PCR amplification of the gene fragment of interest. The

primer sequences used are listed in Table 2-3. Expression was tested on RNA isolated from Dv.

desulfuricans G20 grown on LS4aD, LGS4 (galT) and LS4bD (otsA and otsB) media. Products were verified by size determination on agarose gels and DNA sequencing.

Construction of phylogenetic trees. Amino acid sequences for GalK proteins were collected from GenBank based on the results of BLASTP or BLASTN comparisons of the Dv. desulfuricans

G20 GalK protein against the collected microbial databases. The sequence set was aligned with

ClustalX 1.83 using the Gonnet 250 matrix and the following alignment parameters: pairwise gap opening, 35; pairwise gap extension, 0.75; multiple gap opening, 15; multiple gap extension, 0.30. A phylogenetic tree was constructed from this data set using the neighbor-joining algorithm of ClustalX and the robustness of the tree was analyzed by bootstrap analysis. Trees were viewed and manipulated using the ATV software package (http://www.genetics.wustl.edu/eddy/atv) (126).
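Neighbor-joining operates on a matrix of pairwise distances derived from the alignment. As a rough illustration of that input, the sketch below computes uncorrected p-distances (fraction of differing residues) between aligned sequences, skipping any column in which either sequence has a gap. This is a simplified stand-in and not ClustalX's own implementation; the function names and the toy sequences in the usage note are hypothetical.

```python
def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns where neither sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances for a {name: sequence} alignment."""
    names = list(aligned)
    return {
        n1: {n2: (0.0 if n1 == n2 else p_distance(aligned[n1], aligned[n2])) for n2 in names}
        for n1 in names
    }
```

For example, `distance_matrix({"seqA": "MKT-AL", "seqB": "MRTQAL"})` compares five ungapped columns, one of which differs.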

Table 2-1. Strains, plasmids and sequences used in this study.
a Kanr is resistance to kanamycin; Ampr is resistance to ampicillin
b DVU ORF numbers from The Institute for Genomic Research Comprehensive Microbial Resource version 13.0

Strain, Plasmid or Primer | Known genotype, description or sequence (a) | Reference or source
Escherichia coli strains
JM109 | e14-(McrA-) recA- endA1 gyrA96 thi-1 (rK- mK+) supE44 relA1 ∆(lac-proAB) [F’ traD36 proAB lacIqZ∆M15] | Stratagene, La Jolla, CA
H1 | JM109 transformed with pHB1 plasmid | This study
Desulfovibrio strains
Dv. desulfuricans G20 | Wild Type | (127)
Dv. vulgaris Hildenborough | Wild Type; ATCC 29579 |
JW906 | Dv. vulgaris Hildenborough containing pSC27 plasmid, Kanr | Zane, unpublished results
HBAR5 | D. vulgaris Hildenborough transformed with pHB1 plasmid inserted within ORF DVU2547 (b), Kanr | This study
Plasmids
pGEM T-Easy | Commercial cloning vector, Ampr | Promega, Madison, WI
pDrive | Commercial cloning vector, Ampr Kanr | Qiagen, Valencia, CA
pSC27 | Shuttle vector for Desulfovibrio containing mobilization genes and Dv. desulfuricans G20 endogenous plasmid pBG1, Kanr | (128)
phbaR | D. vulgaris ORF DVU2547 (b) complete coding sequence in pGEM T-Easy, Ampr | This study
pHB1 | 472 bp internal fragment of D. vulgaris ORF DVU2547 in pDrive vector, Ampr Kanr | This study

Table 2-2. Growth media used in this study. Media are specifically defined in the Materials and Methods sections of Chapters 2 and 3.

Medium abbreviation | Description
LS4a | Basal lactate (60 mM)-sulfate (50 mM)-yeast extract (1 g/L) medium for Dv. desulfuricans carbohydrate growth curves
LS4aD | Defined LS4a medium lacking yeast extract
NCS4 | LS4aD medium lacking utilizable carbon source for respiratory growth assays (50 mM Na2SO4)
NC4 | NCS4 medium with 5 mM sulfate and no utilizable carbon source for fermentative growth assays
LGS4 | NCS4 medium supplemented with 60 mM lactate and 60 mM galactose (50 mM Na2SO4)
GS4 | NCS4 medium supplemented with 60 mM galactose (50 mM Na2SO4)
G4 | NC4 medium supplemented with 60 mM galactose (5 mM Na2SO4)
GYS4 | GS4 medium supplemented w/ 1 g/L yeast extract
GAS4 | GS4 medium supplemented w/ 1.0 mM sodium acetate
LS4bD | LS4aD medium supplemented with 0.5 M NaCl
LS5a | Basal lactate (60 mM)-sulfate (50 mM)-yeast extract (1 g/L) medium for Dv. vulgaris comparative growth assay, modified from LS4a by substitution of PIPES buffer with Tris-HCl and supplemented with 50 mM NaCl
LS5aD | Defined LS5a medium lacking yeast extract
NCS5 | LS5a medium lacking a utilizable carbon source for respiratory growth assays (50 mM Na2SO4)
NC5 | NCS5 medium with 5 mM sulfate and no utilizable carbon source for respiratory growth assays
PS5 | NCS5 medium supplemented with 60 mM pyruvate (50 mM Na2SO4)
P5 | NC5 medium supplemented with 60 mM pyruvate (5 mM Na2SO4)
FS5 | NCS5 medium supplemented with 60 mM formate (50 mM Na2SO4)
ES5 | NCS4 medium supplemented with 60 mM ethanol (50 mM Na2SO4)
LC | Tryptone-yeast extract medium for growth of E. coli

Table 2-3. Oligonucleotide primers used in PCR analysis. All experimental oligonucleotide primers were synthesized by Integrated DNA Technologies (IDT), Coralville, IA. The listed primer pairs amplify the following products: 1&2, 315 bp internal fragment Dv. desulfuricans galT gene; 3&4, 417 bp internal fragment Dv. desulfuricans otsA gene; 5&6, 482 bp internal fragment Dv. desulfuricans otsB gene; 7&8, 612 bp fragment containing complete gene sequence Dv. vulgaris ORF DVU2547; 9&10, 472 bp internal fragment Dv. vulgaris ORF DVU2547; 11&12, 684 bp internal fragment kanamycin resistance gene from commercial cloning vector pDrive (Qiagen); 13&14, standard sequencing primers for commercial cloning vectors; 15&16, 377 bp internal fragment Dv. desulfuricans cycA gene.

No. | Oligonucleotide primer | Sequence (5’ to 3’) | Source
1 | G20_galT_f | AACCGTACAAGGACATTCCGCACA | This study
2 | G20_galT_r | GAGAGTAACATACGACACGGCAGA | This study
3 | G20_otsA_f | TTGTCCGTACGGCACGTTCT | This study
4 | G20_otsA_r | GTTCACATTGCAGGCGCAGT | This study
5 | G20_otsB_f | TGTCACGAAATTCCGCCTTTGC | This study
6 | G20_otsB_r | ACGGTCAGTAACCAGAACAGCA | This study
7 | hbaR_full_for | GTGGCCTCTCTTGTCGTCGAC | This study
8 | hbaR_full_rev | GAGTTGCTGCGTCCCTTCGAC | This study
9 | hbaR_071502_for | ATCAAGGTGTTCCGTTCCGGA | This study
10 | hbaR_071502_rev | CTGTCGAGGATGGTGACGCTA | This study
11 | pDrive_Kan_f | TCAACGGGAAACGTCTTGCTCT | This study
12 | pDrive_Kan_r | AAACTCACCGAGGCAGTTCCAT | This study
13 | T7 | CATTTAGGTGACACTATAG | IDT, Coralville, IA
14 | SP6 | GTAATACGACTCACTATAG | IDT
15 | c3f | GAAGGAGGTATCACAGTTATGAGGA | Wall Lab
16 | c3r_take2 | AGTTCCTTTTTCAGGTCCTTGTC | Wall Lab


2.3. Results

2.3.1. Genomic and Physiological Insights into Carbon Metabolism of Desulfovibrio

2.3.1.1. Determination of BLAST Parameters and Statistical Cutoffs

To determine the parameters and statistical cutoffs for the analysis of Desulfovibrio proteins, the experimentally-verified Desulfovibrio protein control set was compared to the predicted Dv. vulgaris Hildenborough proteome with BLASTP using default parameters as described in Materials and Methods (BLOSUM62 scoring matrix, low-complexity filter on) (Appendix A-1). Hits corresponding to exact matches with an E score <1e-10 were considered to be true positives while all

other hits with E score <1e-10 were considered false positives even if the hits represented verified

paralogs. No false negatives, i.e. exact matches that did not meet the assigned cutoff, were observed.

The returned hits were plotted as a function of E score vs. % identity in order to determine the validity

of the statistical cutoffs. As predicted from the statistics, shorter sequences produced higher (i.e. less

significant) E scores, but nonetheless all of the true positives displayed >60% sequence identity.

Several of the false positives corresponding to verified paralogs (e.g. NiFe hydrogenase subunits)

displayed % identities between 35% and 60%. Many of the false positives, however, were located in

the “twilight zone” of 20-35% identity where the false positive rate tends to increase dramatically.

Decreasing the E score cutoff to ~1e-20 (bit score ~100) was sufficient to remove many of the false

positives but would likely result in the loss of weak but significant matches in experimental data. The

true negatives displayed a wide range of % identities ranging from 15-75% but extremely high E

scores. An E score cutoff of 1e-10 is thus sufficient to find the majority of true positives, particularly when considered together with % identity. Additional comparisons were conducted using different BLAST parameters to determine the ideal conditions for analysis. Comparisons of the control set to the Dv. vulgaris proteome using different scoring matrices (BLOSUM62 vs. BLOSUM45) showed little difference in the results, so BLOSUM62 was chosen as the standard scoring matrix. A second parameter tested was the low-complexity filter typically utilized by BLAST to ignore sequences displaying a high compositional bias (72). Utilization of the filter decreases the computational time required to run the analysis but has been suggested to negatively affect the results of the analysis in some cases (68). For comparisons (BLOSUM62) in which the low-complexity filter was not necessary, there was no effect on the statistical scores when the filter was turned on or off. In

other cases, however, turning off the filter resulted in increased % identity and decreased (more

significant) E scores. This was often sufficient to eliminate instances of <100% sequence identity

between exact matches. The results of the control analyses are listed in Appendices A1-3. Based on

these control analyses, BLASTP parameters (BLOSUM62 scoring matrix, low-complexity filter off)

were chosen for the final analysis.
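The classification rules applied to the control set can be summarized in a few lines. The sketch below is a minimal restatement of those rules, assuming each control hit is annotated with whether it is the exact match for its query; the function name `classify` is hypothetical.

```python
def classify(evalue, is_exact_match, cutoff=1e-10):
    """Label a control hit using the rules described in the text.

    - exact match below the cutoff      -> true positive
    - any other hit below the cutoff    -> false positive (verified paralogs included)
    - exact match not below the cutoff  -> false negative
    - any other hit not below cutoff    -> true negative
    """
    passes = evalue < cutoff
    if passes and is_exact_match:
        return "true positive"
    if passes:
        return "false positive"
    if is_exact_match:
        return "false negative"
    return "true negative"
```

Counting the labels over the full control set gives the false-positive burden at a given cutoff, which is the trade-off weighed above when considering a stricter cutoff near 1e-20.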

The final analyses of Desulfovibrio carbon metabolism proteins were carried out using either

the predicted E. coli proteome or experimentally-verified protein sequences obtained from the NCBI

database and the results are listed in Table 2-4. Two cutoffs were chosen for determining significance

of the hits: E Score <1e-10 (colored black in Table 2-4) and ~<1e-08 (corresponding to a bit score >50, colored blue). The results are discussed in more detail below.
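The correspondence between bit scores and E scores quoted above follows from the standard BLAST relationship E = m·n·2^(-S'), where S' is the bit score and m·n is the search space. The sketch below illustrates this with hypothetical function names. BLAST uses edge-corrected effective lengths rather than raw sequence lengths, so the values it reports will differ somewhat from this calculation.

```python
def evalue_from_bits(bit_score, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def implied_search_space(evalue, bit_score):
    """Search space m*n implied by a reported (E value, bit score) pair."""
    return evalue * 2.0 ** bit_score


if __name__ == "__main__":
    # One reported pair from Table 2-4 (E = 3.60e-08 at 53.14 bits).
    space = implied_search_space(3.60e-08, 53.14)
    print(f"implied search space: {space:.3g}")
    print(f"E at 50 bits in that space: {space * 2.0 ** -50:.3g}")
```

Because the E score scales with the search space, the bit score matching a given E cutoff shifts with query and database size, which is why the bit score equivalents are given only approximately.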

2.3.1.2. Central Carbon Metabolism

2.3.1.2.1. Glycolysis and Gluconeogenesis

Orthologs of all of the genes encoding the glycolytic and gluconeogenic enzymes were

identified in both Desulfovibrio species (Table 2-4). Genes encoding enzymes sufficient to produce polyglucose (glycogen) molecules were also identified. These data are consistent with the known biochemistry of Desulfovibrio. A similar system in Dv. gigas would account for the ability of that bacterium to synthesize and metabolize internal polyglucose (34). However, the presence of these

pathways raises the question of why these species are unable to utilize external glucose as a carbon source.

One hypothesis suggests that the cells are unable to transport hexose sugars into the cell, and so the genomic sequences were examined for orthologs to known hexose transporters. The classical phosphoenolpyruvate:carbohydrate phosphotransfer system (PTS) of E. coli couples carbohydrate

phosphorylation to transport across the cell membrane (105). Phosphate is relayed from

phosphoenolpyruvate sequentially through the E1 and HPr subunits, both of which are common to all

PTS systems. The phosphate is then transferred to a substrate-specific EII transmembrane complex

where terminal phosphorylation of the carbohydrate occurs. The EII complex typically consists of

subunits IIA, IIB and IIC as monomers (cellobiose) or fusions (mannose, IIAB and IIC; glucose, IICB,

IIA; mannitol, IIABC) and some complexes include a D subunit (mannose) (105). Each Desulfovibrio species examined appears to encode all of the proteins necessary for a functional PTS transporter. The genes encoding the ‘common’ PTS components, HPr (ptsH/O; DVU0830, Dde VIMSS395593) and E1

(ptsI; DVU0829, Dde VIMSS395594), are found in conserved clusters containing genes for a glycolate oxidase-like complex and a mannose-specific EIID PTS subunit (DVU0831, Dde VIMSS395592).

Genes encoding other EII components are located in conserved clusters with the rpoN gene encoding the σ54 subunit of RNA polymerase, an arrangement that is common to many Gram-negative bacteria (129,

130).

No genes corresponding to a classical glucose permease were identified. The only possibilities remaining for facilitated transport are the wide array of genes encoding ABC transporter components. ABC-type transporters are a diverse class of proteins that facilitate the transport of a myriad of compounds from carbohydrates to inorganic compounds to metal ions and account for ~40% of the encoded transporters of E. coli (131). Obvious carbohydrate-specific ABC transporters were not identified, but it is difficult to deduce the function of a putative transporter protein from the genomic annotation alone.

2.3.1.2.2. Organic Acid Metabolism

Consistent with the known Desulfovibrio physiology of lactate and pyruvate as preferred

substrates, the genomic sequences reveal genes encoding all of the enzymes necessary for the classical

metabolism of these organic acids (Table 2-4, Figure 2-1). Lactate is transported into the cell by a

lactate permease (DVU2110, DVU3026; Dde VIMSS395606, Dde VIMSS393794) where it is

converted to pyruvate by either L- or D-lactate dehydrogenase (Table 2-4). The presence of both of

these enzymes is consistent with the known biochemistry and eliminates the need for a lactate

racemase which was proposed but never demonstrated (32).

As described in the introduction, there are several possible metabolic fates for pyruvate.

Genes sufficient to encode a functional pyruvate dehydrogenase (PDH) complex were not recognized,

although a gene homologous to the E3 lipoamide dehydrogenase subunit of PDH was identified in both

organisms (Table 2-4). The E3 subunit is known to be shared with branched-chain amino acid

dehydrogenase complex and α-ketoglutarate dehydrogenase complexes. Pyruvate can be converted to

acetyl-CoA either oxidatively by a ferredoxin-linked pyruvate:ferredoxin oxidoreductase (POR) (2 pyruvate <-> 2 acetyl-CoA + 2 CO2 + 4 H+ + 4 e-) (132) or non-oxidatively by pyruvate formate-lyase (PFL) (2 pyruvate <-> 2 acetyl-CoA + 2 formate). In both Desulfovibrio species studied, one of the

POR homologs (DVU3025, Dde VIMSS393795) clustered in a conserved manner with genes encoding

L-lactate permease and glycolate oxidase, as well as enzymes of the phosphoroclastic pathway

(phosphotransacetylase and acetate kinase) that allow substrate-level phosphorylation (133). The

conserved clustering of the genes suggests that this particular POR enzyme may play a central role in

the core carbon metabolism of Desulfovibrio. The reaction catalyzed by the POR enzymes (and by

extension the phosphoroclastic pathway as a whole) should also be able to operate in reverse as a

pyruvate synthase. Such an activity would account for the observation of chemolithotrophic growth of

Desulfovibrio species on CO2 and H2 in the presence of acetate.

Genes encoding phosphoenolpyruvate synthase enzymes, which allow for the introduction of pyruvate into gluconeogenesis, were described in the previous section. Mechanisms for the introduction of pyruvate into terminal carbon pathways are examined in the following section.

2.3.1.2.3. TCA Cycle

An examination of the genomic sequences of the two Desulfovibrio species suggests an incomplete TCA cycle lacking citrate synthase and possibly malate dehydrogenase. The pathway could conceivably operate in a reverse (reductive) direction (Table 2-4, Figure 2-2). A weak ortholog of the E. coli malate dehydrogenase (MDH) was identified in Dv. vulgaris (DVU0600) but the protein showed greater homology to the MDH protein from Rhizobium leguminosarum (Figure 2-2).

Conversely, no MDH orthologs were identified in Dv. desulfuricans by BLAST analysis using the E.

coli, R. leguminosarum or Dv. vulgaris protein sequences. An early examination of the Dv. vulgaris

genome indicated the presence of two putative lactate dehydrogenase genes (Wall). It is known that

the active L-lactate dehydrogenase (LDH) corresponding to DVU2784 (annotated as an FMN-

dependent dehydrogenase) of Dv. vulgaris is an oxygen-sensitive NAD(P)-independent enzyme

possibly linked to FAD+ (134). Genetic studies are currently underway to verify the functions of these

proteins. It should be noted however that physiological studies conducted in the Wall lab suggest that

Dv. vulgaris does not readily oxidize malate (Yen and Wall, unpublished results).

A gene encoding citrate synthase was not identified in either Desulfovibrio species examined.

An initial observation in the related δ-proteobacterium Geobacter sulfurreducens suggested that

Geobacter species also lacked a typical bacterial citrate synthase (29). Further examinations of the G.

sulfurreducens genome revealed instead a gene encoding a citrate synthase homologous to those of

eukaryotic species. Sequence comparisons of the Desulfovibrio genomic sequences with the G.

sulfurreducens and Saccharomyces cerevisiae citrate synthase protein sequences did not reveal

homologs in Desulfovibrio. Thus, evidence for citrate synthase in Desulfovibrio species rests solely on

the early biochemical data in cell extracts (40, 41).

As previously mentioned, genes encoding a complete pyruvate dehydrogenase complex were

not identified save for a homolog of the E3 lipoamide dehydrogenase subunit, with this activity

presumably delegated instead to the POR/pyruvate synthase complex as previously described. In

addition to this reaction, pyruvate is involved in reactions with at least two other substrates of the TCA cycle. Both malic enzyme (MEZ) and pyruvate carboxylase (Table 2-4) genes were recognized in the genomic sequences of the Desulfovibrio species, catalyzing the conversion of malate to pyruvate and

pyruvate to oxaloacetate, respectively. Interestingly, the Desulfovibrio malic enzyme sequences showed significantly greater homology to that of Bradyrhizobium japonicum than to that of E. coli.

Genes encoding a complete α-ketoglutarate dehydrogenase complex were not identified. This activity instead appears to be carried out by a 2-oxoglutarate:ferredoxin oxidoreductase (α-ketoglutarate synthase) complex similar to the POR/pyruvate synthase complex. This enzyme in conjunction with an identified fumarate reductase would suggest that the TCA cycle may be able to operate in reverse for synthesis of metabolic intermediates (Figure 2-2). One major problem with the reductive TCA cycle theory, however, is the apparent lack of a citrate lyase enzyme. Citrate lyase is considered to be critical to reductive CO2 fixation by this pathway as it would allow true autotrophic growth by the bacterium. In fact, the Desulfovibrio species examined seem to lack critical enzymes in the three known carbon fixation pathways (Calvin cycle, reductive carboxylation cycle, acetyl-CoA pathway) which would account for the inability of the cells to grow autotrophically. Despite biochemical evidence of ribulose-1,5-diphosphate carboxylase activity in Dv. vulgaris (135), no genes encoding such an enzyme were identified.

2.3.1.2.4. Secondary Carbohydrate Metabolism Pathways

The central carbon metabolic pathways of the two examined Desulfovibrio species are

essentially identical. However, an examination of the secondary carbohydrate metabolism pathways

reveals marked differences between the two species, with Dv. desulfuricans G20 containing several

pathways for the partial or complete metabolism of mono- and disaccharides other than glucose

(Figure 2-3). Dv. desulfuricans G20 appears to encode an enzyme homologous to fructokinase, but

there is currently no evidence for metabolism of fructose by this organism. Enzymes sufficient for the

biosynthesis of α,α-trehalose were encoded in Dv. desulfuricans G20 (OtsAB, Dde VIMSS394765-6),

but the enzymes necessary for cleavage of the molecule for the subsequent introduction of the

monomers into glycolysis were not identified. Most interesting was the observation that Dv.

desulfuricans G20 appears to encode all of the enzymes necessary for the classical metabolism of

galactose (Figure 2-4).

Table 2-4. Comparative carbon metabolism of Desulfovibrio.
a Pathways derived from KEGG database (http://www.genome.ad.jp/kegg/pathway.html)
b Gene names and descriptions derived from KEGG nomenclature
c Organism from which the query sequence was derived with accompanying identifier. Names in red designate experimentally-verified protein sequences obtained from the NCBI database.
d ORF designations for Dv. vulgaris Hildenborough from TIGR-CMR (http://www.tigr.org/tigr-scripts/CMR2/GeneNameSearch.spl?db=gdv)
e ORF designations for Dv. desulfuricans G20 from VIMSS (http://www.vimss.org/)
f Expect value (E Score) for query protein sequence compared to the subject predicted proteome
g BLASTP score (bits) for query protein sequence compared to the subject predicted proteome
h % sequence identity between query and subject sequence over the length of homology

45 Table 2-4. Comparative carbon metabolism of Desulfovibrioa

Pathway/Enzyme Notesb Source Dv. vulgaris Hildenboroughd Dv. desulfuricans G20e Ref Organismc Gene # E Scoref Bit % Gene # E-Score Bit % Scoreg Identityh Score Identity Glycolysis and Gluconeogenesis Phosphoglucomutase α-D-Glucose E. coli (b0688) DVU1428 0 658.3 59.41 395078 0 650.2 59.23 Specific, pgm 395240 3.60e-08 53.14 23.42 Glucokinase glk E. coli (b2388) DVU1035 2.20e-17 82.80 27.70 395294 2.70e-18 85.89 27.67 Glucophosphate pgi E. coli (b4025) DVU3222 1.70e-10 60.85 23.82 393524 1.00e-10 61.61 24.27 Isomerase 6-Phosphofructokinase class I, pfkA E. coli (b3916) DVU2061 1.10e-21 97.06 27.41 395173 4.00e-22 98.6 28.10 Fructose-Bisphosphate class II, fba E. coli (b2925) DVU2143 2.20e-13 69.71 24.75 394513 9.20e-15 74.33 25.81 Aldolase Triosphosphate Isomerase tpiA E. coli (b3919) DVU1677 1.70e-38 152.5 39.68 394833 5.50e-37 147.5 38.87 Glyceraldehyde-3- gapA E. coli (b1779) DVU2144 4.10e-83 301.2 49.54 394512 2.20e-79 288.9 46.22 Phosphate DVU2144 5.90e-82 297.4 47.15 393401 1.20e-77 283.1 44.88 Dehydrogenase Phosphoglycerate Kinase pgk E. coli (b2926) DVU2529 1.00e-140 492.7 63.59 394271 3.00e-140 491.5 65.22 Phosphoglycerate 2,3- E. coli (b0755) DVU2935 4.90e-75 273.9 52.44 393998 8.10e-86 309.7 60.41

46 Phosphomutase Bisphosphoglycerate independent, gpmA gpmB E. coli (b4395) DVU3147 8.90e-11 60.08 31.33 395864 8.80e-14 70.09 29.95 392945 1.10e-08 53.14 29.87 Enolase eno E. coli (b2779) DVU0322 1.00e-144 506.1 61.79 393135 2.00e-146 511.9 62.26 Pyruvate Kinase I, pykF E. coli (b1676) DVU2514 5.20e-69 255.0 36.69 395642 3.10e-69 255.8 37.29 II, pykA E. coli (b1854) DVU2514 3.90e-64 238.8 35.09 395642 7.60e-63 234.6 35.59 Fructose-Bisphosphatase fbp E. coli (b4232) DVU1841 1.60e-87 315.8 50.77 394757 1.60e-82 299.3 48.64 class II, glpX E. coli (b3925) DVU1539 1.90e-75 275.8 42.77 Phosphoenolpyruvate ppsA E. coli (b1702) DVU0152 2.10e-57 217.2 33.92 394746 1.30e-54 208.0 43.66 Synthase DVU1833 4.50e-55 209.5 45.34 394746 2.20e-49 190.7 32.41 DVU1833 1.90e-50 194.1 32.05 392863 2.10e-39 157.5 28.73 DVU3214 9.60e-50 191.8 31.80 395518 6.40e-25 109.4 26.65 DVU2739 1.50e47 184.5 31.50 394512 2.20e-79 288.9 46.22 1,4-α-Glucan Branching glgB E. coli (b3432) DVU2243 0 656.8 52.17 394561 0 674.5 52.18 Enzyme 394561 2.20e-08 54.30 23.67 Glycogen Synthase glgA E. coli (b3429) DVU2244 7.30e-79 287.7 37.01 394560 5.70e-79 288.1 36.55 Glycogen Phosphorylase glgP E. coli (b3428) DVU2349 1.80e-19 91.28 24.44 393926 0 858.2 51.43 395327 3.20e-19 90.51 24.31 Glucose-1-Phosphate glgC E. coli (b3430) DVU0072 1.10e-09 57.77 21.82 393438 4.70e-08 52.37 21.22 Adenylyltransferase

Secondary Hexose Metabolism Galactokinase galK E. coli (b0757) Not - - - 393474 8.20e-45 174.1 31.71 Identified Galactose-1-Phosphate galT E. coli (b0758) Not - - - 393475 1.00e-104 372.9 51.16 Uridylyltransferase Identified UDPglucose 4-Epimerase galE E. coli (b0759) DVU1360 2.60e-64 238.8 41.21 394648 1.90e-62 232.6 40.30 DVU1364 1.30e-23 103.6 27.84 394652 7.50e-27 114.4 29.12 DVU0319 2.20e-23 102.8 30.65 405402 1.90e-22 97.44 27.39 DVU0554 2.40e-22 99.37 28.05 393079 9.50e-22 97.44 27.39 DVU3356 1.30e-18 87.04 24.63 393441 1.10e-17 83.96 25.77 Galactose-1-Epimerase galM E. coli (b0756) Not - - - 393742 8.20e-45 174.1 31.71 Identified Glucose-1-Phosphate galU E. coli (b1236) DVU1283 3.30e-52 198.4 42.29 395239 2.40e-53 202.2 41.49 Uridylyltransferase DVU0925 4.40e-12 65.08 24.59 394218 1.60e-12 66.63 25.21 Aldose 1-Epimerase E. coli Not - - - 393472 1.60e-44 174.1 31.71 (gi 725494) Identified 394765 8.30e-09 55.45 26.53 Fructokinase scrK Streptococcus Not - - - Not - - - mutans Identified Identified (gi 26007032) Klebsiella Not - - - 394357 2.90e-22 98.98 32.57 pneumoniae Identified

47 (gi 248849) 6-Phosphofructo-2- Possible gene fusion Saccharomyces DVU3147 1.20e-07 50.83 22.34 392945 3.70e-12 65.85 27.72 Kinase with Fructose-2,6- cerevisiae Bisphosphatase (gi 6324436) Fructose-2,6- Possible gene fusion Saccharomyces DVU3147 3.50e-43 169.1 31.13 392945 2.10e-43 169.9 29.74 Bisphosphatase with Fructose-2,6- cerevisiae 395894 3.80e-08 52.76 26.20 Bisphosphatase (gi 1169587) Phosphomannomutase cpsG E. coli (b2048) DVU0685 1.80e-50 193.4 32.53 394052 9.00e-50 191.0 31.25 DVU1282 6.30e-32 131.7 28.22 395240 3.20e-31 129.4 28.07 Mannose-1-Phosphate GDP, cpsB E. coli (b2049) DVU0697 4.00e-110 391.7 45.78 394063 9.00e-101 360.5 40.59 Guanylyltransferase Mannosephosphate manA E. coli (b1613) Possible ------Isomerase gene fusion w/ cpsB as DVU0697/ Dde 394063 α,α-Trehalose-Phosphate otsA E. coli (b1896) Not - - - 394766 2.00e-76 279.6 35.57 Synthase Identified Trehalose-Phosphatase otsB E. coli (b1897) Not - - - 394765 3.00e-09 55.45 26.53 Identified

Pentose Phosphate Pathway and Reductive Pentose Phosphate (Calvin) Cycle Glucose-6-Phosphate zwf E. coli (b1852) Not - - - 393629 1.90e-93 336.3 39.03 Dehydrogenase Identified Sinorhizobium Not - - - 393629 1.00e-107 383.6 41.88 meliloti Identified (gi 4210900) 6- Pseudomonas DVU2313 9.00e-23 100.1 34.2 395417 4.00e-18 84.73 29.05 Phosphogluconolactonase aeruginosa (gi 7387516) Phosphogluconate Decarboxylating E. coli Not - - - 393630 1.80e-37 150.2 32.23 Dehydrogenase (gi 1736717) Identified Ribosephosphate rpiA E. coli (b2914) Not - - - Not - - - (134) Isomerase Identified Identified Ribulophosphate 3- araD E. coli DVU2531 7.70e-53 199.9 48.53 394269 5.60e-51 193.7 44.24 Epimerase (gi 7437281) Transketolase 2 tktB E. coli (b2465) DVU2530 0 839.3 61.07 394270 0 863.6 62.39 DVU1350 7.50e-08 52.37 22.81 DVU1350 3.70e-07 50.06 24.77 Transketolase 1 tktA E. coli (b2935) DVU2530 0 874.4 63.86 394270 0 898.7 64.61 DVU1350 1.10e-11 65.08 22.18 394636 1.70e-07 51.22 24.59 Transaldolase A talA E. coli (b2464) DVU1658 2.00e-10 59.69 30.32 395610 4.60e-15 75.1 30.58 Transaldolase B talB E. coli (b0008) DVU1658 2.00e-10 59.69 29.34 395610 2.50e-16 79.34 28.17

Phosphoribulokinase E. coli Not - - - Not - - - (gi 7434252) Identified Identified Ribulose-Bisphosphate Small Subunit Synechocystis sp. Not - - - Not - - - (134) Carboxylase PCC 6803 Identified Identified (gi 16331394) Large Subunit Synechocystis sp. Not - - - Not - - - PCC 6803 Identified Identified (gi 16331392) Organic Acid, Alcohol and Carbon Monoxide Metabolism L-Lactate Dehydrogenase lldD E. coli (b3605) DVU2784 1.40e-27 117.1 46.46 395881 4.30e-29 122.1 54.92 (135) D-Lactate Dehydrogenase Cytochrome Archaeoglobus DVU0390 1.30e-61 230.3 37.5 393233 1.20e-67 250.4 35.92 fulgidus DVU3027 5.90e-59 221.5 33.17 393793 3.60e-56 212.2 31.25 (gi 2650235) DVU0827 3.30e-54 205.7 32.93 395596 9.00e-55 207.6 31.25 DVU0253 2.30e-23 103.2 26.28 393120 8.20e-24 104.8 31.12 DVU3071 8.30e-21 94.74 41.3 393517 1.20e-19 90.89 24.01

Pyruvate Dehydrogenase aceE, E1 Subunit E. coli (b0114) Not - - - Not - - - Identified Identified aceF, E2 Subunit E. coli (b0115) Not - - - Not - - - Identified Identified lpdA, E3 Subunit E. coli (b0116) DVU1037 1.70e-48 186.8 30.55 395083 3.00e-56 212.6 30.63 DVU1423 8.90e-45 174.5 29.63 395292 3.60e-49 189.1 28.76 DVU3212 1.80e-13 70.48 26.13 393064 3.90e-11 62.77 26.17 DVU0283 2.40e-13 70.09 24.78 393211 4.70e-09 55.84 23.62 395497 1.40e-08 54.3 23.03 Pyruvate:Ferredoxin por Desulfovibrio DVU3025 0 1739.9 69.58 393795 0 1669.8 67.48 (132) Oxidoreductase (Pyruvate africanus DVU1946 3.80e-08 54.3 31.15 395126 1.10e-10 62.77 25.47 Synthase) (gi 1770208) DVU1945 7.20e-07 50.06 23.81 395128 4.30e-07 50.83 28.57 porA, α Subunit Clostridium DVU1945 1.60e-28 120.2 28.20 395126 2.50e-32 132.9 30.00 thermocellum DVU3349 3.70e-25 109.0 25.83 393795 1.20e-23 104.0 25.79 (gi 48859890) DVU3025 4.10e-24 105.5 24.80 393339 3.30e-21 95.9 26.69 DVU1569 3.40e-10 59.31 24.65 394999 1.40e-14 73.94 25.29 porB, β Subunit Clostridium DVU3348 1.40e-13 70.09 29.33 393338 1.40e-13 70.09 29.55 thermocellum DVU1946 1.30e-11 63.54 27.85 393795 3.40e-12 65.47 27.87 (gi 48859891) DVU3025 2.80e-11 62.39 27.72 395128 6.50e-11 61.23 26.05 δ Clostridium Not - - - Not - - - thermocellum Identified Identified (gi 48859889) 49 γ Clostridium DVU1947 1.30e-12 65.85 27.98 395129 6.00e-10 57.00 28.14 thermocellum (gi 48859888) Formate pflB E. coli (b0903) DVU2272 1.90e-23 104.4 23.72 393760 4.00e-32 133.3 24.36 Acetyltransferase 1 DVU2824 2.50e-23 104.0 23.27 393965 4.10e-29 123.2 23.54 393952 3.80e-27 116.7 23.16 395452 4.40e-23 103.2 22.24 Formate pflD E. coli (b3951) DVU2824 1.00e-108 387.1 33.17 393760 6.00e-118 418.3 33.50 Acetyltransferase 2 DVU2272 7.70e-97 348.2 31.97 393965 1.00e-114 407.5 34.30 395452 1.00e-107 384.4 32.66 393952 5.10e-88 318.9 28.41 Formate tdcE E. 
coli (b3114) DVU2272 7.40e-20 92.43 23.50 393965 3.10e-29 123.6 25.05 Acetyltransferase 3 DVU2824 4.10e-18 86.66 23.27 393760 4.00e-29 123.2 23.90 393952 3.90e-24 106.7 22.77 395452 2.20e-19 90.89 22.97 Pyruvate Decarboxylase Zymomonas DVU0360 8.90e-15 75.10 22.77 394521 5.10e-42 165.6 25.67 mobilis DVU1376 8.30e-13 68.55 20.82 393043 4.80e-16 79.34 22.65 (gi 68249) DVU3293 3.00e-10 60.08 20.21 394666 1.10e-07 51.6 20.46 Alcohol Dehydrogenase adh, Ethanol Desulfovibrio DVU2405 - - - 393583 - - - (136) oxidation (function vulgaris experimentally Hildenborough verified in Dv. vulgaris)

Phosphate pta E. coli (b2297) DVU3029 3.00e-118 419.1 36.51 393791 2.00e-125 443.4 37.48 Acetyltransferase DVU0627 7.30e-17 82.42 28.01 393888 1.20e-14 75.1 25.66 Acetate Kinase ackA E. coli (b2296) DVU3030 2.90e-86 312.0 47.00 393790 1.80e-83 302.8 45.27 DVU0628 2.50e-08 53.14 27.17 Acylphosphatase acyP E. coli DVU1192 2.70e-08 51.22 394420 2.50e-09 54.68 56.25 (TIGR NT01EC1150) Acetyl-CoA Synthetase acs E. coli (b4069) DVU2969 0 746.9 58.91 393822 0 748.0 57.99 DVU0748 1.00e-161 563.1 45.66 394123 1.00e-153 537.0 45.71 DVU2250 1.00e-54 208.0 31.50 394533 1.30e-63 237.7 32.09 DVU1453 2.10e-31 130.6 26.29 395051 8.80e-33 135.2 26.14 DVU3065 5.80e-21 95.9 24.47 392901 2.60e-24 107.1 25.53 Pyrococcus DVU2970 4.60e-32 131.0 33.48 393821 2.40e-36 145.2 35.65 furiosus DVU0585 4.00e-12 64.7 29.59 392981 1.60e-11 62.77 27.84 (gi 5419968) DVU0373 8.90e-12 63.54 29.20 Pyrococcus DVU2970 4.30e-68 251.9 32.38 393821 5.00e-72 265.0 32.60 furiosus DVU0373 3.20e-31 129.4 24.85 392981 7.10e-26 111.7 22.78 (gi 5419966) DVU0585 7.40e-28 118.2 24.38 394520 2.90e-11 63.16 28.02 DVU2137 1.90e-12 67.01 28.38 Aldehyde Dehydrogenase aldA, NADP-linked E. coli (b1300) DVU3319 8.40e-54 204.5 31.61 393333 2.90e-49 189.5 30.32 DVU3294 1.00e-43 171.0 31.84 393896 5.70e-13 68.94 22.19 393763 1.70e-09 57.38 25.10 50 aldAB NAD-linked E. coli (b1415) DVU3319 7.40e-63 234.6 33.76 393333 2.80e-57 216.1 32.26 DVU3294 2.80e-38 152.9 26.62 393896 1.00e-11 64.7 20.74 393763 3.70e-09 56.22 22.74 Acetaldehyde mhpF, Acylating E. coli (b0351) Not - - - Not - - - Dehydrogenase Identified Identified Formate Dehydrogenase fdnH, nitrate- E. coli (b1475) DVU2481 2.50e-36 145.6 39.55 395831 3.10e-34 138.7 37.04 (137, inducible, FeS DVU2811 1.80e-34 139.4 37.50 395910 5.30e-34 137.9 36.60 138) Subunit DVU0588 5.70e-33 134.4 34.89 393591 3.40e-33 135.2 32.56 DVU0535 1.60e-30 126.3 33.33 392828 2.20e-27 115.9 30.43 DVU0172 1.70e-21 96.29 31.58 395941 5.50e-23 101.3 33.33 fdoI, FDO, E. 
coli (b3892) Not - - - 395942 1.80e-11 62.39 24.58 cytochrome B556 Identified Subunit fdoH, FDO, FeS E. coli (b3893) DVU2481 6.10e-38 151 41.81 395910 2.90e-35 142.1 37.31 Subunit DVU0588 1.50e-33 136.3 33.76 395831 4.20e-34 138.3 36.57 DVU2811 2.60e-33 135.6 37.20 393591 4.60e-33 134.8 33.49 DVU0535 3.00e-29 122.1 30.00 392828 8.40e-37 114.0 26.58 DVU0172 4.70e-22 98.21 31.65 395941 2.50e-23 102.4 30.84 Carbon Monoxide Rhodospirillum DVU2098 2.00e-148 519.6 46.83 393975 2.00e-150 526.2 47.54 (139, Dehydrogenase rubrum 140) (gi 399279)

sn-Glycerol-3-Phosphate glpA, Anaerobic, E. coli (b2241) DVU1940 2.00e-105 376.3 40.93 393035 6.30e-29 122.1 27.09 Dehydrogenase Large Subunit DVU2673 5.00e-71 261.9 38.24 393403 2.70e-19 90.12 28.12 DVU3132 1.30e-34 141.0 27.55 glpB, Anaerobic, E. coli (b2242) DVU1939 5.20e-33 135.2 27.32 Not - - - Membrane Anchor Identified Subunit glpC, Anaerobic, K- E. coli (b2243) DVU3028 6.60e-22 98.21 25.18 393787 4.80e-20 92.05 25.65 Small Subunit DVU3033 1.80e-19 90.12 23.49 393792 1.40e-19 90.51 24.19 DVU1783 4.90e-17 82.03 29.68 394950 3.50e-18 85.89 30.28 DVU0826 3.60e-12 65.85 24.87 DVU0253 8.30e-09 54.68 24.06 Glycerol Kinase gpk E. coli (b3926) DVU3134 8.00e-169 586.6 58.10 393033 1.00e-171 596.3 58.18 Glycerol Dehydrogenase gldA, NAD E. coli (b3945) DVUA0098 1.40e-10 60.46 41.57 Not - - - Identified Methylglyoxal Shunt Methylglyoxal Synthase mgsA E. coli (b0963) Not - - - 394841 2.50e-32 131.0 45.39 Identified Lactoyl-Glutathione gloA, Glyoxalase I E. coli (b1651) Not - - - Not - - - Lyase Identified Identified Hydroxyacylglutathione gloB, Glyoxalase II E. coli (b0212) Not - - - 395856 1.20e-09 56.61 31.74 Hydrolase Identified Saccharomyces DVU1773 5.00e-10 58.00 27.00 Not - - -

cerevisiae Identified (gi 1773012) TCA Cycle and Reductive Carboxylation Cycle Pyruvate Carboxylase Rhizobium etli DVU1834 2.00e-120 427.6 29.87 394745 1.00e-133 471.5 31.36 (gi 1256798) DVU2226 1.80e-44 174.9 34.00 395220 6.60e-50 193.0 32.70 DVU0162 4.70e-08 53.91 22.22 393102 1.10e-09 59.31 22.73 DVUA0016 8.00e-08 53.14 30.00 393102 1.40e-07 52.37 20.22 Citrate Synthase gltA E. coli (b0720) Not - - - Not - - - (41) Identified Identified Geobacter Not - - - Not - - - sulfurreducens Identified Identified (gi 39982973) Saccharomyces Not - - - Not - - - cerevisiae Identified Identified (gi 6324328) Aconitate Hydratase A acnA E. coli (b1276) DVU1064 1.50e-51 198.0 27.31 395343 1.80e-55 211.1 28.68 DVU2982 1.70e-18 88.2 25.98 393811 9.80e-22 98.98 25.28 Aconitate Hydratase B acnB E. coli (b0118) DVU1064 2.90e-23 104.0 25.87 395343 7.80e-24 105.9 25.65 DVU2982 1.40e-08 85.11 25.22 393811 1.20e-21 98.6 25.76 Isocitrate Dehydrogenase icdA, NADP+ E. coli (b1136) DVU0477 2.00e-115 409.1 53.66 393624 2.00e-116 412.5 54.45 DVU2985 5.00e-20 92.05 25.80 393808 3.00e-20 92.82 24.56

2-Oxoglutarate Synthase α Thauera aromatica DVU1569 8.50e-77 281.2 34.49 394999 1.20e-73 270.8 32.91 (α-Ketoglutarate (gi 19571179) DVU1945 9.40e-52 198.0 33.86 395128 2.00e-49 190.3 33.33 Synthase) DVU3349 3.30e-28 119.8 28.57 393339 1.70e-27 117.5 27.3 β Thauera aromatica DVU1570 4.60e-53 201.1 44.17 394998 4.30e-54 204.5 43.57 (gi 19571178) DVU1946 4.90e-47 181.0 47.26 395128 7.30e-46 177.2 45.77 DVU3348 7.90e-21 93.97 34.83 393338 2.50e-17 82.42 31.50 γ Heliobacter pylori DVU1944 1.00e-13 68.17 52.05 393340 2.10e-09 53.91 43.33 (gi 2935178) DVU3350 2.00e-09 53.91 41.79 393312 3.00e-08 50.06 31.76 DVU1769 1.30e-08 51.22 39.06 Succinyl-CoA Synthetase sucC, ADP- E. coli (b0728) DVU2137 5.80e-55 208.0 37.02 394520 5.00e-54 204.9 34.71 Forming, α/β subunits (possible gene fusion of sucC and sucD) sucD, ADP- E. coli (b0729) DVU2137 1.70e-69 255.8 44.56 394520 3.90e-66 244.6 42.46 Forming, α/β DVU2970 1.30e-10 60.08 25.25 393821 5.20e-10 58.15 26.97 subunits (possible gene fusion of sucC and sucD) Fumarate frdA E. coli (b4154) DVU3262 4.90e-91 328.6 35.64 395465 3.00e-100 359.4 36.53 (44) Reductase/Succinate DVU1809 3.00e-48 186.4 29.43 394963 1.20e-44 174.5 28.25 Dehydrogenase DVU3110 2.10e-17 83.96 24.18 52 frdB E. coli (b4153) DVU3263 7.10e-23 100.5 26.94 395466 3.20e-26 111.7 33.64 DVU2674 1.50e-20 92.82 31.78 Fumarate Hydratase fumA E. coli (b1612) DVU3265 4.90e-26 112.5 39.24 395468 4.90e-23 122.5 42.38 (Fumarase) DVU3264 8.20e-21 95.13 30.24 393484 2.30e-26 113.6 40.40 395467 1.20e-24 107.8 33.33 393483 4.00e-23 102.8 32.40 fumB E. coli (b4122) DVU3265 6.40e-26 112.1 39.74 395468 5.40e-28 119.0 42.38 DVU3264 1.10e-22 101.3 30.28 393484 1.70e-26 114.0 41.06 395467 3.30e-25 109.8 33.33 393483 3.00e-23 103.2 32.00 class II, fumC E. 
coli (b1611) DVU0080 2.00e-150 525.8 58.77 393185 6.00e-148 517.3 58.24 DVU1871 1.20e-97 350.1 42.58 395105 6.80e-93 334.3 40.34 DVU1766 9.30e-87 313.9 38.78 393555 9.80e-84 303.9 37.72 DVU1094 2.30e-08 53.53 25.09 394351 3.60e-09 56.22 23.36 393991 3.90e-08 52.76 25.40 Malate Dehydrogenase mdh E. coli (b3236) DVU0600 1.60e-12 66.63 23.65 Not - - - (44) Identified Rhizobium DVU0600 2.70e-39 155.6 30.90 Not - - - leguminosarum Identified (gi 2624395)

Malic Enzyme sfcA, NADP+ E. coli (b1479) DVU0414 1.30e-13 71.25 23.9 Not - - - Identified Bradyrhizobium DVU0414 4.00e-126 445.7 55.56 395469 3.00e-132 465.7 55.66 japonicum DVU3029 3.40e-44 173.3 33.02 395485 4.00e-131 462.2 55.80 (gi 23194089) 393791 1.40e-40 161.4 32.40 Aspartate aspC E. coli (b0928) Not - - - Not - - - Aminotransferase Identified Identified Methanothermo- DVU0494 7.00e-61 227.6 36.81 393645 3.80e-62 231.9 35.44 bacter DVU3121 5.40e-45 174.9 31.4 394172 5.60e-42 164.9 30.85 thermautotrophicus DVU0767 7.70e-44 171.0 31.64 393934 1.80e-08 53.53 25.15 (gi 704449) Aspartate Ammonia- Aspartase, aspA E. coli (b4139) DVU1871 5.00e-137 481.1 53.48 395105 3.00e-135 474.9 52.12 Lyase DVU1766 1.10e-96 347.1 42.46 393555 2.80e-89 322.4 41.73 DVU0080 1.40e-88 320.1 38.63 393185 6.30e-89 321.2 39.11 394351 1.50e-10 60.85 27.14 Other Carbon Metabolism Malate Synthase G Glycolate-inducible, E. coli (b2976) DVU0701 0 790.0 55.14 395523 0 768.5 54.86 glcB Malate Synthase A Acetate-inducible, E. coli (b4014) Not - - - 395523 5.20e-12 65.85 22.82 aceB Identified Isocitrate Lyase aceA E. coli (b4015) Not - - - Not - - - Identified Identified

Ralstonia Not - - - Not - - - eutropha Identified Identified (gi 17402487) L-2-Hydroxyacid Oxidase glcD E. coli (b2979) DVU3027 4.60e-84 305.1 38.08 393793 1.10e-85 310.5 40.10 (Glycolate Oxidase) DVU0390 8.20e-65 241.1 33.80 395596 6.70e-62 231.5 33.41 DVU0827 8.20e-65 241.1 35.70 393233 1.80e-59 223.4 31.34 DVU0253 1.80e-32 133.7 26.63 393120 2.30e-22 100.1 24.84 DVU3071 5.60e-13 68.94 40.00 393517 1.70e-20 93.97 25.15 FeS subunit, glpF E. coli (b2978) DVU3028 1.70e-24 107.8 22.79 393792 5.00e-27 116.3 24.36 DVU3033 3.10e-21 97.06 21.50 393787 8.90e-24 105.5 24.19 DVU0827 6.00e-17 82.8 36.72 395598 6.20e-17 82.8 23.15 DVU0390 1.30e-14 75.1 33.81 395596 1.40e-16 81.65 36.62 DVU0826 1.30e-11 65.08 22.19 393233 2.90e-14 73.94 29.87 Glyoxylate Carboxyligase gcl E. coli (b0507) DVU1376 1.30e-85 310.5 34.09 394666 6.40e-91 328.2 36.03 DVU0360 3.90e-85 308.9 34.41 393043 5.20e-85 308.5 34.41 DVU3293 5.00e-48 185.7 25.4 394521 1.10e-13 71.63 20.62 Tartronic Semialdehyde E. coli Not - - - Not - - - Reductase (gi 1786719) Identified Identified Glycerate Kinase Methylobacterium DVU0765 1.20e-75 276.9 41.87 394168 2.50e-73 269.2 42.25 extorquens (gi 1907334)

Figure 2-1. Glycolytic, gluconeogenic and organic acid metabolism pathways of Dv. vulgaris and Dv. desulfuricans. Substrate abbreviations and enzyme numbers are as follows: GAP (glyceraldehyde-3-phosphate); DHAP (dihydroxyacetone phosphate); Fd (ferredoxin); 1) UTP-Glucose-1-Phosphate Uridylyltransferase; 2) Glycogen Synthase; 3) Glycogen Phosphorylase; 4) Glycerol-3-Phosphate Dehydrogenase; 5) Glycerol Kinase; 6) Lactate Permease; 7) Lactate Dehydrogenase; 8) Formate Acetyltransferase (Pyruvate Formate-Lyase); 9) Formate Dehydrogenase; 10) Pyruvate:Ferredoxin Oxidoreductase (Pyruvate Synthase); 11) Phosphate Acetyltransferase; 12) Acetate Kinase; 13) Hydrogenase. ORF numbers for the genes encoding these enzymes are listed in Table 2-4.

Figure 2-1. Glycolytic, gluconeogenic and organic acid metabolism pathways of Desulfovibrio

[Pathway diagram not reproduced. Labeled nodes: glycogen, UDP-glucose, glucose-1-P, GAP, DHAP, glycerol-3-P, glycerol, lactate (in/out), pyruvate, acetyl-CoA, acetyl-P, acetate, formate, CO2, H2, reduced ferredoxin (Fdred), ADP and ATP; steps are numbered 1-13.]

Figure 2-2. TCA and reductive carboxylation cycles of Desulfovibrio. Red arrows signify activities not identified from the genomic sequences. OAA = oxaloacetate. Enzymes are numbered as follows: 1) Pyruvate:Ferredoxin Oxidoreductase (Pyruvate Synthase); 2) Acetyl-CoA Synthetase; 3) Citrate Synthase (Not Identified); 4) Citrate Lyase (Not Identified); 5) Aconitase; 6) Isocitrate Dehydrogenase; 7) 2-Oxoacid:Ferredoxin Oxidoreductase (α-Ketoglutarate Synthase); 8) Succinyl-CoA Synthetase; 9) Fumarate Reductase/Succinate Dehydrogenase; 10) Fumarate Hydratase (Fumarase); 11) Malate Dehydrogenase (not identified from BLAST search); 12) Pyruvate Carboxylase; 13) Malic Enzyme; 14) Aspartate Aminotransferase; 15) Aspartate Ammonia-Lyase (Aspartase); 16) Isocitrate Lyase (Not Identified); 17) Glycolate Oxidase; 18) Malate Synthase G. ORF numbers for the genes encoding these enzymes are listed in Table 2-4.

Figure 2-2. TCA and reductive carboxylation cycles of Desulfovibrio

[Pathway diagram not reproduced. Labeled nodes: pyruvate, acetyl-CoA, acetate, OAA, citrate, isocitrate, α-ketoglutarate, succinyl-CoA, succinate, fumarate, malate, L-aspartate, L-glutamate, glycolate, glyoxylate and CO2.]

2.3.1.3. Galactose Metabolism of Dv. desulfuricans G20

2.3.1.3.1. Genomic Analysis of Galactose Metabolism Genes

The most striking difference in the putative carbon metabolic pathways of the two Desulfovibrio species was the identification in Dv. desulfuricans G20 of a seemingly complete set of genes for the metabolism of galactose (Figure 2-4). The arrangement of the genes differs from that of the classical galETK operon of E. coli (136). In Dv. desulfuricans G20, the galE gene is spatially separated from the galTK genes (Figure 2-5). The galE gene is present in both Desulfovibrio species and appears to be monocistronic. In both cases, the gene lies immediately upstream of, and in the opposite orientation to, an lpxB gene encoding lipid-A-disaccharide synthase. Similar unlinking of the galE gene from the galTK genes has been observed in expression studies of Vibrio cholerae (137) and Erwinia stewartii (138) as well as in examinations of the collective bacterial genomes. The galTK genes cluster (Figure 2-5) with a group of three genes encoding sugar metabolism proteins corresponding to 1) a LacI-like repressor with homology to the GalR repressor of the gal operon, 2) a GalM mutarotase that catalyzes the conversion of α-D-glucose to β-D-glucose and 3) a MalK/UgpC-like ATPase component of an ABC transporter. Further downstream of this group of genes is a cluster of three genes encoding a putative ABC transporter for glycerol-3-phosphate, Ugp, but it is unknown at this time whether a functional link exists between the two gene clusters. A gene encoding a putative glucose-1-phosphate uridylyltransferase (GalU) was identified and is spatially separated from the other gal genes. A search for these genes in Dv. vulgaris revealed orthologs for only the galE and galU genes. This observation is not surprising, as the galE and galU genes are necessary for the production of lipopolysaccharide (LPS) and extracellular polymeric substances (EPS).

To determine the distribution of the galactose metabolic genes among close relatives of Dv. desulfuricans G20, the comparative search was expanded to include the other currently sequenced δ-proteobacterial species (Table 1-1). Orthologs of galE and galU were identified in all of the species examined, and orthologs of galT were identified in Desulfuromonas acetoxidans and the two sequenced Geobacter species studied (Table 2-5). However, of the eight δ-proteobacterial species examined, only Dv. desulfuricans G20 contained a complete set of recognizable galactose metabolic genes. In addition, only Dv. desulfuricans G20 contained homologs of the galactokinase (GalK) and the mutarotase (GalM). GalK amino acid sequences were collected from the publicly available databases and this sequence library was used to construct a phylogenetic tree (Figure 2-6). This phylogeny, coupled with sequence comparison data, showed that the GalK enzyme of Dv. desulfuricans G20 is most similar to the GalK protein of the rumen bacterium Ruminococcus albus (Blast2Seq, BLASTP algorithm, BLOSUM62 scoring matrix, Score 281 bits, 42% identity). Interestingly, the R. albus GalK enzyme did not cluster with those from the other Gram-positive Firmicutes. The R. albus/Dv. desulfuricans cluster instead appears to be most closely associated with the galactokinase enzymes of the actinobacterial species.
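
The distribution reported in Table 2-5 can be restated as a simple presence/absence check. The following is only an illustrative sketch, not part of the original analysis: it transcribes which homologs Table 2-5 reports for each organism abbreviation (BLAST scores are omitted), and the names TABLE_2_5, complete_gal_set and organisms_with are hypothetical helpers introduced here.

```python
# Presence/absence of recognizable gal homologs, transcribed from Table 2-5.
# BLAST scores are omitted; a gene is listed only if the table reports a hit.
GAL_GENES = ("GalE", "GalK", "GalT", "GalM", "GalU")

TABLE_2_5 = {
    "G20": {"GalE", "GalK", "GalT", "GalM", "GalU"},  # Dv. desulfuricans G20 (query)
    "Dvu": {"GalE", "GalU"},
    "Dps": {"GalE", "GalU"},
    "Dau": {"GalE", "GalU"},
    "Gme": {"GalE", "GalT", "GalU"},
    "Gsu": {"GalE", "GalT", "GalU"},
    "Dac": {"GalE", "GalT", "GalU"},
    "Bba": {"GalE", "GalU"},
    "Bma": {"GalE", "GalU"},
    "Mxa": {"GalE", "GalU"},
}


def complete_gal_set(table):
    """Return organisms in which every gene in GAL_GENES was detected."""
    return [org for org, genes in table.items() if set(GAL_GENES) <= genes]


def organisms_with(table, gene):
    """Return organisms in which the named gene was detected."""
    return [org for org, genes in table.items() if gene in genes]


print(complete_gal_set(TABLE_2_5))
print(organisms_with(TABLE_2_5, "GalT"))
```

Under this transcription, only the G20 entry carries all five genes, and GalT hits are limited to G20, the two Geobacter entries and Dac, which matches the description in the text.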

Figure 2-3. Secondary carbon metabolism pathways of Dv. desulfuricans G20. The boxed portion of the pathway refers to the glycolytic pathway common to both studied Desulfovibrio species. Red arrows represent pathways only identified in Dv. desulfuricans G20. Abbreviations used are as follows: Glu = Glucose, Gal = Galactose, Fru = Fructose, Tre = Trehalose, GalE = UDPgalactose 4-Epimerase, GalT = Galactose-1-Phosphate Uridylyltransferase, GalK = Galactokinase, GalM = Mutarotase, OtsAB = α,α-Trehalose-Phosphate Synthase.

Figure 2-3. Secondary carbon metabolism pathways of Dv. desulfuricans G20

[Pathway diagram not reproduced. Labeled nodes: α-D-glu, β-D-glu, α-D-gal, β-D-gal, α-D-glu-1P, α-D-glu-6P, α,α-tre, β-D-fru, β-D-fru-6P, β-D-fru-1,6P2, glyceraldehyde-3P and pyruvate. Labeled steps: GalM, GalETK, OtsAB and fructokinase. Boxed region: Common Glycolytic Pathway.]

Figure 2-4. Predicted galactose metabolism of Dv. desulfuricans G20. The portion of the pathway linking glycolysis to LPS production (GalE, GalU) is common to both Desulfovibrio species examined. Enzymes are as indicated in the legend for Figure 2-3 except for galU, which encodes glucose-1-phosphate uridylyltransferase.

Figure 2-4. Predicted galactose metabolism of Dv. desulfuricans G20

[Pathway diagram with chemical structures not reproduced. Labeled compounds: α-D-galactose, α-D-galactose-1P, UDP-D-galactose, α-D-glucose-1P and UDP-D-glucose. Labeled genes: galK, galT, galE and galU/F. The pathway connects to glycolysis and to LPS, capsule, etc.]

Figure 2-5. Predicted gal operon of Dv. desulfuricans G20 and the accompanying promoter region of galT. The numbers between the genes represent intergenic sequence lengths based on the ORNL ORF coordinates. Promoter elements were found using the SoftBerry BPROM program (http://www.softberry.com/). Putative enzymes encoded by genes galTK and galM are given in the legend for Figure 2-3. Gene galR encodes the gal operon repressor; malK, maltose transport system ATP-binding protein (homologous to ugpC); ugp, sn-glycerol-3-phosphate transport system (ugpB, periplasmic binding protein; ugpA, integral membrane protein; ugpE, integral membrane protein). Proximity suggests that the malK gene is incorrectly annotated and is more likely to encode the structurally similar UgpC protein.

Figure 2-5. Predicted gal operon of Dv. desulfuricans G20

[Figure image not reproduced.]

Table 2-5. Distribution of the galactose metabolic enzymes among the δ-proteobacteria

                             Dv. desulfuricans G20 protein
                             GalE             GalK          GalT          GalM          GalU
Algorithm (b)  Organism (c)  VIMSS394648 (a)  VIMSS393474   VIMSS393475   VIMSS393472   VIMSS395239
P              Dvu           447/67 (d)       -             -             -             436/82
T              Dps           241/-            -             -             -             229/-
T              Dau           141/-            -             -             -             236/-
P              Gme           316/51           -             93/23         -             270/51
P              Gsu           301/52           -             99/25         -             272/51
P              Dac           288/46           -             109/28        -             291/51
P              Bba           101/27           -             -             -             223/43
T              Bma           248/44           -             -             -             208/46
T              Mxa           103/40           -             -             -             246/50

(a) VIMSS numbers of corresponding ORF
(b) BLAST algorithm used for comparison: P = BLASTP, T = TBLASTN
(c) Complete names for organism abbreviations are listed in Table 1-1
(d) BLAST score (bits)/% identity

Figure 2-6. Phylogeny of GalK. GalK from Dv. desulfuricans G20 is labeled green.

[Phylogenetic tree not reproduced. Labeled clades: γ-Proteobacteria, Firmicutes (Gram+), δ-Proteobacteria, Firmicutes (Gram+) and Actinobacteria.]

2.3.1.3.2. Physiological Studies of Galactose Metabolism in Dv. desulfuricans G20

No evidence of galactose metabolism by Dv. desulfuricans G20 has been documented in the scientific literature. However, this organism is a recently isolated strain that is not well represented in the classical literature, and thus galactosidic activity of the strain may not yet have been documented. Furthermore, the tendency for bacteria to excise unnecessary coding regions makes it unlikely that the organism would maintain a complete set of galactose metabolism genes if they were not used in some physiological capacity. Thus, based on the lack of physiological data and the insights gained from the genomic comparisons, physiological studies were initiated to determine whether Dv. desulfuricans G20 is capable of galactose metabolism.

Growth studies were conducted as described in Materials and Methods under a variety of conditions designed to assay utilization of galactose as the sole carbon source. Defined lactate-sulfate (LS4aD) medium served as the positive control and NCS4 medium served as the negative control. In medium containing 10 mM galactose as the sole carbon source, no growth was observed under respiratory or fermentative conditions (Figure 2-7, Table 2-6), nor was growth observed under these conditions when the galactose concentration was increased to 60 mM. Furthermore, growth was not observed under mixotrophic conditions of 10 mM galactose supplemented with 1 g/l yeast extract or 1 mM acetate (Figure 2-7). Cells grown in LS4aD medium supplemented with 10 mM galactose (LGS4) showed no significant change in growth rate or final cell density, nor was growth on galactose as the sole carbon source observed with cells precultured in LGS4 medium, that is, with galactose present (results not shown). Growth assays conducted under the same conditions using glucose, fructose, mannose or trehalose as the sole carbon source gave similar negative results.

While growth was not observed under the conditions assayed, galactose might still be incorporated into the cell for the production of LPS or EPS. Dv. desulfuricans cultures were inoculated into various media containing [1-14C]galactose to determine if the radiolabeled carbon was incorporated into cellular material, and the results are listed in Table 2-7. Galactose values in Table 2-7 were calculated as described in the footnotes for Table 2-7. As expected, medium containing galactose as the sole carbon source did not support growth of Dv. desulfuricans and consequently no significant incorporation of the radiolabeled carbon was observed. Conversely, for cells grown in lactate-galactose-sulfate medium supplemented with [1-14C]galactose, the cell pellet reproducibly associated with nanomole amounts of the label corresponding to ~20-30% of the predicted amount of sugar. The label remained associated with the cell pellet even after further incubation with additional unlabeled galactose.

For galactose to be incorporated into cell material in any capacity, the galactose genes would have to be expressed, and the expression of at least some of the genes might be modulated by the presence or absence of galactose or a derivative. Regulation of the gal operon in E. coli is complex, with the operon utilizing two promoters and two distinct transcriptional start sites. Transcription from the start sites is positively (S1) or negatively (S2) controlled by the cAMP receptor protein (CRP). The ratio of gal operon products is dependent on which transcriptional start site is employed (136). Because no such information is available regarding the expression of the galactose metabolism genes in Dv. desulfuricans G20, a simplifying assumption was made that the gene cluster encompassing galTK would be expressed as a single transcription unit with no differential expression of the individual genes. An initial analysis of the expression of the galT gene was conducted using RT-PCR with cells grown in LS4aD medium with and without 10 mM galactose. When galactose was present in the medium, expression of the galT gene was upregulated (Figure 2-8).

Figure 2-7. Growth phenotype of Dv. desulfuricans G20 on galactose. Cultures were grown as described in Materials and Methods with a final galactose concentration of 60 mM. For media composition see Table 2-2.

[Growth curves not reproduced. Two panels plot OD600 x 1000 against time (h). Upper panel traces: LS4aD, LGS4, GS4, G4 and NCS4. Lower panel traces: LS4aD, GYS4, GAS4 and NCS4.]

Table 2-6. Total protein assay of Dv. desulfuricans G20 grown with galactose. Samples were collected at t=16.5 h and total cellular protein was determined using the Bradford method with 25, 50 and 100 µl aliquots of the samples shown in Figure 2-7. BDL indicates that the OD595 reading was below the detection limit of the instrument.

Growth Medium                                        µg protein/ml culture
Uninoculated No Carbon Medium (NCS4)                 BDL
Lactate-Sulfate (60/50) (LS4aD)                      173.8 ± 15
Lactate-Galactose-Sulfate (60/10/50) (LGS4)          195.8 ± 13
Galactose-Sulfate (10/50) (GS4)                      BDL
Galactose (50) (G4)                                  BDL
Inoculated No Carbon Medium (NCS4)                   BDL
Overnight Lactate-Sulfate (60/50) (LS4a) Culture     204.4 ± 8

Table 2-7. Incorporation of [1-14C]galactose by Dv. desulfuricans G20. Calculations were made using approximations of the chemical composition of prokaryotic cells according to (139) (protein ~ 50% dry weight, total sugar ~ 10% dry weight). The calculation was made as follows:

100 µl [1-14C]galactose (0.11 mCi/ml, 0.55 mCi/mmol) in 10 ml culture yields 0.2 µmole [1-14C]galactose/100 µl.

Using the protein value for LGS4-grown cells from Table 2-6 as an approximation (~200 µg/ml), the total predicted amount of sugar in the sample is 40 µg (200 * 2 / 10). Dividing by the molecular weight of galactose (180 g/mol) gives a predicted amount of total galactose of 220 nmol.

100 µl 0.11 mCi/ml [1-14C]galactose in 10 ml culture = 1.1 µCi/ml culture. Using the conversion factor 2.2 x 10^6 dpm/µCi gives 2.42 x 10^6 dpm/ml culture. With a total galactose concentration of ~10 mM, or 10 µmole galactose/ml culture, this yields 0.242 x 10^6 dpm/µmole galactose.

Assuming 1 dpm = 1 cpm, the experimental value for the LGS cell pellet (100 µl) is 1188 cpm in 100 µl, or 11880 dpm/ml. Using 0.242 x 10^6 dpm/µmole galactose yields 50 nmole [1-14C]galactose in the sample.

50 nmole [1-14C]galactose in 220 nmole total galactose gives a final yield of ~22% incorporation.
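
The arithmetic in this legend can be checked with a short script. This is a minimal sketch under stated assumptions: every input is taken from the legend above (the ~200 µg/ml protein approximation, the 2.2 x 10^6 dpm/µCi factor, 1 cpm = 1 dpm), and the function names are hypothetical, introduced only for illustration.

```python
# Sketch of the Table 2-7 legend arithmetic; all inputs are the legend's values.
DPM_PER_UCI = 2.2e6  # conversion factor used in the legend


def dpm_per_umol_galactose(stock_mci_per_ml, stock_ul, culture_ml, galactose_mm):
    """Specific radioactivity of galactose in the culture (dpm/umol)."""
    # mCi/ml -> uCi/ml, then scale by the stock volume in ml
    uci_added = stock_mci_per_ml * 1000.0 * stock_ul / 1000.0
    dpm_per_ml = uci_added / culture_ml * DPM_PER_UCI
    return dpm_per_ml / galactose_mm  # 1 mM = 1 umol/ml


def predicted_sugar_nmol(protein_ug, protein_fraction=0.5, sugar_fraction=0.1,
                         mol_weight=180.0):
    """Predicted total sugar (nmol, as galactose) from measured protein."""
    dry_weight_ug = protein_ug / protein_fraction
    return dry_weight_ug * sugar_fraction / mol_weight * 1000.0


def labeled_galactose_nmol(cpm, sample_ul, dpm_per_umol):
    """Labeled galactose per ml of culture (nmol), assuming 1 cpm = 1 dpm."""
    dpm_per_ml = cpm * 1000.0 / sample_ul
    return dpm_per_ml / dpm_per_umol * 1000.0


activity = dpm_per_umol_galactose(0.11, 100.0, 10.0, 10.0)
predicted = predicted_sugar_nmol(200.0)
found = labeled_galactose_nmol(1188.0, 100.0, activity)
percent = 100.0 * found / predicted
print(f"{activity:.3g} dpm/umol, {predicted:.0f} nmol predicted, "
      f"{found:.0f} nmol found, {percent:.0f}% incorporation")
```

Without intermediate rounding the script gives about 49 nmol of label against about 222 nmol of predicted sugar, which agrees with the rounded 50 nmol, 220 nmol and ~22% quoted in the legend.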

Table 2-7. Incorporation of [1-14C]galactose by Dv. desulfuricans G20

Sample tested                  cpm                nmol [1-14C]galactose    % total sugar
                                                  in 10 ml culture         composed of galactose
Blank                          36.80              -                        -
LGS4a, supernatant (100 µl)    139887 ± 23601     5800                     -
GS4, supernatant (100 µl)      213131 ± 5229      8800                     -
LGS4a, final wash (100 µl)     69 ± 26            -                        -
GS4, final wash (100 µl)       213 ± 136          -                        -
LGS4a, cell pellet (100 µl)    1188 ± 407         50                       22%
LGS4a, cell pellet (200 µl)    1833 ± 1118        75                       34%
GS4, cell pellet (100 µl)      95 ± 39            -                        -
GS4, cell pellet (200 µl)      142 ± 71           -                        -

Figure 2-8. Galactose-dependent expression of galT evaluated by RT-PCR. A) Total cellular Dv. desulfuricans RNA used for RT-PCR. B) Expression of galT was measured in LS4a medium in the presence and absence of 10 mM galactose as described in the text. The cycA gene encoding the constitutively expressed cytochrome c3 was used as the positive control. Molecular weight marker sizes are listed on the figure (bp). Expected product size for galT = 315 bp, cycA = 377 bp.

[Gel images not reproduced. Lane labels: λ-HinDIII marker, + control, and LGS4 and LS4a samples for galT and cycA. Marker sizes shown: 23130, 9416, 6557, 4361, 2322, 2027 and 564 bp.]

2.4. Discussion

2.4.1. Computational Prediction of Central Carbon Metabolism

The results of a control BLASTP analysis using experimentally verified Desulfovibrio protein sequences against the predicted Dv. vulgaris Hildenborough proteome are listed in Appendix A-1. The control set was constructed using three sets of sequences: 1) protein sequences from Dv. vulgaris Hildenborough (expected exact matches), 2) protein sequences from the related strain Dv. vulgaris Miyazaki (expected homologs), and 3) protein sequences from other Desulfovibrio species that were not expected to be present in Dv. vulgaris Hildenborough. Each of the experimental Dv. vulgaris Hildenborough sequences returned as the best hit the corresponding sequence from the predicted proteome (self-to-self) and these hits were regarded as true positives. In certain cases, such as the periplasmic Fe-only hydrogenase large subunit (DVU1769) or the cytochrome c553 (DVU1817), the self-to-self hit displayed less than 100% sequence identity. In each of these cases, the loss of sequence identity was due primarily to the loss of comparison data resulting from the use of the BLAST algorithm’s low complexity filter. The BLAST algorithm automatically removes low complexity regions that can result in high scores reflecting compositional bias (140), but there are indications that the use of this filter may damage the performance of BLAST in certain situations (68). The control analysis was conducted a second time with the low complexity filter turned off, which eliminated most large gaps and thus improved sequence identity scores as well as statistical scores (+30-50 bits). This added significance came at the price of increased computational time and additional low-significance hits that are removed when the filter is applied. Remaining instances of <100% sequence identity were due to mismatch events at individual nucleotides, but in all such instances the E scores of the pairings were sufficiently significant to represent self-to-self matches.

Statistical cutoffs were used to determine the rate of false positives in the analysis. A false positive in this analysis was defined as any returned hit (besides the self-to-self hit) with an E score < 1e-10. In cases such as the NiFe hydrogenase small subunit isozyme 1 (DVU1921) sequence, some of the false positives corresponded to known paralogs of the enzyme (NiFe hydrogenase small subunit isozyme 2 (DVU2525) and NiFeSe hydrogenase small subunit (DVU1917)). In the case of DcrH, a member of the methyl-accepting chemotaxis protein family, 27 false positives corresponding to methyl-accepting chemotaxis proteins were returned. While this represents an extreme case, it does demonstrate that multiple family members within the same organism can make functional assignment difficult. In most cases, a more stringent statistical cutoff of 100 bits (corresponding to E score ≈ 1e-20) was sufficient to eliminate most of the false positives returned for the control set. False positives remaining after applying the more stringent cutoff tended to correspond to known and highly conserved paralogs. While these cutoffs are sufficient for exact or highly homologous sequence pairs, comparisons with more distantly related proteins are likely to require less stringent cutoffs. Based on this analysis, a score cutoff of 50 bits (E score ≈ 1e-7) was chosen for the examination of experimental data.
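The bookkeeping behind this control analysis can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in this work: the Hit record, the classify_hits helper, the query label, the ORF_X subject and all scores are hypothetical. Only the cutoffs (E score < 1e-10 for counting false positives and the optional 100-bit score filter) and the three hydrogenase ORF numbers come from the text.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str      # experimentally verified control sequence (hypothetical label)
    subject: str    # ORF from the predicted proteome
    bits: float     # BLAST bit score
    evalue: float   # BLAST E score

def classify_hits(hits, self_orf, max_evalue=1e-10, min_bits=None):
    """Split the hits for one control query into self-to-self hits and false positives.

    A false positive is any non-self hit with an E score below max_evalue.
    Passing min_bits (e.g. 100) applies the more stringent bit-score cutoff.
    """
    true_pos, false_pos = [], []
    for h in hits:
        if h.subject == self_orf:
            true_pos.append(h)
        elif h.evalue < max_evalue and (min_bits is None or h.bits >= min_bits):
            false_pos.append(h)
    return true_pos, false_pos

# Invented scores for a query corresponding to DVU1921; ORF_X is a made-up subject.
example = [
    Hit("control_DVU1921", "DVU1921", 650.0, 0.0),
    Hit("control_DVU1921", "DVU2525", 310.0, 1e-85),
    Hit("control_DVU1921", "DVU1917", 120.0, 1e-28),
    Hit("control_DVU1921", "ORF_X", 62.0, 1e-11),
]
tp, fp = classify_hits(example, "DVU1921")
tp_strict, fp_strict = classify_hits(example, "DVU1921", min_bits=100)
```

With these invented numbers the stricter cutoff removes the weak ORF_X hit but retains the two paralogs, which mirrors the behavior described for the control set.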

Two enzymes not known to be encoded in Dv. vulgaris Hildenborough (periplasmic nitrate reductase, NapA, from Dv. desulfuricans ATCC27774 and neelaredoxin from Dv. gigas) were used as

negative controls for comparisons to Dv. vulgaris. The neelaredoxin protein sequence did not return

any statistically significant hits when compared to Dv. vulgaris. NapA, on the other hand, returned

three significant hits corresponding to molybdopterin-binding proteins. Molybdopterin-binding

enzymes share common structural features that make functional assignment of these proteins difficult.

Members of the family of molybdopterin-binding enzymes include nitrate reductase, formate

dehydrogenase, thiosulfate reductase and DMSO reductase. While no NapA was identified from Dv.

vulgaris, the results of the BLAST demonstrate that common structural families can result in false

assignments. Despite the significant E score of these false positives, none of the returned sequences

displayed more than ~25% sequence identity, placing them firmly in the twilight zone of sequence

homology. Thus, for weakly homologous sequence pairs, both the E score and % identity should be

examined in order to determine the significance of the hit.
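The two-criterion check recommended here can be written as a small decision rule. The following is a hedged sketch: the function name, the labels and the example values are hypothetical, the ~25% identity boundary is taken from the NapA observation above, and the default E score threshold corresponds to the 50-bit cutoff chosen earlier.

```python
def assess_hit(evalue, percent_identity, max_evalue=1e-7, twilight_identity=25.0):
    """Label a BLASTP hit using both its E score and its % identity."""
    if evalue > max_evalue:
        return "not significant"
    if percent_identity <= twilight_identity:
        # Statistically significant, but identity is too low to trust the
        # functional assignment without further evidence.
        return "twilight zone"
    return "significant"

# Invented values resembling a hit between proteins that share only a common fold.
label_weak = assess_hit(evalue=1e-30, percent_identity=24.0)
label_strong = assess_hit(evalue=1e-30, percent_identity=62.0)
```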

The carbon metabolism data elucidated from the genomic sequences are generally in

accordance with the accumulated biochemical data currently documented in the literature (Table 2-4).

The presence of complete glycolytic/gluconeogenic pathways as well as the enzymes necessary for synthesis of polyglucose would seem to support the physiological evidence for the accumulation and metabolism of detectable amounts of polyglucose by various Desulfovibrio species. Despite the presence of these pathways, no metabolism of external glucose has been observed for these species either through physiological experimentation or by a search of the available literature. A possible means of

assaying the functionality of the glycolytic pathway in Desulfovibrio would be through the

introduction of an exogenous glucose permease transporter followed by growth assays under

respiratory and fermentative conditions. Biochemical evidence for hexose metabolism has been

obtained for two species of Desulfovibrio, the fructose-oxidizing Dv. fructosovorans (33) and Dv. gigas, which is believed to metabolize polyglucose (34, 37). The sequencing of the genomes of these

species may provide insights into the general nature of hexose metabolism by Desulfovibrio.

2.4.2. Galactose Metabolism of Desulfovibrio desulfuricans G20

Under the conditions assayed, Dv. desulfuricans G20 was not observed to utilize galactose as an energy or carbon source despite the identification of genes encoding all of the enzymes necessary for galactose metabolism. In addition, no genes encoding classical galactose transporters, such as the

Mgl ABC-transport complex, were identified. A similar phenotype was observed in Bacillus subtilis, which maintains a set of galactose metabolism genes homologous to that of Dv. desulfuricans without the ability to grow on galactose as the sole carbon source (141). In that system, however, metabolism of galactose has been demonstrated by the observation that a mutant strain interrupted in galE accumulates toxic levels of galactose metabolism intermediates (141).

Incorporation of [1-14C]galactose by Dv. desulfuricans was examined and galactose appears

to be incorporated into cell material. The exact nature of this incorporation is not known. Chemical

analysis of the EPS of Dv. desulfuricans strains shows detectable levels of galactose (142, 143). This

observation coupled with the observation that Dv. desulfuricans cannot grow on galactose as the sole carbon source suggests that galactose metabolism occurs in the cells primarily for the purpose of polysaccharide biosynthesis. Such a system would be similar to the observed galactose metabolism of

B. subtilis. Alternatively, if galactose is able to enter the pentose-phosphate pathway via glycolysis, the radiolabelled carbon could be liberated as 14CO2, which could subsequently be reincorporated by other metabolic processes. In either case, in-depth studies of EPS biosynthesis in Dv. desulfuricans coupled with mutagenic studies of the galactose metabolism genes (in particular galE) would be the next logical steps in solidifying the role of galactose metabolism in Dv. desulfuricans.

The genetic separation of galE from galT and galK observed in Dv. desulfuricans appears to be a common occurrence among bacterial species (Figure 2-9). GalE and GalU are required enzymes for the synthesis of cell wall polysaccharides and the genes encoding these enzymes are thus ubiquitous throughout the eubacterial kingdom. The unlinking of galE from the other gal genes may allow an organism to lose the ability to metabolize galactose through selective pressure without compromising the ability to synthesize polysaccharides. This may explain why Dv. desulfuricans has maintained the entire pathway while the closely-related Dv. vulgaris has not. However, there is currently no evidence implicating a selective pressure on Dv. desulfuricans that would require the retention of the galactose metabolism genes.

No dedicated hexose transporters were identified from the genomic sequences. Several ABC-type transporters were identified but none corresponded to a classical Mgl transporter, nor were genes for glucose permease identified. A gene set encoding a seemingly complete complement of PTS components was identified from both Desulfovibrio species studied, but the substrate specificity of such a system is difficult to determine from sequence analysis alone. As has been previously mentioned, the PTS has been shown to encompass a variety of functions beyond merely hexose transport, including regulation of carbon and nitrogen utilization and chemotaxis response (105, 129, 130). It is quite possible that the PTS components observed in Desulfovibrio are geared towards a function unrelated to hexose transport.

Figure 2-9. Arrangement of gal genes in bacteria. Genes are color-coded as follows: galM (red), galK (blue), galT (black), galE (yellow), galU (green), galR/S (white), Mgl transporter component genes (cyan). Comparisons were made using the Genome Browser function of the VIMSS web site using the genomic annotation data available at that site. Bacterial name abbreviations are defined in Table 2-8. Only galU and galE appear to be universal.

Figure 2-9. Arrangement of gal genes in bacteria

[Gene arrangement diagram. Rows, top to bottom: Eco, Dde, Dvu, Dps, Gme, Gsu, Dac, Son, Vch, Vpa, Vvu, Pmu, Hin, Cpe, Cte, Bsu, Spn, Lla, Blo.]

Table 2-8. Bacterial name abbreviations for Figure 2-9

Abbreviation  Full Name
Eco           Escherichia coli K12
Dde           Desulfovibrio desulfuricans G20
Dvu           Desulfovibrio vulgaris Hildenborough
Dps           Desulfotalea psychrophila LSv54
Gme           Geobacter metallireducens GS-15
Gsu           Geobacter sulfurreducens PCA
Dac           Desulfuromonas acetoxidans
Son           Shewanella oneidensis MR1
Vch           Vibrio cholerae
Vpa           Vibrio parahaemolyticus RIMD 2210633
Vvu           Vibrio vulnificus CMCP6
Pmu           Pasteurella multocida
Hin           Haemophilus influenzae RdKW20
Cpe           Clostridium perfringens
Cte           Clostridium tetani E88
Bsu           Bacillus subtilis
Spn           Streptococcus pneumoniae R6
Lla           Lactococcus lactis subsp. lactis
Blo           Bifidobacterium longum NCC2705

3. Regulatory Mutant of Desulfovibrio vulgaris Hildenborough

3.1. Introduction/Rationale

Information on the regulatory networks of the sulfate-reducing bacteria has traditionally been

limited by the relative genetic inaccessibility of the organisms. With the recent sequencing of the

genomes of multiple SRB species, patterns are beginning to emerge with regards to the putative

regulatory networks of the organisms. Of particular interest are members of the CRP-FNR

superfamily of transcriptional regulators.

The CRP-FNR superfamily was named after the first two characterized members, the model

transcriptional regulator cAMP-receptor protein and the fumarate-nitrate reductase regulator, both

originally characterized in Escherichia coli (144, 145). Despite markedly different environmental

responses, the two proteins were shown to share significant structural homology and binding site

similarity (146). The activity of CRP in E. coli has been well-studied and this protein is considered the

paradigm for positive transcriptional regulation by catabolite repression. The primary activity of CRP

is to modulate carbon metabolism in response to exogenous carbon sources. For example, in the

presence of galactose as the sole carbon source, CRP is activated by cAMP and upregulates the

expression of the gal operon genes. cAMP is produced by the adenylate cyclase enzyme whose activity is in turn modulated by the bacterial PTS carbohydrate transport system. When sufficient amounts of glucose become available, adenylate cyclase activity is decreased, leading to a decrease in cytosolic levels of cAMP and inactivation of CRP (147). FNR is structurally homologous to CRP but differs primarily in its environmental response. In the active state, FNR forms a dimer due to the structural effects of a labile 4Fe-4S cluster in each monomer. The active regulator primarily upregulates metabolic pathways required for anaerobic growth, such as nitrate or fumarate reduction pathways (148). When molecular oxygen diffuses into the cell, it interacts directly with the labile FeS clusters within the FNR, ultimately causing monomerization and inactivation of the regulator (149, 150). FNR acts in concert with a variety of other regulators to further modulate the environmental response of the cell (145). In E. coli, for example, upregulation of the genes encoding nitrate reduction enzymes in response to exogenous nitrate requires the activity of both FNR and NarL (151).

As experimental data accumulated, the FNR branch of the CRP-FNR superfamily was further divided into three distinct subgroups: FNR, FixK and DNR (152). Members of the FNR subgroup are represented by the previously described E. coli FNR and are generally considered to be directly

responsive to molecular oxygen. Members of the second subgroup, represented by the FixK2 regulator of Bradyrhizobium japonicum (153, 154), are primarily found in diazotrophic bacteria and act in concert

with other regulatory proteins (primarily NifA) to modulate the expression of metabolic genes

necessary for nitrogen fixation (155). Unlike a true FNR, FixK regulators lack the labile FeS cluster

and are instead activated through a phosphorelay cascade by the oxygen-sensing FixLJ two component

system (156-159). Rhizobial species are often found to employ multiple FNR-like proteins in addition

to FixK, which may allow the cell to regulate its metabolism in response to the microaerobic

conditions necessary for nitrogen fixation (160, 161). The third subgroup, designated DNR, has

recently been characterized in Pseudomonas species (162, 163). Like the FixK proteins, DNR lacks

the labile FeS cluster of FNR, but the exact mechanism of activation is not known (164). DNR appears to respond to the presence of nitric oxide and has been shown to upregulate genes necessary

for denitrification (163-165).

Prior to the genomic era, the accumulated knowledge of FNR proteins suggested that they

were only found in facultative aerobes. As the knowledge base of microbial genome sequences

increased, it soon became clear that the CRP-FNR family was far more diverse and widely distributed

among the bacteria than was previously realized. A comprehensive phylogeny of the CRP-FNR

superfamily derived from information from the publicly available bacterial genome sequences has

recently been completed (166). The Körner phylogeny revealed 21 distinct clusters within the

superfamily comprising 369 proteins. The functions of many of the regulators identified in the

analysis remain unknown. A surprising result was that the CRP-FNR proteins were not limited to

facultative aerobes as previously believed but rather covered the gamut of bacterial species, both

aerobic and anaerobic. Included in this analysis were four CRP-FNR proteins identified from the genomic sequence of Dv. desulfuricans G20 (a similar set had previously been identified from the genomic sequence of Dv. vulgaris by this laboratory). One of the Dv. desulfuricans G20 proteins was classified as a CooA regulator of carbon monoxide dehydrogenase (167). Two additional G20 proteins clustered with a third protein from the sulfur-reducing bacterium Desulfitobacterium hafniense to form a previously unidentified subgroup designated “cc”. The final Dv. desulfuricans G20 protein clustered with proteins from Geobacter metallireducens and Synechocystis sp. PCC6803 in a distinct clade

within the larger DNR cluster (166).

As is to be expected, such a global phylogeny represents only the accumulated genomic

dataset at any given time. As such, sequences from several species of interest (in particular Dv.

vulgaris) were not included. A relevant subset of the Körner sequences was combined with CRP-FNR sequences from absent δ-proteobacterial species to construct a more focused phylogenetic tree. Based on these data, a CRP-FNR protein from Dv. vulgaris that appears to be common to the examined sulfate-reducing bacteria and Geobacter species was chosen for further laboratory analysis. A strain of

Dv. vulgaris interrupted in ORF DVU2547, the gene encoding this δ-proteobacterial protein, was constructed and a phenotypic analysis was performed to determine possible roles and environmental responses of the regulator.


3.2. Materials and Methods

Availability of genomic and protein sequences. Genomic sequences were obtained from the following sources: Dv. vulgaris Hildenborough and Geobacter sulfurreducens PCA, The Institute for Genomic Research Comprehensive Microbial Resource (http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl) (168); Dv. desulfuricans G20, Geobacter metallireducens GS-15 and Desulfuromonas acetoxidans, The Joint Genome Institute (http://www.jgi.doe.gov/); Bdellovibrio bacteriovorus HD100, Max-Planck Institute for Developmental Biology (http://www.eb.tuebingen.mpg.de/schuster/research_bd.htm); Bacteriovorax marinus, Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/Projects/B_marinus/); Desulfobacterium autotrophicum HRM2 and Desulfotalea psychrophila LSv54, Real Environmental Genomix Project (http://www.regx.de). The sequences for most of these species are also available for analysis through the DOE Virtual Institute for Microbial Stress and Survival (VIMSS, http://www.VIMSS.org).

Comparisons of CRP-FNR protein sequences. Amino acid sequences of the CRP-FNR proteins used in this study were obtained from the publicly available sequence databases. Direct comparisons between the Desulfovibrio CRP-FNR protein sequences and those from the other publicly available δ-proteobacterial genomic sequences were performed using BLASTP with a bit score cutoff of 50.

Phylogenetic analysis of the CRP-FNR proteins. Previously uncharacterized CRP-FNR sequences with homology to those of the Desulfovibrio species studied were identified by comparing the amino acid sequences of the Desulfovibrio proteins against the comprehensive proteome databases using BLASTP with a bit score cutoff of 100. Additional representative sequences of previously identified CRP-FNR proteins were added for reference and completion of the phylogeny. The sequence set was aligned with ClustalX 1.83 using the Gonnet 250 matrix and the following alignment parameters: pairwise gap opening 35, pairwise gap extension 0.75, multiple gap opening 15, multiple gap extension 0.30. A phylogenetic tree was constructed from this data set using the neighbor-joining algorithm of ClustalX and the robustness of the tree was analyzed by bootstrap analysis. Trees were viewed and manipulated using the ATV software package (http://www.genetics.wustl.edu/eddy/atv) (126).
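For readers unfamiliar with bootstrap analysis, the resampling step at its core can be sketched as follows. ClustalX performs this step internally, so the code below is only an illustration under stated assumptions: the bootstrap_alignment function, the toy three-sequence alignment and the choice of 100 replicates are hypothetical. Rebuilding a neighbor-joining tree from each replicate and counting how often each cluster recurs are not shown.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns, with replacement.
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in cols) for n in names}

# Toy alignment invented for illustration; '-' marks a gap.
toy = {"seqA": "MKT-LLV", "seqB": "MRTALLV", "seqC": "MKSALIV"}
rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

Each replicate keeps every column intact across all sequences, so the resampled data sets preserve the alignment while varying which sites contribute to the tree.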

Bacterial strains and primers. The bacterial strains and oligonucleotide primers used in this

study are listed in Tables 2-1 and 2-2. Bacterial plasmids were propagated in E. coli JM109 (Promega)

and purified by CsCl density gradient centrifugation. Oligonucleotide primers used for PCR

amplification were obtained from Integrated DNA Technologies (IDT) (Table 2-1).

Standard growth of bacterial strains. Media used for these analyses are summarized in

Table 2-3. Overnight cultures of Dv. vulgaris strains were grown in LS5a medium defined as follows: 60 mM sodium lactate, 50 mM Na2SO4, 8.0 mM MgCl2, 20 mM NH4Cl, 1.0 mM K2HPO4 dibasic, 0.6 mM CaCl2, 1X Thauer’s Vitamins (defined below, added after autoclaving), 1X Trace Minerals (defined below), 50 mM NaCl, 1M Tris-HCl (pH 8.0), 1 g/l yeast extract. The pH was adjusted to ~7.2 with 0.5 M NaOH and the medium was sterilized by autoclaving. Media were reduced with 30 ml/l Ti-citrate (described below). A defined version of this medium lacking yeast extract was designated LS5aD.

Solid medium was made as described above with 1.5% wt/vol agar and was reduced with 30 ml/l Ti-citrate and 1.2 mM sodium thioglycolate. Cells to be plated were transferred to 4 ml top agar

(24 mM PIPES, 1.5% wt/vol agar, pH 7.2) except where explicitly stated otherwise and the mixture was spread on previously reduced plates and incubated anaerobically at 37 ºC.

Trace mineral, vitamin, Ti-citrate and carbonate solutions were used as described in Chapter 2

Materials and Methods.

PCR amplification and cloning of a DVU2547 internal fragment. Genomic DNA was prepared using the Wizard Genomic Prep Kit (Qiagen). PCR amplification was performed using a Mastercycler Gradient thermal cycler (Eppendorf, Hamburg, Germany). The full Dv. vulgaris ORF DVU2547, previously annotated as hbaR (TIGR), was amplified from genomic DNA using the hbaR_full_for and hbaR_full_rev primers (Table 2-3). The resulting amplicon was cloned into the pGEM T-Easy PCR cloning vector (Promega) to produce plasmid pHbaR. pHbaR was subsequently used as the template for the PCR amplification of a 472 bp internal fragment of ORF DVU2547 using primers hbaR_071502_for and hbaR_071502_rev (Table 2-3). The resulting amplicon was cloned into the pDrive PCR cloning vector (Qiagen) to produce the mutagenic vector pHB1 (Figure 3-1).

Preparation of electrocompetent Dv. vulgaris cells. A volume of 9 ml LS5aD was inoculated with 1 ml of a Dv. vulgaris -80 °C freezer stock and incubated at 37 °C for ~2 days. A 2 ml inoculum of this culture was transferred into 50 ml fresh LS5aD and the diluted culture was grown overnight at 37 °C to an OD 600 nm of 0.5-0.8. After harvesting of the cells by centrifugation at 4300 x g for 10 min, the supernatant was discarded and the cell pellet was resuspended in 1 ml of chilled electroporation buffer (EP; sterile 10% v/v glycerol in 30 mM PIPES, pH 7.0) and kept on ice. The cell suspension was diluted to 50 ml using EP buffer and centrifuged as above. The cell pellet was resuspended in 0.5 ml EP buffer and used immediately for transformation.

Transformation of Dv. vulgaris by electroporation. Prior to transformation, electroporation cuvettes were transferred to an anaerobic hood and were allowed to deoxygenate for 1 hour. A 50-µl portion of electrocompetent Dv. vulgaris was added to a prechilled microcentrifuge tube and mixed by repeated pipetting with CsCl-purified pHB1 plasmid (Figure 3-1) (200 ng in a volume not exceeding 10% of the cell volume). The mixtures were transferred to the anaerobic hood and placed in the previously deoxygenated cuvettes. Electroporation was performed using an exponential decay ECM 630 electroporator (BTX) with parameters 1750 V, 250 Ω, 25 µF. Following electroporation, 950 µl LS5a medium was gently added to each cuvette, and the cells were incubated overnight at 37 °C for recovery. Following cell recovery, the cells were resuspended in the cuvette and plated (90 and 900 µl in 3 ml top agar) on solidified LS5a medium containing 400 µg G418 antibiotic/ml (Invitrogen, Carlsbad, CA) that selected for cells receiving the plasmid. Control cells were plated after 10^-6, 10^-5 and 10^-4 dilutions on LS5a plates to determine viability of transformed cells. Plates were incubated anaerobically at 37 °C for 4-5 days. KanR colonies were transferred to 2 ml liquid LS5a medium containing 400 µg G418/ml. Plasmid insertion was confirmed by PCR analysis (Table 3-1, Figure 3-2). The Dv. vulgaris mutant strain chosen for study was designated HBAR5.

Growth media for phenotypic analysis of HBAR5. Basal no-carbon medium lacking a utilizable carbon source was prepared as described for LS5aD medium except for the absence of lactate. For experiments involving respiratory growth, the medium still contained 50 mM sodium sulfate and was designated NCS5. Where fermentative growth was being analyzed, no-carbon medium containing 5 mM sulfate as a sulfur source was used and this medium was designated NC5. Carbon sources to be tested as growth substrates were prepared separately and filter sterilized as 2 M stock solutions (sodium lactate, sodium pyruvate, sodium formate, and absolute ethanol). Contamination of the media was monitored both before and after inoculation with SRB cells by streaking on LC agar plates (10% wt/vol tryptone, 5% wt/vol yeast extract, and 5% wt/vol NaCl) incubated aerobically at 37 °C.

Phenotypic analysis of HBAR5 and wild-type Dv. vulgaris. Cultures of Dv. vulgaris strain

JW906 and the mutant strain HBAR5 (Table 2-1) were grown in LS5a medium for 1-2 days at 37 ºC.

HBAR5 culture media were additionally supplemented with 400 µg G418 antibiotic/ml. 2 ml of the

resulting cultures were diluted into 20 ml of the same medium and grown to OD 600 nm of 0.5-0.8. Prior

to inoculation in the various test media, the cells were harvested by centrifugation (1900 x g, 10 min)

and resuspended in sufficient anaerobic NCS5 medium to bring the cells to an equal OD 600 nm of 0.5-

0.6. The cultures were incubated anaerobically 3-4 hours at 37 ºC to exhaust carryover carbon stores.

The resulting carbon-starved cells were inoculated 1:10 to NCS5 or NC5 medium as appropriate and 3

ml of the appropriate culture was placed in rubber-stoppered 10x100 mm Kimex tubes. The

appropriate carbon source was added to each tube to a final concentration of 60 mM and the cultures

were incubated at 37 ºC. Growth was followed by measuring the OD 600 nm and by determining total

protein using the Bradford method (125). Sulfate and sulfide concentrations were periodically

measured as described below.

Sulfate and sulfide assays. Sulfate was measured using a colorimetric BaSO4 assay. A 4 ml volume of ethanol/acetic acid solution (100 ml 100% (v/v) glacial acetic acid, 75 ml 100% (v/v) ethanol, 150 ml ddH2O) was added to test tubes containing 400 µl experimental samples diluted to 1 ml with ddH2O. Ba-chloranilate solution, 100 µl (1 g Ba-chloranilate stirred overnight in 50 ml 50% (v/v) ethanol, washed and resuspended in 10 ml 100% (v/v) ethanol), was added to each sample and the tubes were vortexed three times to ensure complete mixing. The mixtures were centrifuged for 5 min at 5000 rpm and the absorbance of the supernatant was measured at 578 nm using a Genesys 20 Thermospectronic spectrophotometer. A standard curve was prepared in the same manner over a range of 0-40 mM sulfate.
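Converting a sample absorbance to a sulfate concentration from such a standard curve can be sketched as below. This is a minimal illustration, assuming a linear response over the 0-40 mM range. The absorbance values are invented and the helper names are hypothetical. Because the standards are described as being prepared in the same manner as the samples, no separate dilution correction is applied here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration_from_absorbance(a578, slope, intercept):
    """Invert the standard curve to estimate sulfate (mM) from absorbance at 578 nm."""
    return (a578 - intercept) / slope

# Hypothetical standards spanning 0-40 mM sulfate; absorbances are invented.
standards_mM = [0.0, 10.0, 20.0, 30.0, 40.0]
absorbance = [0.05, 0.25, 0.45, 0.65, 0.85]
slope, intercept = fit_line(standards_mM, absorbance)
sample_mM = concentration_from_absorbance(0.55, slope, intercept)
```

Samples whose absorbance falls outside the range of the standards would need to be diluted and re-assayed rather than extrapolated.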

Sulfide was measured using a colorimetric CuS assay. A standard sulfide concentration curve was prepared using six sealed Hungate tubes containing 10 ml anoxic water and flushed with 100% N2. Crystalline Na2S·9H2O, 0.06 g, was dissolved in a tube containing 10 ml anoxic water to give an initial sulfide concentration of 50 mM. A concentration range of 1.6-50.0 mM was established by serial dilution (1:1). Each standard (50 µl) was added to 4 ml copper reagent (5 mM CuSO4 in 5 mM HCl) in 100x10 mm Kimex test tubes. Absorbance was measured at 480 nm with a Genesys 20 Thermospectronic spectrophotometer. A 50 µl portion of each experimental sample was assayed in triplicate as described for the standards.
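The standard series can be checked with a short calculation. The sketch assumes that a 1:1 dilution means equal volumes of standard and diluent (a twofold dilution) and that each of the six tubes holds one concentration. The function name is hypothetical.

```python
def serial_dilution(start_mM, n_tubes, factor=2.0):
    """Concentrations produced by repeated dilution of a starting standard."""
    return [start_mM / factor ** i for i in range(n_tubes)]

# Six tubes starting from the 50 mM sulfide standard.
series = serial_dilution(50.0, 6)
lowest = round(series[-1], 1)
```

Under these assumptions the six tubes contain 50, 25, 12.5, 6.25, 3.125 and 1.5625 mM sulfide, which matches the stated 1.6-50.0 mM range.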

Figure 3-1. Plasmid insertion of pHB1 to produce Dv. vulgaris mutant strain HBAR5. Mutagenesis procedure is described in Materials and Methods. Oligonucleotide primers are listed in Table 2-3 and are numbered as follows: 1) hbaR_071502_for, 2) hbaR_071502_rev, 3) hbaR_full_for, 4) hbaR_full_rev, 5) pDrive_Kan_f, 6) pDrive_Kan_r, 7) T7 and 8) SP6. Primer pairs used for PCR verification of plasmid insertion are listed in Table 3-1. Figure is not to scale.

Figure 3-1. Plasmid insertion of pHB1 to produce Dv. vulgaris mutant strain HBAR5

[Diagram. Recoverable labels: pHB1 plasmid (4324 bp) carrying Kanr, Ampr, f1 ori and lacZ’; Dv. vulgaris genomic ORF DVU2547 with segments a-d; primer positions 1-8; resulting mutant strain designated HBAR5.]

Table 3-1. Verification of pHB1 plasmid insertion creating Dv. vulgaris mutant HBAR5 by PCR. Samples correspond to sample lanes in Figure 3-2.
a pHB1 plasmid insertion into ORF DVU2547 as described in Materials and Methods.
b Oligonucleotide primer sequences listed in Table 2-3.
c Predicted product size in bp except where noted. Predicted sizes of products from samples 12 and 18 vary based on site of insertion. The predicted ~5 kb amplicon of sample 3 was not expected to be amplified due to its large size and no such product was observed. All other predicted bands were observed when the reactions were performed (Figure 3-2).

Table 3-1. Verification of pHB1 plasmid insertion creating Dv. vulgaris mutant HBAR5 by PCRa

Sample #  Primer 1b     Primer 2      Template       Expected Product Sizec  Expected Product Observed?
1         hbaR_full_f   hbaR_full_r   wt Genomic     612                     y
2         hbaR_full_f   hbaR_full_r   pHB1           NP                      NP
3         hbaR_full_f   hbaR_full_r   HBAR5 Genomic  4936                    n
4         pDrive_Kan_f  pDrive_Kan_r  wt Genomic     NP                      NP
5         pDrive_Kan_f  pDrive_Kan_r  pHB1           684                     y
6         pDrive_Kan_f  pDrive_Kan_r  HBAR5 Genomic  684                     y
7         T7            SP6           wt Genomic     NP                      NP
8         T7            SP6           pHB1           616                     y
9         T7            SP6           HBAR5 Genomic  NP                      NP
10        T7            hbaR_full_f   wt Genomic     NP                      NP
11        T7            hbaR_full_f   pHB1           NP                      NP
12        T7            hbaR_full_f   HBAR5 Genomic  <800                    y
13        T7            hbaR_full_r   wt Genomic     NP                      NP
14        T7            hbaR_full_r   pHB1           NP                      NP
15        T7            hbaR_full_r   HBAR5 Genomic  NP                      NP
16        SP6           hbaR_full_r   wt Genomic     NP                      NP
17        SP6           hbaR_full_r   pHB1           NP                      NP
18        SP6           hbaR_full_r   HBAR5 Genomic  <800                    y
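The predicted size for sample 3 can be reproduced with simple arithmetic. The sketch assumes that the entire pHB1 plasmid integrates into DVU2547 by a single crossover, so that the region between the full-length primers grows by exactly the plasmid length. The constant and function names are hypothetical. The 612 bp and 4324 bp values are taken from Table 3-1 and Figure 3-1.

```python
PLASMID_BP = 4324        # size of pHB1 (Figure 3-1)
WT_AMPLICON_BP = 612     # hbaR_full_f / hbaR_full_r product on wild-type DNA (Table 3-1)

def amplicon_after_insertion(wt_amplicon_bp, plasmid_bp):
    """Amplicon length once the whole plasmid lies between the two primer sites."""
    return wt_amplicon_bp + plasmid_bp

expected_mutant_amplicon = amplicon_after_insertion(WT_AMPLICON_BP, PLASMID_BP)
```

The result, 4936 bp, agrees with the value listed for sample 3.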

Figure 3-2. PCR verification of pHB1 plasmid insertion. Sample lane numbers corresponding to specific primer pairs against wt genomic, HBAR5 genomic and pHB1 plasmid are as listed in Table 3-1.

Figure 3-2. PCR verification of plasmid insertion.

[Gel image. Recoverable labels: PCR sample lanes 1-18 flanked by λ-HindIII size markers.]

3.3. Results

3.3.1. Genomic Analysis of CRP-FNR Proteins of Desulfovibrio Species

The genes encoding the putative CRP-FNR proteins of the Desulfovibrio species examined in this study are listed in Table 3-2. An examination of an early annotation of the Dv. vulgaris

Hildenborough genomic sequence revealed two genes encoding putative CRP-FNR proteins of the

DNR subfamily. These two genes were initially annotated as dnr (DVU0379, ORF05277 in the older annotation) and hbaR (DVU2547, ORF03210). HbaR is a CRP-FNR protein of the DNR family from

Rhodopseudomonas palustris that responds to and regulates the metabolic degradation of 4-hydroxybenzoate (169). Based on the known and predicted metabolism of Dv. vulgaris, it seems unlikely that DVU2547 encodes a true 4-hydroxybenzoate-responsive HbaR and in fact, this designation has been dropped in the most current versions of the annotation. However, the name was retained for purposes of internal laboratory nomenclature and strain designations pending the establishment of a defined phenotype.

Further reviews of the Dv. vulgaris genomic sequence revealed two additional CRP-FNR-encoding genes, the cooA (DVU2097) regulator of carbon monoxide dehydrogenase and an uncharacterized CRP-FNR protein (DVU3111). The recently completed genomic sequence of Dv. desulfuricans G20 has a qualitatively similar set of four CRP-FNR genes. These Dv. desulfuricans

G20 genes were independently integrated into a global phylogeny of the CRP-FNR superfamily (166).

Drawing upon that work, a study was initiated to determine the homology, if any, between the two sets of Desulfovibrio species genes and among the Desulfovibrio genes and those from related δ-proteobacteria. The δ-proteobacterial species examined are listed in Table 1-1.

The set of four Dv. vulgaris CRP-FNR protein sequences were compared to the set of four

CRP-FNR protein sequences of Dv. desulfuricans G20 using BLASTP as described in Materials and

Methods (Table 3-3). Three of the Dv. vulgaris proteins were clearly orthologous to the four Dv. desulfuricans proteins in the following manner (based on the global phylogeny nomenclature of Körner et al., 2003): CooA (DVU2097, Dde VIMSS393976), DNR (DVU2547, Dde VIMSS394258) and cc (DVU0379, Dde VIMSS392986, and Dde VIMSS395500). The fourth CRP-FNR protein of Dv. vulgaris (DVU3111) shows weak homology to members of the cc subfamily but the returned BLASTP score alone was insufficient to make a definitive assignment. An examination of the global CRP-FNR phylogeny showed that the Dv. desulfuricans protein Dde VIMSS394258 (corresponding to DVU2547) clustered with a similar protein from the related δ-proteobacterium Geobacter metallireducens, suggesting that these genes may be common to the δ-proteobacteria. To test this hypothesis, the CRP-FNR protein sets of the two Desulfovibrio species were compared to the available genomic sequences of the δ-proteobacteria (several of which were not included in the global CRP-FNR phylogeny) using BLASTP (Table 3-3). Putative orthologs of these Desulfovibrio genes were identified in all of the free-living δ-proteobacterial species examined. Weak hits to DVU2547/Dde

VIMSS394258 with scores below the assigned cutoffs were observed for the parasitic

Bdellovibrionaceae species (50

To complement these results, a phylogenetic tree was constructed from a set of relevant CRP-

FNR proteins as described in Materials and Methods (Figure 3-3). While the data set used was, by design, less comprehensive than that used to construct the original global CRP-FNR phylogeny, it was assumed that the addition of new sequences would not significantly alter the landscape of that phylogeny. While differences were noted in the arrangement of the large clusters, the composition of the smaller clusters correlated well with that of the global phylogeny (166). As predicted from the

BLAST experiment and the global phylogeny, the sequences from the free-living δ-proteobacterial species corresponding to DVU2547/Dde VIMSS394258 clustered with the corresponding

Desulfovibrio sequences within the larger DNR cluster. No Bdellovibrionaceae CRP-FNR proteins were observed to cluster with these proteins. As observed in the global phylogeny, the DNR protein from the cyanobacterium Synechocystis sp. PCC6803 (gi 13375) also clustered in this group, as did a newly included protein from the cyanobacterium Anabaena variabilis ATCC29413 (gi 45509266).

Table 3-2. CRP-FNR genes of Desulfovibrio.

Species             Primary Annotation   VIMSS Annotation   gi Numberᵇ   Phylogenyᶜ
                    ORF Numberᵃ          ORF Number
Dv. vulgaris        DVU0379              209314             46580951     This study
                    DVU2097              207572             46581514     This study
                    DVU2547              208044             46578795     This study
                    DVU3111              208629             46580502     This study
Dv. desulfuricans   786                  394258             23474021     DNR
                    2072                 395500             23475483     cc
                    2963                 393976             23473780     CooA
                    3637                 392986             23474426     cc

ᵃ Primary annotation numbers for Dv. vulgaris from TIGR-CMR annotation, for Dv. desulfuricans G20 from ORNL annotation.
ᵇ gi number is the National Center for Biotechnology Information (NCBI) sequence identifier.
ᶜ Phylogenetic nomenclature as defined in (166).

Table 3-3. Comparisons of the CRP-FNR proteins of the δ-proteobacterial species.

BLASTP was used for the comparisons except for Bma, Dau and Dps (TBLASTN). a Dv. vulgaris ORF numbers from TIGR-CMR, Dv. desulfuricans G20 from VIMSS b Six digit VIMSS numbers used where available c Blast score (bits)/% Identity

Dv. vulgaris Hildenborougha Dv. desulfuricans G20a DVU0379 DVU2097 DVU2597 DVU3111 394258 395500 393976 392986 Dde 394258b 169/41c - - 72/24 - - - - 395500 - 159/37 ------393976 - - 278/62 - - - - - 392986 ------Dps 424635 - - 162/39 - 167/41 - - 63/25 Dau orf42 59/25 50/21 174/39 - 195/44 50/22 59/26 69/23 Dac 391024 51/24 ------391000 - 50/27 153/35 - 170/38 - 54/24 - Gme 379469 63/25 149/35 50/26 167/42 50/25 - 68/25 Gsu 383747 60/24 - 159/37 51/27 167/41 - 50/22 69/26 Bba 412839 - - 81/21 - 99/28 - - 54/20 Bma - - - 85/27 - 103/28 - - 59/21

Figure 3-3. Phylogenetic grouping of CRP-FNR genes. Full bacterial names given in the tree are followed by the strain and specific protein designation or ORF number, where available. Numbers at the tree branches represent bootstrap values based on 1000 trials. Group names were taken from (166). The cluster containing Dv. vulgaris ORF DVU2547 is labeled “HbaR” in reference to an early annotation of the ORF.

[Figure 3-3: phylogenetic tree. Labeled clusters: “HbaR”, DNR, FNR, CRP, Flp, NnrR, G, cc, CooA]

3.3.2. Phenotypic Analysis of a CRP-FNR Mutant Strain of Desulfovibrio vulgaris Hildenborough

The phylogenetic analysis described above suggested that the gene family represented by Dv.

vulgaris ORF DVU2547 may represent a conserved family among the free-living δ-proteobacteria and

possibly some cyanobacterial species. A mutant strain of Dv. vulgaris interrupted in ORF DVU2547

was constructed by plasmid insertion as described in Materials and Methods (Figure 3-1) and this

strain was designated HBAR5.

Based on the accumulated knowledge of the environmental responses of members of the

CRP-FNR superfamily, it was hypothesized that the regulator encoded by DVU2547 might be

responsive to the cellular redox state by unknown mechanisms. To test this supposition, growth of the

control strain JW906 and the mutant strain HBAR5 was tested under respiratory and fermentative conditions

using a variety of individual carbon sources which also served as the sole electron donor. The general

growth phenotypes are summarized in Table 3-4.

HBAR5 and JW906 strains grown on lactate-sulfate, pyruvate-sulfate, pyruvate alone or

formate-sulfate showed similar comparative growth rates and final cell densities (Figure 3-4) as well as

similar levels of protein, sulfate and sulfide (Table 3-5). The most striking phenotype observed was

the complete inability of the HBAR5 strain to grow in medium containing ethanol as the sole carbon

source and electron donor (Figure 3-5). Ethanol oxidation was tested under a range of ethanol

concentrations over 100+ hours with no evidence of growth by the HBAR5 strain (results not shown),

suggesting that DVU2547 is essential for growth of Dv. vulgaris on ethanol.

Table 3-4. General growth phenotypes of Dv. vulgaris wild type (wt) and mutant HBAR5 strains.

                                                Wild type                  HBAR5
Electron Donor and Acceptor (mM)                Growth   Sulfate           Growth   Sulfate
                                                         Reduction                  Reduction
Lactate/Sulfate (60/50) (LS5aD)                 +        +                 +        +
Pyruvate/Sulfate (60/50) (PS5)                  +        + (decreased)     +        + (decreased)
Pyruvate (60) (Fermentation) (P5)               +        NA                +        NA
Formate/Sulfate (60/50) (FS5)                   -        +                 -        +
Formate/Sulfate + Acetate (60/50/1) (FAS5)      +        +                 +        +
Ethanol/Sulfate (60/50) (ES5)                   +        +                 -        -
No Carbon + Sulfate (50) (NCS5)                 -        -                 -        -

Figure 3-4. Growth of JW906 and HBAR5 Dv. vulgaris on lactate-sulfate, pyruvate-sulfate, pyruvate alone and formate-sulfate. Substrate concentrations are as follows: lactate, 60 mM; pyruvate, 60 mM; formate, 60 mM; sulfate, 50 mM. In each plot, JW906 is designated by ■, HBAR5 by □. a) Lactate-Sulfate, b) Pyruvate-Sulfate (respiration, 5 mM sulfate), c) Pyruvate (fermentation, 5 mM sulfate), d) Formate-Sulfate. Data points are the average of triplicate samples and are indicative of four experiments under these media conditions.

[Figure 3-4: growth curves, panels a) to d). x-axis: Time (hr), 0 to 50. y-axis: OD600 x 1000 (axis ticks at 10, 100 and 1000 in panels a, b and d; 0 to 800 in panel c)]

Figure 3-5. Growth phenotypes of Dv. vulgaris wt and HBAR5 strains on ethanol-sulfate (60/50 mM). In each plot, wt is designated by ■, HBAR5 by □. Solid lines denote ethanol-sulfate data; dotted lines denote no carbon (NCS5) data.

[Figure 3-5: growth curves. x-axis: time (hr), -5 to 35. y-axis: OD600 x 1000 (axis ticks at 10, 100 and 1000)]

Table 3-5. Total protein, sulfate and sulfide determination (t = 23.0 h). Assays were performed using 1 ml culture (cell pellet for protein, supernatant for sulfate/sulfide). a) Protein concentration was measured using 25, 50 and 100 µl volumes of the cell pellet resuspended in 500 µl 0.5 N NaOH. BDL indicates that the signal was below the spectrophotometric detection limit. b) Sulfate concentrations were measured using 100, 200 and 400 µl supernatant; sulfide concentrations were measured using three 50 µl samples of the supernatant.

a)

                               OD600 @ t = 23 h                 µg Protein/100 µL Sample @ t = 23 h
Sample                         JW906           HBAR5            JW906           HBAR5
Lactate-Sulfate (60/50)        0.604 ± 5.3     0.623 ± 12.7     48.7 ± 7.4      60.1 ± 3.3
Pyruvate-Sulfate (60/50)       0.667 ± 9.1     0.735 ± 14.5     54.8 ± 4.6      60.4 ± 1.8
Pyruvate (60) (5 mM sulfate)   0.650 ± 13.6    0.664 ± 9.8      15.3 ± 3.8      BDL
Formate-Sulfate (60/50)        0.427 ± 18.6    0.416 ± 18.6     BDL             BDL
NCS5 (no carbon)               0.081 ± 6.6     0.087 ± 7.2      BDL             BDL

b)

                               [Sulfate] (mM)                   [Sulfide] (mM)
Sample                         JW906           HBAR5            JW906           HBAR5
Lactate-Sulfate (60/50)        29.7 ± 1.1      29.5 ± 3.1       19.0 ± 1.3      23.3 ± 3.8
Pyruvate-Sulfate (60/50)       38.3 ± 4.8      39.5 ± 5.5       1.8 ± 1.2       2.4 ± 0.8
Pyruvate (60) (5 mM sulfate)   0.8 ± 0.2       0.9 ± 0.02       1.6 ± 1.3       1.6 ± 0.6
Formate-Sulfate (60/50)        33.4 ± 12.7     39.5 ± 6.8       6.4 ± 2.5       6.3 ± 0.6

3.4. Discussion

Phylogeny of the Desulfovibrio CRP-FNR genes. An analysis of the genomic sequences of two

Desulfovibrio species reveals multiple genes encoding putative CRP-FNR regulatory proteins. Dv.

vulgaris and Dv. desulfuricans each appear to encode a set of at least four CRP-FNR genes, with three

of the genes clearly common between the two species: CooA (DVU2097/Dde VIMSS393976), DNR

(DVU2547/Dde VIMSS394258) and members of the cc subgroup (DVU0379/Dde VIMSS395500/Dde

VIMSS392986). The fourth CRP-FNR gene of Dv. vulgaris (DVU3111) is most closely related to the

Desulfovibrio cc genes but appears to be more divergent than the other members of that subgroup.

Orthologs of the Dv. vulgaris protein encoded by DVU2547 were identified from the genomic

sequences of the four currently sequenced sulfate-reducing species (Dv. vulgaris, Dv. desulfuricans,

Dt. psychrophila, and Db. autotrophicum) as well as both Geobacter species and Dm. acetoxidans.

Weak hits were observed in comparisons with the parasitic Bdellovibrionaceae species, but the

phylogeny presented does not support the inclusion of those genes in the DVU2547 cluster. The

cluster also contains members from two cyanobacterial species, Synechocystis sp. PCC6803 (DNR)

and Anabaena variabilis ATCC 29413. The phylogeny presented here as well as the global CRP-FNR

phylogeny of (166) suggests that the regulators represent a distinct subgroup within the DNR family of

the CRP-FNR regulators. These proteins are discussed in more detail below.

The cc family previously identified in the global phylogeny was filled out with two CRP-FNR

proteins from Dv. vulgaris (DVU0379, DVU3111) to bring the total known members of this family to

five, all of which represent proteins from sulfidogenic microbes (2 sulfate-reducing, 1 sulfur-reducing).

The fourth Dv. vulgaris CRP-FNR protein, annotated as CooA regulator of carbon monoxide

dehydrogenase, was confirmed to cluster with the previously characterized CooA proteins.

An examination of the genomic arrangements of the genes encoding the CRP-FNR proteins of

the Desulfovibrio species suggests possibilities as to the functions and environmental responses of the

active proteins (Figure 3-6). In each Desulfovibrio species, the cooA gene lies immediately upstream

of a putative cooSC operon encoding the active carbon monoxide dehydrogenase (CODH) and a

CODH Ni-insertion accessory protein (DVU2097-2099; Dde VIMSS393974-5), an arrangement that appears to be conserved from the model CODH operon of Rhodospirillum rubrum (170-172). CO may serve as the sole electron donor for growth and sulfate reduction in Dv. vulgaris strain Madison, suggesting that the cooS gene is expressed in Desulfovibrio species (173). Furthermore, a CO cycling model analogous to the classical hydrogen cycling model (47) has recently been proposed for Dv. vulgaris whereby CO is produced and consumed during the oxidation of organic acids under sulfate-respiring conditions (172). Based on the accumulated genomic and physiological evidence for CO metabolism by Dv. vulgaris, it can reasonably be assumed that ORF DVU2097 (and, by extension,

ORF Dde VIMSS393976) constitutes a true CO-responsive cooA gene. Interestingly, genes for a membrane-bound CO-dependent hydrogenase complex similar to that of R. rubrum have been identified in Dv. vulgaris (cooMKLXUHF, DVU2286-2293) but not in Dv. desulfuricans, suggesting that carbon monoxide metabolism may proceed differently between the two species (172).

The functions of the remaining CRP-FNR proteins of Desulfovibrio are less clear, but the genomic environments of the genes encoding these proteins suggest that the internal redox state of the cell may play a role in the activity of the regulator. The genes of the cc subgroup members are prime examples of this. Each of the three definitive cc genes of the two Desulfovibrio species (DVU0379,

Dde VIMSS395500, Dde VIMSS392986) displays marked synteny (Figure 3-6, Table 3-6), with ORFs

DVU0379 and Dde VIMSS392986 maintaining identical gene arrangements over the length of five

ORFs (DVU377-381, Dde 392984-8). For a given species, these ORFs putatively encode an nhaC-like

Na+/H+ antiporter, a putative sulfatase, the CRP-FNR protein, a trxA thioredoxin and a trxB thioredoxin reductase. The lengths of the intergenic sequences separating these ORFs suggest that, in each species, these five genes are cotranscribed as an operon along with the downstream genes shown in Figure 3-6. The second Dv. desulfuricans G20 cc protein, Dde VIMSS395500, appears to be the result of a duplication and genomic rearrangement of the region encompassing Dde VIMSS392986, with the sulfatase-encoding gene having switched places with the CRP-FNR paralog and the trxAB cluster being maintained. Furthermore, the CRP-FNR gene, which is separated by 387 bp from the sulfatase gene, appears to constitute a separate transcription unit from that of the trxAB-sulfatase cluster. In all three cases, the presence of the trxAB genes cotranscribed with, or in proximity to, the CRP-FNR genes provides circumstantial evidence that the active CRP-FNR proteins are responsive to the cellular redox

state.

The fourth CRP-FNR protein of Dv. vulgaris (DVU3111) shows only weak homology to one

of the cc proteins of Dv. desulfuricans (Dde VIMSS392986) but still clusters with these proteins in the

phylogeny (Figure 3-3). However, the genomic environment of DVU3111 shows no similarity to the

three Desulfovibrio cc genes previously described. The cluster of three genes (including a putative

carbamoyl-phosphate synthase carA gene) immediately upstream of DVU3111 (DVU3112-3114) is

conserved in Dv. desulfuricans G20, but the cluster DVU3109-3111 including the CRP-FNR gene, is

not observed in the second Desulfovibrio species. Based on intergenic sequence lengths (123 bp

upstream, 247 bp downstream), DVU3111 appears to be monocistronic and is situated upstream of a

putative operon containing genes for a second nhaC gene (DVU3108), an unidentified FeS protein

(DVU3109) and an L-aspartate oxidase-like protein (DVU3110). This group of genes was examined

using the TIGR-CMR genome region comparison tool available through the individual gene

information page for DVU3110. Four additional bacterial species were identified which maintained

this pair arrangement (Mesorhizobium loti MAFF303099, Bradyrhizobium japonicum USDA110,

Pseudomonas putida KT2440 and Pseudomonas aeruginosa PAO1). In each of these species, the gene encoding the FeS protein was identified as a putative ferredoxin. Unlike Dv. vulgaris, no CRP-FNR genes were observed in the vicinities of these clusters, though in each of the Pseudomonas species, the cluster was preceded by a regulatory protein of the GntR family. Regardless, the presence of a proximate gene encoding a ferredoxin-like protein again suggests a redox response.

Phenotypic analysis of the Dv. vulgaris HBAR5 mutant strain. A phenotypic analysis of the Dv. vulgaris HBAR5 mutant strain revealed that the strain was unable to utilize ethanol as the sole carbon source and electron donor. Previous studies have suggested that ethanol oxidation in Dv. vulgaris is attributed solely to an alcohol dehydrogenase, adh, encoded by ORF DVU2405 (designated

ORF02977 in that publication) (174). These data combined with the results of this study suggest that expression of the adh gene may be regulated by the CRP-FNR protein encoded by DVU2547.

Furthermore, a mutant strain of Dv. vulgaris lacking the Fe-hydrogenase has been shown through proteomic and transcriptome analysis to downregulate expression of adh and upregulate expression of ORF DVU2543 (ORF03201), which encodes an FeS-containing hybrid-cluster protein of unknown function (174). The proximity of the gene encoding the hybrid-cluster protein to DVU2547, as well as the observation that expression of the two genes appears to be linked under conditions of nitrate stress

(He), suggests a tenuous functional link between the two expressed proteins.

An examination of the genomic environment in the region of the genes encoding the hybrid-cluster protein DVU2543 and the CRP-FNR protein DVU2547 revealed the presence of an intervening operon containing an unidentified histidine kinase response regulator (DVU2546) and a gene encoding an alcohol dehydrogenase (DVU2545) with homology to the EutG alcohol dehydrogenase of Salmonella species. EutG has been proposed to maintain redox balance during metabolism of ethanolamine in

Salmonella typhimurium by reducing excess acetaldehyde produced to ethanol (101). No other genes encoding obvious ethanolamine metabolic enzymes were identified in Dv. vulgaris but several have been identified in Dv. desulfuricans G20. The alcohol dehydrogenase encoded by DVU2545 may play a similar redox-balancing role in Dv. vulgaris under certain metabolic conditions. In addition to the two genes previously described, the annotation of Dv. vulgaris suggests at least five additional genes encoding iron-containing alcohol dehydrogenase enzymes (DVU0353, DVU2201, DVU2396,

DVU2885 and DVU2905), the functions of which are currently unknown.

The E. coli alcohol dehydrogenase encoded by the adhE gene catalyzes the conversion of acetyl-CoA to ethanol during anaerobic growth in order to maintain the redox balance of the cell

(Membrillo-Hernandez). Unlike most anaerobic metabolism genes of E. coli, expression of adhE is

independent of FNR activity under standard physiological conditions (175). Analyses of the promoter

region revealed two putative transcriptional start sites at positions -188 and -292 from the translational

start site as well as putative binding sites for NarL, FNR and Cra (catabolite repressor activator). FNR

has been experimentally shown to regulate adhE expression from the -188 start site but only in the

physical absence of the -292 start site, suggesting that the upstream region acts as a silencer of the -188

site (176). FNR binding sites have also been identified in the promoters of adhE genes of

Actinobacillus pleuropneumoniae, Clostridium acetobutylicum, Lactococcus lactis and Salmonella

typhimurium (Membrillo-Hernandez). These results suggest that physiological conditions may exist

under which ethanol oxidation is modulated by the activity of FNR. An examination of the promoter region of the Dv. vulgaris adh gene (DVU2405) revealed a putative binding site

(CTATGTTTTTTCTGGCATATGT) at position -310 and -68 from the predicted translational start

sites of DVU2405 and DVU2406 (hypothetical), respectively, that is similar to the CRP-binding site

(CTATGTGATGTTTTTCACATAA) identified in the promoter of the sulfatase of Dv. desulfuricans.

At this time, there is no experimental evidence to verify the function, if any, of this site.

Figure 3-6. Genomic environments of the genes encoding the CRP-FNR proteins of Desulfovibrio. This figure is a diagrammatic representation of genes in the region of the CRP-FNR-encoding genes as determined by the genomic annotations of the relevant institutions. ORF numbers below each putative gene are the TIGR-CMR (for Dv. vulgaris) or the truncated VIMSS (for Dv. desulfuricans, i.e. [VIMSS39]5500 for the fourth crp-fnr gene). Open arrows represent apparent genes and arrow heads indicate the direction of transcription. When available, descriptive gene names or names of encoded enzymes are given above each arrow.

Table 3-6. Description and comparative analysis of the genes listed in Figure 3-6.

ORF Number        Gene Name  Gene Description                                              aa Length  Best Hit to Ddeᵃ  BLASTP Scoreᵇ
DVU2543 (208040)             Hybrid Cluster Protein                                        539        2760 (394261)     856
DVU2544 (208041)             FeS Cluster-Binding Protein                                   271        2761 (394260)     304
DVU2545 (208042)             Alcohol Dehydrogenase, Fe-Containing                          386        2057 (395532)     396
DVU2546 (208043)             Sensory Box Histidine Kinase                                  512        2024 (395609)     253
DVU2547 (208044)             Transcriptional Regulator (CRP-FNR)                           228        786 (394258)      278
DVU2548 (208045)  acpD       Acyl Carrier Protein Phosphodiesterase                        209        -                 -
DVU2097 (207572)  cooA       Transcriptional Regulator (CRP-FNR)                           223        2963 (393976)     159
DVU2098 (207573)  cooS       Carbon Monoxide Dehydrogenase                                 629        2964 (393975)     870
DVU2099 (207574)  cooC       CODH Accessory Protein                                        293        2965 (393974)     218
DVU374 (209309)   porA       Pyruvate Ferredoxin/Flavodoxin Oxidoreductase Family Protein  832        3641 (392982)     1060
DVU375 (209310)              Glu/Leu/Phe/Val Dehydrogenase Family Protein                  409        -                 -
DVU376 (209311)              Hypothetical Protein                                          88         -                 -
DVU377 (209312)   trxB-1     Thioredoxin Reductase                                         305        3639 (392984)     372
DVU378 (209313)   trxA       Thioredoxin                                                   104        3638 (392985)     106
DVU379 (209314)              Transcriptional Regulator (CRP-FNR)                           225        3637 (392986)     169
DVU380 (209315)              Putative Sulfatase                                            520        3636 (392987)     875
DVU381 (209317)   nhaC-1     Na+/H+ Antiporter                                             463        3635 (392988)     572
DVU3107 (208625)             Cytochrome c Family Protein                                   589        3698 (392898)     526
DVU3108 (208626)  nhaC-2     Na+/H+ Antiporter                                             475        3635 (392988)     538
DVU3109 (208627)             FeS Cluster-Binding Protein                                   66         904 (394449)      53
DVU3110 (208628)             Putative L-Asp Oxidase                                        573        2032 (395577)     88
DVU3111 (208629)             Transcriptional Regulator (CRP-FNR)                           244        3637 (392986)     72
DVU3112 (208630)             TPR Domain Protein                                            267        367 (393480)      305
DVU3113 (208631)  carA       Carbamoyl-Phosphate Synthase, Small Subunit                   375        366 (393479)      608
DVU3114 (208632)  kdsB       3-Deoxy-Manno-Octulosonate Cytidylyltransferase               252        365 (393478)      332
DVU3115 (208633)             Hypothetical Protein                                          77         -                 -

Table 3-6. Cont

Dde ORF Number  Gene Name  Gene Description                                aa Length  Best Hit to Dvuᵃ  BLASTP Scoreᵇ
2765 (394255)              Hypothetical Thioesterase                       171        DVU2495 (207992)  125
2764 (394256)              Hypothetical Protein (VirR-Related)             167        DVU0158 (209090)  54
2763 (394257)              Hypothetical Protein                            90         DVU2549 (208046)  87
786 (394258)               Transcriptional Regulator (CRP-FNR)             248        DVU2547 (208044)  293
2762 (394259)   uspA       Universal Stress Protein Family                 168        -                 -
2761 (394260)              Ferredoxin 3                                    253        DVU2544 (208041)  293
2760 (394261)              Hybrid Cluster Protein                          575        DVU2543 (208040)  812
2965 (393974)   cooC       CODH Accessory Protein                          268        DVU2099 (207574)  167
2964 (393975)   cooS       Carbon Monoxide Dehydrogenase                   625        DVU2098 (207573)  867
2963 (393976)   cooA       Transcriptional Regulator (CRP-FNR)             216        DVU2097 (207572)  241
2962 (393977)              Hypothetical Protein                            280        -                 -
3642 (392981)              Hypothetical Acyl-CoA Synthetase (NDP Forming)  820        DVU0373 (209308)  1085
3641 (392982)   porA       Pyruvate:Ferredoxin Oxidoreductase, α-Subunit   832        DVU0374 (209309)  1045
3640 (392983)              Hypothetical Protein                            137        -                 -
3639 (392984)   trxB       Thioredoxin Reductase                           305        DVU0377 (209312)  348
3638 (392985)   trxA       Thioredoxin                                     103        DVU0378 (209313)  106
3637 (392986)              Transcriptional Regulator (CRP-FNR)             221        DVU0379 (209314)  169
3636 (392987)              Probable Sulfatase                              520        DVU0380 (209315)  889
3635 (392988)   nhaC       Na+/H+ Antiporter                               483        DVU0381 (209317)  550
2077 (395495)              Hypothetical Transmembrane Protein              317        DVU3081 (209317)  154
2076 (395496)              Hypothetical Carboxylate Transport Protein      461        -                 -
2075 (395497)   trxB       Thioredoxin Reductase                           305        DVU0377 (209312)  325
2074 (395498)   trxA       Thioredoxin                                     102        DVU0378 (209313)  93
2073 (395499)              Probable Sulfatase                              507        DVU0380 (209315)  689
2072 (395500)              Transcriptional Regulator (CRP-FNR)             233        DVU0379 (209314)  53

ᵃ Dvu ORF numbers are TIGR-CMR followed by VIMSS. Dde ORF numbers are VIMSS.
ᵇ Best match in second Desulfovibrio species (Dvu to Dde or Dde to Dvu) based on BLASTP comparison. Dashes indicate no ortholog was observed.

4. Computational Prediction of Regulatory Motifs of Desulfovibrio Species

This chapter is adapted from C.L. Hemme and J.D. Wall, “Genomic Insights into Gene

Regulation of Desulfovibrio vulgaris Hildenborough”, 2004 Omics 8(1), 43-55. The data presented in

that paper have been updated to include individual and comparative analyses of both Dv. vulgaris

Hildenborough and Dv. desulfuricans G20. Sections that are taken verbatim from the paper are

indicated in the text or presented as quotes.

4.1. Introduction/Rationale

The availability of the genomic sequences of Desulfovibrio species enables the elucidation not only of individual regulatory proteins but also of regulatory networks. A computational approach was adopted to reveal potential transcriptional regulatory sequences of

Desulfovibrio. A strategy developed by Church and coworkers (118, 177) for detecting conserved motifs in the upstream regions in bacterial genomes using the AlignACE suite of programs was applied to the sulfate-reducing bacteria. Such motifs may represent binding sites for regulatory proteins and identification of the motifs may provide keys to the regulatory networks of the sulfate-reducing bacteria. The approach used requires that genes be organized into putative regulons based upon biologically relevant grouping strategies. Alignment of the upstream regions of the genes in the groups by AlignACE returns a set of putative motifs that are then analyzed for statistical significance (178).

This method was successfully used by McGuire et al. (118) to predict regulatory motifs in 17 bacterial genomes, both in individual bacteria and in groups of organisms. AlignACE was able to detect not only experimentally verified regulatory motifs from Escherichia coli but also potentially unique motifs from less well-studied organisms such as the archaeal sulfate-reducer Archaeoglobus fulgidus. As the

AlignACE strategy requires only a genomic sequence and an annotated coordinate list, it is ideal for analyzing less genetically accessible organisms such as the sulfate-reducing bacteria.


4.2. Materials and Methods

The Materials and Methods section is as presented in Hemme and Wall, 2004 with minor

corrections regarding score cutoffs and the inclusion of data from Dv. desulfuricans.

Determination of putative regulons. Potentially coregulated genes were grouped from Dv.

vulgaris and Dv. desulfuricans alone and combined using three primary strategies: 1) genes representing orthologs of known E. coli regulons (179), 2) genes composing functional metabolic pathways derived from the KEGG database (http://www.genome.ad.jp/kegg/kegg2.html), and 3) genes composing conserved operons as defined in the WIT database (http://wit.mcs.anl.gov/WIT2).

Orthologs were identified using reciprocal TBLASTN (72) searches as described by McGuire et al.

(118) using a minimum bit score cutoff of 100.
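The reciprocal-search criterion can be illustrated with a minimal sketch. It assumes the search results have already been parsed into (query, subject, bit score) tuples; the tuple layout, the function names and the example identifiers are hypothetical and are not taken from the scripts used in this study. Only the bit score cutoff of 100 comes from the text.

```python
from typing import Dict, List, Tuple

# Each hit is (query_id, subject_id, bit_score). The tuples stand in for parsed
# BLAST output; this layout is an assumption made for illustration only.
Hit = Tuple[str, str, float]

MIN_BIT_SCORE = 100.0  # minimum bit score cutoff stated in the text


def best_hits(hits: List[Hit], min_score: float = MIN_BIT_SCORE) -> Dict[str, str]:
    """Return the highest-scoring subject for each query at or above the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if score < min_score:
            continue  # discard hits below the bit score cutoff
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A pair is retained only when each gene is the other's top-scoring hit in both search directions, which is the usual working definition of a putative ortholog in this kind of analysis.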

Extraction of intergenic sequences. Operons were predicted based on the lengths of the intergenic sequences of the genome. ORFs separated by <20 bp were automatically considered to be part of the same operon. Arbitrary nucleotide length cutoffs of 100 and 300 bp were used to define the maximum distances between potentially coregulated genes. ORFs separated by fewer base pairs than the maximum cutoff were considered to be part of an operon. Regardless of the intraoperon cutoff used, the promoter sequence of an operon was assumed to be located within the sequence 300 bp upstream of the putative operon head. The intergenic and promoter sequences for each regulon were extracted from the Dv. vulgaris and Dv. desulfuricans genomes and collected into single- and multi-species libraries using ad hoc Perl scripts. Each library consisted of the putative promoter sequence of each operon in the group plus all of the intergenic sequences greater than 20 bp and less than the given cutoff from each operon. Only groupings containing 3 or more promoter sequences encompassing at least 3 different regulon members were retained for further analysis. Groupings for the combined

Desulfovibrio sequences (designated SRB) were constructed by combining the sequences from the relevant individual groups, i.e. glycolysis metabolism groups for Dv. vulgaris and Dv. desulfuricans were combined into a single SRB glycolysis metabolism group.
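The distance-based operon rule can be sketched as follows. This is an illustration in Python rather than the ad hoc Perl scripts referred to above, which are not reproduced here. The Orf record, the function names and the requirement that grouped ORFs lie on the same strand are assumptions made for the sketch; the 100 bp (or 300 bp) intraoperon cutoff and the 300 bp promoter window come from the text.

```python
from typing import List, NamedTuple, Tuple


class Orf(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def group_operons(orfs: List[Orf], max_gap: int = 100) -> List[List[Orf]]:
    """Group neighboring ORFs whose intergenic gap is shorter than max_gap.

    Gaps below 20 bp are covered automatically because 20 < max_gap.
    The same-strand requirement is an assumption of this sketch.
    """
    operons: List[List[Orf]] = []
    for orf in sorted(orfs, key=lambda o: o.start):
        if operons:
            prev = operons[-1][-1]
            gap = orf.start - prev.end - 1  # bases between the two ORFs
            if orf.strand == prev.strand and gap < max_gap:
                operons[-1].append(orf)
                continue
        operons.append([orf])
    return operons


def promoter_window(operon: List[Orf], upstream: int = 300) -> Tuple[int, int]:
    """Coordinates of the region upstream of the operon head.

    On the minus strand the head is the last ORF by coordinate, and the
    window is not clipped to the genome length in this sketch.
    """
    if operon[0].strand == "+":
        head = operon[0]
        return (max(1, head.start - upstream), head.start - 1)
    head = operon[-1]
    return (head.end + 1, head.end + upstream)
```

Calling group_operons once with max_gap=100 and once with max_gap=300 would give the two alternative operon sets described in the text.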

Statistical analysis of the motifs. The sequences were analyzed using the AlignACE suite of programs (http://arep.med.harvard.edu/). AlignACE, based on the Gibbs-sampling algorithm, performs local alignments of DNA sequences and returns a list of putative motifs along with a maximum a posteriori log likelihood (MAP) score for each. The MAP score is an indication of the overrepresentation of the motif in the input sequences but gives no indication as to its specificity to the input sequences (177). In order to determine the specificity of a motif, the ScanACE program was used to scan for additional instances of the motif in the genome and the information returned was used to calculate the site specificity (Ssite) score for the motif (177). Ssite is defined as the probability that x or more hits will occur randomly in the input sequences compared to the entire genome.

$$S_{site} = \sum_{i=x}^{\min(s_1,\, s_2)} \frac{\binom{s_1}{i}\binom{N - s_1}{s_2 - i}}{\binom{N}{s_2}}$$

Where:

N = total number of basepairs in the genome(s) considered,

s1 = total number of ScanACE hits considered (200),

s2 = total number of basepairs of the input sequences,

x = number of the top 200 hits returned by ScanACE that occur within the input

sequences
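The Ssite expression is the upper tail of a hypergeometric distribution and can be evaluated directly. The following is a minimal sketch using exact integer arithmetic from the Python standard library (math.comb, Python 3.8 or later); the function name is hypothetical, and for genome-scale values of N a log-space or library implementation would likely be more practical.

```python
from math import comb


def site_specificity(N: int, s1: int, s2: int, x: int) -> float:
    """Probability that x or more of the s1 hits fall within the s2 input bases.

    N  = total number of base pairs in the genome(s) considered
    s1 = number of ScanACE hits considered (200 in the text)
    s2 = total number of base pairs in the input sequences
    x  = number of the s1 hits that occur within the input sequences
    """
    upper = min(s1, s2)
    total = comb(N, s2)
    # Sum the hypergeometric probabilities for i = x .. min(s1, s2).
    tail = sum(comb(s1, i) * comb(N - s1, s2 - i) for i in range(x, upper + 1))
    return tail / total
```

A small value indicates that the observed concentration of hits in the input sequences is unlikely to arise by chance, so lower Ssite scores correspond to more specific motifs.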

Many regulatory proteins bind to palindromic DNA sequences and thus the palindromicity score of a returned motif was used as an additional parameter for determining the statistical significance of a motif. General statistical cutoffs of MAP > 5 and Ssite < 1e-20 were chosen based on the control analyses of McGuire and Hughes (118, 177) (see Chapter 4 Results and Discussion). The Ssite cutoff was increased to 1e-15 when considering palindromic motifs. Palindromicity scores for motifs were calculated by comparing a motif to its reverse complement using the CompareACE program (177). The score returned by CompareACE is the correlation coefficient (0

two position-specific scoring matrices of the best alignment of two motifs. Motifs with a score >0.7 were considered to be palindromic (177). In order to reduce the effect of repetitive elements in the analysis, motifs in which more than half of the aligned sites were derived from a single upstream region were excluded (177).
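The idea behind the palindromicity score can be sketched as a Pearson correlation between a motif's base-count matrix and the matrix of its reverse complement. This is a simplified illustration, not the CompareACE implementation: it compares the two matrices in a single fixed register rather than searching for the best alignment, and all function names are hypothetical.

```python
from math import sqrt
from typing import List

BASES = "ACGT"  # column order within each motif position
Matrix = List[List[float]]


def count_matrix(sites: List[str]) -> Matrix:
    """Base counts per position for a set of equal-length aligned sites."""
    matrix = [[0.0] * 4 for _ in range(len(sites[0]))]
    for site in sites:
        for pos, base in enumerate(site.upper()):
            matrix[pos][BASES.index(base)] += 1.0
    return matrix


def reverse_complement(matrix: Matrix) -> Matrix:
    """Reverse the positions and swap A<->T, C<->G within each position."""
    return [list(reversed(row)) for row in reversed(matrix)]


def pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant matrix
    return cov / sqrt(var_x * var_y)


def palindromicity(matrix: Matrix) -> float:
    """Correlation between a motif matrix and its reverse complement."""
    flat = [value for row in matrix for value in row]
    flat_rc = [value for row in reverse_complement(matrix) for value in row]
    return pearson(flat, flat_rc)
```

Under this sketch a perfectly palindromic site set such as repeated copies of GAATTC scores 1, and a motif would be called palindromic when the score exceeds the 0.7 cutoff given in the text.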

Comparison of experimental motifs to motifs verified in E. coli. Conservation of motifs between different organisms can be examined using CompareACE. A CompareACE score >0.7 suggests significant similarity between two motifs (177). Using this criterion, experimental motifs from the Desulfovibrio species were compared to known regulatory motifs from E. coli

(http://arep.med.harvard.edu/ecoli_matrices) to determine if any conservation of regulatory elements

exists between the organisms.

Sequence Data. Preliminary sequence data for Dv. vulgaris was obtained from The Institute

for Genomic Research website at http://www.tigr.org (v07June03). Preliminary sequence data for Dv. desulfuricans G20 was obtained from the Oak Ridge National Laboratory Computational Biology website at http://genome.ornl.gov/microbial/ddes/22dec03 (v22dec03). E. coli motif data was obtained from http://arep.med.harvard.edu/ecoli_matrices/ (179).


4.3. Results and Discussion

4.3.1. AlignACE Results

To determine the accuracy of the AlignACE program, control experiments were performed by

McGuire and Hughes (118, 177) whereby positive and negative control sets were constructed and

submitted to the same computational analysis as the experimental sets. A set of 32 known E. coli regulons with between 5 and 100 known regulatory protein binding sites (179) were examined using

AlignACE and 26 of the sets returned known regulatory motifs (118). The false positive rate for the E. coli motifs was high (95%) when no statistical cutoffs were used, but the number dropped drastically when MAP and Ssite score cutoffs were applied (118). The false positive rate fell to 0% when statistical

cutoffs of MAP >10 and Ssite <1e-20 were applied to non-palindromic motifs, or MAP >10, Ssite <1e-10 and pal >0.7 were applied to palindromic motifs (118). However, the reduction of false positives was accompanied by a decrease in the number of true positives found, from 81% with no statistical cutoffs

to 28% with MAP >10 and Ssite <1e-20 for non-palindromic motifs or 25% with MAP >10, Ssite <1e-10 and pal >0.7 for palindromic motifs. Thus, the application of statistical cutoffs sufficient to eliminate false positives will subsequently result in the detection of only the most significant of true positives.

Negative controls were performed using the promoters from 50 sets each of 20, 40, 60, 80 and 100 randomly chosen Saccharomyces cerevisiae ORFs and the results were compared to those from sets of functionally related ORFs (179). Motifs from the functionally related gene sets tended to display higher statistical scores overall than those derived from random sets, but some motifs from the random sets were observed to be statistically significant. This background noise was significantly reduced when statistical cutoffs were applied, from 3692 occurrences with no cutoffs to 5 occurrences with MAP >10 and group specificity <1e-10 (the group specificity score is an earlier statistical parameter which can be slightly modified to give the site specificity score, Ssite). This number of motifs from random ORFs was roughly 10x fewer than the 54 motifs derived from functional categories with MAP >10 and group specificity <1e-10, and none of the false positives derived from the random sets corresponded to known motifs described in the literature. These results suggest that AlignACE tends to return relevant motifs primarily from sets of functionally related ORFs (179). Overall, the statistical cutoffs described were sufficient to eliminate the vast majority of false positives and reduce the occurrence of false negatives, with Ssite and palindromicity serving as the most important scores for determining statistical significance. Based on these control experiments, the statistical cutoffs chosen for the Desulfovibrio analyses were MAP >5 (all), Ssite <1e-20 (nonpalindromic), Ssite <1e-15 (palindromic) and pal >0.7 (palindromic).

Three general grouping strategies generated 393 putative Desulfovibrio regulons (Dvu/Dde/SRB, 51/71/49 from metabolism, 42/69/31 from E. coli regulons, 49/31/25 from conserved operons). The analysis of these regulons using AlignACE resulted in 10669 motifs with MAP >5 (Figure 4-1). An additional Dv. vulgaris regulon was constructed based on the derived -10/-35 regions of studied Dv. vulgaris genes (180). Although consensus -10 and -35 sequences have been visually identified upstream of several Dv. vulgaris genes (180), no statistically significant motifs were returned when these sequences were analyzed with AlignACE. The variable distances between the predicted -10 and -35 regions may have interfered with their identification. Thus, all statistically significant motifs were derived from the three primary grouping strategies. From the three primary groupings, ten motifs were identified which had MAP >5, Ssite <1e-20 and AT% <80% (Table 4-1). Nearly half of all known regulatory motifs are palindromic (118); therefore, analyzing the derived motifs for those that display palindromicity should increase the probability of finding biologically relevant sequences. Furthermore, McGuire et al. (118) showed that by considering palindromic motifs separately, the Ssite cutoff could be increased without a significant increase in the number of false positives observed. Two additional motifs were found with MAP >5, Ssite <1e-10, AT% <80% and palindromicity score >0.7 (Table 4-2). Sequence logos of each of the motifs are shown in Figures 4-2 and 4-3.
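The selection rules described in this section can be summarized as a simple filter. The Python sketch below is illustrative only: the record layout and function name are hypothetical, the AT fractions and the palindromicity of motif 1 are invented placeholders, and the palindromic Ssite cutoff is left as a parameter (defaulting to 1e-15, the value chosen for the Desulfovibrio analyses) because the cutoffs quoted for palindromic motifs vary between passages.

```python
from dataclasses import dataclass


@dataclass
class Motif:
    motif_id: str
    map_score: float    # MAP score returned by AlignACE
    ssite: float        # site specificity score (smaller is more specific)
    at_fraction: float  # fraction of A/T positions, 0.0 to 1.0
    pal_score: float    # palindromicity score


def is_significant(m, map_min=5.0, ssite_max=1e-20,
                   pal_ssite_max=1e-15, pal_min=0.7, at_max=0.80):
    """Apply the MAP, AT% and Ssite cutoffs, relaxing Ssite for palindromes."""
    if m.map_score <= map_min or m.at_fraction >= at_max:
        return False
    if m.ssite < ssite_max:
        return True
    return m.pal_score > pal_min and m.ssite < pal_ssite_max


# Scores for motifs 1 and 11 are taken from Tables 4-1 and 4-2;
# the AT fractions (and the Pal score of motif 1) are hypothetical.
candidates = [
    Motif("1", 29.3, 3.02e-28, 0.50, 0.10),
    Motif("11", 6.3, 2.14e-19, 0.50, 0.75),
    Motif("AT-rich example", 40.0, 1e-25, 0.90, 0.10),
]
selected = [m.motif_id for m in candidates if is_significant(m)]
```

Keeping the cutoffs as keyword arguments makes it straightforward to rerun the selection with the stricter MAP >10 thresholds used in the published control experiments.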

Figure 4-1. Motifs derived from single and multi-Desulfovibrio species groupings.

10669 motifs with MAP Score > 5 were derived from all groupings (metabolism, conserved operons, and conserved E. coli regulons). Motifs in the upper right quadrant represent motifs that are overrepresented in the input sequences but not specific to the input sequences (e.g. Shine-Dalgarno sequences, ribosomal binding sites, etc.). Motifs with MAP Score > 5.0 and Ssite < 1e-20, or palindromic motifs (palindromicity score > 0.7) with MAP Score > 5.0 and Ssite < 1e-15, were considered to be statistically significant.

Figure 4-1. Motifs derived from AlignACE for individual and combined Desulfovibrio regulons

[Scatter plot: MAP Score (y-axis, 0 to 250) versus Ssite (x-axis, logarithmic scale, 1e-30 to 1e0).]

Table 4-1. Significant motifs independent of palindromicity. Motifs were collected from all groupings based on MAP Score > 5.0 and Ssite < 1e-20.
a Group strategy used to derive the motifs (Metabolism, Conserved Operon, E. coli Regulon)
b Functional group used to collect the data
c Dv. vulgaris (Dvu), Dv. desulfuricans (Dde), Both organisms (SRB)
d Ssite calculated as described in Materials and Methods
e The MAP Score is defined in Materials and Methods. MAP scores for each motif are returned by AlignACE (118)

Table 4-1. Significant Nonpalindromic Motifs

Motif ID  Group Strategy (a)  Functional Group (b)        Organism (c)  Ssite (d)  MAP Score (e)
1         Regulon             NarP                        SRB           3.02e-28   29.3
2         Operon              Amino Acid Transport I      Dvu           5.74e-24   21.7
3         Regulon             RpoN                        Dvu           1.31e-23   6.8
4         Metabolism          Ubiquinone Biosynthesis     Dvu           2.64e-23   33.3
5         Operon              Nif1 (Nitrogen Metabolism)  Dvu           4.11e-23   16.6
6         Operon              Pyruvate Synthase 1         Dvu           3.92e-22   14.3
7         Regulon             NarP                        SRB           1.07e-21   23.8
8         Metabolism          Oxidative Phosphorylation   Dvu           7.87e-21   16.2
9         Operon              Glutamate Synthase          Dvu           9.60e-21   11.1
10        Operon              Glutamate Synthase          Dvu           9.60e-21   6.6

Table 4-2. Significant motifs based on palindromicity. Motifs were collected from all groupings based on MAP Score > 5.0 and Ssite < 1e-15.
a Footnotes from Table 4-1 apply
b Palindromicity (Pal) Score is the correlation coefficient returned by CompareACE for a motif sequence compared to its reverse complement. Pal Scores greater than 0.7 are considered to be palindromic (118).

Table 4-2. Significant Palindromic Motifs (a)

Motif ID  Group Strategy  Functional Group           Organism  Ssite     MAP Score  Pal (b)
11        Regulon         CpxR                       Dde       2.14e-19  6.3        0.75
12        Metabolism      Oxidative Phosphorylation  Dvu       6.48e-19  7.8        0.83

Figure 4-2. Sequence logos of nonpalindromic motifs derived from Desulfovibrio.

The logos correspond to the motifs listed in Table 4-1. The X-axis represents nucleotide position and the Y-axis represents information content in bits. The height of each individual nucleotide is a representation of the relative information content of the nucleotide at that position. Logos were generated using the alpro and makelogo programs (http://www.bio.cam.ac.uk/cgi-bin/seqlogo/logo.cgi) (181).

Figure 4-2. Non-Palindromic Motifs of the SRB

[Sequence logos for motifs 1 through 10.]
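The Y-axis of a sequence logo reports information content in bits. As a minimal sketch of that quantity, the Python function below computes 2 minus the Shannon entropy of the base frequencies at each position of a set of aligned sites. It is an assumption-laden illustration rather than the calculation performed by the alpro and makelogo programs: in particular it omits the small-sample correction that logo software normally applies, and the function name is hypothetical.

```python
import math


def information_content(sites):
    """Bits of information per position for equal-length aligned DNA sites."""
    bits = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        entropy = 0.0
        for base in "ACGT":
            p = column.count(base) / len(column)
            if p > 0:
                entropy -= p * math.log2(p)
        # The maximum for a four-letter alphabet is log2(4) = 2 bits.
        bits.append(2.0 - entropy)
    return bits
```

A fully conserved position yields 2 bits and a position with all four bases equally represented yields 0 bits, which is why strongly conserved positions dominate the logos.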

Figure 4-3. Sequence logos of palindromic motifs derived from Desulfovibrio. The logos correspond to the motifs listed in Table 4-2. All other footnotes from Figure 4-2 apply.

Figure 4-3. Palindromic Motifs of the SRB

[Sequence logos for motifs 11 and 12.]


4.3.2. Comparisons to E. coli regulons

The derived motifs were compared to a set of 34 experimentally defined E. coli regulatory motifs with CompareACE to determine if significant conservation existed between the two organisms. Previous analysis of a data set for Dv. vulgaris suggested a possible glycerol-phosphate repressor (GlpR) binding motif (182), but no significant conservation was detected in the current data set. Based on reciprocal TBLASTN searches, Dv. vulgaris and Dv. desulfuricans appear to encode orthologs for 11 (Table 4-3) and 18 (Table 4-4) of the 34 E. coli regulatory genes, respectively. Of these orthologs, 4 from Dv. vulgaris and 4 from Dv. desulfuricans were annotated identically between the relevant Desulfovibrio species and E. coli (Table 4-3, Table 4-4).
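The reciprocal search logic can be sketched as follows. This Python example implements a strict reciprocal-best-hit rule with a minimum bit score; it is an assumption made for illustration, since the study's exact criterion is not restated here and Tables 4-3 and 4-4 list several ORFs for some regulators, which indicates a more permissive rule. The function names, the 100-bit default cutoff and all identifiers in the example data are hypothetical.

```python
def best_hit(hits, min_bits):
    """Return the subject with the highest bit score at or above min_bits."""
    passing = [hit for hit in hits if hit[1] >= min_bits]
    if not passing:
        return None
    return max(passing, key=lambda hit: hit[1])[0]


def reciprocal_best_hits(forward, reverse, min_bits=100.0):
    """Pair each query with a subject only if each is the other's best hit.

    forward: {query: [(subject, bit_score), ...]} from query-vs-genome searches
    reverse: {subject: [(query, bit_score), ...]} from the reciprocal searches
    """
    orthologs = {}
    for query, hits in forward.items():
        subject = best_hit(hits, min_bits)
        if subject is None:
            continue
        if best_hit(reverse.get(subject, []), min_bits) == query:
            orthologs[query] = subject
    return orthologs


# Hypothetical search results: regA/orf1 are mutual best hits, regB falls
# below the cutoff, and orf3 matches a different regulator more strongly.
forward = {
    "regA": [("orf1", 173.0), ("orf2", 118.0)],
    "regB": [("orf2", 90.0)],
    "regC": [("orf3", 150.0)],
}
reverse = {
    "orf1": [("regA", 160.0)],
    "orf2": [("regA", 120.0)],
    "orf3": [("regD", 200.0), ("regC", 140.0)],
}
pairs = reciprocal_best_hits(forward, reverse)
```

The reciprocal step is what guards against assigning one member of a large paralogous family, such as the two-component response regulators, to the wrong E. coli regulator.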

Tables 4-3 and 4-4 provide several insights into the regulatory systems of Desulfovibrio. Consistent with the known carbon metabolism, the Desulfovibrio species appear to lack several genes responsible for regulating the metabolism of alternative hexose sugars (araC, fruR, malT), while Dv. desulfuricans maintains the previously described GalR/S regulator. A purR gene was not identified despite the observation of apparent orthologs to 13 (Dvu) and 18 (Dde) members of the E. coli PurR regulon and the fact that the Desulfovibrio species appear to encode all of the genes necessary for biosynthesis of purines. In some cases (Crp, Fnr, Fur, NtrC), instances of initial or additional orthologs were suggested in the annotation or from experimental evidence, but these genes did not meet the minimum cutoff of the reciprocal TBLASTN searches used in this study for determining orthologs and thus were not included in this analysis. Sequence similarity between members of conserved protein families, such as the two-component response regulators, leads to additional uncertainty in annotation. For example, the initial annotation of Dv. vulgaris identified ORF00741 (DVU1083) as a “DNA-binding response regulator”, and the gene is listed in Table 4-3 as an ortholog to both ArcA and PhoB, all of which are two-component response regulators. However, an examination of the genomic sequences flanking ORF00741 suggested that the protein encoded by this gene may represent a bona fide PhoB ortholog (see below).

A second method for the determination of orthologs of E. coli regulatory proteins is to scan a genome of interest for instances of a particular motif, in this case the E. coli regulatory motifs. By using this method, one is able to avoid the noise inherent in the previously described grouping processes. Conversely, this method may introduce noise through the detection of motifs arising from random chance. Furthermore, if multiple regulatory proteins employ similar binding motifs, the derived dataset may not represent a “pure” regulon. The motifs used for this search were CRP, FNR, Fur, ArcA, ModE, GalR, PurR, PhoB, NarP, NarL and ArgR. Each E. coli motif was scanned against the combined Desulfovibrio genomic sequences, except in the case of GalR (Dv. desulfuricans only), and the 200 top scoring motifs were collected. For each motif set, those motifs determined to lie <300 bp upstream of the putative translational start site of a given gene were collected, and this subset was further filtered to include only those motifs representing potential palindromic pairs (two instances of a motif at the same location on opposite strands) or potential tandem repeats (two or more returned motifs in the same promoter region). The resulting regulons and returned motifs are listed in Table 4-5. Binding sites corresponding to GalR were found upstream of the Dv. desulfuricans G20 galTK and galR genes described in Chapter 1. A putative CRP binding site was located in the promoter of the putative sulfatase gene Dde VIMSS395499, which lies downstream of the adjacent gene encoding the CRP-FNR protein (Dde VIMSS395500) described previously. The Fur motif set implicated genes/operons related to iron transport and heme biosynthesis. Potential CRP/FNR, Fur (Radionov, personal communication) and GalR (Huang, personal communication) binding sites have been independently identified upstream of certain genes in Dv. desulfuricans. Despite the absence of any obvious nitrate reductase genes, every hit returned from the NarP scan belonged to a potential palindromic pair, suggesting that this motif may play an important regulatory role in Desulfovibrio. Interestingly, the NarP regulon included motifs for four genes encoding putative phosphate homeostasis enzymes: PhoH (Dde VIMSS395114), PhoR (Dde VIMSS393372), phosphate permease (Dde VIMSS393373) and an alkylphosphonate transporter component (Dde VIMSS393410). Scans with PhoB, ArgR and PurR did not return any obviously relevant regulon members. Putative PurR binding motifs were identified upstream of the galTK and galR transcriptional units, but these appear to be the same as the previously identified GalR binding sites. These observations suggest that a certain degree of homology may exist between the binding motifs of functionally similar regulators of Desulfovibrio.
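The positional filtering applied to the scan results can be expressed compactly. The Python sketch below keeps hits lying less than 300 bp upstream of a start site, labels a gene as carrying a palindromic pair when hits on opposite strands fall within 2 bp of each other (the window given in the footnotes to Table 4-5), and otherwise labels it as a tandem repeat when two or more hits share a promoter. The tuple layout, function name and gene names are hypothetical; strand is encoded as 1 (forward) or 0 (reverse) to mirror Table 4-5.

```python
from collections import defaultdict


def classify_hits(hits, max_upstream=300, pair_window=2):
    """Classify promoter hits as palindromic pairs or tandem repeats.

    hits: iterable of (gene, bp_upstream_of_start, strand) tuples.
    Genes with only a single qualifying hit are dropped.
    """
    by_gene = defaultdict(list)
    for gene, position, strand in hits:
        if 0 <= position < max_upstream:
            by_gene[gene].append((position, strand))

    classified = {}
    for gene, sites in by_gene.items():
        forward = [p for p, s in sites if s == 1]
        reverse = [p for p, s in sites if s == 0]
        if any(abs(f - r) <= pair_window for f in forward for r in reverse):
            classified[gene] = "palindromic pair"
        elif len(sites) >= 2:
            classified[gene] = "tandem repeat"
    return classified


# Hypothetical scan output.
example_hits = [
    ("geneA", 25, 1), ("geneA", 26, 0),    # opposite strands, 1 bp apart
    ("geneB", 168, 1), ("geneB", 172, 1),  # two hits on the same strand
    ("geneC", 450, 1), ("geneC", 451, 0),  # too far upstream
    ("geneD", 80, 1),                      # single hit
]
regulon = classify_hits(example_hits)
```

Requiring a pair or a repeat trades sensitivity for specificity, which is one way to offset the chance matches expected when a short motif is scanned against whole genomes.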

Table 4-3. Comparison of Hypothetical Dv. vulgaris Regulons to E. coli.
a From: http://arep.med.harvard.edu/ecoli_matrices/
b Based on number of putative promoters of the 300 bp cutoff conserved regulon groupings
c “Ortholog” refers to identification based on reciprocal TBLASTN alone, “Yes” indicates additional confirmation from annotation, “*” indicates that the regulator was identified by other means
d Primary Dv. vulgaris ORF numbers from v07112001 final draft sequence. Current CMR numbers follow in parentheses.
e Multiple groupings based on variable distance of binding site from transcriptional start site

Table 4-3. Comparison of Hypothetical Dv. vulgaris Regulons to E. coli.

E. coli Regulator (a)  Regulon Size (b)  Identified (c)  ORF # (d)         TBLASTN Score (bits)
AraC                   1                 No              -                 -
ArcA                   11                Ortholog        00741 (1083)      126
ArgR                   5                 No              -                 -
CpxR                   9                 No              -                 -
Crp                    10                No*             -                 -
CytR                   0                 No              -                 -
DnaA                   1                 Yes             02718 (2252)      153
                                         Yes             04631 (0001)      115
FhlA                   0                 Ortholog        02896 (2359)      277
                                         Ortholog        A00094 (A0143)    255
                                         Ortholog        04604 (3381)      254
                                         Ortholog        03773 (2894)      228
                                         Ortholog        04336 (3220)      244
FadR                   0                 No              -                 -
Fis                    1                 No              -                 -
FlhCD                  2                 No              -                 -
Fnr                    7                 No*             -                 -
FruR                   7                 No              -                 -
Fur                    2                 No*             -                 -
GalR                   1                 No              -                 -
GlpR                   2                 No              -                 -
LexA                   7                 Yes             -                 -
MalT                   0                 No              -                 -
MetJ                   2                 No              -                 -
MetR                   4                 No              -                 -
ModE                   2                 No              -                 -
NagC                   3                 No              -                 -
NarL                   6                 Ortholog        03259 (2577)      103
NarP                   5                 Ortholog        03259 (2577)      106
NtrC (GlnG)            9                 Ortholog        04819 (0110)      306
                                         Ortholog        04336 (3220)      301
                                         Ortholog        04604 (3381)      288
                                         Ortholog        03835 (2934)      288
                                         Ortholog        00159 (0744)      211
PdhR                   0                 No              -                 -
PhoB                   4                 Yes             00741 (1083)      173
                                         Ortholog        03280 (2588)      118
PurR                   13                No              -                 -
RhaS                   0                 No              -                 -
RpoH                   6/4 (e)           Ortholog        01558 (1584)      162
RpoN                   12                Yes             01629 (1628)      263
SoxS                   4                 No              -                 -
TrpR                   5                 No              -                 -
TyrR                   7                 Ortholog        05608 (0569)      214
                                         Ortholog        02175 (1949)      211
                                         Ortholog        03667 (2827)      204
                                         Ortholog        04819 (0110)      182
                                         Ortholog        00704 (1063)      183

Table 4-4. Comparison of Hypothetical Dv. desulfuricans G20 Regulons to E. coli. Footnotes from Table 4-3 apply. Dde ORF numbers are from the ORNL annotation.

Table 4-4. Comparison of Hypothetical Dv. desulfuricans G20 Regulons to E. coli (a).

E. coli Regulator      Regulon Size      Identified      G20 ORF #         TBLASTN Score (bits)
AraC                   1                 No              -                 -
ArcA                   9                 Ortholog        923               140
ArgR                   9                 No              -                 -
CpxR                   13                Ortholog        923               129
Crp                    18                No              -                 -
CytR                   0                 Ortholog        3332              181
DnaA                   1                 Ortholog        3407              134
                                                         952               208
FadR                   5                 No              -                 -
FhlA                   2                 Ortholog        2350              270
                                         Ortholog        257               265
                                         Ortholog        2914              259
                                         Ortholog        2275              258
                                         Ortholog        1506              506
Fis                    0                 No              -                 -
FlhCD                  2                 No              -                 -
Fnr                    10                Ortholog        786
FruR                   7                 No              -                 -
Fur                    2                 Yes             773               102
GalR                   4                 Ortholog        3332              142
GlpR                   2                 Ortholog        336               197
LexA                   11                Yes             1166              108
                                         Yes             1685              118
MalT                   0                 No              -                 -
MetJ                   2                 No              -                 -
MetR                   4                 No              -                 -
ModE                   2                 No              -                 -
NagC                   2                 No              -                 -
NarL                   9                 Ortholog        1989              117
                                                         2779              106
NarP                   5                 Ortholog        1989              116
                                                         2779              124
NtrC (GlnG)            7                 Ortholog        257               322
                                         Ortholog        3299              317
                                         Ortholog        1910              295
                                         Ortholog        2946              295
                                         Ortholog        2350              294
PdhR                   0                 No              -                 -
PhoB                   9                 Yes             923               195
PurR                   18                Ortholog        3332              144
RhaS                   0                 No              -                 -
RpoH                   7/8               Ortholog        2510              179
RpoN                   13                Yes             2376              322
SoxS                   6                 No              -                 -
TrpR                   4                 No              -                 -
TyrR                   7                 Ortholog        2275              232
                                         Ortholog        2327              229
                                         Ortholog        1505              218
                                         Ortholog        3299              215
                                         Ortholog        152               207

Table 4-5. E. coli regulatory motifs scanned against Desulfovibrio. Rows containing E. coli regulator data are colored blue.
a SRB ORF numbers are CMR (Dv. vulgaris) or ORNL (Dv. desulfuricans) followed by VIMSS number.
b ORF associated with the returned motif. For E. coli regulator rows, this column lists the protein family to which the regulator belongs.
c SRB motif returned from ScanACE. For E. coli rows, the motif scanned against the SRB is listed. Motifs returned for palindromic pairs are for the forward strand only.
d Location of the motif in bp upstream of the putative translational start site of the associated ORF. (p) indicates a putative palindromic pair (two motifs on separate strands within 2 bp of each other). Single instances of a motif are indicated by a 1 (forward strand) or a 0 (reverse strand).

Table 4-5. E. coli regulatory motifs scanned against Desulfovibrio

Gene Location Regulon SRB Gene #a Description/Protein Motif Scanned/Returned (5’ to 3’)c of Motifd Familyb GalR HTH LacI NTGTAANCGNTTNCAN 7,8-Dihydro-6- Hydroxymethylpterin- Dde 1149 (394851) GTAACCGTTTATA 63 (p) Pyrophosphokinase (Folate Biosynthesis) Dde 3332 (393473) GalR GCAAACGTTTGCA 25 (p) Dde 3330 (393475) GalT GCAAACGTTTGCA 76 (p) 1-Acyl-sn-Glycerol-3- Dde 2472 (394821) Phosphate GTAATCGGTTGCA 192 (p) Acyltransferase Dde 3729 (-) Hypothetical GTAAGCAGTTACA 22 (p) FNR HTH CRP NNWWTTGATNNMNVTCAANWWN Translation Initiation ATTTTTTATCACAATAAAAAAA 168 (1) Dde 1721 (395834) Inhibitor AATAATTTTTTATCACAATAAA 172 (1) TTAAATTATTTAAAACAATATG 215 (p) Dde 1644 (395661) Hypothetical TAAAATGAAAAACAGCAATCGA 196 (1) Dde 678 (405404) Hypothetical CAAATTGAATTGCATCAATTTT 211 (p) Dde 1528 (395481) Hypothetical GTATTTTAGCTAAATCAATAGG 77 (p) AAATTTTATATATCAAGATATT 115 (0) Dde 101 (393037) Hypothetical ATAAATTTTATATATCAAGATA 113 (0) Winged-Helix DNA- Fur DDWAATKATWNTCATTTN Binding Transcriptional Dvu 00192 ()206188 Regulator, AraC TTTATTGATTATGATTTT 72 (p) Family GAAATTTATTTTCATTAT 41 (p) Dde 3621 (393004) M16 Peptidase GAAAATGAAATTTATTTT 35 (p) Sensory Box Histidine GAAATTGATTTTTGTTTT 161 (0) Dvu 03422 (208178) Kinase* TATATTGAAATTGATTTT 167 (0) Dde 360 (393462) Flavodoxin* GACAATGAATTTCATTTT 260 (p) Hypothetical Dde 3342 (-) (spermidine transport GACAATGAATTTCATTTT 237 (p) operon) 50S Ribosomal Protein Dde 1598 (395588) TTTAATGATATTCATATT 4 (p) L19 Metallo-Cofactor Dde 2729 (394314) Biosynthesis Protein TGAAATGCCTATCATTAT 14 (p) (Heme Biosynthesis) Dde 823 (-) Hypothetical TGAAATGCCTATCATTAT 124 (p) Non-Ribosomal Dde 1364 (395155) GAAAATGATTTTCAATGT 183 (p) Peptide Synthase* Dde 2312 (395154) Regulatory Protein* GAAAATGATTTTCAATGT 121 (p) AATTATGATTAACGTTTT 142 (1) Dde 830 (394328) Glycosyl Transferase* CATAATTTTTACCATTTT 153 (0) Alkyl Hydroperoxide AATTATGATTAACGTTTT 111 (1) Dde 2721 (394327) Reductase* CATAATTTTTACCATTTT 100 (0) GAAAATAATTTTTGTTAT 180 (0) Dde 3428 (393334) 
Hypothetical AAAAATTATTTTCAATGC 175 (1) Hypothetical (Iron Dvu 03253 (208071) AAACTTGACAATCATTTT 80 (p) Transport) Transcriptional Dde 146 (393128) TTTCATGCATATCATTTT 197 (p) Regulatory Protein ABC Transporter, Dde 2156 (395379) AGAAAAAATTATTATTTT 83 (p) ATP-Binding Protein GAAAATTATTTTCAATAT 32 (p) Dvu 05098 (209207) Hypothetical GACATTGAAAATTATTTT 38 (1) AATAACGAAATTCATTAT 95 (p) Dde 1525 (395478) Hypothetical CATTATGATAATGAATTT 88 (0) Dde 3653 (392971) Hypothetical* GATATTGAAAACCAATTT 80 (p) Dde 77 (-) Hypothetical GATATTGAAAACCAATTT 4 (p) Dvu 00803 (206557) Hypothetical GAAAATGATGTTAATTTA 50 (p) 154

ModE Winged-Helix CGNTNTATWSWNGMYTAYATARCG Periplasmic Linker TGTTATATACTAGATTGTCTACCG 166 (1) Dde 1793 (395950) Protein* CGATATTGACAATATTACATATCA 104 (1) Glycine Betaine ABC TGTTATATACTAGATTGTCTACCG 80 (1) Dde 1804 (-) Transporter, ATP- CGATATTGACAATATTACATATCA 142 (1) Binding Protein* Dde 1939 (395739) Hypothetical CATTATCGAGTCGATCATATAGCG 32 (p) Dde 3216 (405379) Hypothetical CGTTGTATTCGGGCATACATCGCA 151 (p) HTH LuxR 2- NarL Component Response TNMYYCNNWMNGGGTA Regulator Membrane-Spanning Dde 3109 (393757) TTCCCCTTAAGGGGAA 102 (p) Protein* Dde 506 (-) Hypothetical* TTCCCCTTAAGGGGAA 45 (p) Hypothetical TTATTAGTAATAGGTA 195 (1) Dvu 04666 (208949) (upstream on GGATTCTTTATTAGTA 188 (1) nigerythrin) HTH LuxR 2- NarP Component Response TACCNCDWWHGNGGTA Regulator Fumarate Reductase C Dvu 04400 (208785) (Fumarate Reductase TACCTCTTTCTCGGTA 258 (p) Operon) Cell Cycle Histidine Dvu 02492 (207606) TACCCCTATCGTGGGA 128 (p) Kinase CckA Dde 480 (405387) Hypothetical* TACCCCTAACGCGGAA 16 (p) Dde 3141 (-) Hypothetical* TACCCCTAACGCGGAA 66 (p) Dde 2237 (395252) Hypothetical TACCCAGAAACTGGTA 123 (p) Dde 2428 (-) Hypothetical TACCTATTTTGAGGAA 3 (p) 3-Oxoacyl-(ACP) Dvu 04189 (208656) TAACCATACCGAGGTA 19 (p) Reductase DNA Binding Protein Dvu 03896 (208486) TACTTCACTCGGGGTA 267 (p) HU Dde 1572 (395559) Uridylate Kinase TAACACTATTGAGGAA 24 (p) Ferredoxin II-Related Dvu 05152 (209239) TACCTCCATAACGGTA 5 (p) Protein Putative Translation Dde 3040 (393862) Initiation Factor TACTGCAAACGTAGTA 68 (p) Protein Dvu 04981 (209138) Hypothetical TAGCCCTTGAGGAGTA 174 (p) DNA Polymerase III, Dvu 04297 (208719) subunits gamma and TAGCACTTTCGCAGTA 157 (p) tau Dvu 01834 (-) Hypothetical TGCTCATAAAGGGTTA 4 (p) Membrane Spanning Dde 3109 (393757) TTCCCCTTAAGGGGAA 281 (p) Protein* Dde 506 (-) Hypothetical* TTCCCCTTAAGGGGAA 45 (p) Carbon Starvation Dvu 05663 (209543) GACCTCTACCGGGGTA 198 (p) Protein A Dde 1341 (395114) PhoH TATCCGAAAAGAGGTA 71 (p) Dde 2332 (395113) Hypothetical* TATCCGAAAAGAGGTA 156 (p) 
Aldehyde Dde 890 (394403) TAACACTTTCGAACTA 28 (p) Oxidoreductase Polysulfide Reductase Dde 1613 (395604) TACTCAGTATGCGTTA 241 (p) A Methyl-Accepting Dde 2027 (395603) Chemotaxis Protein TACTCAGTATGCGTTA 85 (p) McpK Dde 898 (394440) Hypothetical TGCCCGTAAAGAAGTA 52 (p) Phage Tail Fibre Dde 3183 (393675) TACGGCCATATGGGTA 89 (p) Protein Dde 309 (-) Hypothetical* TACCCCTTACGGTGAA 11 (p) Hypothetical* Dde 3413 (393365) (Translational TACCCCTTACGGTGAA 5 (p) Machinery Operon)


Sigma-54 Dependent Dde 152 (393140) TACTCCTTATGAAGCA 175 (p) Sensory Box Protein Dde 3649 (392975) Transposase TATCTCTATAGCTTTA 55 (p) Dvu 03058 (207951) Hypothetical CACCCCTTTCGGGGCA 153 (p) Dde 2961 (393981) Hypothetical TGCCGAGAACGGGGTA 41 (p) Dvu 05140 (209232) Hypothetical TTCCTCGAAAGGGGGA 33 (p) Alkylphosphonate Dde 341 (393410) ABC Transport, ATP- CACCCCGAAAGGGGCA 141 (p) Binding Protein Dde 313 (393373) Phosphate Permease* CACCGCTATCAGGGTA 54 (p) Two-Component Dde 3406 (393372) Sensor Histidine CACCGCTATCAGGGTA 177 (p) Kinase PhoR* 2-Component PhoB HTDTHWTHHWNYTGTMANNNN Response Regulator Ribonucleoside- Diphosphate ATTTTATAATATAGTAAAATAA 66 (p) Dvu 04601 (208907) Reductase, alpha ATATTATAAAATTGTCAGGAAT 56 (0) subunit Asparginyl-tRNA TTAAATTAATTTTTACATAAAT 233 (1) Dde 261 (393296) Synthetase* TTTACATAAATGTTTAAAAATA 222 (1) TTAAATTAATTTTTACATAAAT 215 (1) Dde 3455 (-) Hypothetical* TTTACATAAATGTTTAAAAATA 284 (1) Intracellular ATAAAATATAGCTTTTACATAA 141 (1) Dde 2330 (-) Proteinase PfpI* ATATTTTATTTCTGACATTTGT 128 (0) ATAAAATATAGCTTTTACATAA 133 (1) Dde 1347 (395121) Hypothetical* ATATTTTATTTCTGACATTTGT 146 (0) Dde 2459 (394844) MutS-Like ATPase TTTTATTTAATTTAAAATATAT 26 (p) High-Affinity Sulfate CTATAATATATGTCAAAACAAA 39 (0) Dde 369 (393481) Transporter* ATGTCTTTGTTTTGACATATAT 44 (1) CTATAATATATGTCAAAACAAA 34 (0) Dde 3325 (-) Hypothetical* ATGTCTTTGTTTTGACATATAT 29 (1) ATTTTATATTTTTTTGACATAA 239 (0) Dde 2089 (-) Hypothetical* TTTATATTTTTTTGACATAAAC 241 (0) Dvu 01227 (206829) Hypothetical* ATGTAATATTCATGAAAAAGAC 145 (p) Dvu 01228 (206830) Hypothetical* ATGTAATATTCATGAAAAAGAC 146 (p) PurR HTH LacI NNNNNANGMAAWCGKTTNCNTNNN Ribosomal Protein Dde 897 (394439) TTGTCAAGAAAAGGGTTGCCTTTTTT 152 (p) L28* Dde 2655 (394438) Hypothetical* TTGTCAAGAAAAGGGTTGCCTTTTTT 45 (p) Dde 3330 (393475) GalT TGCTGATGCAAACGTTTGCAATTTGC 69 (p) Pyruvate-Ferredoxin Oxidoreductase Dde 3096 (393795) (pyruvate CTGTCTGGAAAACGTGTGCGTGAACG 288 (p) phosphoroclastic operon) Leu/Ile/Val-Binding Dvu 
03527 (208250) AGACCACTAAAACGGTTGCACAGAAA 293 (p) Protein HesB-Like Domain Dvu 05241 (209293) CCGAAGACCAATCGTTTGCCATGATA 223 (p) Protein Dde 3332 (393473) GalR AGTTGCTGCAAACGTTTGCAGTTTGA 31 (p) Dde 613 (393959) Hypothetical ACGCAACGAAAGCGGTTTCTTCATCA 291 (p) Dvu 4840 (209053) Hypothetical CCTGAGGGCAAAAGTTTACCTTGCCA 9 (p) CRP HTH CRP WWNTGTGANNNNNNTCACANWN RNA- Methyltransferase Dde 2784 (394217) ATTTGTGCGCAAAATCACAAAA 73 (p) (F1F0ATPase complex operon) Sulfatase (downstream Dde 2073 (395499) of gene 2072 encoding CTATGTGATGTTTTTCACATAA 111 (p) DNR) Uncharacterized TATTGTGATTTTGTGAACAATG 219 (0) Dvu 04950 (209119) Protein TTTTGTGAACAATGACTCAAAA 227 (0) Coenzyme F390 AAATATTATTTTATTAACATAA 246 (p) Dvu 04386 (208777) Synthetase* TTATTTTATTAACATAATACTT 251 (p)


Threonyl-tRNA Dvu 03195 (208035) ATTTTTTAAGAACATCGCATTT 214 (p) Synthetase Dde 714 (-) Hypothetical* GTATGTTATTTTTTTCACATTG 87 (p) Dde 2864 (-) Hypothetical* GTATGTTATTTTTTTCACATTG 124 (p) Dde 3721 (392869) Hypothetical GATTGTGAATTATTTAACAATC 85 (p) Thioredoxin Dde 2604 (394537) ATTTGTAAAAAATAACAAATAT 141 (p) Peroxidase Dde 1481 (395426) Copper Chaperone* AATTGTGAATCGTCTTTCATTT 160 (p) Copper-Translocating Dde 2129 (395425) AATTGTGAATCGTCTTTCATTT 19 (p) ATPase* Hypothetical Dde 2112 (-) GTATGTTCTGTATTTCACATAT 30 (p) (upstream of DcrA) Dvu 02323 (207505) Hypothetical AAGTGTAATATATATCTAAAAA 99 (p) Binding Protein- Dde 3387 (393396) Dependent Transport AAATTCGAAGTACATAATATTT 14 (p) System IMP Alanine Dehydrogenase Dde 2276 (395208) TATTTTGATGAAAATATCAAAA 156 (p) (Thiamine Biosynthesis Operon) Translation Initiation Dde 1721 (395834) ATTTTTTATCACAATAAAAAAA 168 (p) Inhibitor Dvu 03372 (208146) Endoribonuclease* TTACGTGATTTGCTTCACAAGT 61 (p) Dvu 03373 (208147) Hypothetical* TTACGTGATTTGCTTCACAAGT 9 (p) Coenzyme F420- Reducing Dde 920 (394469) CAATGTGACCTGCATCACAGAC 82 (p) Hydrogenase, Beta Subunit Dvu 04452 (208813) Glycosyl Transferase AAATGTTACTAGATTCTCATAT 92 (p) Hook-Related Dde 1311 (395067) TAATGTGTTGTAATTAATAAAT 83 (p) Flagellin Protein* Dde 2358 (395066) Hypothetical* TAATGTGTTGTAATTAATAAAT 204 (p) Dde 2459 (394844) MutS-Like ATPase GTATTTTATTTAATTTAAAATA 23 (p) 3-Phosphoshikimate-1- Dde 1588 (395579) Caroxyvinyltransferas CTTTGTGAACAGTTGCACATAA 152 (p) e Hypothetical Dde 2843 (-) (Upstream of Arsenate AATTTTGAGTGTCATCAAAATG 151 (p) Reductase Operon) ArgR Winged-Helix WNTGAATDWWHATNCANW Transcriptional Dde 1196 (394909) TATGAATAAAAACTCATT 113 (p) Regulatory Protein* Hypothetical Cation- Dde 2424 (394908) TATGAATAAAAACTCATT 110 (p) Transporting ATPase* Carbon Dioxide AATGAATGAAAATATATT 163 (p) Dde 509 (393758) Concentrating TAACAATATATTTTCATT 159 (0) Mechanism TCCGAATAAAAATTCATT 136 (p) Dde 2685 (394399) Hypothetical TATGCATATTTTTTCACC 14 (0) Glutamate Synthase- 
Dvu 00046 (206094) TGTAAATATTTATTGATA 146 (p) Related Protein* Dvu 00048 (206095) Hypothetical* TGTAAATATTTATTGATA 146 (p) Dvu A00007 Hypothetical* TGTGAATGAATACTCACA 169 (p) (209595) Dvu A00008 Hypothetical* TGTGAATGAATACTCACA 83 (p) (209596) Dvu A00199 Hypothetical AGTAAATGTTTATTCTCA 191 (p) (209734) ATTGAATTATAATAAAAT 200 (p) Dde 476 (393702) Transposase* AAAGAATAAAAAAACAAT 297 (1) ATTGAATTATAATAAAAT 271 (p) Dde 3157 (393701) Hypothetical* AAAGAATAAAAAAACAAT 174 (1) Dvu 01762 (207161) Hypothetical AATGAATATATATCAAAA 255 (p) Dvu 02581 (207659) Hypothetical ACTGAATTTTAATGTTCT 164 (p) Dvu 03058 (207951) Hypothetical TGTGAATTTTAATTTCCA 1 (p)


Site-Specific Dde 2878 (394090) ATTGATTTTTAATTAAAA 108 (p) Recombinase ATP Synthase (F1F0 Dde 1662 (395679) TTTGAGCTTAAATTCACA 38 (p) Complex Operon) ATP Synthase (F1F0 Dvu 00228 (206207) TGTGAATTAAAGCTCAAA 21 (p) Complex Operon)
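The scanned motifs in Table 4-5 are written with IUPAC degenerate nucleotide codes, so a returned site can be checked position by position against the consensus. The Python sketch below counts the positions at which a site is not permitted by the consensus. It is only an aid for reading the table: ScanACE scores sites against a weight matrix rather than a consensus string, and the function name is hypothetical.

```python
# IUPAC nucleotide codes and the bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_mismatches(consensus, site):
    """Count positions where the site base is not allowed by the consensus."""
    if len(consensus) != len(site):
        raise ValueError("consensus and site must have the same length")
    return sum(1 for code, base in zip(consensus, site)
               if base not in IUPAC[code])


# NarP consensus and the site returned upstream of the fumarate reductase
# operon (Dvu 04400), both as listed in Table 4-5.
NARP_CONSENSUS = "TACCNCDWWHGNGGTA"
mismatch_count = consensus_mismatches(NARP_CONSENSUS, "TACCTCTTTCTCGGTA")
```

Because the check requires equal lengths, it applies only to entries where the returned site spans the full consensus, which is not the case for every regulator in the table.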


4.3.3. Identification of regulatory proteins

The following analysis attempts to discern the functions of several putative regulatory proteins of Desulfovibrio based on genomic context. To simplify the discussion, Dv. vulgaris is used as the model for both Desulfovibrio species and the subsequent text is taken from Hemme and Wall, 2004. Where significant differences occur, Dv. desulfuricans is discussed separately.

Several of the E. coli regulatory proteins listed in Tables 4-3 and 4-4 were shown to have orthologs in Dv. vulgaris by the criteria described but were not initially annotated as such (e.g. phoB). Additionally, orthologs for several regulators that one would expect to exist in Dv. vulgaris, such as the arginine repressor ArgR, were not observed. In some cases it has been possible to predict the function of a gene by examining the context of the flanking sequences (183). Several regulatory genes from Tables 4-3 and 4-4 were examined using genomic context as well as comparisons with Dv. desulfuricans.

In E. coli, regulation of phosphate utilization is handled primarily by the PhoBR two-component regulatory system, the activity of which is modulated by the PstSCAB-PhoU complex. The genes are organized into two operons, the phoBR genes in the first and the pstSCAB-phoU genes in the second (184). An analysis of Dv. vulgaris revealed a complete set of the pstSCAB-phoU genes in three spatially separated operons which include two copies each of pstSCAB (Figure 4-4). The second pstSCAB gene set is missing in Dv. desulfuricans and the phoH gene is associated with the single pstSCA cluster, but otherwise the gene order is conserved. Although neither the phoB nor the phoR gene was identified in early annotations of the Desulfovibrio genomes, reciprocal BLASTP searches using the corresponding E. coli sequences revealed putative orthologs for each gene. A putative Dv. vulgaris phoR ortholog was identified (ORF04651/DVU0013), but the surrounding regions of the genome provide no additional context to support the assignment of the gene. In Dv. desulfuricans G20, however, the ortholog for phoR (Dde VIMSS393372) is situated next to a putative phosphate permease (Dde VIMSS393373), suggesting that this gene may in fact play a role in phosphate homeostasis. More intriguing is the putative Dv. vulgaris phoB ortholog (ORF00741/DVU1083) located immediately upstream of and in the same orientation as the apparent pstB-phoU operon. Such an arrangement has not been reported to our knowledge, although the same putative genes in Dv. desulfuricans G20 appear to have similar arrangements. From contextual information, it is inferred that ORF00741, encoding PhoB, is involved in the regulation of phosphate homeostasis in Dv. vulgaris.

There are currently two known pathways for arginine biosynthesis in prokaryotes. Enterobacteriaceae and Vibrionaceae employ a linear pathway for the synthesis of arginine from glutamate utilizing the ArgABCDEFGH-CarAB proteins (185). In a more typical version of the pathway, N-acetylornithinase (encoded by argE) is replaced by an ornithine acetyltransferase encoded by argJ. ArgJ allows for a circular pathway in which the acetyl group is transferred from N-acetylornithine to glutamate and the resulting N-acetylglutamate is cycled back into the pathway (186). Lactobacillus plantarum lacks the first enzyme in the pathway, N-acetylglutamate synthetase (encoded by argA). This activity instead appears to be handled in L. plantarum by a bifunctional ornithine acetyltransferase, ArgJ (187). An examination of arginine biosynthesis genes in Dv. vulgaris suggests that the organism utilizes the ArgJ circular pathway and lacks an obvious argA gene, which suggests the possibility of an ArgJ-mediated N-acetylglutamate synthetase activity similar to that of L. plantarum (Figure 4-5). Despite this complete and highly conserved biosynthetic pathway, Dv. vulgaris lacks an obvious arginine repressor based on BLASTP searches using a variety of ArgR sequences. A literature search revealed that Pseudomonas aeruginosa utilizes a non-homologous ArgR protein that has similarity to the AraC/XylS family (188). A BLASTP search using the P. aeruginosa ArgR sequence showed weak hits to AraC/XylS family regulators in both Dv. vulgaris and Dv. desulfuricans G20 but no definitive ArgR ortholog.

A previously identified motif derived from a conserved operon amino acid transport grouping showed conservation with the E. coli glycerol-3-phosphate regulon repressor (GlpR) binding site (182), yet no glpR gene was identified in Dv. vulgaris. In comparison with the E. coli genome (136), Dv. vulgaris appears to have a complete pathway for the scavenging of glycerol with the exception of a glycerol-3-phosphate permease (glpT), for which a candidate gene was not identified (Figure 4-6). Multiple TBLASTN searches revealed no orthologs for glpR in Dv. vulgaris; however, a glpR ortholog was identified in Dv. desulfuricans (Dde VIMSS393405). The presence of these pathways is consistent with previous observations of growth on glycerol by Desulfovibrio species (32).

Computational analysis of a gene sequence is insufficient to determine the function of that gene. The identification of several unannotated regulatory proteins was attempted by examination of the genome sequences based on the assumption of conservation among bacteria. Experimental approaches such as standard genetic and biochemical analyses will be necessary in order to elucidate the true functions of these regulatory proteins.

Figure 4-4. Genes of the PhoB Regulon of Desulfovibrio. Figure adapted from (182).

Figure 4-4. PhoB regulons of Desulfovibrio

Figure 4-5. The putative arginine biosynthetic pathway of Dv. vulgaris. ArgJ [DVU0823], Bifunctional Ornithine Acetyltransferase/N-Acetylglutamate Synthetase; ArgB [DVU1466], N-Acetylglutamate Kinase; ArgC [DVU0492], N-Acetylglutamyl Phosphate Reductase; ArgD [DVU2347], N-Acetylornithine Transaminase; CarAB [DVU3113, DVU0162], Carbamoyl Phosphate Synthetase P; ArgF [DVU1096], Ornithine Transcarbamylase; ArgG [DVU1095], Argininosuccinate Synthase; ArgH [DVU1094], Argininosuccinase. Dv. desulfuricans G20 encodes an identical set of genes. Figure adapted from Bringel et al., 1997 and originally appeared in (182).

Figure 4-5. The Putative Arginine Biosynthetic Pathway of Desulfovibrio

Figure 4-6. Putative glycerol metabolism pathway of Dv. vulgaris. G3P = glycerol-3-phosphate, DHAP = dihydroxyacetone phosphate. GlpF [DVU3133], Glycerol Facilitator; GlpK [DVU3134], Glycerol Kinase; GlpT [not identified], G3P Permease; GlpQ [DVU0176], Glycerophosphoryl Diester Phosphodiesterase; GlpD [DVU3132], Aerobic G3P Dehydrogenase; GlpAB [DVU1940, DVU2673, DVU1939], Anaerobic G3P Dehydrogenase; GpsA [DVU3159], G3P Dehydrogenase (NAD+). TBLASTN searches did not reveal a glpT permease gene, but genes originally annotated as components of a G3P ABC transporter [DVU3161-3164] were identified. Dv. desulfuricans G20 encodes an identical set of genes. Figure adapted from Lin, 1996 and originally appeared in (182).

Figure 4-6. The Putative Glycerol Metabolism of Desulfovibrio

5. Research Acknowledgements

This work was supported by the United States Department of Energy Office of Biological and Environmental Research through the Natural and Accelerated Bioremediation Research Program (NABIR). Draft sequences for Dv. vulgaris were provided by Dr. John Heidelberg (TIGR) and for Dv. desulfuricans G20 by Dr. Paul Richardson (DOE-JGI). CRP-FNR sequences of interest for Db. autotrophicum and Dt. psychrophila were generously provided by Drs. Frank Glöckner and Hans-Peter Klenk, respectively (REGX). Sequence data from Bacteriovorax marinus were produced by the B. marinus Sequencing Group at the Sanger Centre and can be obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/bm. Sequence data for several of the species examined in this study are also available through the Virtual Institute for Microbial Stress and Survival (VIMSS) and may be accessed through http://www.VIMSS.org.


6. Appendices

The following tables provide supplemental information for the analyses presented in the previous chapters.

Appendix A-1. BLASTP Comparison of experimentally verified Dv. vulgaris protein sequences against the Dv. vulgaris genome. Experimentally verified query sequences are of three categories: 1) Dv. vulgaris Hildenborough protein sequences (positive exact match), 2) Dv. vulgaris Miyazaki protein sequences (positive homologous match), 3) general Desulfovibrio protein sequences that are not expected to be present in Dv. vulgaris (negative match). Query sequences were obtained from the NCBI database. Hits with an E score < 1e-10 are colored blue and hits representing self-to-self matches (the query sequence returns the identical sequence from the proteome) are colored red. Self-to-self matches that do not display 100% sequence identity are the result of the introduction of gaps into the protein alignment. Comparisons were made using the BLOSUM62 matrix with the low-complexity filter turned on (default parameters).

a Query sequence from the NCBI protein databases
b NCBI gi (GeneInfo Identifier) number for the query sequence
c Database from which the protein sequence was obtained: sw = Swissprot, pir = PIR database, gb = GenBank
d Organism from which the query sequence was derived
e Nature of match: t+ = true positive (exact match), f+ = false positive (including homologs), f- = false negative, t- = true negative
f Dv. vulgaris gene number from TIGR-CMR returned by BLAST
g TIGR-CMR annotation of returned sequence
h % identity of local homologous regions
i E Score returned by BLAST
j Bit Score returned by BLAST
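The highlighting rule described in the legend above can be illustrated with a minimal sketch. It is not the procedure used to prepare the appendix: the names `Hit`, `E_CUTOFF` and `highlight` are hypothetical, the three example hits are taken from the Periplasmic Fe-Only Hydrogenase, large entries of the table, and the choice to let a self-to-self match take precedence over the E score threshold is an assumption.

```python
from dataclasses import dataclass

# Threshold stated in the legend: hits with an E score < 1e-10 are colored blue.
E_CUTOFF = 1e-10


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit (hypothetical container for this sketch)."""
    locus: str      # TIGR gene number, e.g. "DVU1769"
    e_value: float  # E score reported by BLAST


def highlight(hit: Hit, self_locus: str) -> str:
    """Return the legend color for a hit.

    Assumption: a self-to-self match is reported as red even though its
    E score is also below the cutoff.
    """
    if hit.locus == self_locus:
        return "red"
    if hit.e_value < E_CUTOFF:
        return "blue"
    return "none"


# Three hits copied from the table for the Fe-only hydrogenase large subunit query.
hits = [
    Hit("DVU1769", 0.0),       # self-to-self match
    Hit("DVU1771", 1.20e-74),  # homolog below the cutoff
    Hit("DVU0686", 1.10e-06),  # weak hit above the cutoff
]
labels = {hit.locus: highlight(hit, self_locus="DVU1769") for hit in hits}
print(labels)
```

The comparison is strict, so a hit with an E score of exactly 1e-10 would not be highlighted under this reading of the legend.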

Appendix A-1. BLASTP Comparison of experimentally verified Dv. vulgaris protein sequences against the Dv. vulgaris genome

Query Sequence Name (a) | gi # (b) | db (c) | Query Source Organism (d) | Match (e) | TIGR # (f) | Subject Annotation (g) | % Identity (h) | Length (i) | E Score (j) | Bit Score (k)
Periplasmic Fe-Only Hydrogenase, small | 130072 | sp | Dv. vulgaris Hildenborough | t+ | DVU1770 | Periplasmic Fe-Only Hydrogenase, small | 100 | 123 | 8.00E-69 | 251.5
| | | | t- | DVU1771 | Periplasmic Fe-Only Hydrogenase, gamma | 34 | 50 | 1.40E-04 | 38.12
| | | | t- | DVU2798 | ApbE family protein | 23.28 | 116 | 0.938 | 25.41
| | | | t- | DVU2211 | hypothetical protein | 25 | 32 | 2.7 | 23.87
| | | | t- | DVU2094 | thiG protein | 28.57 | 63 | 4.7 | 23.1
| | | | t- | DVU2088 | membrane protein | 47.62 | 21 | 7.9 | 22.33
| | | | t- | DVU2415 | hypothetical protein | 25.53 | 47 | 7.9 | 22.33
Periplasmic Fe-Only Hydrogenase, large | 130070 | sp | Dv. vulgaris Hildenborough | t+ | DVU1769 | Periplasmic Fe-Only Hydrogenase, large | 95.96 | 421 | 0 | 828.9
| | | | f+ | DVU1771 | Periplasmic Fe-Only Hydrogenase, gamma | 43.92 | 362 | 1.20E-74 | 273.5
| | | | t- | DVU0686 | iron-sulfur cluster-binding protein | 37.5 | 56 | 1.10E-06 | 47.75
| | | | t- | DVU1944 | pyruvate ferredoxin oxidoreductase, iron-sulfur binding | 41.67 | 48 | 9.30E-06 | 44.67
| | | | t- | DVU1080 | iron-sulfur cluster-binding protein | 31.67 | 60 | 1.30E-04 | 40.82
| | | | t- | DVU2402 | heterodisulfide reductase, A subunit | 38.18 | 55 | 1.30E-04 | 40.82
| | | | t- | DVU0849 | heterodisulfide reductase, iron-sulfur-binding subunit | 29.41 | 68 | 1.80E-04 | 40.43
| | | | t- | DVU2797 | iron-sulfur cluster-binding protein | 27.78 | 90 | 2.30E-04 | 40.05
| | | | t- | DVU0172 | iron-sulfur cluster-binding protein | 27.91 | 86 | 3.00E-04 | 39.66
| | | | t- | DVU0686 | iron-sulfur cluster-binding protein | 33.33 | 60 | 3.00E-04 | 39.66
| | | | t- | DVU2289 | hydrogenase, CooX subunit | 32.31 | 65 | 5.10E-04 | 38.89
| | | | t- | DVU1823 | glutamate synthase, iron-sulfur cluster-binding subunit | 21.43 | 126 | 8.70E-04 | 38.12
Periplasmic Fe-Only Hydrogenase, large, con’t. | 130070 | sp | Dv. vulgaris Hildenborough | t- | DVU1931 | iron-sulfur cluster-binding protein | 30.77 | 52 | 8.70E-04 | 38.12
| | | | t- | DVU2797 | iron-sulfur cluster-binding protein | 34.04 | 47 | 0.001 | 37.74
| | | | t- | DVU3291 | glutamate synthase, iron-sulfur cluster-binding subunit | 29.03 | 93 | 0.001 | 37.35
| | | | t- | DVU1220 | nitroreductase family protein | 32.76 | 58 | 0.002 | 36.96
| | | | t- | DVU2271 | pyruvate formate-lyase activating enzyme | 34.43 | 61 | 0.002 | 36.96
| | | | t- | DVU0850 | heterodisulfide reductase, transmembrane subunit | 26.36 | 129 | 0.003 | 36.58
| | | | t- | DVU2103 | iron-sulfur cluster-binding/ATPase domain protein | 29.55 | 88 | 0.006 | 35.42
| | | | t- | DVU3109 | iron-sulfur cluster-binding protein | 33.93 | 56 | 0.006 | 35.42
| | | | t- | DVU2544 | iron-sulfur cluster-binding protein | 32.73 | 55 | 0.007 | 35.04
| | | | t- | DVU2493 | iron-sulfur cluster-binding protein | 29.79 | 47 | 0.021 | 33.5
| | | | t- | DVU2293 | iron-sulfur cluster-binding protein | 28.95 | 76 | 0.028 | 33.11
| | | | t- | DVU1558 | cysteine-rich domain/iron-sulfur cluster-binding domain | 28.79 | 66 | 0.037 | 32.73
| | | | t- | DVU0846 | adenylylsulphate reductase, beta subunit | 31.03 | 58 | 0.048 | 32.34
| | | | t- | DVU2493 | iron-sulfur cluster-binding protein | 37.5 | 40 | 0.048 | 32.34
| | | | t- | DVU1287 | reductase, iron-sulfur binding subunit, putative | 42.11 | 38 | 0.062 | 31.96
| | | | t- | DVU0305 | ferredoxin II | 29.69 | 64 | 0.081 | 31.57
| | | | t- | DVU3292 | pyridine nucleotide-disulfide oxidoreductase | 36.96 | 46 | 0.081 | 31.57
| | | | t- | DVU2104 | iron-sulfur cluster-binding/ATPase domain protein | 31.82 | 66 | 0.106 | 31.19
| | | | t- | DVU3025 | pyruvate-ferredoxin oxidoreductase | 30.88 | 68 | 0.106 | 31.19
| | | | t- | DVU3276 | ferredoxin I | 27.78 | 54 | 0.106 | 31.19
Periplasmic Fe-Only Hydrogenase, large, con’t | 130070 | sp | Dv. vulgaris Hildenborough | t- | DVU0498 | iron-sulfur cluster-binding protein | 39.13 | 23 | 0.139 | 30.8
| | | | t- | DVU0826 | glycolate oxidase, iron-sulfur subunit | 21.35 | 89 | 0.139 | 30.8
| | | | t- | DVU0531 | hmc operon protein 6 | 36.67 | 30 | 0.181 | 30.42
| | | | t- | DVU1614 | iron-sulfur cluster-binding protein | 29.09 | 55 | 0.181 | 30.42
| | | | t- | DVU2792 | electron transport complex protein RnfC | 30.43 | 46 | 0.237 | 30.03
| | | | t- | DVU3350 | iron-sulfur cluster-binding protein | 31.03 | 58 | 0.237 | 30.03
| | | | t- | DVU2404 | heterodisulfide reductase, C subunit | 24.19 | 62 | 0.309 | 29.65
| | | | t- | DVU1782 | iron-sulfur cluster-binding protein | 75 | 12 | 0.404 | 29.26
| | | | t- | DVU3033 | iron-sulfur cluster-binding protein | 21.49 | 121 | 0.404 | 29.26
| | | | t- | DVU1287 | reductase, iron-sulfur binding subunit, putative | 31.25 | 48 | 0.528 | 28.88
| | | | t- | DVU2401 | hydrogenase, iron-sulfur cluster-binding subunit | 21.84 | 87 | 0.528 | 28.88
| | | | t- | DVU1080 | iron-sulfur cluster-binding protein | 50 | 18 | 0.689 | 28.49
| | | | t- | DVU3025 | pyruvate-ferredoxin oxidoreductase | 34.78 | 46 | 0.9 | 28.11
| | | | t- | DVU0172 | iron-sulfur cluster-binding protein | 31.11 | 45 | 1.2 | 27.72
| | | | t- | DVU0429 | Ech hydrogenase, subunit EchF | 27.66 | 47 | 1.2 | 27.72
| | | | t- | DVU0693 | molybdopterin oxidoreductase, iron-sulfur cluster-bindin | 25.64 | 78 | 1.2 | 27.72
| | | | t- | DVU1081 | iron-sulfur cluster-binding protein | 29.82 | 57 | 1.2 | 27.72
| | | | t- | DVU2400 | hydrogenase, putative | 26.42 | 53 | 1.2 | 27.72
| | | | t- | DVU0403 | dissimilatory sulfite reductase beta subunit | 31.91 | 47 | 1.5 | 27.34
| | | | t- | DVU1931 | iron-sulfur cluster-binding protein | 36 | 25 | 1.5 | 27.34
Periplasmic Fe-Only Hydrogenase, large, con’t. | 130070 | sp | Dv. vulgaris Hildenborough | t- | DVU2544 | iron-sulfur cluster-binding protein | 35 | 40 | 1.5 | 27.34
| | | | t- | DVU0535 | hmc operon protein 2 | 26.98 | 63 | 2 | 26.95
| | | | t- | DVU1782 | iron-sulfur cluster-binding protein | 27.45 | 51 | 2 | 26.95
| | | | t- | DVU3025 | pyruvate-ferredoxin oxidoreductase | 36.84 | 19 | 2 | 26.95
| | | | t- | DVU3292 | pyridine nucleotide-disulfide oxidoreductase | 58.82 | 17 | 2 | 26.95
| | | | t- | DVU0498 | iron-sulfur cluster-binding protein | 42.11 | 19 | 2.6 | 26.56
| | | | t- | DVU2103 | iron-sulfur cluster-binding/ATPase domain protein | 42.86 | 21 | 2.6 | 26.56
| | | | t- | DVU2404 | heterodisulfide reductase, C subunit | 32.35 | 34 | 2.6 | 26.56
| | | | t- | DVU3144 | cytochrome c family protein | 27.27 | 77 | 2.6 | 26.56
| | | | t- | DVU2271 | pyruvate formate-lyase activating enzyme | 31.71 | 41 | 3.4 | 26.18

| | | | t- | DVU3028 | iron-sulfur cluster-binding protein | 50 | 16 | 3.4 | 26.18
| | | | t- | DVU2400 | hydrogenase, putative | 53.33 | 15 | 4.5 | 25.79
| | | | t- | DVU2674 | succinate dehydrogenase and fumarate reductase iron-sulf | 53.33 | 15 | 4.5 | 25.79
| | | | t- | DVU3292 | pyridine nucleotide-disulfide oxidoreductase | 47.06 | 17 | 4.5 | 25.79
| | | | t- | DVU0498 | iron-sulfur cluster-binding protein | 22.89 | 83 | 5.8 | 25.41
| | | | t- | DVU1080 | iron-sulfur cluster-binding protein | 38.1 | 21 | 5.8 | 25.41
| | | | t- | DVU2402 | heterodisulfide reductase, A subunit | 28.33 | 60 | 5.8 | 25.41
| | | | t- | DVU2402 | heterodisulfide reductase, A subunit | 26.67 | 75 | 5.8 | 25.41
| | | | t- | DVU2810 | formate dehydrogenase formation protein FdhE | 32.14 | 28 | 5.8 | 25.41
| | | | t- | DVU3071 | oxidoreductase, FAD/iron-sulfur cluster-binding domain | 33.33 | 36 | 5.8 | 25.41
Periplasmic Fe-Only Hydrogenase, large, con’t. | 130070 | sp | Dv. vulgaris Hildenborough | t- | DVU3350 | iron-sulfur cluster-binding protein | 45 | 20 | 5.8 | 25.41
| | | | t- | DVU0535 | hmc operon protein 2 | 53.33 | 15 | 7.6 | 25.02
| | | | t- | DVU1196 | leucyl-tRNA synthetase | 30 | 70 | 7.6 | 25.02
| | | | t- | DVU1402 | transcriptional regulator, LysR family | 35 | 40 | 7.6 | 25.02
| | | | t- | DVU0253 | oxidoreductase, FAD/iron-sulfur cluster-binding domain | 45.83 | 24 | 10 | 24.64
| | | | t- | DVU0694 | molybdopterin oxidoreductase, molybdopterin-binding subu | 29.03 | 31 | 10 | 24.64
| | | | t- | DVU0846 | adenylylsulphate reductase, beta subunit | 30 | 30 | 10 | 24.64
| | | | t- | DVU3028 | iron-sulfur cluster-binding protein | 30.23 | 43 | 10 | 24.64
| | | | t- | DVU3028 | iron-sulfur cluster-binding protein | 23.81 | 63 | 10 | 24.64
High Molecular Weight Cytochrome c Precursor | 123431 | sp | Dv. vulgaris Hildenborough | t+ | DVU0536 | High Molecular Weight Cytochrome | 100 | 545 | 0 | 1154

| | | | t- | DVU2483 | cytochrome c family protein | 25.52 | 388 | 4.60E-08 | 52.76
| | | | t- | DVU0263 | acidic cytochrome c3 | 35.35 | 99 | 1.00E-07 | 51.6
| | | | t- | DVU2809 | cytochrome c3 | 36.84 | 76 | 1.30E-07 | 51.22
| | | | t- | DVU2791 | cytochrome c family protein | 26.67 | 195 | 6.70E-07 | 48.91
| | | | t- | DVU0263 | acidic cytochrome c3 | 36.49 | 74 | 1.50E-06 | 47.75
| | | | t- | DVU2791 | cytochrome c family protein | 35.94 | 64 | 1.50E-06 | 47.75
| | | | t- | DVU2791 | cytochrome c family protein | 24.14 | 174 | 3.30E-06 | 46.59
| | | | t- | DVU2791 | cytochrome c family protein | 23.97 | 267 | 3.30E-06 | 46.59
| | | | t- | DVU0263 | acidic cytochrome c3 | 37.5 | 80 | 7.40E-06 | 45.44
| | | | t- | DVU2524 | cytochrome c3, putative | 32.94 | 85 | 1.60E-05 | 44.28
| | | | t- | DVU2809 | cytochrome c3 | 33.33 | 81 | 4.80E-05 | 42.74
| | | | t- | DVU2524 | cytochrome c3, putative | 28.23 | 124 | 8.10E-05 | 41.97
| | | | t- | DVU2483 | cytochrome c family protein | 23.31 | 133 | 1.80E-04 | 40.82
| | | | t- | DVU2791 | cytochrome c family protein | 27.96 | 93 | 2.40E-04 | 40.43
| | | | t- | DVU2483 | cytochrome c family protein | 28 | 100 | 3.10E-04 | 40.05
| | | | t- | DVU3171 | cytochrome c3 | 31.58 | 76 | 3.10E-04 | 40.05
| | | | t- | DVU2483 | cytochrome c family protein | 21.6 | 287 | 0.002 | 37.74
| | | | t- | DVU0702 | cytochrome c family protein | 25.64 | 78 | 0.022 | 33.88
| | | | t- | DVU0263 | acidic cytochrome c3 | 29.67 | 91 | 0.029 | 33.5
High Molecular Weight Cytochrome c Precursor, con’t. | 123431 | sp | Dv. vulgaris Hildenborough | t- | DVU0922 | cytochrome c family protein | 30.65 | 62 | 0.546 | 29.26
| | | | t- | DVU3171 | cytochrome c3 | 28.87 | 97 | 0.713 | 28.88
| | | | t- | DVU1069 | branched chain amino acid ABC transporter, permease | 50 | 18 | 2.7 | 26.95
| | | | t- | DVU1288 | cytochrome c family protein | 37.5 | 24 | 2.7 | 26.95
| | | | t- | DVU0624 | NapC/NirT cytochrome c family protein | 34.29 | 35 | 4.6 | 26.18
| | | | t- | DVU2524 | cytochrome c3, putative | 30 | 40 | 4.6 | 26.18
| | | | t- | DVU0059 | AcrB/AcrD/AcrF family protein | 50 | 24 | 6 | 25.79
| | | | t- | DVU3144 | cytochrome c family protein | 20.7 | 227 | 7.9 | 25.41
Cytochrome c553 Precursor | 115241 | sp | Dv. vulgaris Hildenborough | t+ | DVU1817 | Cytochrome c553 | 85.44 | 103 | 5.70E-46 | 174.5
| | | | f+ | DVU3041 | cytochrome c553 | 39.51 | 81 | 1.60E-11 | 60.08
| | | | t- | DVU1812 | cytochrome c oxidase, subunit II | 24.71 | 85 | 0.051 | 28.49
| | | | t- | DVU3171 | cytochrome c3 | 46.15 | 26 | 0.328 | 25.79
| | | | t- | DVU1812 | cytochrome c oxidase, subunit II | 44.83 | 29 | 3.6 | 22.33

| | | | t- | DVU2069 | DNA processing protein DprA | 50 | 20 | 3.6 | 22.33
| | | | t- | DVUA0125 | transglycosylase, SLT family | 46.67 | 30 | 3.6 | 22.33
| | | | t- | DVU0029 | hydantoinase/oxoprolinase family protein | 28.57 | 28 | 4.7 | 21.94
| | | | t- | DVU0383 | hypothetical protein | 47.06 | 17 | 6.2 | 21.56
| | | | t- | DVU0671 | conserved hypothetical protein | 34.48 | 29 | 6.2 | 21.56
| | | | t- | DVU0776 | ATP synthase, F1 gamma subunit | 31.03 | 29 | 6.2 | 21.56
| | | | t- | DVU2354 | glycosyl transferase, group 2 family protein | 37.5 | 24 | 6.2 | 21.56
| | | | t- | DVU2508 | UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelat | 43.48 | 23 | 6.2 | 21.56
| | | | t- | DVU2753 | C_GCAxxG_C_C family protein | 40 | 25 | 6.2 | 21.56
| | | | t- | DVU0394 | radical SAM domain protein | 47.62 | 21 | 8.1 | 21.17
| | | | t- | DVU0536 | high-molecular-weight cytochrome C | 54.55 | 11 | 8.1 | 21.17
| | | | t- | DVU0779 | ATP synthase F0, B subunit, putative | 36.36 | 22 | 8.1 | 21.17
| | | | t- | DVU2727 | conserved hypothetical protein | 60 | 15 | 8.1 | 21.17
HMC Operon ORF6 | 462286 | sp | Dv. vulgaris Hildenborough | t+ | DVU0531 | HMC Operon ORF6 | 97.83 | 461 | 0 | 959.9
| | | | f+ | DVU0264 | ferredoxin, 4Fe-4S | 34.07 | 408 | 8.30E-72 | 264.2
| | | | f+ | DVU1289 | reductase, iron-sulfur binding subunit | 24.51 | 408 | 1.30E-27 | 117.5
| | | | t- | DVU3071 | oxidoreductase, FAD/iron-sulfur cluster-binding domain | 26 | 100 | 0.001 | 38.12
| | | | t- | DVU3033 | iron-sulfur cluster-binding protein | 28.74 | 87 | 0.018 | 33.88
| | | | t- | DVU3033 | iron-sulfur cluster-binding protein | 29.29 | 99 | 0.018 | 33.88
| | | | t- | DVU3028 | iron-sulfur cluster-binding protein | 23.28 | 116 | 0.041 | 32.73
| | | | t- | DVU1769 | periplasmic [Fe] hydrogenase, large subunit | 36.67 | 30 | 0.202 | 30.42
| | | | t- | DVU2404 | heterodisulfide reductase, C subunit | 25 | 116 | 0.263 | 30.03
| | | | t- | DVU0498 | iron-sulfur cluster-binding protein | 24.47 | 94 | 0.344 | 29.65
| | | | t- | DVU0826 | glycolate oxidase, iron-sulfur subunit | 21.69 | 83 | 0.449 | 29.26
| | | | t- | DVU1782 | iron-sulfur cluster-binding protein | 27.17 | 92 | 0.449 | 29.26
| | | | t- | DVU1558 | cysteine-rich domain/iron-sulfur cluster-binding domain | 45 | 20 | 0.766 | 28.49
| | | | t- | DVU2401 | hydrogenase, iron-sulfur cluster-binding subunit | 26.67 | 90 | 1 | 28.11
| | | | t- | DVU1271 | general secretion pathway protein F, | 52.63 | 19 | 1.3 | 27.72
| | | | t- | DVU3291 | glutamate synthase, iron-sulfur cluster-binding subunit | 42.86 | 28 | 1.3 | 27.72
| | | | t- | DVU2674 | succinate dehydrogenase and fumarate reductase iron-sulf | 27.27 | 99 | 2.2 | 26.95
| | | | t- | DVU2225 | acetyl-CoA carboxylase, carboxyl transferase, alpha/beta | 25.53 | 47 | 5 | 25.79
| | | | t- | DVU2400 | hydrogenase, putative | 21.92 | 73 | 5 | 25.79
| | | | t- | DVU0849 | heterodisulfide reductase, iron-sulfur-binding subunit | 53.33 | 15 | 6.5 | 25.41
HMC Operon ORF6, con’t. | 462286 | sp | Dv. vulgaris Hildenborough | t- | DVU1089 | alanyl-tRNA synthetase | 25 | 72 | 6.5 | 25.41
| | | | t- | DVU1271 | general secretion pathway protein F, | 29.63 | 54 | 6.5 | 25.41
| | | | t- | DVU2271 | pyruvate formate-lyase activating enzyme | 63.64 | 11 | 6.5 | 25.41
| | | | t- | DVU0253 | oxidoreductase, FAD/iron-sulfur cluster-binding domain | 20.88 | 91 | 8.5 | 25.02
| | | | t- | DVU1450 | anti-sigma factor | 41.67 | 24 | 8.5 | 25.02
| | | | t- | DVU2218 | GTP-binding protein, putative | 29.09 | 55 | 8.5 | 25.02
| | | | t- | DVU2402 | heterodisulfide reductase, A subunit | 29.31 | 58 | 8.5 | 25.02
| | | | t- | DVU2797 | iron-sulfur cluster-binding protein | 22.54 | 71 | 8.5 | 25.02
| | | | t- | DVU3019 | radical SAM/B12 binding domain protein | 50 | 26 | 8.5 | 25.02
HMC Operon ORF5 | 462285 | sp | Dv. vulgaris Hildenborough | t+ | DVU0532 | HMC Operon ORF5 | 100 | 226 | 5.00E-131 | 459.5

| | | | f+ | DVU0265 | membrane protein | 35.05 | 214 | 5.20E-33 | 134
| | | | t- | DVU0850 | heterodisulfide reductase, transmembrane subunit | 27.82 | 133 | 4.60E-05 | 41.2
| | | | t- | DVU2286 | hydrogenase, CooM subunit | 36.67 | 60 | 0.31 | 28.49
| | | | t- | DVU1290 | nitrate reductase, gamma subunit | 20.75 | 106 | 0.902 | 26.95
| | | | t- | DVU0104 | cation ABC transporter, permease protein | 30.77 | 52 | 2 | 25.79
| | | | t- | DVU0597 | regulatory protein LytS | 29.27 | 41 | 2 | 25.79
| | | | t- | DVU1250 | methyltransferase GidB | 26.32 | 76 | 2 | 25.79
| | | | t- | DVU1090 | recA protein | 33.33 | 27 | 4.5 | 24.64
| | | | t- | DVU2425 | rarD protein | 36.96 | 46 | 4.5 | 24.64
| | | | t- | DVU0109 | sensor histidine kinase | 26.23 | 61 | 10 | 23.48
| | | | t- | DVU0713 | branched-chain amino acid ABC transporter, permease | 25.98 | 127 | 10 | 23.48
| | | | t- | DVU0813 | heat-inducible transcription repressor HrcA | 35.48 | 31 | 10 | 23.48
| | | | t- | DVU1671 | ABC transporter, ATP-binding protein/permease protein | 35.14 | 37 | 10 | 23.48
| | | | t- | DVU2507 | phospho-N-acetylmuramoyl-pentapeptide-transferase | 47.37 | 19 | 10 | 23.48
HMC Operon ORF5, con’t. | 462285 | sp | Dv. vulgaris Hildenborough | t- | DVUA0090 | membrane protein | 40 | 30 | 10 | 23.48
HMC Operon ORF4 | 462284 | sp | Dv. vulgaris Hildenborough | t- | DVU0533 | HMC Operon ORF4 | 100 | 47 | 1.50E-24 | 103.6
| | | | t- | DVU2485 | membrane protein | 32 | 25 | 1.6 | 23.87
| | | | t- | DVU0015 | prolipoprotein diacylglyceryl transferase | 50 | 18 | 3.5 | 22.71
| | | | t- | DVU1281 | conserved hypothetical protein | 36 | 25 | 3.5 | 22.71
| | | | t- | DVU0059 | AcrB/AcrD/AcrF family protein | 53.33 | 15 | 7.7 | 21.56
HMC Operon ORF3 | 462283 | sp | Dv. vulgaris Hildenborough | t+ | DVU0534 | HMC Operon ORF3 | 96.39 | 388 | 0 | 760
| | | | f+ | DVU0692 | molybdopterin oxidoreductase, transmembrane subunit | 25.98 | 358 | 5.10E-19 | 88.58
| | | | t- | DVU1286 | reductase, transmembrane subunit | 22.06 | 136 | 0.126 | 30.8
| | | | t- | DVU0964 | Glu/Leu/Phe/Val dehydrogenase family protein | 30.34 | 89 | 2.4 | 26.56
| | | | t- | DVU1644 | permease, putative | 21.43 | 56 | 2.4 | 26.56

| | | | t- | DVU3106 | GGDEF domain protein | 42.86 | 35 | 2.4 | 26.56
| | | | t- | DVU2324 | copper-translocating P-type ATPase | 26.67 | 105 | 3.1 | 26.18
| | | | t- | DVU2823 | TRAP dicarboxylate transporter family protein | 19.53 | 128 | 4.1 | 25.79
| | | | t- | DVUA0067 | membrane protein, putative | 21.37 | 131 | 5.3 | 25.41
| | | | t- | DVU2261 | hypothetical protein | 42.42 | 33 | 9 | 24.64
Periplasmic NiFe Hydrogenase, small | 130115 | sp | Dv. vulgaris Hildenborough | t+ | DVU1921 | Periplasmic NiFe Hydrogenase, small, isozyme 1 | 79.5 | 317 | 2.00E-155 | 541.2
| | | | f+ | DVU2525 | periplasmic [NiFe] hydrogenase, small subunit, isozyme 2 | 55.78 | 294 | 1.00E-101 | 362.8
| | | | f+ | DVU1917 | periplasmic [NiFeSe] hydrogenase, small subunit | 38.56 | 319 | 1.80E-51 | 196.1
| | | | t- | DVU0432 | Ech hydrogenase, subunit EchC | 30.77 | 65 | 0.075 | 31.19
| | | | t- | DVU2288 | hydrogenase, CooL subunit | 26.23 | 61 | 1.1 | 27.34
| | | | t- | DVU1883 | conserved hypothetical protein | 38.71 | 31 | 3.1 | 25.79
| | | | t- | DVU0724 | sodium/alanine symporter family protein | 41.38 | 29 | 7 | 24.64
| | | | t- | DVU1513 | conserved hypothetical protein | 46.15 | 26 | 7 | 24.64
Periplasmic NiFe Hydrogenase, small, con’t. | 130115 | sp | Dv. vulgaris Hildenborough | t- | DVU2324 | copper-translocating P-type ATPase | 37.5 | 32 | 9.1 | 24.25
Periplasmic NiFe Hydrogenase, large | 130104 | sp | Dv. vulgaris Hildenborough | t+ | DVU1922 | Periplasmic NiFe Hydrogenase, large, isozyme 1 | 86.42 | 567 | 0 | 1015
| | | | f+ | DVU2526 | periplasmic [NiFe] hydrogenase, large subunit, isozyme 2... | 51.64 | 548 | 2.00E-167 | 582.4
| | | | t- | DVU0430 | Ech hydrogenase, subunit EchE | 25.24 | 103 | 1.20E-06 | 48.14
| | | | t- | DVU2291 | carbon monoxide-induced hydrogenase CooH | 22.22 | 81 | 0.003 | 36.96
| | | | t- | DVU0430 | Ech hydrogenase, subunit EchE | 31.82 | 88 | 0.195 | 30.8
| | | | t- | DVU1454 | 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase | 23.77 | 122 | 0.255 | 30.42
| | | | t- | DVU1575 | ribose-phosphate pyrophosphokinase | 35.71 | 56 | 0.742 | 28.88
| | | | t- | DVU2291 | carbon monoxide-induced hydrogenase CooH | 27.5 | 80 | 4.8 | 26.18
| | | | t- | DVU3384 | zinc resistance-associated protein | 40.63 | 32 | 8.2 | 25.41

| | | | t- | DVUA0058 | BNR/Asp-box repeat protein | 31.03 | 29 | 8.2 | 25.41
Cytochrome c3 Precursor | 476416 | pir | Dv. vulgaris Hildenborough | t+ | DVU3171 | Cytochrome c3 | 84.5 | 129 | 1.30E-64 | 237.7
| | | | f+ | DVU2809 | cytochrome c3 | 35.16 | 91 | 2.90E-11 | 60.46
| | | | t- | DVU2483 | cytochrome c family protein | 34.38 | 96 | 4.60E-09 | 53.14
| | | | t- | DVU0536 | high-molecular-weight cytochrome C | 31.58 | 76 | 4.00E-05 | 40.05
| | | | t- | DVU2524 | cytochrome c3, putative | 32.35 | 68 | 0.002 | 34.27
| | | | t- | DVU0624 | NapC/NirT cytochrome c family protein | 30.93 | 97 | 0.006 | 32.73
| | | | t- | DVU0536 | high-molecular-weight cytochrome C | 28.87 | 97 | 0.093 | 28.88
| | | | t- | DVU2791 | cytochrome c family protein | 23.16 | 95 | 0.093 | 28.88
| | | | t- | DVU0263 | acidic cytochrome c3 | 18.52 | 54 | 0.46 | 26.56
| | | | t- | DVU1817 | cytochrome c-553 | 46.15 | 26 | 0.785 | 25.79
| | | | t- | DVU0536 | high-molecular-weight cytochrome C | 21.59 | 88 | 1.3 | 25.02
| | | | t- | DVU1559 | aldehyde oxidoreductase | 55.56 | 18 | 5.1 | 23.1
| | | | t- | DVU0732 | valyl-tRNA synthetase | 30 | 30 | 6.6 | 22.71
| | | | t- | DVU2732 | conserved hypothetical protein | 46.15 | 13 | 6.6 | 22.71
Cytochrome c3 Precursor, con’t. | 476416 | pir | Dv. vulgaris Hildenborough | t- | DVU0422 | sensory box/GGDEF domain/EAL domain protein | 47.83 | 23 | 8.7 | 22.33
| | | | t- | DVU2100 | universal stress protein family | 32.5 | 40 | 8.7 | 22.33
| | | | t- | DVU2747 | hypothetical protein | 47.06 | 17 | 8.7 | 22.33
| | | | t- | DVU3144 | cytochrome c family protein | 54.55 | 11 | 8.7 | 22.33
| | | | t- | DVU3144 | cytochrome c family protein | 45 | 20 | 8.7 | 22.33
Cytochrome c553 Precursor | 476416 | pir | Dv. vulgaris Hildenborough | t+ | DVU1817 | Cytochrome c553 | 85.44 | 103 | 5.70E-46 | 174.5
| | | | f+ | DVU3041 | cytochrome c553 | 39.51 | 81 | 1.60E-11 | 60.08
| | | | t- | DVU1812 | cytochrome c oxidase, subunit II | 24.71 | 85 | 0.051 | 28.49
| | | | t- | DVU3171 | cytochrome c3 | 46.15 | 26 | 0.328 | 25.79
| | | | t- | DVU1812 | cytochrome c oxidase, subunit II | 44.83 | 29 | 3.6 | 22.33
| | | | t- | DVU2069 | DNA processing protein DprA | 50 | 20 | 3.6 | 22.33
| | | | t- | DVUA0125 | transglycosylase, SLT family | 46.67 | 30 | 3.6 | 22.33
| | | | t- | DVU0029 | hydantoinase/oxoprolinase family protein | 28.57 | 28 | 4.7 | 21.94
| | | | t- | DVU0383 | hypothetical protein | 47.06 | 17 | 6.2 | 21.56
| | | | t- | DVU0671 | conserved hypothetical protein | 34.48 | 29 | 6.2 | 21.56

| | | | t- | DVU0776 | ATP synthase, F1 gamma subunit | 31.03 | 29 | 6.2 | 21.56
| | | | t- | DVU2354 | glycosyl transferase, group 2 family protein | 37.5 | 24 | 6.2 | 21.56
| | | | t- | DVU2508 | UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelat | 43.48 | 23 | 6.2 | 21.56
| | | | t- | DVU2753 | C_GCAxxG_C_C family protein | 40 | 25 | 6.2 | 21.56
| | | | t- | DVU0394 | radical SAM domain protein | 47.62 | 21 | 8.1 | 21.17
| | | | t- | DVU0536 | high-molecular-weight cytochrome C | 54.55 | 11 | 8.1 | 21.17
| | | | t- | DVU0779 | ATP synthase F0, B subunit, putative | 36.36 | 22 | 8.1 | 21.17
| | | | t- | DVU2727 | conserved hypothetical protein | 60 | 15 | 8.1 | 21.17
Cytochrome cc3 Precursor | 97376 | pir | Dv. vulgaris Hildenborough | t+ | DVU0536 | High Molecular Weight Cytochrome | 99.82 | 545 | 0 | 1151.7
| | | | t- | DVU2483 | cytochrome c family protein | 25.52 | 388 | 4.60E-08 | 52.76
| | | | t- | DVU0263 | acidic cytochrome c3 | 35.35 | 99 | 1.00E-07 | 51.6
| | | | t- | DVU2809 | cytochrome c3 | 36.84 | 76 | 1.30E-07 | 51.22
| | | | t- | DVU2791 | cytochrome c family protein | 26.67 | 195 | 3.90E-07 | 49.68
Cytochrome cc3 Precursor, con’t. | 97376 | pir | Dv. vulgaris Hildenborough | t- | DVU0263 | acidic cytochrome c3 | 36.49 | 74 | 1.50E-06 | 47.75
| | | | t- | DVU2791 | cytochrome c family protein | 35.94 | 64 | 1.50E-06 | 47.75
| | | | t- | DVU2791 | cytochrome c family protein | 24.14 | 174 | 3.30E-06 | 46.59
| | | | t- | DVU2791 | cytochrome c family protein | 23.97 | 267 | 3.30E-06 | 46.59
| | | | t- | DVU0263 | acidic cytochrome c3 | 37.5 | 80 | 7.40E-06 | 45.44
| | | | t- | DVU2524 | cytochrome c3, putative | 32.94 | 85 | 1.60E-05 | 44.28
| | | | t- | DVU2809 | cytochrome c3 | 33.33 | 81 | 4.80E-05 | 42.74
| | | | t- | DVU2524 | cytochrome c3, putative | 28.23 | 124 | 8.10E-05 | 41.97
| | | | t- | DVU2483 | cytochrome c family protein | 23.31 | 133 | 1.80E-04 | 40.82
| | | | t- | DVU2483 | cytochrome c family protein | 28 | 100 | 2.40E-04 | 40.43
| | | | t- | DVU2791 | cytochrome c family protein | 27.96 | 93 | 2.40E-04 | 40.43
| | | | t- | DVU3171 | cytochrome c3 | 31.58 | 76 | 3.10E-04 | 40.05
| | | | t- | DVU2483 | cytochrome c family protein | 21.85 | 302 | 9.00E-04 | 38.51
| | | | t- | DVU0263 | acidic cytochrome c3 | 29.67 | 91 | 0.017 | 34.27
| | | | t- | DVU0702 | cytochrome c family protein | 25.64 | 78 | 0.022 | 33.88
| | | | t- | DVU3171 | cytochrome c3 | 27.55 | 98 | 0.32 | 30.03
| | | | t- | DVU0922 | cytochrome c family protein | 30.65 | 62 | 0.546 | 29.26

| | | | t- | DVU0702 | cytochrome c family protein | 30 | 90 | 0.713 | 28.88
| | | | t- | DVU1069 | branched chain amino acid ABC transporter, permease | 50 | 18 | 2.7 | 26.95
| | | | t- | DVU1288 | cytochrome c family protein | 37.5 | 24 | 2.7 | 26.95
| | | | t- | DVU2524 | cytochrome c3, putative | 25.53 | 94 | 2.7 | 26.95
| | | | t- | DVU0624 | NapC/NirT cytochrome c family protein | 34.29 | 35 | 3.5 | 26.56
| | | | t- | DVU0059 | AcrB/AcrD/AcrF family protein | 50 | 24 | 6 | 25.79
| | | | t- | DVU0270 | sensory box histidine kinase | 26.58 | 79 | 7.9 | 25.41
| | | | t- | DVU3144 | cytochrome c family protein | 20.7 | 227 | 7.9 | 25.41
Cytochrome c3 Hydrogenase | 66320 | pir | Dv. vulgaris Hildenborough | t+ | DVU1769 | Periplasmic Fe-Only Hydrogenase, Large | 95.96 | 421 | 0 | 828.9
| | | | f+ | DVU1771 | Periplasmic Fe-Only Hydrogenase, gamma | 43.92 | 362 | 1.20E-74 | 273.5
| | | | t- | DVU0686 | iron-sulfur cluster-binding protein | 37.5 | 56 | 1.10E-06 | 47.75
| | | | t- | DVU1944 | pyruvate ferredoxin oxidoreductase, iron-sulfur binding | 41.67 | 48 | 9.30E-06 | 44.67
Cytochrome c3 Hydrogenase, con’t. | 66320 | pir | Dv. vulgaris Hildenborough | t- | DVU1080 | iron-sulfur cluster-binding protein | 31.67 | 60 | 1.30E-04 | 40.82
| | | | t- | DVU2402 | heterodisulfide reductase, A subunit | 38.18 | 55 | 1.30E-04 | 40.82
| | | | t- | DVU0849 | heterodisulfide reductase, iron-sulfur-binding subunit | 29.41 | 68 | 1.80E-04 | 40.43
| | | | t- | DVU2797 | iron-sulfur cluster-binding protein | 27.78 | 90 | 2.30E-04 | 40.05
| | | | t- | DVU0172 | iron-sulfur cluster-binding protein | 27.91 | 86 | 3.00E-04 | 39.66
| | | | t- | DVU0686 | iron-sulfur cluster-binding protein | 33.33 | 60 | 3.00E-04 | 39.66
| | | | t- | DVU2289 | hydrogenase, CooX subunit | 32.31 | 65 | 5.10E-04 | 38.89
| | | | t- | DVU1823 | glutamate synthase, iron-sulfur cluster-binding subunit | 21.43 | 126 | 8.70E-04 | 38.12
| | | | t- | DVU1931 | iron-sulfur cluster-binding protein | 30.77 | 52 | 8.70E-04 | 38.12
| | | | t- | DVU2797 | iron-sulfur cluster-binding protein | 34.04 | 47 | 0.001 | 37.74
| | | | t- | DVU3291 | glutamate synthase, iron-sulfur cluster-binding subunit | 29.03 | 93 | 0.001 | 37.35
| | | | t- | DVU1220 | nitroreductase family protein | 32.76 | 58 | 0.002 | 36.96
| | | | t- | DVU2271 | pyruvate formate-lyase activating enzyme | 34.43 | 61 | 0.002 | 36.96
| | | | t- | DVU0850 | heterodisulfide reductase, transmembrane subunit | 26.36 | 129 | 0.003 | 36.58
| | | | t- | DVU2103 | iron-sulfur cluster-binding/ATPase domain protein | 29.55 | 88 | 0.006 | 35.42
| | | | t- | DVU3109 | iron-sulfur cluster-binding protein | 33.93 | 56 | 0.006 | 35.42
| | | | t- | DVU2544 | iron-sulfur cluster-binding protein | 32.73 | 55 | 0.007 | 35.04
| | | | t- | DVU2493 | iron-sulfur cluster-binding protein | 29.79 | 47 | 0.021 | 33.5
| | | | t- | DVU2293 | iron-sulfur protein CooF | 28.95 | 76 | 0.028 | 33.11
| | | | t- | DVU1558 | cysteine-rich domain/iron-sulfur cluster-binding domain | 28.79 | 66 | 0.037 | 32.73
Cytochrome c3 Hydrogenase, con’t. | 66320 | pir | Dv. vulgaris Hildenborough | t- | DVU0846 | adenylylsulphate reductase, beta subunit | 31.03 | 58 | 0.048 | 32.34
| | | | t- | DVU2493 | iron-sulfur cluster-binding protein | 37.5 | 40 | 0.048 | 32.34
| | | | t- | DVU1287 | reductase, iron-sulfur binding subunit, putative | 42.11 | 38 | 0.062 | 31.96
| | | | t- | DVU0305 | ferredoxin II | 29.69 | 64 | 0.081 | 31.57
| | | | t- | DVU3292 | pyridine nucleotide-disulfide oxidoreductase | 36.96 | 46 | 0.081 | 31.57
| | | | t- | DVU2104 | iron-sulfur cluster-binding/ATPase domain protein | 31.82 | 66 | 0.106 | 31.19
| | | | t- | DVU3025 | pyruvate-ferredoxin oxidoreductase | 30.88 | 68 | 0.106 | 31.19
| | | | t- | DVU3276 | ferredoxin I | 27.78 | 54 | 0.106 | 31.19
| | | | t- | DVU0498 | iron-sulfur cluster-binding protein | 39.13 | 23 | 0.139 | 30.8
| | | | t- | DVU0826 | glycolate oxidase, iron-sulfur subunit | 21.35 | 89 | 0.139 | 30.8

| | | | t- | DVU0531 | hmc operon protein 6 | 36.67 | 30 | 0.181 | 30.42
| | | | t- | DVU1614 | iron-sulfur cluster-binding protein | 29.09 | 55 | 0.181 | 30.42
| | | | t- | DVU2792 | electron transport complex protein RnfC | 30.43 | 46 | 0.237 | 30.03
| | | | t- | DVU3350 | iron-sulfur cluster-binding protein | 31.03 | 58 | 0.237 | 30.03
| | | | t- | DVU2404 | heterodisulfide reductase, C subunit | 24.19 | 62 | 0.309 | 29.65
| | | | t- | DVU1782 | iron-sulfur cluster-binding protein | 75 | 12 | 0.404 | 29.26
| | | | t- | DVU3033 | iron-sulfur cluster-binding protein | 21.49 | 121 | 0.404 | 29.26
| | | | t- | DVU1287 | reductase, iron-sulfur binding subunit, putative | 31.25 | 48 | 0.528 | 28.88
| | | | t- | DVU2401 | hydrogenase, iron-sulfur cluster-binding subunit | 21.84 | 87 | 0.528 | 28.88
| | | | t- | DVU1080 | iron-sulfur cluster-binding protein | 50 | 18 | 0.689 | 28.49
Cytochrome c3 Hydrogenase, con’t. | 66320 | pir | Dv. vulgaris Hildenborough | t- | DVU3025 | pyruvate-ferredoxin oxidoreductase | 34.78 | 46 | 0.9 | 28.11
| | | | t- | DVU0172 | iron-sulfur cluster-binding protein | 31.11 | 45 | 1.2 | 27.72
| | | | t- | DVU0429 | Ech hydrogenase, subunit EchF | 27.66 | 47 | 1.2 | 27.72
| | | | t- | DVU0693 | molybdopterin oxidoreductase, iron-sulfur cluster-bindin | 25.64 | 78 | 1.2 | 27.72
| | | | t- | DVU1081 | iron-sulfur cluster-binding protein | 29.82 | 57 | 1.2 | 27.72
| | | | t- | DVU2400 | hydrogenase, putative | 26.42 | 53 | 1.2 | 27.72
| | | | t- | DVU0403 | dissimilatory sulfite reductase beta subunit | 31.91 | 47 | 1.5 | 27.34
| | | | t- | DVU1931 | iron-sulfur cluster-binding protein | 36 | 25 | 1.5 | 27.34
| | | | t- | DVU2544 | iron-sulfur cluster-binding protein | 35 | 40 | 1.5 | 27.34
| | | | t- | DVU0535 | hmc operon protein 2 | 26.98 | 63 | 2 | 26.95
| | | | t- | DVU1782 | iron-sulfur cluster-binding protein | 27.45 | 51 | 2 | 26.95
| | | | t- | DVU3025 | pyruvate-ferredoxin oxidoreductase | 36.84 | 19 | 2 | 26.95
| | | | t- | DVU3292 | pyridine nucleotide-disulfide oxidoreductase | 58.82 | 17 | 2 | 26.95
| | | | t- | DVU0498 | iron-sulfur cluster-binding protein | 42.11 | 19 | 2.6 | 26.56
| | | | t- | DVU2103 | iron-sulfur cluster-binding/ATPase domain protein | 42.86 | 21 | 2.6 | 26.56
| | | | t- | DVU2404 | heterodisulfide reductase, C subunit | 32.35 | 34 | 2.6 | 26.56
| | | | t- | DVU3144 | cytochrome c family protein | 27.27 | 77 | 2.6 | 26.56
| | | | t- | DVU2271 | pyruvate formate-lyase activating enzyme | 31.71 | 41 | 3.4 | 26.18
| | | | t- | DVU3028 | iron-sulfur cluster-binding protein | 50 | 16 | 3.4 | 26.18
| | | | t- | DVU2400 | hydrogenase, putative | 53.33 | 15 | 4.5 | 25.79
| | | | t- | DVU2674 | succinate dehydrogenase and fumarate reductase iron-sulf | 53.33 | 15 | 4.5 | 25.79
Cytochrome c3 Hydrogenase, con’t. | 66320 | pir | Dv. vulgaris Hildenborough | t- | DVU3292 | pyridine nucleotide-disulfide oxidoreductase | 47.06 | 17 | 4.5 | 25.79
| | | | t- | DVU0498 | iron-sulfur cluster-binding protein | 22.89 | 83 | 5.8 | 25.41
| | | | t- | DVU1080 | iron-sulfur cluster-binding protein | 38.1 | 21 | 5.8 | 25.41
| | | | t- | DVU2402 | heterodisulfide reductase, A subunit | 28.33 | 60 | 5.8 | 25.41
| | | | t- | DVU2402 | heterodisulfide reductase, A subunit | 26.67 | 75 | 5.8 | 25.41
| | | | t- | DVU2810 | formate dehydrogenase formation protein FdhE | 32.14 | 28 | 5.8 | 25.41
| | | | t- | DVU3071 | oxidoreductase, FAD/iron-sulfur cluster-binding domain | 33.33 | 36 | 5.8 | 25.41
| | | | t- | DVU3350 | iron-sulfur cluster-binding protein | 45 | 20 | 5.8 | 25.41
| | | | t- | DVU0535 | hmc operon protein 2 | 53.33 | 15 | 7.6 | 25.02
| | | | t- | DVU1196 | leucyl-tRNA synthetase | 30 | 70 | 7.6 | 25.02

| | | | t- | DVU1402 | transcriptional regulator, LysR family | 35 | 40 | 7.6 | 25.02
| | | | t- | DVU0253 | oxidoreductase, FAD/iron-sulfur cluster-binding domain | 45.83 | 24 | 10 | 24.64
| | | | t- | DVU0694 | molybdopterin oxidoreductase, molybdopterin-binding subu | 29.03 | 31 | 10 | 24.64
| | | | t- | DVU0846 | adenylylsulphate reductase, beta subunit | 30 | 30 | 10 | 24.64
| | | | t- | DVU3028 | iron-sulfur cluster-binding protein | 30.23 | 43 | 10 | 24.64
| | | | t- | DVU3028 | iron-sulfur cluster-binding protein | 23.81 | 63 | 10 | 24.64
Flavodoxin | 476442 | pir | Dv. vulgaris Hildenborough | t+ | DVU2680 | Flavodoxin | 100 | 148 | 3.30E-84 | 303.1
| | | | t- | DVU3222 | glucose-6-phosphate isomerase | 36.17 | 47 | 0.777 | 26.18
| | | | t- | DVU2545 | alcohol dehydrogenase, iron-containing | 29.09 | 55 | 1 | 25.79
| | | | t- | DVU0198 | minor capsid protein C | 40.48 | 42 | 1.3 | 25.41
| | | | t- | DVU2871 | minor capsid protein C | 40.48 | 42 | 1.3 | 25.41
Flavodoxin, con’t. | 476442 | pir | Dv. vulgaris Hildenborough | t- | DVU2721 | phage tail tape measure protein, TP901 family | 42.86 | 35 | 1.7 | 25.02
| | | | t- | DVU1122 | portal protein, putative | 35.42 | 48 | 3.9 | 23.87
| | | | t- | DVU2993 | glycosyl transferase, group 1/2 family protein | 41.67 | 24 | 3.9 | 23.87
| | | | t- | DVU2569 | peptidyl-prolyl cis-trans isomerase, FKBP-type | 42.5 | 40 | 5 | 23.48
| | | | t- | DVU3185 | rubredoxin-oxygen oxidoreductase | 23.7 | 135 | 5 | 23.48
| | | | t- | DVU1613 | pyridine nucleotide-disulfide oxidoreductase | 27.03 | 37 | 6.6 | 23.1
| | | | t- | DVU0484 | ABC transporter, ATP-binding protein | 56.25 | 16 | 8.6 | 22.71
| | | | t- | DVU0961 | conserved hypothetical protein | 34.38 | 32 | 8.6 | 22.71
| | | | t- | DVU2239 | glycosyl hydrolase, family 3 | 33.33 | 33 | 8.6 | 22.71
| | | | t- | DVU3101 | tonB protein, putative | 30.3 | 33 | 8.6 | 22.71
Nigerythrin | 47606725 | sp | Dv. vulgaris Hildenborough | t+ | DVU0019 | Nigerythrin | 94.06 | 202 | 1.00E-108 | 384.8

| | | | f+ | DVU2318 | rubrerythrin | 34.13 | 167 | 1.30E-19 | 89.35
| | | | f+ | DVU3094 | rubrerythrin | 32.26 | 186 | 2.20E-16 | 78.57
| | | | t- | DVU3184 | rubredoxin | 28 | 50 | 1.7 | 25.79
| | | | t- | DVU2316 | DNA topoisomerase III | 31.25 | 64 | 2.9 | 25.02
| | | | t- | DVU2815 | outer membrane efflux protein | 27.08 | 48 | 3.8 | 24.64
| | | | t- | DVU0421 | agmatinase | 37.93 | 29 | 5 | 24.25
| | | | t- | DVU1407 | radical SAM domain protein | 45.83 | 24 | 6.5 | 23.87
Rubrerythrin | 134119 | sp | Dv. vulgaris Hildenborough | t+ | DVU3094 | Rubrerythrin | 100 | 191 | 5.00E-111 | 392.9
| | | | f+ | DVU0019 | nigerythrin | 34.41 | 186 | 1.90E-19 | 88.58
| | | | f+ | DVU2318 | rubrerythrin | 32.61 | 184 | 6.30E-18 | 83.57
| | | | t- | DVU3184 | rubredoxin | 31.25 | 48 | 1.6 | 25.79
| | | | t- | DVU0033 | isochorismatase family protein | 30.61 | 49 | 2 | 25.41
| | | | t- | DVU1323 | preprotein translocase, SecY subunit | 29.63 | 54 | 2 | 25.41
| | | | t- | DVU1525 | conserved domain protein | 38.71 | 31 | 2.6 | 25.02
| | | | t- | DVU1973 | rhodanese-like domain protein | 31.37 | 51 | 3.5 | 24.64
| | | | t- | DVU0732 | valyl-tRNA synthetase | 22.45 | 49 | 4.5 | 24.25
| | | | t- | DVU1766 | aspartate ammonia-lyase | 23.53 | 68 | 4.5 | 24.25
Rubredoxin | 134114 | sp | Dv. vulgaris Hildenborough | t+ | DVU3184 | Rubredoxin | 100 | 52 | 1.20E-29 | 120.6
| | | | f+ | DVU3093 | rubredoxin-like protein | 56.1 | 41 | 2.80E-10 | 56.22
| | | | t- | DVU2318 | rubrerythrin | 34.78 | 46 | 0.021 | 30.03
| | | | t- | DVU1511 | hypothetical protein | 71.43 | 14 | 0.236 | 26.56
| | | | t- | DVU0019 | nigerythrin | 28 | 50 | 0.402 | 25.79
| | | | t- | DVU3094 | rubrerythrin | 31.25 | 48 | 0.402 | 25.79
| | | | t- | DVU3019 | radical SAM/B12 binding domain protein | 44.44 | 27 | 2 | 23.48
| | | | t- | DVU3389 | DNA topoisomerase I | 57.14 | 14 | 2 | 23.48
| | | | t- | DVU3389 | DNA topoisomerase I | 61.54 | 13 | 2 | 23.48
| | | | t- | DVU1932 | adenylate kinase | 47.37 | 19 | 4.4 | 22.33
| | | | t- | DVU2904 | radical SAM enzyme, Cfr family | 57.14 | 14 | 5.8 | 21.94
| | | | t- | DVU0448 | GDP-mannose 4,6-dehydratase | 40 | 25 | 7.6 | 21.56
| | | | t- | DVU0750 | methyl-accepting chemotaxis protein | 45 | 20 | 7.6 | 21.56
| | | | t- | DVU1986 | conserved hypothetical protein | 45.45 | 22 | 7.6 | 21.56
| | | | t- | DVU3090 | outer membrane protein, OMPP1/FadL/TodX family | 45 | 20 | 7.6 | 21.56
| | | | t- | DVU1316 | ribosomal protein S14 | 55.56 | 9 | 9.9 | 21.17
| | | | t- | DVU1646 | arsenate reductase | 28.13 | 32 | 9.9 | 21.17
| | | | t- | DVU2129 | sensory box histidine kinase/response regulator | 75 | 12 | 9.9 | 21.17
Desulfoferrodoxin | 118461 | sp | Dv. vulgaris Hildenborough | t+ | DVU3183 | Desulfoferredoxin | 100 | 126 | 1.60E-75 | 273.9
| | | | t- | DVU2137 | succinyl-CoA synthase, beta/alpha subunits | 27.72 | 101 | 0.761 | 25.79
| | | | t- | DVU2330 | MRP family protein | 52.94 | 17 | 1.3 | 25.02
| | | | t- | DVU2620 | conserved hypothetical protein | 31.58 | 38 | 1.7 | 24.64
| | | | t- | DVU2749 | precorrin-6Y C5,15-methyltransferase | 36.59 | 41 | 1.7 | 24.64
| | | | t- | DVU3143 | iron-sulfur cluster-binding protein | 29.55 | 44 | 1.7 | 24.64
| | | | t- | DVU2109 | MTH1175-like domain family protein | 21.88 | 64 | 2.9 | 23.87
| | | | t- | DVU0517 | peptidase, M23/M37 family | 34.62 | 52 | 3.8 | 23.48
| | | | t- | DVU3294 | aldehyde dehydrogenase (NADP) family protein | 50 | 18 | 3.8 | 23.48
Desulfoferrodoxin, con’t. | 118461 | sp | Dv. vulgaris Hildenborough | t- | DVU0686 | iron-sulfur cluster-binding protein | 29.27 | 41 | 4.9 | 23.1
| | | | t- | DVU1976 | chaperonin, 60 kDa | 30.19 | 53 | 4.9 | 23.1
| | | | t- | DVU1624 | 2-dehydro-3-deoxyphosphooctonate aldolase | 44.44 | 18 | 6.4 | 22.71
| | | | t- | DVU1847 | conserved hypothetical protein | 43.75 | 16 | 8.4 | 22.33
| | | | t- | DVU3095 | transcriptional regulator, Fur family | 46.15 | 13 | 8.4 | 22.33
Rubrerythrin | 79430 | pir | Dv. vulgaris Hildenborough | t+ | DVU3094 | Rubrerythrin | 100 | 191 | 5.00E-111 | 392.9
| | | | f+ | DVU0019 | nigerythrin | 34.41 | 186 | 1.90E-19 | 88.58
| | | | f+ | DVU2318 | rubrerythrin | 32.61 | 184 | 6.30E-18 | 83.57
| | | | t- | DVU3184 | rubredoxin | 31.25 | 48 | 1.6 | 25.79
| | | | t- | DVU0033 | isochorismatase family protein | 30.61 | 49 | 2 | 25.41
| | | | t- | DVU1323 | preprotein translocase, SecY subunit | 29.63 | 54 | 2 | 25.41
| | | | t- | DVU1525 | conserved domain protein | 38.71 | 31 | 2.6 | 25.02
| | | | t- | DVU1973 | rhodanese-like domain protein | 31.37 | 51 | 3.5 | 24.64

Rubrerythrin, cont.  Dv. vulgaris Hildenborough  79430 pir
t-  DVU0732  valyl-tRNA synthetase  22.45  49  4.5  24.25
t-  DVU1766  aspartate ammonia-lyase  23.53  68  4.5  24.25

Desulfoferredoxin  Dv. vulgaris Hildenborough  66409 pir
t+  DVU3183  Desulfoferredoxin  100  126  1.60E-75  273.9
t-  DVU2137  succinyl-CoA synthase, beta/alpha subunits  27.72  101  0.761  25.79
t-  DVU2330  MRP family protein  52.94  17  1.3  25.02
t-  DVU2620  conserved hypothetical protein  31.58  38  1.7  24.64
t-  DVU2749  precorrin-6Y C5,15-methyltransferase  36.59  41  1.7  24.64
t-  DVU3143  iron-sulfur cluster-binding protein  29.55  44  1.7  24.64
t-  DVU2109  MTH1175-like domain family protein  21.88  64  2.9  23.87
t-  DVU0517  peptidase, M23/M37 family  34.62  52  3.8  23.48
t-  DVU3294  aldehyde dehydrogenase (NADP) family protein  50  18  3.8  23.48
t-  DVU0686  iron-sulfur cluster-binding protein  29.27  41  4.9  23.1
Desulfoferredoxin, cont.  Dv. vulgaris Hildenborough  66409 pir
t-  DVU1976  chaperonin, 60 kDa  30.19  53  4.9  23.1
t-  DVU1624  2-dehydro-3-deoxyphosphooctonate aldolase  44.44  18  6.4  22.71
t-  DVU1847  conserved hypothetical protein  43.75  16  8.4  22.33
t-  DVU3095  transcriptional regulator, Fur family  46.15  13  8.4  22.33

Rubredoxin  Dv. vulgaris Hildenborough  65799 pir
t+  DVU3184  Rubredoxin  100  52  1.20E-29  120.6
t-  DVU3093  rubredoxin-like protein  56.1  41  2.80E-10  56.22
t-  DVU2318  rubrerythrin  34.78  46  0.021  30.03
t-  DVU1511  hypothetical protein  71.43  14  0.236  26.56
t-  DVU0019  nigerythrin  28  50  0.402  25.79
t-  DVU3094  rubrerythrin  31.25  48  0.402  25.79
t-  DVU3019  radical SAM/B12 binding domain protein  44.44  27  2  23.48
t-  DVU3389  DNA topoisomerase I  57.14  14  2  23.48
t-  DVU3389  DNA topoisomerase I  61.54  13  2  23.48

t-  DVU1932  adenylate kinase  47.37  19  4.4  22.33
t-  DVU2904  radical SAM enzyme, Cfr family  57.14  14  5.8  21.94
t-  DVU0448  GDP-mannose 4,6-dehydratase  40  25  7.6  21.56
t-  DVU0750  methyl-accepting chemotaxis protein  45  20  7.6  21.56
t-  DVU1986  conserved hypothetical protein  45.45  22  7.6  21.56
t-  DVU3090  outer membrane protein, OMPP1/FadL/TodX family  45  20  7.6  21.56
t-  DVU1316  ribosomal protein S14  55.56  9  9.9  21.17
t-  DVU1646  arsenate reductase  28.13  32  9.9  21.17
t-  DVU2129  sensory box histidine kinase/response regulator  75  12  9.9  21.17

Rubredoxin  Dv. vulgaris Hildenborough  145120 gb
t+  DVU3184  Rubredoxin  100  52  1.20E-29  120.6
f+  DVU3093  rubredoxin-like protein  56.1  41  2.80E-10  56.22
t-  DVU2318  rubrerythrin  34.78  46  0.021  30.03
t-  DVU1511  hypothetical protein  71.43  14  0.236  26.56
t-  DVU0019  nigerythrin  28  50  0.402  25.79
t-  DVU3094  rubrerythrin  31.25  48  0.402  25.79
Rubredoxin, cont.  Dv. vulgaris Hildenborough  145120 gb
t-  DVU3019  radical SAM/B12 binding domain protein  44.44  27  2  23.48
t-  DVU3389  DNA topoisomerase I  57.14  14  2  23.48
t-  DVU3389  DNA topoisomerase I  61.54  13  2  23.48
t-  DVU1932  adenylate kinase  47.37  19  4.4  22.33
t-  DVU2904  radical SAM enzyme, Cfr family  57.14  14  5.8  21.94
t-  DVU0448  GDP-mannose 4,6-dehydratase  40  25  7.6  21.56
t-  DVU0750  methyl-accepting chemotaxis protein  45  20  7.6  21.56
t-  DVU1986  conserved hypothetical protein  45.45  22  7.6  21.56
t-  DVU3090  outer membrane protein, OMPP1/FadL/TodX family  45  20  7.6  21.56
t-  DVU1316  ribosomal protein S14  55.56  9  9.9  21.17
t-  DVU1646  arsenate reductase  28.13  32  9.9  21.17
t-  DVU2129  sensory box histidine kinase/response regulator  75  12  9.9  21.17

Superoxide Dismutase  Dv. vulgaris Hildenborough  2654175 gb
t+  DVU2410  Superoxide Dismutase  97.45  196  1.00E-112  398.3

t-  DVU1644  permease, putative  26.51  83  1.6  25.79
t-  DVU2225  acetyl-CoA carboxylase, carboxyl transferase, alpha/beta  25.53  47  2.1  25.41
t-  DVU1165  pyridine nucleotide-disulfide oxidoreductase  48.28  29  2.1  25.41
t-  DVU0094  methyl-accepting chemotaxis protein  20.87  115  2.1  25.41
t-  DVU3373  dihydroxy-acid dehydratase  36.67  30  8  23.48
t-  DVU3188  NLP/P60 family protein  36.96  46  8  23.48
t-  DVU0520  flagellar hook-associated protein FlgL  40.74  27  8  23.48

Rubrerythrin  Dv. vulgaris Hildenborough  1769571 gb
t+  DVU3094  rubrerythrin  100  191  5.00E-111  392.9
f+  DVU0019  nigerythrin  34.41  186  1.90E-19  88.58
f+  DVU2318  rubrerythrin  32.61  184  6.30E-18  83.57
t-  DVU3184  rubredoxin  31.25  48  1.6  25.79
t-  DVU0033  isochorismatase family protein  30.61  49  2  25.41
t-  DVU1323  preprotein translocase, SecY subunit  29.63  54  2  25.41
t-  DVU1525  conserved domain protein  38.71  31  2.6  25.02
Rubrerythrin, cont.  Dv. vulgaris Hildenborough  1769571 gb
t-  DVU1973  rhodanese-like domain protein  31.37  51  3.5  24.64
t-  DVU0732  valyl-tRNA synthetase  22.45  49  4.5  24.25
t-  DVU1766  aspartate ammonia-lyase  23.53  68  4.5  24.25

Dissimilatory Sulfite Reductase, Beta  Dv. vulgaris Hildenborough  2828518 sp
t+  DVU0403  Sulfite reductase, beta  100  381  0  798.5
t-  DVU1080  iron-sulfur cluster-binding protein  27.12  118  3.10E-05  42.74
t-  DVU0172  iron-sulfur cluster-binding protein  31.25  48  0.055  31.96
t-  DVU2271  pyruvate formate-lyase activating enzyme  23.46  81  0.094  31.19
t-  DVU1931  iron-sulfur cluster-binding protein  31.82  44  0.123  30.8
t-  DVU1080  iron-sulfur cluster-binding protein  22.03  59  0.275  29.65
t-  DVU2402  heterodisulfide reductase, A subunit  33.96  53  0.469  28.88

t-  DVU1558  cysteine-rich domain/iron-sulfur cluster-binding domain  40.74  27  0.8  28.11
t-  DVU1597  sulfite reductase, assimilatory-type  33.33  60  0.8  28.11
t-  DVU3025  pyruvate-ferredoxin oxidoreductase  25.33  75  0.8  28.11
t-  DVU3292  pyridine nucleotide-disulfide oxidoreductase  30.77  39  0.8  28.11
t-  DVU0686  iron-sulfur cluster-binding protein  26.32  57  1  27.72
t-  DVU1769  periplasmic [Fe] hydrogenase, large subunit  31.91  47  1.4  27.34
t-  DVU2493  iron-sulfur cluster-binding protein  33.33  45  1.8  26.95
t-  DVU2674  succinate dehydrogenase and fumarate reductase iron-sulf  39.13  23  1.8  26.95
t-  DVU0693  molybdopterin oxidoreductase, iron-sulfur cluster-bindin  36.67  30  2.3  26.56
t-  DVU1931  iron-sulfur cluster-binding protein  47.37  19  2.3  26.56
Sulfite reductase, beta, cont.  Dv. vulgaris Hildenborough  2828518 sp
t-  DVU0264  ferredoxin, 4Fe-4S  32.69  52  3  26.18
t-  DVU1782  iron-sulfur cluster-binding protein  18.38  136  3  26.18
t-  DVU2797  iron-sulfur cluster-binding protein  47.62  21  3  26.18
t-  DVU2271  pyruvate formate-lyase activating enzyme  32.56  43  4  25.79
t-  DVU3028  iron-sulfur cluster-binding protein  27.91  43  4  25.79
t-  DVU3109  iron-sulfur cluster-binding protein  47.37  19  4  25.79
t-  DVU0826  glycolate oxidase, iron-sulfur subunit  36.84  19  5.2  25.41
t-  DVU2168  major head protein  25  84  5.2  25.41
t-  DVU0305  ferredoxin II  33.33  27  6.8  25.02
t-  DVU0826  glycolate oxidase, iron-sulfur subunit  40  15  6.8  25.02

193 t- DVU1062 conserved hypothetical protein 56.25 16 6.8 25.02 glutamate synthase, iron-sulfur t- DVU1823 32.5 40 6.8 25.02 cluster-binding subunit NAD-dependent t- DVU2996 epimerase/dehydratase family 29.27 41 6.8 25.02 protein iron-sulfur cluster-binding t- DVU3033 37.5 24 6.8 25.02 protein oxidoreductase, FAD/iron-sulfur t- DVU3071 45.45 22 6.8 25.02 cluster-binding domain t- DVU3276 ferredoxin I 34.48 29 8.8 24.64 glutamate synthase, iron-sulfur t- DVU3291 26.67 45 8.8 24.64 cluster-binding subunit Dv. vulgaris Dissimilatory Sulfite Reductase, Sulfite reductase, alpha 2828517 sp t+ DVU0402 96.57 437 0 888.3 Hildenborough Alpha sulfite reductase, assimilatory- t- DVU1597 32.84 67 0.029 33.11 type t- DVU0305 ferredoxin II 36.67 30 0.146 30.8 indolepyruvate ferredoxin t- DVU1951 37.04 27 2.1 26.95 oxidoreductase, alpha subunit Sulfite reductase, alpha, Dv. vulgaris 2828517 sp t- DVU3276 ferredoxin I 28.57 42 2.1 26.95 con’t. Hildenborough iron-sulfur cluster-binding t- DVU2797 29.73 37 2.7 26.56 protein iron-sulfur cluster-binding t- DVU1782 34.29 35 4.7 25.79 protein Dv. vulgaris Dissimilatory Sulfite Reductase, Sulfite reductase, gamma 1169436 sp t+ DVU2776 100 105 5.60E-61 224.9 Hildenborough Gamma methyl-accepting chemotaxis t- DVU0170 30 20 0.649 25.41 protein methyl-accepting chemotaxis t- DVU3155 20.69 29 1.9 23.87 protein DcrH peptide methionine sulfoxide t- DVU1984 38.71 31 3.2 23.1 reductase MsrA B12 binding domain t- DVU3016 protein/radical SAM domain 30.77 26 3.2 23.1 protein high-molecular-weight t- DVU0536 27.66 47 7.2 21.94

194 cytochrome C t- DVU0449 sensor/response regulator 57.14 14 9.4 21.56 t- DVU2079 sensory box histidine kinase 47.37 19 9.4 21.56 Dv. vulgaris DcrH 887858 gb t+ DVU3155 DcrH 89.93 963 0 1649.8 Hildenborough methyl-accepting chemotaxis f+ DVU0170 31.93 523 2.20E-64 240.7 protein methyl-accepting chemotaxis f+ DVU3082 34.71 363 6.90E-50 192.6 protein methyl-accepting chemotaxis f+ DVU0018 32.16 370 9.60E-44 172.2 protein methyl-accepting chemotaxis f+ DVU2317 30.52 426 6.90E-42 166 protein methyl-accepting chemotaxis f+ DVU2585 31.37 373 5.90E-41 162.9 protein methyl-accepting chemotaxis f+ DVU0700 26.72 595 1.30E-40 161.8 protein methyl-accepting chemotaxis f+ DVU3035 31.13 379 1.30E-40 161.8 protein Dv. vulgaris methyl-accepting chemotaxis DcrH, con’t. 887858 gb f+ DVU0645 32.47 385 1.70E-40 161.4 Hildenborough protein methyl-accepting chemotaxis f+ DVU2309 32.64 386 8.40E-40 159.1 protein methyl-accepting chemotaxis f+ DVU1400 29.93 421 1.10E-39 158.7 protein methyl-accepting chemotaxis f+ DVU0183 30.67 375 9.30E-39 155.6 protein methyl-accepting chemotaxis f+ DVU1975 28.87 426 2.70E-38 154.1 protein methyl-accepting chemotaxis f+ DVU0608 30.29 383 2.30E-37 151 protein methyl-accepting chemotaxis f+ DVU1857 29.79 376 1.10E-36 148.7 protein methyl-accepting chemotaxis f+ DVU2738 33 297 1.40E-34 141.7 protein methyl-accepting chemotaxis f+ DVU0700 33.55 304 2.40E-34 141 protein methyl-accepting chemotaxis 195 f+ DVU2295 31.79 280 1.30E-32 135.2 protein methyl-accepting chemotaxis f+ DVU1869 30.95 294 3.80E-32 133.7 protein methyl-accepting chemotaxis f+ DVU0935 28.49 344 2.50E-31 131 protein methyl-accepting chemotaxis f+ DVU0344 29.53 359 3.60E-30 127.1 protein methyl-accepting chemotaxis f+ DVU0094 27.3 381 3.00E-29 124 protein methyl-accepting chemotaxis f+ DVU0750 30.04 283 4.10E-26 113.6 protein methyl-accepting chemotaxis f+ DVU0668 29.77 299 6.90E-26 112.8 protein methyl-accepting chemotaxis f+ DVU3182 26.14 329 5.90E-25 109.8 protein DcrA methyl-accepting 
chemotaxis f+ DVU1884 27.03 296 3.80E-24 107.1 protein methyl-accepting chemotaxis f+ DVU1169 26.38 307 1.90E-23 104.8 protein Dv. vulgaris methyl-accepting chemotaxis DcrH, con’t. 887858 gb f+ DVU1884 20.88 704 7.40E-20 92.82 Hildenborough protein t- DVU3049 hemerythrin family protein 28.57 133 2.70E-09 57.77 methyl-accepting chemotaxis t- DVU0591 19.7 269 1.70E-08 55.07 protein methyl-accepting chemotaxis t- DVU1962 19.23 364 1.50E-07 51.99 protein t- DVU3106 GGDEF domain protein 28.79 132 1.60E-06 48.52 methyl-accepting chemotaxis t- DVU2738 20.86 350 6.80E-05 43.13 protein methyl-accepting chemotaxis t- DVU1869 35 60 7.50E-04 39.66 protein arginine N-succinyltransferase, t- DVU1592 20.56 180 0.091 32.73 beta subunit methyl-accepting chemotaxis t- DVU2295 27.66 47 0.091 32.73 protein methyl-accepting chemotaxis t- DVU0750 31.03 58 0.347 30.8 protein

t-  DVU1169  methyl-accepting chemotaxis protein  17.81  421  0.347  30.8
t-  DVU0407  rare lipoprotein A family protein  29.82  57  1  29.26
t-  DVU1539  fructose-1,6-bisphosphatase, class II (glpX)  36.07  61  1.7  28.49
t-  DVU0668  methyl-accepting chemotaxis protein  19.73  375  2.9  27.72
t-  DVU1806  magnesium transporter (mgtE)  27.87  61  3.8  27.34
t-  DVU0337  hypothetical protein  30  70  5  26.95
t-  DVUA0113  type III secretion protein, YscD family  29.51  61  5  26.95
t-  DVU2931  sensory box histidine kinase  27.03  74  6.5  26.56
t-  DVU0013  sensory box histidine kinase  22.7  141  8.5  26.18
t-  DVU0467  anthranilate phosphoribosyltransferase (trpD)  24  75  8.5  26.18
t-  DVU1267  hypothetical protein  23.08  91  8.5  26.18
t-  DVU2171  portal protein  36.36  66  8.5  26.18
t-  DVU3062  sensor histidine kinase/response regulator  36.36  33  8.5  26.18

Hybrid Cluster Protein  Dv. vulgaris Hildenborough  3915811 sp
t+  DVU2013  Hybrid Cluster Protein  97.65  553  0  1080.5
f+  DVU2543  hybrid cluster protein  44.58  554  7.00E-131  460.7
t-  DVU2098  carbon monoxide dehydrogenase (cooS)  26.22  225  7.20E-09  55.45
t-  DVU2380  ABC transporter, ATP-binding protein  31.82  44  8  25.41
t-  DVU2494  peptidase, M48 family  19.15  94  8  25.41

Hybrid Cluster Protein  Dv. vulgaris Hildenborough  3123494 emb
t+  DVU2013  Hybrid Cluster Protein  97.65  553  0  1080.5
f+  DVU2543  hybrid cluster protein  44.58  554  7.00E-131  460.7
t-  DVU2098  carbon monoxide dehydrogenase (cooS)  26.22  225  7.20E-09  55.45
t-  DVU2380  ABC transporter, ATP-binding protein  31.82  44  8  25.41
t-  DVU2494  peptidase, M48 family  19.15  94  8  25.41

Hybrid Cluster Protein  Dv. vulgaris Hildenborough  11374312 pir
t+  DVU2013  Hybrid Cluster Protein  97.65  553  0  1080.5
f+  DVU2543  hybrid cluster protein  44.58  554  7.00E-131  460.7
t-  DVU2098  carbon monoxide dehydrogenase (cooS)  26.22  225  7.20E-09  55.45
t-  DVU2380  ABC transporter, ATP-binding protein  31.82  44  8  25.41
t-  DVU2494  peptidase, M48 family  19.15  94  8  25.41

Assimilatory Sulfite Reductase  Dv. vulgaris Hildenborough  47606708 sp
t+  DVU1597  Assimilatory Sulfite Reductase  100  218  2.00E-126  444.1
t-  DVU0402  dissimilatory sulfite reductase alpha subunit  32.84  67  0.012  33.11
t-  DVU3292  pyridine nucleotide-disulfide oxidoreductase  23.16  95  0.101  30.03
t-  DVU0403  dissimilatory sulfite reductase beta subunit  33.33  60  0.383  28.11
t-  DVU1080  iron-sulfur cluster-binding protein  25  68  0.383  28.11
t-  DVU3193  DNA-binding domain, excisionase family  28.57  56  0.852  26.95
Assimilatory Sulfite Reductase, cont.  Dv. vulgaris Hildenborough  47606708 sp
t-  DVU1547  sensory box protein  29.23  65  1.9  25.79
t-  DVU1615  phenylacetate-coenzyme A ligase (paaK-2)  32.61  46  5.5  24.25
t-  DVU1956  heptosyltransferase family protein  32.08  53  5.5  24.25
t-  DVU1026  uracil permease (uraA)  45.45  22  7.2  23.87
t-  DVU1080  iron-sulfur cluster-binding protein  30.77  26  7.2  23.87
t-  DVU1089  alanyl-tRNA synthetase  35.71  28  9.4  23.48
t-  DVU1357  conserved domain protein  33.93  56  9.4  23.48
t-  DVU1834  pyruvate carboxylase, putative  31.58  38  9.4  23.48

Assimilatory Sulfite Reductase  Dv. vulgaris Hildenborough  79431 pir
t+  DVU1597  Assimilatory Sulfite Reductase  99.54  218  5.00E-126  443
t-  DVU0402  dissimilatory sulfite reductase alpha subunit  31.34  67  0.026  31.96
t-  DVU3292  pyridine nucleotide-disulfide oxidoreductase  23.16  95  0.101  30.03
t-  DVU0403  dissimilatory sulfite reductase beta subunit  33.33  60  0.224  28.88
t-  DVU1080  iron-sulfur cluster-binding protein  25  68  0.383  28.11
t-  DVU3193  DNA-binding domain, excisionase family  28.57  56  1.1  26.56
t-  DVU1547  sensory box protein  29.23  65  1.9  25.79
t-  DVU1615  phenylacetate-coenzyme A ligase (paaK-2)  32.61  46  5.5  24.25
t-  DVU1956  heptosyltransferase family protein  32.08  53  5.5  24.25
t-  DVU1026  uracil permease (uraA)  45.45  22  7.2  23.87
t-  DVU1080  iron-sulfur cluster-binding protein  30.77  26  7.2  23.87
t-  DVU1089  alanyl-tRNA synthetase  35.71  28  9.4  23.48
t-  DVU1357  conserved domain protein  33.93  56  9.4  23.48
t-  DVU1834  pyruvate carboxylase, putative  31.58  38  9.4  23.48

Dv. vulgaris Periplasmic Fe-Only Fe Hydrogenase, Small 66327 pir t+ DVU1770 100 123 8.00E-69 251.5 Hildenborough Hydrogenase, Small Periplasmic Fe-Only t- DVU1771 34 50 1.40E-04 38.12 Hydrogenase, gamma t- DVU2798 ApbE family protein 23.28 116 0.938 25.41 t- DVU2211 hypothetical protein 25 32 2.7 23.87 t- DVU2094 thiG protein 28.57 63 4.7 23.1 t- DVU2088 membrane protein 47.62 21 7.9 22.33 t- DVU2415 hypothetical protein 25.53 47 7.9 22.33 Dv. vulgaris Periplasmic Fe-Only Fe Hydrogenase, Large 66320 pir t- DVU1769 95.96 421 0 33.11 Hildenborough Hydrogenase, Large Periplasmic Fe-Only t- DVU1771 43.92 362 1.20E-74 32.73 Hydrogenase, gamma iron-sulfur cluster-binding t- DVU0686 37.5 56 1.10E-06 40.05 protein pyruvate ferredoxin t- DVU1944 oxidoreductase, iron-sulfur 41.67 48 9.30E-06 26.95

199 binding iron-sulfur cluster-binding t- DVU1080 31.67 60 1.30E-04 36.96 protein heterodisulfide reductase, A t- DVU2402 38.18 55 1.30E-04 25.41 subunit heterodisulfide reductase, iron- t- DVU0849 29.41 68 1.80E-04 38.12 sulfur-binding subunit iron-sulfur cluster-binding t- DVU2797 27.78 90 2.30E-04 28.88 protein iron-sulfur cluster-binding t- DVU0172 27.91 86 3.00E-04 828.9 protein iron-sulfur cluster-binding t- DVU0686 33.33 60 3.00E-04 37.74 protein t- DVU2289 hydrogenase, CooX subunit 32.31 65 5.10E-04 26.95 glutamate synthase, iron-sulfur t- DVU1823 21.43 126 8.70E-04 31.96 cluster-binding subunit iron-sulfur cluster-binding t- DVU1931 30.77 52 8.70E-04 28.88 protein iron-sulfur cluster-binding t- DVU2797 34.04 47 0.001 27.72 protein Fe Hydrogenase, Large, Dv. vulgaris glutamate synthase, iron-sulfur 66320 pir t- DVU3291 29.03 93 0.001 25.41 con’t. Hildenborough cluster-binding subunit t- DVU1220 nitroreductase family protein 32.76 58 0.002 35.42 pyruvate formate-lyase t- DVU2271 34.43 61 0.002 31.19 activating enzyme heterodisulfide reductase, t- DVU0850 26.36 129 0.003 37.35 transmembrane subunit iron-sulfur cluster- t- DVU2103 29.55 88 0.006 25.79 binding/ATPase domain protein iron-sulfur cluster-binding t- DVU3109 33.93 56 0.006 24.64 protein iron-sulfur cluster-binding t- DVU2544 32.73 55 0.007 26.56 protein iron-sulfur cluster-binding t- DVU2493 29.79 47 0.021 30.03 protein t- DVU2293 iron-sulfur protein CooF 28.95 76 0.028 31.19 cysteine-rich domain/iron-sulfur t- DVU1558 28.79 66 0.037 33.5 cluster-binding domain

200 adenylylsulphate reductase, beta t- DVU0846 31.03 58 0.048 38.12 subunit iron-sulfur cluster-binding t- DVU2493 37.5 40 0.048 29.65 protein reductase, iron-sulfur binding t- DVU1287 42.11 38 0.062 26.56 subunit, putative t- DVU0305 ferredoxin II 29.69 64 0.081 39.66 pyridine nucleotide-disulfide t- DVU3292 36.96 46 0.081 25.41 oxidoreductase iron-sulfur cluster- t- DVU2104 31.82 66 0.106 31.19 binding/ATPase domain protein pyruvate-ferredoxin t- DVU3025 30.88 68 0.106 27.72 oxidoreductase t- DVU3276 ferredoxin I 27.78 54 0.106 25.79 iron-sulfur cluster-binding t- DVU0498 39.13 23 0.139 25.41 protein glycolate oxidase, iron-sulfur t- DVU0826 21.35 89 0.139 38.89 subunit t- DVU0531 hmc operon protein 6 36.67 30 0.181 28.49 Fe Hydrogenase, Large, Dv. vulgaris iron-sulfur cluster-binding 66320 pir t- DVU1614 29.09 55 0.181 32.34 con’t. Hildenborough protein electron transport complex t- DVU2792 30.43 46 0.237 26.95 protein RnfC iron-sulfur cluster-binding t- DVU3350 31.03 58 0.237 24.64 protein heterodisulfide reductase, C t- DVU2404 24.19 62 0.309 30.03 subunit iron-sulfur cluster-binding t- DVU1782 75 12 0.404 32.34 protein iron-sulfur cluster-binding t- DVU3033 21.49 121 0.404 26.56 protein reductase, iron-sulfur binding t- DVU1287 31.25 48 0.528 35.04 subunit, putative hydrogenase, iron-sulfur cluster- t- DVU2401 21.84 87 0.528 26.56 binding subunit iron-sulfur cluster-binding t- DVU1080 50 18 0.689 26.18 protein pyruvate-ferredoxin 201 t- DVU3025 34.78 46 0.9 27.72 oxidoreductase iron-sulfur cluster-binding t- DVU0172 31.11 45 1.2 273.5 protein t- DVU0429 Ech hydrogenase, subunit EchF 27.66 47 1.2 40.82 molybdopterin oxidoreductase, t- DVU0693 25.64 78 1.2 39.66 iron-sulfur cluster-bindin iron-sulfur cluster-binding t- DVU1081 29.82 57 1.2 36.58 protein t- DVU2400 hydrogenase, putative 26.42 53 1.2 30.8 dissimilatory sulfite reductase t- DVU0403 31.91 47 1.5 44.67 beta subunit iron-sulfur cluster-binding t- DVU1931 36 25 1.5 31.57 protein iron-sulfur 
cluster-binding t- DVU2544 35 40 1.5 29.26 protein t- DVU0535 hmc operon protein 2 26.98 63 2 25.41 iron-sulfur cluster-binding t- DVU1782 27.45 51 2 24.64 protein Fe Hydrogenase, Large, Dv. vulgaris pyruvate-ferredoxin 66320 pir t- DVU3025 36.84 19 2 27.72 con’t. Hildenborough oxidoreductase pyridine nucleotide-disulfide t- DVU3292 58.82 17 2 25.02 oxidoreductase iron-sulfur cluster-binding t- DVU0498 42.11 19 2.6 25.41 protein iron-sulfur cluster- t- DVU2103 42.86 21 2.6 31.57 binding/ATPase domain protein heterodisulfide reductase, C t- DVU2404 32.35 34 2.6 25.41 subunit t- DVU3144 cytochrome c family protein 27.27 77 2.6 24.64 pyruvate formate-lyase t- DVU2271 31.71 41 3.4 28.11 activating enzyme iron-sulfur cluster-binding t- DVU3028 50 16 3.4 27.34 protein t- DVU2400 hydrogenase, putative 53.33 15 4.5 30.8 succinate dehydrogenase and t- DVU2674 53.33 15 4.5 29.26 fumarate reductase iron-sulf

202 pyridine nucleotide-disulfide t- DVU3292 47.06 17 4.5 25.02 oxidoreductase iron-sulfur cluster-binding t- DVU0498 22.89 83 5.8 40.82 protein iron-sulfur cluster-binding t- DVU1080 38.1 21 5.8 36.96 protein heterodisulfide reductase, A t- DVU2402 28.33 60 5.8 30.42 subunit heterodisulfide reductase, A t- DVU2402 26.67 75 5.8 30.42 subunit formate dehydrogenase t- DVU2810 32.14 28 5.8 25.79 formation protein FdhE oxidoreductase, FAD/iron-sulfur t- DVU3071 33.33 36 5.8 26.18 cluster-binding domain iron-sulfur cluster-binding t- DVU3350 45 20 5.8 24.64 protein t- DVU0535 hmc operon protein 2 53.33 15 7.6 40.43 t- DVU1196 leucyl-tRNA synthetase 30 70 7.6 35.42 transcriptional regulator, LysR t- DVU1402 35 40 7.6 27.34 family Fe Hydrogenase, Large, Dv. vulgaris oxidoreductase, FAD/iron-sulfur 66320 pir t- DVU0253 45.83 24 10 47.75 con’t. Hildenborough cluster-binding domain molybdopterin oxidoreductase, t- DVU0694 29.03 31 10 27.72 molybdopterin-binding subu adenylylsulphate reductase, beta t- DVU0846 30 30 10 27.34 subunit iron-sulfur cluster-binding t- DVU3028 30.23 43 10 26.95 protein iron-sulfur cluster-binding t- DVU3028 23.81 63 10 25.02 protein Dv. vulgaris Ferredoxin II 6015132 sp t- DVU0305 Ferredoxin II 82.81 64 1.70E-28 116.7 Miyazaki periplasmic [Fe] hydrogenase, t- DVU1769 23.21 56 0.007 31.57 large subunit t- DVU3276 ferredoxin I 50 20 0.007 31.57 cysteine-rich domain/iron-sulfur t- DVU1558 47.62 21 0.035 29.26 cluster-binding domain iron-sulfur cluster-binding t- DVU3350 55.56 18 0.035 29.26

203 protein iron-sulfur cluster-binding t- DVU0686 32.73 55 0.045 28.88 protein iron-sulfur cluster-binding t- DVU1614 45 20 0.045 28.88 protein heterodisulfide reductase, A t- DVU2402 23.29 73 0.059 28.49 subunit iron-sulfur cluster-binding t- DVU3143 39.39 33 0.077 28.11 protein iron-sulfur cluster-binding t- DVU0172 41.18 17 0.132 27.34 protein t- DVU2400 hydrogenase, putative 47.06 17 0.132 27.34 iron-sulfur cluster-binding t- DVU2544 27.14 70 0.173 26.95 protein iron-sulfur cluster-binding t- DVU1931 36.67 30 0.225 26.56 protein pyruvate formate-lyase t- DVU2271 31.37 51 0.225 26.56 activating enzyme t- DVU1743 hypothetical protein 28.33 60 0.294 26.18 Dv. vulgaris electron transport complex Ferredoxin II, con’t. 6015132 sp t- DVU2792 36 25 0.385 25.79 Miyazaki protein RnfC pyruvate-ferredoxin t- DVU3025 53.85 13 0.385 25.79 oxidoreductase iron-sulfur cluster-binding t- DVU3350 23.81 63 0.385 25.79 protein iron-sulfur cluster-binding t- DVU0172 38.46 26 0.502 25.41 protein t- DVU0429 Ech hydrogenase, subunit EchF 24 50 0.502 25.41 adenylylsulphate reductase, beta t- DVU0846 41.18 17 0.502 25.41 subunit ATP-dependent Clp protease, t- DVU1336 41.94 31 0.502 25.41 ATP-binding subunit ClpX formate dehydrogenase, beta t- DVU2481 36.36 22 0.502 25.41 subunit, putative iron-sulfur cluster-binding t- DVU2797 25.93 54 0.502 25.41 protein iron-sulfur cluster-binding t- DVU1080 53.33 15 0.656 25.02 protein 204 pyruvate ferredoxin t- DVU1944 oxidoreductase, iron-sulfur 34.21 38 0.656 25.02 binding iron-sulfur cluster-binding t- DVU2493 61.54 13 0.656 25.02 protein iron-sulfur cluster-binding t- DVU1080 44.44 18 0.857 24.64 protein dissimilatory sulfite reductase t- DVU0402 33.33 24 1.1 24.25 alpha subunit iron-sulfur cluster-binding t- DVU0498 50 16 1.1 24.25 protein glutamate synthase, iron-sulfur t- DVU1823 37.5 16 1.1 24.25 cluster-binding subunit glutamate synthase, iron-sulfur t- DVU1823 22.81 57 1.1 24.25 cluster-binding subunit iron-sulfur cluster-binding t- 
DVU1931 50 16 1.1 24.25 protein iron-sulfur cluster-binding t- DVU2797 27.66 47 1.1 24.25 protein Dv. vulgaris glycolate oxidase, iron-sulfur Ferredoxin II, con’t. 6015132 sp t- DVU0826 37.5 16 1.5 23.87 Miyazaki subunit formate dehydrogenase, beta t- DVU2811 37.5 24 1.9 23.48 subunit, putative dissimilatory sulfite reductase t- DVU0403 35.29 17 2.5 23.1 beta subunit molybdopterin oxidoreductase, t- DVU0693 20 65 2.5 23.1 iron-sulfur cluster-bindin t- DVU3276 ferredoxin I 41.18 17 2.5 23.1 glycolate oxidase, iron-sulfur t- DVU0826 31.82 22 3.3 22.71 subunit t- DVU2246 S1 RNA binding domain protein 25 28 3.3 22.71 heterodisulfide reductase, A t- DVU2402 61.54 13 3.3 22.71 subunit heterodisulfide reductase, C t- DVU2404 50 12 3.3 22.71 subunit iron-sulfur cluster-binding t- DVU2797 24 50 3.3 22.71 protein

205 t- DVU0264 ferredoxin, 4Fe-4S 46.15 13 4.3 22.33 dissimilatory sulfite reductase t- DVU0402 25 52 4.3 22.33 alpha subunit t- DVU0429 Ech hydrogenase, subunit EchF 43.75 16 4.3 22.33 heterodisulfide reductase, A t- DVU2402 35.29 17 4.3 22.33 subunit t- DVU0531 hmc operon protein 6 29.17 24 5.6 21.94 Periplasmic Fe-Only t- DVU1771 20.59 68 5.6 21.94 Hydrogenase, gamma succinate dehydrogenase and t- DVU2674 42.86 14 5.6 21.94 fumarate reductase iron-sulf pyruvate-ferredoxin t- DVU3025 37.5 16 5.6 21.94 oxidoreductase oxidoreductase, FAD/iron-sulfur t- DVU0253 42.86 14 7.3 21.56 cluster-binding domain t- DVU1220 nitroreductase family protein 50 12 7.3 21.56 t- DVU2293 iron-sulfur protein CooF 50 12 7.3 21.56 iron-sulfur cluster-binding t- DVU3033 54.55 11 7.3 21.56 protein Dv. vulgaris pyridine nucleotide-disulfide Ferredoxin II, con’t. 6015132 sp t- DVU3292 35.29 17 7.3 21.56 Miyazaki oxidoreductase oxidoreductase, FAD/iron-sulfur t- DVU0253 29.27 41 9.5 21.17 cluster-binding domain acetolactate synthase, large t- DVU0360 34.78 23 9.5 21.17 subunit, biosynthetic type ADP-L-glycero-D- t- DVU0481 mannoheptose-6-epimerase 37.5 24 9.5 21.17 (rfaD) heterodisulfide reductase, iron- t- DVU0849 46.15 13 9.5 21.17 sulfur-binding subunit heterodisulfide reductase, iron- t- DVU0849 50 12 9.5 21.17 sulfur-binding subunit iron-sulfur cluster-binding t- DVU1782 33.33 18 9.5 21.17 protein iron-sulfur cluster-binding t- DVU3028 31.58 19 9.5 21.17 protein Dv. vulgaris Cytochrome c3 Precursor 2851444 sp t+ DVU3171 Cytochrome c3 72.09 129 9.60E-55 204.9

206 Miyazaki f+ DVU2809 cytochrome c3 37.36 91 1.70E-11 61.23 t- DVU2483 cytochrome c family protein 35.71 98 1.20E-09 55.07 high-molecular-weight t- DVU0536 32.89 76 7.00E-05 39.28 cytochrome C t- DVU2791 cytochrome c family protein 32.91 79 4.50E-04 36.58 t- DVU2791 cytochrome c family protein 25.77 97 0.019 31.19 high-molecular-weight t- DVU0536 26.92 104 0.032 30.42 cytochrome C NapC/NirT cytochrome c family t- DVU0624 29.9 97 0.042 30.03 protein t- DVU1817 cytochrome c-553 47.83 23 0.612 26.18 t- DVU0263 acidic cytochrome c3 20 50 1 25.41 t- DVU3144 cytochrome c family protein 63.64 11 4 23.48 t- DVU2732 conserved hypothetical protein 26.09 46 5.2 23.1 glutamate synthase, iron-sulfur t- DVU3291 26.67 60 5.2 23.1 cluster-binding subunit sensory box/GGDEF t- DVU0422 50 20 6.8 22.71 domain/EAL domain protein Cytochrome c3 Precursor, Dv. vulgaris molybdopterin oxidoreductase, 2851444 sp t- DVU0693 44 25 6.8 22.71 con’t. Miyazaki iron-sulfur cluster-bindin t- DVU0732 valyl-tRNA synthetase 33.33 24 6.8 22.71 C-5 cytosine-specific DNA t- DVU1746 42.11 19 8.8 22.33 methylase family protein Cytochrome c553 Dv. vulgaris 1705528 sp t+ DVU1817 Cytochrome c553 65.05 103 2.00E-35 139.4 Precursor Miyazaki t- DVU3041 cytochrome c553 38.24 68 5.60E-09 51.6 t- DVU1812 cytochrome c oxidase, subunit II 26.44 87 0.066 28.11 t- DVU3171 cytochrome c3 57.89 19 0.252 26.18 peptidyl-prolyl cis-trans t- DVU1873 35 40 0.734 24.64 isomerase B t- DVU0398 HMGL-like domain protein 27.87 61 1.6 23.48 t- DVU1812 cytochrome c oxidase, subunit II 45.45 22 2.8 22.71 t- DVU0213 conserved domain protein 50 20 3.6 22.33 t- DVU2856 conserved domain protein 50 20 3.6 22.33 t- DVUA0127 hypothetical protein 30.95 42 3.6 22.33 t- DVU0223 conserved hypothetical protein 47.62 21 4.8 21.94

207 glycosyl transferase, group 2 t- DVU2354 37.5 24 4.8 21.94 family protein methyl-accepting chemotaxis t- DVU3155 45.45 22 4.8 21.94 protein DcrH conserved domain t- DVU0093 protein/glycosyl transferase, 26.83 41 6.2 21.56 group 1 8-amino-7-oxononanoate t- DVU2564 25.71 35 6.2 21.56 synthase, putative B12 binding domain t- DVU3016 protein/radical SAM domain 24.24 33 6.2 21.56 protein high-molecular-weight t- DVU0536 34.48 29 8.1 21.17 cytochrome C Dv. vulgaris Rubredoxin 134115 sp t+ DVU3184 Rubredoxin 90.38 52 8.60E-28 114.4 Miyazaki f+ DVU3093 rubredoxin-like protein 58.14 43 3.30E-11 59.31 t- DVU2318 rubrerythrin 32.61 46 0.028 29.65 t- DVU1511 hypothetical protein 52.38 21 0.18 26.95 t- DVU0019 nigerythrin 31.25 48 0.525 25.41 Dv. vulgaris Rubredoxin, con’t. 134115 sp t- DVU3094 rubrerythrin 29.17 48 0.686 25.02 Miyazaki t- DVU3389 DNA topoisomerase I 47.06 17 2.6 23.1 t- DVU1503 terminase, large subunit 38.1 21 4.4 22.33 t- DVU1932 adenylate kinase 47.37 19 4.4 22.33 t- DVU0959 replicative DNA helicase (dnaB) 61.54 13 5.8 21.94 t- DVU2904 radical SAM enzyme, Cfr family 57.14 14 5.8 21.94 t- DVU3389 DNA topoisomerase I 53.85 13 5.8 21.94 ATP-dependent DNA helicase, t- DVU0453 46.67 15 7.6 21.56 UvrD/REP family t- DVU1316 ribosomal protein S14 66.67 9 7.6 21.56 t- DVU1986 conserved hypothetical protein 45.45 22 7.6 21.56 radical SAM/B12 binding t- DVU3019 40.74 27 7.6 21.56 domain protein t- DVU0448 GDP-mannose 4,6-dehydratase 40 25 9.9 21.17 t- DVU0711 hypothetical protein 42.86 14 9.9 21.17 sensory box histidine t- DVU2129 75 12 9.9 21.17 kinase/response regulator

208 Dv. vulgaris Ferredoxin I 119924 sp t- DVU3276 Ferredoxin I 95.08 61 1.90E-32 129.8 Miyazaki iron-sulfur cluster-binding t- DVU0498 29.35 92 1.80E-06 43.51 protein heterodisulfide reductase, A t- DVU2402 36.84 57 3.10E-06 42.74 subunit iron-sulfur cluster-binding t- DVU2797 34.78 69 6.80E-06 41.59 protein iron-sulfur cluster-binding t- DVU2797 39.02 41 6.80E-06 41.59 protein t- DVU1220 nitroreductase family protein 32.2 59 8.90E-06 41.2 molybdopterin oxidoreductase, t- DVU0693 31.4 86 1.20E-05 40.82 iron-sulfur cluster-bindin pyruvate ferredoxin t- DVU1944 oxidoreductase, iron-sulfur 42.11 57 1.20E-05 40.82 binding adenylylsulphate reductase, beta t- DVU0846 38.98 59 3.40E-05 39.28 subunit iron-sulfur cluster-binding t- DVU2493 37.5 48 5.80E-05 38.51 protein Dv. vulgaris pyruvate formate-lyase Ferredoxin I, con’t. 119924 sp t- DVU2271 36.84 57 9.90E-05 37.74 Miyazaki activating enzyme iron-sulfur cluster-binding t- DVU2544 32.73 55 9.90E-05 37.74 protein iron-sulfur cluster-binding t- DVU0686 32.56 43 1.30E-04 37.35 protein t- DVU0305 ferredoxin II 57.14 21 2.20E-04 36.58 iron-sulfur cluster-binding t- DVU1931 26.32 57 3.80E-04 35.81 protein iron-sulfur cluster- t- DVU2104 33.96 53 3.80E-04 35.81 binding/ATPase domain protein pyridine nucleotide-disulfide t- DVU3292 31.67 60 3.80E-04 35.81 oxidoreductase iron-sulfur cluster-binding t- DVU1081 32.76 58 4.90E-04 35.42 protein iron-sulfur cluster-binding t- DVU0172 33.33 36 0.001 33.88 protein iron-sulfur cluster-binding t- DVU0686 31.37 51 0.001 34.27 protein 209 iron-sulfur cluster-binding t- DVU3109 29.09 55 0.001 33.88 protein iron-sulfur cluster- t- DVU2103 28.33 60 0.002 33.11 binding/ATPase domain protein t- DVU2293 iron-sulfur protein CooF 33.33 45 0.002 33.5 iron-sulfur cluster-binding t- DVU3350 33.9 59 0.002 33.5 protein reductase, iron-sulfur binding t- DVU1287 35.9 39 0.003 32.73 subunit, putative periplasmic [Fe] hydrogenase, t- DVU1769 30.91 55 0.004 32.34 large subunit t- DVU0535 hmc operon protein 
2 29.69 64 0.005 31.96 iron-sulfur cluster-binding t- DVU2544 48.15 27 0.005 31.96 protein iron-sulfur cluster-binding t- DVU2797 30.43 46 0.005 31.96 protein iron-sulfur cluster-binding t- DVU1931 48.39 31 0.007 31.57 protein Dv. vulgaris iron-sulfur cluster-binding Ferredoxin I, con’t. 119924 sp t- DVU2544 37.5 32 0.007 31.57 Miyazaki protein pyruvate ferredoxin/flavodoxin t- DVU0374 32.81 64 0.009 31.19 oxidoreductase family pro iron-sulfur cluster-binding t- DVU1614 47.62 21 0.012 30.8 protein glutamate synthase, iron-sulfur t- DVU1823 28 50 0.012 30.8 cluster-binding subunit pyruvate-ferredoxin t- DVU3025 55 20 0.021 30.03 oxidoreductase iron-sulfur cluster-binding t- DVU1614 30.77 65 0.027 29.65 protein t- DVU2289 hydrogenase, CooX subunit 44.44 27 0.027 29.65 glutamate synthase, iron-sulfur t- DVU3291 27.5 80 0.027 29.65 cluster-binding subunit iron-sulfur cluster-binding t- DVU0172 31.37 51 0.035 29.26 protein indolepyruvate ferredoxin t- DVU1951 40.91 22 0.035 29.26 oxidoreductase, alpha subunit 210 heterodisulfide reductase, A t- DVU2402 40.74 27 0.035 29.26 subunit iron-sulfur cluster-binding t- DVU3143 23.33 60 0.035 29.26 protein pyruvate formate-lyase t- DVU2271 52.38 21 0.046 28.88 activating enzyme pyridine nucleotide-disulfide t- DVU3292 40 25 0.06 28.49 oxidoreductase dissimilatory sulfite reductase t- DVU0402 30.95 42 0.078 28.11 alpha subunit heterodisulfide reductase, A t- DVU2402 27.03 74 0.102 27.72 subunit oxidoreductase, FAD/iron-sulfur t- DVU3071 52.94 17 0.102 27.72 cluster-binding domain t- DVU0305 ferredoxin II 23.44 64 0.134 27.34 iron-sulfur cluster-binding t- DVU2493 45.45 22 0.175 26.95 protein iron-sulfur cluster-binding t- DVU3033 26.87 67 0.175 26.95 protein Dv. vulgaris iron-sulfur cluster-binding Ferredoxin I, con’t. 
119924 sp t- DVU3143 31.43 35 0.175 26.95 Miyazaki protein iron-sulfur cluster- t- DVU2103 30.3 33 0.298 26.18 binding/ATPase domain protein pyruvate formate-lyase 1 t- DVU2825 55.56 18 0.298 26.18 activating enzyme, putative iron-sulfur cluster-binding t- DVU3143 53.33 15 0.298 26.18 protein Periplasmic Fe-Only t- DVU1771 47.06 17 0.389 25.79 Hydrogenase, gamma iron-sulfur cluster-binding t- DVU2797 52.63 19 0.389 25.79 protein pyruvate formate-lyase 1 t- DVU2825 24.39 41 0.508 25.41 activating enzyme, putative iron-sulfur cluster-binding t- DVU3143 53.33 15 0.508 25.41 protein t- DVU0429 Ech hydrogenase, subunit EchF 42.86 21 0.663 25.02 heterodisulfide reductase, A t- DVU2402 50 20 0.663 25.02 subunit 211 oxidoreductase, FAD/iron-sulfur t- DVU3071 24.69 81 0.663 25.02 cluster-binding domain iron-sulfur cluster-binding t- DVU0498 39.29 28 0.866 24.64 protein heterodisulfide reductase, iron- t- DVU0849 26.53 49 0.866 24.64 sulfur-binding subunit pyruvate-ferredoxin t- DVU3025 34.78 23 0.866 24.64 oxidoreductase t- DVU0535 hmc operon protein 2 34.78 23 1.1 24.25 iron-sulfur cluster-binding t- DVU1080 23.81 63 1.1 24.25 protein glycolate oxidase, iron-sulfur t- DVU0826 37.5 24 1.5 23.87 subunit iron-sulfur cluster-binding t- DVU1080 35 20 1.5 23.87 protein t- DVU2400 hydrogenase, putative 53.33 15 1.5 23.87 hydrogenase, iron-sulfur cluster- t- DVU2401 50 16 1.5 23.87 binding subunit Dv. vulgaris formate dehydrogenase, beta Ferredoxin I, con’t. 
119924 sp t- DVU2481 30.43 23 1.5 23.87 Miyazaki subunit, putative iron-sulfur cluster-binding t- DVU2493 50 18 1.5 23.87 protein t- DVU0263 acidic cytochrome c3 37.04 27 1.9 23.48 iron-sulfur cluster-binding t- DVU0686 36.67 30 1.9 23.48 protein Periplasmic Fe-Only t- DVU1771 61.54 13 1.9 23.48 Hydrogenase, gamma glycolate oxidase, iron-sulfur t- DVU0826 42.86 21 2.5 23.1 subunit heterodisulfide reductase, iron- t- DVU0849 40 20 2.5 23.1 sulfur-binding subunit glutamate synthase, iron-sulfur t- DVU1823 38.89 18 2.5 23.1 cluster-binding subunit pyruvate formate-lyase t- DVU2271 37.93 29 2.5 23.1 activating enzyme succinate dehydrogenase and t- DVU2674 43.48 23 2.5 23.1 fumarate reductase iron-sulf 212 pyruvate-ferredoxin t- DVU3025 41.38 29 2.5 23.1 oxidoreductase iron-sulfur cluster-binding t- DVU0172 43.48 23 3.3 22.71 protein oxidoreductase, FAD/iron-sulfur t- DVU0253 47.06 17 3.3 22.71 cluster-binding domain glutamate synthase, iron-sulfur t- DVU1823 38.89 18 3.3 22.71 cluster-binding subunit dissimilatory sulfite reductase t- DVU0403 30 20 4.3 22.33 beta subunit dissimilatory sulfite reductase t- DVU0403 31.03 29 4.3 22.33 beta subunit iron-sulfur cluster-binding t- DVU3350 42.11 19 4.3 22.33 protein iron-sulfur cluster-binding t- DVU0172 26.32 19 5.6 21.94 protein t- DVU0429 Ech hydrogenase, subunit EchF 27.27 33 5.6 21.94 iron-sulfur cluster-binding t- DVU0686 46.15 13 5.6 21.94 protein Dv. vulgaris Ferredoxin I, con’t. 119924 sp t- DVU2400 hydrogenase, putative 46.67 15 5.6 21.94 Miyazaki t- DVU0204 lipoprotein, putative 36 25 7.3 21.56 glycolate oxidase, iron-sulfur t- DVU0826 42.11 19 7.3 21.56 subunit t- DVU1318 ribosomal protein L6 (rplF) 60 15 7.3 21.56 oxidoreductase, FAD/iron-sulfur t- DVU0253 57.14 14 9.6 21.17 cluster-binding domain t- DVU1010 hypothetical protein 38.89 18 9.6 21.17 Periplasmic NiFe Dv. 
vulgaris Periplasmic NiFe Hydrogenase, 130115 sp t+ DVU1921 79.5 317 2.00E-155 541.2 Hydrogenase, Small Miyazaki small, isozyme 1 periplasmic [NiFe] hydrogenase, f+ DVU2525 55.78 294 1.00E-101 362.8 small subunit, isozyme 2 periplasmic [NiFeSe] f+ DVU1917 38.56 319 1.80E-51 196.1 hydrogenase, small subunit t- DVU0432 Ech hydrogenase, subunit EchC 30.77 65 0.075 31.19 t- DVU2288 hydrogenase, CooL subunit 26.23 61 1.1 27.34 t- DVU1883 conserved hypothetical protein 38.71 31 3.1 25.79

213 sodium/alanine symporter family t- DVU0724 41.38 29 7 24.64 protein t- DVU1513 conserved hypothetical protein 46.15 26 7 24.64 copper-translocating P-type t- DVU2324 37.5 32 9.1 24.25 ATPase Periplasmic NiFe Dv. vulgaris Periplasmic NiFe Hydrogenase, 130104 sp t+ DVU1922 86.42 567 0 1015 Hydrogenase, Large Miyazaki large, isozyme 1 periplasmic [NiFe] hydrogenase, f+ DVU2526 51.64 548 2.00E-167 582.4 large subunit, isozyme 2 t- DVU0430 Ech hydrogenase, subunit EchE 25.24 103 1.20E-06 48.14 carbon monoxide-induced t- DVU2291 22.22 81 0.003 36.96 hydrogenase CooH t- DVU0430 Ech hydrogenase, subunit EchE 31.82 88 0.195 30.8 2-C-methyl-D-erythritol 4- t- DVU1454 23.77 122 0.255 30.42 phosphate cytidylyltransferase ribose-phosphate t- DVU1575 35.71 56 0.742 28.88 pyrophosphokinase carbon monoxide-induced t- DVU2291 27.5 80 4.8 26.18 hydrogenase CooH Periplasmic NiFe Dv. vulgaris zinc resistance-associated 130104 sp t- DVU3384 40.63 32 8.2 25.41 Hydrogenase, Large, con’t. Miyazaki protein t- DVUA0058 BNR/Asp-box repeat protein 31.03 29 8.2 25.41 Dv. vulgaris Flavodoxin 6015161 sp t+ DVU2680 Flavodoxin 66.22 148 4.60E-54 203 Miyazaki methyl-accepting chemotaxis t- DVU0094 27.5 40 3 24.25 protein iron-sulfur cluster-binding t- DVU3033 55.56 18 3 24.25 protein t- DVU0932 sensor histidine kinase 31.25 48 3.9 23.87 NAD-dependent t- DVU0342 epimerase/dehydratase family 28.85 52 5 23.48 protein t- DVU1640 hypothetical protein 66.67 12 5 23.48 t- DVU3101 tonB protein, putative 31.25 32 5 23.48 t- DVU1881 phoH family protein 26.32 38 6.6 23.1 t- DVU2020 conserved hypothetical protein 50 26 6.6 23.1 Periplasmic Nitrate Dv. 
desulfuricans 8488996 sp f+ DVU0173 Thiosulfate Reductase 25.11 701 1.60E-30 127.9 214 Reductase ATCC27774 molybdopterin oxidoreductase, f+ DVU0694 23.95 526 1.70E-19 91.28 molybdopterin-binding subu molybdopterin oxidoreductase f+ DVU1611 23.17 505 8.10E-14 72.4 domain protein t- DVU3368 histidyl-tRNA synthetase (hisS) 32.79 61 0.269 30.8 t- DVU0506 DHH family protein 29.23 65 0.458 30.03 t- DVU2730 tail fiber protein, putative 24.14 58 3.9 26.95 t- DVU0371 conserved hypothetical protein 32.76 58 5.1 26.56 Neelaredoxin 4235394 gb Dv. gigas t- DVU3183 desulfoferredoxin 29.57 115 1.60E-04 38.12 anaerobic ribonucleoside- t- DVU0299 33.00 59 5.2 23.1 triphosphate reductase

Appendix A-2. Comparison of different matrices for BLASTP analysis. The Dv. vulgaris Hildenborough protein sequence for the cytochrome c553

protein (gi # 115241) was compared to the Dv. vulgaris Hildenborough predicted proteome using BLASTP with either the BLOSUM62 or

BLOSUM45 matrices (189). Entries in red or blue refer to true positives or false positives as defined in Appendix A-1. The data for cytochrome c553

are representative of the control data.

a BLOSUM matrix used in the analysis

b TIGR-CMR annotation of returned sequences

c TIGR-CMR gene number of corresponding protein returned by the analysis

d % Identity of region of homology

e E Score

f Bit Score

Appendix A-2. Comparison of BLAST matrices BLOSUM62 and BLOSUM45

                                                          BLOSUM62 (a)                                            BLOSUM45 (a)
Protein Description (b)                                   Gene # (c)  % Identity (d)  E Score (e)  Bit Score (f)  Gene # (c)  % Identity (d)  E Score (e)  Bit Score (f)
Cytochrome c553                                           DVU1817     85.44           5.70E-46     174.5          DVU1817     85.44           2.00E-38     149.8
cytochrome c553                                           DVU3041     39.51           1.60E-11     60.08          DVU3041     39.51           8.60E-09     51.38
cytochrome c oxidase, subunit II                          DVU1812     24.71           0.051        28.49          DVU1812     23.53           1.7          23.81
cytochrome c3                                             DVU3171     46.15           0.328        25.79          DVU3171     46.15           4.6          22.41
cytochrome c oxidase, subunit II                          DVU1812     44.83           3.6          22.33          DVU1812     40.91           6.7          21.85
DNA processing protein DprA                               DVU2069     50              3.6          22.33          DVU2069     40              6.7          21.85
transglycosylase, SLT family                              DVUA0125    46.67           3.6          22.33
hydantoinase/oxoprolinase family protein                  DVU0029     28.57           4.7          21.94          DVU0029     28.57           10           21.28
hypothetical protein                                      DVU0383     47.06           6.2          21.56          DVU0383     28.26           8.2          21.56
conserved hypothetical protein                            DVU0671     34.48           6.2          21.56
ATP synthase, F1 gamma subunit                            DVU0776     31.03           6.2          21.56
glycosyl transferase, group 2 family protein              DVU2354     37.5            6.2          21.56          DVU2354     37.5            6.7          21.85
UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelat  DVU2508     43.48           6.2          21.56          DVU2508     21.21           6.7          21.85
C_GCAxxG_C_C family protein                               DVU2753     40              6.2          21.56
radical SAM domain protein                                DVU0394     47.62           8.1          21.17          DVU0394     36.11           10           21.28
high-molecular-weight cytochrome C                        DVU0536     54.55           8.1          21.17
ATP synthase F0, B subunit, putative                      DVU0779     36.36           8.1          21.17

Appendix A-3. Effect of the low-complexity filter on BLASTP output. The Dv. vulgaris Hildenborough protein sequence for the cytochrome c553 protein (gi # 115241) was compared to the Dv. vulgaris Hildenborough predicted proteome using BLASTP with the BLOSUM62 matrix and the low-complexity filter turned on or off. Table elements and colors are as defined in Appendix A-2.

Appendix A-3. Effect of the low complexity filter on BLASTP output

Low Complexity Filter On                   Low Complexity Filter Off
Gene #    % Identity  E Score   Bit Score  Gene #    % Identity  E Score   Bit Score
DVU1817   85.44       5.70E-46  174.5      DVU1817   100         3.90E-55  204.9
DVU3041   39.51       1.60E-11  60.08      DVU3041   37.63       4.10E-12  62
DVU1812   24.71       0.051     28.49      DVU1812   24.71       0.051     28.49
DVU1812   44.83       3.6       22.33      DVU1812   44.83       3.6       22.33
DVU3171   46.15       0.328     25.79      DVU3171   46.15       0.328     25.79
DVUA0125  46.67       3.6       22.33      DVU0383   36.36       0.954     24.25
DVU2069   50          3.6       22.33      DVU2753   38.89       1.6       23.48
DVU0029   28.57       4.7       21.94      DVU3196   42.31       2.1       23.1
DVU2753   40          6.2       21.56      DVU2069   35.9        2.8       22.71
DVU2508   43.48       6.2       21.56      DVU0394   41.67       2.8       22.71
DVU2354   37.5        6.2       21.56      DVU0029   31.58       2.8       22.71
DVU0776   31.03       6.2       21.56      DVUA0125  46.67       3.6       22.33
DVU0671   34.48       6.2       21.56      DVU3036   39.29       6.2       21.56
DVU0383   47.06       6.2       21.56      DVU2508   43.48       6.2       21.56
DVU2727   60          8.1       21.17      DVU2354   37.5        6.2       21.56
DVU0779   36.36       8.1       21.17      DVU0776   31.03       6.2       21.56
DVU0536   54.55       8.1       21.17      DVU0671   34.48       6.2       21.56
DVU0394   47.62       8.1       21.17      DVU2727   60          8.1       21.17

Appendix A-4. Comparison of control data for Dv. vulgaris. Each hit generated from the control data presented in Appendix A-1 was plotted as E Score vs. % Identity. Red circles represent true positives (i.e., exact matches), blue squares represent false positives (i.e., significant hits, including paralogs, that do not constitute exact matches) and green triangles represent true negatives. The horizontal dashed lines at 20% and 35% sequence identity represent the boundaries of the “twilight zone” where the rate of false positives increases dramatically.
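As a rough illustration of how the 20% and 35% identity lines partition BLASTP hits, the following Python sketch bins hits by % identity. The function name, the zone labels and the treatment of the boundary values are assumptions made for illustration; the three example hits are the first three BLOSUM62 rows of Appendix A-2.

```python
def identity_zone(percent_identity, lower=20.0, upper=35.0):
    """Place a hit relative to the 20-35% identity "twilight zone".

    Treating the boundaries as part of the zone is an assumption.
    """
    if percent_identity > upper:
        return "above twilight zone"
    if percent_identity >= lower:
        return "twilight zone"
    return "below twilight zone"


# (gene number, % identity) pairs taken from Appendix A-2, BLOSUM62 columns
hits = [("DVU1817", 85.44), ("DVU3041", 39.51), ("DVU1812", 24.71)]
zones = {gene: identity_zone(identity) for gene, identity in hits}
```

Hits that fall inside the zone are the ones for which E Score and % Identity alone are least reliable for separating true from false positives.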

Appendix A-4. Comparison of control data for Dv. vulgaris

[Figure: scatter plot of % Identity (y-axis, 10 to 100) against E Score (x-axis, logarithmic scale from 10^-200 to 10^50). Legend: True +, False +, True -.]

Appendix A-5. Control data from experimentally verified protein sequences from other organisms used to derive carbon metabolism pathways of Dv. vulgaris. Where carbon metabolism data for Dv. vulgaris could not be obtained using the predicted E. coli proteome, a set of experimentally verified protein sequences from other organisms was compiled and compared to Dv. vulgaris (and Dv. desulfuricans) using BLASTP (BLOSUM62, low complexity filter on, expect value 1.0). All of the hits obtained from the analysis were plotted as E Score vs. % Identity as in Appendix A-4. The horizontal dashed lines at 20% and 35% sequence identity represent the boundaries of the “twilight zone” where the rate of false positives increases dramatically.

Appendix A-5. Control data from experimentally verified protein sequences from other organisms used to derive carbon metabolism pathways of Dv. vulgaris

[Figure: scatter plot of % Identity (y-axis, 10 to 80) against E Score (x-axis, logarithmic scale from 10^-150 to 10^50).]

Appendix B-1. Regulon Groupings Based on KEGG Metabolic Pathways.

Groupings listed are those containing at least three distinct regulon members and at least three putative promoter sequences. Groupings that did not meet these criteria are left blank. Metabolic data were collected for Dv. vulgaris by reciprocal best hit against E. coli or a relevant alternative from the KEGG database using BLASTP (50 bit cutoff). Metabolic data for Dv. desulfuricans were collected by downloading the Dv. desulfuricans KEGG pathway data from the ORNL source page (Drury, unpublished results). This accounts for part of the disparity in the number of regulon members between the two species. It has also been observed that Dv. vulgaris intergenic regions tend to be slightly longer than those from other organisms, including Dv. desulfuricans (Arkin, personal communication). As such, Dv. vulgaris operons predicted based on intergenic sequence length would on average contain fewer members than those from Dv. desulfuricans.

a,b KEGG Number and Name as listed in the KEGG Database (http://www.kegg.org)

c,d “Members” and “Promoters” refer to the total number of sequences and the number of putative promoter sequences in the grouping, respectively

e Intergenic length cutoff for prediction of operons (bp) as described in Chapter 4 Materials and Methods
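The reciprocal best hit criterion with a bit score cutoff can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the function names and the gene identifiers in the example are hypothetical, and applying the 50 bit cutoff before selecting the best hit in each direction is an assumption.

```python
def best_hits(hits, min_bits=50.0):
    """Map each query to its highest-scoring subject at or above min_bits.

    hits: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, bits in hits:
        if bits < min_bits:
            continue  # discard hits below the bit score cutoff
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits, min_bits=50.0):
    """Keep pairs where each sequence is the other's best hit."""
    forward = best_hits(forward_hits, min_bits)
    reverse = best_hits(reverse_hits, min_bits)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


# Hypothetical identifiers and scores, for illustration only
forward = [("dv_A", "ec_1", 210.0), ("dv_A", "ec_2", 80.0),
           ("dv_B", "ec_2", 95.0), ("dv_C", "ec_3", 42.0)]
reverse = [("ec_1", "dv_A", 205.0), ("ec_2", "dv_A", 85.0),
           ("ec_2", "dv_B", 70.0), ("ec_3", "dv_C", 41.0)]
pairs = reciprocal_best_hits(forward, reverse)
```

In this example only dv_A and ec_1 are retained: dv_B's best hit, ec_2, has dv_A as its own best hit, and the dv_C/ec_3 pair falls below the cutoff.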

Appendix B-1. Regulon Groupings Based on Metabolic Pathways

Columns, left to right: KEGG Number (a); KEGG Group Name (b); Dv. vulgaris Members (c) at 100 (e) and 300 bp; Dv. vulgaris Promoters (d) at 100 and 300 bp; Dv. desulfuricans Members (c) at 100 and 300 bp; Dv. desulfuricans Promoters (d) at 100 and 300 bp

00010  Glycolysis / Gluconeogenesis  45  95  24  23  68  162  34  33
00020  Citrate Cycle (TCA Cycle)  20  38  14  13  22  59  14  14
00030  Pentose Phosphate Pathway  22  52  10  10  34  81  14  12
00051  Fructose and Mannose Metabolism  22  33  10  10  62  136  31  31
00052  Galactose Metabolism  12  18  5  5  36  61  15  14
00620  Pyruvate Metabolism  42  82  20  20  73  137  34  32
00630  Glyoxylate and Dicarboxylate Metabolism  32  80  20  19  45  91  24  24
00640  Propanoate Metabolism  13  25  8  7  20  44  11  11
00650  Butanoate Metabolism  25  64  17  15  51  100  28  25
00190  Oxidative Phosphorylation  21  26  9  8  31  84  15  15
00680  Methane Metabolism  33  58  15  16  39  81  17  17
00910  Nitrogen Metabolism  51  77  22  21  38  89  20  19
00920  Sulfur Metabolism  7  12  7  6  10  18  5  5
00061  Fatty Acid Biosynthesis (Path 1)  18  27  7  6  26  42  10  7
00071  Fatty Acid Metabolism  10  23  7  7  15  33  9  9
00230  Purine Metabolism  52  85  29  29  93  182  41  39
00240  Pyrimidine Metabolism  36  62  19  19  91  178  45  43
00520  Nucleotide Sugar Metabolism  13  15  8  7  46  71  20  19
00251  Glutamate Metabolism  30  52  17  17  50  94  25  24
00252  Alanine and Aspartate Metabolism  28  54  14  14  41  87  21  21
00260  Glycine, Serine and Threonine Metabolism  38  65  16  16  50  108  21  21
00271  Methionine Metabolism  14  33  8  8  10  18  6  6
00272  Cysteine Metabolism  4  8  4  3  20  44  9  9
00290  Valine, Leucine and Isoleucine Biosynthesis  20  40  11  11  32  72  14  13
00300  Lysine Biosynthesis  19  29  10  10  60  96  27  22
00330  Arginine and Proline Metabolism  25  35  12  11  37  66  15  14
00340  Histidine Metabolism  11  21  8  7  67  151  33  31
00400  Phenylalanine, Tyrosine and Tryptophan Biosynthesis  37  70  20  15  42  74  21  18
00220  Urea Cycle and Metabolism of Amino Groups  27  42  13  13  30  40  13  12
00450  Selenoamino Acid Metabolism  13  43  5  5  43  94  24  24
00472  D-Arginine and D-Ornithine Metabolism  16  43  10  9
00500  Starch and Sucrose Metabolism  17  33  11  10  65  142  29  29
00510  N-Glycans Biosynthesis  49  101  24  24
00530  Aminosugars Metabolism  21  42  12  12  48  115  25  24
00540  Lipopolysaccharide Biosynthesis  25  46  19  17  81  178  42  41
00550  Peptidoglycan Biosynthesis  10  15  6  6  13  27  8  7
00561  Glycerolipid Metabolism  34  75  18  17  90  183  46  42
00730  Thiamine Metabolism  14  19  9  8  17  36  6  6
00740  Riboflavin Metabolism  14  16  4  3  12  12  5  5
00750  Vitamin B6 Metabolism  7  17  5  5  12  26  6  7
00760  Nicotinate and Nicotinamide Metabolism  10  14  6  6  21  39  11  11

Appendix B-1, con’t.

Columns, left to right: KEGG Number; KEGG Group Name; Dv. vulgaris Members at 100 and 300 bp; Dv. vulgaris Promoters at 100 and 300 bp; Dv. desulfuricans Members at 100 and 300 bp; Dv. desulfuricans Promoters at 100 and 300 bp

00770  Pantothenate and CoA Biosynthesis  18  28  12  11  29  54  16  15
00780  Biotin Metabolism  7  9  5  5  17  31  14  14
00790  Folate Biosynthesis  19  35  10  10  43  90  21  21
00670  One Carbon Pool by Folate  32  59  13  13  24  56  11  11
00860  Porphyrin and Chlorophyll Metabolism  36  67  23  22  88  177  45  44
00130  Ubiquinone Biosynthesis  25  46  17  15  84  182  41  39

Appendix B-2. Regulon Groupings Based on Conserved Operons from the WIT Database (http://wit.mcs.anl.gov/WIT2). Grouping criteria are as described in Appendix B-1.

a Group number assigned by the investigator

b WIT Group name

c,d “Members” and “Promoters” refer to the total number of sequences and the number of putative promoter sequences in the grouping, respectively

e Intergenic length cutoff for prediction of operons (bp) as described in Chapter 4 Materials and Methods
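Footnote e refers to an intergenic length cutoff used to predict operons. The procedure itself is described in Chapter 4 and is not reproduced here; the Python sketch below only illustrates the general idea, under the assumption that adjacent genes on the same strand are joined into one operon when the gap between them does not exceed the cutoff. The function name, the gene names and the coordinates are hypothetical.

```python
def predict_operons(genes, cutoff_bp):
    """Group adjacent same-strand genes separated by at most cutoff_bp.

    genes: list of (name, start, end, strand) tuples sorted by start.
    The same-strand requirement is an assumption of this sketch.
    """
    operons = []
    for gene in genes:
        _, start, _, strand = gene
        if operons:
            _, _, prev_end, prev_strand = operons[-1][-1]
            gap = start - prev_end - 1  # intergenic length in bp
            if strand == prev_strand and gap <= cutoff_bp:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


# Hypothetical genes: gaps of 49 bp, 249 bp and 19 bp (last gene on the other strand)
genes = [("geneA", 100, 400, "+"), ("geneB", 450, 900, "+"),
         ("geneC", 1150, 1500, "+"), ("geneD", 1520, 1800, "-")]
operons_100 = predict_operons(genes, 100)
operons_300 = predict_operons(genes, 300)
```

With the 100 bp cutoff the example yields three predicted operons, while the 300 bp cutoff merges geneC into the first operon and yields two. This is the same direction of effect seen in the tables, where the 300 bp columns generally list more members per grouping.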

Appendix B-2. Regulon Groupings Based on Conserved Operons

Columns, left to right: Group # (a); WIT Group Name (b); Dv. vulgaris Members (c) at 100 (e) and 300 bp; Dv. vulgaris Promoters (d) at 100 and 300 bp; Dv. desulfuricans Members at 100 and 300 bp; Dv. desulfuricans Promoters at 100 and 300 bp

00001  Purine Biosynthesis  20  21  10  10  17  33  8  8
00013  Glycolysis I  10  19  5  5  8  19  5  5
00016  Non-Oxidative Glucose Metabolism  15  9  4  4  14  21  6  4
00020  Formate Dehydrogenase I  10  18  3  5  19  28  5  5
00021  Formate Dehydrogenase II  11  21  5  7  20  29  6  6
00022  1.2.1.2 (Formate Dehydrogenase)  9  12  3  3  8  13  3  3
00029  Pyruvate Synthase I  7  9  4  4  16  38  6  6
00033  Mannose Metabolism  13  15  3  3  9  26  5  5
00034  Ribose Metabolism I  11  27  5  5
00037  Polysaccharide Biosynthesis 1  10  21  5  5  20  32  8  7
00038  Polysaccharide Biosynthesis 2  7  12  3  3
00039  Indolepyruvate Oxidoreductase  12  14  3  3
00041  Arginine Biosynthesis  13  21  6  6  14  18  7  6
00043  Branched-Chain Amino Acid Biosynthesis  15  21  7  6  7  14  5  4
00044  Chorismate Biosynthesis  12  26  8  6  14  29  7  6
00046  Sulfate Trans-Cys Synthase  5  12  4  4  20  42  12  12
00047  Diaminopimelate Pathway  15  23  3  3  13  30  4  4
00048  Glutamate Synthase  9  11  5  5  5  14  3  3
00050  Histidine Biosynthesis  11  22  7  7  16  23  6  6
00055  Threonine Biosynthesis  13  15  3  3  8  9  3  3
00058  Lipopolysaccharide Biosynthesis 1  5  7  3  3  7  11  4  4
00059  Lipopolysaccharide Biosynthesis 3  7  9  3  3  15  31  6  6
00061  PTS Sugar Utilization  17  33  3  3  11  24  4  4
00067  Phosphate Regulatory Cassette  5  19  3  3  5  13  4  4
00079  Amino Acid Transport I  32  50  17  13  33  52  16  12
00080  Amino Acid Transport II  23  44  12  12  27  51  13  13
00081  Amino Acid Transport 3  14  27  5  5  11  23  6  6
00086  Oligopeptide Transport  14  17  6  6  17  27  7  7
00104  HYP Operon (Hydrogenase)  6  13  5  5  6  15  4  4
00109  Precorrin-2 Biosynthesis  13  19  5  5  14  24  6  6
00111  NIF1 (Nitrogen Fixation)  6  14  6  6  16  30  6  6
00125  Molybdopterin Biosynthesis  9  19  5  5  15  34  8  8
00129  Thi Operon (Thiamine Biosynthesis)  7  8  3  3  5  18  3  3
00140  Mixed Replication / Recombination  13  13  3  3  10  20  6  6
00143  Translation I  49  35  14  20  39  74  22  17
00153  Translation 4  12  14  6  5  9  17  5  3
00167  Glutamyl-tRNA Synthase  11  18  3  3
00178  Regulatory I  12  18  6  6
00179  Cell Division  18  10  3  3  17  43  6  6
00182  Chemotaxis  26  42  10  9  24  36  10  9
00184  Soj (Sporulation)  8  6  3  3  8  25  5  5
00185  Heat Shock  14  20  7  5  11  30  6  5
00189  fla Operon (Flagella)  34  43  11  11  28  52  10  9
00190  Flagella  6  13  5  5  6  21  5  4

Appendix B-3. Regulon Groupings Based on Conserved Regulons of E. coli.

Grouping criteria are as described in Appendix B-1.

a,b “Members” and “Promoters” refer to the total number of sequences and the number of putative promoter sequences in the grouping, respectively

c Intergenic length cutoff for prediction of operons (bp) as described in Chapter 4 Materials and Methods

Appendix B-3. Regulon Groupings Based on Conserved Regulons of E. coli

Columns, left to right: E. coli Regulator; Description of Regulator; Dv. vulgaris Members (a) at 100 (c) and 300 bp; Dv. vulgaris Promoters (b) at 100 and 300 bp; Dv. desulfuricans Members at 100 and 300 bp; Dv. desulfuricans Promoters at 100 and 300 bp

ArcA  Regulator of Aerobic Respiration  78  47  11  11  17  48  9  9
ArgR  Arginine Repressor  12  20  5  5  20  28  10  9
CpxR  Sensor Protein Acting on ArcA  13  31  9  9  20  43  13  13
CRP  cAMP-Receptor Protein  14  37  11  10  33  58  19  18
FadR  Negative Regulator of fad Operon  13  24  5  5
FNR  Fumarate-Nitrate Reductase Regulator  12  35  7  7  24  47  11  10
FruR  Fructose Repressor  12  35  7  7  13  29  7  7
GalR  Galactose Repressor  9  14  4  4
IHF  Integration Host Factor  9  17  6  6  34  69  18  17
LexA  Regulator of SOS Regulon  15  22  7  7  22  31  11  11
Lrp  Regulator for Leucine Regulon  5  15  3  3
MetR  Regulator of metEH  15  26  4  4  11  26  4  4
NarL  Nitrate/Nitrite Response Regulator  15  30  6  6  19  33  9  9
NarP  Nitrate/Nitrite Response Regulator  13  28  5  5  11  22  5  5
NtrC  Nitrogen Regulatory Protein C  16  33  9  9  15  29  7  7
PhoB  Regulator of Phosphate Uptake  12  26  4  4  17  43  9  9
PurR  Purine Repressor  33  54  13  13  42  84  18  18
RpoD15  σ70 Subunit of RNA Polymerase (15 bp separator)  21  50  14  14  52  117  29  29
RpoD16  σ70 Subunit of RNA Polymerase (16 bp separator)  57  110  26  27  85  179  43  43
RpoD17  σ70 Subunit of RNA Polymerase (17 bp separator)  96  187  50  43  142  261  68  60
RpoD18  σ70 Subunit of RNA Polymerase (18 bp separator)  56  98  26  23  64  158  29  27
RpoD19  σ70 Subunit of RNA Polymerase (19 bp separator)  21  46  15  15  46  92  24  23
RpoH2  σ32 Subunit of RNA Polymerase (Heat Shock Response)  11  39  6  6  15  29  7  7
RpoH3  σ32 Subunit of RNA Polymerase (Heat Shock Response)  8  29  4  4  19  51  8  8
RpoN  σ54 Subunit of RNA Polymerase (Nitrogen Assimilation)  17  36  12  12  26  50  13  13
SoxS  Regulation of Superoxide Response Regulon  6  10  4  4  8  19  6  6
TrpR  Regulator of trp Operon  12  27  5  6  6  11  4  4
TyrR  Regulator of Aromatic Amino Acid Transport  20  43  7  7  9  25  7  7

Appendix C-1. Sample Output for AlignACE Analysis of Dvu_00190_metabolism_100 (Oxidative Phosphorylation) Grouping.

Sample output from AlignACE on the Dv. vulgaris Oxidative Phosphorylation metabolic regulon (KEGG # 00190, 100 bp intergenic length cutoff), annotated to show the relevant input sequences. Motif data for Motif 12 (as designated in Table 4-2) are presented below the list of input sequences. Each line contains the motif sequence, the input sequence from which the motif was derived, the location within the input sequence where the motif occurs and the orientation of the motif within the input sequence. Below the list of motif sequences is a line indicating which residues contain the most information content (designated by *), followed by the MAP Score for the motif.

Appendix C-1. Sample Output for AlignACE Analysis of Dvu_00190_metabolism_100 (Oxidative Phosphorylation) Grouping.

AlignACE -i Dvu_00190_metabolism_100.igrlist -gcback 0.633

Parameter values:
expect = 10          Number of sites expected in the model (default)
gcback = 0.633       Fractional GC Background of studied organism (for Dv. vulgaris)
minpass = 200        Minimum number of non-improved passes in phase 1 (default)
seed = 1087315299    Seed for random number generator
numcols = 10         Number of columns to align (default)
undersample = 1      Possible sites / (expect * numcols * seedings) (default)
oversample = 1       1/undersample (default)

Input sequences (intergenic sequence upstream of labeled sequence):

#0 ORF01941
#1 ORF01942
#2 ORF01946 Cytochrome c Oxidase (Subunit I)
#3 ORF01947
#4 ORF02767 NADH Dehydrogenase I Chain N
#5 ORF00226 ATP Synthase (F0 B)
#6 ORF00219 ATP Synthase (F1 β)
#7 ORF00228 ATP Synthase (F0 B)
#8 ORF02976 Operon Containing Succinate Dehydrogenase Subunit C
#9 ORF00461 ATP Synthase c
#10 ORF00455
#11 ORF00465
#12 ORF01631
#13 ORF04402 Succinate Dehydrogenase (Flavoprotein Subunit)
#14 ORF00459
#15 ORF04404 Fumarate Reductase (Iron-Sulfur Subunit)
#16 ORF01642 Inorganic Pyrophosphatase
#17 ORF03409
#18 ORF01636
#19 ORF01629
#20 ORF01639

Appendix C-1, con’t. Sample Output for AlignACE Analysis of Dvu_00190_metabolism_100 (Oxidative Phosphorylation) Grouping.

Motif 12
Motif Sequence        Input Sequence  Location of Motif  Motif Orientation
TTCAATTCCTCATCAGAGTA  3               75                 0
TATATTGTTTGATTATATAA  4               2                  1
GTGCTTACGTGATGCAACAG  4               29                 0
TCTTTTTTGTCCTGAGGTAA  4               139                0
TATCTTGATTGACCAAACAG  4               251                0
GGACTTTCGTCATCAGGGAG  4               274                1
TTGTTTGAGAGAACACTCAA  7               50                 0
GGTGATGTAAAATCACACAA  7               144                0
GAAAAAAGCTTTTGACACAA  7               213                1
GATGATGGCTGAAGAGTCGA  11              202                0
TTGATTGTTTGAACAAAAAA  11              249                0
GCCCTTCGATGAACATGCAG  13              6                  0
TACGTAGTATCATGACACAG  19              138                0
TCTGTTAGCAAAAGAAGAAA  19              178                0
* ** * ** * * **
MAP Score: 7.76775
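Each motif site line above carries four fields: the site sequence, the index of the input sequence, the location of the site and its orientation. A minimal Python sketch of reading such lines is given here; it assumes whitespace-separated fields as printed in this appendix, and the function name is hypothetical. The three example lines are the first three Motif 12 sites.

```python
from collections import Counter


def parse_site(line):
    """Split one motif site line into (sequence, input index, location, orientation)."""
    sequence, seq_index, location, orientation = line.split()
    return sequence, int(seq_index), int(location), int(orientation)


lines = [
    "TTCAATTCCTCATCAGAGTA 3 75 0",
    "TATATTGTTTGATTATATAA 4 2 1",
    "GTGCTTACGTGATGCAACAG 4 29 0",
]
sites = [parse_site(line) for line in lines]
# Number of motif sites found in each input sequence
sites_per_input = Counter(site[1] for site in sites)
```

Tallying sites per input sequence in this way shows at a glance which upstream regions contribute most of the aligned sites.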

Appendix C-2. Sample Output for ScanACE Analysis of Motif 12 Derived from AlignACE Analysis of Dvu_00190_metabolism_100 (Oxidative Phosphorylation) Grouping.

Sample ScanACE output for Motif 12 of the Dv. vulgaris Oxidative Phosphorylation metabolic regulon (KEGG # 00190, 100 bp intergenic length cutoff). Motif 12, derived from the previously described AlignACE analysis, was scanned against the Dv. vulgaris genome and the 200 highest-scoring hits were collected (the first four hits are shown). The score S for each site Q, whose sequence as a function of position p is given by q(p), is calculated as follows:

\[ S(Q) = \sum_{p} M_{p,q(p)} \]

\[ M_{p,b} = \log\left(\frac{F_{p,b} + P_b}{N + 1}\right) - \log P_b \]

Where:

Fp,b = number bases of type b aligned at position p,

Pb = genomic background nucleotide frequency for base b,

N = number of aligned sites
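The scoring scheme defined above can be sketched in Python as follows. This is an illustration only, not the ScanACE implementation: the function names are hypothetical, the natural logarithm is assumed, the background frequencies are assumed to split the GC fraction evenly between G and C (and the remainder between A and T), every position of the site is scored, and the aligned sites in the example are made up. The sketch is therefore not expected to reproduce the scores listed in this appendix.

```python
import math


def background_from_gc(gc_fraction):
    """Background base frequencies from a fractional GC content (assumed even split)."""
    at = (1.0 - gc_fraction) / 2.0
    gc = gc_fraction / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}


def weight_matrix(sites, background):
    """M[p][b] = log((F[p][b] + P[b]) / (N + 1)) - log(P[b])."""
    n_sites = len(sites)
    matrix = []
    for p in range(len(sites[0])):
        column = {}
        for base, freq in background.items():
            count = sum(1 for site in sites if site[p] == base)
            column[base] = math.log((count + freq) / (n_sites + 1)) - math.log(freq)
        matrix.append(column)
    return matrix


def site_score(matrix, site):
    """S(Q): sum over positions p of M[p][q(p)]."""
    return sum(matrix[p][base] for p, base in enumerate(site))


aligned = ["TTGACA", "TTGACT", "TTTACA"]  # hypothetical aligned sites
bg = background_from_gc(0.633)            # GC fraction passed to ScanACE via -g
matrix = weight_matrix(aligned, bg)
consensus_score = site_score(matrix, "TTGACA")
unrelated_score = site_score(matrix, "CCCGGG")
```

A site matching the aligned sequences receives a positive score, while a site composed of bases never observed at those positions receives a negative one.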

Perl scripts were used to determine which of the returned hits occurred within the input sequences used to derive the motif. This resulted in the following statistical parameters as defined in

Chapter 4 Materials and Methods:

N = 3774712 (total bp in Dv. vulgaris genomic sequence)

s1 = 200 (total ScanACE hits considered)

s2 = 3178 (total bp in input sequences)

x = 12 (# ScanACE hits occurring within input sequences)

These parameters were used to calculate a site specificity score (Ssite) of 6.48 × 10^-19 for the motif. A comparison of the motif against its reverse complement using CompareACE gave a palindromicity score of 0.83 (>0.7 considered to be palindromic).
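The definition of the site specificity score is given in Chapter 4 and is not reproduced in this appendix. As a hedged illustration only, the sketch below assumes the score is a hypergeometric tail probability: the chance that at least x of the s1 top hits fall within the s2 bp of input sequence when hits are placed at random among N genomic positions. The function name is hypothetical and `math.comb` requires Python 3.8 or later. Because the exact formula used in Chapter 4 may differ, the value computed here need not equal the reported 6.48 × 10^-19.

```python
from math import comb


def hypergeometric_tail(N, s1, s2, x):
    """P(X >= x) for s1 draws from N positions, s2 of which are 'inside'.

    Assumed form of the site specificity score; see the lead-in.
    """
    total = comb(N, s1)
    upper = min(s1, s2)
    tail = sum(comb(s2, i) * comb(N - s2, s1 - i) for i in range(x, upper + 1))
    return tail / total  # exact integers, converted to float only at the end


# Parameters listed in this appendix
score = hypergeometric_tail(N=3774712, s1=200, s2=3178, x=12)
```

Working with exact integers until the final division avoids the underflow that would occur if each term were computed in floating point.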

Appendix C-2. Sample Output for ScanACE Analysis of Motif 12 Derived from AlignACE Analysis of Dvu_00190_metabolism_100 (Oxidative Phosphorylation) Grouping.

ScanACE version 1.3 October 18, 1999
/triton/hemme/ScanACE -i Dvu_00190_metabolism_100.ace -z GDV.1con -g 0.633 -s 200

Input Motif:

Motif 12              Score
TTCAATTCCTCATCAGAGTA  14.4152
TATATTGTTTGATTATATAA  19.0625
GTGCTTACGTGATGCAACAG  12.2171
TCTTTTTTGTCCTGAGGTAA  13.3566
TATCTTGATTGACCAAACAG  14.065
GGACTTTCGTCATCAGGGAG  14.6182
TTGTTTGAGAGAACACTCAA  14.6849
GGTGATGTAAAATCACACAA  14.8063
GAAAAAAGCTTTTGACACAA  10.775
GATGATGGCTGAAGAGTCGA  10.0576
TTGATTGTTTGAACAAAAAA  18.4037
GCCCTTCGATGAACATGCAG  13.9594
TACGTAGTATCATGACACAG  15.0073
TCTGTTAGCAAAAGAAGAAA  14.8817
* ** * ** * * **
Motif Average: 14.3079
Std. Dev.: 2.42902

Best Hits:

Contig    Start Site in Genome  Orientation  Motif                 Score
Dvu_1531  1091995               1            TTTATTCCATTATCAAAAAA  19.0625
Dvu_1531  1386175               0            TTAGTTTCGTTATCAAAAAA  19.0625
Dvu_1531  1660896               1            TATATTGTTTGATTATATAA  19.0625
Dvu_1531  1660900               0            TTGTTTATATAATCAAACAA  19.0625

7. References

1. Klenk, H., R. Clayton, J. Tomb, O. White, K. Nelson, K. Ketchum, R. Dodson, M. Gwinn, E. Hickey, J. Peterson, D. Richardson, A. Kerlavage, D. Graham, N. Kyrpides, R. Fleischmann, J. Quackenbush, N. Lee, G. Sutton, S. Gill, E. Kirkness, B. Dougherty, K. Mckenney, M. Adams, B. Loftus, and J. Venter. 1997. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390:364-370.

2. Brysch, K., C. Schneider, G. Fuchs, and F. Widdel. 1987. Lithoautotrophic growth of sulfate-reducing bacteria, and description of Desulfobacterium autotrophicum gen. nov., sp. nov. Archives of Microbiology 148:264-274.

3. Schink, B., V. Thiemann, H. Laue, and M. W. Friedrich. 2002. Desulfotignum phosphitoxidans sp. nov., a marine sulfate reducer that oxidizes phosphite to phosphate. Archives of Microbiology 177:381-391.

4. Lee, C.-M., and K. L. Sublette. 1994. Autotrophic growth of Desulfotomaculum orientis with reduction of sulfur dioxide. Applied Biochemistry and Biotechnology 45-46:417-428.

5. Liu, Y., T. M. Karnauchow, K. F. Jarrell, D. L. Balkwill, G. R. Drake, D. Ringelberg, R. Clarno, and D. R. Boone. 1997. Description of two new thermophilic Desulfotomaculum spp., Desulfotomaculum putei sp. nov., from a deep terrestrial subsurface, and Desulfotomaculum luciae sp. nov., from a hot spring. International Journal of Systematic Bacteriology 47:615-621.

6. Mori, K., H. Kim, T. Kakegawa, and S. Hanada. 2003. A novel lineage of sulfate-reducing microorganisms: Thermodesulfobiaceae fam. nov., Thermodesulfobium narugense, gen. nov., sp. nov., a new thermophilic isolate from a hot spring. Extremophiles 7:283-290.

7. Audiffrin, C., J.-L. Cayol, C. Joulian, L. Casalot, P. Thomas, J.-L. Garcia, and B. Ollivier. 2003. Desulfonauticus submarinus gen. nov., sp. nov., a novel sulfate-reducing bacterium isolated from a deep-sea hydrothermal vent. International Journal of Systematic and Evolutionary Microbiology 53:1585-1590.

8. Knoblauch, C., K. Sahm, and B. B. Jorgensen. 1999. Psychrophilic sulfate-reducing bacteria isolated from permanently cold Arctic marine sediments: description of Desulfofrigus oceanense gen. nov., sp. nov., Desulfofrigus fragile sp. nov., Desulfofaba gelida gen. nov., sp. nov., Desulfotalea psychrophila gen. nov., sp. nov., and Desulfotalea arctica sp. nov. International Journal of Systematic Bacteriology 49:1631-1643.

9. Isaksen, M. F., and A. Teske. 1996. Desulforhopalus vacuolatus gen. nov., sp. nov., a moderately psychrophilic sulfate-reducing bacterium with gas vacuoles isolated from a temperate estuary. Archives of Microbiology 166:160-168.

10. Brandt, K. K., B. K. C. Patel, and K. Ingvorsen. 1999. Desulfocella halophila gen. nov., sp. nov., a halophilic, fatty-acid-oxidizing, sulfate-reducing bacterium isolated from sediments of the Great Salt Lake. International Journal of Systematic Bacteriology 49:193-200.

11. Tardy-Jacquenod, C., M. Magot, F. Laigret, M. Kaghad, B. K. C. Patel, J. Guezennec, R. Matheron, and P. Caumette. 1996. Desulfovibrio gabonensis sp. nov., a new moderately halophilic sulfate-reducing bacterium isolated from an oil pipeline. International Journal of Systematic Bacteriology 46:710-715.

12. Tardy-Jacquenod, C., M. Magot, B. K. C. Patel, R. Matheron, and P. Caumette. 1998. Desulfotomaculum halophilum sp. nov., a halophilic sulfate-reducing bacterium isolated from oil production facilities. International Journal of Systematic Bacteriology 48:333-338.

13. Hamilton, W. A. 2003. Microbially influenced corrosion as a model system for the study of metal microbe interactions: a unifying electron transfer hypothesis. Biofouling 19:65-76.

14. Payne, R. B., C. L. Hemme, and J. D. Wall. 2004. A New Frontier in Genomic Research. World Pipelines 4:53-55.

15. Lovley, D. R. 1994. Bioremediation of organic and metal contamination with dissimilatory metal reduction. Journal of Industrial Microbiology 14:85-93.

16. Lovley, D. R., and J. D. Coates. 1997. Bioremediation of metal contamination. Current Opinion in Biotechnology 8:285-289.

17. Lovley, D. R., E. R. Roden, E. J. P. Phillips, and J. C. Woodward. 1993. Enzymatic iron and uranium reduction by sulfate-reducing bacteria. Marine Geology 113:41-53.

18. White, C., A. K. Sharman, and G. M. Gadd. 1998. An integrated microbial process for the bioremediation of soil contaminated with toxic metals. Nature Biotechnology 16:572-575.

19. Tucker, M. D., L. L. Barton, and B. M. Thomson. 1998. Removal of U and Mo from water by immobilized Desulfovibrio desulfuricans in column reactors. Biotechnology and Bioengineering 60:88-96.

20. Tucker, M. D., L. L. Barton, and B. M. Thomson. 1998. Reduction of Cr, Mo, Se and U by Desulfovibrio desulfuricans immobilized in polyacrylamide gels. Journal of Industrial Microbiology and Biotechnology 20:13-19.

21. So, C. M., C. D. Phelps, and L. Y. Young. 2003. Anaerobic transformation of alkanes to fatty acids by a sulfate-reducing bacterium, Strain Hxd3. Applied and Environmental Microbiology 69:3892-3900.

22. Roden, E. E., J. H. Tuttle, W. R. Boynton, and W. M. Kemp. 1995. Carbon cycling in mesohaline Chesapeake Bay sediments 1: POC deposition rates and mineralization pathways. Journal of Marine Research 53:799-819.

23. Rysgaard, S., P. B. Christensen, M. V. Sorensen, P. Funch, and P. Berg. 2000. Marine meiofauna, carbon and nitrogen mineralization in sandy and soft sediments of Disko Bay, West Greenland. Aquatic Microbial Ecology 21:59-71.

24. Ozawa, K., T. Mogi, M. Suzuki, M. Kitamura, T. Nakaya, Y. Anraku, and H. Akutsu. 1997. Membrane-bound cytochromes in a sulfate-reducing strict anaerobe Desulfovibrio vulgaris Miyazaki F. Anaerobe 3:339-346.

25. Fareleira, P., B. S. Santos, C. Antonio, P. Moradas-Ferreira, J. Legall, A. V. Xavier, and H. Santos. 2003. Response of a strict anaerobe to oxygen: survival strategies in Desulfovibrio gigas. Microbiology 149:1513-1522.

26. Heidelberg, J. F., R. Seshadri, S. A. Haveman, C. L. Hemme, I. T. Paulsen, J. F. Kolonay, J. A. Eisen, N. Ward, B. Methé, L. M. Brinkac, S. C. Daugherty, R. T. Deboy, R. J. Dodson, A. S. Durkin, R. Madupu, W. C. Nelson, S. A. Sullivan, D. Fouts, D. H. Haft, J. Selengut, J. D. Peterson, T. M. Davidsen, N. Zafar, L. Zhou, D. Radune, G. Dimitrov, M. Hance, K. Tran, H. Khouri, J. Gill, T. R. Utterback, T. V. Feldblyum, J. D. Wall, G. Voordouw, and C. M. Fraser. 2004. The genome sequence of the anaerobic, sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough. Nature Biotechnology 22:1-6.

27. Pfennig, N., and H. Biebl. 1976. Desulfuromonas acetoxidans gen. nov. and sp. nov., a new anaerobic, sulfur-reducing, acetate-oxidizing bacterium. Archives of Microbiology 110:3-12.

28. Lovley, D. R., S. J. Giovannoni, D. C. White, J. E. Champine, E. J. P. Phillips, Y. A. Gorby, and S. Goodwin. 1993. Geobacter metallireducens, new genus new species, a microorganism capable of coupling the complete oxidation of organic compounds to the reduction of iron and other metals. Archives of Microbiology 159:336-344.

29. Methé, B. A., K. E. Nelson, J. A. Eisen, I. T. Paulsen, W. Nelson, J. F. Heidelberg, D. Wu, M. Wu, N. Ward, M. J. Beanan, R. J. Dodson, R. Madupu, L. M. Brinkac, S. C. Daugherty, R. T. Deboy, A. S. Durkin, M. Gwinn, J. F. Kolonay, S. A. Sullivan, D. H. Haft, J. Selengut, T. M. Davidsen, N. Zafar, O. White, B. Tran, C. Romero, H. A. Forberger, J. Weidman, H. Khouri, T. V. Feldblyum, T. R. Utterback, S. E. V. Aken, D. R. Lovley, and C. M. Fraser. 2003. Genome of Geobacter sulfurreducens: metal reduction in subsurface environments. Science 302:1967-1969.

30. Baer, M. L., J. Ravel, J. Chun, R. T. Hill, and H. N. Williams. 2000. A proposal for the reclassification of Bdellovibrio stolpii and Bdellovibrio starrii into a new genus, Bacteriovorax gen. nov. as Bacteriovorax stolpii comb. nov. and Bacteriovorax starrii comb. nov., respectively. International Journal of Systematic & Evolutionary Microbiology 50:219-224.

31. Rendulic, S., P. Jagtap, A. Rosinus, M. Eppinger, C. Baar, C. Lanz, H. Keller, C. Lambert, K. J. Evans, A. Goesmann, F. Meyer, R. E. Sockett, and S. C. Schuster. 2004. A predator unmasked: life cycle of Bdellovibrio bacteriovorus from a genomic perspective. Science 303:689-692.

32. Postgate, J. R. 1984. The sulphate-reducing bacteria, 2 ed. Cambridge University Press, Cambridge, Great Britain.

33. Ollivier, B., R. Cord-Ruwisch, E. C. Hatchikian, and J. L. Garcia. 1988. Characterization of Desulfovibrio fructosovorans new-species. Archives of Microbiology 149:447-450.

34. Fareleira, P., J. Legall, A. V. Xavier, and H. Santos. 1997. Pathways for utilization of carbon reserves in Desulfovibrio gigas under fermentative and respiratory conditions. Journal of Bacteriology 179:3972-3980.

35. Stams, A. J. M., M. Veenhuis, G. H. Weenk, and T. A. Hansen. 1983. Occurrence of polyglucose as a storage polymer in Desulfovibrio species and Desulfobulbus propionicus. Archives of Microbiology 136:54-59.

36. Niel, E. W. J. V., and J. C. Gottschal. 1998. Oxygen consumption by Desulfovibrio strains with and without polyglucose. Applied and Environmental Microbiology 64:1034-1039.

37. Santos, H., P. Fareleira, A. V. Xavier, L. Chen, M.-Y. Liu, and J. Legall. 1993. Aerobic metabolism of carbon reserves by the "obligate anaerobe" Desulfovibrio gigas. Biochemical and Biophysical Research Communications 195:551-557.

38. Moller, D., R. Schauder, G. Fuchs, and R. K. Thauer. 1987. Acetate oxidation to CO2 via a citric acid cycle involving an ATP-citrate lyase: a mechanism for the synthesis of ATP via substrate level phosphorylation in Desulfobacter postgatei growing on acetate and sulfate. Archives of Microbiology 148:202-207.

39. Widdel, F. 1987. New types of acetate-oxidizing, sulfate-reducing Desulfobacter species, D. hydrogenophilus sp. nov., D. latus sp. nov., and D. curvatus sp. nov. Archives of Microbiology 148:286-291.

40. Gottschalk, G., and H. A. Barker. 1967. Presence and stereospecificity of citrate synthase in anaerobic bacteria. Biochemistry 6:1027-1034.

41. Gottschalk, G. 1968. The stereospecificity of the citrate synthase in sulfate-reducing and photosynthetic bacteria. European Journal of Biochemistry 5:346-351.

42. Dunbrenner, S., A. A. Chowdhury, and G. Gottschalk. 1969. The stereospecificity of the (R)-citrate synthase in the presence of p-chloromercuribenzoate. Biochemical and Biophysical Research Communications 5:802-808.

43. Gottschalk, G. 1968. Partial purification and some properties of the (R)-citrate synthase from Clostridium acidi-urici. European Journal of Biochemistry 7:301-306.

44. Grossman, J. P., and J. R. Postgate. 1955. The metabolism of malate and certain other compounds by Desulphovibrio desulfuricans. Journal of General Microbiology 12:429-445.

45. Fauque, G., J. Legall, and L. L. Barton. 1991. Sulfate-reducing and sulfur-reducing bacteria, p. 271-337. In J. M. Shively and L. L. Barton (ed.), Variations in autotrophic life. Academic Press Limited, London.

46. Peck, H. D. 1960. Evidence for oxidative phosphorylation during the reduction of sulfate with hydrogen by Desulfovibrio desulfuricans. The Journal of Biological Chemistry 235:2734-2738.

47. Odom, J. M., and H. D. Peck. 1981. Hydrogen cycling as a general mechanism for energy coupling in the sulfate-reducing bacteria Desulfovibrio sp. FEMS Microbiology Letters 12:47-50.

48. Thauer, R. K., K. Jungermann, and K. Decker. 1977. Energy conservation in chemotrophic anaerobic bacteria. Bacteriological Reviews 41:100-180.

49. Moura, I., S. Bursakov, C. Costa, and J. J. G. Moura. 1997. Nitrate and nitrite utilization in sulfate-reducing bacteria. Anaerobe 3:279-290.

50. Payne, R. B., D. M. Gentry, B. J. Rapp-Giles, L. Casalot, and J. D. Wall. 2002. Uranium reduction by Desulfovibrio desulfuricans G20 and a cytochrome c3 mutant. Applied and Environmental Microbiology 68:3129-3132.

51. Elias, D. A., J. A. Suflita, M. J. Mcinerney, and L. R. Krumholz. 2004. Periplasmic cytochrome c3 of Desulfovibrio vulgaris is directly involved in H2-mediated metal, but not sulfate, reduction. Applied and Environmental Microbiology 70:413-420.

52. Legall, J., and A. V. Xavier. 1996. Anaerobes response to oxygen: the sulfate-reducing bacteria. Anaerobe 2:1-9.

53. Hardy, J. A., and W. A. Hamilton. 1981. The oxygen tolerance of sulfate-reducing bacteria isolated from North Sea waters. Current Microbiology 6:259-262.

54. Fournier, M., Y. Zhang, J. D. Wildschut, A. Dolla, J. K. Voordouw, D. C. Schriemer, and G. Voordouw. 2003. Function of oxygen resistance proteins in the anaerobic sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough. Journal of Bacteriology 185:71.

55. Dos Santos, W. G., I. Pacheco, M.-Y. Liu, M. Teixeira, A. V. Xavier, and J. Legall. 2000. Purification and characterization of an iron superoxide dismutase and a catalase from the sulfate-reducing bacterium Desulfovibrio gigas. Journal of Bacteriology 182:796-804.

56. Lemos, R. S., C. M. Gomes, M. Santana, J. Legall, A. V. Xavier, and M. Teixeira. 2001. The 'strict' anaerobe Desulfovibrio gigas contains a membrane-bound oxygen-reducing respiratory chain. FEBS Letters 496:40-43.

57. Deckers, H. M., and G. Voordouw. 1996. The dcr gene family of Desulfovibrio: implications from the sequence of dcrH and phylogenetic comparison with other mcp genes. Antonie van Leeuwenhoek 70:21-29.

58. Fu, R., J. D. Wall, and G. Voordouw. 1994. DcrA, a c-type heme-containing methyl-accepting protein from Desulfovibrio vulgaris Hildenborough, senses oxygen concentration or redox potential of the environment. Journal of Bacteriology 176:344-350.

59. Fu, R., and G. Voordouw. 1997. Targeted gene-replacement mutagenesis of dcrA, encoding an oxygen sensor of the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough. Microbiology 143:1815-1826.

60. Xiong, J., D. M. Kurtz, Jr., J. Ai, and J. Sanders-Loehr. 2000. A hemerythrin-like domain in a bacterial chemotaxis protein. Biochemistry 39:5117-5125.

61. Lumppio, H. L., N. V. Shenvi, A. O. Summers, G. Voordouw, and D. M. Kurtz, Jr. 2001. Rubrerythrin and rubredoxin oxidoreductase in Desulfovibrio vulgaris: a novel oxidative stress protection system. Journal of Bacteriology 183:101-108.

62. Voordouw, J. K., and G. Voordouw. 1998. Deletion of the rbo gene increases the oxygen sensitivity of the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough. Applied and Environmental Microbiology 64:2882-2887.

63. Ochman, H. 2002. Distinguishing the ORFs from the ELFs: Short bacterial genes and the annotation of genomes. Trends in Genetics 18:335-337.

64. Hannenhalli, S. S., W. S. Hayes, A. G. Hatzigeorgiou, and J. W. Fickett. 1999. Bacterial start site prediction. Nucleic Acids Research 27:3577-3582.

65. Suzek, B. E., M. D. Ermolaeva, M. Schreiber, and S. L. Salzberg. 2001. A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics 17:1123-1130.

66. Delcher, A. L., D. Harmon, S. Kasif, O. White, and S. L. Salzberg. 1999. Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27:4636-4641.

67. Skovgaard, M., L. J. Jensen, S. Brunak, D. Ussery, and A. Krogh. 2001. On the total number of genes and their length distribution in complete microbial genomes. Trends in Genetics 17:425-428.

68. Chen, Z. 2003. Assessing sequence comparison methods with the average precision criterion. Bioinformatics 19:2456-2460.

69. Needleman, S. B., and C. D. Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48:443-453.

70. Smith, T. F., and M. S. Waterman. 1981. Identification of common molecular subsequences. Journal of Molecular Biology 147:195-197.

71. Karlin, S., and S. F. Altschul. 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the United States of America 87:2264-2268.

72. Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389-3402.

73. Anderson, I., and A. Brass. 1998. Searching DNA databases for similarities to DNA sequences: when is a match significant? Bioinformatics 14:349-356.

74. Zhang, H. 2003. Alignment of BLAST high-scoring segment pairs based on the longest increasing subsequence algorithm. Bioinformatics 19:1391-1396.

75. Rost, B. 1999. Twilight zone of protein sequence alignments. Protein Engineering 12:85-94.

76. Brenner, S. E., C. Chothia, and T. J. P. Hubbard. 1998. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proceedings of the National Academy of Sciences of the United States of America 95:6073-6078.

77. Pearson, W. R. 1998. Empirical statistical estimates for sequence similarity searches. Journal of Molecular Biology 276:71-84.

78. Webber, C., and G. J. Barton. 2001. Estimation of P-values for global alignments of protein sequences. Bioinformatics 17:1158-1167.

79. Altschul, S. F., and W. Gish. 1996. Local alignment statistics, p. 460-480, Methods in Enzymology, vol. 266. Academic Press.

80. Park, J., K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, and C. Chothia. 1998. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. Journal of Molecular Biology 284:1201-1210.

81. Mcclure, M. A., T. K. Vasi, and W. M. Fitch. 1994. Comparative analysis of multiple protein-sequence alignment methods. Molecular Biology and Evolution 11:571-592.

82. Eddy, S. R. 1996. Hidden Markov models. Current Opinion in Structural Biology 6:361-365.

83. Bateman, A., E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S. R. Eddy, S. Griffiths-Jones, K. L. Howe, M. Marshall, and E. L. L. Sonnhammer. 2002. The Pfam protein family database. Nucleic Acids Research 30:276-280.

84. Gribskov, M., and S. Veretnik. 1996. Identification of sequence patterns with profile analysis, p. 198-212, Methods in Enzymology, vol. 266. Academic Press.

85. Park, J., S. A. Teichmann, T. Hubbard, and C. Chothia. 1997. Intermediate sequences increase the detection of homology between sequences. Journal of Molecular Biology 273:249-254.

86. Pearson, W. R. 1996. Effective protein sequence comparison, p. 227-258, Methods in Enzymology, vol. 266. Academic Press.

87. Gerlt, J. A., and P. C. Babbitt. 2000. Can sequence determine function? Genome Biology 1:1-10.

88. Koski, L. B., and G. B. Golding. 2001. The closest BLAST hit is often not the nearest neighbor. Journal of Molecular Evolution 52:540-542.

89. Wall, D. P., H. B. Fraser, and A. E. Hirsh. 2003. Detecting putative orthologs. Bioinformatics 19:1710-1711.

90. Fitch, W. M. 1970. Distinguishing homologous from analogous proteins. Systematic Zoology 19:99-113.

91. Sjölander, K. 2003. Phylogenomic inference of protein molecular function: advances and challenges. Bioinformatics 20:170-179.

92. Gerlt, J. A., and P. C. Babbitt. 2001. Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies. Annual Review of Biochemistry 70:209-246.

93. Eisen, J. A. 1998. Phylogenomics: Improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Research 8:163-167.

94. Broun, P., J. Shanklin, E. Whittle, and C. Somerville. 1998. Catalytic plasticity of fatty acid modification enzymes underlying chemical diversity of plant lipids. Science 282:1315-1317.

95. Aravind, L. 2000. Guilt by association: contextual information in genome analysis. Genome Research 10:1074-1077.

96. Pellegrini, M., E. M. Marcotte, M. J. Thompson, D. Eisenberg, and T. O. Yeates. 1999. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proceedings of the National Academy of Sciences of the United States of America 96:4285-4288.

97. Tatusov, R. L., M. Y. Galperin, D. A. Natale, and E. V. Koonin. 2001. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Research 29:22-28.

98. Galperin, M. Y., and E. V. Koonin. 2000. Who's your neighbor? New computational approaches for functional genomics. Nature Biotechnology 18:609-613.

99. Overbeek, R., M. Fonstein, M. D'souza, G. D. Pusch, and N. Maltsev. 1999. The use of gene clusters to infer functional coupling. Proceedings of the National Academy of Sciences of the United States of America 96:2896-2901.

100. Membrillo-Hernández, J., P. Echave, E. Cabiscol, J. Tamarit, J. Ros, and E. C. C. Lin. 2000. Evolution of the adhE gene product of Escherichia coli from a functional reductase to a dehydrogenase. The Journal of Biological Chemistry 275:33869-33875.

101. Kofoid, E., C. Rappleye, I. Stojilkovic, and J. Roth. 1999. The 17-gene ethanolamine (eut) operon of Salmonella typhimurium encodes five homologues of carboxysome shell proteins. Journal of Bacteriology 181:5317-5329.

102. Rost, B., J. Liu, R. Nair, K. O. Wrzeszczynski, and Y. Ofran. 2003. Automatic prediction of protein function. Cellular and Molecular Life Sciences 60:2637-2650.

103. Todd, A. E., C. A. Orengo, and J. M. Thornton. 2001. Evolution of function in protein superfamilies, from a structural perspective. Journal of Molecular Biology 307:1113-1143.

104. Camon, E., M. Magrane, D. Barrell, D. Binns, W. Fleischmann, P. Kersey, N. Mulder, T. Oinn, J. Maslen, A. Cox, and R. Apweiler. 2003. The gene ontology annotation (GOA) project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Research 13:662-672.

105. Postma, P. W., J. W. Lengeler, and G. R. Jacobson. 1996. Phosphoenolpyruvate:Carbohydrate Phosphotransferase Systems, p. 1149-1174. In F. C. Neidhardt (ed.), Escherichia coli and Salmonella: Cellular and Molecular Biology, vol. 2. ASM Press, Washington, D.C.

106. Tian, W., and J. Skolnick. 2003. How well is enzyme function conserved as a function of pairwise sequence identity? Journal of Molecular Biology 333:863-882.

107. Devos, D., and A. Valencia. 2000. Practical limits of function prediction. Proteins: Structure, Function and Genetics 41:98-107.

108. Wilson, C. A., J. Kreychman, and M. Gerstein. 2000. Assessing annotation transfer for genomics: Quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. Journal of Molecular Biology 297:233-249.

109. Karp, P. D., M. Riley, S. M. Paley, and A. Pellegrini-Toole. 2002. The MetaCyc Database. Nucleic Acids Research 30:59-61.

110. Kanehisa, M., S. Goto, S. Kawashima, Y. Okuno, and M. Hattori. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Research 32:D277-D280.

111. Cordwell, S. J. 1999. Microbial genomes and "missing" enzymes: redefining biochemical pathways. Archives of Microbiology 172:269-279.

112. Claudel-Renard, C., C. Chevalet, T. Faraut, and D. Kahn. 2003. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Research 31:6633-6639.

113. Morett, E., J. O. Korbel, E. Rajan, G. Saab-Rincon, L. Olvera, M. Olvera, S. Schmidt, B. Snel, and P. Bork. 2003. Systematic discovery of analogous enzymes in thiamin biosynthesis. Nature Biotechnology 21:790-795.

114. Ishida, T., L. Yu, H. Akutsu, K. Ozawa, S. Kawanishi, A. Seto, T. Inubushi, and S. Sano. 1998. A primitive pathway of porphyrin biosynthesis and enzymology in Desulfovibrio vulgaris. Proceedings of the National Academy of Sciences of the United States of America 95:4853-4858.

115. Gelfand, M. S., P. S. Novichkov, E. S. Novichkova, and A. A. Mironov. 2000. Comparative analysis of regulatory patterns in bacterial genomes. Briefings in Bioinformatics 1:357-371.

116. Romero, P. R., and P. D. Karp. 2003. Using functional and organizational information to improve genome-wide computational prediction of transcription units on pathway-genome databases. Bioinformatics 20:709-717.

117. Gelfand, M. S. 1998. Computer analysis of DNA sequences. Molecular Biology 32:88-104.

118. Mcguire, A. M., J. D. Hughes, and G. M. Church. 2000. Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. Genome Research 10:744-757.

119. Pilpel, Y., P. Sudarsanam, and G. M. Church. 2001. Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genetics.

120. Tan, K., G. Moreno-Hagelsieb, J. Collado-Vides, and G. D. Stormo. 2001. A comparative genomics approach to prediction of new members of regulons. Genome Research 11:566-584.

121. Mccue, L. A., W. Thompson, C. S. Carmack, M. P. Ryan, J. S. Liu, V. Derbyshire, and C. E. Lawrence. 2001. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Research 29:774-782.

122. Gelfand, M. S., E. V. Koonin, and A. A. Mironov. 2000. Prediction of transcription regulatory sites in Archaea by a comparative genomics approach. Nucleic Acids Research 28:695-705.

123. Salgado, H., G. Moreno-Hagelsieb, T. F. Smith, and J. Collado-Vides. 2000. Operons in Escherichia coli: Genomic analysis and predictions. Proceedings of the National Academy of Sciences of the United States of America 97:6652-6657.

124. Brandis, A., and R. K. Thauer. 1981. Growth of Desulfovibrio species on hydrogen and sulfate as sole energy source. Journal of General Microbiology 126:249-252.

125. Bradford, M. M. 1976. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein dye binding. Analytical Biochemistry 72:248-254.

126. Zmasek, C. M., and S. R. Eddy. 2001. ATV: Display and Manipulation of Annotated Phylogenetic Trees. Bioinformatics 17:383-384.

127. Rapp-Giles, B. J., L. Casalot, R. S. English, J. A. Ringbauer, Jr., A. Dolla, and J. D. Wall. 2000. Cytochrome c3 mutants of Desulfovibrio desulfuricans. Applied and Environmental Microbiology 66:671-677.

128. Rousset, M., L. Casalot, B. J. Rapp-Giles, Z. Dermoun, P. D. Philip, J.-P. Bélaich, and J. D. Wall. 1998. New shuttle vectors for the introduction of cloned DNA in Desulfovibrio. Plasmid 39:114-122.

129. Michiels, J., T. V. Soom, I. D'hooghe, B. Dombrecht, T. Benhassine, P. D. Wilde, and J. Vanderleyden. 1998. The Rhizobium etli rpoN locus: DNA sequence analysis and phenotypical characterization of rpoN, ptsN and ptsA mutants. Journal of Bacteriology 180:1729-1740.

130. Cases, I., F. Velázquez, and V. D. Lorenzo. 2001. Role of ptsO in carbon-mediated inhibition of the Pu promoter belonging to the pWW0 Pseudomonas putida plasmid. Journal of Bacteriology 183:5128-5133.

131. Boos, W., and J. M. Lucht. 1996. Periplasmic Binding Protein-Dependent ABC Transporters, p. 1175-1209. In F. C. Neidhardt (ed.), Escherichia coli and Salmonella: Cellular and Molecular Biology, 2 ed, vol. 1. ASM Press, Washington, D.C.

132. Pieulle, L., V. Magro, and E. C. Hatchikian. 1997. Isolation and analysis of the gene encoding the pyruvate-ferredoxin oxidoreductase of Desulfovibrio africanus, production of the recombinant enzyme in Escherichia coli, and effect of carboxy-terminal deletions on its stability. Journal of Bacteriology 179:5684-5692.

133. Barton, L. L. 1994. Pyruvic Acid Phosphoroclastic System, p. 94-104. In H. D. Peck, Jr., and J. Legall (ed.), Methods in Enzymology, vol. 243. Academic Press, London, UK.

134. Stams, A. J. M., and T. A. Hansen. 1982. Oxygen-labile L(+) lactate dehydrogenase activity in Desulfovibrio desulfuricans. FEMS Microbiology Letters 13:389-394.

135. Alvarez, M., and L. L. Barton. 1977. Evidence for the presence of phosphoriboisomerase and ribulose-1,5-diphosphate carboxylase in extracts of Desulfovibrio vulgaris. Journal of Bacteriology 131:133-135.

136. Lin, E. C. C. 1996. Dissimilatory Pathways for Sugars, Polyols, and Carboxylates, p. 307-342. In F. C. Neidhardt (ed.), Escherichia coli and Salmonella: Cellular and Molecular Biology, 2 ed, vol. 1. ASM Press, Washington, D.C.

137. Houng, H.-S. H., and T. M. Cook. 1986. Presented at the First Colloquium in Biological Sciences, New York.

138. Dolph, P. J., D. R. Majerczak, and D. L. Coplin. 1988. Characterization of a gene cluster for exopolysaccharide biosynthesis and virulence in Erwinia stewartii. Journal of Bacteriology 170:865-871.

139. Madigan, M. T., J. M. Martinko, and J. Parker. 2000. Brock Biology of Microorganisms, 9 ed. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

140. Wootton, J. C., and S. Federhen. 1996. Analysis of compositionally biased regions in sequence databases. Methods in Enzymology 266:554-571.

141. Krispin, O., and R. Allmansberger. 1998. The Bacillus subtilis galE gene is essential in the presence of glucose and galactose. Journal of Bacteriology 180:2265-2270.

142. Beech, I. B., C. C. Gaylarde, J. J. Smith, and G. G. Geesey. 1991. Extracellular polysaccharides from Desulfovibrio desulfuricans and Pseudomonas fluorescens in the presence of mild and stainless steel. Applied Microbiology and Biotechnology 35:65-71.

143. Gaylarde, C. C., and I. B. Beech. 1996. Lipopolysaccharide composition of Desulfovibrio cell wall. World Journal of Microbiology and Biotechnology 12:113-114.

144. Achebach, S. 2003. Control of the O2 sensor/regulator FNR of Escherichia coli by O2 and reducing agents in vivo and in vitro, p. 93-99. In P. Dürre and B. Friedrich (ed.), Regulatory Networks in Prokaryotes. Horizon Scientific Press, Norfolk, UK.

145. Unden, G., and J. Schirawski. 1997. The oxygen-responsive transcriptional regulator FNR of Escherichia coli: the search for signals and reactions. Molecular Microbiology 25:205-210.

146. Shaw, D. J., D. W. Rice, and J. R. Guest. 1983. Homology between CAP and Fnr, a regulator of anaerobic respiration in Escherichia coli. Journal of Molecular Biology 166:241-247.

147. Saier, M. H., Jr., T. M. Ramseier, and J. Reizer. 1996. Regulation of Carbon Utilization, p. 1325-1343. In F. C. Neidhardt (ed.), Escherichia coli and Salmonella: Cellular and Molecular Biology, 2 ed, vol. 1. ASM Press, Washington, D.C.

148. Mazoch, J., and I. Kucera. 2002. Control of gene expression by FNR-like proteins in facultatively anaerobic bacteria. Folia Microbiologica 47:95-103.

149. Lazazzera, B. A., H. Beinert, N. Khoroshilova, M. C. Kennedy, and P. J. Kiley. 1996. DNA binding and dimerization of the Fe-S-containing FNR protein from Escherichia coli are regulated by oxygen. The Journal of Biological Chemistry 271:2762-2768.

150. Khoroshilova, N., C. Popescu, E. Munck, H. Beinert, and P. J. Kiley. 1997. Iron-sulfur cluster disassembly in the FNR protein of Escherichia coli by O2: (4Fe-4S) to (2Fe-2S) conversion with loss of biological activity. Proceedings of the National Academy of Sciences of the United States of America 94:6087-6092.

151. Li, S. F., and J. A. Demoss. 1988. Location of sequences in the nar promoter of Escherichia coli required for regulation by Fnr and NarL. The Journal of Biological Chemistry 263:13700-13705.

152. Zumft, W. G. 1997. Cell biology and molecular basis of denitrification. Microbiology and Molecular Biology Reviews 61:533-616.

153. Anthamatten, D., B. Scherb, and H. Hennecke. 1992. Characterization of a fixLJ-regulated Bradyrhizobium japonicum gene sharing similarity with the Escherichia coli fnr and Rhizobium meliloti fixK genes. Journal of Bacteriology 174:2111-2120.

154. Durmowicz, M. C., and R. J. Maier. 1998. The FixK2 protein is involved in regulation of symbiotic hydrogenase expression in Bradyrhizobium japonicum. Journal of Bacteriology 180:3253-3256.

155. Fischer, H. M. 1994. Genetic regulation of nitrogen fixation in rhizobia. Microbiological Reviews 58:352-386.

156. Reyrat, J.-M., M. David, C. Blonski, P. Boistard, and J. Batut. 1993. Oxygen-regulated in vitro transcription of Rhizobium meliloti nifA and fixK genes. Journal of Bacteriology 175:6867-6872.

157. Galinier, A., A.-M. Garnerone, J.-M. Reyrat, D. Kahn, J. Batut, and P. Boistard. 1994. Phosphorylation of the Rhizobium meliloti FixJ protein induces its binding to a compound regulatory region at the fixK promoter. The Journal of Biological Chemistry 269:23784-23789.

158. Hertig, C., R. Y. Li, A.-M. Louarn, A.-M. Garnerone, M. David, J. Batut, D. Kahn, and P. Boistard. 1989. Rhizobium meliloti regulatory gene fixJ activates transcription of R. meliloti nifA and fixK genes in Escherichia coli. Journal of Bacteriology 171:1736-1738.

159. Agron, P. G., and D. R. Helinski. 1995. Symbiotic expression of Rhizobium meliloti nitrogen fixation genes is regulated by oxygen, p. 275-287. In J. A. Hoch and T. J. Silhavy (ed.), Two-Component Signal Transduction. American Society for Microbiology Press, Washington, D.C.

160. Nellen-Anthamatten, D., P. Rossi, O. Preisig, I. Kullik, M. Babst, H. M. Fischer, and H. Hennecke. 1998. Bradyrhizobium japonicum FixK2, a crucial distributor in the FixLJ-dependent regulatory cascade for control of genes inducible by low oxygen levels. Journal of Bacteriology 180:5251-5255.

161. Mesa, S., E. J. Bedmar, A. Chanfon, H. Hennecke, and H.-M. Fischer. 2003. Bradyrhizobium japonicum NnrR, a denitrification regulator, expands the FixLJ-FixK2 regulatory cascade. Journal of Bacteriology 185:3978-3982.

162. Vollack, K. U., E. Härtig, H. Körner, and W. G. Zumft. 1999. Multiple transcription factors of the FNR family in denitrifying Pseudomonas stutzeri: characterization of four fnr-like genes, regulatory responses and cognate metabolic processes. Molecular Microbiology 31:1681-1694.

163. Vollack, K.-U., and W. G. Zumft. 2001. Nitric oxide signaling and transcriptional control of denitrification genes in Pseudomonas stutzeri. Journal of Bacteriology 183:2516-2526.

164. Zumft, W. G. 2003. Nitric oxide signaling and NO dependent transcriptional control in bacterial denitrification by members of the FNR-CRP regulator family, p. 109-118. In P. Dürre and B. Friedrich (ed.), Regulatory Networks in Prokaryotes. Horizon Scientific Press, Norfolk, UK.

165. Arai, H., M. Mizutani, and Y. Igarashi. 2003. Transcriptional regulation of the nos genes for nitrous oxide reductase in Pseudomonas aeruginosa. Microbiology 149:29-36.

166. Körner, H., H. J. Sofia, and W. G. Zumft. 2003. Phylogeny of the bacterial superfamily of Crp-Fnr transcription regulators: exploiting the metabolic spectrum by controlling alternative gene programs. FEMS Microbiology Reviews 27:559-592.

167. Youn, H., R. L. Kerby, M. Conrad, and G. P. Roberts. 2004. Functionally Critical Elements of CooA-Related CO Sensors. Journal of Bacteriology 186:1320-1329.

168. Peterson, J. D., L. A. Umayam, T. Dickenson, E. K. Hickey, and O. White. 2001. The comprehensive microbial resource. Nucleic Acids Research 29:123-125.

169. Egland, P. G., and C. S. Harwood. 2000. HbaR, a 4-hydroxybenzoate sensor and FNR-CRP superfamily member, regulates anaerobic 4-hydroxybenzoate degradation by Rhodopseudomonas palustris. Journal of Bacteriology 182:100-106.

170. Kerby, R. L., S. S. Hong, S. A. Ensign, L. J. Coppoc, P. W. Ludden, and G. P. Roberts. 1992. Genetic and physiological characterization of the Rhodospirillum rubrum carbon monoxide dehydrogenase system. Journal of Bacteriology 174:5284-5294.

171. Fox, J. D., Y. He, D. Shelver, G. P. Roberts, and P. W. Ludden. 1996. Characterization of the region encoding the CO-induced hydrogenase of Rhodospirillum rubrum. Journal of Bacteriology 178:6200-6208.

172. Voordouw, G. 2002. Carbon monoxide cycling by Desulfovibrio vulgaris Hildenborough. Journal of Bacteriology 184:5903-5911.

173. Lupton, F. S., R. Conrad, and J. G. Zeikus. 1984. CO metabolism of Desulfovibrio vulgaris strain Madison: physiological function in the absence of exogenous substrates. FEMS Microbiology Letters 23:263-268.

174. Haveman, S. A., V. Brunelle, J. K. Voordouw, G. Voordouw, J. F. Heidelberg, and R. Rabus. 2003. Gene expression analysis of energy metabolism mutants of Desulfovibrio vulgaris Hildenborough indicates an important role for alcohol dehydrogenase. Journal of Bacteriology 185:4345-4353.

175. Leonardo, M. R., P. R. Cunningham, and D. P. Clark. 1993. Anaerobic regulation of the adhE gene, encoding the fermentative alcohol dehydrogenase of Escherichia coli. Journal of Bacteriology 175:870-878.

176. Membrillo-Hernández, J., and E. C. C. Lin. 1999. Regulation of expression of the adhE gene, encoding ethanol oxidoreductase in Escherichia coli: transcription from a downstream promoter and regulation by Fnr and RpoS. Journal of Bacteriology 181:7571-7579.

177. Hughes, J. D., P. W. Estep, S. Tavazoie, and G. M. Church. 2000. Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. Journal of Molecular Biology 296:1205-1214.

178. Roth, F. P., J. D. Hughes, P. W. Estep, and G. M. Church. 1998. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnology 16:939-945.

179. Robison, K., A. M. McGuire, and G. M. Church. 1998. A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome. Journal of Molecular Biology 284.

180. Pollock, W. B. R., and G. Voordouw. 1994. Molecular biology of c-type cytochromes from Desulfovibrio vulgaris Hildenborough. Biochimie 76:554-560.

181. Schneider, T. D., and R. M. Stephens. 1990. Sequence Logos: A new way to display consensus sequences. Nucleic Acids Research 18:6097-6100.

182. Hemme, C. L., and J. D. Wall. 2004. Genomic insights into gene regulation of Desulfovibrio vulgaris Hildenborough. Omics 8:43-55.

183. Huynen, M., B. Snel, W. Lathe III, and P. Bork. 2000. Predicting protein function by genomic context: Quantitative evaluation and qualitative inferences. Genome Research 10:1204-1210.

184. Wanner, B. L. 1996. Phosphorus Assimilation and Control of the Phosphate Regulon, p. 1357-1381. In F. C. Neidhardt, Curtiss III, R., Ingraham, J.L., Lin, E.C.C., Low, K.B., Magasanik, B., Reznikoff, W.S., Riley, M., Schaechter, M., Umbarger, H.E. (ed.), Escherichia coli and Salmonella: Cellular and Molecular Biology, 2 ed, vol. 1. ASM Press, Washington, D.C., USA.

185. Glansdorff, N. 1996. Biosynthesis of Arginine and Polyamines, p. 408-433. In F. C. Neidhardt, Curtiss III, R., Ingraham, J.L., Lin, E.C.C., Low, K.B., Magasanik, B., Reznikoff, W.S., Riley, M., Schaechter, M., Umbarger, H.E. (ed.), Escherichia coli and Salmonella: Cellular and Molecular Biology, 2 ed, vol. 1. ASM Press, Washington, D.C., USA.

186. Picard, F. J., and J. R. Dillon. 1989. Cloning and organization of seven arginine biosynthesis genes from Neisseria gonorrhoeae. Journal of Bacteriology 171:1644-1651.

187. Bringel, F., L. Frey, S. Boivin, and J. Hubert. 1997. Arginine biosynthesis and regulation in Lactobacillus plantarum: the carA gene and the argCJBDF cluster are divergently transcribed. Journal of Bacteriology 179:2697-2706.

188. Park, S., C. Lu, and A. T. Abdelal. 1997. Cloning and characterization of argR, a gene that participates in regulation of arginine biosynthesis and catabolism in Pseudomonas aeruginosa PAO1. Journal of Bacteriology 179:5300-5308.

189. Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America 89:10915-10919.

Vita

Christopher Lee Hemme was born on August 18, 1974 to Sylvan and Diane Hemme in Sweet Springs, MO. In 1997, he received his Bachelor of Science degrees in Chemistry and Life Sciences from the University of Missouri-Rolla.

His dissertation research has resulted in the following papers:

Hemme, C. L., and J. D. Wall. 2004. Genomic insights into gene regulation of Desulfovibrio vulgaris Hildenborough. Omics 8:43-55.

Payne, R. B., C. L. Hemme, and J. D. Wall. 2004. A New Frontier in Genomic Research. World Pipelines 4:53-55.

Heidelberg, J. F., R. Seshadri, S. A. Haveman, C. L. Hemme, I. T. Paulsen, J. F. Kolonay, J. A. Eisen, N. Ward, B. Methé, L. M. Brinkac, S. C. Daugherty, R. T. Deboy, R. J. Dodson, A. S. Durkin, R. Madupu, W. C. Nelson, S. A. Sullivan, D. Fouts, D. H. Haft, J. Selengut, J. D. Peterson, T. M. Davidsen, N. Zafar, L. Zhou, D. Radune, G. Dimitrov, M. Hance, K. Tran, H. Khouri, J. Gill, T. R. Utterback, T. V. Feldblyum, J. D. Wall, G. Voordouw, and C. M. Fraser. 2004. The genome sequence of the anaerobic, sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough. Nature Biotechnology 22:1-6.

Wall, J. D., C. L. Hemme, B. J. Rapp-Giles, J. A. Ringbauer, Jr., L. Casalot, and T. Giblin. 2003. Genes and genetic manipulations of Desulfovibrio, p. 85-98. In L. G. Ljungdahl, M. W. Adams, L. L. Barton, J. G. Ferry, and M. K. Johnson (ed.), Biochemistry and Physiology of Anaerobic Bacteria, vol. 1. Springer-Verlag New York, Inc., New York, NY USA.
