THE ROLE OF A PORIN-CYTOCHROME FUSION IN NEUTROPHILIC FE OXIDATION:

INSIGHTS FROM FUNCTIONAL CHARACTERIZATION AND METATRANSCRIPTOMICS

by

Arkadiy Garber

A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Master of Science in Geology

Summer 2018

© 2018 Arkadiy Garber All Rights Reserved


Approved: Clara Chan, Ph.D.
Professor in charge of thesis on behalf of the Advisory Committee

Approved: Neil Sturchio, Ph.D.
Chair of the Department of Geological Sciences

Approved: Estella Atekwana, Ph.D.
Dean of the College of Earth, Ocean, and Environment

Approved: Ann L. Ardis, Ph.D.
Senior Vice Provost for the Office of Graduate and Professional Education

ACKNOWLEDGMENTS

I would like to thank my advisor, Clara Chan, for all the support and guidance over the past several years. In addition to learning many valuable skills, my growth as a scientist would not have been possible without the direction and focus that Clara’s mentorship has instilled in me. Additionally, I would like to thank Sean McAllister for all the guidance and feedback; my understanding of bioinformatics and phylogenetics has benefited greatly from our many conversations. I would also like to thank other Chan lab members, past and present, for always being available and willing to help out and provide feedback, and for providing a great atmosphere to do science in. Thank you to my committee members, Dr. Thomas Hanson and Dr. Shawn Polson, for guidance and feedback on my research. Dr. Hanson’s extensive understanding of microbial physiology and Dr. Polson’s intimate knowledge of bioinformatics have been limitless resources during my time at the University of Delaware. Thank you to Dr. Sharon Rozovsky for kindly providing the resources for me to learn about protein expression and purification, as well as for advice about project design. This thesis is dedicated to my parents and sister, who have always supported me in my research endeavors, even without much understanding of what it is that I actually study.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter

1 INTRODUCTION

1.1 Motivation to understand Fe-oxidation mechanisms
1.2 Fe oxidation in acidophilic FeOB
1.3 Fe oxidation in neutrophilic FeOB
1.4 Phylogenetic distribution of cyc2
1.5 Hypotheses and overall approach

2 FUNCTIONAL CHARACTERIZATION OF CYC2PV-1

2.1 Section Introduction
2.2 Materials and Methods

2.2.1 Design of Escherichia coli expression system
2.2.2 Expression of Cyc2 and control vectors
2.2.3 Western blot and heme stain
2.2.4 Whole-cell Fe oxidation assay
2.2.5 Cyc2 purification

2.3 Results and Discussion

2.3.1 Expression of Cyc2 and controls in E. coli
2.3.2 Western blot and heme stain identification of Cyc2 and Cyc2 porin
2.3.3 Fe oxidation by Cyc2-expressing cells
2.3.4 Cyc2 expression without successful heme insertion
2.3.5 Inhibition of Fe oxidation activity using sodium azide
2.3.6 E. coli growth anaerobically with nitrate
2.3.7 Preliminary attempts for Cyc2 purification

2.4 Conclusions

3 CYC2 EXPRESSION IN THE ENVIRONMENT

3.1 Section Introduction
3.2 Materials and Methods

3.2.1 Development and Use of HMM Profiles
3.2.2 Re-analysis of the metagenome/metatranscriptome from the Rifle aquifer
3.2.3 Re-analysis of the metatranscriptome from Brocéliande stream

3.3 Results and Discussion

3.3.1 Expression of cyc2 in the Rifle Aquifer
3.3.2 Re-analysis of the metatranscriptome from Brocéliande stream
3.3.3 Genomic Co-occurrence of cyc2 and other EET enzymes

3.4 Conclusions

4 INSIGHTS INTO DIVERSITY OF CYC2

4.1 Introduction
4.2 Materials and Methods

4.2.1 Identification of Cyc2 and MtrCAB Homologs Within Sequence Databases
4.2.2 Multiple Sequence Alignments of Cyc2 and MtrA homologs
4.2.3 Cyc2 and MtrA Amino Acid Identity Analysis
4.2.4 Hidden Markov Model (HMM) Validation

4.3 Results and Discussion

4.3.1 Cyc2 lateral gene transfer
4.3.2 Cyc2 tertiary structure
4.3.3 HMM Validation and Identification of Cyc2-related “standalone” cytochromes

4.4 Conclusions

5 PRELIMINARY WORK WITH SIDEROXYDANS LITHOTROPHICUS ES-1: CYC2 EXPRESSION UNDER GROWTH ON THIOSULFATE AND FE(II)

5.1 Introduction
5.2 Methods

5.2.1 Culturing of Sideroxydans lithotrophicus ES-1
5.2.2 Ferrozine measurements
5.2.3 Culture harvesting, DNA/RNA extraction, and molecular analyses

5.3 Results

5.3.1 Growth of Sideroxydans lithotrophicus ES-1
5.3.2 Amplification of cyc2 from ES-1 DNA
5.3.3 Amplification of cyc2 from ES-1 RNA

5.4 Conclusions

REFERENCES

Appendix

SUPPLEMENTARY FIGURES FOR CYC2 EXPRESSION IN THE ENVIRONMENT

LIST OF TABLES

Table 2.1: Final gene construct sequences, including the OmpA signal sequence (italicized), TEV site (underlined), and Strep Tag (bolded).

Table 2.2: Oxygen concentrations within sterile LB and suspensions of E. coli cells expressing Cyc2 and controls. Cells were incubated at an optical density of 2 (10⁹ cells/ml) for 5 minutes with vigorous stirring (300 RPM) during measurement of oxygen concentrations (error intervals reflect the fluctuation during those five minutes). Measurement of oxygen in sterile LB represents a single time point. Measurements were taken using the Firesting trace oxygen sensor.

Table 3.1: Summary of HMM-identified cyc2 and mtoA homologs from the Brocéliande stream Fe mat metatranscriptome. Only the five mtoA homologs that fall within the broad “FeOB” cluster of the mtoA tree are shown here.

Table 4.1: Organisms that encode homologs of cyc1 and cyc2 adjacent to each other in their genomes. The accession number for each gene is listed, as well as the BLAST e-value. The query genes for BLAST consisted of the cyc1 and cyc2 homologs from At. ferrooxidans.

Table 4.2: Results from HHPRED structural homology search, showing the closest structural homologs to a representative set of Cyc2 homologs (Chan et al., 2018, bioRxiv).

Table A1: Representative sequences used to collect homologs from NCBI with BLAST. Results for each BLAST search are also shown.

LIST OF FIGURES

Figure 1.1: Proposed electron transport chain, with Cyc2 as the electron acceptor from Fe(II), for At. ferrooxidans during chemolithoautotrophic growth on that substrate, as inferred by the Fe-oxidase activity of the purified multi-membrane-spanning protein complex (Castelle et al., 2008).

Figure 1.2: Proposed electron transport chain from iron(II) in Sideroxydans lithotrophicus ES-1. Electrons are accepted by MtoA, which may be embedded within the outer-membrane port created by MtoB. Those electrons then get passed down to MtoD and, possibly, other electron acceptors (Shi et al., 2008; Beckwith et al., 2015).

Figure 1.3: Proposed electron transport chain in M. ferrooxydans PV-1. This schematic is based on data from a proteome, and subsequent re-sequencing, leading to the discovery of an outer-membrane cytochrome, Cyc2. This protein is homologous to the confirmed Fe oxidase in At. ferrooxidans, and suggested to act as such in PV-1 (Barco et al., 2015).

Figure 1.4: Maximum-likelihood phylogenetic tree of Cyc2 homologs (Chan et al., in prep). This tree represents 532 unique sequences from the NCBI and IMG databases.

Figure 2.1: Plasmids used in this study. (A) pMal-p4X plasmid (including the presently-excluded malE gene) into which the synthesized cyc2 gene was ligated (image from addgene.org). (B) The pEC86 plasmid, containing the eight ccm genes necessary for covalent heme attachment onto c-type cytochromes (image from Jensen et al., 2010). (C) Schematic of the cyc2 gene construct products.

Figure 2.2: A) Cell pellets of Cyc2-expressing cells (right) compared to the pellets of cultures expressing the Cyc2 porin (middle) and the empty-vector control (left). The visibly redder pellet belonging to Cyc2-expressing cells is suggestive of heme presence. B) UV-Vis spectra of empty vector and Cyc2-expressing whole cells. C) UV-Vis spectra of isolated membrane fraction from Cyc2-expressing cells.

Figure 2.3: A) Western blot showing expression of Cyc2 and the Cyc2 porin-only control. B) Heme stain demonstrating the peroxidase activity of the Cyc2 western blot band. C) Results of Bev Halahan’s work relating to optimization of Cyc2 expression using different concentrations of IPTG (U=uninduced, I=induced).

Figure 2.4: Fe oxidation in empty vector-containing cultures compared to Fe oxidation in fresh, sterile LB medium.

Figure 2.5: Expression of and Fe oxidation by Cyc2. Western blot image (A) of the successful expression of Cyc2 and the Cyc2 porin-only control (U=uninduced, I=induced). Heme stain (B) demonstrates the heme peroxidase activity of the same band that corresponds to the Cyc2 protein. Fe oxidation assay (C) performed on cell suspensions expressing Cyc2 and the porin-only and empty-vector controls demonstrates that presence of Cyc2 results in substantially more Fe oxidation.

Figure 2.6: Expression of Cyc2 without successful heme insertion, and subsequent lack of Fe oxidation. A) Successful expression of the Cyc2 protein; B) absence of discernible heme peroxidase activity at the site of the Cyc2 protein; C) lack of Fe oxidation by cell suspensions expressing Cyc2 and the empty vector control.

Figure 2.7: Fe oxidation by Cyc2-expressing cells compared with empty-vector controls. Dotted lines represent cell suspensions incubated with 3 mM sodium azide for 5 minutes prior to addition of FeCl₂ and measurement of Fe oxidation.

Figure 2.8: A) Growth curve of E. coli (C43) grown under anaerobic conditions with nitrate as the sole electron acceptor. Western blot (B) and heme stain (C) demonstrate the presence of heme-containing Cyc2 in the cell lysate.

Figure 2.9: A) Concentrated E. coli suspensions of the empty vector control (left) and Cyc2-expressing cells (right). B) Western blot image of the Cyc2-expressing cell lysate (T), the insoluble (IS) and soluble (S) fractions, as well as the 3 mL elution fractions collected during affinity-tag purification.

Figure 3.1: Schematic of the Brocéliande forest stream FeOB mats, with stars indicating sampling locations 1-5. The source of ferruginous water is shown penetrating through a dam on the left side of the image, and entering the stream nearest to sampling site 1. Schematic adapted from Quaiser et al., 2014.

Figure 3.2: Violin plots of gene expression in Gallionellaceae bins 22.1 (A) and 22.2 (B) from the Rifle aquifer. These bins displayed expression of cyc2 (red dots and lines) and mtoAB (light green = mtoA; dark green = mtoB expression). The violin plots represent time-series metatranscriptomic data taken after nitrate/oxygen amendments into a groundwater aquifer.

Figure 3.3: Violin plots of gene expression in bins related to Sulfurimonas denitrificans (A) and Candidate Superphylum Parcubacteria (B). These organisms have not previously been associated with Fe oxidation. Expression levels of cyc2 are denoted by red dots and lines.

Figure 3.4: Histogram of contig lengths from the Brocéliande stream Fe mat assembled metatranscriptome.

Figure 3.5: Maximum-likelihood phylogenetic tree of mtr/mtoA homologs. Red stars indicate positions of mtoA homologs identified within the Brocéliande stream Fe mat metatranscriptome.

Figure 3.6: Geochemical data (adapted from Quaiser et al., 2014) and schematic of the Brocéliande stream Fe mat, showing the localization of cyc2 and mtoA transcripts. Red stars indicate presence of cyc2 transcripts; green stars indicate presence of mtoA transcripts.

Figure 3.7: Venn diagram showing genomic co-occurrence of genes implicated in Fe-cycling metabolisms. This figure does not include the metagenome-assembled Gallionellaceae bins from the Rifle aquifer that encode cyc2 and mtoAB.

Figure 4.1: 3D model of Cyc2 generated using MODELLER, based on the structural homolog, OprA, identified by HHPRED. Figure from Chan et al., 2018, bioRxiv.

Figure 4.2: Amino acid identities calculated from pairwise alignment of each gene, or portion of gene.

Figure 4.3: Histogram of bit scores corresponding to Cyc2 HMM matches identified within the UniProtKB database. Vertical line denotes the proposed bit score cut-off for the HMM. *Cyc2 homologs are those matches whose sequences 1) contain the heme-binding motif (CxxCH), 2) have a conserved motif upstream of the heme-binding site, and 3) are long enough to encode a 16-strand beta-barrel porin (Tamm et al., 2005).

Figure 4.4: Gene predictions (Prodigal v.2.6.3) for six of the 19 short “cytochrome-only” Cyc2 homologs, demonstrating the presence of potential start and stop codons delimiting each gene; stop codons are indicated by asterisks.

Figure 4.5: Alignment of 16 “standalone” Cyc2 cytochromes against 17 cytochromes from the representative set of 77, which were derived from full-length Cyc2 homologs. Alignment was carried out in MUSCLE and manually curated.

Figure 4.6: Phylogenetic tree of 77 cytochromes derived from a representative set of Cyc2 homologs (Chan et al., 2018, bioRxiv), as well as 16 cytochromes derived from HMM-identified Cyc2 homologs lacking the porin, which are denoted by gray stars.

Figure 5.1: Growth curves for ES-1 grown on Fe(II) and thiosulfate.

Figure 5.2: Gel of cyc2 amplification from ES-1 DNA. The expected size of each band is between 90 and 100 bp, corresponding to the bands in lanes A-F. Lanes G-L represent extraction and PCR negatives.

Figure 5.3: Gel of cyc2 amplification from reverse-transcribed RNA (cDNA) extracted from ES-1 grown on thiosulfate and Fe(II). The expected size of each band is between 90 and 100 bp. Faint bands are seen in lanes 1-2 (white arrow), corresponding to thiosulfate-grown ES-1 cDNA samples.

Figure A1: Violin plots of gene expression in six of the Gallionellaceae bins. Each plot shows expression over four time points. Expression of cyc2 is signified by red dots.

Figure A2: Maximum-likelihood phylogenetic tree of MtrA homologs. Bootstrap values are shown at each node and represent a total of 100 bootstraps.

ABSTRACT

Fe-oxidizing bacteria (FeOB) are an ecologically important group of organisms found in a wide range of environments. However, mechanisms for microbial Fe oxidation, particularly within neutrophilic microaerophilic FeOB, such as the Zetaproteobacteria and Gallionellaceae, are not well understood and lag behind our knowledge of other metabolisms. Acidophilic FeOB are thought to carry out Fe oxidation through an outer membrane cytochrome, Cyc2, which has been functionally verified as an Fe oxidase in acidophiles. Recently, a distant homolog of Cyc2 was identified in the proteome of the neutrophilic Zetaproteobacterium Mariprofundus ferrooxydans PV-1, but it has not yet been functionally verified. We identified homologs of cyc2 in the genomes of a wide variety of organisms, including most known neutrophilic FeOB. The protein is also encoded by many organisms not previously associated with Fe oxidation. The overall aim of this thesis is to provide functional evidence for Cyc2’s role as an Fe oxidase in neutrophilic FeOB and to demonstrate its relevance to ecosystems dominated by neutrophilic FeOB.

Using an Escherichia coli expression system, we heterologously expressed Cyc2 from the neutrophilic FeOB PV-1 and showed that it binds a heme c cofactor. Expression of Cyc2 conferred Fe oxidation activity on the host E. coli strain, as evidenced by accelerated Fe oxidation in cell suspensions expressing Cyc2. Controls lacking 1) the cyc2 gene, 2) the Cyc2 cytochrome domain, or 3) only the heme cofactor all failed to confer Fe oxidation to E. coli. We found that acceleration of Fe oxidation by Cyc2 was dependent on micromolar amounts of oxygen; no Fe oxidation was observed under anaerobic conditions, suggesting that oxygen is the final electron acceptor in our whole-cell assays. Moreover, sodium azide partially inhibited Fe oxidation. These results provide support for Cyc2’s role as an Fe oxidase in neutrophilic FeOB.

In addition, I analyzed previously published metatranscriptome data for expression of cyc2 homologs in environments associated with active microbial Fe oxidation. Using a custom-made, curated, and validated Cyc2 HMM, we re-examined the dataset from the Rifle alluvial aquifer (Jewell et al., 2016) and found that cyc2 homologs are highly expressed by known FeOB closely related to Gallionellaceae. We also detected cyc2 transcripts from Gallionellaceae within the iron-seep metatranscriptome published by Quaiser et al. (2014), even though Gallionellaceae form only a small fraction of the microbial community there. Taken together, these results improve our understanding of mechanisms for microbial Fe oxidation in neutrophilic FeOB, and highlight the potential for cyc2 to be used as a genetic marker for microbial Fe oxidation in diverse ecosystems. Future work with the E. coli expression system, as well as development of a genetic model in neutrophilic FeOB, such as the Gallionellaceae, will provide valuable insights into the genetics and mechanisms of microbial Fe oxidation.

Chapter 1

INTRODUCTION

1.1 Motivation to understand Fe-oxidation mechanisms

Microaerophilic Fe-oxidizing bacteria (FeOB) couple the oxidation of ferrous iron (Fe²⁺) to the reduction of oxygen, fixing carbon in the process; they have been found in nearly all ecosystems where opposing gradients of Fe²⁺ and oxygen create favorable conditions for their growth (Emerson and Moyer, 2002; Krepski et al., 2013; Emerson et al., 2013; Weiss et al., 2007; Quaiser et al., 2014; Jewell et al., 2016). In addition to supporting primary production in many environments, FeOB produce amorphous and highly reactive iron oxyhydroxides, which have the potential to control the geochemistry of the surrounding ecosystem. In spite of the potential environmental influence of FeOB and their by-products, we have a relatively poor understanding of their activity in the environment, particularly in neutral pH systems. This poor understanding is, in part, due to a lack of knowledge of the cellular machinery responsible for Fe oxidation by neutrophilic FeOB. An outer membrane cytochrome has previously been shown to act as an Fe oxidase in acidophilic FeOB (Appia-Ayme et al., 1998; Malarte et al., 2005; Castelle et al., 2008; Roger et al., 2012; Osorio et al., 2013; Jeans et al., 2012). However, the ecosystems in which acidophilic FeOB are found are not representative of the circumneutral habitats of neutrophilic FeOB, and are typically restricted to man-made settings, such as acid mine drainage sites. To better understand and quantify FeOB activity and its influence on the geochemistry of their habitats, a better understanding of the biochemical mechanisms of neutrophilic microbial Fe oxidation is necessary; this will allow for the development of reliable genetic markers to track microbial Fe oxidation in the environment.

Figure 1.1: Proposed electron transport chain, with Cyc2 as the electron acceptor from Fe(II), for At. ferrooxidans during chemolithoautotrophic growth on that substrate, as inferred by the Fe-oxidase activity of the purified multi-membrane-spanning protein complex (Castelle et al., 2008).

1.2 Fe oxidation in acidophilic FeOB

Fe oxidation mechanisms have been previously studied in acidophilic FeOB, particularly in Leptospirillum-dominated biofilms (Jeans et al., 2012) and pure cultures of Acidithiobacillus ferrooxidans (Appia-Ayme et al., 1998; Malarte et al., 2005; Castelle et al., 2008; Roger et al., 2012; Osorio et al., 2013), and found to depend on an outer-membrane cytochrome, Cyc2 (Castelle et al., 2008). The Cyc2 protein has been purified from pure cultures of At. ferrooxidans and shown to form a double-membrane-spanning Fe-oxidizing complex with subunits of the aa3-type terminal oxidase and with periplasmic electron carriers, including rusticyanin, which was previously, and erroneously, thought to accept electrons directly from ferrous iron (Figure 1.1) (Hazra et al., 1992). Rusticyanin likely plays an intermediate role in electron transfer in At. ferrooxidans, either in a complex with Cyc2 (Yarzabal et al., 2002) or as a soluble electron carrier (Castelle et al., 2008); homologs of rusticyanin have not been identified in neutrophilic FeOB. In addition to having a very high redox potential, appropriate for oxidizing ferrous iron, purified Cyc2 was shown to oxidize dissolved ferrous iron in vitro, both as part of this double-membrane-spanning complex and in isolation (Castelle et al., 2008). Cyc2 has also been shown to be ~5-fold more highly expressed in Fe-grown than in sulfur-grown At. ferrooxidans (Quatrini et al., 2006; Yarzabal et al., 2004). Together, this evidence provides good support for Cyc2’s role as the Fe oxidase in At. ferrooxidans. While this cyc2 gene has not been shown to be involved in Fe reduction (Osorio et al., 2013), a second copy of cyc2 (cyc2B) exists in the genome of At. ferrooxidans and, based on proteomic analysis, has been suggested to function as an Fe reductase during anaerobic growth on sulfur (Norris et al., 2018). A homolog of Cyc2, cytochrome 572, was isolated from a Leptospirillum-dominated acid mine drainage biofilm, biochemically characterized with respect to redox potential and Fe oxidase activity, and found to be poised for Fe oxidation (Jeans et al., 2008). This homolog was also shown to oxidize Fe(II) in vitro, providing strong evidence for its role as an Fe oxidase in Leptospirillum. Cytochrome 572 and Cyc2 are distant homologs, with approximately 20% amino acid identity, which is expected given the relatively ancient divergence of Leptospirillum (Nitrospirae) from Acidithiobacillus (Proteobacteria). Nonetheless, there is good evidence that both Cyc2 and cytochrome 572 act as initial electron acceptors from Fe(II) (Castelle et al., 2008; Jeans et al., 2008).
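The ~20% figure above is a pairwise amino acid identity. As an illustration only, the Python sketch below shows one way such a value can be computed from two sequences that have already been aligned (for example, with MUSCLE). It assumes identity is counted over aligned columns where neither sequence has a gap; other conventions (dividing by the full alignment length, or by the length of the shorter sequence) give different percentages, and the convention behind the published value is not stated here. The two fragments are hypothetical toy sequences, not real Cyc2 or cytochrome 572 residues.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity between two pre-aligned sequences.

    Assumption: identity = identical columns / columns where neither
    sequence has a gap. This is one of several common conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # ungapped columns
    identical = 0  # ungapped columns carrying the same residue
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip any column with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1

    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments (not real Cyc2 or cytochrome 572 sequence)
seq_a = "MKCAACHGT-LN"
seq_b = "MRCSQCHAVQLD"
print(round(percent_identity(seq_a, seq_b), 1))  # 5 of 11 ungapped columns -> 45.5
```

Because the denominator convention changes the result, identity values are only comparable when they are computed the same way.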

Figure 1.2: Proposed electron transport chain from iron(II) in Sideroxydans lithotrophicus ES-1. Electrons are accepted by MtoA, which may be embedded within the outer-membrane port created by MtoB. Those electrons then get passed down to MtoD and, possibly, other electron acceptors (Shi et al., 2008; Beckwith et al., 2015).

1.3 Fe oxidation in neutrophilic FeOB

Mechanisms for iron oxidation in neutrophilic Fe oxidizers are not as well understood as those in acidophiles. Some of the first clues to an Fe oxidase in neutrophilic FeOB came only recently, when homologs of genes in the metal reduction (mtr) operon of Shewanella oneidensis MR-1 were discovered in some neutrophilic FeOB, particularly Rhodopseudomonas palustris TIE-1 (pioAB) and Sideroxydans lithotrophicus ES-1 (mtoAB) (Figure 1.2) (Liu et al., 2008; Jiao and Newman, 2007). These Mtr homologs feature a periplasmic decaheme cytochrome (MtrA) and a large outer membrane porin (MtrB). Genetic studies of R. palustris TIE-1 demonstrate that the operon encoding pioAB is necessary for Fe oxidation in this organism (Jiao and Newman, 2007). The suspected Fe oxidase in S. lithotrophicus ES-1, MtoA, on the other hand, has not been directly linked to growth on Fe(II). MtoA has been purified and shown to oxidize Fe in vitro (Liu et al., 2012). In addition, MtoA was shown to partially rescue the ∆mtrA deletion mutant of S. oneidensis MR-1 with respect to Fe reduction, suggesting that this protein can reduce Fe(III) as well (Liu et al., 2008). While this evidence supports the idea that MtoA can catalyze the reverse of the reaction catalyzed by its homologs in Fe-reducing bacteria (FeRB), it does not prove that this protein does so in vivo. For the same reason, MtoA is unlikely to be a reliable genetic marker for Fe oxidation, since its dual function may create ambiguity about its role in environmental settings.¹ Moreover, mtrAB homologs are limited to a phylogenetically restricted subset of organisms within the Proteobacteria (mostly within the ), and are absent from the obligate FeOB Zetaproteobacteria, which are found in marine settings, as well as from some of the Gallionellaceae and Fe-oxidizing Chlorobi.

¹ One of the components of the Fe reductase system, the decaheme cytochrome MtrC, which likely sits on the extracellular side of the MtrB porin, has not been identified in FeOB. This suggests that MtrC may not be necessary for Fe oxidation, and that its presence could serve as a proxy to distinguish Fe-reducing from Fe-oxidizing bacteria.

Figure 1.3: Proposed electron transport chain in M. ferrooxydans PV-1. This schematic is based on data from a proteome, and subsequent re-sequencing, leading to the discovery of an outer-membrane cytochrome, Cyc2. This protein is homologous to the confirmed Fe oxidase in At. ferrooxidans, and suggested to act as such in PV-1 (Barco et al., 2015).

Recently, a homolog of Cyc2 (Cyc2PV-1) was identified in the proteome of a neutrophilic marine FeOB isolate, Mariprofundus ferrooxydans PV-1 (Barco et al., 2015). Barco et al. suggested that Cyc2PV-1 functions as an Fe oxidase, based on its homology to a functionally verified Fe oxidase, its subcellular localization (membrane fraction), its porin-cytochrome structure, and the apparent absence of other putative Fe oxidases in the PV-1 genome. Cyc2PV-1 was also shown to be redox-active, based on in-gel assays with ferrocyanide. In addition, Cyc2 was the fourth most abundant c-type cytochrome in the PV-1 proteome (two subunits of the cbb3-type terminal oxidase and a cytochrome c4 were the most abundant). It is worth noting, however, that PV-1 encodes a total of 23 c-type cytochromes, and it is unclear exactly how many of these were detected in the proteome by Barco et al. (2015). The highly expressed cytochrome c4 (Cyc1PV-1) identified in the proteome of M. ferrooxydans PV-1 is homologous to the Cyc1 protein that forms a complex with Cyc2 in At. ferrooxidans. Based on this homology and its high expression, Cyc1PV-1 was suggested to play a role in an electron transport chain in which Cyc2PV-1 is the initial electron acceptor from Fe(II) (Figure 1.3). Transcripts of cyc2 have also been detected in high abundance within the endogenous Zetaproteobacteria communities at a hydrothermal setting, Lōihi Seamount, Hawai’i, USA (Chan et al., in prep). Other c-type cytochrome genes, including cyc1, have also been detected in relatively high abundance, but cyc2 is consistently among the most highly expressed genes in the obligate marine FeOB found in this hydrothermal setting. This highlights the potential importance of Cyc2 in natural environments inhabited by Zetaproteobacteria.

1.4 Phylogenetic distribution of cyc2

Figure 1.4: Maximum-likelihood phylogenetic tree of Cyc2 homologs (Chan et al., in prep). This tree represents 532 unique sequences from the NCBI and IMG databases.

Homologs of cyc2 have been identified in 897 genomes (Chan et al., in prep), from organisms representing a wide variety of phyla (including Proteobacteria, Chlorobi, , Nitrospinae, etc.) (Figure 1.4). The majority of cyc2 homologs are encoded within the Proteobacteria phylum. With the exception of some phototrophic FeOB, such as Rhodopseudomonas palustris TIE-1 and Rhodobacter ferrooxidans (White et al., 2016), cyc2 homologs have been identified in all known Gram-negative FeOB. Thus, Cyc2 has the potential to represent a widespread mechanism for microbial Fe oxidation that may allow us to better understand the activity of FeOB in neutrophilic ecosystems. However, evidence for Cyc2 as an Fe oxidase in neutrophilic FeOB is based only on homology to the functionally characterized Cyc2 from At. ferrooxidans and on its expression by an obligate FeOB.

This warrants more research to confirm that Cyc2 does, indeed, accept electrons from iron under circumneutral conditions. Additionally, to establish cyc2 as a genetic marker for Fe oxidation in natural systems, its expression in environmental datasets must be understood with respect to microbial Fe oxidation.

1.5 Hypotheses and overall approach

• I hypothesize that Cyc2 functions as an Fe oxidase in the neutrophilic marine FeOB Zetaproteobacteria.

o To address this hypothesis, the cyc2 gene from the neutrophilic Zetaproteobacterium Mariprofundus ferrooxydans PV-1 was synthesized and heterologously expressed in an E. coli expression system for functional characterization.

o The ability of the cyc2 gene product to oxidize Fe(II) in vitro was assessed.

• I hypothesize that the expression of cyc2 corresponds with active microbial Fe oxidation in environments inhabited by the freshwater FeOB Gallionellaceae.

o To explore the role of cyc2 in a groundwater aquifer inhabited by the known FeOB Gallionellaceae, which also encode homologs of MtoAB, I re-analyzed a supplementary dataset published alongside a metagenome/metatranscriptome of this habitat.

o To explore the role of cyc2 in a freshwater stream FeOB mat, I re-analyzed the metatranscriptome from the Brocéliande forest stream, which contains FeOB mats inhabited by Leptothrix, Gallionellaceae, Fe-reducing bacteria, and other microbial guilds.

• To increase the known diversity of Cyc2 homologs and provide some initial clues into the potential evolution of this protein, I looked for more distant relatives of Cyc2.

o I searched across the Cyc2 homologs collected from NCBI using BLAST for short “standalone” cytochromes related to Cyc2.

o Using a custom and validated Cyc2 HMM, I searched the UniProtKB database to identify more distant homologs.

• With the objective of measuring expression of cyc2 in S. lithotrophicus ES-1 during growth on ferrous iron and thiosulfate, we grew the organism on both substrates and attempted to extract RNA from both cultures for reverse transcription and PCR.
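The HMM search in the approach above returns scored hits that still need screening. The caption of Figure 4.3 lists the criteria used to call a match a Cyc2 homolog: a c-type heme-binding motif (CxxCH), a conserved motif upstream of it, and enough length to encode a 16-strand beta-barrel porin. The Python sketch below illustrates only the general idea, under stated assumptions: the bit-score cutoff and minimum length are hypothetical placeholders (the actual values are not given in this section), the conserved upstream motif is not modeled, and short hits that pass the score and heme-motif checks are flagged as possible "standalone" cytochromes. The hits themselves are made up.

```python
import re
from dataclasses import dataclass

# c-type heme-binding motif: Cys, any two residues, Cys, His
HEME_MOTIF = re.compile(r"C..CH")

# Hypothetical thresholds for illustration only, not values from this thesis
MIN_BIT_SCORE = 100.0
MIN_PORIN_LENGTH = 300  # residues; placeholder, not derived from barrel size


@dataclass
class HmmHit:
    accession: str
    bit_score: float
    sequence: str


def classify_hit(hit: HmmHit,
                 min_bits: float = MIN_BIT_SCORE,
                 min_length: int = MIN_PORIN_LENGTH) -> str:
    """Sort an HMM hit into a coarse category (sketch, not the thesis pipeline)."""
    if hit.bit_score < min_bits:
        return "rejected: below bit-score cutoff"
    if HEME_MOTIF.search(hit.sequence) is None:
        return "rejected: no CxxCH motif"
    if len(hit.sequence) >= min_length:
        return "full-length Cyc2 candidate"
    # Passes score and heme checks but is too short to carry a porin
    return "standalone cytochrome candidate"


# Made-up hits covering each branch
hits = [
    HmmHit("hit_A", 250.0, "M" + "A" * 20 + "CAACH" + "G" * 400),
    HmmHit("hit_B", 250.0, "M" + "A" * 420),
    HmmHit("hit_C", 40.0, "CAACH" + "G" * 400),
    HmmHit("hit_D", 180.0, "M" + "CAACH" + "G" * 100),
]
for h in hits:
    print(h.accession, classify_hit(h))
```

Checking the bit score first keeps weak matches out regardless of sequence features; the heme-motif and length checks then separate porin-fused homologs from short cytochrome-only ones.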

Chapter 2

FUNCTIONAL CHARACTERIZATION OF CYC2PV-1

2.1 Section Introduction

Identification of Cyc2 homologs within the Zetaproteobacteria (e.g. Mariprofundus ferrooxydans PV-1) was an important breakthrough in our understanding of potential Fe oxidation mechanisms within that class of organisms. However, sequence homology is, ultimately, not a reliable predictor of protein function. The problem is highlighted by the fact that some proteins with very low sequence similarity (such as Cyc2 from At. ferrooxidans and cytochrome 572 from Leptospirillum) share the same function (Castelle et al., 2008; Jeans et al., 2008), while proteins with very high sequence similarity (such as the galactokinase Gal1 and its close homolog Gal3, a transcriptional inducer) perform very different functions (Platt et al., 2000; Whisstock and Lesk, 2004). This can result in misannotation of genes for which no functional information exists. Given the recent explosion in sequencing data, it is not surprising that only 0.6% of all annotated proteins have experimentally verified functions (du Plessis et al., 2011); this fraction is likely even lower now, seven years after that estimate. Thus, a key purpose of this thesis is to experimentally verify the role of Cyc2 from a Zetaproteobacterium as an Fe oxidase.

2.2 Materials and Methods

2.2.1 Design of Escherichia coli expression system

Figure 2.1: Plasmids used in this study. (A) The pMal-p4x plasmid (shown with the malE gene, which was removed in this study) into which the synthesized cyc2 gene was ligated (image from addgene.org). (B) The pEC86 plasmid, containing the eight ccm genes necessary for covalent heme attachment to c-type cytochromes (image from Jensen et

Neutrophilic chemolithoautotrophic Fe-oxidizing bacteria typically have low growth yields, which greatly limits the amount of protein that can be analyzed with proteomics. To circumvent this issue, Cyc2 was expressed heterologously in E. coli. The cyc2 gene was synthesized by GenScript (Piscataway, NJ, USA), with the sequence codon-optimized for expression in E. coli. The native signal sequence directing Cyc2 to the periplasm was replaced with the periplasmic signal sequence of the E. coli ompA gene, and the gene was synthesized with a C-terminal StrepII-tag (Figure 2.1C; Table 2.1). The cyc2 gene was then cloned into the pMal-p4X plasmid (with the malE gene removed; Figure 2.1A) and co-transformed into E. coli C43(DE3) along with the pEC86 plasmid (Figure 2.1B), which encodes the genes (ccmABCDEFGH) necessary for covalent attachment of heme c to the cytochrome (Arslan et al., 1998). In addition to the full-length cyc2 gene, a truncated version of the gene with the cytochrome domain removed (porin-only) was cloned into the same vector and transformed into E. coli C43(DE3) (Figure 2.1C; Table 2.1). An additional control consisted of the pMal-p4X plasmid without the cyc2 gene; this plasmid was also co-transformed into E. coli C43(DE3) with pEC86. The cloning work described above was performed by Beverly Hallahan, in collaboration with Sharon Rozovsky.

2.2.2 Expression of Cyc2 and control vectors

For expression of the Cyc2 protein, cultures of E. coli C43 were grown at 37˚C with shaking in lysogeny broth (LB, Lennox) containing 30 µg/mL chloramphenicol (for the pEC86 plasmid) and 100 µg/mL ampicillin (for the pMal-p4X plasmid). The LB was buffered to pH 6 with 10 mM 2-(N-morpholino)ethanesulfonic acid (MES). Upon reaching mid-log phase (after approximately 3 hours), the cultures were induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and incubated for 20 hours at 18˚C with shaking. The cultures were sampled before addition of IPTG and after 20 hours of induction for measurement of optical density (OD) and analysis of protein expression via SDS-PAGE/western blot and heme stain. Expression under different concentrations of IPTG (0.05 mM - 0.5 mM) was also tested by Beverly Hallahan.

Table 2.1: Final gene construct sequences, including the OmpA signal sequence (italicized), TEV site (underlined), and Strep Tag (bolded).

2.2.3 Western blot and heme stain

Culture samples taken before and after induction with IPTG were centrifuged and frozen prior to being processed for SDS-PAGE. Thawed pellets were lysed by adding 5X sodium dodecyl sulfate (SDS) running buffer (125 mM Tris, 1.25 M glycine, 0.5% SDS, pH 8.3) and passing the suspension through a 27.5-gauge syringe needle ten times. The cell lysate was then combined with gel loading buffer (50 mM Tris-HCl, 12.5 mM EDTA, 2% SDS, 10% glycerol, 0.02% bromophenol blue, pH 6.8) and centrifuged for 10 min at 15,000 x g. The supernatant was loaded onto a 16% Tris-glycine SDS-PAGE gel and run at 100 V for 30 min, then at 160 V for 40 min. The gel was then either Coomassie-stained (1 g Coomassie Brilliant Blue in 10% acetic acid, 40% ethanol) or transferred to a PVDF membrane for heme stain and western blot. For heme and StrepII-tag detection, proteins were transferred from the SDS-PAGE gel to a PVDF membrane at 30 V for 16 h (4˚C) in transfer buffer (25 mM Tris, 192 mM glycine, 20% methanol, pH 7.2). Heme peroxidase activity was assessed by washing the membrane with TBST buffer (20 mM Tris, 137 mM NaCl, 0.1% Tween-20, pH 7.6) and incubating it for 30 minutes with Pierce ECL luminol and hydrogen peroxide solution (Carlson et al., 2013) before imaging on a Typhoon FLA 9500 (GE Healthcare Life Sciences). For StrepII-tag detection, the PVDF membrane was blocked for one hour with 5% bovine serum albumin (BSA) and 50 µg/mL avidin (from egg white) in TBST buffer, rinsed with TBST, and incubated for one hour with Precision Protein StrepTactin-HRP conjugate (1:60,000 dilution, Bio-Rad). The membrane was then washed four times for 15 minutes each in TBST and incubated with Pierce ECL luminol and peroxide substrate for 5 minutes before imaging on the Typhoon FLA 9500.

2.2.4 Whole-cell Fe oxidation assay

After expression of Cyc2 in E. coli, triplicate cultures were centrifuged for 15 minutes at 3,200 x g and washed with sterile LB (supplemented with 10 mM MES, pH 6), then resuspended in the same medium to an OD of 2 (final volume = 25 mL) and incubated in 50 mL glass beakers. The cell suspensions were vigorously stirred to ensure homogeneity and aeration. In the azide experiments, 75 µL of a fresh 1 M sodium azide stock solution was added to the 25 mL cell suspension (final concentration 3 mM) and incubated for 5 min prior to addition of FeCl2. In all experiments, FeCl2 was added from an anoxic, filter-sterilized 100 mM stock solution to a target concentration of 100 µM. Ferrozine measurements were then taken at selected time points (method adapted from Stookey, 1970). At each time point, a sample was taken from each of the triplicate cultures and centrifuged at 15,000 x g for 20-30 s to remove cells and minerals, leaving behind dissolved Fe. A 150 µL portion of the supernatant was then combined with 40 µL of 1.225 mM ferrozine, 50 µL of 6.87 M acetate buffer, and 10 µL of H2O in 200 µL 96-well plates and incubated for 15 min in the dark before absorbance was measured at 562 nm using a plate reader (Perkin Elmer 1420 Multilabel Counter Victor3V). The pH of the cell suspensions was measured before and after each assay and remained stable at pH 6 (within 0.1 pH units).
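The assay volumes above fix the final reagent concentrations in each well. As a minimal sketch (in Python, and not the script used for this thesis), the arithmetic below applies C1V1 = C2V2 to the stated volumes and converts A562 readings to Fe(II) with a linear standard curve. It assumes the Fe(II) standards are processed exactly like the samples, so no further dilution correction is needed; the helper names are hypothetical, and the thesis does not report its calibration values.

```python
"""Reagent arithmetic and standard-curve conversion for a ferrozine Fe(II) assay.

A sketch under stated assumptions: helper names are hypothetical, and the
Fe(II) standards are assumed to be processed exactly like the samples.
"""


def final_concentration(stock_conc, stock_vol, total_vol):
    """C1V1 = C2V2: concentration after diluting stock_vol into total_vol (same units)."""
    return stock_conc * stock_vol / total_vol


def fit_standard_curve(fe_standards_uM, absorbances):
    """Ordinary least-squares fit of A562 = slope * Fe(II) + intercept."""
    n = len(fe_standards_uM)
    mean_x = sum(fe_standards_uM) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in fe_standards_uM)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(fe_standards_uM, absorbances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fe_from_absorbance(a562, slope, intercept):
    """Invert the standard curve to get Fe(II) (µM) in the sample supernatant."""
    return (a562 - intercept) / slope


# Well composition from the assay description (volumes in µL).
WELL_VOL_UL = 150 + 40 + 50 + 10                               # 250 µL in total
ferrozine_mM = final_concentration(1.225, 40, WELL_VOL_UL)     # 0.196 mM in the well
acetate_M = final_concentration(6.87, 50, WELL_VOL_UL)         # about 1.37 M in the well

# Additions to the 25 mL cell suspension (volumes in mL, concentrations in mM).
azide_mM = final_concentration(1000.0, 0.075, 25.0 + 0.075)    # 1 M stock, about 2.99 mM
fecl2_stock_uL = 0.1 * 25.0 / 100.0 * 1000.0                    # 100 µM from 100 mM stock: 25 µL
```

Counting the added 75 µL, the azide line gives about 2.99 mM, consistent with the roughly 3 mM stated above; the FeCl2 line ignores the 25 µL added volume, which is negligible against 25 mL.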

2.2.5 Cyc2 purification

For purification, recombinant Cyc2 was expressed in a total of 4 liters of E. coli C43 culture grown with the appropriate antibiotics and induced with 0.5 mM IPTG. After 21 hours of induction, cells were harvested by centrifugation (4,225 x g, 10 min) and resuspended in 50 mL of lysis buffer (50 mM Tris, 200 mM NaCl, 1 mM EDTA, 0.5% Triton X-100, pH 7.5); the protease inhibitors PMSF and benzamidine were added immediately after resuspension. The cell concentrate was then homogenized at 6,000 psi using an EmulsiFlex-C5 homogenizer (Avestin, Canada). The homogenized lysate was centrifuged (27,000 x g, 40 min) to isolate the soluble fraction, which was then loaded onto a 5 mL HiTrap Strep column. The insoluble fraction (i.e. pellet) was solubilized with 6 M urea for western blot analysis. The Strep column was then connected to an ÄKTAexplorer FPLC (fast protein liquid chromatography) system (Amersham Pharmacia), washed with 10 column volumes (CVs) of wash buffer (50 mM Tris, 200 mM NaCl, 1 mM EDTA, 2% n-octyl-β-D-glucoside, pH 7.5), and eluted with 6 CVs of elution buffer (wash buffer with 5 mM desthiobiotin). The relatively mild detergent n-octyl-β-D-glucoside (β-OG, 2%) was used in an attempt to keep Cyc2 in its native, properly folded state during purification. Elutions were collected in 3 mL fractions and, along with the flow-through and wash samples, analyzed by SDS-PAGE and western blot.
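When planning a repeat of this purification, the volumes implied by the protocol can be tallied quickly. The Python sketch below is illustrative only; it assumes the nominal 5 mL bed volume of the column stated above and ignores dead volume in the FPLC system.

```python
CULTURE_ML = 4000.0        # 4 L of induced culture
RESUSPENSION_ML = 50.0     # lysis buffer volume used for resuspension
COLUMN_VOLUME_ML = 5.0     # nominal 5 mL column bed volume (assumed)

concentration_factor = CULTURE_ML / RESUSPENSION_ML   # fold-concentration of the cells
wash_ml = 10 * COLUMN_VOLUME_ML                        # 10 CVs of wash buffer
elution_ml = 6 * COLUMN_VOLUME_ML                      # 6 CVs of elution buffer

print(f"{concentration_factor:.0f}-fold concentration; "
      f"{wash_ml:.0f} mL wash buffer; {elution_ml:.0f} mL elution buffer")
```

Run as written, this prints an 80-fold concentration, 50 mL of wash buffer, and 30 mL of elution buffer per run.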

2.3 Results and Discussion

2.3.1 Expression of Cyc2 and controls in E. coli

For functional characterization of the Cyc2 protein, we heterologously expressed it in E. coli. Before testing its function, it was necessary to optimize the conditions under which the recombinant protein is most highly expressed. To this end, growth of E. coli expressing Cyc2 and controls was monitored via optical density (OD) measurements, and expression of Cyc2 was measured under different concentrations of IPTG (work done by Beverly Hallahan). Prior to IPTG-induced derepression of the lac operon, the initial growth rates of Cyc2- and empty-vector-containing cultures were nearly identical. However, inducing expression of Cyc2 with IPTG led to substantial differences in growth between the Cyc2-expressing cells and the empty-vector controls. The final growth yields of these cultures were markedly different: the empty-vector cells grew to an optical density (OD600) of approximately 4 (measured as an OD of 2 of a twofold-diluted culture), while the Cyc2-expressing cultures reached an OD of only approximately 1. The final optical density of the porin-only controls was consistently similar to that of the Cyc2-expressing cultures (approximately 1). These lower growth yields may be caused by toxicity of the Cyc2 protein to E. coli; overexpression of membrane proteins is commonly associated with toxicity to the host cell (Kubicek et al., 2014). Overexpression of a membrane protein may have exhausted the available membrane space, causing Cyc2, which is largely made up of hydrophobic β-strands, to aggregate within cells (Zhengqi “Frank” Zhang, personal communication). Cultures expressing Cyc2 also displayed a distinctive red color compared with E. coli cultures expressing the empty-vector and porin-only controls (Figure 2.2A). This color is consistent with expression of c-type cytochromes and may indicate the presence of heme-containing Cyc2. UV-Vis spectra were also taken of whole cells (OD = 0.25) expressing Cyc2 and empty-vector controls, as well as of isolated membrane fractions. The spectra show peaks at 408 nm (Soret) and 532 nm (β) that are more intense than those of the empty-vector control (Figure 2.2B-C). These peaks are associated with porphyrin compounds, such as hemes (Giovannetti, 2012), and support the presence of a mature holocytochrome c within the Cyc2-expressing cultures.

Figure 2.2: A) Cell pellets of Cyc2-expressing cells (right) compared to the pellets of cultures expressing the Cyc2 porin (middle) and the empty-vector control (left). The visibly redder pellet of the Cyc2-expressing cells is suggestive of heme. B) UV-Vis spectra of empty-vector and Cyc2-expressing whole cells. C) UV-Vis spectra of the isolated membrane fraction from Cyc2-expressing cells. Spectra from Cyc2-expressing and empty-vector cells were normalized to absorbance at 600 nm. The membrane fraction was isolated and its spectra analyzed by Dr. Jessica Keffer.


Figure 2.3: A) Western blot showing expression of Cyc2 and the Cyc2 porin-only control. B) Heme stain demonstrating the peroxidase activity of the Cyc2 western blot band. C) Results of Beverly Hallahan’s work on optimization of Cyc2 expression using different concentrations of IPTG (U = uninduced, I = induced).

2.3.2 Western blot and heme stain identification of Cyc2 and Cyc2 porin

Figure 2.4: Fe oxidation in empty-vector-containing cultures compared to Fe oxidation in fresh, sterile LB medium.

To confirm that the Cyc2 protein was successfully expressed, the cyc2 gene was synthesized with a StrepII-tag, an oligopeptide with high affinity for Strep-Tactin, an engineered version of streptavidin. When conjugated to horseradish peroxidase (HRP), Strep-Tactin allows a tagged recombinant protein to be detected on a western blot upon incubation with a peroxide solution. Western blot and heme stain analysis of culture samples taken prior to derepression with IPTG, as well as immediately before the Fe oxidation assay, revealed that a protein of approximately 42 kDa, the expected molecular weight of Cyc2, was expressed (Figure 2.3A). The smaller porin-only construct was also expressed at its expected molecular weight (37 kDa). The band corresponding to full-length Cyc2 displayed peroxidase activity, indicating the presence of a heme group within the protein and suggesting that Cyc2 had been successfully matured, with heme covalently attached (Figure 2.3B). No peroxidase activity was detected from the porin-only band, and gel lanes corresponding to the empty-vector cells had no discernible bands in either the western blot or the heme stain. Testing Cyc2 expression under different concentrations of IPTG showed that 0.5 mM IPTG resulted in the highest yield of Cyc2; lower IPTG concentrations did not improve yields, as inferred from the intensity of the Cyc2 band on a western blot (Figure 2.3C). The faint, similarly sized bands in the empty-vector control lanes may correspond to background signals from native E. coli proteins (Beverly Hallahan, personal communication).

2.3.3 Fe oxidation by Cyc2-expressing cells

To demonstrate the function of Cyc2, we expressed the protein in E. coli, incubated the cell suspensions with ferrous iron, and measured Fe oxidation over time. Relatively little Fe oxidation was observed in empty-vector control cultures, even when the cultures were monitored for approximately four hours after Fe addition.

Table 2.2: Oxygen concentrations within sterile LB and suspensions of E. coli cells expressing Cyc2 and controls. Cells were incubated at an optical density of 2 (~10⁹ cells/mL) for 5 minutes with vigorous stirring (300 RPM) during measurement of oxygen concentrations (error intervals reflect the fluctuation during those five minutes). The measurement of oxygen in sterile LB represents a single time point. Measurements were taken using the FireSting trace oxygen sensor.

This may be because, at such a high cell density, Fe(II) is stabilized by biomass and does not undergo the spontaneous oxidation seen in sterile LB (Figure 2.4). Moreover, respiration by the dense E. coli suspensions resulted in microaerophilic or anoxic conditions, preventing abiotic oxidation of Fe(II) by oxygen. Indeed, oxygen measurements with the FireSting trace oxygen meter confirmed that oxygen was below detection in the empty-vector control cultures, while cultures expressing Cyc2 and the porin-only control were microaerophilic, with oxygen concentrations below 1 µM (Table 2.2). The oxygen concentration in sterile LB was 268.295 µM, close to the theoretical air-saturated value for water of 282 µM given by Henry’s Law: [O2] = (0.203 atm)(1.39 mmol O2 / kg·atm) = 0.282 mmol O2/kg. All oxygen measurements were taken after a few minutes of incubation to allow the media to equilibrate. Standard error in Table 2.2 represents variation in the oxygen concentration measurements over ten minutes.
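The Henry's law estimate and its comparison with the measured value can be checked with a few lines of Python. This sketch uses the partial pressure and Henry's constant given above and assumes that mmol O2 per kg of water is approximately mM (a density of about 1 kg/L), which is only an approximation for LB.

```python
P_O2_ATM = 0.203   # O2 partial pressure used in the text (atm)
KH_O2 = 1.39       # Henry's constant used in the text (mmol O2 kg^-1 atm^-1)


def o2_saturation_uM(p_o2_atm, kh_mmol_per_kg_atm):
    """Henry's law, [O2] = kH * pO2, converted from mmol/kg to µM (assumes ~1 kg/L)."""
    return kh_mmol_per_kg_atm * p_o2_atm * 1000.0


def percent_of_saturation(measured_uM, saturation_uM):
    """Measured dissolved O2 as a percentage of the Henry's law saturation value."""
    return 100.0 * measured_uM / saturation_uM


saturation = o2_saturation_uM(P_O2_ATM, KH_O2)           # about 282 µM
sterile_lb = percent_of_saturation(268.295, saturation)  # about 95% of air saturation
suspension_max = percent_of_saturation(1.0, saturation)  # upper bound for the <1 µM suspensions
```

By this arithmetic, sterile LB was at roughly 95% of air saturation, while the values below 1 µM measured in the Cyc2 and porin-only suspensions correspond to less than about 0.4% of saturation.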

Figure 2.5: Expression of and Fe oxidation by Cyc2. (A) Western blot showing successful expression of Cyc2 and the Cyc2 porin-only control (U = uninduced, I = induced). (B) Heme stain demonstrating heme peroxidase activity of the band corresponding to Cyc2. (C) Fe oxidation assay performed on cell suspensions expressing Cyc2 and the porin-only and empty-vector controls, showing that the presence of Cyc2 results in substantially more Fe oxidation.

Cyc2-expressing cells (with pEC86) significantly accelerated the oxidation of Fe(II) compared with the porin-only and empty-vector controls (Figure 2.5). Fe oxidation by Cyc2-expressing cells proceeded for 15 minutes and stopped with 20 µM Fe(II) remaining in solution. Subsequent addition of 80 µM Fe(II) at 40 minutes resulted in the same amount of Fe(II) oxidation as had occurred after the initial addition of 100 µM Fe(II). Because Fe oxidation resumed when more Fe(II) was supplied, the initial cessation of Fe(II) oxidation is unlikely to be due to exhaustion of an electron sink. Based on these results, it seems that Cyc2, if it indeed catalyzes the observed Fe(II) oxidation, is re-oxidized by something in the medium, possibly oxygen or another electron carrier endogenous to E. coli.
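The quantities in this paragraph can be summarized as an amount oxidized and an average rate. The Python sketch below is illustrative: the function names are hypothetical, the average rate assumes oxidation ran over the full 15 minutes, no correction for the controls is applied, and the regression helper is one common way (not necessarily the one used in this thesis) to estimate an initial rate from the early ferrozine time points.

```python
def fe_oxidized_uM(added_uM, remaining_uM):
    """Fe(II) removed from solution between addition and the plateau."""
    return added_uM - remaining_uM


def mean_rate_uM_per_min(oxidized_uM, minutes):
    """Average oxidation rate over the interval in which oxidation occurred."""
    return oxidized_uM / minutes


def initial_rate_uM_per_min(times_min, fe_uM):
    """Negated least-squares slope of Fe(II) versus time over the supplied early points."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_c = sum(fe_uM) / n
    stt = sum((t - mean_t) ** 2 for t in times_min)
    stc = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_min, fe_uM))
    return -stc / stt


# Values stated for the Cyc2-expressing cells: 100 µM added, 20 µM left after 15 min.
first_addition = fe_oxidized_uM(100, 20)                  # 80 µM oxidized
average_rate = mean_rate_uM_per_min(first_addition, 15)   # about 5.3 µM/min
```

`initial_rate_uM_per_min` would be applied to the first few time points of each triplicate suspension; which points to include is a judgment call that depends on how quickly oxidation levels off.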

2.3.4 Cyc2 expression without successful heme insertion

Two sets of expression cultures yielded Cyc2 without successful heme maturation. As in previous experiments, E. coli C43 carrying pEC86 was used as the expression strain in these cultures. However, the final optical density of these Cyc2-expressing cultures was higher than that of previous Cyc2-expressing cultures in which the heme was shown to be successfully incorporated (~1.5 vs ~1). It has previously been shown that expressing Cyc2 without the heme-maturation genes (i.e. without the pEC86 plasmid) results in lower expression of Cyc2 (Beverly Hallahan, personal communication). This is possibly caused by accelerated degradation of the recombinant protein, a potential consequence of protein misfolding. Because Cyc2 likely requires the heme for proper folding of the cytochrome domain, lack of heme attachment may have caused the observed lower expression of Cyc2. Since Cyc2 appears to be toxic to the host strain, increased degradation of the recombinant protein likely explains the higher cell density of the cultures in which the heme was not properly inserted.

Figure 2.6: Expression of Cyc2 without successful heme insertion, and subsequent lack of Fe oxidation. A) Successful expression of the Cyc2 protein. B) Absence of discernible heme peroxidase activity at the position of the Cyc2 protein. C) Lack of Fe oxidation by cell suspensions expressing Cyc2 and the empty-vector control.

In further support of this, cell pellets from these cultures did not display the distinctive red color typically associated with Cyc2-expressing cells. SDS-PAGE and western blot confirmed that Cyc2 was indeed expressed (Figure 2.6A), as indicated by a band at Cyc2’s expected molecular weight. However, that band did not display heme peroxidase activity, suggesting that while the Cyc2 protein was successfully expressed, heme insertion did not occur (Figure 2.6B). It is unclear why heme insertion failed in these experiments. The cells appeared to retain pEC86, since they grew in the presence of chloramphenicol, resistance to which is encoded on pEC86. Cell suspensions expressing Cyc2 without a properly attached heme were also assayed for Fe oxidation activity (Figure 2.6C); these cultures oxidized Fe(II) at nearly the same rate as the porin-only and empty-vector controls. This set of assays, in which Cyc2 lacked its heme, demonstrates that Fe oxidation depends not only on the presence of the Cyc2 cytochrome, but also on its heme group.

2.3.5 Inhibition of Fe oxidation activity using sodium azide

I used sodium azide in an attempt to inhibit cytochrome c oxidase activity. This was done to constrain the fate of electrons from Fe(II) and to demonstrate that the difference in Fe oxidation rate between Cyc2-expressing cells and controls is caused by biotic rather than abiotic factors. It is possible that some of E. coli’s periplasmic electron carriers can accept electrons from Cyc2 and eventually pass them to a terminal oxidase. To test this possibility, I added 3 mM sodium azide to the cell suspensions to inhibit the terminal oxidase (Neubauer et al., 2002; Rentz et al., 2007; Druschel et al., 2008). Addition of azide decreased Fe oxidation in the Cyc2-expressing cell suspensions by approximately half and had no effect on the empty-vector control (Figure 2.7). It is not clear why Fe oxidation activity decreased only partially; it is possible that the amount of azide added was not enough to inhibit all biological activity within the E. coli suspensions. One disadvantage of using sodium azide is that it binds Fe(II) (Yoshikawa and Caughey, 1992), potentially making it unavailable to the Cyc2 heme. Another disadvantage is that azide targets the heme group of cytochrome c, so the observed inhibition could be caused by azide binding the heme group of Cyc2 itself. Both of these issues may have influenced our results.


Figure 2.7: Fe oxidation by Cyc2-expressing cells compared with empty-vector controls. Dotted lines represent cell suspensions incubated with 3 mM sodium azide for 5 minutes prior to addition of FeCl2 and measurement of Fe oxidation.

2.3.6 E. coli growth anaerobically with nitrate

Given that azide targets heme groups, it is unclear whether the inhibition of Fe oxidation was caused by azide inhibiting the terminal cytochrome c oxidase or by azide acting on Cyc2 directly. To shed light on this issue, my thesis committee suggested expressing Cyc2 in E. coli under anaerobic conditions, with nitrate as the terminal electron acceptor. To this end, I made numerous attempts to grow E. coli and express Cyc2 anaerobically, with nitrate as the sole electron acceptor. The cultures successfully expressed and matured Cyc2; however, the maximum optical density reached by these E. coli cultures was 0.4 (Figure 2.8). Subsequent incubation of these cultures with Fe(II) and 30 mM sodium nitrate (anaerobically, under a steady nitrogen stream) resulted in no observed Fe oxidation (data not recorded). While this seems to imply that Fe(II) oxidation by Cyc2 requires oxygen as an electron acceptor, another possibility is that my E. coli strains cannot respire nitrate. This is supported by the low growth yields, which are no higher than the observed anaerobic growth of E. coli on glucose with no electron acceptor provided. Moreover, a study by Pinske et al. (2011) suggests that even though the BL21 strain of E. coli encodes a nitrate reductase, it is non-functional, representing a metabolic deficiency in this strain. BL21 is engineered for heterologous protein expression and is the precursor of the C43 strain used in my Fe oxidation assays. C43 is specifically optimized for expression of toxic proteins and likely inherits the metabolic deficiencies of BL21. Therefore, the low growth yields observed under anaerobic conditions were likely due to E. coli carrying out fermentation. While expression and maturation of Cyc2 were successful under these conditions, the absence of a functional nitrate reductase in my expression strain eliminates the utility of these assays.

Figure 2.8: A) Growth curve of E. coli (C43) grown under anaerobic conditions with nitrate as the sole electron acceptor. Western blot (B) and heme stain (C) demonstrate the presence of heme-containing Cyc2 in the cell lysate.

Figure 2.9: A) Concentrated E. coli suspensions of the empty-vector control (left) and Cyc2-expressing cells (right). B) Western blot of the Cyc2-expressing cell lysate (T), the insoluble (IS) and soluble (S) fractions, and the 3 mL elution fractions collected during affinity-tag purification.

2.3.7 Preliminary attempts at Cyc2 purification

For biochemical characterization of Cyc2, I also attempted to purify Cyc2 using affinity-tag chromatography. This section represents a work in progress because purification was attempted only once, without success. Expression of Cyc2 in 4 liters of E. coli culture became evident upon concentrating the culture 80-fold into a volume of just 50 mL; this concentrate displayed a red color consistent with c-type hemes (Figure 2.9A). After centrifugation of the lysed cells, the soluble fraction of the cell lysate (i.e. supernatant) was loaded onto the Strep affinity column, and bound Cyc2 was eluted using desthiobiotin. The insoluble fraction (i.e. the pellet formed after centrifugation of the cell lysate) was solubilized with concentrated urea. Samples of the initial cell lysate (before centrifugation), the soluble fraction (after centrifugation), the urea-solubilized insoluble fraction (i.e. pellet), the flow-through (everything that did not bind to the affinity column), and the elution fractions (anything eluted from the column after addition of desthiobiotin) were loaded onto an SDS-PAGE gel for western blot analysis. Lanes corresponding to the initial cell lysate and the urea-solubilized insoluble fraction contained double bands with an approximate mass difference of 5-10 kDa; the higher molecular weight band corresponds to the expected molecular weight of Cyc2 (Figure 2.9B). These double bands were absent from the soluble fraction of the cell lysate. The same double bands were also observed in several elution fractions (fractions 5-12), which may indicate that Cyc2 was eluted by the increasing concentration of desthiobiotin in the column. Additional bands of lower molecular weight were also seen on the western blot; these are likely caused by one of E. coli’s endogenous proteins, the biotin carboxyl-carrier protein (Wang et al., 2005). These bands appeared in the homogenized cell lysate and the soluble fraction, but not in the urea-solubilized pellet, suggesting that, unlike Cyc2, the biotin carboxyl-carrier protein is soluble. This protein also appeared in high abundance in the elution fractions, suggesting that it binds the Strep-Tactin resin within the Strep column. Overall, these results are very preliminary. While the nature of the second band (of slightly lower mass) that co-purified with Cyc2 is unclear, these results are promising in that they demonstrate that small amounts of Cyc2 are solubilized by the detergents Triton X-100 and β-OG. Moreover, the presence of a Cyc2-sized band in the elution fractions suggests that Cyc2 binds the Strep column and is eluted by desthiobiotin, as expected during StrepII-tag affinity chromatography. However, further work is necessary to optimize detergent conditions and to better understand the nature of the second band that co-purified with Cyc2. Using β-OG as the primary detergent in the lysis buffer may result in a better yield of recombinant protein in its native state.

2.4 Conclusions

Work presented in this section provides evidence in support of the role of a Cyc2 homolog from cluster 1 of the Cyc2 phylogenetic tree (Chan et al., in prep) as an Fe oxidase. Cluster 1 corresponds to neutrophilic microaerophilic FeOB, including Zetaproteobacteria and Gallionellaceae. The rapid oxidation of Fe(II) in E. coli cultures expressing Cyc2 from a neutrophilic marine Zetaproteobacterium, compared with controls, suggests that Cyc2 alone has the potential to confer Fe oxidation activity. Considering also the work of Castelle et al. (2008) and Jeans et al. (2008), functional information now exists on cyc2 from all three clusters of the Cyc2 phylogenetic tree. This has potential implications for the prevalence of microbial Fe oxidation, since cyc2 is found in many microbial genomes not previously associated with FeOB. However, given the diversity of cyc2 homologs, even within each cluster, more biochemical work is necessary to demonstrate that each gene product is an Fe oxidase. Moreover, the biochemically characterized cyc2 homologs in clusters 2 and 3 are from acidophilic FeOB. While the cyc2 homologs of most known neutrophilic FeOB fall in cluster 1, cyc2 homologs from neutrophiles not previously associated with Fe oxidation fall in other clusters, further necessitating functional verification of this gene.

Chapter 3

CYC2 EXPRESSION IN THE ENVIRONMENT

3.1 Section Introduction

In the previous section, evidence was provided in support of Cyc2’s role as an Fe oxidase under neutrophilic conditions. While understanding the structure and function of Cyc2 is important to substantiate its identity as an Fe oxidase, it is equally important to understand its relevance in natural systems. To provide insights into the role of this protein in ecosystems associated with active Fe oxidation, I reanalyzed the supplementary data of a metatranscriptome from a groundwater system (Rifle aquifer) that is known to host a diverse chemolithoautotrophic microbial community capable of Fe and S cycling (Jewell et al., 2016). In addition, I reanalyzed the metatranscriptome from the Brocéliande forest freshwater stream, which hosts a microbial mat community in an Fe-rich environment (Quaiser et al., 2014).

3.2 Materials and Methods

3.2.1 Development and Use of HMM Profiles

Multiple-sequence alignments (discussed in more detail in section 4.2.2) were used to generate hidden Markov models (HMMs) with the hmmbuild program, part of the HMMER v.3.1b2 suite of tools. Each HMM was prepared by first aligning a representative set of sequences for each gene using MUSCLE v.3.8.31. Each alignment consisted of approximately 100 sequences representative of the diversity of Cyc2 and MtoA homologs identified with BLAST. HMMER was then used to build the HMM profile (hmmbuild) and to search the sequence databases for homologs of each gene of interest (hmmsearch). Validation of the HMMs is discussed in detail in sections 4.2.4 and 4.3.3. Matching sequences were extracted from the sequence database and aligned with the representative set of sequences used to generate the HMM, to verify that each HMM match was in fact related to the gene of interest. The HMMs and associated alignments have been submitted to the TIGRFAM resource for MSAs and HMMs; they are also publicly available on my GitHub page (https://github.com/Arkadiy-Garber).
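Screening hmmsearch output is the step where HMM matches become candidate homologs. As a sketch (not the thesis's own script), the Python below parses the per-sequence table that HMMER 3 writes with `hmmsearch --tblout` and filters it by full-sequence E-value. The default threshold is a hypothetical placeholder, since the cutoffs actually used are tied to the HMM validation in sections 4.2.4 and 4.3.3.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Hit:
    target: str        # sequence name (column 1)
    query: str         # HMM name (column 3)
    evalue: float      # full-sequence E-value (column 5)
    score: float       # full-sequence bit score (column 6)
    description: str   # free-text target description (column 19)


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse per-sequence hits from `hmmsearch --tblout` output.

    The first 18 columns are whitespace-delimited; everything after them is
    the target description, so each line is split at most 18 times.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header/footer comment lines
        fields = line.split(maxsplit=18)
        hits.append(Hit(
            target=fields[0],
            query=fields[2],
            evalue=float(fields[4]),
            score=float(fields[5]),
            description=fields[18].strip() if len(fields) > 18 else "",
        ))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float = 1e-10,
                min_score: Optional[float] = None) -> List[Hit]:
    """Keep hits passing a (hypothetical) E-value cutoff and an optional bit-score cutoff."""
    return [h for h in hits
            if h.evalue <= max_evalue and (min_score is None or h.score >= min_score)]
```

Since a file object iterates over lines, a table written by `hmmsearch --tblout` can be passed straight to `parse_tblout`. The surviving targets are the sequences that would then be aligned against the reference set, as described above.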

To examine the co-occurrence of genes relevant to extracellular electron transport (EET), the genomic affiliation of each gene was identified with a custom Python script, and the results were imported into RStudio for visualization using the VennDiagram package.
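The counts behind a Venn diagram of gene co-occurrence are the numbers of genomes carrying each exact combination of genes. A minimal Python sketch of that tally follows. The input mapping (gene name to the set of genome IDs that encode it) is a hypothetical format, and the plotting itself was done separately in R.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(gene_to_genomes: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group genomes by the exact combination of genes they encode.

    Each key is a frozenset of gene names. Its value holds the genomes that
    encode all of those genes and none of the others, i.e. one Venn region.
    """
    all_genomes = set().union(*gene_to_genomes.values())
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for genome in all_genomes:
        combo = frozenset(gene for gene, genomes in gene_to_genomes.items()
                          if genome in genomes)
        regions.setdefault(combo, set()).add(genome)
    return regions
```

For example, `{"cyc2": {"A", "B"}, "mtoA": {"B"}}` yields one genome (A) with cyc2 only and one (B) with both genes. The genome IDs here are illustrative, and the size of each region is what a Venn plot displays.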

3.2.2 Re-analysis of the metagenome/metatranscriptome from the Rifle aquifer

To study the chemolithotrophic response to the addition of electron acceptors, Jewell et al. (2016) injected NO3⁻ and O2 into a shallow, highly reduced aquifer and collected samples for metagenomic and metatranscriptomic analyses four times over the following 47 days. Their results indicate expression of genes related to chemolithoautotrophic metabolisms, including the sox genes used for sulfur oxidation, the mtoAB genes used for Fe oxidation, and the nar/nap genes for nitrate reduction, in addition to cytochrome c oxidases for oxygen reduction. The supplemental materials of that publication include a table of approximately 200,000 ORFs identified in the study, with accompanying metadata (e.g. abundance, annotation). To quantify the expression of cyc2 and mtoAB in this terrestrial ecosystem inhabited by FeOB related to the Gallionellaceae family, I extracted information from this supplementary dataset. Using a custom Python script, I partitioned the sequence data into separate FASTA files based on bin affiliation. Within these files, I used custom HMM profiles designed for Cyc2 and MtoA to search for homologs of these putative Fe oxidases. The expression data of verified Cyc2 and MtoA homologs were then extracted from the Jewell et al. (2016) supplemental dataset, imported into RStudio v.1.1.423, and plotted using the ggplot2 visualization package. PhyloSift v.1.0.1 (Darling et al., 2014) was used to confirm the taxonomic affiliation of the metagenome-assembled genomes used in this study.
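Partitioning the supplementary ORF table by bin is a simple grouping step. The Python sketch below assumes a tab-delimited table with columns named `orf_id`, `bin`, and `sequence`. Those column names, the `.faa` extension, and the `unbinned` label are hypothetical, since the layout of the Jewell et al. (2016) table is not described here.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Mapping


def partition_by_bin(rows: Iterable[Mapping[str, str]], id_col: str = "orf_id",
                     bin_col: str = "bin", seq_col: str = "sequence") -> Dict[str, str]:
    """Return {bin: FASTA text}, grouping ORF sequences by bin affiliation."""
    records = defaultdict(list)
    for row in rows:
        bin_id = (row.get(bin_col) or "").strip() or "unbinned"
        records[bin_id].append(f">{row[id_col]}\n{row[seq_col].strip()}\n")
    return {bin_id: "".join(recs) for bin_id, recs in records.items()}


def write_bin_fastas(table_path: str, out_dir: str) -> None:
    """Read a tab-delimited ORF table and write one FASTA file per bin."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(table_path, newline="") as fh:
        fastas = partition_by_bin(csv.DictReader(fh, delimiter="\t"))
    for bin_id, text in fastas.items():
        (out / f"{bin_id}.faa").write_text(text)
```

Each per-bin file can then be searched separately with the Cyc2 and MtoA HMMs, which keeps every hit tied to its bin.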

3.2.3 Re-analysis of the metatranscriptome from the Brocéliande stream

Figure 3.1: Schematic of the Brocéliande forest stream FeOB mats, with stars indicating sampling locations 1-5. The source of ferruginous water is shown penetrating through a dam on the left side of the image and reentering the stream nearest to sampling site 1. Schematic adapted

To study the potential role of cyc2 in the Fe-mat community at the Brocéliande forest stream (Brittany, France), I performed a re-analysis of the metatranscriptome published by Quaiser et al. (2014), focusing on genes encoding the putative Fe oxidases cyc2 and mtoA. Single-end raw Illumina reads were downloaded from the NCBI Sequence Read Archive (SRA). The downloaded sequences corresponded to a total of five samples, taken at three distances from the ferruginous water source and at two depths (Figure 3.1). The reads were quality trimmed using Trimmomatic v.0.36 (ILLUMINACLIP:TruSeq3-SE:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36; Bolger et al., 2014) and assembled using Trinity v.2.4.0 (default settings; Haas et al., 2013). Open reading frames (ORFs) were then predicted from the assembled contigs using Prodigal v.2.6.3 (-p meta; Hyatt et al., 2012), and bowtie2 v.2.3.3.1 (default settings; Langmead et al., 2012) was used to map reads back to the contig positions corresponding to gene calls. Coverage (averaged over each ORF), a proxy for expression level, was extracted using SAMtools v.1.4.1 (Li et al., 2009); briefly, the sequence alignment map created by bowtie2 was converted to binary (BAM) format, sorted, and indexed. These files were then used to calculate RPKM (reads per kilobase per million reads) for each gene:

RPKM = coverage / [(length of aligned fragment / 1,000) × (number of reads in sample / 1,000,000)]
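A minimal Python sketch of the coverage and RPKM calculation is shown below. It is not the pipeline's original code: it assumes per-base depths in the three-column text format printed by `samtools depth` for a single BAM file (reference name, 1-based position, depth), treats the length term in the formula as the ORF length in bp, and uses hypothetical function names.

```python
def mean_orf_coverage(depth_lines, contig, start, end):
    """Mean per-base depth over a 1-based, inclusive ORF interval [start, end].

    depth_lines: tab-separated lines as printed by `samtools depth` for one
    BAM file (reference name, 1-based position, depth). Positions absent from
    the input (zero-depth sites are omitted unless -a is used) count as zero,
    so the sum is always divided by the full ORF length.
    """
    total = 0
    for line in depth_lines:
        ref, pos, depth = line.rstrip("\n").split("\t")[:3]
        if ref == contig and start <= int(pos) <= end:
            total += int(depth)
    return total / (end - start + 1)


def rpkm(coverage, orf_length_bp, reads_in_sample):
    """RPKM as defined in the text: coverage / [(length / 1,000) * (reads / 1,000,000)]."""
    return coverage / ((orf_length_bp / 1_000) * (reads_in_sample / 1_000_000))
```

For example, an ORF with a mean depth of 30 over 1,500 bp, in a sample of 4,000,000 reads, has an RPKM of 30 / (1.5 × 4) = 5.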

The resulting RPKM values were then imported into R and visualized using the ggplot2 visualization package. The putative Fe oxidase genes mtoA and cyc2 were identified using custom HMM profiles. Identified homologs were aligned with reference mtoA and cyc2 homologs to confirm that they are true homologs. Homologs of cyc2 were retained only if they contained the cytochrome portion of the gene (i.e., HMM matches without a heme-binding site were discarded).

3.3 Results and Discussion

3.3.1 Expression of cyc2 in the Rifle Aquifer

Part of the microbial community of the Rifle aquifer consists of members of the family Gallionellaceae, which have previously been associated with microbial Fe oxidation. The metagenome and metatranscriptome from this ecosystem were obtained as part of a time series following the injection of nitrate (dissolved in fully oxygenated water) into an aquifer well. Prior to nitrate injection, Gallionellaceae accounted for only about 3% of the total community in this ecosystem, rising to approximately 30% after injection (Jewell et al., 2016). An advantage of studying this groundwater community is that multiple members of the Gallionellaceae encode more than one putative Fe oxidase, including cyc2 and mtrAB (Emerson et al., 2013). The expression levels of these genes, relative to each other and within the context of each genome, may shed light on the purpose of each Fe oxidase. Jewell et al. (2016) noted the expression of various genes related to chemolithoautotrophic metabolisms, including cyc2 and mtr/mtoAB, the latter of which is thought to be part of the Fe oxidation pathway in some Gallionellaceae (Shi et al., 2012). My independent re-analysis of the supplementary dataset published alongside that study highlights the relatively high abundance of cyc2 transcripts compared with mtoAB transcripts. Transcripts of cyc2 were detected in all but one of the Gallionellaceae metagenome bins (Figure A1, Appendix A), including two bins in which mtoAB transcripts were also detected at all time points (Figure 3.2). Expression of cyc2 is always higher than that of mtoAB, and cyc2 is among the most highly expressed genes in all of the Gallionellaceae bins that encode it. High expression of this gene by these known neutrophilic FeOB supports the hypothesis that Fe-oxidizing bacteria related to the Gallionellaceae use Cyc2 for Fe oxidation.

I also identified cyc2 transcripts belonging to reconstructed genomes of organisms not previously associated with Fe oxidation, such as Sulfurimonas denitrificans (Figure 3.3). To confirm that the cyc2 homolog was not erroneously assigned to the S. denitrificans bin, I aligned the amino acid sequence of this transcript with other Cyc2 homologs and constructed a phylogenetic tree from this alignment. The results show that cyc2 from this organism clusters with other homologs within cluster 1 of the Cyc2 phylogenetic tree (Chan et al., in prep), supporting this transcript's placement in the S. denitrificans bin. While the role of Cyc2 in the S-oxidizing S. denitrificans is unclear, it is possible that this organism is capable of Fe oxidation and may use Cyc2 as an Fe oxidase.

Figure 3.2: Violin plots of gene expression in Gallionellaceae bins 22.1 (A) and 22.2 (B) from the Rifle aquifer. These bins displayed expression of cyc2 (red dots and lines) and mtoAB (light green = mtoA; dark green = mtoB). The violin plots represent time-series …

Figure 3.3: Violin plots of gene expression in bins related to Sulfurimonas denitrificans (A) and Candidate Superphylum Parcubacteria (B). These organisms have not previously been associated with Fe oxidation. Expression levels of cyc2 are denoted by red dots and lines.

3.3.2 Re-analysis of the metatranscriptome from Brocéliande stream

In addition to their presence in the Rifle aquifer, Gallionellaceae are also found in freshwater surface streams, which often host active microbial Fe oxidation when fed by Fe-rich waters (Quaiser et al., 2014). The Brocéliande forest stream (Brittany, France) is inhabited by a community of microorganisms known to actively participate in Fe cycling, including Gallionellaceae and Leptothrix (Kato et al., 2014; Quaiser et al., 2014). Part of this stream is adjacent to a dam, from which ferruginous water seeps into the stream. This creates an oxygen/Fe(II) gradient favorable for microaerophilic FeOB. The microbial Fe mats adjacent to the dam were recently investigated using metatranscriptomics, focusing on differences between surface and deep mat samples, as well as on proximity to the ferruginous water source. Results presented in Quaiser et al. (2014) indicate active iron oxidation, evidenced by the presence of mtrA homologs. My own analysis of this metatranscriptome provides further insights into the Fe oxidation pathways in this ecosystem.


Figure 3.4: Histogram of contig lengths from the Brocéliande stream Fe mat assembled metatranscriptome.

Table 3.1: Summary of HMM-identified cyc2 and mtoA homologs from the Brocéliande stream Fe mat metatranscriptome.

Assembly of the high-quality reads using Trinity resulted in a total of 32,409 contigs with a mean length of 496 bp (Figure 3.4). From these contigs, 32,525 open reading frames (ORFs) were predicted. Using custom-made HMMs, I identified both cyc2 and mtoA transcripts among these ORFs. Because the Brocéliande forest stream metatranscriptome did not have a reference metagenome, the identified Fe oxidase transcripts were also aligned with reference cyc2 or mtoA homologs to infer their phylogenetic relationships. Phylogenetic trees of the cyc2 and mtoA homologs were constructed using RAxML to compare the homologs identified in the Brocéliande FeOB mats to a set of reference sequences. Expression levels and closest phylogenetic affiliations of each identified Fe oxidase homolog are summarized in Table 3.1. All three identified cyc2 homologs were found only in samples collected from the top layers of the mat (samples 4 and 5 in Figure 3.1). The expression of cyc2 is close to the median expression of other genes in this metatranscriptome. Two of the cyc2 homologs are phylogenetically related to cyc2 homologs from cluster 1 of the Cyc2 tree (particularly within the Gallionellaceae sub-cluster) (Chan et al., in prep); the third falls within cluster 3 of the Cyc2 phylogenetic tree and is most closely related to cyc2 homologs from … and Geobacteraceae. It is unclear

what the role of Cyc2 is in the Geobacteraceae, a family primarily known for high metal tolerance, Fe reduction capability, and strictly anaerobic metabolisms (Falkow et al., 2006). Expression of a cyc2 homolog has previously been reported in Geobacter bemidjiensis (Merkley et al., 2015); however, this gene has not been linked to any particular metabolism within that clade, which has not been tested for Fe oxidation. Therefore, it is unclear whether Cyc2 functions as an Fe oxidase in Geobacter.

Five mtrA homologs were identified in this metatranscriptome. Since it is unclear which (if any) of these five homologs catalyze Fe oxidation or Fe reduction, I attempted to distinguish potential Fe oxidases from known Fe reductases by constructing a phylogenetic tree of mtrA homologs (Figure 3.5). The resulting tree shows a close phylogenetic relationship among mtrA homologs from known FeRB, such as Shewanella and Ferrimonas. Other known FeRB, such as Magnetospirillum magneticum and Geobacter uraniireducens, are present throughout this tree (outside of the Shewanella cluster); these sequences, however, are localized to exclusively Gammaproteobacteria clusters. Homologs of mtrA from FeOB, such as the Gallionellaceae, form a broad cluster, which also contains the five mtrA homologs identified within the Brocéliande metatranscriptome. The mtrA homologs identified within this system do not fall within the cluster that contains the known Fe reductases from the Shewanellaceae (Pitts et al., 2003). In other words, most of the Brocéliande mtrA homologs are phylogenetically closer to the functionally characterized Fe oxidase MtoA from S. lithotrophicus ES-1 (Liu et al., 2012) than to the functionally characterized Fe reductase MtrA from S. oneidensis MR-1 (Pitts et al., 2003). This supports a role for these mtrA homologs as Fe oxidases in this system. The full MtrA phylogenetic tree is provided as Figure A2 in Appendix A.

Figure 3.5: Maximum-likelihood phylogenetic tree of mtr/mtoA homologs. Red stars indicate positions of mtoA homologs identified within the Brocéliande stream Fe mat metatranscriptome.

Expression levels of cyc2 homologs are similar to those of mtoA homologs in this ecosystem, in contrast to the orders-of-magnitude differences observed between mtoA and cyc2 in the Rifle aquifer metatranscriptome. Another prominent difference is that cyc2 expression in Brocéliande is at or below the median expression of the metatranscriptome. One possible reason for this is that the Brocéliande forest stream is not dominated by chemolithoautotrophic Gallionellaceae, as is the case for the microbial community of the Rifle aquifer (Jewell et al., 2016). Leptothrix, a suspected heterotrophic FeOB (Fleming et al., 2018), is the dominant organism within the Brocéliande forest stream (based on 16S rRNA amplicon data), but it does not encode any of the known putative Fe oxidases and may use a different mechanism for Fe oxidation. Therefore, Leptothrix may dominate Fe oxidation in this ecosystem, lowering the relative abundance of Gallionellaceae-related transcripts. Unlike the Rifle aquifer metatranscriptome, which was coupled to a metagenome, the Brocéliande Fe mat dataset lacks genomic reference libraries, which prevents us from mapping the reads to individual genomes and calculating expression levels in separate bins. Nevertheless, the detection of cyc2 transcripts affiliated with Gallionellaceae, which make up only 2-6% of the microbial community, demonstrates the potential of this gene to act as a reliable marker for chemolithoautotrophic Fe oxidation.

The presence of cyc2 transcripts only within the top layers of the Fe mat may suggest a niche preference of organisms encoding Cyc2. Geochemical data reported by Quaiser et al. (2014) indicate significantly different redox conditions between the “surface” and “depth” layers of the mat, evidenced by differences in redox potential and Fe(II) concentration (Figure 3.6). The surface samples have lower concentrations of dissolved Fe(II). While dissolved oxygen measurements indicate that oxygen levels in the surface samples are similar to those in the deep mat samples, the lower redox potential of the deep mat samples is consistent with a difference in oxygen concentration between the surface and deep samples, suggesting that the oxygen readings reported in that study may be inaccurate. The localization of cyc2 transcripts only within the surface samples may indicate a redox preference of the organisms that use this enzyme as an Fe oxidase. It is also possible that much of the active Fe oxidation occurs at the surface of the Fe mat.

Figure 3.6: Geochemical data (adapted from Quaiser et al., 2014) and schematic of the Brocéliande stream Fe mat, showing the localization of cyc2 and mtoA transcripts. Red stars indicate presence of cyc2 transcripts; green stars indicate presence of mtoA transcripts.

3.3.3 Genomic Co-occurrence of cyc2 and other EET enzymes

Figure 3.7: Venn diagram showing genomic co-occurrence of genes implicated in Fe-cycling metabolisms. This figure does not include the metagenome-assembled Gallionellaceae bins from the Rifle aquifer that encode cyc2 and mtoAB.

The co-occurrence of cyc2 and mtoA homologs within the genomes of known and suspected FeOB suggests that there may be some benefit to encoding both of these Fe oxidases. To explore whether mtoA and cyc2 commonly co-occur within the same organisms, I compiled all genomes encoding homologs of any of four proteins: Cyc2, MtoA, and two other proteins (Mto/MtrB and MtrC) that are thought to interact with MtrA in some FeRB. Organisms encoding a combination of two or more of these proteins were then identified.
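The tally behind a co-occurrence Venn diagram can be sketched as below. The sketch assumes a hypothetical input in which each genome has already been assigned the set of target proteins detected in it (for example, from BLAST or HMM searches); each exact combination of proteins corresponds to one region of the Venn diagram. The genome names and helper names are illustrative only.

```python
from collections import Counter

# The four proteins compared in the co-occurrence analysis.
TARGETS = frozenset({"Cyc2", "MtrA/MtoA", "MtrB/MtoB", "MtrC"})


def co_occurrence(genome_hits):
    """Summarize which target proteins co-occur in each genome.

    genome_hits: dict mapping a genome name to the set of target proteins
    detected in it (hypothetical input format).
    Returns (region_counts, multi):
      region_counts: Counter keyed by the exact protein combination
                     (one key per Venn-diagram region)
      multi:         sorted names of genomes encoding two or more targets
    """
    region_counts = Counter()
    multi = []
    for genome, hits in genome_hits.items():
        present = frozenset(hits) & TARGETS
        if not present:
            continue
        region_counts[present] += 1
        if len(present) >= 2:
            multi.append(genome)
    return region_counts, sorted(multi)


def count_with_both(genome_hits, a="Cyc2", b="MtrA/MtoA"):
    """Number of genomes that encode both protein a and protein b."""
    return sum(1 for hits in genome_hits.values() if a in hits and b in hits)
```

Keying the counts by a `frozenset` makes every Venn region explicit, so the same output can feed either a plotting package or a simple table.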

Eight isolates that encode homologs of both mto/mtrAB and cyc2 were identified (Figure 3.7); this does not include the three metagenome-assembled Gallionellaceae bins from the Rifle aquifer that encode cyc2 and mtoAB. Two of the eight isolates belong to the family Gallionellaceae and are closely related to the FeOB found within the Rifle aquifer and the Brocéliande stream mat (Quaiser et al., 2014; Jewell et al., 2016). One of the eight organisms, Rhodoferax ferrireducens, is a known Fe reducer (Finneran et al., 2003) that, in addition to cyc2 and mto/mtrAB, also encodes mtrC, an enzyme implicated solely in Fe reduction. Given that mtrC is generally found in the genomes of Fe-reducing Shewanellaceae and is required for Fe reduction (Myers and Myers, 2002), its presence in the genome of R. ferrireducens suggests that the MtrCAB-homologous complex it encodes functions in Fe reduction. However, phylogenetic comparison of the mtr/mtoA homolog from R. ferrireducens with homologs from known and suspected FeOB, as well as from known FeRB such as Shewanella, indicates that the R. ferrireducens mtr/mtoA is more closely related to homologs encoded by known and suspected FeOB (Figure 3.5). The identification of only eight genomes encoding both (out of 1,589 genomes that encode cyc2 or mtoA homologs) suggests that Cyc2 and Mtr/MtoA may be adapted to different niches. This is also consistent with the localization of cyc2 transcripts only within the surface Fe mat samples from the Brocéliande forest stream (homologs of mto/mtrA were identified in both surface and deep mat samples). The genomic co-occurrence of both Fe oxidases may be associated with environments where redox conditions fluctuate drastically, such as groundwater aquifers and Fe seeps; encoding both Fe oxidases may allow an organism to survive across a greater range of redox conditions. Another possibility is that MtoA and Cyc2 are optimized for different Fe(II) sources (e.g., soluble vs. insoluble), in which case organisms encoding both Fe oxidases may be exposed to multiple sources of Fe(II).

3.4 Conclusions

The high level of cyc2 expression in environments with active Fe oxidation supports the role of cyc2 in Fe oxidation. This gene represents one of the most highly abundant transcripts in ecosystems dominated by the marine obligate FeOB Zetaproteobacteria (from which the cyc2 homolog used in the E. coli expression work was taken), as well as by groundwater Gallionellaceae. At Loihi Seamount and the Rifle aquifer, these organisms represent the dominant FeOB, so if cyc2 encodes an Fe oxidase, it would be expected to be among the most highly expressed genes in these ecosystems. In the Brocéliande forest stream mats, where Gallionellaceae are not dominant but represent only a small fraction of the microbial community, cyc2 transcripts are still detected at levels comparable to the median expression of genes in that ecosystem. This also supports the potential role of cyc2 as a genetic marker for Fe oxidation in different ecosystems.

Chapter 4

INSIGHTS INTO DIVERSITY OF CYC2

4.1 Introduction

The previous chapter provided insights into the role of Cyc2 in natural ecosystems through the analysis of metatranscriptomic datasets. This chapter focuses on the in silico analysis of the cyc2 gene with respect to its sequence diversity. The structure and phylogeny of Cyc2 have the potential to offer insights into the evolution of this porin-cytochrome enzyme. To look for evidence of the gene fusion event that may have led to the evolution of Cyc2, I looked for variations of the porin-monoheme cytochrome structure among Cyc2 homologs, specifically focusing on short “standalone” cytochromes that are homologous to Cyc2 but lack the porin. Moreover, I used a custom-made HMM (the same one used to search the metatranscriptomic datasets in section 3.2) to search the UniProtKB database for more distantly related Cyc2 homologs.

4.2 Materials and Methods

4.2.1 Identification of Cyc2 and MtrCAB Homologs Within Sequence Databases

To identify Cyc2 and MtrCAB homologs, I used BLAST to search the NCBI non-redundant (nr) protein database, using as queries sequences collected from a phylogenetically diverse sample of organisms (Appendix: Table A1). Homologs were initially filtered with an e-value cutoff of 1E-5. These BLAST-identified homologs were then further scrutinized to remove sequences that were truncated or erroneously identified. For Cyc2 homologs, sequences shorter than 289 amino acids (i.e., the length of the shortest known 16-strand β-barrel porin; Tamm et al., 2004) and/or missing the canonical CxxCH heme-binding motif were excluded. MtrA and MtrC homologs with fewer than 9 or more than 11 heme-binding motifs were also filtered out. Cyc2 homologs were also identified within the UniProtKB database using a profile HMM built from a multiple sequence alignment (MSA) of Cyc2.
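The length and heme-motif filters described above can be expressed as a short Python sketch. The CxxCH motif is written as the regular expression `C..CH` inside a lookahead, so that overlapping motifs are each counted; the function names are illustrative and are not those of the thesis scripts.

```python
import re

MIN_CYC2_LENGTH = 289  # shortest known 16-strand beta-barrel porin (from the text)
# CxxCH: Cys, any two residues, Cys, His. The zero-width lookahead lets
# overlapping motifs each be counted.
HEME_MOTIF = re.compile(r"(?=C..CH)")


def count_heme_motifs(seq):
    """Number of CxxCH heme c-binding motifs in a protein sequence."""
    return sum(1 for _ in HEME_MOTIF.finditer(seq))


def keep_cyc2_hit(seq):
    """Cyc2 hit filter: long enough to encode a porin and has >= 1 CxxCH motif."""
    return len(seq) >= MIN_CYC2_LENGTH and count_heme_motifs(seq) >= 1


def keep_multiheme_hit(seq, min_hemes=9, max_hemes=11):
    """MtrA/MtrC hit filter: keep sequences with 9-11 CxxCH motifs."""
    return min_hemes <= count_heme_motifs(seq) <= max_hemes
```

Counting motifs, rather than only testing for presence, lets the same helper serve both the single-heme Cyc2 filter and the multiheme MtrA/MtrC filter.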

4.2.2 Multiple Sequence Alignments of Cyc2 and MtrA homologs

Multiple sequence alignments (MSAs) of Cyc2 and MtrA homologs were generated in MUSCLE v.3.8.31 and visualized using AliView v.3.0. The alignments were manually curated by trimming abnormally long sequences and correcting potential misalignments. The curated MSAs were then used to generate phylogenetic trees with the maximum-likelihood tree-building program RAxML v.8.2.8 (raxml-pthreads -x 2421 -f a -m PROTCATBLOSUM62 -p 748 -N 100). The resulting phylogenetic trees were visualized using FigTree v.1.4.3 and illustrated in Affinity Designer.

4.2.3 Cyc2 and MtrA Amino Acid Identity Analysis

Amino acid identity (AAI) was computed for pairwise-aligned Cyc2 and MtoA homologs. Additionally, AAI was calculated separately for the cytochrome and porin regions of Cyc2. The cytochrome portion of Cyc2 was defined as the region starting 14 residues upstream and ending 21 residues downstream of the heme-binding site (CxxCH). The rest of the sequence downstream of this region was defined as the β-barrel porin domain. Pairwise alignments were carried out using MUSCLE, and AAI was calculated using a custom Python script (provided in Appendix: Figure A5). The AAI values were imported into R and plotted as a histogram.
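The region split and the identity calculation can be sketched as follows. Two assumptions are made that the text does not pin down: the 21-residue downstream offset is counted from the final histidine of the first CxxCH motif, and identity is computed over alignment columns in which neither sequence has a gap. The sketch takes sequences that have already been aligned (e.g., by MUSCLE) and is not the Appendix script itself.

```python
import re

HEME_MOTIF = re.compile(r"C..CH")  # CxxCH heme c-binding motif


def split_cyc2(seq, upstream=14, downstream=21):
    """Split an unaligned Cyc2 sequence into (cytochrome, porin) regions.

    Assumes the first CxxCH motif is the heme-binding site and that the
    downstream offset is counted from the motif's final residue (H).
    """
    match = HEME_MOTIF.search(seq)
    if match is None:
        raise ValueError("no CxxCH heme-binding motif found")
    cyt_start = max(0, match.start() - upstream)
    cyt_end = min(len(seq), match.end() + downstream)
    return seq[cyt_start:cyt_end], seq[cyt_end:]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)
```

In this workflow, each region would be extracted from the unaligned sequences, re-aligned pairwise, and only then passed to `percent_identity`; other gap conventions (e.g., dividing by the shorter sequence length) give somewhat different values.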

4.2.4 Hidden Markov Model (HMM) Validation

To validate the HMMs used in this thesis, each HMM was used to search for homologs in the UniProtKB database. The results were parsed using a Python script, and the bit scores were extracted and plotted as a histogram in RStudio. For Cyc2, the results were separated based on the presence/absence of a single heme-binding motif, a minimum length of 289 amino acids, and bit score. I also visually examined each sequence for the presence of a conserved motif (Px[FY]AR[QK][TY]G, where x denotes any amino acid) that is characteristic of the Cyc2 cytochrome. This was done to determine the bit score threshold that best separates true Cyc2 homologs from erroneous matches, i.e., the cutoff that minimizes both false positives above it and false negatives below it. Erroneous matches are defined as those lacking a single heme-binding motif or shorter than 289 amino acids. Sequences that had the heme-binding site but lacked the conserved upstream motif characteristic of Cyc2 were also considered erroneous matches. A similar method was used to validate the MtrA/MtoA HMM, except that the HMM search results were separated based on the presence/absence of nine or ten heme-binding motifs and a length range (between 300 and 400 residues, based on examination of the MtrA homologs identified using BLAST and used for the MtrA phylogenetic tree). Because I was unaware of any conserved motifs characteristic of MtrA/MtoA, and because of the much greater number of identified homologs, I did not visually inspect the MtrA/MtoA HMM matches.
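The Cyc2 validation criteria can be approximated in code as below. This is an automated stand-in for the visual motif inspection described above, not the procedure actually used: it requires a length of at least 289 residues, exactly one CxxCH motif, and a match to `P.[FY]AR[QK][TY]G` (the regular-expression form of Px[FY]AR[QK][TY]G) upstream of that motif. Treating scores equal to the cutoff as "above" is an assumption.

```python
import re

MIN_LENGTH = 289                                # porin length cutoff used in the text
HEME_SITE = re.compile(r"(?=C..CH)")            # CxxCH; lookahead counts overlaps
CYC2_MOTIF = re.compile(r"P.[FY]AR[QK][TY]G")  # Px[FY]AR[QK][TY]G, x = any residue


def is_cyc2_like(seq):
    """Apply the three Cyc2 criteria: length, a single heme site, upstream motif."""
    heme_starts = [m.start() for m in HEME_SITE.finditer(seq)]
    if len(seq) < MIN_LENGTH or len(heme_starts) != 1:
        return False
    # endpos restricts the motif search to residues before the heme site
    return CYC2_MOTIF.search(seq, 0, heme_starts[0]) is not None


def tally(hits, cutoff=125.0):
    """Count true/erroneous matches on each side of a bit-score cutoff.

    hits: iterable of (sequence, bit_score) pairs from an HMM search.
    Scores >= cutoff are treated as "above" (an assumption).
    """
    counts = {"above_true": 0, "above_false": 0, "below_true": 0, "below_false": 0}
    for seq, score in hits:
        side = "above" if score >= cutoff else "below"
        kind = "true" if is_cyc2_like(seq) else "false"
        counts[f"{side}_{kind}"] += 1
    return counts
```

Running `tally` over a range of cutoffs would reproduce the trade-off shown in the bit-score histogram, with `above_false` and `below_true` playing the roles of false positives and false negatives.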


4.3 Results and Discussion

4.3.1 Cyc2 lateral gene transfer

The scattered distribution of Cyc2 homologs is not congruent with the distribution of phylogenetically informative marker genes, such as the 16S rRNA gene (Chan et al., 2018, bioRxiv; Hug et al., 2016). This suggests that cyc2 has undergone lateral gene transfer. This is also supported by the fact that the cyc2 gene is rarely found in operons associated with known electron transport chains (such as the rus operon of At. ferrooxidans). An exception is the localization of cyc2 next to cyc1, which encodes a periplasmic cytochrome found in the rus operon; cyc1 is found adjacent to cyc2 in two regions of the At. ferrooxidans genome and is co-transcribed with cyc2 in both operons (Appia-Ayme et al., 1998; Norris et al., 2018).

Table 4.1: Organisms that encode homologs of cyc1 and cyc2 adjacent to each other in their genomes. The accession number for each gene is listed, as well as the BLAST e-value. The query genes for BLAST consisted of the cyc1 and cyc2 homologs from At. ferrooxidans.

Homologs of cyc1 encoded adjacent to cyc2 have been identified in 50 other genomes (Table 4.1). It is possible that cyc1 and cyc2 have been laterally transferred together and function as redox partners in those organisms.
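Screening for cyc1 encoded immediately next to cyc2 can be sketched as a neighbor check over genes listed in genomic order. The input format (contig, gene ID, label) is hypothetical, the labels are assumed to come from an earlier homology search such as the BLAST search behind Table 4.1, and strand orientation and intergenic distance are ignored for simplicity.

```python
def adjacent_pairs(genes, target="cyc2", partner="cyc1"):
    """Find target genes whose immediate neighbour on the same contig is a partner.

    genes: list of (contig, gene_id, label) tuples in genomic order. Labels are
    assumed to come from a prior homology search (hypothetical input format);
    strand and intergenic distance are ignored in this simplified check.
    Returns a list of (target_gene_id, partner_gene_id) pairs.
    """
    pairs = []
    for i, (contig, gene_id, label) in enumerate(genes):
        if label != target:
            continue
        for j in (i - 1, i + 1):  # left and right neighbours
            if 0 <= j < len(genes):
                n_contig, n_id, n_label = genes[j]
                if n_contig == contig and n_label == partner:
                    pairs.append((gene_id, n_id))
    return pairs
```

Requiring the same contig prevents two genes that are merely adjacent in a concatenated gene list, but on different contigs, from being reported as a pair.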

4.3.2 Cyc2 tertiary structure

Cyc2 is an outer-membrane β-barrel porin with a monoheme cytochrome domain at the N-terminus, evidenced by a single canonical heme c binding site (CxxCH) (Figure 4.1, from Chan et al., 2018, bioRxiv). The rest of the sequence likely folds into 16 β-strands and contains no other conspicuous cofactor-binding sites. This porin-cytochrome topology likely reflects a fusion event between a small periplasmic c-type cytochrome and an outer-membrane porin. The single heme-binding motif is preceded by a relatively conserved motif ten residues upstream and followed by a conserved PxL motif five residues downstream, although some variations of these motifs exist among Cyc2 homologs. These conserved motifs may contain amino acids that interact with the redox-active heme. The β-barrel structure of the protein is supported by structural homology modeling, carried out using HHPRED, part of the Max Planck Institute (MPI) Bioinformatics Toolkit (Söding et al., 2005; Zimmermann et al., 2017). Sixteen of the full-length Cyc2 homologs identified by BLAST are structurally modeled on β-barrel porins (Table 4.2, from Chan et al., 2018, bioRxiv).

Figure 4.1: 3D model of Cyc2 generated using MODELLER, based on the structural homolog OprA, identified by HHPRED. Figure from Chan et al., 2018, bioRxiv.

Structural homologs of the N-terminal monoheme cytochrome are typically weaker matches than those of the C-terminal porin, with e-values greater than 1E-5 and probabilities below 95%.

Figure 4.2: Amino acid identities calculated from pairwise alignments of each gene, or portion of a gene.

Table 4.2: Results from HHPRED structural homology search, showing the closest structural homologs to a representative set of Cyc2 homologs (Chan et al., 2018, bioRxiv).

4.3.3 HMM Validation and Identification of Cyc2-related “standalone” cytochromes

While the Cyc2 cytochrome is relatively well conserved, most of the Cyc2 protein sequence consists of an outer-membrane porin (Figure 4.1). The generally low sequence conservation of porins (Nikaido, 2003) limits the diversity of Cyc2 homologs that can be identified through BLAST, which is sensitive to amino acid identity. The low average amino acid identity among full-length Cyc2 homologs (22%) is primarily caused by the porin, which makes up most of the sequence (Figure 4.2). In contrast, pairwise comparisons of only the cytochrome portions of the proteins yield an average amino acid identity of 44%. Uncovering more distant, but still recognizably Cyc2-related, cytochromes therefore requires a search method more sensitive than BLAST. Using a custom-made Cyc2 HMM (the same one used to search the metatranscriptomic datasets in section 3.2), I was able to expand the diversity of known Cyc2 homologs.

The Cyc2 HMM was validated by analyzing the results of a search of the UniProtKB database with this HMM. In addition to using a custom Python script (Appendix: Figure A4), I visually inspected each sequence for the presence of the conserved motif that lies upstream of the heme-binding site and is characteristic of Cyc2. This allowed me to manually determine the approximate bit score cutoff that yields the fewest false positives and false negatives: 125 (Figure 4.3). Above a bit score of 125, there are 746 HMM matches; of these, 716 were deemed to be bona fide Cyc2 homologs, based on length, the presence of a single heme-binding site, and the presence of the short conserved motif preceding the heme-binding site. The remaining 30 matches above this cutoff either 1) were too short to encode a porin, 2) did not encode a heme-binding motif, or 3) had the heme-binding site, but with an upstream motif so different from that of the Cyc2 cytochrome that claiming homology would be controversial; these false positives represent 4.0% of all matches above a bit score of 125. Below a bit score of 125, there were 1,606 HMM matches, of which 36 were related to Cyc2, suggesting a false negative rate of 2.2%.
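The two error rates follow directly from the match counts reported above (746 + 1,606 = 2,352 total matches), taking the false positive rate as the fraction of matches above the cutoff that are not Cyc2 homologs, and the false negative rate as the fraction of matches below the cutoff that are:

```latex
\begin{aligned}
\text{false positive rate} &= \frac{30}{746} \approx 0.040 = 4.0\% \\
\text{false negative rate} &= \frac{36}{1606} \approx 0.022 = 2.2\%
\end{aligned}
```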

I was unsuccessful in my attempt to validate the HMM derived from a curated alignment of representative MtrA/MtoA sequences. In addition to there being substantially more HMM matches from UniProtKB than for Cyc2 (18,602 vs. 2,352), I could not identify a bit score that accurately separated true homologs from background noise. There may not be enough information available on MtrA/MtoA (e.g., characteristic conserved motifs) to do so; thus, in the analysis of metatranscriptomic datasets, I used the MtrA HMM as a discovery tool and manually aligned each match with representative MtrA/MtoA sequences to confirm that it was a true homolog.


Figure 4.3: Histogram of bit scores corresponding to Cyc2 HMM matches identified within the UniProtKB database. The vertical line denotes the proposed bit score cutoff for the HMM. *Cyc2 homologs are those matches whose sequences 1) contain the heme-binding motif (CxxCH), 2) have a conserved motif upstream of the heme-binding site, and 3) are long enough to encode a 16-strand β-barrel porin (Tamm et al., 2005).

HMM matches that were shorter than 289 residues but appeared related to Cyc2 were further scrutinized to rule out truncated genes at the ends of contigs and frame-shift errors. I identified 16 homologs of the Cyc2 cytochrome that are standalone cytochromes, too short to encode a porin. A subset of these is shown in Figure 4.4. These short, cytochrome-only Cyc2 homologs include the 200-250 aa Cyc2 homologs from the family Geobacteraceae, as well as homologs from Leptospirillum, Paraburkholderia, and candidate phyla. In addition, two standalone cytochromes, approximately 100 amino acids in length, were identified in two Zetaproteobacteria bins from Crystal Geyser (Probst et al., 2016).
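One way to automate part of the truncation check is sketched below, assuming the proteins were predicted with Prodigal, whose FASTA headers carry a `partial` field (`00` means both gene ends are true boundaries; a `1` means the gene runs off that contig edge). This covers only contig-edge truncation, not frame-shift errors, and the helper names are hypothetical.

```python
def prodigal_partial_flag(header):
    """Return the value of the `partial` field in a Prodigal FASTA header, or None."""
    for field in header.split(";"):
        key, sep, value = field.strip().partition("=")
        if sep and key == "partial":
            return value
    return None


def is_complete_gene(header):
    """True when Prodigal reports true start and stop boundaries (partial=00)."""
    return prodigal_partial_flag(header) == "00"


def short_complete_candidates(records, min_porin_len=289):
    """Headers of complete genes whose protein is too short to encode a porin.

    records: iterable of (header, protein_sequence); a trailing '*' (stop) is ignored.
    """
    return [
        header
        for header, seq in records
        if len(seq.rstrip("*")) < min_porin_len and is_complete_gene(header)
    ]
```

A short cytochrome that passes this check still has to be inspected for frame shifts, for example by examining the surrounding nucleotide sequence as in Figure 4.4.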


Figure 4.4: Gene predictions (Prodigal v.2.6.3) for six of the 19 short “cytochrome-only” Cyc2 homologs, demonstrating the presence of potential start and stop codons delimiting each gene; stop codons are indicated by asterisks.

The 16 standalone Cyc2 cytochromes were aligned with a representative set of 77 Cyc2 homologs from the Cyc2 phylogenetic tree (Chan et al., 2018, bioRxiv). The alignment was trimmed to remove the porin region of the full-length Cyc2 homologs, resulting in an alignment of 42 positions (Figure 4.5). The alignment with sequence information is shown in Appendix Figure A6.


Figure 4.5: Alignment of 16 “standalone” Cyc2 cytochromes against 17 cytochromes from the representative set of 77, which were derived from full-length Cyc2 homologs. The alignment was carried out in MUSCLE and manually curated.

To determine whether the short standalone Cyc2-related cytochromes form a monophyletic cluster when placed in a tree with other Cyc2 homologs, a phylogenetic tree was constructed using RAxML (Figure 4.6). The tree contains three clusters similar to those observed in the Cyc2 tree from Chan et al. (2018, bioRxiv). The short standalone cytochromes (shown as gray stars superimposed on branches) are paraphyletic with respect to the bona fide Cyc2 homologs and are present in all three clusters of the Cyc2 tree.

Figure 4.6: Cyc2 phylogenetic tree of 77 cytochromes derived from a representative set of Cyc2 homologs from Chan et al. (2018, bioRxiv), as well as 16 cytochromes derived from HMM-identified Cyc2 homologs lacking the porin, which are denoted by gray stars.

4.4 Conclusions

The structure of Cyc2 (porin-cytochrome) is suggestive of its role as an Fe oxidase. This protein likely evolved as a fusion of a porin and a c-type cytochrome.

While the protein is generally poorly conserved (due to the porin making up most of the sequence), the cytochrome domain is relatively well conserved. A validated Cyc2 HMM was used to identify more distant homologs of Cyc2, focusing particularly on homologs that consist of a short “standalone” cytochrome without the porin. The identified short cytochromes were paraphyletic within the Cyc2 phylogenetic tree. The presence of short “standalone” Cyc2 cytochromes in all three clusters is suggestive of multiple gene-fission events, in which the cytochrome split off from the porin and now functions independently, perhaps as a soluble periplasmic electron carrier. It is possible that such a cytochrome still interacts with a porin to achieve Fe oxidation. It is also possible that one of the “standalone” Cyc2 cytochromes represents the ancestral state before the gene-fusion event (perhaps one of the relatively long branches within cluster 3). While UniProtKB is one of the largest curated protein databases, it is limited. Sequence repositories (such as IMG and NCBI) are currently being inundated with metagenomic and metatranscriptomic datasets; exploring these datasets may allow the identification of more distant relatives of Cyc2 and provide more clues to the origin of this widespread Fe oxidase.

Chapter 5

PRELIMINARY WORK WITH SIDEROXYDANS LITHOTROPHICUS ES-1: CYC2 EXPRESSION UNDER GROWTH ON THIOSULFATE AND FE(II)

5.1 Introduction

Sideroxydans lithotrophicus ES-1 encodes more than one putative Fe oxidase. In addition to a Cyc2 homolog, which clusters phylogenetically with Cyc2 homologs from other known neutrophilic microaerophilic FeOB, ES-1 also encodes homologs of MtrA and MtrB. The MtrA homolog in ES-1 (named MtoA) has been characterized as an Fe oxidase in this organism (Liu et al., 2012). However, given the growing evidence for Cyc2 as an Fe oxidase, it is necessary to understand the role of each Fe oxidase in this organism. Given the ability of ES-1 to grow on either ferrous iron or thiosulfate as its sole energy source, we set out to measure the expression of cyc2 during growth on each of these substrates. The objective was to detect differential expression of cyc2 between the two growth substrates. While we have thus far been unsuccessful in obtaining RNA from Fe-grown cultures, we present preliminary data obtained from RNA extracted from thiosulfate-grown cells.

5.2 Methods

5.2.1 Culturing of Sideroxydans lithotrophicus ES-1

ES-1 was grown in Modified Wolfe’s Minimal Medium (MWMM), also known as artificial freshwater (AFW) (Emerson and Floyd, 2005). The medium was prepared as follows: 100 mL of AFW was aliquoted into 150 mL serum bottles, sparged with nitrogen gas for 30 min, sealed with butyl rubber stoppers, and autoclaved (121˚C, 50 min). After sterilization, the headspace of each serum bottle was sparged with nitrogen gas for an additional 5 min. Upon cooling, each bottle was supplemented with 2 mL of 1 M sodium bicarbonate, and the pH was adjusted to 6.2-6.4 by bubbling with carbon dioxide gas. Each serum bottle was also supplemented (at a 1:1000 ratio) with ATCC® vitamins (MD-VS™) and minerals (MD-TMS™). The headspace of each culture was replenished daily with 3% oxygen and 20% carbon dioxide (balanced with nitrogen). Growth on Fe was sustained by daily additions of FeCl2 to a concentration of 500 µM Fe(II), while the S-oxidizing cultures received a one-time addition of 5 mM sodium thiosulfate. For scaled-up cultures, ES-1 was inoculated into 250 mL of MWMM (in 750 mL glass bottles sealed with butyl rubber stoppers), prepared and maintained the same way as the 100 mL cultures. Growth was monitored daily by counting Syto13-stained culture samples.
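
The concentrations in this recipe follow from simple mixing arithmetic (C1V1 = C2V2, accounting for the volume that is added). The sketch below is a minimal illustration; the 100 mM FeCl2 stock is a hypothetical value, since the stock concentration is not stated here, and the Fe(II) calculation assumes that none of the previous day's Fe(II) remains in solution.

```python
def final_concentration(stock_conc, stock_vol, base_vol):
    """Concentration after mixing stock_vol of a stock into base_vol of medium.

    Concentrations share one unit (e.g. mM); volumes share another (e.g. mL).
    """
    return stock_conc * stock_vol / (base_vol + stock_vol)


def stock_volume_needed(stock_conc, target_conc, base_vol):
    """Volume of stock to add to base_vol so that the mixture reaches target_conc."""
    if target_conc >= stock_conc:
        raise ValueError("target concentration must be below the stock concentration")
    # Solve target = stock * v / (base + v) for v.
    return target_conc * base_vol / (stock_conc - target_conc)


if __name__ == "__main__":
    bicarbonate_mM = final_concentration(1000.0, 2.0, 100.0)  # 2 mL of 1 M into 100 mL
    fecl2_mL = stock_volume_needed(100.0, 0.5, 100.0)  # hypothetical 100 mM stock -> 500 µM
    print(f"NaHCO3: {bicarbonate_mM:.1f} mM; FeCl2 stock to add: {fecl2_mL:.3f} mL")
```

Under these assumptions, 2 mL of 1 M bicarbonate in 100 mL of AFW gives about 19.6 mM, and roughly 0.5 mL of a 100 mM FeCl2 stock brings 100 mL of medium to 500 µM Fe(II).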

5.2.2 Ferrozine measurements

Dissolved Fe2+ was measured using the ferrozine assay (adapted from Stookey, 1970). At selected time points, 150 µL culture samples were combined with 40 µL of 1.225 mM ferrozine, 50 µL of 6.87 M acetate buffer, and 10 µL of H2O in 200 µL non-treated 96-well microplates and incubated for 15 min in the dark before absorbance was measured at 562 nm using a plate reader (Perkin Elmer 1420 Multilabel Counter Victor3V).
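
Converting the 562 nm readings into Fe(II) concentrations requires a standard curve. The sketch below fits an ordinary least-squares line to Fe(II) standards and inverts it for unknown samples. It assumes the standards were run with the same 150 µL sample volume and reagent additions as the culture samples (so no dilution correction is applied) and that absorbance is linear over the range; any standard concentrations and absorbances fed to it here are hypothetical. The fit is written out by hand so that it runs on any Python 3 version.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def fe2_from_absorbance(a562, slope, intercept):
    """Fe(II) concentration (in the standards' unit, e.g. µM) from an A562 reading."""
    return (a562 - intercept) / slope


if __name__ == "__main__":
    standards_uM = [0, 50, 100, 250, 500]            # hypothetical Fe(II) standards
    standards_a562 = [0.02, 0.07, 0.12, 0.27, 0.52]  # hypothetical readings
    slope, intercept = fit_line(standards_uM, standards_a562)
    print(f"sample Fe(II): {fe2_from_absorbance(0.27, slope, intercept):.1f} µM")
```

Fitting the intercept, rather than forcing the line through zero, absorbs the reagent blank into the calibration.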

5.2.3 Culture harvesting, DNA/RNA extraction, and molecular analyses

At selected time points, cultures were filtered through 0.2 µm nylon filters, and the collected cells were resuspended in 350 µL of MDB buffer and transferred to Lysing Matrix E tubes for RNA purification using the NucleoSpin RNA kit (Macherey-Nagel), following the manufacturer’s instructions; extractions from FeCl2-supplemented cultures were amended with 5 mM sodium citrate after resuspension in MDB buffer. DNA was extracted from late-log phase thiosulfate- and Fe-grown cultures using the FastDNA SPIN Kit for Soil (MP Biomedicals), following the manufacturer’s instructions. To detect cyc2 transcripts, primers were designed for each of the three cyc2 homologs encoded in S. lithotrophicus ES-1. Primer design was carried out by Sean McAllister and Andrew Currie. The primers were optimized to yield a 90 bp amplification product, which is within the recommended size range for quantitative PCR. The primers amplify the region of cyc2 that encodes part of the porin, as this was the only region with enough sequence variability among the three cyc2 homologs to allow gene-specific amplification. The primers were tested on ES-1 DNA to confirm amplification and specificity. To test intact RNA extracted from ES-1 for the presence of cyc2 transcripts, total RNA was DNase-treated and reverse transcribed with the random primers included in the Invitrogen SuperScript IV VILO reverse transcriptase kit. The reverse-transcribed RNA, along with non-reverse-transcribed RNA as a control, was used as template in PCR reactions with primer sets 1 and 2. The PCR program consisted of an initial denaturation at 95˚C for 5 min, followed by 30 amplification cycles of denaturation at 94˚C for 1 min, annealing at 59˚C for 1.5 min, and elongation at 72˚C for 2 min; the cycles were followed by a final incubation at 72˚C for 7 min, after which the products were held at 4˚C. The PCR products were then separated on a 1% agarose gel stained with ethidium bromide and imaged.
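
The thermocycling program above can be written down as data, which makes it easy to check the total run time. This is a small sketch; the Step class and program_minutes helper are illustrative names, and ramp times between temperatures and the final 4˚C hold are ignored, so a real run will take somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thermocycler hold: a name, a temperature (˚C) and a duration (min)."""
    name: str
    temp_c: float
    minutes: float


# Program as described in the Methods text.
INITIAL = [Step("initial denaturation", 95, 5)]
CYCLE = [Step("denaturation", 94, 1), Step("annealing", 59, 1.5), Step("elongation", 72, 2)]
FINAL = [Step("final elongation", 72, 7)]


def program_minutes(initial, cycle, n_cycles, final):
    """Total hold time in minutes, ignoring ramp times and the indefinite 4 ˚C hold."""
    return (sum(s.minutes for s in initial)
            + n_cycles * sum(s.minutes for s in cycle)
            + sum(s.minutes for s in final))


if __name__ == "__main__":
    print(program_minutes(INITIAL, CYCLE, 30, FINAL), "min")
```

For the program described above this gives 147 minutes (about 2.5 h) of hold time.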

Figure 5.1: Growth curves of ES-1 on Fe(II) and thiosulfate.

5.3 Results

5.3.1 Growth of Sideroxydans lithotrophicus ES-1

The growth rates of ES-1 on Fe(II) and thiosulfate were similar (Figure 5.1), with both cultures reaching stationary phase approximately 10 days after inoculation at a density of 10⁶ cells/mL, consistent with findings from Emerson and Moyer (1997). Cell density of both cultures at stationary phase was consistently above 10⁸ cells/mL, with visible turbidity in thiosulfate-grown cultures. While Fe(III) oxides were prevalent in the Fe-grown cultures, thiosulfate oxidation by ES-1 yielded no visible elemental sulfur, suggesting that thiosulfate was completely oxidized to sulfate.
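
One standard way to put a number on “similar growth rates” is the specific growth rate, mu = ln(N2/N1) / (t2 - t1), and the corresponding doubling time, ln 2 / mu, computed from two cell counts taken during exponential growth. The sketch below assumes growth is exponential between the two time points; the counts in the usage example are hypothetical, not values read from Figure 5.1.

```python
import math


def specific_growth_rate(n1, n2, t1, t2):
    """Specific growth rate (per unit time) from counts n1 at t1 and n2 at t2, assuming exponential growth."""
    if n1 <= 0 or n2 <= 0 or t2 <= t1:
        raise ValueError("counts must be positive and t2 must be later than t1")
    return math.log(n2 / n1) / (t2 - t1)


def doubling_time(mu):
    """Doubling time for a specific growth rate mu (same time unit as mu)."""
    return math.log(2) / mu


if __name__ == "__main__":
    mu = specific_growth_rate(1e6, 8e6, 2.0, 5.0)  # hypothetical Syto13 counts on days 2 and 5
    print(f"mu = {mu:.3f} per day, doubling time = {doubling_time(mu):.2f} days")
```

For example, counts rising from 1 × 10⁶ to 8 × 10⁶ cells/mL over 3 days correspond to three doublings: mu is about 0.69 per day and the doubling time is 1 day.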

Figure 5.2: Gel of cyc2 amplification from ES-1 DNA. Bands of the expected size (90-100 bp) are present in lanes A-F. Lanes G-L are extraction and PCR negative controls.


5.3.2 Amplification of cyc2 from ES-1 DNA

DNA was extracted from late-log phase ES-1 cultures grown on Fe(II) and thiosulfate. Amplification with all three primer sets yielded clear bands of the expected size (Figure 5.2). Amplification of DNA from Fe-grown ES-1 cultures with the third primer set yielded an additional product of approximately 1000 bp. This product may result from one of the primers binding to a different cyc2 homolog: because all three cyc2 homologs are encoded in tandem within the ES-1 genome, such off-target priming would produce an amplicon of approximately 1000 bp, matching the size of the observed extraneous band.
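
The explanation for the extraneous band can be checked in silico by listing every product a primer pair could form on a template. The sketch below is a simplified check that considers only exact matches on one strand (real off-target priming often tolerates mismatches), and the primers and template in the usage example are toy sequences, not the ES-1 cyc2 primers or genome.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(template, site):
    """Start positions of every exact (possibly overlapping) occurrence of site."""
    return [i for i in range(len(template) - len(site) + 1) if template.startswith(site, i)]


def predicted_products(template, fwd, rev):
    """Sizes (bp) of products formed by exact forward-primer matches and exact
    reverse-primer binding sites lying further downstream on the same strand."""
    template, fwd = template.upper(), fwd.upper()
    rev_site = reverse_complement(rev.upper())
    fwd_starts = find_all(template, fwd)
    rev_ends = [i + len(rev_site) for i in find_all(template, rev_site)]
    # Keep pairs where the reverse site lies entirely downstream of the forward primer.
    return sorted(end - start for start in fwd_starts for end in rev_ends
                  if end - len(rev_site) >= start + len(fwd))


if __name__ == "__main__":
    fwd, rev = "ACGTTGCA", "GGATCCTT"  # toy primers
    site = reverse_complement(rev)
    template = "A" * 10 + fwd + "A" * 74 + site + "A" * 900 + site + "A" * 10
    print(predicted_products(template, fwd, rev))
```

A toy template with one forward-primer site and two downstream reverse-primer sites returns two sizes: the intended short amplicon and a longer product spanning the intervening sequence, analogous to priming into an adjacent tandem homolog.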

Figure 5.3: Gel of cyc2 amplification from reverse-transcribed RNA (cDNA) extracted from ES-1 grown on thiosulfate and Fe(II). The expected size of each band is between 90 and 100 bp. Faint bands are visible in lanes 1-2 (white arrow), corresponding to cDNA from thiosulfate-grown ES-1.

5.3.3 Amplification of cyc2 from ES-1 RNA

RNA purification from late-log phase ES-1 cultures yielded intact RNA only from thiosulfate-grown cultures. Although the amount of biomass in thiosulfate- and Fe-grown cultures is similar, the Fe oxides may have interfered with the RNA purification procedure and/or promoted RNA degradation. Intact RNA from thiosulfate-grown cultures was tested for cyc2 transcripts by reverse transcribing the RNA and running PCR on the resulting cDNA. Faint bands in the expected size range of the cyc2 amplification product (90 bp) were detected in thiosulfate-grown ES-1 cDNA (Figure 5.3). These bands were absent from the PCR reagent controls and from the non-reverse-transcribed ES-1 RNA, indicating that they did not arise from contaminating genomic DNA.

5.4 Conclusions

The presence of cyc2 transcripts in thiosulfate-grown ES-1 cells may indicate that this gene is constitutively expressed. However, without transcripts from Fe-grown cells for comparison, it is difficult to tell whether the detected transcripts represent a high or low level of expression. ES-1 may maintain a basal level of Cyc2 expression and upregulate it when Fe(II) is encountered. Conversely, limitation or absence of Fe(II) may stimulate Cyc2 expression. More work is necessary to address these possibilities and constrain the role of Cyc2 in the metabolism of ES-1. Measuring differential expression of cyc2 and mtoA during growth on Fe and thiosulfate will likely provide valuable insights into the importance of each Fe oxidase to the metabolism of ES-1. RT-qPCR and transcriptomics on cultures switched between S and Fe metabolisms will also yield insights into the regulation of ES-1’s putative Fe oxidases and will likely improve understanding of metabolic adaptations to multiple energy sources in chemolithoautotrophs.
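
Once RNA from Fe-grown cultures is available, the RT-qPCR comparison proposed above is commonly analyzed with the 2^-ΔΔCt (Livak) method, which normalizes the target gene (for example cyc2 or mtoA) to a reference gene and then compares the two growth conditions. This is a minimal sketch; it assumes near-100% amplification efficiency for both primer sets and a reference gene that is stably expressed on both substrates (no reference gene is identified here), and the Ct values in the usage example are hypothetical.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene (test vs. control condition) by the 2^-ΔΔCt method."""
    delta_test = ct_target_test - ct_ref_test  # ΔCt in the test condition (e.g. Fe-grown)
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # ΔCt in the control condition (e.g. thiosulfate-grown)
    ddct = delta_test - delta_ctrl
    return 2 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: cyc2 and a reference gene, Fe-grown vs. thiosulfate-grown cells.
    print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))
```

For example, hypothetical Ct values of 24 (cyc2) and 18 (reference) in Fe-grown cells versus 26 and 18 in thiosulfate-grown cells give ΔΔCt = -2, i.e., 4-fold higher relative cyc2 expression on Fe(II). If primer efficiencies differ substantially, an efficiency-corrected model would be more appropriate.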

REFERENCES

Appia-Ayme, C., Bengrine, A., Cavazza, C., Giudici-Orticoni, M. T., Bruschi, M., Chippaux, M., & Bonnefoy, V. (1998). Characterization and expression of the co-transcribed cyc1 and cyc2 genes encoding the cytochrome c4 (c552) and a high-molecular-mass cytochrome c from Thiobacillus ferrooxidans ATCC 33020. FEMS Microbiology Letters, 167(2), 171–177. https://doi.org/10.1016/S0378-1097(98)00385-1

Barco, R. A., & Edwards, K. J. (2014). Interactions of proteins with biogenic iron oxyhydroxides and a new culturing technique to increase biomass yields of neutrophilic, iron-oxidizing bacteria. Frontiers in Microbiology, 5, 1–11. https://doi.org/10.3389/fmicb.2014.00259

Barco, R. A., Emerson, D., Sylvan, J. B., Orcutt, B. N., Jacobson Meyers, M. E., Ramírez, G. A., … Edwards, K. J. (2015). New insight into microbial iron oxidation as revealed by the proteomic profile of an obligate iron-oxidizing chemolithoautotroph. Applied and Environmental Microbiology, 81(17), 5927–5937. https://doi.org/10.1128/AEM.01374-15

Barco, R. A., Hoffman, C. L., Ramírez, G. A., Toner, B. M., Edwards, K. J., & Sylvan, J. B. (2017). In-situ incubation of iron-sulfur mineral reveals a diverse chemolithoautotrophic community and a new biogeochemical role for Thiomicrospira. Environmental Microbiology, 19(3), 1322–1337. https://doi.org/10.1111/1462-2920.13666

Beckwith, C. R., Edwards, M. J., Lawes, M., Shi, L., Butt, J. N., Richardson, D. J., & Clarke, T. A. (2015). Characterization of MtoD from Sideroxydans lithotrophicus: A cytochrome c electron shuttle used in lithoautotrophic growth. Frontiers in Microbiology, 6. https://doi.org/10.3389/fmicb.2015.00332

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120. https://doi.org/10.1093/bioinformatics/btu170

Bose, A., & Newman, D. K. (2011). Regulation of the phototrophic iron oxidation (pio) genes in Rhodopseudomonas palustris TIE-1 is mediated by the global regulator, FixK. Molecular Microbiology, 79(1), 63–75. https://doi.org/10.1111/j.1365-2958.2010.07430.x

Buschmann, S., Warkentin, E., Xie, H., Langer, J. D., Ermler, U., & Michel, H. (2010). The Structure of cbb3 Cytochrome Oxidase Provides Insights into Proton Pumping. Science, 329(5989), 327–330. https://doi.org/10.1126/science.1187303

Carlson, H. K., Iavarone, A. T., Gorur, A., Yeo, B. S., Tran, R., Melnyk, R. A., … Coates, J. D. (2012). Surface multiheme c-type cytochromes from Thermincola potens and implications for respiratory metal reduction by Gram-positive bacteria. Proceedings of the National Academy of Sciences, 109(5), 1702–1707. https://doi.org/10.1073/pnas.1112905109

Yoshikawa, S., & Caughey, W. S. (1990). Infrared evidence of cyanide binding to iron and copper sites in bovine heart cytochrome c oxidase. The Journal of Biological Chemistry, 265(14).

Darling, A. E., Jospin, G., Lowe, E., Matsen, F. A., Bik, H. M., & Eisen, J. A. (2014). PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ, 2, e243. https://doi.org/10.7717/peerj.243

Druschel, G. K., Emerson, D., Sutka, R., Suchecki, P., & Luther, G. W., III. (2008). Low-oxygen and chemical kinetic constraints on the geochemical niche of neutrophilic iron(II) oxidizing microorganisms. Geochimica et Cosmochimica Acta, 72, 3358–3370.

du Plessis, L., Škunca, N., & Dessimoz, C. (2011). The what, where, how and why of gene ontology-A primer for bioinformaticians. Briefings in Bioinformatics, 12(6), 723–735. https://doi.org/10.1093/bib/bbr002

Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. https://doi.org/10.1093/nar/gkh340

Emerson, D., Field, E. K., Chertkov, O., Davenport, K. W., Goodwin, L., Munk, C., … Woyke, T. (2013). Comparative genomics of freshwater Fe-oxidizing bacteria: Implications for physiology, ecology, and systematics. Frontiers in Microbiology, 4, 1–17. https://doi.org/10.3389/fmicb.2013.00254

Emerson, D., & Floyd, M. M. (2005). Enrichment and isolation of iron-oxidizing bacteria at neutral pH. Methods in Enzymology, 397, 112–123. https://doi.org/10.1016/S0076-6879(05)97006-7

Emerson, D., & Moyer, C. L. (1997). Isolation and characterization of novel iron-oxidizing bacteria that grow at circumneutral pH. Applied and Environmental Microbiology, 63(12), 4784–4792.

Emerson, J. B., Thomas, B. C., Alvarez, W., & Banfield, J. F. (2016). Metagenomic analysis of a high carbon dioxide subsurface microbial community populated by chemolithoautotrophs and bacteria and archaea from candidate phyla. Environmental Microbiology, 18(6), 1686–1703. https://doi.org/10.1111/1462-2920.12817

Field, E. K., Sczyrba, A., Lyman, A. E., Harris, C. C., Woyke, T., Stepanauskas, R., & Emerson, D. (2014). Genomic insights into the uncultivated marine Zetaproteobacteria at Loihi Seamount. The ISME Journal, 9(4), 857–870. https://doi.org/10.1038/ismej.2014.183

Fischer, F., Künzler, P., Ritz, D., & Hennecke, H. (1995). Escherichia coli genes required for cytochrome c maturation. Journal of Bacteriology, 177(15), 4321–4326.

Fleming, E. J., Woyke, T., Donatello, A. R., Kuypers, M. M. M., Sczyrba, A., Littmann, S., & Emerson, D. (2018). Insights into the fundamental physiology of the uncultured Fe-oxidizing bacterium Leptothrix ochracea. Applied and Environmental Microbiology, AEM.02239-17. https://doi.org/10.1128/AEM.02239-17

Giovannetti, R. (2012). The Use of Spectrophotometry UV-Vis for the Study of Porphyrins. Macro To Nano Spectroscopy. https://doi.org/10.5772/38797

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., … Regev, A. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8(8), 1494–1512. https://doi.org/10.1038/nprot.2013.084

Hazra, T. K., Mukherjea, M., & Mukherjea, R. N. (1992). Role of rusticyanin in the electron transport process in Thiobacillus ferrooxidans. Indian Journal of Biochemistry and Biophysics, 29(1), 77–81.

Hirai, T., Osamura, T., Ishii, M., & Arai, H. (2016). Expression of multiple cbb3 cytochrome c oxidase isoforms by combinations of multiple isosubunits in Pseudomonas aeruginosa. Proceedings of the National Academy of Sciences, 113(45), 12815–12819. https://doi.org/10.1073/pnas.1613308113

Hug, L. A., Baker, B. J., Anantharaman, K., Brown, C. T., Probst, A. J., Castelle, C. J., … Banfield, J. F. (2016). A new view of the tree of life. Nature Microbiology, 1(5), 1–6. https://doi.org/10.1038/nmicrobiol.2016.48

Hyatt, D., Chen, G. L., LoCascio, P. F., Land, M. L., Larimer, F. W., & Hauser, L. J. (2010). Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11. https://doi.org/10.1186/1471-2105-11-119

Ilbert, M., & Bonnefoy, V. (2013). Insight into the evolution of the iron oxidation pathways. Biochimica et Biophysica Acta - Bioenergetics, 1827(2), 161–175. https://doi.org/10.1016/j.bbabio.2012.10.001

Jeans, C., Singer, S. W., Chan, C. S., VerBerkmoes, N. C., Shah, M., Hettich, R. L., … Thelen, M. P. (2008). Cytochrome 572 is a conspicuous membrane protein with iron oxidation activity purified directly from a natural acidophilic microbial community. The ISME Journal, 2(5), 542–550. https://doi.org/10.1038/ismej.2008.17

Jewell, T. N. M., Karaoz, U., Brodie, E. L., Williams, K. H., & Beller, H. R. (2016). Metatranscriptomic evidence of pervasive and diverse chemolithoautotrophy relevant to C, S, N and Fe cycling in a shallow alluvial aquifer. ISME Journal, 10(9), 2106–2117. https://doi.org/10.1038/ismej.2016.25

Jiao, Y., & Newman, D. K. (2007). The pio operon is essential for phototrophic Fe(II) oxidation in Rhodopseudomonas palustris TIE-1. Journal of Bacteriology, 189(5), 1765–1773. https://doi.org/10.1128/JB.00776-06

Kato, S., Ohkuma, M., Powell, D. H., Krepski, S. T., Oshima, K., Hattori, M., … Chan, C. S. (2015). Comparative genomic insights into ecophysiology of neutrophilic, microaerophilic iron oxidizing bacteria. Frontiers in Microbiology, 6, 1–16. https://doi.org/10.3389/fmicb.2015.01265

Kubicek, J., Block, H., Maertens, B., Spriestersbach, A., & Labahn, J. (2014). Expression and purification of membrane proteins. Methods in Enzymology, 541, 117–140. https://doi.org/10.1016/B978-0-12-420119-4.00010-0

Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359. https://doi.org/10.1038/nmeth.1923

Leang, C., Qian, X., Mester, T., & Lovley, D. R. (2010). Alignment of the c-type cytochrome OmcS along pili of Geobacter sulfurreducens. Applied and Environmental Microbiology, 76(12), 4080–4084. https://doi.org/10.1128/AEM.00023-10

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352

Liu, J., Wang, Z., Belchik, S. M., Edwards, M. J., Liu, C., Kennedy, D. W., … Shi, L. (2012). Identification and characterization of MtoA: A decaheme c-type cytochrome of the neutrophilic Fe(II)-oxidizing bacterium Sideroxydans lithotrophicus ES-1. Frontiers in Microbiology, 3, 1–11. https://doi.org/10.3389/fmicb.2012.00037

Malarte, G., Leroy, G., Lojou, E., Abergel, C., Bruschi, M., & Giudici-Orticoni, M. T. (2005). Insight into molecular stability and physiological properties of the diheme cytochrome CYC41 from the acidophilic bacterium Acidithiobacillus ferrooxidans. Biochemistry, 44(17), 6471–6481. https://doi.org/10.1021/bi048425b

Merkley, E. D., Wrighton, K. C., Castelle, C. J., Anderson, B. J., Wilkins, M. J., Shah, V., … Lipton, M. S. (2015). Changes in protein expression across laboratory and field experiments in Geobacter bemidjiensis. Journal of Proteome Research, 14(3), 1361–1375. https://doi.org/10.1021/pr500983v

Modi, N., Ganguly, S., Bárcena-Uribarri, I., Benz, R., Van Den Berg, B., & Kleinekathöfer, U. (2015). Structure, dynamics, and substrate specificity of the OprO porin from Pseudomonas aeruginosa. Biophysical Journal, 109(7), 1429–1438. https://doi.org/10.1016/j.bpj.2015.07.035

Myers, C. R., & Myers, J. M. (2002). MtrB is required for proper incorporation of the cytochromes OmcA and OmcB into the outer membrane of Shewanella putrefaciens MR-1. Applied and Environmental Microbiology, 68(11), 5585–5594. https://doi.org/10.1128/AEM.68.11.5585-5594.2002

Neubauer, S., Emerson, D., & Megonigal, J. P. (2002). Life at the energetic edge: kinetics of circumneutral iron oxidation by lithotrophic iron-oxidizing bacteria isolated from the wetland-plant rhizosphere. Applied and Environmental Microbiology, 68(8). https://doi.org/10.1128/AEM.68.8.3988

Nikaido, H. (2003). Molecular basis of bacterial outer membrane permeability revisited. Microbiology and Molecular Biology Reviews, 67, 593–656.

Norris, P. R., Laigle, L., & Slade, S. (2018). Cytochromes in anaerobic growth of Acidithiobacillus ferrooxidans. Microbiology, 164(3), 383–394. https://doi.org/10.1099/mic.0.000616

Osorio, H., Mangold, S., Denis, Y., Ñancucheo, I., Esparza, M., Johnson, D. B., … Holmes, D. S. (2013). Anaerobic sulfur metabolism coupled to dissimilatory iron reduction in the extremophile Acidithiobacillus ferrooxidans. Applied and Environmental Microbiology, 79(7), 2172–2181. https://doi.org/10.1128/AEM.03057-12

Pinske, C., Bönn, M., Krüger, S., Lindenstrauß, U., & Sawers, R. G. (2011). Metabolic deficiences revealed in the biotechnologically important model bacterium Escherichia coli BL21(DE3). PLoS ONE, 6(8). https://doi.org/10.1371/journal.pone.0022830

Pitts, K. E., Dobbin, P. S., Reyes-Ramirez, F., Thomson, A. J., Richardson, D. J., & Seward, H. E. (2003). Characterization of the Shewanella oneidensis MR-1 decaheme cytochrome MtrA: Expression in Escherichia coli confers the ability to reduce soluble Fe(III) chelates. Journal of Biological Chemistry, 278(30), 27758–27765. https://doi.org/10.1074/jbc.M302582200

Platt, A., Ross, H. C., Hankin, S., & Reece, R. J. (2000). The insertion of two amino acids into a transcriptional inducer converts it into a galactokinase. Proceedings of the National Academy of Sciences, 97(7), 3154–3159. https://doi.org/10.1073/pnas.97.7.3154

Probst, A. J., Castelle, C. J., Singh, A., Brown, C. T., Anantharaman, K., Sharon, I., … Banfield, J. F. (2017). Genomic resolution of a cold subsurface aquifer community provides metabolic insights for novel microbes adapted to high CO2 concentrations. Environmental Microbiology, 19(2), 459–474. https://doi.org/10.1111/1462-2920.13362

Quaiser, A., Bodi, X., Dufresne, A., Naquin, D., Francez, A. J., Dheilly, A., … Vandenkoornhuyse, P. (2014). Unraveling the stratification of an iron-oxidizing microbial mat by metatranscriptomics. PLoS ONE, 9(7), 1–9. https://doi.org/10.1371/journal.pone.0102561

Quatrini, R., Appia-Ayme, C., Denis, Y., Ratouchniak, J., Veloso, F., Valdes, J., … Bonnefoy, V. (2006). Insights into the iron and sulfur energetic metabolism of Acidithiobacillus ferrooxidans by microarray transcriptome profiling. Hydrometallurgy, 83(1–4), 263–272. https://doi.org/10.1016/j.hydromet.2006.03.030

Rentz, J. A., Kraiya, C., Luther, G. W., & Emerson, D. (2007). Control of ferrous iron oxidation within circumneutral microbial iron mats by cellular activity and autocatalysis. Environmental Science & Technology, 41, 6084–6089.

Roger, M., Castelle, C., Guiral, M., Infossi, P., Lojou, E., Giudici-Orticoni, M.-T., & Ilbert, M. (2012). Mineral respiration under extreme acidic conditions: from a supramolecular organization to a molecular adaptation in Acidithiobacillus ferrooxidans. Biochemical Society Transactions, 40(6), 1324–1329. https://doi.org/10.1042/BST20120141

Saberi, F., Kamali, M., Najafi, A., Yazdanparast, A., & Moghaddam, M. M. (2016). Natural antisense RNAs as mRNA regulatory elements in bacteria: A review on function and applications. Cellular and Molecular Biology Letters, 21(1), 1–17. https://doi.org/10.1186/s11658-016-0007-z

Shi, L., Richardson, D. J., Wang, Z., Kerisit, S. N., Rosso, K. M., Zachara, J. M., & Fredrickson, J. K. (2009). The roles of outer membrane cytochromes of Shewanella and Geobacter in extracellular electron transfer. Environmental Microbiology Reports, 1(4), 220–227. https://doi.org/10.1111/j.1758-2229.2009.00035.x

Stookey, L. L. (1970). Ferrozine-A New Spectrophotometric Reagent for Iron. Analytical Chemistry, 42(7), 779–781. https://doi.org/10.1021/ac60289a016

Tamm, L. K., Hong, H., & Liang, B. (2004). Folding and assembly of β-barrel membrane proteins. Biochimica et Biophysica Acta - Biomembranes, 1666(1–2), 250–263. https://doi.org/10.1016/j.bbamem.2004.06.011

Vargas, M., Malvankar, N. S., Tremblay, P., … Lovley, D. R. (2013). Aromatic Amino Acids Required for Pili Conductivity and Long-Range Extracellular Electron Transport in Geobacter sulfurreducens. mBio, 4(2), 1–6. https://doi.org/10.1128/mBio.00105-13

Wang, W. W.-S., Das, D., & Suresh, M. R. (2005). Biotin Carboxyl Carrier Protein Co-Purifies as a Contaminant in Core-Streptavidin Preparations. Molecular Biotechnology, 31(1), 29–40. https://doi.org/10.1385/MB:31:1:029

Weiss, J. V., Rentz, J. A., Plaia, T., Neubauer, S. C., Merrill-Floyd, M., Lilburn, T., … Emerson, D. (2007). Characterization of neutrophilic Fe(II)-oxidizing bacteria isolated from the rhizosphere of wetland plants and description of Ferritrophicum radicicola gen. nov. sp. nov., and Sideroxydans paludicola sp. nov. Geomicrobiology Journal, 24(7–8), 559–570. https://doi.org/10.1080/01490450701670152

White, G. F., Edwards, M. J., Gomez-Perez, L., Richardson, D. J., Butt, J. N., & Clarke, T. A. (2016). Mechanisms of Bacterial Extracellular Electron Exchange. Advances in Microbial Physiology (1st ed., Vol. 68). Elsevier Ltd. https://doi.org/10.1016/bs.ampbs.2016.02.002

Yarzábal, A., Appia-Ayme, C., Ratouchniak, J., & Bonnefoy, V. (2004). Regulation of the expression of the Acidithiobacillus ferrooxidans rus operon encoding two cytochromes c, a cytochrome oxidase and rusticyanin. Microbiology, 150(7), 2113–2123. https://doi.org/10.1099/mic.0.26966-0

Yarzábal, A., Brasseur, G., Ratouchniak, J., Lund, K., Lemesle-Meunier, D., & DeMoss, J. A. (2002). The High-Molecular-Weight Cytochrome c Cyc2 of Acidithiobacillus ferrooxidans Is an Outer Membrane Protein. Journal of Bacteriology, 184(1), 313–317. https://doi.org/10.1128/JB.184.1.313

Yoshikawa, S., & Caughey, W. S. (1992). Infrared evidence of azide binding to iron, copper, and non-metal sites in heart cytochrome c oxidase. The Journal of Biological Chemistry, 267(14), 9757–9766.

APPENDIX

SUPPLEMENTARY FIGURES FOR CYC2 EXPRESSION IN THE ENVIRONMENT

Figure A1: Violin plots of gene expression in six of the Gallionellaceae bins. Each plot shows expression over four time points; cyc2 expression is indicated by red dots.


Figure A2: Maximum-likelihood phylogenetic tree of MtrA homologs. Bootstrap values (out of 100 bootstrap replicates) are shown at each node.



Table A1: Representative sequences used to collect homologs from NCBI with BLAST. The results of each BLAST search are also shown.



# Author: Arkadiy Garber
from collections import defaultdict
import re

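# Replace every character of stringOrList that appears in IllegalList with replaceWith
# (note: this name shadows Python's built-in filter()).
def filter(stringOrList, IllegalList, replaceWith):
    emptyList = []
    for i in stringOrList:
        if i not in IllegalList:
            emptyList.append(i)
        else:
            emptyList.append(replaceWith)
    string = "".join(emptyList)
    return string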

# Parse FASTA lines into {sanitized header: sequence}.
def fasta(txt):
    illegalCharacters = [":", ",", "(", ")", ";", "[", "]", "'", " "]
    sequence = ''
    header = ''
    Sequences = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
    for line in txt:
        if re.match(r'^>', line):
            if sequence:
                Sequences[header] = sequence
            header = line.rstrip()
            header = filter(header, illegalCharacters, "_")
            sequence = ''
        else:
            sequence += line.rstrip()
    # Store the final record, which the loop above never reaches.
    if sequence:
        Sequences[header] = sequence
    return Sequences

print("Hello World")

# Define the files to which the protein sequences of bin 22 will be written (one file per sub-bin):
outfile1 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_1.txt", "w")
outfile2 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_2.txt", "w")
outfile3 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_3.txt", "w")
outfile4 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_4.txt", "w")
outfile5 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_5.txt", "w")
outfile6 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_6.txt", "w")
outfile7 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_7.txt", "w")
outfile8 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_8.txt", "w")
outfile9 = open("/Users/arkadiygarber/Desktop/JewellprotsB22_9.txt", "w")

# Read the supplemental file row by row, keep only the columns that hold the header info (column 21)
# and the amino acid sequence (column 101), and write them to a FASTA file (a separate file for each
# sub-bin, as labeled in column 104):
jewell = open("/Users/arkadiygarber/Desktop/Ongoing Research Projects/Cyc2/Jewell_files/orfs2features.txt", "r")
for i in jewell:
    if list(i.split('\t'))[104] == "b22.1":
        outfile1.write(">" + list(i.split('\t'))[21] + "\n")
        outfile1.write(list(i.split('\t'))[101] + "\n")
    if list(i.split('\t'))[104] == "b22.2":
        outfile2.write(">" + list(i.split('\t'))[21] + "\n")
        outfile2.write(list(i.split('\t'))[101] + "\n")
    if list(i.split('\t'))[104] == "b22.3":
        outfile3.write(">" + list(i.split('\t'))[21] + "\n")
        outfile3.write(list(i.split('\t'))[101] + "\n")
    if list(i.split('\t'))[104] == "b22.4":
        outfile4.write(">" + list(i.split('\t'))[21] + "\n")
        outfile4.write(list(i.split('\t'))[101] + "\n")
    if list(i.split('\t'))[104] == "b22.5":
        outfile5.write(">" + list(i.split('\t'))[21] + "\n")
        outfile5.write(list(i.split('\t'))[101] + "\n")
    if list(i.split('\t'))[104] == "b22.6":
        outfile6.write(">" + list(i.split('\t'))[21] + "\n")
        outfile6.write(list(i.split('\t'))[101] + "\n")
    if list(i.split('\t'))[104] == "b22.7":
        outfile7.write(">" + list(i.split('\t'))[21] + "\n")
        outfile7.write(list(i.split('\t'))[101] + "\n")
    if list(i.split('\t'))[104] == "b22.8":
        outfile8.write(">" + list(i.split('\t'))[21] + "\n")
        outfile8.write(list(i.split('\t'))[101] + "\n")
    if list(i.split('\t'))[104] == "b22.9":
        outfile9.write(">" + list(i.split('\t'))[21] + "\n")
        outfile9.write(list(i.split('\t'))[101] + "\n")

# The FASTA files will serve as the BLAST databases against which the genes of interest are searched.

# The following commands extract the rows in which a homolog of a gene of interest was found.
# In the BLAST tabular output (assumed standard format), column 0 is the query (gene of interest),
# column 1 the subject (ORF ID, matching column 21 of orfs2features.txt), and column 10 the e-value.
outcsv = open("/Users/arkadiygarber/Desktop/Jewell.csv", "w")
blast_results1 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_1.faa.blast.txt", "r")
DictionaryHead1 = defaultdict(lambda: "Empty")
DictionaryE1 = defaultdict(lambda: "Empty")
for i in blast_results1:
    DictionaryHead1[list(i.split('\t'))[1]] = list(i.split('\t'))[0]
    DictionaryE1[list(i.split('\t'))[1]] = list(i.split('\t'))[10]
counter = 0
jewell.seek(0)  # rewind: the sub-bin loop above already read this file to the end
for i in jewell:
    iList = list(i.split('\t'))
    if counter == 0:
        outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + "uniref.besthit" + "\n")
        counter += 1
    else:
        if iList[21] in DictionaryE1.keys():
            outcsv.write(DictionaryHead1[iList[21]] + "," + DictionaryE1[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n")
jewell = open("/Users/arkadiygarber/Desktop/orfs2features.txt", "r")
blast_results2 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_2.faa.blast.txt", "r")
DictionaryHead2 = defaultdict(lambda: "Empty")
DictionaryE2 = defaultdict(lambda: "Empty")
for i in blast_results2:
    DictionaryHead2[list(i.split('\t'))[1]] = list(i.split('\t'))[0]
    DictionaryE2[list(i.split('\t'))[1]] = list(i.split('\t'))[10]
for i in jewell:
    iList = list(i.split('\t'))
    if counter == 0:
        outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + "uniref.besthit" + "\n")
        counter += 1
    else:
        if iList[21] in DictionaryE2.keys():
            outcsv.write(DictionaryHead2[iList[21]] + "," + DictionaryE2[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n")
jewell = open("/Users/arkadiygarber/Desktop/orfs2features.txt", "r")
blast_results3 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_3.faa.blast.txt", "r")
DictionaryHead3 = defaultdict(lambda: "Empty")
DictionaryE3 = defaultdict(lambda: "Empty")
for i in blast_results3:
    DictionaryHead3[list(i.split('\t'))[1]] = list(i.split('\t'))[0]
    DictionaryE3[list(i.split('\t'))[1]] = list(i.split('\t'))[10]
for i in jewell:
    iList = list(i.split('\t'))
    if counter == 0:
        outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + "uniref.besthit" + "\n")
        counter += 1
    else:
        if iList[21] in DictionaryE3.keys():
            outcsv.write(DictionaryHead3[iList[21]] + "," + DictionaryE3[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n")
jewell = open("/Users/arkadiygarber/Desktop/orfs2features.txt", "r")
blast_results4 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_4.faa.blast.txt", "r")
DictionaryHead4 = defaultdict(lambda: "Empty")
DictionaryE4 = defaultdict(lambda: "Empty")
for i in blast_results4:
    DictionaryHead4[list(i.split('\t'))[1]] = list(i.split('\t'))[0]
    DictionaryE4[list(i.split('\t'))[1]] = list(i.split('\t'))[10]
for i in jewell:
    iList = list(i.split('\t'))
    if counter == 0:
        outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + "uniref.besthit" + "\n")
        counter += 1
    else:
        if iList[21] in DictionaryE4.keys():
            outcsv.write(DictionaryHead4[iList[21]] + "," + DictionaryE4[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n")
jewell = open("/Users/arkadiygarber/Desktop/orfs2features.txt", "r")
blast_results5 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_5.faa.blast.txt", "r")
DictionaryHead5 = defaultdict(lambda: "Empty")
DictionaryE5 = defaultdict(lambda: "Empty")
for i in blast_results5:
    DictionaryHead5[list(i.split('\t'))[1]] = list(i.split('\t'))[0]
    DictionaryE5[list(i.split('\t'))[1]] = list(i.split('\t'))[10]
for i in jewell:
    iList = list(i.split('\t'))
    if counter == 0:
        outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + "uniref.besthit" + "\n")
        counter += 1
    else:
        if iList[21] in DictionaryE5.keys():
            outcsv.write(DictionaryHead5[iList[21]] + "," + DictionaryE5[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n")
jewell = open("/Users/arkadiygarber/Desktop/orfs2features.txt", "r")
blast_results6 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_6.faa.blast.txt", "r")
DictionaryHead6 = defaultdict(lambda: "Empty")
DictionaryE6 = defaultdict(lambda: "Empty")
for i in blast_results6:
    DictionaryHead6[list(i.split('\t'))[1]] = list(i.split('\t'))[0]
    DictionaryE6[list(i.split('\t'))[1]] = list(i.split('\t'))[10]
for i in jewell:
    iList = list(i.split('\t'))
    if counter == 0:
        outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + "uniref.besthit" + "\n")
        counter += 1
    else:
        if iList[21] in DictionaryE6.keys():
            outcsv.write(DictionaryHead6[iList[21]] + "," + DictionaryE6[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n")
jewell = open("/Users/arkadiygarber/Desktop/orfs2features.txt", "r")
blast_results7 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_7.faa.blast.txt", "r")
DictionaryHead7 = defaultdict(lambda: "Empty")

89 DictionaryE7 = defaultdict(lambda: "Empty") for i in blast_results7: DictionaryHead7[list(i.split('\t'))[1]] = list(i.split('\t'))[0] DictionaryE7[list(i.split('\t'))[1]] = list(i.split('\t'))[10] for i in jewell: iList = list(i.split('\t')) if counter == 0: outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + "uniref.besthit" + "\n") counter += 1 else: if iList[21] in DictionaryE7.keys(): outcsv.write(DictionaryHead7[iList[21]] + "," + DictionaryE7[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n") jewell = open("/Users/arkadiygarber/Desktop/orfs2features.txt", "r") blast_results8 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_8.faa.blast.txt" , "r") DictionaryHead8 = defaultdict(lambda: "Empty") DictionaryE8 = defaultdict(lambda: "Empty") for i in blast_results8: DictionaryHead8[list(i.split('\t'))[1]] = list(i.split('\t'))[0] DictionaryE8[list(i.split('\t'))[1]] = list(i.split('\t'))[10] for i in jewell: iList = list(i.split('\t')) if counter == 0: outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + "uniref.besthit" + "\n") counter += 1 else: if iList[21] in DictionaryE8.keys(): outcsv.write(DictionaryHead8[iList[21]] + "," + DictionaryE8[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n") jewell = open("/Users/arkadiygarber/Desktop/orfs2features.txt", "r") blast_results9 = open("/Users/arkadiygarber/Desktop/blast_results/Jewellbin22_9.faa.blast.txt" , "r") DictionaryHead9 = defaultdict(lambda: "Empty") DictionaryE9 = defaultdict(lambda: "Empty") for i in blast_results9: DictionaryHead9[list(i.split('\t'))[1]] = list(i.split('\t'))[0] DictionaryE9[list(i.split('\t'))[1]] = list(i.split('\t'))[10] for i in jewell: iList = list(i.split('\t')) print(iList[0:21]) if counter == 0: outcsv.write("Gene_hit" + "," + "e-value" + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + 
"uniref.besthit" + "\n") counter += 1 else: if iList[21] in DictionaryE9.keys():

90 outcsv.write(DictionaryHead9[iList[21]] + "," + DictionaryE9[iList[21]] + "," + str(iList[0:21]) + "," + str(iList[23:]) + "," + iList[21] + "\n")

# ======

outfile = open("/Users/arkadiygarber/Desktop/Jewellprotein.txt", "w")
for i in jewell:
    i = i.split('\t')
    outfile.write(">" + i[20] + "\n")
    outfile.write(i[101] + "\n")
    print(i[20])
    print(i[101] + "\n")

HitDict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
EDict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
StartDict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
EndDict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
count = 0
RifleBlast = open("/Users/arkadiygarber/Desktop/Cyc2 Paper files/Jewell_Files/Workflow_files/Rifle_blast.txt", "r")
for i in RifleBlast:
    i = i.split("\t")
    print(len(i))
    print(i)
    EDict[i[1]] = i[10]
    HitDict[i[1]] = i[0]
    StartDict[i[1]] = i[8]
    EndDict[i[1]] = i[9]
outfile = open("/Users/arkadiygarber/Desktop/Jewell.csv", "w")
outfile.write("Gene" + "," + "bin" + "," + "scaffold" + "," + "S-Start" + "," + "S-End" + "," + "e-value" + "," + "S10_11.13.13.filter.0.2um.mRNA" + "," + "S12_12.04.13.filter.0.2um.mRNA" + "," + "S12_12.18.13.filter.0.2um.mRNA" + "," + "S12_12.30.13.filter.0.2um.mRNA" + "\n")

count = 0
for i in jewell:
    i = i.split('\t')
    if count == 0:
        for j in i:
            print(count)
            print(j)
            print("")
            count += 1
    if i[21] in HitDict.keys() and i[21] != "NA":
        hit = HitDict[i[21]]
        header = i[21]
        scaffold = i[0]
        mRNA = i[16:20]
        bin = i[103]
        outfile.write(hit[1:] + "," + bin + "," + scaffold + "," + StartDict[i[21]] + "," + EndDict[i[21]] + "," + EDict[i[21]] + "," + mRNA[0] + "," + mRNA[1] + "," + mRNA[2] + "," + mRNA[3] + "\n")

Rifle = open("/Users/arkadiygarber/Desktop/Jewell.csv", "r")
jewellRPKM = defaultdict(list)
jewellbins = defaultdict(list)
jewellE = defaultdict(list)
jewellscaf = defaultdict(list)
jewellStart = defaultdict(list)
jewellEnd = defaultdict(list)
for i in Rifle:
    row = i.rstrip().split(",")
    jewellbins[row[1]].append(row[0])
    jewellRPKM[row[1]].append(row[6:])
    jewellscaf[row[1]].append(row[2])
    jewellE[row[1]].append(row[5])
    jewellStart[row[1]].append(row[3])
    jewellEnd[row[1]].append(row[4])
for i in jewellbins.keys():
    print(i)
    print(jewellbins[i])
    print(jewellscaf[i])
    print(jewellE[i])
    print(jewellStart[i])
    print(jewellEnd[i])
    print(jewellRPKM[i])
    print("\n\n")
binList = []
for i in jewellbins.keys():
    print(i)
    print(jewellbins[i])
    print(jewellRPKM[i])
    print(jewellE[i])
    print("")
    outfile = open("/Users/arkadiygarber/Desktop/Jewell_bin_RPKM/%s.csv" % i, "w")
    if i not in binList:
        outfile.write("Gene" + "," + "scaffold" + "," + "S-Start" + "," + "S-End" + "," + "e-value" + "," + "S10_11.13.13.filter.0.2um.mRNA" + "," + "S10_12.04.13.filter.0.2um.mRNA" + "," + "S10_12.18.13.filter.0.2um.mRNA" + "," + "S10_12.30.13.filter.0.2um.mRNA" + "\n")
        for j in range(0, len(jewellbins[i])):
            if jewellbins[i][j] == "637915742_Rfer_0914__putative_cytochrome_c1_signal_peptide_protein__Rhodoferax_ferrireducens_T118__NC_007908_":
                outfile.write("Cyc2_Rhodoferax" + "," + str(jewellscaf[i][j]) + "," + str(jewellStart[i][j]) + "," + str(jewellEnd[i][j]) + "," + str(jewellE[i][j]) + "," + str(jewellRPKM[i][j][0]) + "," + str(jewellRPKM[i][j][1]) + "," + str(jewellRPKM[i][j][2]) + "," + str(jewellRPKM[i][j][3]) + "\n")
            else:
                outfile.write(str(jewellbins[i][j]) + "," + str(jewellscaf[i][j]) + "," + str(jewellStart[i][j]) + "," + str(jewellEnd[i][j]) + "," + str(jewellE[i][j]) + "," + str(jewellRPKM[i][j][0]) + "," + str(jewellRPKM[i][j][1]) + "," + str(jewellRPKM[i][j][2]) + "," + str(jewellRPKM[i][j][3]) + "\n")
    else:
        for j in range(0, len(jewellbins[i])):
            if jewellbins[i][j] == "637915742_Rfer_0914__putative_cytochrome_c1_signal_peptide_protein__Rhodoferax_ferrireducens_T118__NC_007908_":
                outfile.write("Cyc2_Rhodoferax" + "," + str(jewellscaf[i][j]) + "," + str(jewellStart[i][j]) + "," + str(jewellEnd[i][j]) + "," + str(jewellE[i][j]) + "," + str(jewellRPKM[i][j][0]) + "," + str(jewellRPKM[i][j][1]) + "," + str(jewellRPKM[i][j][2]) + "," + str(jewellRPKM[i][j][3]) + "\n")
            else:
                outfile.write(str(jewellbins[i][j]) + "," + str(jewellscaf[i][j]) + "," + str(jewellStart[i][j]) + "," + str(jewellEnd[i][j]) + "," + str(jewellE[i][j]) + "," + str(jewellRPKM[i][j][0]) + "," + str(jewellRPKM[i][j][1]) + "," + str(jewellRPKM[i][j][2]) + "," + str(jewellRPKM[i][j][3]) + "\n")

blast = open("/Users/arkadiygarber/Desktop/Jewell_Dsr-Sox/blast", "r")
BlastDict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
for i in blast:
    iList = i.split("\t")
    BlastDict[iList[1]]["gene"] = iList[0].split("_")[0]
    BlastDict[iList[1]]["e"] = iList[10]

out = open("/Users/arkadiygarber/Desktop/DsrSox_table2.csv", "w")
out.write("gene" + "," + "bin" + "," + "time 0 RPKM" + "," + "time 21 RPKM" + "," + "time 35 RPKM" + "," + "time 47 RPKM" + "," + "E-value" + "," + "Scaffold" + "\n")

count = 0
jewellDict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
for i in jewell:
    iList = i.rstrip().split("\t")
    if iList[108] == "b22.9":
        scaff = (iList[20])
        t0 = iList[16]
        t1 = iList[17]
        t2 = iList[18]
        t3 = iList[19]
        bin = iList[108]
        if scaff in BlastDict.keys():
            jewellDict[bin]["gene"] = BlastDict[scaff]["gene"]
            jewellDict[bin]["e"] = BlastDict[scaff]["e"]
            jewellDict[bin]["scaff"] = scaff
            jewellDict[bin]["t0"] = t0
            jewellDict[bin]["t1"] = t1
            jewellDict[bin]["t2"] = t2
            jewellDict[bin]["t3"] = t3
            out.write(BlastDict[scaff]["gene"] + "," + bin + "," + t0 + "," + t1 + "," + t2 + "," + t3 + "," + BlastDict[scaff]["e"] + "," + scaff + "\n")
print(count)

Figure A3: Python 3 script for partitioning the Jewell supplementary dataset into separate bins.
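Each per-bin block in Figure A3 repeats one pattern: load BLAST hits keyed by subject ID, then keep the feature-table rows whose ORF ID matches a hit. The sketch below writes that pattern once as functions rather than one copy per bin, and serializes rows with Python's `csv` module, which quotes fields that contain commas. It assumes BLAST's standard 12-column tabular output (`-outfmt 6`), where the query ID, subject ID, and e-value are at indices 0, 1, and 10, the same indices the script reads. The function names, column arguments, and example data are hypothetical and are not taken from the thesis workflow.

```python
import csv
import io
from collections import defaultdict


def load_blast_hits(lines):
    """Map subject ID -> (query ID, e-value) from BLAST -outfmt 6 lines.

    As in the script above, a later line for the same subject overwrites
    an earlier one.
    """
    hits = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        hits[fields[1]] = (fields[0], fields[10])
    return hits


def partition_by_bin(rows, key_col, bin_col, hits):
    """Group rows that have a BLAST hit by the value in their bin column."""
    by_bin = defaultdict(list)
    for row in rows:
        if row[key_col] in hits:
            query_id, evalue = hits[row[key_col]]
            by_bin[row[bin_col]].append([query_id, evalue] + row)
    return by_bin


def to_csv_text(records):
    """Serialize rows with the csv module, which quotes embedded commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(records)
    return buffer.getvalue()


# Hypothetical example data: two BLAST hits and a three-row feature table.
blast_lines = [
    "geneA\tORF_1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-50\t500\n",
    "geneB\tORF_3\t45.0\t120\t60\t2\t5\t124\t3\t122\t2e-10\t90\n",
]
table = [
    ["scaf1", "ORF_1", "b22.9"],
    ["scaf2", "ORF_2", "b22.9"],
    ["scaf3", "ORF_3", "b22.4"],
]
hits = load_blast_hits(blast_lines)
by_bin = partition_by_bin(table, key_col=1, bin_col=2, hits=hits)
for bin_name, records in by_bin.items():
    print(bin_name)
    print(to_csv_text(records), end="")
```

Rows without a BLAST hit (here `ORF_2`) are dropped, matching the `in ...keys()` checks in the script; one CSV file per bin could then be written from `to_csv_text(records)`.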


#!/usr/bin/env python3
# !/bin/sh
from collections import defaultdict
import re
import os
import textwrap
import argparse
import urllib.request
import urllib.error
import ssl
from Bio import Entrez
from Bio import SwissProt
from Bio import ExPASy
gcontext = ssl.SSLContext(ssl.PROTOCOL_TLSv1)

def filter(list, items):
    outLS = []
    for i in list:
        if i not in items:
            outLS.append(i)
    return outLS

def delim(line):
    ls = []
    string = ''
    for i in line:
        if i != " ":
            string += i
        else:
            ls.append(string)
            string = ''
    ls.append(string)
    ls = filter(ls, [""])
    return ls

def fasta(fasta_file):
    seq = ''
    header = ''
    Dict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
    for i in fasta_file:
        i = i.rstrip()
        if re.match(r'^>', i):
            if len(seq) > 0:
                Dict[header] = seq
                header = i[1:]
                header = header.split(" # ")[0]
                seq = ''
            else:
                header = i[1:]
                header = header.split(" # ")[0]
                seq = ''
        else:
            seq += i
    Dict[header] = seq
    return Dict


def mass(seq):
    # Approximate masses (Da) of the free amino acids; summing them ignores
    # the water lost at each peptide bond, so the result (kDa) is an overestimate.
    dal = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
    dal["A"] = 89
    dal["R"] = 174
    dal["N"] = 132
    dal["D"] = 133
    dal["C"] = 121
    dal["Q"] = 146
    dal["E"] = 147
    dal["G"] = 75
    dal["H"] = 155
    dal["I"] = 131
    dal["L"] = 131
    dal["K"] = 146
    dal["M"] = 149
    dal["F"] = 165
    dal["P"] = 115
    dal["S"] = 105
    dal["T"] = 119
    dal["W"] = 204
    dal["Y"] = 181
    dal["V"] = 117
    dal["*"] = 0
    mw = 0
    for i in seq:
        mw += int(dal[i])
    return mw/1000

def replace(stringOrlist, list, item):
    emptyList = []
    for i in stringOrlist:
        if i not in list:
            emptyList.append(i)
        else:
            emptyList.append(item)
    outString = "".join(emptyList)
    return outString

def remove(stringOrlist, list):
    emptyList = []
    for i in stringOrlist:
        if i not in list:
            emptyList.append(i)
        else:
            pass
    outString = "".join(emptyList)
    return outString

Cyc2 = open("/Users/arkadiygarber/Desktop/HMM-validation/Cyc2.fa", "r")

Cyc2 = fasta(Cyc2)

Dict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
for i in Cyc2.keys():
    id = i.split(" ")[0]
    Dict[id]["seq"] = Cyc2[i]
    Dict[id]["head"] = i
cyc2hmm = open("/Users/arkadiygarber/Desktop/HMM-validation/cyc2.hmmout", "r")
out = open("/Users/arkadiygarber/Desktop/HMM-validation/cyc2_bits.csv", "w")
for i in cyc2hmm:
    if not re.search(r'^#', i):
        ls = delim(i.rstrip())
        name = []
        for j in (ls[18:]):
            name.append(j)
            name.append("_")
        name = "".join(name[0:len(name)-1])
        seq = (Dict[ls[0]]["seq"])
        seq = remove(seq, "-")
        bit = ls[5]
        out.write(str(bit) + "\n")
        if len(seq) < 300 and len(re.findall(r'C(..)CH', seq)) != 1:
            print(ls)
            print(bit)
            print(seq)
            print(len(re.findall(r'C(..)CH', seq)))
            print(len(seq))
            print("")
uniprot = open("/Users/arkadiygarber/Desktop/HMM-validation/Cyc2-uniprot.tsv", "r")
UniprotDict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
for i in uniprot:
    ls = (i.rstrip().split("\t"))
    id = (ls[0])
    bit = (ls[7])
    UniprotDict[id] = bit
count = 0  # must be initialized before the "count += 1" below
out = open("/Users/arkadiygarber/Desktop/HMM-validation/cyc2-uniprot_bits.csv", "w")
uniseqs = open("/Users/arkadiygarber/Desktop/HMM-validation/Cyc2-uniprot.fa", "r")
uniseqs = fasta(uniseqs)
for i in uniseqs.keys():
    bit = (UniprotDict[i])
    # count += 1
    if float(bit) >= 124:
        if len(uniseqs[i]) > 289 and len(re.findall(r'C(..)CH', uniseqs[i])) == 1:
            count += 1
            print(count)
            print(bit)
            print(re.findall(r'(......)C(..)CH', uniseqs[i]))
            print(len(uniseqs[i]))
            print(len(re.findall(r'C(..)CH', uniseqs[i])))
            out.write(str(UniprotDict[i]) + "\n")
            print("")
print(count)

count = 0
mtoA = open("/Users/arkadiygarber/Desktop/HMM-validation/MtoA.fa", "r")
mtoA = fasta(mtoA)
for i in mtoA.keys():
    if len(mtoA[i]) < 280:
        print(i)
        print(mtoA[i])
        print(len(mtoA[i]))
        print(len(re.findall(r'C(..)CH', mtoA[i])))
        print("")
print(count)
out = open("/Users/arkadiygarber/Desktop/HMM-validation/MtoA2.fa", "w")
for i in mtoA.keys():
    print(i)
    # fasta() strips the leading ">" from headers, so restore it here
    out.write(">" + i + "\n")
    out.write(mtoA[i] + "\n")
hmmout = open("/Users/arkadiygarber/Desktop/HMM-validation/mtoA.hmmout", "r")
out = open("/Users/arkadiygarber/Desktop/HMM-validation/mtoA_bits.csv", "w")
for i in hmmout:
    if not re.search(r'^#', i):
        ls = delim(i.rstrip())
        print(ls[5])
        out.write(str(ls[5]) + "\n")

uniprots = open("/Users/arkadiygarber/Desktop/HMM-validation/MtoA-uniprot.fa", "r")
uniprots = fasta(uniprots)
count = 0
counter = 0
uniout = open("/Users/arkadiygarber/Desktop/HMM-validation/MtoA-uniprot.tsv", "r")
for i in uniout:
    if counter == 0:
        counter += 1
    else:
        count += 1
        ls = i.rstrip().split("\t")
        bit = (ls[7])
        id = (ls[0])
        seq = (uniprots[ls[0]])
        length = (len(uniprots[ls[0]]))
        hemes = (len(re.findall(r'C(..)CH', seq)))
        if float(bit) < 286:
            count += 1
            if hemes >= 9 and hemes <= 11 and len(seq) > 280 and len(seq) < 400:
                pass
            else:
                print(bit)
print(count)

Figure A4: Python 3 script used for validation of HMMs.
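The validation in Figure A4 combines an HMM bit-score cutoff with a length check and a count of CXXCH heme-binding motifs, found with the regular expression `C(..)CH`. A minimal sketch of those checks follows, assuming the sequences arrive as FASTA text. The Cyc2 cutoffs (bit score of at least 124, more than 289 residues, exactly one CXXCH motif) and the MtoA checks (9 to 11 motifs, 281 to 399 residues) are copied from the script above; the function names are hypothetical. Unlike the script, this sketch strips alignment gaps before measuring length. Note that `re.findall` counts non-overlapping matches, as in the original.

```python
import re

HEME_MOTIF = re.compile(r"C..CH")  # CXXCH heme c binding motif


def read_fasta(text):
    """Parse FASTA text into {header: sequence}; the header excludes '>'."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            # keep only the text before " # ", as fasta() in Figure A4 does
            header = line[1:].split(" # ")[0]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def count_heme_motifs(seq):
    """Count non-overlapping CXXCH motifs after removing alignment gaps."""
    return len(HEME_MOTIF.findall(seq.replace("-", "")))


def passes_cyc2_filter(seq, bit_score, min_bits=124.0, min_length=290):
    """Cyc2-like: score at or above cutoff, long enough, exactly one motif."""
    ungapped = seq.replace("-", "")
    return (bit_score >= min_bits
            and len(ungapped) >= min_length
            and count_heme_motifs(ungapped) == 1)


def passes_mtoa_filter(seq, min_hemes=9, max_hemes=11, min_len=281, max_len=399):
    """MtoA-like: 9 to 11 CXXCH motifs and a length of 281 to 399 residues."""
    ungapped = seq.replace("-", "")
    return (min_hemes <= count_heme_motifs(ungapped) <= max_hemes
            and min_len <= len(ungapped) <= max_len)
```

Keeping the cutoffs as keyword arguments makes it easy to rerun the same check at other thresholds when deciding where an HMM score cutoff should sit.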


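import os
import re
from collections import defaultdict


def identity(list2, list1):
    # fraction of identical alignment columns over the full alignment length
    count = 0
    for i in range(0, len(list2)):
        if list2[i] == list1[i]:
            count += 1
    return float(float(count)/float(len(list2)))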

def stop(string):
    counter = 0
    returnString = []
    for i in string:
        if counter == 2:
            break
        if i == "_":
            returnString.append(i)
            counter += 1
        else:
            returnString.append(i)
    string = "".join(returnString[0:len(returnString)-1])
    return string

def remove(stringOrlist, list):
    emptyList = []
    for i in stringOrlist:
        if i not in list:
            emptyList.append(i)
        else:
            pass
    outString = "".join(emptyList)
    return outString

def replace(stringOrlist, list, item):
    emptyList = []
    for i in stringOrlist:
        if i not in list:
            emptyList.append(i)
        else:
            emptyList.append(item)
    outString = "".join(emptyList)
    return outString

def fasta(fasta_file):
    seq = ''
    header = ''
    Dict = defaultdict(lambda: defaultdict(lambda: 'EMPTY'))
    for i in fasta_file:
        i = i.rstrip()
        if re.match(r'^>', i):
            if len(seq) > 0:
                Dict[header] = seq
                header = i
                seq = ''
            else:
                header = i
                seq = ''
        else:
            seq += i
    Dict[header] = seq
    return Dict

Cyc2 = open("/home/arkg/AAI/AllSeqs.txt", "r")
cyc2 = fasta(Cyc2)
for i in cyc2.keys():
    for j in cyc2.keys():
        I = replace(i[1:], [" "], "_")
        J = replace(j[1:], [" "], "_")
        filenameList = sorted([I,J])
        if "" not in filenameList and i != j:
            filename = (filenameList[0], filenameList[1])
            try:
                outfile = open("/home/arkg/AAI/Cyc2_pairs/%s" % (filenameList[0] + "__" + filenameList[1]), "w")
                # print(filenameList[0] + "__" + filenameList[1])
                outfile.write(i + "\n")
                outfile.write(remove(cyc2[i], ["-"]) + "\n")
                outfile.write(j + "\n")
                outfile.write(remove(cyc2[j], ["-"]) + "\n")
            except OSError:
                print("error")

outfile = open("/home/arkg/AAI/identities.csv", "w")
outfile.write("seq1" + "," + "seq2" + "," + "id" + "\n")
aaiDir = os.listdir("/home/arkg/AAI/aligned/")
for i in aaiDir:
    if i != "aai.py" and i != "identities.csv":
        Orgs = i.split("_")
        o1 = Orgs[0]
        o2 = Orgs[1][0:len(Orgs[1])-3]
        if o1 != o2:
            fa = open("/home/arkg/AAI/align/%s" % i, "r")
            seqList = []
            fa = fasta(fa)
            for j in fa.keys():
                seqList.append(j)
            print(seqList)
            seq1 = fa[seqList[0]]
            print(seq1)
            seq2 = fa[seqList[1]]
            print(seq2)
            ident = identity(seq1, seq2)
            print(ident)
            outfile.write(seqList[0] + "," + seqList[1] + "," + str(ident) + "\n")

Figure A5: Python 3 script used for calculating amino acid identities from pairwise alignments of Cyc2 homologs.
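The `identity()` function in Figure A5 divides the number of identical alignment columns by the full alignment length, so any column where one sequence has a gap lowers the score. A hedged sketch of the same calculation is below, with an explicit equal-length check and, as an optional alternative that is not what the script does, a mode that ignores gapped columns. It also shows `itertools.combinations` as one way to visit each unordered pair of sequences once, where the script's nested loop visits every pair twice and relies on sorted filenames to collapse the duplicates. The names `pairwise_identity`, `exclude_gapped_columns`, and `unique_pairs` are hypothetical.

```python
from itertools import combinations


def pairwise_identity(aligned_a, aligned_b, exclude_gapped_columns=False):
    """Fraction of identical columns in a pairwise alignment.

    With the default setting this matches identity() in Figure A5:
    identical columns divided by the total alignment length.
    With exclude_gapped_columns=True (an alternative definition, not the
    script's), columns where either sequence has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))
    if exclude_gapped_columns:
        columns = [(a, b) for a, b in columns if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return matches / len(columns)


def unique_pairs(names):
    """Return each unordered pair of distinct names once, in sorted order."""
    return list(combinations(sorted(set(names)), 2))
```

For example, `pairwise_identity("MKV-LA", "MKVALA")` gives 5/6 (about 0.83) because the gapped column counts against the score, while `exclude_gapped_columns=True` gives 1.0 for the same pair.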


Figure A6: Alignment of 16 “standalone” Cyc2 cytochromes against 17 cytochromes from the representative set of 77, which were derived from full-length Cyc2 homologs. This figure replicates Figure 4.5, with the addition of headers for each sequence.
