Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations

2014 Metabolomics based gene function annotation in Escherichia coli and Methanosarcina acetivorans with genetic gain- or loss-of-function strains Lucas John Showman Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Biochemistry Commons

Recommended Citation Showman, Lucas John, "Metabolomics based gene function annotation in Escherichia coli and Methanosarcina acetivorans with genetic gain- or loss-of-function strains" (2014). Graduate Theses and Dissertations. 14013. https://lib.dr.iastate.edu/etd/14013

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Metabolomics based gene function annotation in Escherichia coli and Methanosarcina acetivorans with genetic gain- or loss-of-function strains

by

Lucas Showman

A dissertation submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Biochemistry

Program of Study Committee: Basil J. Nikolau, Major Professor

Eve S. Wurtele Young-Jin Lee Thomas A. Bobik Alan A. Dispirito

Iowa State University

Ames, Iowa

2014

Copyright © Lucas Showman, 2014. All rights reserved ii

TABLE OF CONTENTS

Page

ACKNOWLEDGEMENTS ...... iv

ABSTRACT………………………………...... v

CHAPTER 1 GENERAL INTRODUCTION: THESIS FORMATTING ...... 1

CHAPTER 2 METABOLITE AND GROWTH ANALYSIS OF Β-KETOACYL -ACP SYNTHASE II AND III DELETION MUTANTS IN E. COLI ...... 6 Abstract…… ...... 6

Introduction ...... 7

Materials and methods ...... 10

Results……...... 16

Discussion… ...... 22

References…...... 25

CHAPTER 3 GENE 2 FUNCTION PROJECT IN METHANOSARCINA ACETIVORANS: INVESTIGATING A NOVEL COMBINED NMR LIGAND SCREENING AND METABOLOMIC ANALYSIS BASED APPROACH FOR EFFICIENT AND ACCURATE GENE FUNCTIONAL ANNOTATION ...... 56

Introduction… ...... 56

Materials and methods ...... 62

Results……...... 66

iii

Page

Discussion… ...... 70

References…...... 73

CHAPTER 4 THE NETWORK OF METHANOSARCINA ACETIVORANS: GENE FUNCTION, REGULATION, AND METABOLIC ROLE OF BIOTIN IN A METHANOGENIC ARCHAEON’S METABOLISM .... 93

Abstract…… ...... 94

Introduction… ...... 95

Materials and methods ...... 97

Results……...... 102

Discussion… ...... 105

References…...... 109

CHAPTER 5 GENERAL CONCLUSIONS ...... 129

APPENDIX ...... 130 iv

ACKNOWLEDGEMENTS

First of all, I would like to thank my major professor, Basil Nikolau, for his guidance and support throughout the course of my research and graduate studies. I would also like to express my gratitude to my committee members, Eve Wurtele, Young-Jin Lee, Thomas

Bobik, and Alan Dispirito for their assistance and guidance in my doctoral studies.

In addition, I would also like to thank my friends and colleagues in the Nikolau group for making my time doing research at Iowa State University a wonderful experience.

Finally, thanks to my family for their encouragement: my parents Gail and Homer for their love and support, my wife Leanna for her hours of patience, respect and love, and my girls Nora and Ayla for their love and ridiculous shenanigans that distract me from the daily grind of my research.

v

ABSTRACT

In this dissertation three studies are presented, all of which incorporated metabolomics as part of their experimental methods. The first study is a loss of function study focused on examining E. coli ΔfabH and ΔfabF single and double mutants.

Metabolomics studies of ΔfabH β-Ketoacyl-acyl carrier protein synthases (KAS) III (KASIII) assisted in the development of the hypothesis that fabF (KASII) is involved in the survival of ΔfabH

E. coli cells. This hypothesis was tested by generating and evaluating ΔfabF/ ΔfabH double mutants.

Characterization of the ΔfabF/ ΔfabH double mutant revealed that the fabB gene (KASI) is able to support E. coli cell growth as the sole KAS present in this strain. Additionally it was determined that fabF is required for the hyper-accumulation of cis-vaccenic acid observed in ΔfabH mutants.

The second study focused on a functional genomics effort to develop a gene-function annotation method based on metabolomics and NMR ligand binding assays. This study had mixed results. The metabolomics effort in this study was met with great technical hurdles.

Only in a single case (MA3250) was metabolomics useful in annotating a gene’s functions.

While the NMR binding assays were able to provide useful data for more accurate gene functional annotations in more than 10% of the genes examined.

The third study presented was an in depth study of the biotin network of Methanosarcina acetivorans, where metabolic changes associated with the presence and absence of biotin in

Methanosarcina acetivorans cultures was investigated. In this study metabolomics studies were able to detect metabolic changes that were associated with the presents or absence of vi

biotin. Additionally it was revealed that M. acetivorans does not require biotin or any detectable biotinylated protein to survive and maintain growth. 1

CHAPTER I. GENERAL INTRODUCTION

From time of the sequencing of the first complete genome in 1995 (Fleischmann et al.,

1995) to the present, sequencing efforts have yielded the sequences of approximately 22,800 genomes, from 16,500 organisms, representing 349 million genes and genomic features (CoGe,

2014). As the number of sequenced genomes has grown, the observation remains that a significant proportion (as high as 70% in some cases) of prokaryotic, archaeal, and eukaryotic genes have unknown functions. For example, among all archaea ~33% of the genes sequenced were recently labeled as genes of unknown function (Yelton et al., 2011). The persistent lack of annotations represents significant gap in our understanding of biology and biochemistry.

Metabolomics is a functional genomics tool that has emerged as a valuable method of resolving gene functional annotations. Metabolomics is the science of identifying and measuring the pool of metabolites that represents the chemical composition of a biological sample, termed the metabolome. Although metabolomics is a relatively new field in biology (Fiehn et al., 2000;

Hall et al., 2002), metabolomic analysis has increasing examples of being successfully employed to determine the metabolic functionality of genes (Hirai et al., 2005; Brauer et al., 2006; Liang et al., 2006; Schauer and Fernie, 2006; Hegeman et al., 2007; Swanepoel and Loots, 2014). Our main analytical approach utilized targeted and non-targeted metabolomic analysis of small molecules (<600 Da) by gas chromatography–mass spectrometry (GC-MS) analysis. GC-MS allows for compounds to be identified based on retention times in the chromatography, indexed relative to a series of hydrocarbon standards, the m/z value of the molecular ion for each metabolite, and matching the MS-fragmentation pattern of each to existing MS-databases. The identification of metabolites is a significant hurdle in metabolomics (de Jong and Beecher, 2

2012). Fourier transform-ion cyclotron resonance mass spectrometry (FT-ICR MS) and other high mass accuracy instruments are valuable for metabolite detection and identification. The advantage of this technique is determining high mass resolution and accuracy, which allows for empirical formula determination from molecular ions and is useful in determining the identity of unknown compounds. Isotope labeling studies can even more accurately determine empirical formulae, >85% of metabolites to a single empirical formula using 13C and 15N (Hegeman et al.,

2007; de Jong and Beecher, 2012). Metabolomics can be used for analyzing the functionality of genes, specifically by applying the technology to gain of function and loss of function genetic mutants. Additionally, metabolomics can be used to explore metabolic diversity of different accessions/strains, tissues, and cell types of an organism. Once the metabolic diversity has been described, the genetic-basis for the metabolic and/or phenotypic differences can be elucidated using other omics technologies such genomics, transcriptomics, and proteomics. In both strategies, the genetic and metabolomic data can be mapped to known and putative metabolic pathways to identify the functionality of specific genes and alleles (Förster et al., 2002; Schauer and Fernie, 2006; Swanepoel and Loots, 2014).

In this dissertation three studies are presented, all of which incorporated metabolomics as part of their experimental methods. The first study is a loss of function study focused on examining E. coli ΔfabH and ΔfabF single and double mutants. The second study is a functional genomics effort to develop a gene-function annotation method. This method relied on metabolomics and NMR ligand binding assays to assign function to genes from Methanosarcina acetivorans. The third study presented is an in depth study of the biotin network of

Methanosarcina acetivorans, where metabolic changes associated with the presence and absence of biotin in Methanosarcina acetivorans cultures were investigated. 3

Dissertation organization

This dissertation is composed of five chapters and one appendix. The first chapter is a general introduction presenting brief introduction to metabolomics. The purpose of this chapter is to describe the common approach used in the next three chapters.

The second chapter is a manuscript to be submitted to Journal of Bacteriology. In this chapter, the growth and metabolic consequences of KASII and KASIII single and double mutants in E. coli are presented. The third chapter describes a functional genomics effort to develop a gene function annotation method. This method relied on metabolomics and NMR ligand binding assays to assign function to genes from Methanosarcina acetivorans. The fourth chapter is a manuscript to be submitted to Journal of Bacteriology. In this chapter, the growth and metabolic consequences of the presence and absence of the vitamin biotin are described. The final chapter is a brief general conclusion concerning the studies outlined in chapters two through four. The last section is an appendix that contains the metabolomics results from the study in chapter three.

I was primarily responsible for conducting all of the experiments in chapter 2. Chapters 3 and 4 were part of collaboration between groups at Iowa State University and the University of

Maryland. I was responsible metabolite analysis and genetic complementation studies in chapter

3, while in chapter 4, I was primarily responsible for conducting the molecular biology and biochemical characterization of M. acetivorans cultures in the presences and absents of biotin (I was not responsible for NMR experiments).

REFERENCES

Brauer MJ, Yuan J, Bennett BD, Lu W, Kimball E, Botstein D, Rabinowitz JD (2006) Conservation of the metabolomic response to starvation across two divergent microbes. Proc Natl Acad Sci U S A 103: 19302-19307 4

Cha S, Song Z, Nikolau BJ, Yeung ES (2009) Direct profiling and imaging of epicuticular waxes on Arabidopsis thaliana by laser desorption/ionization mass spectrometry using silver colloid as a matrix. Anal Chem 81: 2991-3000 Cha S, Zhang H, Ilarslan HI, Wurtele ES, Brachova L, Nikolau BJ, Yeung ES (2008) Direct profiling and imaging of plant metabolites in intact tissues by using colloidal graphite- assisted laser desorption ionization mass spectrometry. Plant J 55: 348-360 CoGe (2014) The Place to Compare Genomes. In. genomevolution.org de Jong FA, Beecher C (2012) Addressing the current bottlenecks of metabolomics: Isotopic Ratio Outlier Analysis™, an isotopic-labeling technique for accurate biochemical profiling. Bioanalysis 4: 2303-2314 Fiehn O, Kopka J, Dörmann P, Altmann T, Trethewey RN, Willmitzer L (2000) Metabolite profiling for plant functional genomics. Nat Biotechnol 18: 1157-1161 Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496-512 Förster J, Gombert AK, Nielsen J (2002) A functional genomics approach using metabolomics and in silico pathway analysis. Biotechnol Bioeng 79: 703-712 Hall R, Beale M, Fiehn O, Hardy N, Sumner L, Bino R (2002) Plant metabolomics: the missing link in functional genomics strategies. Plant Cell 14: 1437-1440 Hegeman AD, Schulte CF, Cui Q, Lewis IA, Huttlin EL, Eghbalnia H, Harms AC, Ulrich EL, Markley JL, Sussman MR (2007) Stable isotope assisted assignment of elemental compositions for metabolomics. Anal Chem 79: 6912-6921 Hirai MY, Klein M, Fujikawa Y, Yano M, Goodenowe DB, Yamazaki Y, Kanaya S, Nakamura Y, Kitayama M, Suzuki H, Sakurai N, Shibata D, Tokuhisa J, Reichelt M, Gershenzon J, Papenbrock J, Saito K (2005) Elucidation of gene-to-gene and metabolite-to-gene networks in arabidopsis by integration of metabolomics and transcriptomics. J Biol Chem 280: 25590-25595 Jun JH, Song Z, Liu Z, Nikolau BJ, Yeung ES, Lee YJ (2010) High-spatial and high-mass resolution imaging of surface metabolites of Arabidopsis thaliana by laser desorption- ionization mass spectrometry using colloidal silver. Anal Chem 82: 3255-3265 5

Lee YJ, Perdian DC, Song Z, Yeung ES, Nikolau BJ (2012) Use of mass spectrometry for imaging metabolites in plants. Plant J 70: 81-95 Liang YS, Kim HK, Lefeber AW, Erkelens C, Choi YH, Verpoorte R (2006) Identification of phenylpropanoids in methyl jasmonate treated Brassica rapa leaves using two- dimensional nuclear magnetic resonance spectroscopy. J Chromatogr A 1112: 148-155 Marcinowska R, Trygg J, Wolf-Watz H, Mortiz T, Surowiec I (2011) Optimization of a sample preparation method for the metabolomic analysis of clinically relevant bacteria. J Microbiol Methods 87: 24-31 Schauer N, Fernie AR (2006) Plant metabolomics: towards biological function and mechanism. Trends Plant Sci 11: 508-516 Swanepoel CC, Loots dT (2014) The use of functional genomics in conjunction with metabolomics for Mycobacterium tuberculosis research. Dis Markers 2014: 124218 Takahashi H, Kai K, Shinbo Y, Tanaka K, Ohta D, Oshima T, Altaf-Ul-Amin M, Kurokawa K, Ogasawara N, Kanaya S (2008) Metabolomics approach for determining growth-specific metabolites based on Fourier transform ion cyclotron resonance mass spectrometry. Anal Bioanal Chem 391: 2769-2782 Wu L, Rowe EW, Jeftinija K, Jeftinija S, Rizshsky L, Nikolau BJ, McKay J, Kohut M, Wurtele ES (2010) Echinacea-induced cytosolic Ca2+ elevation in HEK293. BMC Complement Altern Med 10: 72 Yelton AP, Thomas BC, Simmons SL, Wilmes P, Zemla A, Thelen MP, Justice N, Banfield JF (2011) A semi-quantitative, synteny-based method to improve functional predictions for hypothetical and poorly annotated bacterial and archaeal genes. PLoS Comput Biol 7: e1002230 Zech H, Thole S, Schreiber K, Kalhöfer D, Voget S, Brinkhoff T, Simon M, Schomburg D, Rabus R (2009) Growth phase-dependent global protein and metabolite profiles of Phaeobacter gallaeciensis strain DSM 17395, a member of the marine Roseobacter-clade. Proteomics 9: 3677-3697

6

CHAPTER II. METABOLITE AND GROWTH ANALYSIS OF Β- KETOACYL-ACP SYNTHASE II AND III DELETION MUTANTS IN E. COLI

Lucas Showman1, Riley Brien1, Ann Perera2, and Basil J Nikolau1 2 3*

Author Affiliations:

1 Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames IA 50011, USA

2 W.M. Keck Metabolomics, Research Laboratory Iowa State University, Ames, IA 50011

3 Center for Biorenewable Chemicals, Iowa State University, Ames IA 50011, USA

ABSTRACT

3-Ketoacyl-acyl carrier protein synthase III catalyzes the first carbon-carbon bond- forming reaction (Claisen condensation) of type II fatty acid synthesis systems in bacteria and plant plastids. In E. coli, KASIII is encoded by the fabH gene. FabH deletion mutants were originally thought to be lethal but more recent work has shown that ΔfabH mutants are viable.

Fatty acid profiling of ΔfabH mutants showed they display cis-vaccenic acid hyper-accumulation chemotype. In this study we recovered a ΔfabF/ΔfabH double deletion mutant that was viable on both complete and minimal mediums. We also demonstrated that fabF (KASII) is required for the cis-vaccenic acid hyper-accumulation chemotype displayed by ΔfabH cells. Here we propose that ΔfabH deletion mutant cells compensate for the loss of KASIII activity by fabB (or fabF) initiating fatty acid synthesis in the absence of an acetoacetyl-ACP primer via a side reaction involving malonyl-ACP decarboxylase activity and the elongation of the available substrate acetyl-ACP to form the required acetoacetyl-ACP, there by forming a fabH bypass pathway.

Also, we demonstrated for the first time in E. coli, in vivo data showing that fabB is able to support growth as the sole KAS enzyme present in E. coli. 7

INTRODUCTION

Three biochemically and genetically distinct β-Ketoacyl-acyl carrier protein (ACP) synthases (KAS) exist in E. coli: KASI, KASII, and KASIII, that are encoded by the genes fabB, fabF, and fabH, respectively. KASIII catalyzes the first elongation reaction (Claisen condensation) of type II fatty acid synthesis systems in bacteria (Fig. 1) and plant plastids.

KASIII differ from KASI and KASII enzymes by using a coenzyme A (CoA) bound acetyl group as a primer in place of the acyl-ACP molecules used by both KASI and KASII enzymes in E. coli. The fabH gene, encoding the sole copy of KASIII found in E. coli, was originally thought to be an essential gene and fabH deletion mutants made in the K-12 UB1005

(F- metB1 relA1 spoT1 gyrA216 λrλ-) derived strains were found not to be viable (Lai and

Cronan, 2003). Because KASIII enzymes are part of type II fatty acid synthesis systems found in bacteria, they have been extensively studied as a potential antibacterial drug targets (Heath and Rock, 2004; Li et al., 2011; Wang et al., 2012; Yang et al., 2012; Ramamoorthy et al., 2013;

Zhou et al., 2013). However, ΔfabH deletion mutants were generated in the K-12 BW25113 (F-

Δ(araD-araB)567 ΔlacZ4787(::rrnB-3) rph-1 Δ(rhaD-rhaB)568 hsdR514 λ-) background E. coli and are viable and available in the Keio collection of mutants (Baba et al., 2006; Baba and Mori,

2008). FabH deletion mutants made in the UB1005 derived strains were reevaluated and it was shown that the lethality found in these FabH mutants was caused by a background effect. The

UB1005 strain carries the spoT1 mutant allele and it was determined that ΔspoT and ΔfabH are synthetically lethal (Battesti and Bouveret, 2006, 2009; Yao et al., 2012). SpoT protein interacts directly with ACP molecules and initiates a stringent response during fatty acid starvation

(Battesti and Bouveret, 2006, 2009). A ΔfabH E. coli mutant in the BW25113 background was investigated and it was revealed that the fabH gene is required for proper regulation of cell size. 8

ΔfabH cells not only were shorter in length but also were uniquely smaller in measured cell width and had a remarkable 70% reduction in cell volume with respect to wild type cell width

(Yao et al., 2012).

3-Ketoacyl-acyl carrier II or KASII is another family of KAS genes found in E. coli.

KASII in E. coli is encoded by the fabF gene. The substrate specificity of fabF differs from that of E. coli KASIII and KASI in that it is known to have a preference towards the elongation of

C16:1 acyl-ACP to C18:1 acyl-ACP (Fig. 1). FabF plays a role in the regulation of membrane fluidity by altering the amount of the unsaturated fatty acid, cis-vaccenic acid (C18:1 Δ11) that is synthesized. The cis-vaccenic acid becomes incorporated in membrane phospholipids, which leads to increased membrane fluidity. E. coli express fabF in a temperature dependent manner, with increased fabF activity observed at lower temperatures (Garwin et al., 1980; de Mendoza et al., 1983; Ulrich et al., 1983; Oh-hashi and Okuyama, 1986). Although fabF is required to maintain proper membrane fluidity, fabF is not an essential gene in E. coli (Garwin et al., 1980).

KASI differs from the KASIII and KASII enzymes by possessing the ability to accept a broader array of substrates, ranging from short chain acyl-APCs to long chain acyl-ACPs (Fig.

1). KASI is encoded by the fabB gene in E. coli and is able to catalyze condensation reactions to elongate acyl chains up to 18 carbons in length. FabB is an essential gene and is required for the elongation of medium-chain unsaturated acyl-ACP (C10:1-C14:1)(Garwin et al., 1980;

Jackowski and Rock, 1987; Magnuson et al., 1993). In addition to catalyzing condensation reactions, both KASI and KASII enzymes are known to have malonyl-ACP decarboxylase activity that leads to the formation of acetyl-ACP (Alberts et al., 1972). FabB also has an interesting role in biotin biosynthesis where the KASI enzyme catalyzes the elongation of 3- 9 ketoglutaryl-ACP-methyl-ester to pimeloyl-ACP-methyl-ester as part of the biotin biosynthesis pathway (Lin et al., 2010; Cronan and Lin, 2011).

Studying fabH mutants was of particular interest, as fabH was originally thought to be an essential gene and ΔfabH deletion mutants were surprisingly generated in E. coli and are available in the Keio collection of mutants. These ΔfabH deletion mutants provide an attractive opportunity to investigate the metabolic consequences of removing the KASIII from E. coli.

Understanding the metabolic effects of the ΔfabH deletion could elucidate how the cells overcome the mutation as well as provide chemotypic evidence that may be related to observable phenotypes. ΔfabH deletion mutants also provided a tool to study how the remaining KAS enzymes function in the absence of a KASIII in E. coli.

10

MATERIALS AND METHODS

Development and confirmation of KAS mutant strains

The ΔfabH (JW1077) single mutant was obtained from the Keio mutant collection (Baba et al., 2006). PCR was conducted with primers for detection of fabH genes to detect any full length fabH genes sequences with GoTaq DNA Polymerase master mix (Promega, Madison,

WI). Primers fabF gene flanking primers used for the amplification of the GenR cassette were based on the KanR cassette primers used to form the Keio mutant collection (Baba et al., 2006).

The KanR specific primer sequences were replaced with GenR specific sequences and the PCR was amplified using GoTaq DNA Polymerase master mix (Promega, Madison, WI). The fabF gene was replaced with the GenR gene via the Datsenko and Wanner method (Datsenko and

Wanner, 2000). Cells were recovered on TB plates containing 50µg/mL kanamycin and

20µg/mL gentamicin. The ΔfabF (JW1081) single mutant was obtained from the Keio mutant collection (Baba et al., 2006). FabF specific antibodies and primers were used to confirm the deletion of the fabF gene through western blot, and PCR gel electrophoresis analysis. Keio collection mutants were generously provided by Thomas Bobik, Iowa State University.

Growth of analysis of KAS mutants

The four medias used to assess the growth characteristics of the KAS mutant cells were:

MOPS minimal media (Teknova, Hollister, CA), lysogeny broth (LB) (BERTANI, 1951), terrific broth (TB) (Tartoff and Hobbs, 1987), and M9 minimal media (Sigma-Aldrich, St. Louis, MO).

Inoculum pre-cultures were grown in the presence of the appropriate antibiotics at the following concentrations: kanamycin 50µg/mL, gentamycin 20µg/mL. Each sample was diluted to the same starting OD600 (0.001) and media without antibiotics. Culture growth was monitored at an 11 absorbance of 600nm using a Bio-Tek (Winooski, VT) ELx808 microplate reader. Culture growth data was analyzed using the microplate reader’s Gen5 software package. All growth was analyzed at 37°C and cultured with constant shaking performed at 250RPM in flask cultures and the fast shake setting in the microplate cultures. The OD600 readings where observed every 15 minutes until the incubation period of up to 48 hours was completed. Lag times were measured at the point when the measured OD surpassed the threshold of OD600 0.07. Solid plates with LB and MOPS media were made with 1% agar. Plates were incubated up to 48 hours at 37°C. The

OD600 to cell density was calculated based on the number of recovered colonies from diluted cultures plated on LB plates.

Fatty acid analysis with gas chromatography–mass spectrometry

Methyl ester fatty acid derivatives were used to analyze medium, hydroxyl, and long chain fatty acids. Cultures were grown in MOPS minimal media with 0.4% glucose at 37°C and culture with constant shaking performed at 250RPM in flasks. Cells were harvest by centrifugation at mid log phase. Cell pellets were lyophilized overnight to dryness. Dry masses of 5-10mg were recorded. 50µg of the odd chain fatty acid C19:0 was added as internal standards. The cell pellets were treated with 10% barium hydroxide and 1,4-dioxane and incubated overnight at 110°C, yielding free fatty acids. The free fatty acids were recovered in hexane. The extracted free fatty acids were converted to methyl esters by treatment with acidic methanol. The methyl esters were recovered in hexane (Bonaventure et al., 2003). The methyl ester solution was then dried, re-suspended in 0.5mL acetonitrile (Sigma-Aldrich, St. Louis,

MO), then silylated with 35µL 1% BSTFA in TCMS (Sigma-Aldrich, St. Louis, MO), and incubated 30 minutes at 60°C. Finally the samples were dried under N2 (g) to near dryness, re- 12 suspended with 0.1-1mL hexane, and submitted to GC-MS (gas chromatography–mass spectrometry) analysis.

Ethyl esters were used to view the whole range of fatty acids: short, medium, and long.

Cultures were grown in MOPS minimal media with 0.4% glucose at 37°C and culture with constant shaking performed at 250RPM in flasks. Cells were harvest by centrifugation at 0.2,

0.8, 1.0 OD600 and at stationary phase. Cell pellets were lyophilized overnight to dryness. Dry masses of 5-10mg were recorded. The odd chain fatty acids C7:0, C13:0, and C19:0 were added as internal standards with 50µg of each fatty acid added. The cell pellets were treated with 10% barium hydroxide and 1,4-dioxane and incubated overnight at 110°C, yielding free fatty acids.

The free fatty acids were recovered in hexane (Bonaventure et al., 2003). The extracted free fatty acids were converted to ethyl esters by treatment with 49:1 ethanol:H2SO4, recovered again in hexane, and submitted to GC-MS analysis (Unpublished protocol from Fuyuan Jing, Iowa State

University).

The following GC-MS method was used for the fatty acid ester quantifications. All GC-

MS data was captured using an Agilent Technologies Model 7890A Gas Chromatograph with an

Agilent 19091S-433: 1926.58224 HP-5MS 5% Phenyl Methyl Siloxane column -60 °C—325 °C

(325 °C): 30 m x 250 µm x 0.25 µm column coupled to a Model 5975C mass selective detector with electrical ionization (EI). The GC-MS system was controlled by and the data collected with

ChemStation software (Agilent Technologies, Santa Clara, CA). This GC-MS system was made available by the Office of Biotechnology W.M. Keck Metabolomics Research Laboratory, Iowa

State University. The injection port temperature was 280°C with a splitless mode. He was the carrier gas at a flow rate of 2.0 mL/min. The GC oven temperature ramp procedures were as follows: temperature set to 50 °C and held for 2 minutes, ramp #1 25 °C/min to 200°C, hold 1 13 minute, ramp #2 20°C/min to 320°C, hold 3 minute. MS conditions were as follows: ion source,

200°C; electron energy, 70eV; quadrupole temperature, 150°C; GC-MS interface zone, 280°C; scan range, 40-1000 mass units, a 3.6 minute solvent delay was used for the ethyl esters, while a

6.5 minute solvent delay was used for the methyl esters. Data was analyzed using AMDIS

(Automated Mass Spectral Deconvolution and Identification System; http://chemdata.nist.gov/mass-spc/amdis/) software with data interpretation from NIST08 library.

The NIST 08 MS Library and AMDIS were provided by National Institute of Standards and

Technology (www.nist.gov). Note that fatty acids will be abbreviated in Lipid numbers format, as follows: first the total number carbon atoms (X), CX, next the total number of unsaturated carbon-carbon double bounds (D) and their carbon number from the carboxyl group (ΔY),

(CX:D ΔY). Take palmitoleic acid as an example, it is abbreviated as C16:1 cis Δ9 (Rigaudy and

Klesney, 1979). Other possible substitutions include: hydroxyl groups (CXOH, C14OH is myristic acid) and cyclopropane fatty acyl groups cyCX:0 or CX:cyc.

Lipid analysis with FT-MS

Cultures were grown in MOPS minimal media with 0.4% glucose at 37°C and culture with constant shaking performed at 250RPM in flasks. Metabolite extraction with hot methanol was carried out similar to Schmidt et al., 2011. This particular extraction results in two phases for polar and nonpolar metabolites.

ESI-MS and MS/MS experiments were performed on a Bruker Daltonics SolariX FTICR equipped with a 7.0 tesla shielded superconducting magnet. The ions were generated from an external ESI source with mass range from m/z 100 up to 3,000. MS/MS profiles were generated using CID with varying CE settings (0.5-35 eV) depending on the abundance and fragmentation 14 energies required for each molecule. The instrument was externally calibrated with Agilent ESI calibration mix as well as using nonadecanoic acid and octadecylamine (Sigma-Aldrich, St.

Louis, MO) as internal quantification and mass lock standards.

Phospholipid analysis FT-MS and 13C labeling

Cultures were grown in MOPS minimal media with 0.4% 13C glucose (Cambridge

Isotope Laboratories, Tewksbury, MA) at 37°C and culture with constant shaking performed at

250RPM in flasks. Cultures were harvested at late log phase. Samples were analyzed by ESI-MS and MS/MS on a Bruker Daltonics SolariX FTICR. 13C shifted peaks were identified using

ShiftedIonsFinder analysis software (Kazusa DNA Research Institute, Kisarazu, Chiba, Japan) courtesy of Dr. Kota Kera and shared as part of a JST NSF collaboration.

Expression analysis of KASI and KASII

fabB and fabF genes were cloned from K-12 BW25113 E. coli and protein was expressed and purified as described previously by (Edwards et al., 1997). Purified protein samples were submitted to the Hybridoma Facility of the Iowa State University Office of Biotechnology for the generation of polyclonal antibodies in mice. Sera specific to the fabB or fabF proteins were generated. These antibodies were then used for the western blot detection of native fabB and fabF. Protein was extracted from cell pellets of cells at mid-log growth stage. Western blot analysis was performed on the Wt and ΔfabH protein extracts. Detection was achieved by goat anti-mouse IgG (H+L)-HRP conjugate (Bio-Rad, Hercules, CA) using chemilumninescence with

ELC western blotting reagents (GE Healthcare Bio-science Corporation, Piscataway, NJ).

Western blots were imaged with a ChemiDoc XRS+ imaging system and images were analyzed with Image Lab Software (Hercules, CA) to quantify abundances. 15

Profiling of short chain acyl-ACPs

Freshly grown cells at mid-log phase were centrifuged to collect and remove media.

20uL of 6 mg/mL purified HIS tagged E. coli ACP (Xin Guan, Publication in preparation) was added. 1.3 mL 7.5% ice cold trichloroacetic acid (TCA) was added and the samples were homogenized by vortexing. Centrifuge at 25000g for 10 minutes. We then removed the supernatant and quickly repeated with two additional 1.3 mL 7.5% TCA washes. Next the pellet was washed once more with 0.65mL 1% TCA; this wash supernatant was discarded. ACPs were dissolved in 50mM Mops buffer (pH 7.6 , with KOH), elute and centrifuged to pellet 3 times with 1.33 mL (4mL total). To remove high molecular weight contaminates, the dissolved ACPs solution was passed through 4mL Amicon centrifugal filter units with 50kDa MWCO (Millipore,

Billerica, MA, USA) by centrifuging for 30 minutes at 3400g to collect the flow through. The

ACP extracts were then washed and concentrated in 3kDa MWCO 4mL Amicon centrifugal filter units with 50kDa MWCO (Millipore, Billerica, MA, USA) by centrifuging for 30 minutes at 3400g three times with 1.33 mL HPLC grade water (Sigma-Aldrich, St. Louis, MO). The extracts were then centrifuged to concentrate until there was ~0.25mL of ACP solution remaining above the filter membrane.

ESI-MS experiments were performed on a Bruker Daltonics SolariX FTICR equipped with a 7.0 tesla shielded superconducting magnet. The ions were generated from an external ESI source with mass range from m/z 200 up to 10,000 and monitored in negative mode. The instrument was externally calibrated with Agilent ESI calibration mix as well as using purified HIS tagged

E. coli ACP as an internal quantification and mass lock standard.

16

RESULTS

Development and confirmation of KAS mutant strains

To confirm the genotypes of the stock strains and the recovered ΔfabF/ΔfabH double mutant, PCR and western blot based studies were utilized. The PCR with fabH specific primers showed observable bands in the Wt and ΔfabF strains, while no products were visible in the

ΔfabH or the ΔF /ΔH after visualization by gel electrophoresis (Fig. 2 A). These findings confirm that the Wt and ΔfabF strains have a copy of the fabH gene. Western blot detection of protein extracts using anti-fabF antibodies revealed that Wt and ΔfabH strains have detectible luminescence from fabFp bands, while ΔF /ΔH and ΔfabF strains had no detectable band for fabFp (Fig. 2 B). This western blot data was consistent with the ΔfabF genotype and the newly generated ΔF /ΔH cells. The PCR with primers specific for the GenR knockout cassette used to replace the native fabF gene showed an observable band in the ΔF /ΔH cells, while no products were visible in the Wt and ΔfabF, ΔfabH after visualization by gel electrophoresis (Fig. 2 C).

This PCR data confirms that the ΔF /ΔH cells have a copy of the GenR knockout cassette.

Growth of single and double KAS mutants

Growth of wild type and ΔfabH cells on MOPS minimal media and LB rich media agar plates revealed that the ΔfabH cells had slow growth effect. This slower growth was most visible after 24 hours, while after 48 hours the ΔfabH cells had developed larger colonies (Fig. 3).

Growth curves of wild type and ΔfabH cells in LB rich media (Fig. 4 A) and in MOPS minimal media (Fig. 3 B) both were monitored for 20 hours at 600nm. Doubling times (Fig. 4 C) were calculated at the point of maximum growth rate. The ΔfabH cells had statistically significantly longer doubling times in both LB and MOPS minimal media (p-value <0.05) with the doubling 17 times increasing by over 0.3 hours in both conditions. In addition, the final cell density of the

ΔfabH cultures was reduced, by nearly 0.2 OD600 under both media conditions.

Because ΔfabH cells are smaller in size than Wt cells (Yao et al., 2012), there was the prospect that the OD600 may not be an accurate measurement of cell numbers. To determine the utility of OD600 in determining growth rate and cell densities, cells/OD600/mL values were established (Fig. 4 D) based on the number of recovered colonies from diluted cultures plated on

LB agar plates. There was not a significant difference in the number of cells/mL/OD600 between wild type and ΔfabH cells. Having equivalent numbers of cells/mL/OD600 indicates that OD600 values are an accurate measurement of cell number and can be used for the calculation of cell division rates on other growth curve characteristics of the Wt and ΔfabH strains.

Phospholipid analysis FT-MS and 13C labeling

Stable isotope labeling is a powerful tool that can be used with high accuracy mass spectrometers such as FT-ICR MS to assign the elemental compositions of metabolites

(Hegeman et al., 2007). 13C labeled glucose was used as the sole carbon source to grow Wt and

ΔfabH cultures with 13C labeled metabolites that were compared to reference cultures grown on natural abundance glucose. The resulting cultures were subjected to lipid analysis using FT-ICR

MS and the resulting 13C and 12C spectrums were compared using shifted peak finding software,

ShiftedIonsFinder (Kazusa DNA Research Institute, Kisarazu, Chiba, Japan). These results were used to confirm the identity of the known PE and PG lipids. Interestingly, two shifted peaks

787.7m/z (12C) and 801.7m/z (12C) with 43 and 44 incorporated 13C labels, respectively, were detected in the lipid extracts. These masses match the predicted molecule weights of PG 18

C18:1/cyC19:0 and PG cyC19:0/cyC19:0. However these phospholipids are not known to exist in E. coli and are not found in literature. In an effort to confirm the identity of these novel phospholipids, MS-MS analysis was conducted on the 13C labeled samples (Fig. 5). As the more abundant of the two new lipids, MS-MS was successful in confirming the PG C18:1/cyC19:0

(13C labeled 830.8m/z) peak as being appropriately identified based on the identification techniques previously described (Oursel et al., 2007). To determine the positions of the acyl groups as sn1 or sn2, we looked at the relative abundance of 515m/z and 533m/z fragments, which are C18:1 specific, and 529m/z and 547m/z peak fragments that are cyC19 specific. The presence of the four times more abundant 515m/z and 533m/z peaks correspond to fragments without the sn2 R group and an aldehyde in its place, this fragment is observed with and without water. This peaks (515m/z and 533m/z) indicated a C18:1 acyl chain at the sn1 R group position.

These key fragments indicate that C18:1 group is most common at the sn1 positions (farthest away from PG group). In addition to being novel phospholipids PG C18:1/cyC19:0 and PG cyC19:0/cyC19:0 are both statically significantly increased in the ΔfabH mutant cells with a 7 and 6 fold increase in abundance respectively (Fig. 7).

Fatty acid and lipid analysis with GC and FT-MS

Because the of fabH, acetoacetyl-ACP contributes directly to de novo fatty acid biosynthesis, it is imperative to understand the impact of the ΔfabH mutation on the fatty acid pool and lipid composition of the mutant cells. Understanding the impact of removing the fabH gene on fatty acid and lipid pools can aid in elucidating the function of fabH and add information concerning how E. coli compensate for the lost KASIII activity. The fatty acid and lipid content 19 of ΔfabH deletion mutant cells were analyzed and compared to Wt cells. The total fatty acid content (Fig. 6 A) of the ΔfabH cells had a statistically significant reduction compared to wild type of 38 nmoles/g dry weight (p-value<0.5). The percent of unsaturated fatty acid remained unchanged between the Wt and ΔfabH cells (Fig. 6 B). Figure 6 C displays the relative abundance of each fatty acid as a percent of the total recovered. With the exception of C14:OH, all of the fatty acid groups C16 and shorter are significantly higher in the ΔfabH mutant, while any chains longer than C16 are statistically significantly increased compared to wild type (p- value<0.5).

The phospholipids quantified in using HPLC and FT-ICR MS (Fourier transform ion cyclotron resonance mass spectrometry), detected in both negative mode (Fig. 7) and positive mode (Fig. 8) electrospray ionization (ESI), strongly reflect the fatty acid quantification data

(Fig. 6). In ΔfabH cells phospholipid molecules with C16:1 or C16:0 acyl groups are significantly reduced in both types of the major membrane lipids found in E. coli, phosphatidylethanolamine (PE) and phosphatidylglycerol (PG). While PG and PE molecules with C18:1 or CyC19 without C16:0 or C16:1 acyl groups were significantly increased in abundance (p-value<0.5) in the ΔfabH mutant (Fig. 7).

ACP profiling of KASIII mutant

ACP molecules and short chain acyl-ACPs were analyzed to view the direct effect the fabH mutation has on the early intermediates of fatty acid biosynthesis. The acetyl, holo, and an unknown acyl-ACP (of 8925m/z) (Fig. 9), all are statistically significantly elevated in the ΔfabH mutant, with increases of 0.35 +/-0.08, 0.38+/-0.08, and 0.38+/ 0.06-fold respectively (Fig. 10

A). Interestingly, the acetoacetyl-ACP content was not statistically different between the Wt and

ΔfabH cells. When the APC molecules were compared as a fraction of the total small ACP pool, 20 the acetoacetyl-ACP content of the ΔfabH mutant was statistically significantly reduced 27% compared to wild type (p-value <0.5) (Fig. 10 B), while the fractional content of the other detected ACP molecules were unchanged between Wt and ΔfabH samples.

Quantification of fabF and fabB protein accumulation

Polyclonal antibodies specific to fabB and fabF proteins were produced to detect changes in the protein accumulation of these two KAS enzymes. Western blot analysis of fabF and fabB proteins showed the ΔfabH cells had increases in both fabF and fabB by 4- and 2-fold respectively (Fig. 11 A). When observed as a ratio of fabF/fabB, the ΔfabH cells showed an increase of over 50% in this ratio as compared to Wt (Fig. 11 B). These were statistically significant differences (p-value <0.05) between the Wt and ΔfabH cells.

Growth analysis of KAS single and double mutants among varying media types

The growth characteristics of the ΔfabH and ΔfabF single mutants, as well as the

ΔfabF/ΔfabH double mutant, were monitored under four specific media conditions: LB, terrific broth (TB), MOPS minimal media, and M9 minimal media. All the graphs Figures 12 to 17 represent the average of four replicates of each genotype under these four media conditions. Two different starting inoculum treatments were tested; in Figures 12-14 the cultures were started from cells grown to stationary phase in LB, while in Figures 15-17 the cultures were inoculated with cells grown to stationary phase in same media type they were entering. Even with this obvious slow growth phenotype, the final OD reached by all genotypes was nearly unchanged among them (Fig. 12 A and Fig. 15 A). Lag time was observed under three inoculum conditions: starting from overnight LB cultures, overnight MOPS cultures, and from mid-log 21

MOPS cultures (Fig. 18). The results were constant across the conditions with the ΔfabF/ΔfabH double mutant having the longest lag time, followed by the ΔfabH single mutant.

Samples were acquired throughout the growth of cultures in minimal MOPS media: collections were performed specifically at 4 hours after inoculation and at OD600 values of 0.2 (late lag/early log phase), 0.8 (log phase), 1 (early stationary phase) and at stationary phase, as indicated in

Figure 19. Fatty acids were analyzed as ethyl ester derivates and are presented (Fig. 20 – Fig.

26) The total fatty acid content was calculated relative to the Wt samples collected at an OD600 of 0.2 (Fig. 19 and Fig. 22). The only cells which had significant increases in fatty acid were

ΔfabF/ΔfabH double mutant cells collected at OD600 0.2, which had a 40% increase in total fatty acid content compared to the Wt control. This increase in total fatty acid amount was seen across nearly all detected fatty acid compounds, with C18:0 making the largest increase in proportion to the total, at 7%, from 13% in Wt to 20% in ΔfabF/ΔfabH cells (Fig. 21 and Fig. 22). Figure 23 focuses on the long chain fatty acids of the log phase samples. The ΔfabH samples showed C18:1 as more abundant than C16:0, while ΔfabF and Wt cells maintained C16:0 as the more abundant fatty acid constituent. The ΔfabF/ΔfabH cells reverted back to Wt condition, where the C16:0 amount exceeded that of the C18:1 fatty acid (Fig. 24). Figures 25 and 26 show the medium and short chain fatty acids profiles of the four genotypes analyzed. The profiles of these show cultures had significant changes within each genotype, depending on the growth phase.

22

DISCUSSION

The ΔfabH deletion mutant showed a consistent slow growth phenotype with longer doubling times and lag times. In addition to the slow growth, the ΔfabH cells also had a modest reduction in total fatty acid content as well as a dramatic increase in cis-vaccenic acid (C18:1) abundance. In fact C18:1 fatty acid replaced C16:0 as the most abundant fatty acid in ΔfabH cultures. Because cis-vaccenic acid is formed by the elongation of pamitoleoyl-ACP by the

KASII enzyme fabF, we investigated the role of fabF in the survival of ΔfabH mutant cells.

Perhaps in E. coli, the KASII gene was able to compensate for the loss of fabH, as has been shown previously in L. lactis (Morgan-Kiss and Cronan, 2008). We were able to recover

ΔfabF/ΔfabH double mutant E. coli cells and evaluate their growth characteristics. The double mutant had a slightly more severe slow growth phenotype than the ΔfabH single mutant, with longer doubling times and lag times, while the ΔfabF single mutant control did not have any discernible growth phenotypes when compared to Wt cells in the media conditions tested at

37°C.

The accumulation of C18:1 fatty acid may be a result of the increase in fabF/fabB ratio observed in ΔfabH mutant cells. Another possible mode of generating additional C18:1 would be an up regulation of fabA, as the fabA/fabB ratios are responsible for regulating the amount of unsaturated fatty acids (Xiao et al., 2013). However fabA is unlikely to be responsible for the

C18:1 accumulation, as there was not an observed change in total unsaturated fatty acid percent in the ΔfabH mutant cells compared to wild type. The definitive evidence for fabFs role in the accumulation of the C18:1 in the ΔfabH mutant cells was revealed in the ΔfabF/ΔfabH double mutant cells. In late log phase (0.8 OD600) cells the ΔfabH cells had significantly more C18:1 23 than C16:0 while the ΔfabF/ΔfabH double mutant cells had returned to the wild type profile of

C16:0 greater than C18:1. This result demonstrated that fabF is required for the C18:1 hyper- accumulation chemotype displayed in ΔfabH cells. As expected, the ΔfabF mutants showed increases in palmitoleic acid (C16:1). That is consistent with the loss of fabF function.

The altered phospholipid profile of the ΔfabH mutant was observed using high accuracy

FT-ICR MS with and without 13C labeling. Two novel phospholipids were detected. The two new phospholipids were both phosphatidylglycerol molecules: 1-(11Z-octadecenoyl)-2-(11- cyclopropyl-octadecanoyl)-sn-glycero-3-phosphatidylglycerol and 1-(11-cyclopropyl- octadecanoyl)-2-(11-cyclopropane-octadecanoyl)-sn-glycero-3-phosphatidylglycerol. These phospholipids are not specifically reported in E. coli phospholipid literature but were detectable in the wild types cells at reduced levels. The combination of these lipids being hyper- accumulated in the ΔfabH mutant and the sensitivity of the FT-ICR MS allowed for their detection and identification in this study.

The KASII and KASIII single and double mutants were grown and compared to the wild type cells under varying media and pre-culture conditions. The summation of these results is that regardless of the media condition and starting pre-culture inoculum treatment, the Wt and ΔfabF cells grew very similarly, while the ΔfabF/ΔfabH double mutant and ΔfabH single mutant grew more slowly. The ΔfabF/ΔfabH double mutant grew the slowest followed by the ΔfabH single mutant. It is unclear which role of fabF, initiation or elongation or both, the double mutants cells are deprived of, but their growth was modestly effected by the loss of fabF activity. Even with this obvious slow growth phenotype, the final OD reached by all genotypes was nearly unchanged among them. This result suggests that the growth rate is decreased but the growth 24 potential (final OD) of the cultures was not severely diminished by the loss of KASII and KAS II enzymes.

Although the ΔfabF/ΔfabH double mutant grew more slowly than the ΔfabH single mutant, it was, in fact, viable. This finding reveals that KASI (fabB) can support growth alone without any other KAS genes present. In addition to catalyzing condensation reactions, both

KASI and KASII enzymes are known to have malonyl-ACP decarboxylase activity that leads to the formation of acetyl-ACP (Alberts et al., 1972; Morgan-Kiss and Cronan, 2008). These previously reported findings were supported by the presence of considerable higher amounts acetyl-ACP in both the wild type and ΔfabH cells. This data, taken together with the previous findings, suggests that FabB likely can initiate fatty acid synthesis in the absence of an acetoacetyl-ACP primer via a side reaction involving malonyl-ACP decarboxylase activity and the elongation of the available acetyl-ACP to form the missing acetoacetyl-ACP primer, there by forming a fabH bypass pathway (Fig. 27) (Alberts et al., 1972; Garwin et al., 1980, 1980;

Jackowski and Rock, 1987; Magnuson et al., 1993; Morgan-Kiss and Cronan, 2008).

Additionally, it has also been shown that fabB is capable of catalyzing all the condensation reactions up to C18 in E. coli; this trait could allow fabB to produce mature long chain fatty acids without fabF or fabH present (Alberts et al., 1972; Garwin et al., 1980, 1980; Jackowski and Rock, 1987; Magnuson et al., 1993; Morgan-Kiss and Cronan, 2008). Although it has been predicted that fabB may be able to support growth as the only KAS enzyme present in E. coli, the ΔfabF/ΔfabH double mutant data in this study is the first in vivo demonstration of fabB supporting growth as the sole KAS in E. coli.

25

REFERENCES

Alberts AW, Bell RM, Vagelos PR (1972) Acyl carrier protein. XV. Studies of -ketoacyl-acyl carrier protein synthetase. J Biol Chem 247: 3190-3198 Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2: 2006.0008 Baba T, Mori H (2008) The construction of systematic in-frame, single-gene knockout mutant collection in Escherichia coli K-12. Methods Mol Biol 416: 171-181 Battesti A, Bouveret E (2006) Acyl carrier protein/SpoT interaction, the switch linking SpoT- dependent stress response to fatty acid metabolism. Mol Microbiol 62: 1048-1063 Battesti A, Bouveret E (2009) Bacteria possessing two RelA/SpoT-like proteins have evolved a specific stringent response involving the acyl carrier protein-SpoT interaction. J Bacteriol 191: 616-624 BERTANI G (1951) Studies on lysogenesis. I. The mode of phage liberation by lysogenic Escherichia coli. J Bacteriol 62: 293-300 Bonaventure G, Salas JJ, Pollard MR, Ohlrogge JB (2003) Disruption of the FATB gene in Arabidopsis demonstrates an essential role of saturated fatty acids in plant growth. Plant Cell 15: 1020-1033 Cronan J, Lin S (2011) Synthesis of the alpha,omega-dicarboxylic acid precursor of biotin by the canonical fatty acid biosynthetic pathway. Current Opinion in Chemical Biology 15: 407-413 Datsenko KA, Wanner BL (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci U S A 97: 6640-6645 de Mendoza D, Klages Ulrich A, Cronan JE (1983) Thermal regulation of membrane fluidity in Escherichia coli. Effects of overproduction of beta-ketoacyl-acyl carrier protein synthase I. J Biol Chem 258: 2098-2101 26

Edwards P, Nelsen JS, Metz JG, Dehesh K (1997) Cloning of the fabF gene in an expression vector and in vitro characterization of recombinant fabF and fabB encoded enzymes from Escherichia coli. FEBS Lett 402: 62-66 Garwin JL, Klages AL, Cronan JE (1980) Beta-ketoacyl-acyl carrier protein synthase II of Escherichia coli. Evidence for function in the thermal regulation of fatty acid synthesis. J Biol Chem 255: 3263-3265 Garwin JL, Klages AL, Cronan JE (1980) Structural, enzymatic, and genetic studies of beta- ketoacyl-acyl carrier protein synthases I and II of Escherichia coli. J Biol Chem 255: 11949-11956 Heath RJ, Rock CO (2004) Fatty acid biosynthesis as a target for novel antibacterials. Curr Opin Investig Drugs 5: 146-153 Hegeman AD, Schulte CF, Cui Q, Lewis IA, Huttlin EL, Eghbalnia H, Harms AC, Ulrich EL, Markley JL, Sussman MR (2007) Stable isotope assisted assignment of elemental compositions for metabolomics. Anal Chem 79: 6912-6921 Jackowski S, Rock CO (1987) Altered molecular form of acyl carrier protein associated with beta-ketoacyl-acyl carrier protein synthase II (fabF) mutants. J Bacteriol 169: 1469-1473 Lai CY, Cronan JE (2003) Beta-ketoacyl-acyl carrier protein synthase III (FabH) is essential for bacterial fatty acid synthesis. J Biol Chem 278: 51494-51503 Li HQ, Luo Y, Zhu HL (2011) Discovery of vinylogous carbamates as a novel class of β- ketoacyl-acyl carrier protein synthase III (FabH) inhibitors. Bioorg Med Chem 19: 4454- 4459 Lin S, Hanson RE, Cronan JE (2010) Biotin synthesis begins by hijacking the fatty acid synthetic pathway. Nat Chem Biol 6: 682-688 Magnuson K, Jackowski S, Rock CO, Cronan JE (1993) Regulation of fatty acid biosynthesis in Escherichia coli. Microbiol Rev 57: 522-542 Morgan-Kiss RM, Cronan JE (2008) The Lactococcus lactis FabF fatty acid synthetic enzyme can functionally replace both the FabB and FabF proteins of Escherichia coli and the FabH protein of Lactococcus lactis. Arch Microbiol 190: 427-437 Oh-hashi Y, Okuyama H (1986) Effects of the temperature range and the lack of beta-ketoacyl acyl-carrier protein synthase II on fatty acid synthesis in Escherichia coli K12 after shifts in temperature. Biochim Biophys Acta 876: 146-153 27

Oursel D, Loutelier-Bourhis C, Orange N, Chevalier S, Norris V, Lange CM (2007) Lipid composition of membranes of Escherichia coli by liquid chromatography/tandem mass spectrometry using negative electrospray ionization. Rapid Commun Mass Spectrom 21: 1721-1728 Ramamoorthy D, Turos E, Guida WC (2013) Identification of a new in E. coli FabH using Molecular dynamics simulations: validation by computational alanine mutagenesis and docking studies. J Chem Inf Model 53: 1138-1156 Rigaudy J, Klesney SP (1979) Nomenclature of Organic Chemistry. Pergamon Press Tartoff KD, Hobbs CA (1987) Improved media for growing plasmid and cosmid clones. In, Vol 9:12. Bethseda Research Laboratories Focus Ulrich AK, de Mendoza D, Garwin JL, Cronan JE (1983) Genetic and biochemical analyses of Escherichia coli mutants altered in the temperature-dependent regulation of membrane lipid composition. J Bacteriol 154: 221-230 Wang XL, Zhang YB, Tang JF, Yang YS, Chen RQ, Zhang F, Zhu HL (2012) Design, synthesis and antibacterial activities of vanillic acylhydrazone derivatives as potential β- ketoacyl-acyl carrier protein synthase III (FabH) inhibitors. Eur J Med Chem 57: 373-382 Xiao X, Yu X, Khosla C (2013) Metabolic flux between unsaturated and saturated fatty acids is controlled by the FabA:FabB ratio in the fully reconstituted fatty acid biosynthetic pathway of Escherichia coli. Biochemistry 52: 8304-8312 Yang YS, Zhang F, Gao C, Zhang YB, Wang XL, Tang JF, Sun J, Gong HB, Zhu HL (2012) Discovery and modification of sulfur-containing heterocyclic pyrazoline derivatives as potential novel class of β-ketoacyl-acyl carrier protein synthase III (FabH) inhibitors. Bioorg Med Chem Lett 22: 4619-4624 Yao Z, Davis RM, Kishony R, Kahne D, Ruiz N (2012) Regulation of cell size in response to nutrient availability by fatty acid biosynthesis in Escherichia coli. Proc Natl Acad Sci U S A 109: E2561-2568 Zhou Y, Du QR, Sun J, Li JR, Fang F, Li DD, Qian Y, Gong HB, Zhao J, Zhu HL (2013) Novel Schiff-base-derived FabH inhibitors with dioxygenated rings as antibiotic agents. ChemMedChem 8: 433-441

28

Figure 1. E. coli de novo fatty acid biosynthesis pathway

This schematic shows the specific roles of the KAS enzymes in the elongation steps of the fatty acid biosynthetic pathway.

FabB (KAS I), β-Ketoacyl-[acyl carrier protein] synthase I

FabF (KAS II), β-Ketoacyl-[acyl carrier protein] synthase II

FabH (KAS III), β-Ketoacyl-[acyl carrier protein] synthase III

(Adapted figure from EC FAS paper)

29

Figure 2. KAS single mutants and ΔfabF/ Δ fabH double mutant confirmation

Gel electrophoresis analysis of PCR products of gene specific primers for fabH (KASIII) used to detect the presence of the fabH gene at the DNA level (A). Panel B shows the western blot analysis of protein extracts from the Wt and KAS mutants with mouse anti-fabF antibodies. Gel electrophoresis analysis of PCR products of primers specific for GenR knockout cassette used to delete the native fabF gene (C).

30

Figure 3. Growth of wild type and ΔfabH E. coli on solid media

Growth of wild type and ΔfabH cells on MOPS minimal media plates agar at 24 (A) and 48 hours (B) after streaking and LB rich media agar plates are also shown at 24 (C) and 48 (D) hours after streaking.

31

Figure 4. Growth of wild type and ΔfabH E. coli in liquid media

Growth curves of wild type and ΔfabH cells in LB rich media (A) and in MOPS minimal media (B), both were monitored for 20 hours at 600nm. Doubling times (C) were calculated at the point of maximum growth rate. The ΔfabH had statistically significantly longer doubling times in both LB and MOPS minimal media. The OD600 to cell density (D) was calculated based on the number of recovered colonies from diluted cultures plated on LB plates. 32

Figure 5. MS/MS spectrum of 830.8m/z in negative mode

The spectrum of the isolated 830.8 m/z ion from 13C labeled E. coli ΔfabH nonpolar extracts detected in negative mode with CID fragmentation is presented. Red arrows indicated C18:1 containing fatty acid constituents while blue arrows indicate cyC19:0 containing fragments (A). The bar graphs in figure B represent the relative abundance of the average of four biological replicates for each phospholipid relative to an internal standard and the mass of the sample. Molecules were detected using ESI in negative mode. The error bars represent +/- standard error. The “*” represent statistically different (p-value <0.05) values between Wt and ΔfabH cells.

33

Figure 6. Fatty methyl ester quantification of ΔfabH and Wt cells using GC-MS

The total fatty acid content of the Wt and ΔfabH cells is presented (A). The total percent of unsaturated fatty acids (acyl chains containing cyclopropyl or carbon-carbon double bounds) is presented in B. Figure C displays the relative abundance of each fatty acid as a percent of the total. Statistically significant differences are indicated by “*” (p-value<0.5). Error bars represent +/- standard error.

34

Figure 7. HPLC FT-ICR MS quantification of phospholipids

The bar graphs represent the relative abundance of the average of four biological replicates for each phospholipid relative to an internal standard and the mass of the sample. Molecules were detected using ESI in negative mode. Both types of the major membrane lipids of E. coli phosphatidylethanolamine (PE) and phosphatidylglycerol were detected (PG). The error bars represent +/- standard error. The * represent statistically different (p-value <0.05) between Wt and ΔfabH cells. Note for low abundance compounds: PG cyC19:0/cyC19:0 is elevated in ΔfabH while PE C16:0/C16:0 reduced in ΔfabH.

35

Figure 8. HPLC FT-ICR MS quantification of phospholipids

The bar graphs represent the relative abundance of the average of four biological replicates for each phospholipid relative to an internal standard and the mass of the sample. Molecules were detected using ESI in positive mode. The error bars represent +/- standard error. The * represent statistically different (p-value <0.05) between the Wt and ΔfabH cells.

36

Figure 9. FT-ICR MS spectrum of acyl-ACP extracts

Above are the mass spectrums of ACP containing extracts from Wt and ΔfabH cells analyzed using FT-MS with ESI in negative mode. The arrows highlight specific compounds with a -8 charge.

37

Figure 10. FT-ICR MS quantification of acyl-ACP

The short chain acyl-ACPs are presented as bar graphs of the average of four biological replicates made relative to the wild type average value (A). Figure B presents the ACPs as a fraction of the total amount of the total of the four ACP derivatives, average of four biological replicates is presented. The error bars represent +/- standard error. The “*” represent statistically different values (p-value <0.05) between Wt and ΔfabH cells.

38

Figure 11. Quantification of fabF and fabB protein accumulation

Western blot analysis of fabF and fabB proteins is presented as bar graphs with the average of four biological replicates made relative to Wt levels (A) and the calculated fabF/fabB ration (B). The error bars represent +/- standard error. The * represent statistically different values (p-value <0.05) between Wt and ΔfabH cells.

39

Figure 12. Growth curves of wild type and KAS mutant E. coli in liquid media starting from overnight LB cultures

The graphs represent the average of four replicates each genotype under the respective conditions. Each graph contains all four genotypes grown in a specific media. LB (A), terrific broth (TB) (B), MOPS minimal media (C), M9 minimal media (D). Each sample was diluted to the same starting OD600 (0.001) and grown until stationary phase was reached and maintained. The error bars represent +/- standard error.

40

Figure 13. Growth characteristics of wild type and KAS mutant E. coli in liquid media

The bars represent the average of four replicates each genotype under the respective conditions. Each sample was started with inoculum grown in LB overnight and all samples were diluted to an equal starting OD600 (0.001). Max OD600 represents the highest OD600 reached during the time the cultures were observed (A). Figure B shows the time to reach the max OD observed. The fastest doubling time is represented as max growth shown in figure C. The time to reach the observed max growth was calculated (D). The error bars represent +/- standard error. The “*” represent statistically different values (p-value <0.05) between Wt and other genotypes cells and “**” represents statistically different values for both Wt and ΔfabH.

41

Figure 14. Lag times of wild type and KAS mutant E. coli in liquid media

The bars represent the average of four replicates each genotype under the respective conditions. Each sample was started with inoculum grown in LB overnight and all samples were diluted to an equal starting OD600 (0.001). The lag time is measured at the point when the measured OD surpasses the threshold OD600 (0.07). The error bars represent +/- standard error.

42

Figure 15. Growth curves of wild type and KAS mutant E. coli in liquid media starting from each respective media

The graphs represent the average of four replicates each genotype under the respective conditions. Each graph contains all four genotypes grown in a specific media. LB (A), terrific broth (TB) (B), MOPS minimal media (C), M9 minimal media (D). Each sample was diluted to the same starting OD600 (0.001) and grown until stationary phase was reached and maintained. The error bars represent +/- standard error.

43

Figure 16. Lag times of wild type and KAS mutant E. coli in liquid media

The bars represent the average of four replicates each genotype under the respective conditions. Each sample was started with inoculum grown in the same media type and all samples were diluted to an equal starting OD600 (0.001). Max OD600 represents the highest OD600 reached during the time the cultures were observed (A). Figure B shows the time to reach the max OD observed. The fastest doubling time is represented as max growth shown in figure C. The time to reach the observed max growth was calculated (D). The error bars represent +/- standard error. The “*” represent statistically different values (p-value <0.05) between Wt and other genotypes cells and “**” represents statistically different values for both Wt and ΔfabH.

44

Figure 17. Lag times of wild type and KAS mutant E. coli in liquid media

The bars represent the average of four replicates each genotype under the respective conditions. Each sample was started with inoculum grown in the same media type and all samples were diluted to an equal starting OD600 (0.001). The lag time is measured at the point when the measured OD surpasses the threshold OD600 (0.07). The error bars represent +/- standard error.

45

Figure 18. Lag times of wild type and KAS mutant E. coli in liquid media

The bars represent the average of four replicates each genotype under the respective conditions.

Each samples was diluted to the same starting OD600 (0.001) and the time is measured at the point when the measured OD surpasses the threshold OD600 (0.07). The error bars represent +/- standard error. “*” denotes that the peak is statistically significantly different from all the other genotypes in that condition.

46

Figure 19. Collection of Wt and KAS mutant samples at specific growth points

The plots represent the average of three replicates each genotype grown in 250mL cultures in flasks. Samples were diluted to the same starting OD600 (0.001). Each arrow indicates a specific

OD600 cut off where samples were taken for further analysis.

47

Figure 20. Quantification of the total fatty acid content of KAS mutants and Wt cells by GC-MS analysis of ethyl esters

The total fatty acid content of KAS mutants and Wt cells at specific growth stages are displayed.

Each bar graph represents the average of four replicates calculated relative to the Wt OD600 0.2 average. Statistically significant changes compared to Wt OD600 0.2 (p-value<0.5) are marked (*). Error bars represent +/- standard error. F= ΔfabF, H= ΔfabH , FH= ΔfabF/ΔfabH.

48

Figure 21. Fatty composition of KAS mutants and Wt cells by GC-MS analysis of ethyl esters. The fatty acid content of KAS mutants and Wt cells at specific growth stages are presented. Each colored section of the bar graph represents the average of four replicates calculated percent contribution of each fatty acid to the total fatty acid pool. F= ΔfabF, H= ΔfabH , FH= ΔfabF/ΔfabH.

49

Figure 22. Fatty composition and total content of KAS mutants and Wt cells by GC-MS analysis of ethyl esters

The fatty acid content of KAS mutants and Wt cells at specific growth stages are displayed. Each colored section of the bar graph represents the average of four replicates calculated moles/mg contribution of each fatty acid to the total fatty acid pool. F= ΔfabF, H= ΔfabH , FH= ΔfabF/ΔfabH.

50

Figure 23. Long chain fatty composition of KAS mutants and Wt cells by GC-MS analysis of ethyl esters

The fatty acid content of KAS mutants and Wt cells at specific growth stages are presented. Each colored bar graph represents the average of four replicates and its content in moles/mg. Only fatty acids longer that 16 carbons are displayed. F= ΔfabF, H= ΔfabH , FH= ΔfabF/ΔfabH.

51

Figure 24. Long chain fatty composition of KAS mutants and Wt cells by GC-MS analysis of ethyl-esters

The fatty acid content of KAS mutants and Wt cells at specific growth stages are presented. Each colored bar graph represents the average of four replicates and its content in moles/mg. Only fatty acids longer that 16 carbons are displayed. Statistically significant changes compared to Wt (a), ΔfabH (b), ΔfabF (c), and ΔfabF/ΔfabH (d) (p-value<0.5). Error bars represent +/- standard error. F= ΔfabF, H= ΔfabH, FH= ΔfabF/ΔfabH.

52

Figure 25. Medium chain fatty composition of KAS mutants and Wt cells by GC-MS analysis of ethyl-esters

The fatty acid content of KAS mutants and Wt cells at specific growth stages are presented. Each colored bar graph represents the average of four replicates and its content in moles/mg. Only fatty acids between 10-14 carbons are displayed. F= ΔfabF, H= ΔfabH , FH= ΔfabF/ΔfabH.

53

Figure 26. Short chain fatty composition of KAS mutants and Wt cells by GC-MS analysis of ethyl-esters

The fatty acid content of KAS mutants and Wt cells at specific growth stages are presented. Each colored bar graph represents the average of four replicates and its content in moles/mg. Only fatty acids with fewer than 10 carbons are displayed. F= ΔfabF, H= ΔfabH, FH= ΔfabF/ΔfabH.

54

Figure 27. E. coli de novo fatty acid biosynthesis fabH bypass pathway

This schematic shows the specific roles of the KAS enzymes in the elongation of steps of the fatty acid biosynthetic pathway without a KASIII enzyme.

FabB (KAS I), β-Ketoacyl-[acyl carrier protein] synthase I

FabF (KAS II), β-Ketoacyl-[acyl carrier protein] synthase II

(Adapted figure from EC FAS paper)

55

Supplemental Table 1. Primer list

56

CHAPTER III. GENE 2 FUNCTION PROJECT IN METHANOSARCINA ACETIVORANS: INVESTIGATING A NOVEL COMBINED NMR LIGAND SCREENING AND METABOLOMIC ANALYSIS BASED APPROACH FOR EFFICIENT AND ACCURATE GENE FUNCTIONAL ANNOTATION

INTRODUCTION

The following was a collaboration among three groups, the Sowers and Orban groups at the University of Maryland and the Nikolau group here at Iowa State University. Note that as a group, we have published portions of this work already (Chen et al., 2011).

The first organism to have its complete genome sequenced was the Haemophilus influenza (Fleischmann et al., 1995). Of the 1740 genes in this genome approximately 25% had no known function. The majority of these had sequence homologs in other organisms, but no homology to any genes with determined functions. These genes were labeled as “conserved hypotheticals”. The amount of genomic sequence data has become massive. Currently, sequencing efforts have yielded the sequencing of approximately 22,800 genomes, from 16,500 organisms, representing 349 million genes and genomic features (CoGe, 2014). As the number of sequenced genomes has grown, the observation remains that a significant proportion (as high as 70% in some cases) of prokaryotic, archaeal, and eukaryotic genomes have unknown functions. For example, among all archaea ~33% of the genes sequenced are currently labeled as genes of unknown function (Yelton et al., 2011). This lack of functional annotation represents a persistent and significant gap in our understanding of Archaeal biology and biochemistry.

Advantageously, conserved hypotheticals often have sequence homologs in hundreds of other 57 organisms and so functional assignment for one homolog would provide leverage in annotating a large set of genes. With this potential cascading effect, these conserved hypothetical genes represent strong candidates for functional studies.

Protein functions are currently annotated in genomic databases using automated algorithms that search for sequence homology to a gene product with an established function.

These sequence-based annotations frequently have variability in their accuracy, especially if the sequence identity to a protein of known function is low. Additionally, recent work has shown that even proteins with very high sequence identity can have different structures and functions

(Parsons et al., 2003; Yeh et al., 2005; He et al., 2008; Luo and Yu, 2008; Tuinstra et al., 2008); therefore caution is necessary when assigning functions simply by sequence homology in the absence of experimental validation. Traditional experimental approaches to determine function, such as enzyme assays, are slow, painstaking, and have not been able to keep up with the ever- increasing large body of genome sequence data that contains many genes with unconfirmed and undetermined function. There is an obvious need for more efficient methods for accurate, experimentally based annotation and validation of protein function.

A novel combined metabolomic and NMR-based approach was developed to meet the challenge of efficient and accurate experimental validation of the functions of putative enzymes

(Fig. 1). This method utilized a NMR-based ligand screening strategy (Dalvit et al., 2002). Our preferred method was waterLOGSY (water-ligand observed via gradient spectroscopy) pulse sequence (Dalvit et al., 2001), as consistent results were obtained in our hands using this approach. The second method employed was metabolomics, which allowed for the determination of a biochemical phenotype associated with the presence or absence of a particular gene (Förster et al., 2002; Hall et al., 2002; Phelps et al., 2002; Bino et al., 2004; Fridman and Pichersky, 58

2005; Koek et al., 2006; Swanepoel and Loots, 2014). Knowledge of a gene’s biochemical phenotype can be exploited as a method to identify potential functionality directly, or provide rationale for choosing potential substrates for subsequently analysis via NMR-based ligand screening strategy. Conversely, proteins which waterLOGSY has indicated bind a particular ligand may be further investigated using a more directed metabolomics approach to ascertain which metabolic pathways are affected by the presence or absence of that particular protein. In order to independently confirm gene functions determined with waterLOGSY and or metabolomics, genetic complementation and enzyme assays were implemented, when appropriate. This combination of approaches has the potential to work synergistically towards efficiently and accurately determining gene functions.

Choice of organism

We used the methanogenic archaeon, Methanosarcina acetivorans (MA) (SOWERS et al., 1984; Galagan et al., 2002; Longstaff et al., 2007; Rother et al., 2007; Chen et al., 2011), as a model organism to develop tools for efficient and reliable annotation and validation of protein function. Methane-producing organisms are of particular interest because of their ability to generate cost-effective biofuel. However, there was and is a significant gap in the biological understanding of methanogens. The lack of accurate functional annotation for the large quantity sequence data represents a major hurdle in the pursuit of biological knowledge of these organisms. Information gleaned from the demonstration of our novel approach to experimentally annotating gene functions is valuable in the pursuit of biological understanding of methanogens in general, in addition to MA itself.

59

The waterLOGSY method was initially developed for use as a ligand screening tool for drug targets and is useful as a rapid, relatively high through-put approach (Dalvit et al., 2001).

WaterLOGSY experiments observe the transfer of magnetization between ligand and water molecules (Fig. 2 A.). When a ligand is in solution with a protein capable of binding it, there are two competing flows of magnetization: 1) from water to the free ligand and 2) from the bound water (via the protein) to the bound ligand (Chen et al., 2011). The two competing flows lead to opposite signs of NOEs (nuclear Overhauser enhancements) (Chen et al., 2011). The stronger magnetization flow establishes the sign of the waterLOGSY peak. Ligands that bind with the protein will have positive peaks while non-binding compounds will have negative peaks (Fig. 2

B.) WaterLOGSY has shown promise as a potential efficient and accurate experimental validation of function annotations (Chen et al., 2011). While WaterLOGSY is useful for experimentally determining the functionality of proteins when there is knowledge of potential substrate molecules, WaterLOGSY has limited ability to determine functionality when potential substrate molecules are unknown.

Metabolomics is the science of identifying and measuring the pool of metabolites that represents the chemical composition of a biological sample, termed the metabolome. Although metabolomics is a relatively new field in biology (Fiehn et al., 2000; Hall et al., 2002), metabolomic analysis has increasing examples of being successfully employed to determine the metabolic functionality of genes (Hirai et al., 2005; Brauer et al., 2006; Liang et al., 2006;

Schauer and Fernie, 2006). Our main analytical approach utilized targeted and non-targeted metabolomic analysis of small molecules (<600 Da) by gas chromatography–mass spectrometry

(GC-MS) analysis. Some ways GC-MS can identify compounds are based on retention times in the chromatography, indexed relative to a series of standards, the m/z value of the molecular ion 60 for each metabolite, and matching the MS-fragmentation pattern of each to existing MS- databases. Two commonly used strategies for analyzing the functionality of genes using metabolomics are gain of function studies and loss of function studies. Loss of function studies rely on the comparison of a control strain to a strain where a gene function has been removed.

Gain of function metabolomic studies rely on the comparison of a control strain to an experimental strain where a new gene or function has been introduced. We originally attempted to generate directed single gene mutants in MA for loss of function studies. However there were significant technical difficulties that made this approach impractical. As a second choice we utilized a gain of function approach, where selected MA genes were heterologously expressed in

E. coli and were compared to a control strain not containing an MA gene.

Many pathways in MA are incompletely annotated. One pathway with which we have direct experience and interest is the biotin network. Figure 3 shows the biotin biosynthesis pathway with current putative annotations for biotin synthase (MA0154 and/or MA01489), biotin (MA0676), and MA’s only known biotin dependent enzyme, a putative pyruvate carboxylase with subunits: pycA(MA0675) and pycB (MA0674). Notably, the enzymes in the pathway leading up to dethiobiotin (the immediate precursor of biotin) have not been annotated.

In addition, there are two putative biotin synthase genes; MA0154 and MA1489. MA0154 and

MA1489 have 43% and 39% similarity to E. coli biotin synthase and further experimental evidence is required to determine which gene(s) has actual biotin synthase activity. Though the biotin biosynthetic pathway appears to be incomplete, initial observations showed that MA does not require exogenous biotin to sustain normal growth (SOWERS et al., 1984). This observation implies that MA does indeed have a complete but not yet described biotin biosynthetic pathway or a novel situation in which MA does not require biotin to survive. Considering that the biotin 61 network is poorly understood in MA and we have expertise in this pathway, the biotin network was selected as our model pathway. In an effort to elucidate the complete biotin network in MA, we targeted and searched for the missing enzymes in the biotin biosynthetic pathway of MA.

Examining the biotin network using our novel approach to gene annotation allowed us a proof of concept opportunity that also tapped into our experience and interest in the biotin network.

The aim of this study was to demonstrate the effectiveness of this novel combination of waterLOGSY NMR substrate binding assays and metabolomics as an accurate and efficient approach to annotating gene functions. Within the broader objective of evaluation there is an opportunity to discover gene functions, elucidate incomplete pathways, and uncover new biochemical activities. Applying our experimental approaches to genes within our selected target list will provide an excellent venue for examining the effectiveness and accuracy of our method.

Examining the biotin network of MA will allow for the opportunity to apply this combined

NMR-substrate binding and metabolomics approach to elucidate an incomplete pathway.

Furthermore an in-depth study of the biotin network of MA will pursue the elucidation of the

MA biotin biosynthetic pathway. Understanding MA’s biotin biosynthetic pathway will provide significant insight towards determining whether MA has the capacity to synthesize biotin de novo, or if MA has a novel situation where it does not require biotin to sustain growth.

62

MATERIALS AND METHODS

WaterLOGSY ligand screening

WaterLOGSY requires purified protein for analysis. Selected MA genes were heterologously expressed in E. coli and soluble proteins were purified and folding is confirmed via 1D 1H NMR spectroscopy. MA gene cloning, protein purification, and NMR waterLOGSY ligand screening experiments were carried out as previously described (Chen et al., 2011). The selected MA genes were cloned into pET-21a plasmids and these plasmids, carrying cloned MA genes, were used for the metabolomics and genetic complementation assays, as well.

Gene complementation

Genetic complementation of E. coli mutants by the MA genes was performed with E. coli deletion mutants generated in the BW25113 background (Baba et al., 2006). The specific mutant strains used in this study were: JW1122 (icd), JW5807 (leuB), and JW1789 (yeaU), and JW2535

(glyA). Each strain was transformed with a non-recombinant control pET-21a plasmid (Novagen) and a recombinant pET-21a plasmid carrying a cloned MA gene. Complementation of glyA mutants was conducted with glycine as the sole carbon sources(Newman et al., 1976). Genetic complementation experiments were carried out as previously described (Chen et al., 2011).

Total metabolite analysis

The selected MA genes were cloned previously into pET-21a plasmids and these plasmids carrying cloned MA genes were used for the metabolomics assays (Chen et al., 2011).

The pET-21a plasmids containing the select MA genes were transformed into Rosetta 2.0 E. coli cells (Novagen) along with an empty pET-21a control. Two colonies of each genotype were 63 picked and cultures started in lysogeny broth (LB) (BERTANI, 1951) with 50µg/mL each of ampicillin and chloramphenicol. These cultures were then incubated 16 hours at 37°C with constant shaking performed at 250RPM. Cells were then centrifuged down at 2000G and were washed and re-suspended with MOPS minimal media (Teknova, Hollister, CA). Four 200mL

MOPS minimal media with 0.4% glucose cultures were started for each genotype from the overnight inoculums. Cultures started at the same OD600, 0.03. Expression of the genes of interest was induced with 1uL/mL 0.1mM IPTG when each culture reached 0.4 OD600. Four hours after induction the cells were quenched 1:1 with -40°C cold methanol (Buchholz et al., 2001), with each sample containing 25mL of culture. The cells were then centrifuged down at 5500G for 15 minutes at20°C and the supernatant was removed.

Cell pellets were lyophilized overnight and dry masses were recorded (4-15mg). All reagents for metabolite analysis were from Sigma-Aldrich, St. Louis, MO, unless stated differently. 25µL of ribitol was added as an internal standard (0.6 mg/mL in dry-pyridine) as well as 25µL of C19 fatty acid as internal standard (0.6 mg/mL in chloroform) was added to the cell pellets. 0.35mL hot methanol (Thermo Fisher Scientific, Fair Lawn, NJ) (60°C) was added, mixed, and immediately incubated at 60°C for 10 minutes. Samples were then vortexed 10 seconds and put in a sonication water bath for 10 minutes. 0.35mL of chloroform was added and mixed by vortexing for 2 minutes. 0.3mL HPLC grade water was added and vortexed 2 minutes.

Phases were separated by centrifuge, 5 minutes at 13000G. The upper layer (polar) and lower layer (non-polar) were separated, derivatized, and analyzed separately. Derivatization was adapted from (Koek et al., 2006). 200-400µl polar/non-polar extracts were then dried under nitrogen gas. 50µl methoxyamine (20mg/mL in pyridine) was added and was incubated at 30°C while shaking, 1.5 hours. 70µl of bis-trimethyl silyl trifluoroacetamide with 1% 64 trimethylchlorosilane (BSTFA+1% TMCS) was then added (TMS derivatization reagent) and was incubated at 37°C for 30 minutes. Hydrocarbon standards were also spiked into a representative sample for the calculation of retention indices. Samples were then submitted to

GC-MS (gas chromatography–mass spectrometry) analysis. The following GC-MS method was used for the total metabolite GC-MS analysis. All GC-MS data was captured using an Agilent

Technologies Model 6890 Gas Chromatograph with an Agilent 19091S-433: 1926.58224 HP-

5MS 5% Phenyl Methyl Siloxane column -60 °C—325 °C (325 °C): 30 m x 250 µm x 0.25 µm column coupled to a Model 5973 mass selective detector with electrical ionization (EI). The GC-

MS system was controlled by and the data collected with ChemStation software (Agilent

Technologies, Santa Clara, CA). This GC-MS system was made available by the Office of

Biotechnology W.M. Keck Metabolomics Research Laboratory, Iowa State University. The injection port temperature was 280°C with a splitless mode. He was the carrier gas at a flow rate of 2.0 mL/min. The GC oven temperature ramp procedures were as follows: temperature set to

70 °C and held for 4 minutes, ramp 5°C/min to 320°C, hold 12 minutes. MS conditions were as follows: ion source, 200°C; electron energy, 70eV; quadrupole temperature, 150°C; GC-MS interface zone, 280°C; scan range, 50-550 mass units, a 6.7 minute solvent delay was used. Data was analyzed using AMDIS (Automated Mass Spectral Deconvolution and Identification

System; http://chemdata.nist.gov/mass-spc/amdis/) software, with data interpretation from

NIST08 library and quantification with MET-IDEA (Broeckling et al., 2006). The NIST 08 MS

Library and AMDIS were provided by National Institute of Standards and Technology

(www.nist.gov) and MET-IDEA software was obtained from the Samuel Roberts Noble

Foundation (Ardmore, OK). 65

Amino acid analysis

Cell pellets were lyophilized overnight and dry masses were recorded (4-15mg). Dried cell pellets were treated with 0.6mL 10% trichloroacetic acid and mix by vortex for 10 minutes.

The mixture was centrifuged at 13,000G and the supernatant was collected. Amino acid quantification was performed using EZ:faast Amino Acid Analysis Kit (Phenomenex, Torrance,

CA) using the manufacturer’s protocol for GC-MS/FID analysis. All GC-MS data was captured using an Agilent Technologies Model 6890 Gas Chromatograph coupled to a Model 5973 mass selective detector with electrical ionization (EI). The injection port temperature was 280°C with a splitless mode. He was the carrier gas at a flow rate of 2.0 mL/min. The GC oven temperature ramp procedures were as follows: temperature set to 70 °C and held for 4 minutes, ramp 5°C/min to 320°C, hold 12 minutes. MS conditions were as follows: ion source, 200°C; electron energy,

70eV; quadrupole temperature, 150°C; GC-MS interface zone, 280°C; scan range, 50-550 mass units, a 6.7 minute solvent delay was used. The GC-MS system was controlled by, and the data was collected and analyzed with ChemStation software (Agilent Technologies, Santa Clara, CA).

This GC-MS system was made available by the Office of Biotechnology W.M. Keck

Metabolomics Research Laboratory, Iowa State University.

66

RESULTS

Target gene selection

MA is considered metabolically diverse among methanogenic archaea, possessing the largest known archaeal genome at 4524 predicted genes (Galagan et al., 2002). Close to 50% of these genes were annotated as having unknown or unconfirmed function. In preparation for target gene selection, the functional annotations were updated for over 700 of the 4524 predicted genes in the MA genome. This was performed by transferring many of the recently revised manual annotations in the related species M. burtonii (Allen et al., 2009) to homologous genes in

MA. In addition, a thorough literature search was conducted for published data that experimentally confirms the functionality of MA genes and closely related orthologs in other species. Confidence levels were given (Fig. 4) to each re-annotation based on current literature as follows - Level 1: An exact match in the literature with an experimentally defined function.

Level 2: Gene product contains all domains needed for enzymatic function with ≥35% sequence identity to a gene product of known function. Level 3: Gene product contains all domains needed for enzymatic function but ≤35% sequence identity to a gene product of known function. Level

4: Gene product has no experimental match but some domain similarities to a known function are recognizable. Level 5: Has no experimental match or domain similarities – i.e. annotated as hypothetical. This re-annotated list of more than 700 genes provided a platform from which gene targets with varying confidence levels were selected for experimental validation using our approach. Selection criteria were as follows: the two main selection criteria were 1) the protein should have a putative enzymatic activity on a small molecule substrate and 2) the protein should be nonmembranous based on amino acid sequence analysis. Additional characteristics that are preferable but not absolutely required were that the gene product was expressed in vivo in MA 67 based on published reports (Rohlin and Gunsalus, 2010), the gene is conserved among other species, and that an E. coli homolog exists for potential genetic complementation studies. 70 genes within these selected criteria were selected for further analysis. This was conducted as previously described (Chen et al., 2011).

WaterLOGSY ligand screening of MA4265, a putative isocitrate/isopropylmalate dehydrogenase

MA4265 is annotated as an Isocitrate/isopropylmalate dehydrogenase family protein.

Based on this functional annotation, substrates were selected. MA4265 was submitted to one- dimensional waterLOGSY 1H NMR analysis with each of the following putative substrates (300

μM) as follows: citrate, cis-aconitate, isocitrate, and 2-ketoglutarate, each tested with 30 μM

MA4265. Of these four metabolites, only isocitrate bound to MA4265 under the conditions used

(Chen et al., 2011). Note that the negative peak at 3.4 ppm in all spectra is thought to be a contaminant from the protein concentration process.

Genetic complementation results for MA4265

BLASTp results representing E. coli homologs to the MA4265 amino acid sequence

(Table 1.) were used to select the Keio (Baba et al., 2006) mutants of each of three top blast hits: yeaU, leuB, and icd. All three genes were selected for use in genetic complementation studies with MA4265. Genetic complementation experiments (Fig. 6) of yeaU, leuB, and icd mutants in

E. coli strains in the BW25113 (Wt) background revealed that MA4265 was only able to complement the icd mutant cells. Complementation was observed in both liquid and solid media.

This complementation result correlates with the waterLOGSY data (Fig. 5). Icd is an isocitrate dehydrogenase mutant and MA4265 bound to isocitrate, a substrate of isocitrate dehydrogenase. 68

Amino acid profile for MA3520 expressed in Rosetta 2.0 E. coli cells

Amino acid profiling of E.coli cells expressing MA3520 by GC-MS (Fig. 7) revealed that four amino acids had statistically significant changes, alanine (3.1 fold decrease), glycine (1.7 fold decrease), asparagine (1.5 fold decrease), and β-aminoisobutyric acid (1.3 fold decrease).

Glycine and serine are two predicted substrates of this putative serine hydroxymethyltransferase.

Between the two amino acids, only glycine showed a change in accumulation in the MA3520 expressing cells (Fig. 8).

WaterLOGSY ligand screening of MA3520, a putative Serine hydroxymethyltransferase

The MA3520 is annotated as a serine hydroxymethyltransferase and also had changes in the accumulation of amino acids including glycine when expressed in E. coli. Based on this functional annotation, substrates were selected. MA3520 was submitted to one-dimensional waterLOGSY 1H NMR analysis with each of the following putative substrates as follows: glycine (gly), serine (ser), alanine (ala), histidine (his), and methionine (met). Of these five amino acids, only glycine bound to MA3520 under the conditions used (Fig. 9) (Chen et al.,

2011).

Genetic complementation results for MA3520

Based on the amino acid profile of MA3520 expressing cells and substrate binding profile of MA3520, the serine hydroxymethyltransferase (glyA) knockout mutant was selected for use in genetic complementation studies with MA3520. The KEIO (Baba et al., 2006) mutant glyA was obtained and genetic complementation studies performed (Fig. 10). MA3520 was able to complement the glyA mutant cells when they were grown on M9 minimal media with glycine 69 as the sole carbon source. Complementation studies were attempted with serine as the sole carbon source, but no complementation was observed and the Wt cells showed severe growth inhibition as serine can be toxic near the levels required for it to be the sole carbon source (data not shown) (Newman et al., 1976). This complementation result correlates with the waterLOGSY data (Fig. 9) and amino acid analysis (Fig. 8) for MA3520.

Metabolite analysis of MA genes expressed in Rosetta 2.0 E. coli cells

Amino acid and total metabolite analysis was conducted on 21 MA genes. Examples of the total metabolite analysis are in Figures 11 and 12. These plots both represent data from the polar faction of the total metabolite analysis compared to the empty vector control. Figure 11 shows an ideal situation where there are only a few statistically significantly changed metabolites, while Fig. 12 displays a plot where there is an overwhelming amount of statistically significant changed metabolites. Unfortunately, the data generated from this gain of function approach was very rich in metabolite changes. Amino acids were often affected by the expression of the MA genes, on average 27% (SE +/-1%) of the amino acids were changed.

There were on average 987 (SE +/- 103) ion peaks (potential metabolites) detected using the total metabolites profiling method. Of the 987 ion peaks, on average 176 (SE +/- 46) or 18% (SE+/-

5%) had statistically significantly changed compared to the empty vector control. These metabolite changes were consistently multiple and unrelated pathways. Unknown metabolites consisted of 69% (SE+/-11%) of all the ion peaks detected. Table 2. shows the annotation status of the MA genes that were analyzed with metabolite analysis; only in the case of MA3520 was metabolite profiling able to assist directly in the final annotation of a gene’s function. Without consensus in the metabolomics data, the rest of the MA genes had inconclusive results in the evaluation of their putative functions. 70

DISCUSSION

MA4265 displayed isocitrate dehydrogenase activity

MA4265 was originally annotated as isocitrate/isopropylmalate dehydrogenase family protein. When analyzed by NMR-based ligand screening, MA4265 displayed selectivity for and only bound isocitrate and not any other putative substrates of isocitrate/isopropylmalate dehydrogenase family proteins under the conditions tested. Isocitrate is the substrate of isocitrate dehydrogenase. Additionally MA4265 was only able to complement the icd (isocitrate dehydrogenase) E. coli mutant not leuB (3-isopropylmalate dehydrogenase) of yeaU (D-malate dehydrogenase). The NMR and complementation results correlate and together provided the evidence to assign MA4265 a more precise annotation as an isocitrate dehydrogenase. This was an example of the use of NMR-based ligand screening to gain more accurate functional annotation that can be tested, further evaluated, and in this case, confirmed by conventional studies such as genetic complementation.

MA3520 displayed serine hydroxymethyltransferase activity

Amino acid profiling of E.coli cells expressing MA3520 by GC-MS showed that the

MA3520 expressing cells had statistically significant changes in four amino acids, including glycine. Glycine is a putative substrate of serine hydroxymethyltransferase. MA3520 was analyzed with NMR-based waterLOGSY ligand screening and was presented with five amino acids as potential substrates: glycine, serine, alanine, histidine, and methionine. Among these substrates MA3520 only displayed positive binding to glycine. The glyA (serine hydroxymethyltransferase) E. coli mutant was able to be complemented by MA3520, this result was again consistent with the annotated serine hydroxymethyltransferase activity of MA3520. 71

This study of MA3520 was an example of using the combination of NMR substrate binding assays with metabolomics to provide evidence for assigning a more accurate functional annotation. Again, fortunately, we were able to confirm this putative result with a conventional genetic complementation study.

Metabolite analysis of MA genes expressed in Rosetta 2.0 E. coli cells

Amino acid and total metabolite analysis was conducted on 21 MA genes. Of these 21 genes, only in the case of MA3520 was metabolite profiling able to assist directly in the final annotation of a gene’s function. Without consensus in the metabolomics data, the rest of the MA genes had inconclusive results in the evaluation of their putative functions. To determine a potential function of an enzyme from metabolomics data, metabolites correlating to the potential function of the enzyme need to be detected, identified, and have statically significant changes in accumulation compared to controls. If one of these factors is lacking or uncertain it becomes difficult to use metabolite data to assign function. The ability to zero in on the possible function of a putative enzyme using metabolite data was limited due to the overwhelming amount of changing metabolites that were detected (176 peaks changed on average/sample). Without consensus in the data, generating a putative function for an enzyme from metabolomics data becomes increasingly speculative and subjective. The majority of the metabolomics data we generated in this study was significantly complicated by the large amount of statistically significant changes in metabolite accumulation of the MA expressing cells compared to the controls. This was particularly problematic because the majority of the metabolites (69%) were not identified.

72

Trouble shooting

Ideally we would have used single gene knockouts in MA for our metabolomics studies as originally planned. Expressing the gene in E. coli was not the ideal situation for several reasons. A major drawback of using E. coli is that the metabolic pathway the MA gene normally participates in may not be present in E. coli. This could lead to the lack of substrate molecules in the E. coli host for the MA gene to act upon. An additional concern is that the E. coli could maintain tight regulation of the molecules affected by the MA gene by regulating precursors and degrading products. A potential problem and source of error was the use of the IPTG inducible

T7 polymerase to express the MA genes in E. coli. This expression system can produce large quantities of protein even with a low level of induction (Studier, 2005) and this protein accumulation could cause significant changes in growth rates (observed, data not shown) and in metabolite profiles of the cells. The protein’s expression may possibly deplete amino acid pools and cause potential toxic effects related to protein hyperaccumulation (Studier, 2005). The knockout MA mutant strains would have elevated these concerns although we then would face other limitations. One possible major limitation would be that we could only do metabolite profiling on genes that are not essential under our growth conditions. Another possible issue was with the timing of the sample collections. Our protocol was to induce for four hours and then collect the samples. However this method does not take into account that the MA expressing cells growth rates may change depending on the MA gene that was expressed in E. coli cells.

These potential changes in growth rates could even cause the cells to be in different growth phases during collection. Collecting cells at the same growth stage, or OD, has been shown to be more important for controlling metabolite variation (Takahashi et al., 2008; Zech et al., 2009;

Marcinowska et al., 2011). 73

Author contributions in this section

Lucas Showman performed genetic complementation and metabolomics studies. Riley

Brien assisted with metabolomics experiments and analysis. John Orban set up NMR experiments and the small molecule library, interpreted NMR and ITC data, participated in the manual re-annotation. Yihong Chen expressed and purified proteins, helped to acquire NMR data, and performed the calorimetry experiments. Ethel Apolinario cloned MA genes and participated in the manual re-annotation. Libuse Brachova assisted in setting up the complementation and metabolomics studies. Zvi Kelman designed strategies for cloning MA genes and assisted in the manual re-annotation. Zhuo Li cloned MA genes and assisted in the manual re-annotation. Basil J Nikolau supervised the complementation and metabolomics studies. Kevin Sowers coordinated cloning and manual re-annotation.

Acknowledgments for this section

This research is supported by the Office of Science (BER), U.S. Department of Energy,

Grant Number DE-FG02-07ER64502, and an equipment grant from the W. M. Keck Foundation.

We also wish to thank Dr. Dennis Maeder (SAIC-Frederick Inc., NCI) for providing spread- sheets lining up orthologs between M. acetivorans and M. burtonii.

REFERENCES

Allen MA, Lauro FM, Williams TJ, Burg D, Siddiqui KS, De Francisci D, Chong KW,

Pilak O, Chew HH, De Maere MZ, Ting L, Katrib M, Ng C, Sowers KR, Galperin

MY, Anderson IJ, Ivanova N, Dalin E, Martinez M, Lapidus A, Hauser L, Land M,

Thomas T, Cavicchioli R (2009) The genome sequence of the psychrophilic archaeon, 74

Methanococcoides burtonii: the role of genome evolution in cold adaptation. ISME J 3:

1012-1035

Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M,

Wanner BL, Mori H (2006) Construction of Escherichia coli K-12 in-frame, single-gene

knockout mutants: the Keio collection. Mol Syst Biol 2: 2006.0008

BERTANI G (1951) Studies on lysogenesis. I. The mode of phage liberation by lysogenic

Escherichia coli. J Bacteriol 62: 293-300

Bino RJ, Hall RD, Fiehn O, Kopka J, Saito K, Draper J, Nikolau BJ, Mendes P, Roessner-

Tunali U, Beale MH, Trethewey RN, Lange BM, Wurtele ES, Sumner LW (2004)

Potential of metabolomics as a functional genomics tool. Trends Plant Sci 9: 418-425

Brauer MJ, Yuan J, Bennett BD, Lu W, Kimball E, Botstein D, Rabinowitz JD (2006)

Conservation of the metabolomic response to starvation across two divergent microbes.

Proc Natl Acad Sci U S A 103: 19302-19307

Broeckling CD, Reddy IR, Duran AL, Zhao X, Sumner LW (2006) MET-IDEA: data

extraction tool for mass spectrometry-based metabolomics. Anal Chem 78: 4334-4341

Buchholz A, Takors R, Wandrey C (2001) Quantification of intracellular metabolites in

Escherichia coli K12 using liquid chromatographic-electrospray ionization tandem mass

spectrometric techniques. Anal Biochem 295: 129-137

Chen Y, Apolinario E, Brachova L, Kelman Z, Li Z, Nikolau BJ, Showman L, Sowers K,

Orban J (2011) A nuclear magnetic resonance based approach to accurate functional

annotation of putative enzymes in the methanogen Methanosarcina acetivorans. BMC

Genomics 12 Suppl 1: S7

CoGe (2014) The Place to Compare Genomes. In. genomevolution.org 75

Dalvit C, Flocco M, Veronesi M, Stockman BJ (2002) Fluorine-NMR competition binding

experiments for high-throughput screening of large compound mixtures. Comb Chem

High Throughput Screen 5: 605-611

Dalvit C, Fogliatto G, Stewart A, Veronesi M, Stockman B (2001) WaterLOGSY as a method

for primary NMR screening: practical aspects and range of applicability. J Biomol NMR

21: 349-359

Fiehn O, Kopka J, Dörmann P, Altmann T, Trethewey RN, Willmitzer L (2000) Metabolite

profiling for plant functional genomics. Nat Biotechnol 18: 1157-1161

Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ,

Tomb JF, Dougherty BA, Merrick JM (1995) Whole-genome random sequencing and

assembly of Haemophilus influenzae Rd. Science 269: 496-512

Fridman E, Pichersky E (2005) Metabolomics, genomics, proteomics, and the identification of

enzymes and their substrates and products. Curr Opin Plant Biol 8: 242-248

Förster J, Gombert AK, Nielsen J (2002) A functional genomics approach using metabolomics

and in silico pathway analysis. Biotechnol Bioeng 79: 703-712

Galagan JE, Nusbaum C, Roy A, Endrizzi MG, Macdonald P, FitzHugh W, Calvo S,

Engels R, Smirnov S, Atnoor D, Brown A, Allen N, Naylor J, Stange-Thomann N,

DeArellano K, Johnson R, Linton L, McEwan P, McKernan K, Talamas J, Tirrell

A, Ye W, Zimmer A, Barber RD, Cann I, Graham DE, Grahame DA, Guss AM,

Hedderich R, Ingram-Smith C, Kuettner HC, Krzycki JA, Leigh JA, Li W, Liu J,

Mukhopadhyay B, Reeve JN, Smith K, Springer TA, Umayam LA, White O, White

RH, Conway de Macario E, Ferry JG, Jarrell KF, Jing H, Macario AJ, Paulsen I,

Pritchett M, Sowers KR, Swanson RV, Zinder SH, Lander E, Metcalf WW, Birren 76

B (2002) The genome of M. acetivorans reveals extensive metabolic and physiological

diversity. Genome Res 12: 532-542

Hall R, Beale M, Fiehn O, Hardy N, Sumner L, Bino R (2002) Plant metabolomics: the

missing link in functional genomics strategies. Plant Cell 14: 1437-1440

He Y, Chen Y, Alexander P, Bryan PN, Orban J (2008) NMR structures of two designed

proteins with high sequence identity but different fold and function. Proc Natl Acad Sci

U S A 105: 14412-14417

Hirai MY, Klein M, Fujikawa Y, Yano M, Goodenowe DB, Yamazaki Y, Kanaya S,

Nakamura Y, Kitayama M, Suzuki H, Sakurai N, Shibata D, Tokuhisa J, Reichelt

M, Gershenzon J, Papenbrock J, Saito K (2005) Elucidation of gene-to-gene and

metabolite-to-gene networks in arabidopsis by integration of metabolomics and

transcriptomics. J Biol Chem 280: 25590-25595

Koek MM, Muilwijk B, van der Werf MJ, Hankemeier T (2006) Microbial metabolomics

with gas chromatography/mass spectrometry. Anal Chem 78: 1272-1281

Liang YS, Kim HK, Lefeber AW, Erkelens C, Choi YH, Verpoorte R (2006) Identification

of phenylpropanoids in methyl jasmonate treated Brassica rapa leaves using two-

dimensional nuclear magnetic resonance spectroscopy. J Chromatogr A 1112: 148-155

Longstaff DG, Larue RC, Faust JE, Mahapatra A, Zhang L, Green-Church KB, Krzycki

JA (2007) A natural genetic code expansion cassette enables transmissible biosynthesis

and genetic encoding of pyrrolysine. Proc Natl Acad Sci U S A 104: 1021-1026

Luo X, Yu H (2008) Protein metamorphosis: the two-state behavior of Mad2. Structure 16:

1616-1625 77

Newman EB, Batist G, Fraser J, Isenberg S, Weyman P, Kapoor V (1976) The use of

glycine as nitrogen source by Escherichia coli K12. Biochim Biophys Acta 421: 97-105

Parsons L, Bonander N, Eisenstein E, Gilson M, Kairys V, Orban J (2003) Solution structure

and functional ligand screening of HI0719, a highly conserved protein from bacteria to

humans in the YjgF/YER057c/UK114 family. Biochemistry 42: 80-89

Phelps TJ, Palumbo AV, Beliaev AS (2002) Metabolomics and microarrays for improved

understanding of phenotypic characteristics controlled by both genomics and

environmental constraints. Curr Opin Biotechnol 13: 20-24

Rohlin L, Gunsalus RP (2010) Carbon-dependent control of electron transfer and central

carbon pathway genes for methane biosynthesis in the Archaean, Methanosarcina

acetivorans strain C2A. BMC Microbiol 10: 62

Rother M, Oelgeschlager E, Metcalf W (2007) Genetic and proteomic analyses of CO

utilization by Methanosarcina acetivorans. Archives of Microbiology 188: 463-472

Schauer N, Fernie AR (2006) Plant metabolomics: towards biological function and mechanism.

Trends Plant Sci 11: 508-516

SOWERS K, BARON S, FERRY J (1984) METHANOSARCINA-ACETIVORANS SP-NOV,

AN ACETOTROPHIC METHANE-PRODUCING BACTERIUM ISOLATED FROM

MARINE-SEDIMENTS. Applied and Environmental Microbiology 47: 971-978

Studier FW (2005) Protein production by auto-induction in high density shaking cultures.

Protein Expr Purif 41: 207-234

Swanepoel CC, Loots dT (2014) The use of functional genomics in conjunction with

metabolomics for Mycobacterium tuberculosis research. Dis Markers 2014: 124218 78

Tuinstra RL, Peterson FC, Kutlesa S, Elgin ES, Kron MA, Volkman BF (2008)

Interconversion between two unrelated protein folds in the lymphotactin native state.

Proc Natl Acad Sci U S A 105: 5057-5062

Yeh DC, Parsons LM, Parsons JF, Liu F, Eisenstein E, Orban J (2005) NMR structure of

HI0004, a putative essential gene product from Haemophilus influenzae, and comparison

with the X-ray structure of an Aquifex aeolicus homolog. Protein Sci 14: 424-430

Yelton AP, Thomas BC, Simmons SL, Wilmes P, Zemla A, Thelen MP, Justice N, Banfield

JF (2011) A semi-quantitative, synteny-based method to improve functional predictions

for hypothetical and poorly annotated bacterial and archaeal genes. PLoS Comput Biol 7:

e1002230

79

Target MA genes Select MA genes

Clone selected MA genes from MA strain C2A into pET vectors (IPTG inducible)

Complementation studies Express in E. coli

Metabolite analysis Purify protein

In vitro enzyme assays

Submit soluble protein to ligand screening via NMR Function

Figure 1. Flow diagram of our experimental strategy

Arrows represent the work flow as well as the information flow for determining conditions of further experiments and designating new gene functional annotations.

80

Figure 2. The NMR waterLOGSY Principle. Figure A shows when a ligand is in solution with a protein capable of binding it, there are two competing flows of magnetization: 1) from water to the free ligand and 2) from the bound water (via the protein) to the bound ligand. The two competing flows lead to opposite signs of NOEs, with the stronger flow determining the sign. 1 Example of one dimensional waterLOGSY H NMR spectrum. Positive peaks are indicative of binding while negative peaks represent unbound ligand (B). 81

Figure 3. The MA biotin network. The MA biotin biosynthesis pathway with current putative annotations for biotin synthase (MA0154 and/or MA01489), biotin ligase (MA0676), and MA’S only known biotin dependent enzyme, a putative pyruvate carboxylase with subunits: pycA(MA0675) and pycB (MA0674). Notably, the enzymes in the pathway leading up to dethiobiotin (the immediate precursor of biotin) have not been annotated. In addition, there are two putative biotin synthase genes MA0154 and MA1489. MA0154 and MA1489 have 43% and 39% similarity to E. coli biotin synthase. (Figure from John Orban)

82

Figure 4. Distributions of gene re-annotated results and confidence levels for the selected gene pool. Figure A) is a graph of the distribution of changes in functional annotation among selected MA genes. Graph B) displays the distribution of confidence levels of annotation within selected MA gene. Level 1: An exact match in the literature with an experimentally defined function. Level 2: Gene product contains all domains needed for enzymatic function with ≥35% sequence identity to a gene product of known function. Level 3: Gene product contains all domains needed for enzymatic function but ≤35% sequence identity to a gene product of known function. Level 4: Gene product has no experimental match but some domain similarities to a known function are recognizable. Level 5: Has no experimental match or domain similarities – i.e. annotated as hypothetical. (Figure from Kevin Sowers (Chen et al., 2011))

83

Figure 5. NMR-based ligand screening of MA4265, a putative isocitrate/isopropylmalate dehydrogenase. Part of the metabolic pathway involving isocitrate dehydrogenase functionality is shown at the top of the figure. The relevant section of the one-dimensional waterLOGSY 1H NMR spectrum is shown for each metabolite (300 μM) in the presence of MA4265 (30 μM) as follows: (a) citrate; (b) cis-aconitate; (c) isocitrate; (d) 2-ketoglutarate; (e) control spectrum of MA4265 alone. Peaks due to each compound are labeled with asterisks. Metabolite interaction with MA4265 is represented as positive peaks whereas negative peaks indicate no binding to the protein. Only isocitrate binds to MA4265 under the conditions used. (Chen et al., 2011).

84

Table 1. E. coli homologs of MA4265. Below are the BLASTp results representing E. coli homologs to the MA4265 amino acid sequence. The Keio mutants of each of these three genes; yeaU, leuB, and icd were selected for use in genetic complementation studies with MA4265.

85

Figure 6. Genetic complementation results for MA4265. E. coli strains in the BW25113 (WT) background carrying deletion mutant alleles icd (a), leuB (b), and yeaU (c), were transformed with an empty control vector (EV=pET-21a) or a plasmid expressing MA4265. All strains were grown on M9 minimal media with glucose (or malate in plate c) as the carbon source, and contained 100 μg/mL ampicillin. Plates were incubated at 37°C for 48 hours. For panel (c), D-malate was substituted for glucose as the sole carbon source to examine the growth phenotype of yeaU, which is masked with glucose (Reed et al., 2006). d) The icd mutant strain carrying either the empty control vector ( icd +EV) or the MA4265-expressing vector ( icd +MA4265) and the parental strain (WT) empty control vector (WT+EV) or the MA4265- expressing vector (WT+MA4265) were grown in M9 liquid media with glucose as the carbon source and 100 μg/mL ampicillin. Cultures were incubated at 37°C and OD600 was monitored. Data from the average of three replicates (± standard error) are presented (Chen et al., 2011).

86

Figure 7. Amino acid profile for MA3520 expressed in Rosetta 2.0 E. coli cells. MA3520 was expressed in MOPS minimal liquid media with glucose as the sole carbon source with 100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control. All amino acid amounts were determined by GC-MS. The average of 6 replicates (± standard error) are presented. Statistically significant changes are labelled with orange markers (p-value<0.5). 87

Figure 8. MA3520 Effects on E. coli Glycine and Serine Accumulation. MA3520 was expressed in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The control cells were grown under the same conditions with an empty vector control. Amino acid content determined by GC-MS. The bar graphs represent the average of 6 replicates (± standard error) are presented. The observed change in glycine was statistically significant (p-value<0.5).

88

Figure 9. NMR-based ligand screening of MA3520, a putative serine hydroxymethyltransferase. The relevant section of the one-dimensional waterLOGSY 1H NMR spectrum is shown for each metabolite in the presence of MA3520 as follows: glycine (gly), serine (ser), alanine (ala), histidine (his), and methionine(met). Metabolite interaction with MA3520 is represented as positive peaks whereas negative peaks indicate no binding to the protein. Only glyince was positive for binding.

89

Figure 10. Genetic complementation results for MA3520. Growth of isogenic E. coli strains in the BW25113 (WT) background carrying a deletion mutant allele for glyA .The glyA mutant strain carrying either the empty control vector ( icd +EV) or the MA3520-expressing vector ( glyA+MA3520) and the parental strain (WT) empty control vector (WT+EV) were grown in M9 liquid media with glycine as the sole carbon source with100 μg/mL ampicillin and induction with 0.1mM IPTG. Cultures were incubated at 37°C and OD600 was monitored. Data from the average of three replicates (± standard error) are presented.

90

Figure 11. Example of the total polar metabolites data of MA3520 vs EV as a volcano plot. This volcano plot represents the polar total metabolite profile for Z052 (MA3520) expressed in Rosetta 2.0 E. coli cells. Cell carrying the MA3520 were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV control are presented. Statistically significant changes are located above the orange shaded bar (p- value<0.5).

91

Figure 12. Example of the total polar metabolites data of MA1272 vs EV as a volcano plot. This volcano plot represents the polar total metabolite profile for Z052 (MA1272) expressed in Rosetta 2.0 E. coli cells. Cells carrying the MA1272 were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV control are presented. Statistically significant changes are located above the orange shaded bar (p- value<0.5). 92

Table 2. Annotation status of selected and analyzed MA genes

93

CHAPTER IV. THE BIOTIN NETWORK OF METHANOSARCINA ACETIVORANS: GENE FUNCTION, REGULATION, AND METABOLIC ROLE OF BIOTIN IN A METHANOGENIC ARCHAEON’S METABOLISM

Lucas Showman1, Yihong Chen2, Ethel Apolinario3, Libuse Brachova1, Zvi Kelman2 4,

Zhuo Li2, John Orban2 5*, Kevin Sowers3, and Basil J Nikolau1 6*

Author Affiliations:

1 Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames IA 50011, USA

2 Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville MD 20850, USA

3 Department of Marine Biotechnology, University of Maryland Baltimore County, Baltimore MD, USA

4 Department of Cell Biology and Molecular Genetics, University of Maryland College Park, USA

5 Department of Chemistry and Biochemistry, University of Maryland College Park, USA

6 Center for Biorenewable Chemicals, Iowa State University, Ames IA 50011, USA

* Corresponding author: Basil Nikolau [email protected]

94

ABSTRACT

In this study we showed that Methanosarcina acetivorans was able to produce biotinylated pyruvate carboxylase from the biotin vitamers: 7,8-diaminopelargonic acid, dethiobiotin, and biotin. Furthermore the addition of exogenous biotin led to the hypeaccumulation of amino acids, including oxaloacetate derived aspartic acid. The pyruvate carboxylase and birA genes were found to be contained in a single putative birA biotin dependant regulated operon. Methanosarcina acetivorans cultures with no biotin vitamers available were unable to produce biotinylated pyruvate carboxylase, however the growth of these cells was not adversely affected. This demonstrated that neither biotin nor biotinylated proteins are required for the growth of Methanosarcina acetivorans.

95

INTRODUCTION

Biotin (vitamin H) is a water-soluble vitamin that was originally characterized at the beginning of the 20th century (Wildiers, 1901; Kogl and Tonnis, 1936; Du Vigneaud and

Melville, 1942). Biotin is covalently bound to enzymes and acts as a catalytic prosthetic group in carboxylation, decarboxylation, transcarboxylation reactions (KNOWLES, 1989; Nikolau et al.,

2003; Smith et al., 2007; Ding et al., 2012). Biotin is present as an essential enzyme of enzymes important in several metabolic processes including: anaplerosis, lipogenesis, and amino acid metabolism. Appropriately categorized as a vitamin, biotin is utilized by all organisms, though not all species have the ability to synthesize biotin de novo (Bender, 1992).

All higher plants, most bacteria, and some fungi can synthesize biotin, while animals must obtain biotin from their diets. The biotin biosynthetic in genes from model organisms such as E. coli (Eisenberg et al., 1987; Otsuka et al., 1988), B. subtilis (GLOECKLER et al., 1990;

FLORENTIN et al., 1994; Bower et al., 1996), S. cerevisiae (Phalip et al., 1999), and

Arabidopsis (Schneider et al., 1989; SHELLHAMMER and MEINKE, 1990; BALDET et al.,

1993; Weaver et al., 1996; Alban et al., 2000; Picciocchi et al., 2001; Pinon et al., 2005; Muralla et al., 2008; Cobessi et al., 2012) are well defined, while the biotin biosynthetic pathways in the domain of Archaea are poorly understood. Most commonly used media used to culture Archaeal species contain small amounts of biotin for proper growth. Interestingly, many species of

Archaea lack genes with significant homology to the well elucidated biotin biosynthetic genes

(Streit and Entcheva, 2003). However the methanogenic Archaeon M. jannaschii (MJ) possesses a biotin biosynthetic operon that is highly similar to the one found in B. subtilis (Rodionov et al., 96

2002; Streit and Entcheva, 2003). A complete pathway, the biosynthetic gene homologs found in

MJ are sufficient to derive biotin from the conserved precursor pimelic acid.

Methanosarcina acetivorans has six putative biotin network genes (Methanosarcina

Sequencing Project. Broad Institute of MIT and Harvard (www.broad.mit.edu), (Galagan et al.,

2002)). Fig 1. shows a consensus biotin biosynthesis pathway of known microbial biotin biosynthetic genes (Streit and Entcheva, 2003), along with putative annotations for biotin synthase (MA0154 and/or MA01489) and bioY, an ABC type biotin transporter (MA4340)

(Rodionov et al., 2002; Guillén-Navarro et al., 2005; Hebbeln et al., 2007; Finkenwirth et al.,

2013). The other biotin networks genes are biotin ligase (MA0676), and MA’s only known biotin dependent enzyme, a putative pyruvate carboxylase with subunits: pycA (MA0675) and pycB

(MA0674). Notably, the enzymes required for producing dethiobiotin (the penultimate precursor of biotin) have not been annotated. In addition, there are two putative biotin synthase genes

MA0154 and MA1489. MA0154 and MA1489 have 43% and 39% similarity to E. coli biotin synthase and further experimental evidence was required to determine which gene(s) has actual biotin synthase activity. Though the biotin biosynthetic pathway appears to be incomplete, initial observations showed that MA does not require exogenous biotin to sustain normal growth

(SOWERS et al., 1984). This observation implies that MA does indeed have a complete but not yet described biotin biosynthetic pathway or a novel situation in which MA does not require biotin to survive.

97

MATERIALS AND METHODS

Identification of biotin network gene homologs in MA

In preparation for target gene selection for aconcurrent study, the functional annotations were manually updated for over 700 of the 4524 predicted genes in the MA genome. This was performed by transferring many of the recently revised manual annotations in the closely related species M. burtonii (Allen et al., 2009) to homologous genes in MA. Biotin biosynthetic genes were targeted among the 700 selected (Chen et al., 2011) (see chapter III). Also, a thorough literature search was conducted for published data that experimentally confirms the functionality of MA genes and closely related orthologs in other species (Chen et al., 2011). In addition,

BLASTp searches (National Center for Biotechnology Information BLASTp blast.ncbi.nlm.nih.gov/Blast.cgi, (Altschul et al., 1990; Johnson et al., 2008) were conducted searching the MA genome database (taxid:2214) using known biotin network genes as query sequences. The known biotin biosynthetic genes sequences were from the model organisms: E. coli (Eisenberg et al., 1987; Otsuka et al., 1988), B. subtilis (GLOECKLER et al., 1990;

FLORENTIN et al., 1994; Bower et al., 1996), S. cerevisiae (Phalip et al., 1999), and

Arabidopsis (Schneider et al., 1989; SHELLHAMMER and MEINKE, 1990; BALDET et al.,

1993; Weaver et al., 1996; Alban et al., 2000; Picciocchi et al., 2001; Pinon et al., 2005; Muralla et al., 2008; Cobessi et al., 2012) (data not shown)

MA growth conditions

MA cells were grown in Marine media (cite with 80:20 nitrogen gas: carbon dioxide) containing 0.25g/L Na2S, 100mM trimethylamine, trace minerals and vitamins (WOLIN et al.,

1963), 50mM proline, either 8.0pM biotin (Sigma-Aldrich, St. Louis, MO), or 1.0mg/L avidin 98 from egg white (Sigma-Aldrich, St. Louis, MO), or 20pM KAPA (Cayman, Ann Arbor, MI),

DAPA, or Pimelic acid (Sigma-Aldrich, St. Louis, MO), or dethiobiotin (Sigma-Aldrich, St.

Louis, MO). The growth of 10mL cultures was monitored at 550nm with 3 to 12 three replicates for each time point.

Vitamer dependent biotinylation of pyruvate carboxylase subunit B

MA cultures were incubated for 48 hours in the presents of biotin a singal biotin vitamers. The biotinylation status of pyruvate carboxylase subunit B was detected using streptavidin-HRP, avidin with no biotin vitamers (-), pimelic acid (PA), 7-keto-8- aminopelargonic acid (KAPA), 7,8-diaminopelargonic acid (DAPA), dethiobiotin (BTB) and biotin. and subsequently stained with Coomassie Brilliant Blue or subjected to western blot analysis. Western blots were incubated with 0.2µg/mL of streptavidin-HRP (GE Health UK

Limited, Buckinghampshire,UK) and detection achieved by chemilumninescence with ELC western blotting reagents (GE Healthcare Bio-science Corporation, Piscataway, NJ). Western blots were imaged with a ChemiDoc XRS+ imaging system and images were analyzed with

Image Lab Software (Hercules, CA) to quantify abundances.

One dimensional waterLOGSY 1H NMR spectrum candidate BirA MA0676

WaterLOGSY (Dalvit et al., 2001; Chen et al., 2011) experiments were conducted with purified MA0676, purification as previously described (Chen et al., 2011). 1 µM up to 500 µM of the potential substrates: biotin, ATP, lysine and biotinyl-lysine were used The waterLOGSY experiments were conducted as describes previously (Chen et al., 2011). 99

Western blot analysis

Antibodies were created using purified proteins isolated as previously described (Chen et al., 2011). Purified protein samples were submitted to the Hybridoma Facility of the Iowa State

University Office of Biotechnology for the generation of polyclonal antibodies in mice. Sera specific to the birA, pycA, and pycB proteins were generated. These antibodies were used for the western blot detection of native birA, pycA, and pycB. Protein was extracted from cell pellets of cells at mid-log growth stage. Western blot analysis was performed on the biotin - and biotin + protein extracts. Detection was achieved by goat anti-mouse IgG (H+L)-HRP conjugate (Bio-

Rad, Hercules, CA) using chemilumninescence with ELC western blotting reagents (GE

Healthcare Bio-science Corporation, Piscataway, NJ). Western blots were imaged with a

ChemiDoc XRS+ imaging system and images were analyzed with Image Lab Software

(Hercules, CA) to quantify abundances.

Real-time RT-PCR transcriptional analysis

RNA was extracted using the Aurum Total RNA Mini Kit (Bio-Rad, Hercules, CA), and then quantified using NanoDrop ND-1000 (Thermo Fisher Scientific, Waltham, MA) and the quality was observed via agarose gel electrophoresis. RNA was then treated with DNase I

(Invitrogen, Carlsbad, CA) to remove genomic DNA contamination and Superscript First-Strand synthesis system (III) (Invitrogen, Carlsbad, CA) was used to produce cDNA. The probes and primers used in the real-time experiments were obtained from Integrated DNA Technology.

TaqMan Fast Advanced Master Mix (Applied Biosystems, Foster City, CA) was used for all of the real-time experiments using the standard conditions provided in the user manual. Real-time

PCR experiments were performed using a StepOnePlus system (Applied Biosystems, Foster 100

City, CA) at Iowa State University DNA Facility and results were analyzed using the instrument’s supporting software.

Metabolite analysis

Cell pellets were lyophilized overnight and dry masses were recorded (4-10mg). All reagents for metabolite analysis were from Sigma-Aldrich, St. Louis, MO unless stated differently. 25µL of ribitol was added as an internal standard (0.6 mg/mL in dry-pyridine) as well as 25µL of C19 fatty acid as internal standard (0.6 mg/mL in chloroform) was added to the cell pellets. 0.35mL hot methanol (Thermo Fisher Scientific, Fair Lawn, NJ) (60°C) was added, mixed, and immediately incubated at 60°C for 10 minutes. Samples were then vortexed 10 sec and put in sonicating water bath for 10 minutes. 0.35mL of chloroform was added and mixed by vortexing for 2 minutes. 0.3mL HPLC grade water was added and vortexed 2 minutes. Phases were separated by centrifuge, 5 minutes at 13000G. The upper layer (polar) and lower layer

(non-polar) were separated then derivatized and analyzed separately. Derivatization was adapted from (Koek et al., 2006). 200-400µl polar/non-polar extracts were dried under nitrogen gas. 50µl methoxyamine (20mg/mL in pyridine) was added and was incubated at 30°C while shaking, 1.5 hours. 70µl of bis-trimethyl silyl trifluoroacetamide with 1% trimethylchlorosilane (BSTFA+1%

TMCS) was then added (TMS derivatization reagent) and incubated at 37°C for 30 minuites.

Samples were then submitted to GC-MS (gas chromatography–mass spectrometry) analysis. The following GC-MS method was used for the total metabolite GC-MS analysis. All GC-MS data was captured using an Agilent Technologies Model 7890A Gas Chromatograph with an Agilent

19091S-433: 1926.58224 HP-5MS 5% Phenyl Methyl Siloxane column -60 °C—325 °C (325

°C): 30 m x 250 µm x 0.25 µm column coupled to a Model 5975C mass selective detector with electrical ionization (EI). The GC-MS system was control by and the data collected with 101

ChemStation software (Agilent Technologies, Santa Clara, CA). This GC-MS system was made available by the Office of Biotechnology W.M. Keck Metabolomics Research Laboratory, Iowa

State University. The injection port temperature was 280°C with a splitless mode. He was the carrier gas at a flow rate of 1.5 mL/min. The GC oven temperature ramp procedures were as follows: temperature set to 80 °C and held for 2 minutes, ramp #1 8°C/min to 260°C, hold 5 minutes, ramp #2 10°C/min to 320°C, hold 20 minutes. MS conditions were as follows: ion source, 200°C; electron energy, 70eV; quadrupole temperature, 150°C; GC-MS interface zone,

280°C; scan range, 50-550 mass units, a 3.7 minute solvent delay was used. Data was analyzed using AMDIS (Automated Mass Spectral Deconvolution and Identification System; http://chemdata.nist.gov/mass-spc/amdis/) software, with data interpretation from NIST08 library and quantification with MET-IDEA (Broeckling et al., 2006). The NIST 08 MS Library and

AMDIS were provided by National Institute of Standards and Technology (www.nist.gov) and

MET-IDEA software was obtained from the Samuel Roberts Noble Foundation (Ardmore, OK).

Computation identification of the MA biotin network

We searched the MA genome for homologs of known E. coli, Arabidopsis, S. cerevisiae, and B. subtilis biotin biosynthetic genes (BLASTp). MA biotin network genes were also identified as previously described (Chen et al., 2011). Sequences from Methanosarcina barkeri and Methanosarcina mazei were used as query sequences to identify putative BirA binding DNA sequences sites (Rodionov et al., 2002). 102

RESULTS

Vitamer dependent biotinylation of pyruvate carboxylase subunit B

In order to understand the biotin biosynthetic capabilities of MA and the specific biotin precursors MA is capable of utilizing, an exogenous feeding study was conducted. The biotinylation status of pyruvate carboxylase subunit B was monitored (Fig. 2) to indicate ability of MA to form biotin from its known precursors. The negative controls were cultures grown containing the biotin scavenging protein avidin, with no added biotin vitamers. These conditions did not lead to any detectable biotinylation. The addition pimelic acid or 7-keto-8- aminopelargonic acid (KAPA), also did not facilitate biotinylation,while 7,8-diaminopelargonic acid (DAPA), dethiobiotin (BTB) and biotin all led to observable biotinylation.

One dimensional waterLOGSY 1H NMR spectrum candidate BirA MA0676

WaterLOGSY (water-Ligand Observed via Gradient Spectroscopy), relies on competing flows of magnetization which lead to binding manifested as positive peaks and non-binding manifested as negative peaks (Dalvit et al., 2001; Chen et al., 2011). WaterLOGSY 1H NMR experiments were used to test candidate substrates of birA MA0676. Purified recombinant

MA0676 shows positive signal indicating binding of biotin (Fig. 3). Addition of ATP causes a notable change in the signal, most likely due to the formation of biotinyl-AMP. Moreover, the addition of a substrate mimic, lysine, leads to negative biotin signals, consistent with the formation of a dissociated biotinyl-lysine product. Note that control experiments showed that lysine alone does not displace biotin, which could be an alternative explanation of the binding data. Also, a biotinyl-lysine (biocytin) standard did not bind to MA0676 (data not shown). 103

Growth of MA with and without biotin

The growth curves in Figure 4 show the growth of MA cultures with and without biotin.

After 24 hours of incubation, the growth of the biotin containing cultures began to slow and lagged behind the biotin depleted cultures, leading to a reduction of 0.25 OD550 by 48 hours incubation; this reduction was maintained into stationary phase. Samples from the cultures, whose growth was represented in Figure 4, were collected at 24, 48, and 72 hours and were used for subsequent qRT-PCR, western, and metabolomics analysis.

Expression of MA pyruvate carboxylase subunits and birA with and without biotin

Quantitative RT-PCR was used to determine the expression levels of pyruvate carboxylase subunits and birA at three different growth time points: 24, 48, and 72 hours of incubation (Fig. 5A). The MA0674 (pycB), MA0675 (pycA), and MA0676 (birA) which are predicted to be members of a single operon, followed a very similar expression pattern. At 24 hours of growth, the expression level of all three transcripts are elevated in the biotin depleted cultures 12 to 20 fold, later on at 48 and 72 hours the transcripts levels returned to back to near the biotin containing cultures’ levels. Western blot comparison of pyruvate carboxylase subunits and birA of MA with and without biotin showed (Fig. 5B) that all three proteins were hyperaccumulated under biotin depleted conditions (increases estimate at ~10 fold, data not shown). The increased pyruvate carboxylase subunit accumulation at the protein level echoes the qRT-PCR results. Although there was significantly more pycB protein in the biotin depleted cultures there was no detectable biotinylated pycB, in contrast, inthe biotin containing cultures the pycB was clearly biotinylated.

104

BirA binding sequence homology

The putative birA DNA binding sites in the MA genome were identified by their homology to Methanosarcina barkeri (MBA) and Methanosarcina mazei (MMZ) identified previously by their homology to other known birA sites (Rodionov et al., 2002). Two birA DNA binding sites were found in the MA genome. One site is located in the operator region of the pycA–pycB–birA operon and the other is located in the putative operator region of the putative

ABC transporter bioY–cbiO–cbiO–cbiQ operon (Table 1). The MA birA DNA binding sites are identical to those found in MMZ.

Non-targeted metabolite profiling

Total metabolite profiling revealed that MA cells grown in media containing biotin accumulated higher amounts of 85 detected metabolites (indicated by p-values <0.5), in contrast only 8 detected metabolites were found to be reduced in the biotin containing cultures (Fig. 7).

Several of the hyperaccumulated metabolites were identified as amino acids. These amino acids included aspartic acid which exhibited a 5 fold increase and a 7 fold increase in alanine along with increases in serine, glycine, glutamine, and β-alanine (Fig. 7). While an amino acid, asparagines, was among the 8 metabolites reduced in the biotin containing cells.

105

DISCUSSION

We were able to establish that MA has the capacity to synthesize biotin from the vitamers

DAPA and dethiobiotin through an exogenous feeding study. This result was notable because

MA lacks genes with significant homology to known dethiobiotin synthetase genes found in other organisms. While the other biotin vitamers KAPA and pimelic acids failed to be converted into biotin by MA under the conditions tested. This negative result may not be indicative of MA lacking the proper enzymatic activity to process these vitamers to biotin, as the MA cells may lack uptake mechanisms for KAPA and pimelic acid and/or the proper enzymes may not be expressed under the conditions tested. There were initially two biotin synthase homologs found in MA, MA0154 and MA01489. MA0154 is now known to be involved in pyrrolysine biosynthesis and is located in pyrrolysine biosynthesis operon region in MA’s genome

(Longstaff et al., 2007). Additionally MA0154 did not bind dethiobiotin or biotin in waterLOGSY experiments(Chen et al., 2011) (Sp Fig. 3),while MA1489 displayed binding of dethiobiotin in waterLOGSY experiments (Sp Fig. 2). As a result MA1489 is the leading candidate for the biotin synthase activity in MA. MA3250 is the only gene with significant sequence homology to known dethiobiotin synthetases that catalyze the formation of DAPA to dethiobiotin. MA3250 is annotated as a cobyric acid synthase gene (Methanosarcina Sequencing

Project. Broad Institute of MIT and Harvard (www.broad.mit.edu), (Galagan et al., 2002)).

Cobyric aicd sythases are amidotransferases that catalyze the formation of amide groups on four different locations of a signal substrate. Cobyric acid synthase genes are known to have a N- terminal domain that is homologous to known dethiobiotin synthetases and in phosphotransacetylases. The active sites among these enzymes are thought to be remarkably similar (Galperin and Grishin, 2000; Endrizzi et al., 2004). MA3250 NMR waterLOGSY 106 substrate binding experiments indicated the positive binding of DAPA to MA3250 protein (Sp

Fig. 1). This substrate binding and the presence of the N-terminal domain similar to the domain found in dethiobiotin synthetases may indicate the MA3250 may be a dual functional enzyme or the binding could be nonspecific and the domain homology may be indicative of a common mechanism or mode of binding that cobyric acid synthases and dethiobiotin synthase share rather than actual dual enzyme function.

The birA homolog MA0676, contains both a predicted biotin ligase domain as well as a putative DNA binding domain (Rodionov et al., 2002). Not all BirA orthologs possess the N- terminal DNA-binding domain with the HTH motif (D-b-BirA). The presence of D-b-BirA in a genome coincides with the presence of potential BirA DNA binding sites existing upstream of biotin-related genes (Rodionov et al., 2002). When MA0676 was submitted to waterLOGSY

NMR experiments, MA0676 showed binding with biotin and the binding NMR profile was shifted by the second substrate, ATP, possibly indicating the formation of biotin-‘5-AMP intermediate. When in the substrate analog, lysine was presented in combination with biotin and

ATP, the bind signal was lost. Perhaps the MA0676 was able to catalyze the formation of biocytin, causing the release of this product and the loss of binding signals. When taken together, the substrate binding profile of MA0676 was indicative of biotin ligase activity. The pycB, pycA, and birA which were predicted to be members of a single operon (Mukhopadhyay et al.,

2001), followed a very similar expression pattern. At 24 hours of growth the expression level of all three transcripts were elevated in the biotin depleted cells and protein levels of pycB, pycA, and birA were all three hyperaccumulated under biotin depleted conditions. The pycB subunit was only biotinylated in the biotin containing cells. Although there was significantly more pycB protein in the biotin depleted cultures, there was no detectable biotinylated pycB. The expression 107 of these genes matches the regulation of birA of biotin synthetic genes in E. coli. The birA protein monitors the biotinylation status of the biotin carboxyl carrier protein (BCCP) an acetyl- coA carboxylase subunit and reduces the expression of biotin biosynthetic genes once the BCCP protein is biotinylated and no longer a sink for biotin-5’-AMP (Beckett, 2007). In the case of the

MA birA regulated operons, the biotin carrier protein is pycB. Once pycB is biotinylated it would no longer act as a sink for birA bond biotin-5’-AMP (Fig. 6). The concentration of birA bond to biotin-5’-AMP would then build up and this induces the formation homo-dimer complexes of birA bond to biotin-5’-AMP. The homo-dimers of birA bond to biotin-5’-AMP are the regulatory active form of birA. These dimers bind to the birA DNA binding sites (bioO) thereby inhibiting transcription of the genes contained in these operons (Beckett, 2007). The key difference in MA is that the birA gene itself appears to be under its own regulation as part of an operon that has a bioO, birA DNA binding site. In fact we located two birA DNA binding sites in

MA, one with the pycA, pycB, birA genes (Fig. 6) and the other in the operator region of the biotin uptake gene, bioY. This places biotin uptake, pyruvate carboxylase, and biotin ligase activities under the regulation of birA.

The appearance of a measurable growth defect in MA cultures with the addition of biotin was unexpected. The biological relevance of this result in MA’s natural environments is unclear, as this study was only conducted with synthetic marine media. The cause of this growth defect is likely a result of unbalanced growth caused by the hyperaccumulation pyruvate carboxylase produced OAA derived and secondarily derived metabolites (Fig. 8). OAA regenerates the carbon backbone of the anabolic biosynthesis focused TCA cycle in MA. This allows the potential for secondary pathways to have access to an increased amount in TCA intermediates.

This is supported by the significant number of metabolites (85 polar) that were hyper 108 accumulated in the biotin containing cultures. One major direct derivative of OAA is aspartic acid, produced via aspartate aminotransferase (Tomita et al., 2014). In fact aspartic acid was increased to 5-fold of the biotin depleted culture level. Aspartic acid, in addition to being a proteinogenic amino acid, is also an intermediate and precursor in several other pathways

(Mukhopadhyay et al., 2001; Tomita et al., 2014). A direct product of aspartic acid is amino acid

β-alanine, produced via aspartate-decarboxylase, and it was also increased by 6-fold in the biotin containing cultures. Potentially the biotinylated and active pyruvate carboxylase may lead to the funneling of carbon from anaerobic respiration and methane metabolism, and other anabolic pathways, to OAA mediated anabolic biosynthesis pathways and their intermediates and products. This metabolic shift could have led to unbalanced growth and the growth phenotype observed with the cultures incubated in the synthetic marine media with biotin.

The question arises that if biotin is a vitamin that has been shown to be required without exception in previously studied organisms (Schneider et al., 1989; Bender, 1992; Cobessi et al.,

2012), how is MA able to thrive without it? The answer is twofold. First, because MA uses iosprenoid based membrane lipids, not fatty acids, and consequently there is no biotin dependent acetyl-CoA carboxylase homolog found in MA. Acetyl-CoA carboxylase is normally required for fatty acid production in other organisms (Bailey et al., 1995; Ke et al., 2000; Rodríguez et al.,

2001; Kode et al., 2005; Kurth et al., 2009). Second, there is likely a biotin independent pyruvate carboxylase bypass pathway in MA (Fig. 8). The combination of phosphoenolpyruvate synthase and phosphoenolpyruvate carboxylase enzymes can form OAA from pyruvate with phosphoenolpyruvate as an intermediate (Galagan et al., 2002; Ettema et al., 2004).

109

REFERENCES

Alban C, Job D, Douce R (2000) Biotin metabolism in plants. Annual Review of Plant

Physiology and Plant Molecular Biology 51: 17-47

Allen MA, Lauro FM, Williams TJ, Burg D, Siddiqui KS, De Francisci D, Chong KW,

Pilak O, Chew HH, De Maere MZ, Ting L, Katrib M, Ng C, Sowers KR, Galperin

MY, Anderson IJ, Ivanova N, Dalin E, Martinez M, Lapidus A, Hauser L, Land M,

Thomas T, Cavicchioli R (2009) The genome sequence of the psychrophilic archaeon,

Methanococcoides burtonii: the role of genome evolution in cold adaptation. ISME J 3:

1012-1035

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search

tool. J Mol Biol 215: 403-410

Bailey A, Keon J, Owen J, Hargreaves J (1995) The ACC1 gene, encoding acetyl-CoA

carboxylase, is essential for growth in Ustilago maydis. Mol Gen Genet 249: 191-201

BALDET P, GERBLING H, AXIOTIS S, DOUCE R (1993) BIOTIN BIOSYNTHESIS IN

HIGHER-PLANT CELLS - IDENTIFICATION OF INTERMEDIATES. European

Journal of Biochemistry 217: 479-485

Beckett D (2007) Biotin sensing: Universal influence of biotin status on transcription. Annual

Review of Genetics 41: 443-464

Bender DA (1992) Nutrition and Biochemistry of the Vitamins. Cambridge University Press,

New York, NY, USA

Bower S, Perkins JB, Yocum RR, Howitt CL, Rahaim P, Pero J (1996) Cloning, sequencing,

and characterization of the Bacillus subtilis biotin biosynthetic operon. J Bacteriol 178:

4122-4130 110

Broeckling CD, Reddy IR, Duran AL, Zhao X, Sumner LW (2006) MET-IDEA: data

extraction tool for mass spectrometry-based metabolomics. Anal Chem 78: 4334-4341

Chen Y, Apolinario E, Brachova L, Kelman Z, Li Z, Nikolau BJ, Showman L, Sowers K,

Orban J (2011) A nuclear magnetic resonance based approach to accurate functional

annotation of putative enzymes in the methanogen Methanosarcina acetivorans. BMC

Genomics 12 Suppl 1: S7

Cobessi D, Dumas R, Pautre V, Meinguet C, Ferrer JL, Alban C (2012) Biochemical and

structural characterization of the Arabidopsis bifunctional enzyme dethiobiotin

synthetase-diaminopelargonic acid aminotransferase: evidence for substrate channeling in

biotin synthesis. Plant Cell 24: 1608-1625

Dalvit C, Fogliatto G, Stewart A, Veronesi M, Stockman B (2001) WaterLOGSY as a method

for primary NMR screening: practical aspects and range of applicability. J Biomol NMR

21: 349-359

Ding G, Che P, Ilarslan H, Wurtele ES, Nikolau BJ (2012) Genetic dissection of

methylcrotonyl CoA carboxylase indicates a complex role for mitochondrial leucine

catabolism during seed development and germination. Plant J 70: 562-577

Du Vigneaud V, Melville D (1942) The structure of biotin: A study of desthiobiotin. Journal of

Biological Chemistry 146: 0475-0485

Eisenberg M, Neidhardt F, Ingraham J, Low K, Magasanik B, Schaechter M, Umbarger M

(1987) Biosynthesis of biotin and lipoic acid in Escherichia coli and Salmonella

thyphimurium. Cellular and Molecular Biology: American Society of Microbiology, New

York: 544-550 111

Endrizzi JA, Kim H, Anderson PM, Baldwin EP (2004) Crystal structure of Escherichia coli

cytidine triphosphate synthetase, a nucleotide-regulated glutamine

amidotransferase/ATP-dependent amidoligase fusion protein and homologue of

anticancer and antiparasitic drug targets. Biochemistry 43: 6447-6463

Ettema TJ, Makarova KS, Jellema GL, Gierman HJ, Koonin EV, Huynen MA, de Vos

WM, van der Oost J (2004) Identification and functional verification of archaeal-type

phosphoenolpyruvate carboxylase, a missing link in archaeal central carbohydrate

metabolism. J Bacteriol 186: 7754-7762

Finkenwirth F, Kirsch F, Eitinger T (2013) Solitary BioY proteins mediate biotin transport

into recombinant Escherichia coli. J Bacteriol 195: 4105-4111

FLORENTIN D, BUI B, MARQUET A, OHSHIRO T, IZUMI Y (1994) ON THE

MECHANISM OF BIOTIN SYNTHASE OF BACILLUS-SPHAERICUS. Comptes

Rendus De L Academie Des Sciences Serie Iii-Sciences De La Vie-Life Sciences 317:

485-488

Galagan JE, Nusbaum C, Roy A, Endrizzi MG, Macdonald P, FitzHugh W, Calvo S,

Engels R, Smirnov S, Atnoor D, Brown A, Allen N, Naylor J, Stange-Thomann N,

DeArellano K, Johnson R, Linton L, McEwan P, McKernan K, Talamas J, Tirrell

A, Ye W, Zimmer A, Barber RD, Cann I, Graham DE, Grahame DA, Guss AM,

Hedderich R, Ingram-Smith C, Kuettner HC, Krzycki JA, Leigh JA, Li W, Liu J,

Mukhopadhyay B, Reeve JN, Smith K, Springer TA, Umayam LA, White O, White

RH, Conway de Macario E, Ferry JG, Jarrell KF, Jing H, Macario AJ, Paulsen I,

Pritchett M, Sowers KR, Swanson RV, Zinder SH, Lander E, Metcalf WW, Birren 112

B (2002) The genome of M. acetivorans reveals extensive metabolic and physiological

diversity. Genome Res 12: 532-542

Galperin MY, Grishin NV (2000) The synthetase domains of cobalamin biosynthesis

amidotransferases cobB and cobQ belong to a new family of ATP-dependent

amidoligases, related to dethiobiotin synthetase. Proteins 41: 238-247

GLOECKLER R, OHSAWA I, SPECK D, LEDOUX C, BERNARD S, ZINSIUS M,

VILLEVAL D, KISOU T, KAMOGAWA K, LEMOINE Y (1990) CLONING AND

CHARACTERIZATION OF THE BACILLUS-SPHAERICUS GENES-

CONTROLLING THE BIOCONVERSION OF PIMELATE INTO DETHIOBIOTIN.

Gene 87: 63-70

Guillén-Navarro K, Araíza G, García-de los Santos A, Mora Y, Dunn MF (2005) The

Rhizobium etli bioMNY operon is involved in biotin transport. FEMS Microbiol Lett

250: 209-219

Hebbeln P, Rodionov DA, Alfandega A, Eitinger T (2007) Biotin uptake in prokaryotes by

solute transporters with an optional ATP-binding cassette-containing module. Proc Natl

Acad Sci U S A 104: 2909-2914

Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL (2008) NCBI

BLAST: a better web interface. Nucleic Acids Res 36: W5-9

Ke J, Wen TN, Nikolau BJ, Wurtele ES (2000) Coordinate regulation of the nuclear and

plastidic genes coding for the subunits of the heteromeric acetyl-coenzyme A

carboxylase. Plant Physiol 122: 1057-1071

KNOWLES J (1989) THE MECHANISM OF BIOTIN-DEPENDENT ENZYMES. Annual

Review of Biochemistry 58: 195-221 113

Kode V, Mudd EA, Iamtham S, Day A (2005) The tobacco plastid accD gene is essential and

is required for leaf development. Plant J 44: 237-244

Koek MM, Muilwijk B, van der Werf MJ, Hankemeier T (2006) Microbial metabolomics

with gas chromatography/mass spectrometry. Anal Chem 78: 1272-1281

Kogl F, Tonnis B (1936) Concerning Bio-problems. Representations of crystallized Biotin in

egg yolk. 20. Communication concerning plant developments materials. Hoppe-Seylers

Zeitschrift Fur Physiologische Chemie 242: 43-73

Kurth DG, Gago GM, de la Iglesia A, Bazet Lyonnet B, Lin TW, Morbidoni HR, Tsai SC,

Gramajo H (2009) ACCase 6 is the essential acetyl-CoA carboxylase involved in fatty

acid and mycolic acid biosynthesis in mycobacteria. Microbiology 155: 2664-2675

Longstaff DG, Larue RC, Faust JE, Mahapatra A, Zhang L, Green-Church KB, Krzycki

JA (2007) A natural genetic code expansion cassette enables transmissible biosynthesis

and genetic encoding of pyrrolysine. Proc Natl Acad Sci U S A 104: 1021-1026

Mukhopadhyay B, Purwantini E, Kreder CL, Wolfe RS (2001) Oxaloacetate synthesis in the

methanarchaeon Methanosarcina barkeri: pyruvate carboxylase genes and a putative

Escherichia coli-type bifunctional biotin protein ligase gene (bpl/birA) exhibit a unique

organization. J Bacteriol 183: 3804-3810

Muralla R, Chen E, Sweeney C, Gray J, Dickerman A, Nikolau B, Meinke D (2008) A

bifunctional locus (BIO3-BIO1) required for biotin biosynthesis in Arabidopsis. Plant

Physiology 146: 60-73

Nikolau BJ, Ohlrogge JB, Wurtele ES (2003) Plant biotin-containing carboxylases. Arch

Biochem Biophys 414: 211-222 114

Otsuka AJ, Buoncristiani MR, Howard PK, Flamm J, Johnson C, Yamamoto R, Uchida K,

Cook C, Ruppert J, Matsuzaki J (1988) The Escherichia coli biotin biosynthetic

enzyme sequences predicted from the nucleotide sequence of the bio operon. J Biol Chem

263: 19577-19585

Phalip V, Kuhn I, Lemoine Y, Jeltsch J (1999) Characterization of the biotin biosynthesis

pathway in Saccharomyces cerevisiae and evidence for a cluster containing BI05, a novel

gene involved in vitamer uptake. Gene 232: 43-51

Picciocchi A, Douce R, Alban C (2001) Biochemical characterization of the Arabidopsis biotin

synthase reaction. The importance of mitochondria in biotin synthesis. Plant Physiology

127: 1224-1233

Pinon V, Ravanel S, Douce R, Alban C (2005) Biotin synthesis in plants. The first committed

step of the pathway is catalyzed by a cytosolic 7-keto-8-aminopelargonic acid synthase.

Plant Physiology 139: 1666-1676

Rodionov D, Mironov A, Gelfand M (2002) Conservation of the biotin regulon and the BirA

regulatory signal in eubacteria and archaea. Genome Research 12: 1507-1516

Rodionov DA, Mironov AA, Gelfand MS (2002) Conservation of the biotin regulon and the

BirA regulatory signal in Eubacteria and Archaea. Genome Res 12: 1507-1516

Rodríguez E, Banchio C, Diacovich L, Bibb MJ, Gramajo H (2001) Role of an essential acyl

coenzyme A carboxylase in the primary and secondary metabolism of Streptomyces

coelicolor A3(2). Appl Environ Microbiol 67: 4166-4176

Schneider T, Dinkins R, Robinson K, Shellhammer J, Meinke DW (1989) An embryo-lethal

mutant of Arabidopsis thaliana is a biotin auxotroph. Dev Biol 131: 161-167 115

SHELLHAMMER J, MEINKE D (1990) ARRESTED EMBRYOS FROM THE BIO1

AUXOTROPH OF ARABIDOPSIS-THALIANA CONTAIN REDUCED LEVELS OF

BIOTIN. Plant Physiology 93: 1162-1167

Smith AG, Croft MT, Moulin M, Webb ME (2007) Plants need their vitamins too. Curr Opin

Plant Biol 10: 266-275

SOWERS K, BARON S, FERRY J (1984) METHANOSARCINA-ACETIVORANS SP-NOV,

AN ACETOTROPHIC METHANE-PRODUCING BACTERIUM ISOLATED FROM

MARINE-SEDIMENTS. Applied and Environmental Microbiology 47: 971-978

Streit WR, Entcheva P (2003) Biotin in microbes, the genes involved in its biosynthesis, its

biochemical role and perspectives for biotechnological production. Appl Microbiol

Biotechnol 61: 21-31

Tomita H, Yokooji Y, Ishibashi T, Imanaka T, Atomi H (2014) An archaeal glutamate

decarboxylase homolog functions as an aspartate decarboxylase and is involved in β-

alanine and coenzyme A biosynthesis. J Bacteriol 196: 1222-1230

Weaver LM, Yu F, Wurtele ES, Nikolau BJ (1996) Characterization of the cDNA and gene

coding for the biotin synthase of Arabidopsis thaliana. Plant Physiol 110: 1021-1028

Wildiers E (1901) Nouvelle substance indispensable au developpement de la levure. La Cellule

18: 313-316

WOLIN EA, WOLIN MJ, WOLFE RS (1963) FORMATION OF METHANE BY

BACTERIAL EXTRACTS. J Biol Chem 238: 2882-2886

116

Figure 1. Biotin biosynthetic pathway

Displayed is the combined gram-positive and gram-negative bacterial biotin biosynthetic pathway, with MA genes label in red. Biotin biosynthetic pathways from plants, fungi, and Achaea are also thought to contain the last five compounds in the pathway. (Figure adapted from W. R. Streit and P. Entcheva, 2003)

117

Figure 2. Vitamer Dependant Biotinylation of Pyruvate Carboxylase Subunit B.

The biotinylation status of pyruvate carboxylase subunit B. detected using streptavidin-HRP, avidin with no biotin vitamers (-), pimelic acid (PA), 7-keto-8-aminopelargonic acid (KAPA), 7,8-diaminopelargonic acid (DAPA) , dethiobiotin (DTB), and biotin.

118

Figure 3. One Dimensional waterLOGSY 1H NMR Spectrum Candidate BirA MA0676.

One dimensional waterLOGSY 1H NMR spectrum was used to test candidate substrates of BirA MA0676. When biotin is added to purified recombinant MA0676 a positive signal is observed indicating binding. The reference spectrum is a biotin spectrum that is inverted to show a positive binding control spectrum (a). The following conditions were tested: biotin with MA0676 protein (b), biotin and ATP with MA0676 protein (c), and biotin, ATP, lysine with MA0676 (d) (Figure from John Orban).

119

Figure 4. Growth of MA cultures in +/- biotin conditions

MA cells were grown in Marine synthetic media containing either 8.0pM biotin, or 1.0mg/L avidin. The growth of 10mL cultures was monitored at 550nm absorbance for 72 hours.

120

Table 1. BirA Binding Sequence Homology

Shown above are the known E.coli birA binding site and operon along with the homologous putative birA binding sites found in and the associated genes forming candidate operons found in Methanosarcina . Methanosarcina barkeri (MBA) and Methanosarcina mazei (MMZ). $ denotes the location birA binding sites. The number inside the parenthesis (#) indicates the number of base pairs in-between the two sections of the binding sites.

121

Figure 6. Diagram of Novel birA Self Regulation

Proposed MA birA mediated biotin ligation and gene regulation pathway based on the homologous pathway found in E. coli (Adapted from Beckett, 2007). In the presence of biotin, if apoPYCB is present in significant quantities, the unrepressed bioO state is preferred, while once the apoPYCB is depleted, the repressed state is favored.

122

Figure 7. Volcano Plot of Polar Metabolites

This plot distributes the detected metabolites by relative fold changes and by statistical significance. Metabolites above the orange shaded box are considered to have statistically significantly changed (p-value<0.05) between the biotin + and biotin - conditions. Data points and the right side of the y-axis are elevated in the biotin + condition, while molecules on the left side of the y-axis are reduced in the biotin + condition.

123

Figure 8. Putative Metabolic Map Showing Affects of Pyruvate Carboxylase Activity

Red arrows indicate metabolite changes in the biotin containing MA cultures.

Abv. Enzyme name, MA gene number pyC: pyruvate carboxylase MA0674, MA0675

ACDS: acetyl-CoA decarbonylase/synthase MA1011, MA1012, MA1014 MA3862, MA3864, MA3865 pyOR: pyruvate ferredoxin MA0031, MA0032, Ma0033, MA0034, MA2407 pps: phosphoenolpyruvate synthase MA2458, MA2667, MA3408

AAT: aspartate aminotransferase MA0636, MA1385, MA1819 asdA: L-aspartate 4-carboxy- ?

ADC: aspartate-decarboxylase MA1949 124

(Figure 8. Continued)

Alt: Alanine transaminase ?

CO2: maybe from bicarbonate

H2N-R maybe on glutamate and donated to alpha keto gluterate

PEPC Phosphoenolpyruvate carboxylase MA2690

Gene annotations assisted by KEGG (Kanehisa Laboratories, JP: Kanehisa, M.; "Post-genome Informatics", Oxford University Press (2000).) http://www.genome.jp/kegg- bin/show_pathway?mac00680

125

Supplemental Table 1. Primers and probes used in this study

126

Supplemental Figure 1. One Dimensional waterLOGSY 1H NMR Spectrum Candidate DTB synthase MA3250

One dimensional waterLOGSY 1H NMR spectrum was used to test candidate substrates of the candidate DTB synthase MA3250. A positive signal indicates binding. MA3250 protein was evaluated under the following conditions: MA3250 only with no biotin vitamers, pimelic acid, 7- keto-8-aminopelargonic acid (KAPA), 7,8-diaminopelargonic acid (DAPA), dethiobiotin, and biotin. Only DAPA had a spectrum indicating binding. (Figure from John Orban)

127

Supplemental Figure 2. One dimensional waterLOGSY 1H NMR spectrum candidate biotin synthase MA1489

One dimensional waterLOGSY 1H NMR spectrum was used to test candidate substrates of the candidate biotin synthase MA1489. A positive signal indicates binding. MA1489 protein was evaluated under the following conditions: MA1489 only with no biotin vitamers, pimelic acid (pimelate), 7-keto-8-aminopelargonic acid (KAPA), 7,8-diaminopelargonic acid (DAPA), dethiobiotin, and biotin . Biotin and dethiobiotin have spectra indicating binding. (Figure from John Orban)

128

Supplemental Figure 3. One dimensional waterLOGSY 1H NMR spectrum candidate biotin synthase MA0154

One dimensional waterLOGSY 1H NMR spectrum was used to test candidate substrates of the candidate biotin synthase MA0154. A positive signal indicates binding. MA0154 protein was evaluated under the following conditions: MA0154 with dethiobiotin, or MA0154 with biotin. Biotin and dethiobiotin had spectra indicating no detectable binding under these conditions (Figure from John Orban).

129

CHAPTER V. GENERAL CONCLUSIONS

Metabolomics studies of ΔfabH (KASII) assisted in the development of the hypothesis that fabF

(KASII) was involved in the survival of ΔfabH E. coli cells. This hypothesis was tested by generating and evaluating ΔfabF/ ΔfabH double mutants. Through the studies of the ΔfabF/ ΔfabH double mutant, it was revealed that the KASI, fabB gene was able to support E. coli cell growth as the sole KAS enzyme present. Additionally it was determined that fabF is responsible for the hyper-accumulation of cis- vaccenic acid observed in ΔfabH mutants.

The second study focused on a functional genomics effort to develop a gene-function annotation method based on metabolomics and NMR ligand binding assays. This study had mixed results. The metabolomics effort in this study was met with great technical hurdles. Only in a single case (MA3250) was metabolomics useful in annotating a gene’s functions. While the

NMR substrate binding assays were able to provide useful data for more accurate gene functional annotations in more than 10% of the genes examined.

The third study presented was an in depth study of the biotin network of Methanosarcina acetivorans, where metabolic changes associated with the presence and absence of biotin in

Methanosarcina acetivorans cultures were investigated. In this study metabolomics studies were able to detect metabolic changes that were associated with the presents or absence of biotin. In this study it was also revealed that MA does not require biotin or any detectable biotinylated protein to survive.

130

APPENDIX

This appendix presents the individual figures from the metabolomics data described in chapter 3. Both total metabolite and amino acid data are presented. Methods are on pages 68-71 and discussion on pages 77.

131

Figure 1. The total metabolite data of MA0589 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 132

Figure 2. The total metabolite data of MA2489 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 133

Figure 3. The total metabolite data of MA2715 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

134

Figure 4. The total metabolite data of MA3994 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

135

Figure 5. The total metabolite data of MA2859 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 136

Figure 6. The total metabolite data of MA3520 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

137

Figure 7. The total metabolite data of MA0589 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

138

Figure 8. The total metabolite data of MA0674 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 139

Figure 9. The total metabolite data of MA4460 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

140

Figure 11. The total metabolite data of MA3751 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

141

Figure 12. The total metabolite data of MA589 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

142

Figure 13. The total metabolite data of MA4459 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

143

Figure 14. The total metabolite data of MA3734 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 144

Figure 15. The total metabolite data of MA1491 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

145

Figure 16. The total metabolite data of MA3342 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 146

Figure 17. The total metabolite data of MA3994 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 147

Figure 18. The total metabolite data of MA0202 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 148

Figure 19. The total metabolite data of MA4460 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

149

Figure 20. The total metabolite data of MA0677 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

150

Figure 21. The total metabolite data of MA1272 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5). 151

Figure 22. The total metabolite data of MA#### vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y- coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

152

Figure 23. The total metabolite data of MA0106 vs EV as volcano plots.

These volcano plots represent the non-polar (A) and polar (B) total metabolite profile for the specified MA gene expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA gene were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar.

153

Table 1. Amino acid abbreviations

Compound Name Abbreviation α-aminobutyric acid AABA Allo-Isoleucine aILE

L-Alanine ALA Alpha-aminopimelic acid_P APA

L-Asparagine ASN L-Aspartic acid ASP

Beta-Aminoisobutyric acid BAIB L-Cystine C-C Cystathionine CHT

Glutamine_P GLN L-Glutamic acid GLU

Glycine GLY Glycine-Proline GPR

L-Histidine HIS Hydroxylysine HYL

4-Hydroxyproline HYP

L-Isoleucine ILE

L-Leucine LEU

L-Lysine LYS

L-Methionine MET

L-Ornithine ORN

L-Phenylalanine PHE Proline-hydroxyproline PHP

L-Proline PRO

Thioproline PRS

Sarcosine SAR

L-Serine SER

L-Threonine THR

L-Tryptophan TRP

L-Tyrosine TYR

L-alpha-Aminoadipic acid UN1

L-valine VAL

154

Figure 24. The amino acid quantification data of MA0202, MA2489, MA4264 vs EV as volcano plots.

This volcano plot represents the amino acid profiles for the specified MA genes expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA genes were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

155

Figure 25. The amino acid quantification data of MA3520, MA3994, MA0246, and MA4460 vs EV as volcano plots.

This volcano plot represents the amino acid profiles for the specified MA genes expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA genes were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

156

Figure 26. The amino acid quantification data of MA3751, MA4459, and MA2715 vs EV as volcano plots.

This volcano plot represents the amino acid profiles for the specified MA genes expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA genes were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

157

Figure 27. The amino acid quantification data of MA2489, MA3734, and MA1491, and MA0589 vs EV as volcano plots.

This volcano plot represents the amino acid profiles for the specified MA genes expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA genes were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

158

Figure 28. The amino acid quantification data of MA3242, MA1272, and MA3342, and MA0202 vs EV as volcano plots.

This volcano plot represents the amino acid profiles for the specified MA genes expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA genes were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).

159

Figure 29. The amino acid quantification data of MA0677, MA4460, and MA1615, and MA0106 vs EV as volcano plots.

This volcano plot represents the amino acid profiles for the specified MA genes expressed in Rosetta 2.0 E. coli cells. Cells carrying the specified MA genes were grown in MOPS minimal liquid media with glucose as the sole carbon source with100 μg/mL ampicillin and induction for 4 hours with 0.1mM IPTG. The log2 ratio of these cells is compared to the content of an empty vector control and displayed as the x-coordinate, while the p-value between the MA gene and the control cells is presented in a -Log10 format as the y-coordinate. All metabolite amounts were determined by GC-MS. The data from average of 6 replicates of each MA gene and the EV (empty vector) control are presented. Statistically significant changes are located above the orange shaded bar (p-value<0.5).