US008349587B2

(12) United States Patent
Fischer et al.

(10) Patent No.: US 8,349,587 B2
(45) Date of Patent: Jan. 8, 2013

(54) METHODS AND SYSTEMS FOR CHEMOAUTOTROPHIC PRODUCTION OF ORGANIC COMPOUNDS

(75) Inventors: Curt R. Fischer, Cambridge, MA (US); Austin J. Che, Cambridge, MA (US); Reshma P. Shetty, Boston, MA (US); Jason R. Kelly, Cambridge, MA (US)

(73) Assignee: Ginkgo BioWorks, Inc., Boston, MA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 13/285,919

(22) Filed: Oct. 31, 2011

(65) Prior Publication Data
US 2012/0064622 A1 Mar. 15, 2012

(51) Int. Cl.
C12P 1/00 (2006.01)

(52) U.S. Cl. 435/41

(58) Field of Classification Search: None
See application file for complete search history.

Primary Examiner — Suzanne M Noakes

(74) Attorney, Agent, or Firm — Jennifer A. Camacho; Fang Xie; Greenberg Traurig, LLP

(57) ABSTRACT

The present disclosure identifies pathways, mechanisms, systems and methods to confer chemoautotrophic production of carbon-based products of interest, such as sugars, alcohols, chemicals, amino acids, polymers, fatty acids and their derivatives, hydrocarbons, isoprenoids, and intermediates thereof, in organisms such that these organisms efficiently convert inorganic carbon to organic carbon-based products of interest using inorganic energy, such as formate, and in particular the use of organisms for the commercial production of various carbon-based products of interest.

19 Claims, 29 Drawing Sheets

U.S. Patent Jan. 8, 2013 Sheet 1 of 29 US 8,349,587 B2

[FIGS. 1 to 6 (Sheets 1 to 6 of 29): drawings not reproduced. Legible labels: phosphoenolpyruvate, acetyl-CoA (FIG. 3); methylmalonyl-CoA, glyoxylate (FIG. 5).]

[FIGS. 7 and 8 (Sheets 7 and 8 of 29): drawings not reproduced.]

[FIGS. 9 to 12 (Sheets 9 to 12 of 29): drawings not reproduced. Legible label: 1,3-bis-P-glycerate (FIG. 9).]

[FIGS. 13 to 15 (Sheets 13 to 15 of 29): drawings not reproduced. Legible label: (E)-4-hydroxy-3-methyl-but-2-enyl pyrophosphate (FIG. 15).]

[FIGS. 16 to 20 (Sheets 16 to 20 of 29): drawings not reproduced. Legible labels: (S)-2,3-dihydrodipicolinate, L-lysine (FIG. 19); 3-hydroxypropionyl-CoA (FIG. 20).]

[FIGS. 21 to 23 (Sheets 21 to 23 of 29): assay plots not reproduced. Legible axis labels: Time (min) (FIG. 22); Time (s) (FIG. 23).]

[FIGS. 24 to 29 (Sheets 24 to 29 of 29): drawings and plots not reproduced. Legible axis label: Maximum isooctanol productivity (FIG. 29).]

METHODS AND SYSTEMS FOR CHEMOAUTOTROPHIC PRODUCTION OF ORGANIC COMPOUNDS

STATEMENT REGARDING GOVERNMENT LICENSE RIGHTS

This invention was made with government support under contract number DE-AR0000091 awarded by U.S. Department of Energy, Office of ARPA-E. The government has certain rights in the invention.

TECHNICAL FIELD

The invention relates to systems, mechanisms and methods to confer chemoautotrophic production of carbon-based products to a heterotrophic organism to efficiently convert inorganic carbon into various carbon-based products using chemical energy, and in particular the use of such an organism for the commercial production of various carbon-based products of interest. The invention also relates to systems, mechanisms and methods to confer additional and/or alternative pathways for chemoautotrophic production of carbon-based products to an organism that is already autotrophic or mixotrophic.

BACKGROUND

Heterotrophs are biological organisms that utilize energy from organic compounds for growth and reproduction. Commercial production of various carbon-based products of interest generally relies on heterotrophic organisms that ferment sugar from crop biomass such as corn or sugarcane as their energy and carbon source [Bai, 2008]. An alternative to fermentation-based bio-production is the production of carbon-based products of interest from photosynthetic organisms, such as plants, algae and cyanobacteria, that derive their energy from sunlight and their carbon from carbon dioxide to support growth [U.S. Pat. No. 7,981,647]. However, the algae-based production of carbon-based products of interest relies on the relatively inefficient process of photosynthesis to supply the reducing power needed for production of organic compounds from carbon dioxide [Larkum, 2010]. Moreover, commercial production of carbon-based products of interest using photosynthetic organisms relies on reliable and consistent exposure to light to achieve the high productivities needed for economic feasibility; hence, photobioreactor design remains a significant technical challenge [Morweiser, 2010].

Chemoautotrophs are biological organisms that utilize energy from inorganic energy sources such as molecular hydrogen, hydrogen sulfide, ammonia or ferrous iron, and carbon dioxide to produce all organic compounds necessary for growth and reproduction. Existing, naturally-occurring chemoautotrophs are poorly suited for industrial bio-processing and have therefore not demonstrated commercial viability for this purpose. Such organisms have long doubling times (minimum of approximately one hour for Thiomicrospira crunogena but generally much longer) relative to industrialized heterotrophic organisms such as Escherichia coli (twenty minutes), reflective of low total productivities. In addition, techniques for genetic manipulation (homologous recombination, transformation or transfection of nucleic acid molecules, and recombinant gene expression) are inefficient, time-consuming, laborious or non-existent.

Accordingly, the ability to endow an otherwise heterotrophic organism with chemoautotrophic capability would significantly enable more energy- and carbon-efficient production of carbon-based products of interest. Alternatively, the ability to add one or more additional or alternative pathways for chemoautotrophic capability to an autotrophic or mixotrophic organism would enhance its ability to produce carbon-based products of interest.

SUMMARY

Systems and methods of the present invention provide for efficient production of renewable energy and other carbon-based products of interest (e.g., fuels, sugars, chemicals) from inorganic carbon (e.g., greenhouse gas) using inorganic energy. As such, the present invention materially contributes to the development of renewable energy and/or energy conservation, as well as greenhouse gas emission reduction. Furthermore, systems and methods of the present invention can be used in the place of traditional methods of producing chemicals such as olefins (e.g., ethylene, propylene), which are traditionally derived from petroleum in a process that generates toxic by-products that are recognized as hazardous waste pollutants and harmful to the environment. As such, the present invention can additionally avoid the use of petroleum and the generation of such toxic by-products, and thus materially enhances the quality of the environment by contributing to the maintenance of basic life-sustaining natural elements such as air, water and/or soil by avoiding the generation of hazardous waste pollutants in the form of petroleum-derived by-products in the production of various chemicals.

In certain aspects, the invention described herein provides an organism engineered to confer chemoautotrophic production of various carbon-based products of interest from inorganic carbon and inorganic energy. The engineered organism comprises a modular metabolic architecture encompassing three metabolic modules. The first module comprises one or more energy conversion pathways that use energy from an inorganic energy source, such as formate, formic acid, methane, carbon monoxide, carbonyl sulfide, carbon disulfide, hydrogen sulfide, bisulfide anion, thiosulfate, elemental sulfur, molecular hydrogen, ferrous iron, ammonia, cyanide ion, and/or hydrocyanic acid, to produce reduced cofactors inside the cell, such as NADH, NADPH, ubiquinol, menaquinol, cytochromes, flavins and/or ferredoxin. The second module comprises one or more carbon fixation pathways that use energy from reduced cofactors to convert inorganic carbon, such as carbon dioxide, carbon monoxide, formate, formic acid, carbonic acid, bicarbonate, carbon monoxide, carbonyl sulfide, carbon disulfide, cyanide ion and/or hydrocyanic acid, to central metabolites, such as acetyl-coA, pyruvate, pyruvic acid, 3-hydroxypropionate, 3-hydroxypropionic acid, glycolate, glycolic acid, glyoxylate, glyoxylic acid, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, malate, malic acid, lactate, lactic acid, acetate, acetic acid, citrate and/or citric acid. Optionally, the third module comprises one or more carbon product biosynthetic pathways that convert central metabolites into desired products, such as carbon-based products of interest. Carbon-based products of interest include but are not limited to alcohols, fatty acids, fatty acid derivatives, fatty alcohols, fatty acid esters, wax esters, hydrocarbons, alkanes, polymers, fuels, commodity chemicals, specialty chemicals, carotenoids, isoprenoids, sugars, sugar phosphates, central metabolites, pharmaceuticals and pharmaceutical intermediates.

The resulting engineered chemoautotroph of the invention is capable of efficiently synthesizing carbon-based products of interest from inorganic carbon using inorganic energy. The invention also provides energy conversion pathways, carbon fixation pathways and carbon product biosynthetic pathways for conferring chemoautotrophic production of the carbon-based product of interest upon the host organism where the organism lacks the ability to efficiently produce carbon-based products of interest from inorganic carbon using inorganic energy. The invention also provides methods for culturing the engineered chemoautotroph to support efficient chemoautotrophic production of carbon-based products of interest.

In one aspect, the present invention provides an engineered cell for producing a carbon-based product of interest. The engineered cell includes an at least partially engineered energy conversion pathway having at least one of a recombinant formate dehydrogenase and a recombinant sulfide-quinone oxidoreductase introduced into a host cell, wherein said energy conversion pathway is capable of using energy from oxidation to produce a reduced cofactor. The engineered cell also includes a carbon fixation pathway that is capable of converting inorganic carbon to a central metabolite using energy from the reduced cofactor. The engineered cell further includes, optionally, a carbon product biosynthetic pathway that is capable of converting the central metabolite into a carbon-based product of interest.

In certain embodiments, the recombinant formate dehydrogenase reduces NADP+. For example, the recombinant formate dehydrogenase can be encoded by SEQ ID NO:1, or a homolog thereof having at least 80% sequence identity thereto. In some embodiments, the recombinant formate dehydrogenase reduces NAD+. In an example, the recombinant formate dehydrogenase can be encoded by any one of SEQ ID NOS:2-4, or a homolog thereof having at least 80% sequence identity thereto. In other embodiments, the recombinant formate dehydrogenase reduces ferredoxin. As an example, the recombinant formate dehydrogenase can be encoded by one or more of SEQ ID NOS:5-8, or a homolog thereof having at least 80% sequence identity thereto.

In certain embodiments, the recombinant sulfide-quinone oxidoreductase reduces quinone. For example, the recombinant sulfide-quinone oxidoreductase can be encoded by any one of SEQ ID NOS:9-16, or a homolog thereof having at least 80% sequence identity thereto.

In some embodiments, the energy conversion pathway includes the recombinant formate dehydrogenase and the energy from oxidation is from formate oxidation. The energy conversion pathway can also include the recombinant sulfide-quinone oxidoreductase and the energy from oxidation can be from hydrogen sulfide oxidation.

In various embodiments, the inorganic carbon is one or more of formate and carbon dioxide.

In certain embodiments, the carbon fixation pathway can be at least partially engineered and can be derived from the 3-hydroxypropionate (3-HPA) bicycle. The carbon fixation pathway can include one or more of acetyl-CoA carboxylase, malonyl-CoA reductase, propionyl-CoA synthase, propionyl-CoA carboxylase, methylmalonyl-CoA epimerase, methylmalonyl-CoA mutase, succinyl-CoA:(S)-malate CoA transferase, succinate dehydrogenase, fumarate hydratase, (S)-malyl-CoA/β-methylmalyl-CoA/(S)-citramalyl-CoA lyase, mesaconyl-C1-CoA hydratase or β-methylmalyl-CoA dehydratase, mesaconyl-CoA C1-C4 CoA transferase and mesaconyl-C4-CoA hydratase.

In some embodiments, the carbon fixation pathway can be at least partially engineered and can be derived from the ribulose monophosphate (RuMP) cycle. In one embodiment, said carbon fixation pathway can include one or more of hexulose-6-phosphate synthase, 6-phospho-3-hexuloisomerase, hexulose-6-phosphate synthase/6-phospho-3-hexuloisomerase fusion enzyme, phosphofructokinase, fructose bisphosphate aldolase, transketolase, transaldolase, transketolase, ribose 5-phosphate isomerase and ribulose-5-phosphate-3-epimerase.

In some embodiments, said carbon fixation pathway can be at least partially engineered and can be derived from the Calvin-Benson-Bassham cycle or the reductive pentose phosphate (RPP) cycle. For example, the carbon fixation pathway can include one or more of ribulose bisphosphate carboxylase, phosphoglycerate kinase, glyceraldehyde-3P dehydrogenase (phosphorylating), triose-phosphate isomerase, fructose-bisphosphate aldolase, fructose-bisphosphatase, transketolase, sedoheptulose-1,7-bisphosphate aldolase, sedoheptulose bisphosphatase, transketolase, ribose-5-phosphate isomerase, ribulose-5-phosphate-3-epimerase and phosphoribulokinase.

In certain embodiments, said carbon fixation pathway can be at least partially engineered and can be derived from the reductive tricarboxylic acid (rTCA) cycle. In some embodiments, the carbon fixation pathway can include one or more of ATP citrate lyase, citryl-CoA synthetase, citryl-CoA lyase, malate dehydrogenase, fumarate dehydratase, fumarate reductase, succinyl-CoA synthetase, 2-oxoglutarate:ferredoxin oxidoreductase, isocitrate dehydrogenase, 2-oxoglutarate carboxylase, oxalosuccinate reductase, aconitate hydratase, pyruvate:ferredoxin oxidoreductase, phosphoenolpyruvate synthetase and phosphoenolpyruvate carboxylase.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is an overview of modular architecture of an engineered chemoautotroph. An engineered chemoautotroph comprises three metabolic modules. (1) In Module 1, one or more energy conversion pathways that use energy from an extracellular inorganic energy source, such as formate, hydrogen sulfide, molecular hydrogen, or ferrous iron, to produce reduced cofactors inside the cell, such as NADH, NADPH, reduced ferredoxin and/or reduced quinones or cytochromes. Depicted examples of energy conversion pathways include formate dehydrogenase (FDH), hydrogenase (H2ase), and sulfide-quinone oxidoreductase (SQR). (2) In Module 2, one or more carbon fixation pathways that use energy from reduced cofactors to reduce and convert inorganic carbon, such as carbon dioxide, formate and formaldehyde, to central metabolites, such as acetyl-coA, pyruvate, glycolate, glyoxylate, and dihydroxyacetone phosphate. Depicted examples of carbon fixation pathways include the 3-hydroxypropionate cycle (3-HPA), the reverse or reductive tricarboxylic acid cycle (rTCA), and the ribulose monophosphate pathway (RuMP). (3) Optionally, in Module 3, one or more carbon product biosynthetic pathways that convert central metabolites into desired products, such as carbon-based products of interest. Since there are many possible carbon-based products of interest, no individual pathways are depicted.

FIG. 2 is a block diagram of a computing architecture.

FIG. 3 depicts the metabolic reactions of the reductive tricarboxylic acid cycle [Evans, 1966; Buchanan, 1990; Hügler, 2011]. Each reaction is numbered. For certain reactions, such as reaction 1 and 7, there are two possible routes denoted by a and b, each of which is catalyzed by different enzyme(s). Enzymes catalyzing each reaction are as follows: 1a, ATP citrate lyase (E.C. 2.3.3.8); 1b, citryl-CoA synthetase (E.C. 6.2.1.18) and citryl-CoA lyase (E.C. 4.1.3.34); 2, malate dehydrogenase (E.C. 1.1.1.37); 3, fumarate dehydratase or fumarase (E.C. 4.2.1.2); 4, fumarate reductase (E.C. 1.3.99.1); 5, succinyl-CoA synthetase (E.C. 6.2.1.5); 6, 2-oxoglutarate synthase or 2-oxoglutarate:ferredoxin oxidoreductase (E.C. 1.2.7.3); 7a, isocitrate dehydrogenase (E.C. 1.1.1.41 or E.C. 1.1.1.42); 7b, 2-oxoglutarate carboxylase (E.C. 6.4.1.7) and oxalosuccinate reductase (E.C. 1.1.1.41); 8, aconitate hydratase (E.C. 4.2.1.3); 9, pyruvate synthase or pyruvate:ferredoxin oxidoreductase (E.C. 1.2.7.1); 10, phosphoenolpyruvate synthetase (E.C. 2.7.9.2); 11, phosphoenolpyruvate carboxylase (E.C. 4.1.1.31).

FIG. 4 depicts example metabolic reactions and enzymes needed to engineer a carbon fixation pathway derived from the reductive tricarboxylic acid (rTCA) cycle into the heterotroph Escherichia coli. Reactions in black are known to occur in the wildtype host cell E. coli when grown in microaerobic or anaerobic conditions [Cronan, 2010]. Reactions in dark gray must be added to complete the rTCA-derived carbon fixation cycle in E. coli. The carbon input to the pathway is carbon dioxide (CO2) and the carbon outputs of the pathway are acetyl-coA and/or pyruvate. The desired net flow of carbon is indicated by the wide, light gray arrow. Metabolites are shown in bold and enzyme abbreviations are as follows: AspC, aspartate aminotransferase; MDH, malate dehydrogenase; AspA, aspartate ammonia-lyase; FumB, fumarase B; FRD, fumarate reductase; STK, succinate thiokinase; OGOR, 2-oxoglutarate:ferredoxin oxidoreductase; IDH, isocitrate dehydrogenase; ACN, aconitase; ACL, ATP citrate lyase; POR, pyruvate:ferredoxin oxidoreductase.

FIG. 5 depicts the metabolic reactions of the 3-hydroxypropionate bicycle [Holo, 1989; Strauss, 1993; Eisenreich, 1993; Herter, 2002a; Zarzycki, 2009; Zarzycki, 2011]. Each reaction is numbered. In some cases, multiple different reactions, such as reactions 10a, 10b and 10c, are catalyzed by the same multi-functional enzyme. Enzymes catalyzing each reaction are as follows: 1, acetyl-CoA carboxylase (E.C. 6.4.1.2); 2, malonyl-CoA reductase (E.C. 1.2.1.75 and E.C. 1.1.1.298); 3, propionyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-); 4, propionyl-CoA carboxylase (E.C. 6.4.1.3); 5, methylmalonyl-CoA epimerase (E.C. 5.1.99.1); 6, methylmalonyl-CoA mutase (E.C. 5.4.99.2); 7, succinyl-CoA:(S)-malate CoA transferase (E.C. 2.8.3.-); 8, succinate dehydrogenase (E.C. 1.3.5.1); 9, fumarate hydratase (E.C. 4.2.1.2); 10abc, (S)-malyl-CoA/β-methylmalyl-CoA/(S)-citramalyl-CoA lyase (E.C. 4.1.3.24 and E.C. 4.1.3.25); 11, mesaconyl-C1-CoA hydratase or β-methylmalyl-CoA dehydratase (E.C. 4.2.1.-); 12, mesaconyl-CoA C1-C4 CoA transferase (E.C. 2.8.3.-); 13, mesaconyl-C4-CoA hydratase (E.C. 4.2.1.-).

FIG. 6 depicts example metabolic reactions and enzymes needed to engineer a carbon fixation pathway derived from the 3-hydroxypropionate (3-HPA) bicycle into the heterotroph Escherichia coli. Reactions in black are reported to occur in the wildtype host cell E. coli. Reactions in dark gray must be added to complete the 3-HPA bicycle-derived carbon fixation cycle in E. coli. The carbon input to the pathway is bicarbonate (HCO3-) and the carbon output of the pathway is glyoxylate. The desired net flow of carbon is indicated by the wide, light gray arrow. Metabolites are shown in bold and enzyme abbreviations are as follows: PCC, propionyl-CoA carboxylase; MCR, malonyl-CoA reductase; PCS, propionyl-CoA synthase; MCE, methylmalonyl-CoA epimerase; ScpA, E. coli methylmalonyl-CoA mutase; SDH, E. coli succinate dehydrogenase; FumA/FumB/FumC, three E. coli fumarate hydratases; SmtAB, succinyl-CoA:(S)-malate CoA transferase; MMC lyase, (S)-malyl-CoA/β-methylmalyl-CoA/(S)-citramalyl-CoA lyase. Note that methylmalonyl-CoA epimerase activity has been reported in E. coli although no corresponding gene or gene product has been identified [Evans, 1993].

FIG. 7 depicts the metabolic reactions of the ribulose monophosphate cycle [Strom, 1974]. In metabolite names, -P denotes phosphate. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, hexulose-6-phosphate synthase (E.C. 4.1.2.43); 2, 6-phospho-3-hexuloisomerase (E.C. 5.3.1.27); 3, phosphofructokinase (E.C. 2.7.1.11); 4, fructose bisphosphate aldolase (E.C. 4.1.2.13); 5, transketolase (E.C. 2.2.1.1); 6, transaldolase (E.C. 2.2.1.2); 7, transketolase (E.C. 2.2.1.1); 8, ribose 5-phosphate isomerase (E.C. 5.3.1.6); 9, ribulose-5-phosphate-3-epimerase (E.C. 5.1.3.1).

FIG. 8 depicts example metabolic reactions and enzymes needed to engineer a carbon fixation pathway derived from the ribulose monophosphate (RuMP) cycle into the heterotroph Escherichia coli. Reactions in black occur in the wildtype host cell E. coli. Reactions in dark gray must be added to complete the RuMP cycle-derived carbon fixation cycle in E. coli. The carbon input to the pathway is formaldehyde and the carbon output of the pathway is dihydroxyacetone-phosphate. The desired net flow of carbon is indicated by the wide, light gray arrow. For simplicity, a series of rearrangement reactions that regenerate ribulose-5-phosphate and all occur natively in E. coli are denoted by a single arrow. Metabolites are shown in bold with -P denoting phosphate. Enzyme abbreviations are as follows: HPS, hexulose-6-phosphate synthase; PHI, 6-phospho-3-hexuloisomerase; PFK, phosphofructokinase.

FIG. 9 depicts the metabolic reactions of the Calvin-Benson-Bassham cycle or the reductive pentose phosphate (RPP) cycle [Bassham, 1954]. In metabolite names, -P denotes phosphate. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, ribulose bisphosphate carboxylase (E.C. 4.1.1.39); 2, phosphoglycerate kinase (E.C. 2.7.2.3); 3, glyceraldehyde-3P dehydrogenase (phosphorylating) (E.C. 1.2.1.12 or E.C. 1.2.1.13); 4, triose-phosphate isomerase (E.C. 5.3.1.1); 5, fructose-bisphosphate aldolase (E.C. 4.1.2.13); 6, fructose-bisphosphatase (E.C. 3.1.3.11); 7, transketolase (E.C. 2.2.1.1); 8, sedoheptulose-1,7-bisphosphate aldolase (E.C. 4.1.2.-); 9, sedoheptulose bisphosphatase (E.C. 3.1.3.37); 10, transketolase (E.C. 2.2.1.1); 11, ribose-5-phosphate isomerase (E.C. 5.3.1.6); 12, ribulose-5-phosphate-3-epimerase (E.C. 5.1.3.1); 13, phosphoribulokinase (E.C. 2.7.1.19).

FIG. 10 depicts example metabolic reactions and enzymes needed to engineer a carbon fixation pathway derived from the Calvin-Benson-Bassham cycle or the reductive pentose phosphate (RPP) cycle into the heterotroph Escherichia coli. Reactions in black occur in the wildtype host cell E. coli. Reactions in dark gray must be added to complete the RPP cycle-derived carbon fixation cycle in E. coli. The carbon input to the pathway is carbon dioxide and the carbon output of the pathway is dihydroxyacetone-phosphate. The desired net flow of carbon is indicated by the wide, light gray arrow. Metabolites are shown in bold with -P denoting phosphate. Enzyme abbreviations are as follows: RuBisCO, ribulose bisphosphate carboxylase; PGK, phosphoglycerate kinase; GAPDH, NADPH-dependent glyceraldehyde-3P dehydrogenase (phosphorylating); TPI, triose-phosphate isomerase; FBA, fructose-bisphosphate aldolase; FBPase, fructose-bisphosphatase; TK, transketolase; SBA, sedoheptulose-1,7-bisphosphate aldolase; SBPase, sedoheptulose bisphosphatase; RPI, ribose-5-phosphate isomerase; RPE, ribulose-5-phosphate-3-epimerase; PRK, phosphoribulokinase.

FIG. 11 provides a schematic to convert succinate or 3-hydroxypropionate to various chemicals.

FIG. 12 provides a schematic of glutamate or itaconic acid conversion to various chemicals.

FIG. 13 depicts the metabolic reactions of a galactose biosynthetic pathway. In metabolite names, -P denotes phosphate. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, alpha-D-glucose-6-phosphate ketol-isomerase (E.C. 5.3.1.9); 2, D-mannose-6-phosphate ketol-isomerase (E.C. 5.3.1.8); 3, D-mannose 6-phosphate 1,6-phosphomutase (E.C. 5.4.2.8); 4, mannose-1-phosphate guanylyltransferase (E.C. 2.7.7.22); 5, GDP-mannose 3,5-epimerase (E.C. 5.1.3.18); 6, galactose-1-phosphate guanylyltransferase (E.C. 2.7.n.n); 7, L-galactose 1-phosphate phosphatase (E.C. 3.1.3.n).

FIG. 14 depicts different fermentation pathways from pyruvate to ethanol. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, pyruvate decarboxylase (E.C. 4.1.1.1); 2, alcohol dehydrogenase (E.C. 1.1.1.1); 3, pyruvate-formate lyase (E.C. 2.3.1.54); 4, acetaldehyde dehydrogenase (E.C. 1.2.1.10); 5, pyruvate synthase (E.C. 1.2.7.1).

FIG. 15 depicts the metabolic reactions of the mevalonate-independent pathway (also known as the non-mevalonate pathway or deoxyxylulose 5-phosphate (DXP) pathway) for production of isopentenyl pyrophosphate (IPP) and its isomer dimethylallyl pyrophosphate (DMAPP). In metabolite names, -P denotes phosphate. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, 1-deoxy-D-xylulose-5-phosphate synthase (E.C. 2.2.1.7); 2, 1-deoxy-D-xylulose-5-phosphate reductoisomerase (E.C. 1.1.1.267); 3, 4-diphosphocytidyl-2C-methyl-D-erythritol synthase (E.C. 2.7.7.60); 4, 4-diphosphocytidyl-2C-methyl-D-erythritol kinase (E.C. 2.7.1.148); 5, 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (E.C. 4.6.1.12); 6, (E)-4-hydroxy-3-methylbut-2-enyl diphosphate synthase (E.C. 1.17.7.1); 7, isopentenyl/dimethylallyl diphosphate synthase or 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (E.C. 1.17.1.2).

FIG. 16 depicts the metabolic reactions of the mevalonate pathway (also known as the HMG-CoA reductase pathway) for production of isopentenyl pyrophosphate (IPP) and its isomer dimethylallyl pyrophosphate (DMAPP). In metabolite names, -P denotes phosphate. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, acetyl-CoA thiolase; 2, HMG-CoA synthase (E.C. 2.3.3.10); 3, HMG-CoA reductase (E.C. 1.1.1.34); 4, mevalonate kinase (E.C. 2.7.1.36); 5, phosphomevalonate kinase (E.C. 2.7.4.2); 6, mevalonate pyrophosphate decarboxylase (E.C. 4.1.1.33); 7, isopentenyl pyrophosphate isomerase (E.C. 5.3.3.2).

FIG. 17 depicts the metabolic reactions of the glycerol/1,3-propanediol biosynthetic pathway for production of glycerol or 1,3-propanediol. In metabolite names, -P denotes phosphate. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, sn-glycerol-3-P dehydrogenase (E.C. 1.1.1.8 or 1.1.1.94); 2, sn-glycerol-3-phosphatase (E.C. 3.1.3.21); 3, sn-glycerol-3-P. glycerol dehydratase (E.C. 4.2.1.30); 4, 1,3-propanediol oxidoreductase (E.C. 1.1.1.202).

FIG. 18 depicts the metabolic reactions of the polyhydroxybutyrate biosynthetic pathway. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, acetyl-CoA:acetyl-CoA C-acetyltransferase (E.C. 2.3.1.9); 2, (R)-3-hydroxyacyl-CoA:NADP+ oxidoreductase (E.C. 1.1.1.36); 3, polyhydroxyalkanoate synthase (E.C. 2.3.1.-).

FIG. 19 depicts the metabolic reactions of one lysine biosynthesis pathway. In metabolite names, -P denotes phosphate. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, aspartate aminotransferase (E.C. 2.6.1.1); 2, aspartate kinase (E.C. 2.7.2.4); 3, aspartate semialdehyde dehydrogenase (E.C. 1.2.1.11); 4, dihydrodipicolinate synthase (E.C. 4.2.1.52); 5, dihydrodipicolinate reductase (E.C. 1.3.1.26); 6, tetrahydrodipicolinate succinylase (E.C. 2.3.1.117); 7, N-succinyldiaminopimelate-aminotransferase (E.C. 2.6.1.17); 8, N-succinyl-L-diaminopimelate desuccinylase (E.C. 3.5.1.18); 9, diaminopimelate epimerase (E.C. 5.1.1.7); 10, diaminopimelate decarboxylase (E.C. 4.1.1.20).

FIG. 20 depicts the metabolic reactions of the γ-valerolactone biosynthetic pathway. Each reaction is numbered. Enzymes catalyzing each reaction are as follows: 1, propionyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-); 2, beta-ketothiolase (E.C. 2.3.1.16); 3, acetoacetyl-CoA reductase (E.C. 1.1.1.36); 4, 3-hydroxybutyryl-CoA dehydratase (E.C. 4.2.1.55); 5, vinylacetyl-CoA Δ-isomerase (E.C. 5.3.3.3); 6, 4-hydroxybutyryl-CoA transferase (E.C. 2.8.3.-); 7, 1,4-lactonase (E.C. 3.1.1.25).

FIG. 21 depicts the spectrophotometric assay results of in vitro formate dehydrogenase (FDH) assays for strains propagating plasmid 2430, plasmid 2429 as well as positive and negative control. The positive control is commercially available purified NAD+-dependent FDH enzyme. The negative control is a strain propagating a plasmid without an FDH-encoding gene. For each strain, assay results are shown for both NADP+ and NAD+ as the cofactor, as indicated. The reduction of either NADP+ or NAD+ is monitored by measuring the absorbance at 340 nm.

FIG. 22 depicts the spectrophotometric assay results of sulfide oxidation assays for strains propagating plasmid 4767, plasmid 4768 and a negative control plasmid (a plasmid without a constitutive promoter upstream of the sqr gene). Depletion of sulfide over time is monitored by measuring the absorbance at 670 nm after treatment of the samples with Cline reagent [Cline, 1969].

FIG. 23 depicts the spectrophotometric assay results of in vitro propionyl-CoA synthase (PCS) assays for a strain propagating plasmid 4986 as well as a negative control plasmid containing no pcs gene. For the strain propagating plasmid 4986, assay results are shown with all required substrates as well as control reactions that omit one of the required substrates, as indicated. The oxidation of NADPH is monitored by measuring the absorbance at 340 nm.

FIG. 24 depicts hydrogenase assay results for strains 242 (at three different dilutions), 312 and 392. Hydrogenase activity is measured by monitoring the reduction of the electron acceptor methyl viologen; hence, the y axis is denoted in μmol of reduced methyl viologen.

FIG. 25 depicts a standard curve correlating the rate of NADH formation by a commercially available formate dehydrogenase as a function of formate concentration in the sample.

FIG. 26 depicts the branched tricarboxylic acid cycle run by E. coli when grown under anaerobic conditions. If the gene encoding isocitrate dehydrogenase (Icd) is rendered non-functional (denoted by Xs), then synthesis of 2-oxoglutarate is restored through introduction of a functional 2-oxoglutarate synthase (OGOR, bold gray arrow). Metabolite names are denoted in bold.

FIG. 27 depicts computed phenotypic phase planes for E. coli strains with the native formate dehydrogenases deleted in either the absence (A and C) or presence (B and D) of an exogenous NAD+-dependent formate dehydrogenase. The growth conditions are aerobic with dual carbon sources of formate and either glucose (A and B) or glycolate (C and D).

FIG. 28 depicts computed phenotypic phase planes during growth on formate as a sole carbon source for wildtype E. coli (FIG. 28A), E. coli with native formate dehydrogenases deleted (FIG. 28B) and E. coli with native formate dehydrogenases deleted and an exogenous NAD+-dependent formate dehydrogenase added (FIG. 28C).

FIG. 29 depicts the required mass transfer coefficient (kLa) and required reactor volume for 0.5 t/d of fuel production, as a function of maximum fuel productivity for isooctanol, assuming fuel production from inorganic energy source H2 and inorganic carbon source CO2 for an ideal engineered chemoautotroph. On the y axis, the typical range of kLa in large-scale stirred-tank bioreactors is denoted (A). On the x axis, the range of reported natural formate uptake rates at industrially relevant culture densities is denoted (B).

DETAILED DESCRIPTION

The present invention relates to developing and using engineered chemoautotrophs capable of utilizing energy from inorganic energy sources and inorganic carbon to produce a desired product. The invention provides for the engineering of a heterotrophic organism, for example, Escherichia coli or other organism suitable for commercial large-scale production of fuels and chemicals, that can efficiently utilize inorganic energy sources and inorganic carbon as a substrate for growth (a chemoautotroph) and for chemical production, which provides cost-advantaged processes for manufacturing of carbon-based products of interest. The organisms can be optimized and tested rapidly and at reasonable costs. The invention further provides for the engineering of an autotrophic organism to include one or more additional or alternative pathways for utilization of inorganic energy sources and inorganic carbon to produce central metabolites for growth and/or other desired products.

Inorganic energy sources together with inorganic carbon represent an alternative feedstock to sugar or light plus carbon dioxide for the production of carbon-based products of interest. There exist non-biological routes to convert inorganic energy sources and inorganic carbon to chemicals and fuels of interest. For example, the Fischer-Tropsch process consumes carbon monoxide and hydrogen gas generated from gasification of coal or biomass to produce methanol or mixed hydrocarbons as fuels [U.S. Pat. No. 1,746,464]. The drawbacks of [...]

[...] neutral energy from solar voltaic, geothermal, wind, nuclear, hydroelectric and more. However, most of these technologies produce electricity and are thus limited in use to the electrical grid [Whipple, 2010]. Furthermore, at least some of these renewable energy sources such as solar and wind suffer from being intermittent and unreliable. The lack of practical, large-scale electricity storage technologies limits how much of the electricity demand can be shifted to renewable sources. The ability to store electrical energy in chemical form, such as in carbon-based products of interest, would both offer a means for large-scale electricity storage and allow renewable electricity to meet energy demand from the transportation sector. Renewable electricity combined with electrolysis, such as the electrochemical production of hydrogen from water [for example, WO/2009/154753, WO/2010/042197, WO/2010/028262 and WO/2011/028264] or formate/formic acid from carbon dioxide [for example, WO/2007/041872], opens the possibility of a sustainable, renewable supply of the inorganic energy source as one aspect of the present invention.

In some embodiments, the invention provides for the use of an inorganic energy source, such as hydrogen sulfide or molecular hydrogen, derived from waste streams. For example, hydrogen sulfide is present in waste streams arising from both hydrodesulfurization processes used during oil recovery and desulfurization of natural gas. Indeed, currently many oil companies stockpile elemental sulfur (the oxidation product of hydrogen sulfide) since worldwide production exceeds demand [Ober, 2010]. As lower quality oil deposits with higher sulfur contents (5% w/w) open up to drilling, the expectation is that global sulfur supply will continue to grow. As a second example, hydrogen and carbon dioxide are off-gas by-products of clostridial acetone-butanol-ethanol fermentations.

In some embodiments, the invention provides for the use of an inorganic carbon source, such as carbon dioxide, derived from waste streams. For example, carbon dioxide is a component of synthesis gas, the major product of gasification of coal, coal oil, natural gas, and of carbonaceous materials such as biomass materials, including agricultural crops and residues, and waste organic matter.
Additional sources include, Fischer-Tropsch processes are: 1) a lack of product selectiv but are not limited to, production of carbon dioxide as a ity, which results in difficulties separating desired products; byproduct in ammonia and hydrogen plants, where methane 2) catalyst sensitivity to poisoning; 3) high energy costs due is converted to carbon dioxide; combustion of wood and fossil to high temperatures and pressures required; and 4) the lim fuels; production of carbon dioxide as a byproduct offermen ited range of products available at commercially competitive 45 tation of Sugar in the brewing of beer, whisky and other costs. Without the advent of carbon sequestration technolo alcoholic beverages, or other fermentative processes; thermal gies that can operate at Scale, the Fischer-Tropsch process is decomposition of limestone, CaCO, in the manufacture of widely considered to be an environmentally costly method for lime, CaO; production of carbon dioxide as byproduct of generating liquid fuels. Alternatively, processes that rely on Sodium phosphate manufacture; and directly from natural naturally occurring microbes that convert synthesis gas or 50 carbon dioxide springs, where it is produced by the action of syngas, a mixture of primarily molecular hydrogen and car acidified water on limestone or dolomite. As a second bon monoxide that can be obtained via gasification of any example, formaldehyde is an oxidation product of methanol organic feedstock, Such as coal, coal oil, natural gas, biomass, or methane. Methanol can be prepared from Synthesis gas or or waste organic matter, to products such as ethanol, acetate, reductive conversion of carbon dioxide and hydrogen by methane, or molecular hydrogen are available Henstra, 55 chemical synthetic processes. Methane is a major component 2007. 
However, these naturally occurring microbes can pro of natural gas and can also be obtained from renewable bio duce only a very restricted set of products, are limited in their a SS. efficiencies, lack established tools for genetic manipulation, In one embodiment, the invention provides for the inor and are sensitive to their end products at high concentrations. ganic energy source and the inorganic carbon coming from Finally, there is some work to introduce syngas utilization 60 the same chemical species, such as formate or formic acid. into industrial microbial hosts U.S. Pat. No. 7,803,589: Formate is oxidized by an energy conversion pathway to however, these processes have yet to be demonstrated at com generate reduced cofactor and carbon dioxide. The carbon mercial scale and are limited to using syngas as the feedstock. dioxide can then be used as the inorganic carbon Source. In some embodiments, the invention provides for the use of The invention provides for the expression of one or more an inorganic energy source, such as molecular hydrogen or 65 exogenous proteins or enzymes in the host cell, thereby con formate, derived from electrolysis. There is tremendous com ferring biosynthetic pathway(s) to utilize inorganic energy mercial activity towards the goal of renewable and/or carbon Sources and inorganic carbon to produce reduced organic US 8,349,587 B2 11 12 compounds. In a preferred embodiment, the present invention encoding an enzyme associated with or catalyzing, or a pro provides for a modular architecture for the metabolism of the tein associated with, the referenced metabolic reaction, reac engineered chemoautotroph comprising the following three tant or product. Unless otherwise expressly stated herein, metabolic modules (FIG. 1). 
those skilled in the art would understand that reference to a In Module 1, one or more energy conversion pathways that reaction also constitutes reference to the reactants and prod use energy from an extracellular inorganic energy ucts of the reaction. Similarly, unless otherwise expressly Source. Such as formate, hydrogen sulfide, molecular stated herein, reference to a reactant or product also refer hydrogen, or ferrous iron, to produce reduced cofactors ences the reaction, and reference to any of these metabolic inside the cell, such as NADH, NADPH, reduced ferre constituents also references the gene or genes encoding the doxin and/or reduced quinones or cytochromes. 10 enzymes that catalyze or proteins involved in the referenced In Module 2, one or more carbon fixation pathways that use reaction, reactant or product. Likewise, given the well-known energy from reduced cofactors to reduce and convert fields of metabolic biochemistry, enzymology and genomics, inorganic carbon, Such as carbon dioxide or formate, to reference herein to a gene or encoding nucleic acid also central metabolites, such as acetyl-coA, pyruvate, gly constitutes a reference to the corresponding encoded enzyme colate, glyoxylate, and dihydroxyacetone phosphate. 15 and the reaction it catalyzes or a protein associated with the Optionally, in Module 3, one or more carbon product bio reaction as well as the reactants and products of the reaction. synthetic pathways that convert central metabolites into desired products, such as carbon-based products of DEFINITIONS interest. A key advantage of a modular architecture for the metabo As used herein, the terms “nucleic acids.” “nucleic acid lism of an engineered chemoautotroph is that each module molecule' and “polynucleotide' may be used interchange may be instantiated via one or more possible biosynthetic ably and include both single-stranded (SS) and double pathways. 
For example, in Module 1, there are several pos stranded (ds) RNA, DNA and RNA:DNA hybrids. As used sible energy conversion pathways. Such as those based on herein the terms “nucleic acid”, “nucleic acid molecule', formate dehydrogenase (e.g., E.C. 1.2.1.2. E.C. 1.2.1.43. 25 "polynucleotide”, “oligonucleotide'. “oligomer' and "oligo' E.C. 1.1.5.6, E.C. 1.2.2.1 or E.C. 1.2.2.3), ferredoxin-depen are used interchangeably and are intended to include, but are dent formate dehydrogenase, hydrogenase (e.g., E.C. not limited to, a polymeric form of nucleotides that may have 1.12.1.2. E.C. 1.12.1.3, or E.C. 1.12.7.2), sulfide-quinone various lengths, including either deoxyribonucleotides or oxidoreductase (e.g., E.C. 1.8.5.4), flavocytochrome c sulfide ribonucleotides, or analogs thereof. For example, oligos may dehydrogenase (e.g., E.C. 1.8.2.3), ferredoxin-NADP+ 30 be from 5 to about 200 nucleotides, from 10 to about 100 reductase (e.g., E.C. 1.18.1.2), ferredoxin-NAD" reductase nucleotides, or from 30 to about 50 nucleotides long. How (e.g., E.C. 1.18.1.3), NAD(P)+ transhydrogenase (e.g., E.C. ever, shorter or longer oligonucleotides may be used. Oligos 1.6.1.1 or E.C. 1.6.1.2), NADH:ubiquinone oxidoreductase I for use in the present invention can be fully designed. A (e.g., E.C. 1.6.5.3). As a second example, in Module 2, there nucleic acid molecule may encode a full-length polypeptide are several possible naturally occurring carbon fixation path 35 or a fragment of any length thereof, or may be non-coding. ways, such as the Calvin-Benson-Bassham cycle or reductive Nucleic acids can refer to naturally-occurring or synthetic pentose phosphate cycle, the reductive tricarboxylic acid polymeric forms of nucleotides. 
The oligos and nucleic acid cycle, the Wood-Ljungdhal or reductive acetyl-coA pathway, molecules of the present invention may be formed from natu the 3-hydroxypropionate bicycle or 3-hydroxypropionate/ rally-occurring nucleotides, for example forming deoxyribo malyl-CoA cycle, 3-hydroxypropionate/4-hydroxybutyrate 40 nucleic acid (DNA) or ribonucleic acid (RNA) molecules. cycle and the dicarboxylate/4-hydroxybutyrate cycle Hi Alternatively, the naturally-occurring oligonucleotides may gler, 2011 as well as many possible synthetic carbon fixation include structural modifications to alter their properties. Such pathways Bar-Even, 2010. As a final example, in Module 3, as in peptide nucleic acids (PNA) or in locked nucleic acids there are numerous possible carbon-based products of inter (LNA). The terms should be understood to include equiva est, each of which has one or more corresponding biosyn 45 lents, analogs of either RNA or DNA made from nucleotide thetic pathways. Every combination of energy conversion analogs and as applicable to the embodiment being described, pathway, carbon fixation pathway and, optionally, carbon single-stranded or double-stranded polynucleotides. Nucle product biosynthetic pathway, when expressed in a het otides useful in the invention include, for example, naturally erotrophic or autotrophic host cell or organism, represents a occurring nucleotides (for example, ribonucleotides or deox different embodiment of the present invention. It should be 50 yribonucleotides), or natural or synthetic modifications of noted, however, that only certain embodiments of Module 1 nucleotides, or artificial bases. Modifications can also include may be paired with a particular embodiment of Module 2. For phosphorothioated bases for increased stability. 
example, the reductive tricarboxylic acid cycle likely requires Nucleic acid sequences that are “complementary are a low potential ferredoxin for particular carbon dioxide fixa those that are capable of base-pairing according to the stan tion steps in the pathway. Thus, the energy conversion path 55 dard Watson-Crick complementarity rules. As used herein, way paired with the reductive tricarboxylic acid cycle must be the term "complementary sequences' means nucleic acid capable of generating reduced low potential ferredoxin, Such sequences that are Substantially complementary, as may be as using a ferredoxin-reducing formate dehydrogenase or a assessed by the nucleotide comparison methods and algo ferredoxin-reducing hydrogenase (E.C. 1.12.7.2). Similarly, rithms set forth below, or as defined as being capable of only certain embodiments of carbon fixation pathways pro 60 hybridizing to the polynucleotides that encode the protein duce the necessary precursors for a particular carbon product Sequences. biosynthetic pathway. For example, fatty acid biosynthetic As used herein, the term “gene’ refers to a nucleic acid that pathways require acetyl-coA and malonyl-coA to be gener contains information necessary for expression of a polypep ated products from the carbon fixation pathway. tide, protein, or untranslated RNA (e.g., rRNA, tRNA, anti The invention is described herein with general reference to 65 sense RNA). When the gene encodes a protein, it includes the the metabolic reaction, reactant or product thereof, or with promoter and the structural gene open reading frame specific reference to one or more nucleic acids or genes sequence (ORF), as well as other sequences involved in US 8,349,587 B2 13 14 expression of the protein. When the gene encodes an untrans In eukaryotes, the genetic element can comprise at least three lated RNA, it includes the promoter and the nucleic acid that modules. 
For example, a genetic module can be a regulator encodes the untranslated RNA. sequence or a promoter, a coding sequence, and a polyade The term “gene of interest’ (GOI) refers to any nucleotide nylation tail or any combination thereof. In addition to the sequence (e.g., RNA or DNA), the manipulation of which promoter and the coding sequences, the nucleic acid sequence may be deemed desirable for any reason (e.g., has the relevant may comprises control modules including, but not limited to activity for a biosynthetic pathway, confer improved qualities a leader, a signal sequence and a transcription terminator. The and/or yields, expression of a protein of interest in a host cell, leader sequence is a non-translated region operably linked to expression of a ribozyme, etc.), by one of ordinary skill in the the 5' terminus of the coding nucleic acid sequence. The art. Such nucleotide sequences include, but are not limited to, 10 signal peptide sequence codes for an amino acid sequence coding sequences of structural genes (e.g., reporter genes, linked to the amino terminus of the polypeptide which directs Selection marker genes, oncogenes, drug resistance genes, the polypeptide into the cell's secretion pathway. growth factors, etc.), and non-coding sequences which do not As generally understood, a codon is a series of three nucle encode an mRNA or protein product (e.g., promoter otides (triplets) that encodes a specific amino acid residue in sequence, polyadenylation sequence, termination sequence, 15 a polypeptide chain or for the termination of translation (stop enhancer sequence, etc.). For example, genes involved in the codons). There are 64 different codons (61 codons encoding cis,cis-muconic acid biosynthesis pathway can be genes of for amino acids plus 3 stop codons) but only 20 different interest. It should be noted that non-coding regions are gen translated amino acids. 
The overabundance in the number of erally untranslated but can be involved in the regulation of codons allows many amino acids to be encoded by more than transcription and/or translation. one codon. Different organisms (and organelles) often show As used herein, the term “genome' refers to the whole particular preferences or biases for one of the several codons hereditary information of an organism that is encoded in the that encode the same amino acid. The relative frequency of DNA (or RNA for certain viral species) including both coding codon usage thus varies depending on the organism and and non-coding sequences. In various embodiments, the term organelle. In some instances, when expressing a heterologous may include the chromosomal DNA of an organism and/or 25 gene in a host organism, it is desirable to modify the gene DNA that is contained in an organelle Such as, for example, sequence So as to adapt to the codons used and codon usage the mitochondria or chloroplasts and/or extrachromosomal frequency in the host. In particular, for reliable expression of plasmid and/or artificial chromosome. A "native gene' or heterologous genes it may be preferred to use codons that "endogenous gene' refers to a gene that is native to the host correlate with the host’s tRNA level, especially the tRNAs cell with its own regulatory sequences whereas an “exog 30 that remain charged during starvation. In addition, codons enous gene' or "heterologous gene' refers to any gene that is having rare cognate tRNA’s may affect protein folding and not a native gene, comprising regulatory and/or coding translation rate, and thus, may also be used. Genes designed sequences that are not native to the host cell. In some embodi in accordance with codon usage bias and relative tRNA abun ments, a heterologous gene may comprise mutated sequences dance of the host are often referred to as being “optimized' or part of regulatory and/or coding sequences. 
In some 35 for codon usage, which has been shown to increase expres embodiments, the regulatory sequences may be heterologous sion level. Optimal codons also help to achieve faster trans or homologous to a gene of interest. A heterologous regula lation rates and high accuracy. In general, codon optimization tory sequence does not function in nature to regulate the same involves silent mutations that do not result in a change to the gene(s) it is regulating in the transformed host cell. "Coding amino acid sequence of a protein. sequence” refers to a DNA sequence coding for a specific 40 Genetic elements or genetic modules may derive from the amino acid sequence. As used herein, “regulatory sequences genome of natural organisms or from Synthetic polynucle refer to nucleotide sequences located upstream (5' non-cod otides or from a combination thereof. In some embodiments, ing sequences), within, or downstream (3' non-coding the genetic elements modules derive from different organ sequences) of a coding sequence, and which influence the isms. Genetic elements or modules useful for the methods transcription, RNA processing or stability, or translation of 45 described herein may be obtained from a variety of sources the associated coding sequence. Regulatory sequences may such as, for example, DNA libraries, BAC (bacterial artificial include promoters, ribosome binding sites, translation leader chromosome) libraries, de novo chemical synthesis, or exci sequences, RNA processing site, effector (e.g., activator, sion and modification of a genomic segment. The sequences repressor) binding sites, stem-loop structures, and so on. obtained from Such sources may then be modified using stan As described herein, a genetic element may be any coding 50 dard molecular biology and/or recombinant DNA technology or non-coding nucleic acid sequence. 
In some embodiments, to produce polynucleotide constructs having desired modifi a genetic element is a nucleic acid that codes for an amino cations for reintroduction into, or construction of a large acid, a peptide or a protein. Genetic elements may be operons, product nucleic acid, including a modified, partially synthetic genes, gene fragments, promoters, exons, introns, regulatory or fully synthetic genome. Exemplary methods for modifica sequences, or any combination thereof. Genetic elements can 55 tion of polynucleotide sequences obtained from a genome or be as short as one or a few codons or may be longer including library include, for example, site directed mutagenesis: PCR functional components (e.g. encoding proteins) and/or regu mutagenesis; inserting, deleting or Swapping portions of a latory components. In some embodiments, a genetic element sequence using restriction enzymes optionally in combina includes an entire open reading frame of a protein, or the tion with ligation; in vitro or in Vivo homologous recombina entire open reading frame and one or more (or all) regulatory 60 tion; and site-specific recombination; or various combina sequences associated therewith. One skilled in the art would tions thereof. In other embodiments, the genetic sequences appreciate that the genetic elements can be viewed as modular useful in accordance with the methods described herein may genetic elements or genetic modules. For example, a genetic be synthetic oligonucleotides or polynucleotides. Synthetic module can comprise a regulatory sequence or a promoter or oligonucleotides or polynucleotides may be produced using a a coding sequence or any combination thereof. In some 65 variety of methods known in the art. embodiments, the genetic element includes at least two dif In some embodiments, genetic elements share less than ferent genetic modules and at least two recombination sites. 
99%, less than 95%, less than 90%, less than 80%, less than US 8,349,587 B2 15 16 70% sequence identity with a native or natural nucleic acid duced or disrupted is to be chosen for construction of the sequences. Identity can each be determined by comparing a non-naturally occurring microorganism. An example of position in each sequence which may be aligned for purposes orthologs exhibiting separable activities is where distinct of comparison. When an equivalent position in the compared activities have been separated into distinct gene products sequences is occupied by the same base or amino acid, then between two or more species or within a single species. A the molecules are identical at that position; when the equiva specific example is the separation of elastase proteolysis and lent site occupied by the same or a similar amino acid residue plasminogen proteolysis, two types of serine protease activ (e.g., similar in steric and/or electronic nature), then the mol ity, into distinct molecules as plasminogen activator and ecules can be referred to as homologous (similar) at that elastase. A second example is the separation of mycoplasma position. Expression as a percentage of homology, similarity, 10 5'-3' exonuclease and Drosophila DNA polymerase III activ or identity refers to a function of the number of identical or ity. The DNA polymerase from the first species can be con similar amino acids at positions shared by the compared sidered an ortholog to either or both of the exonuclease or the sequences. Expression as a percentage of homology, similar polymerase from the second species and vice versa. ity, or identity refers to a function of the number of identical In contrast, as used herein, "paralogs are homologs or similar amino acids at positions shared by the compared 15 related by, for example, duplication followed by evolutionary sequences. 
Various alignment algorithms and/or programs divergence and have similar or common, but not identical may be used, including FASTA, BLAST, or ENTREZFASTA functions. Paralogs can originate orderive from, for example, and BLAST are available as a part of the GCG sequence the same species or from a different species. For example, analysis package (University of Wisconsin, Madison, Wis.), microsomal epoxide hydrolase (epoxide hydrolase I) and and can be used with, e.g., default settings. ENTREZ is avail soluble epoxide hydrolase (epoxide hydrolase II) can be con able through the National Center for Biotechnology Informa sidered paralogs because they represent two distinct tion, National Library of Medicine, National Institutes of enzymes, co-evolved from a common ancestor, that catalyze Health, Bethesda, Md. In one embodiment, the percent iden distinct reactions and have distinct functions in the same tity of two sequences can be determined by the GCG program species. Paralogs are proteins from the same species with with a gap weight of 1, e.g., each amino acid gap is weighted 25 significant sequence similarity to each other Suggesting that as if it were a single amino acid or nucleotide mismatch they are homologous, or related through co-evolution from a between the two sequences. Other techniques for alignment common ancestor. Groups of paralogous protein families are described Doolittle, 1996. Preferably, an alignment pro include HipA homologs, luciferase genes, peptidases, and gram that permits gaps in the sequence is utilized to align the others. sequences. The Smith-Waterman is one type of algorithm that 30 As used herein, a "nonorthologous gene displacement' is a permits gaps in sequence alignments Shpaer, 1997. Also, nonorthologous gene from one species that can Substitute for the GAP program using the Needleman and Wunsch align a referenced gene function in a different species. Substitution ment method can be utilized to align sequences. 
An alterna includes, for example, being able to perform Substantially the tive search strategy uses MPSRCH software, which runs on a same or a similar function in the species of origin compared to MASPAR computer. MPSRCH uses a Smith-Waterman algo 35 the referenced function in the different species. Although rithm to score sequences on a massively parallel computer. generally, a nonorthologous gene displacement may be iden As used herein, an “ortholog” is a gene or genes that are tifiable as structurally related to a known gene encoding the related by vertical descent and are responsible for substan referenced function, less structurally related but functionally tially the same or identical functions in different organisms. similar genes and their corresponding gene products never For example, mouse epoxide hydrolase and human epoxide 40 theless still fall within the meaning of the term as it is used hydrolase can be considered orthologs for the biological herein. Functional similarity requires, for example, at least function of hydrolysis of epoxides. Genes are related by Some structural similarity in the active site or binding region Vertical descent when, for example, they share sequence simi of a nonorthologous gene product compared to a gene encod larity of sufficient amount to indicate they are homologous, or ing the function sought to be substituted. Therefore, a non related by evolution from a common ancestor. Genes can also 45 orthologous gene includes, for example, a paralog oran unre be considered orthologs if they share three-dimensional lated gene. structure but not necessarily sequence similarity, of a Suffi Orthologs, paralogs and nonorthologous gene displace cient amount to indicate that they have evolved from a com ments can be determined by methods well known to those monancestor to the extent that the primary sequence similar skilled in the art. For example, inspection of nucleic acid or ity is not identifiable. 
Genes that are orthologous can encode 50 amino acid sequences for two polypeptides can reveal proteins with sequence similarity of about 25% to 100% sequence identity and similarities between the compared amino acid sequence identity. Genes encoding proteins shar sequences. Based on Such similarities, one skilled in the art ing an amino acid similarity less that 25% can also be con can determine if the similarity is sufficiently high to indicate sidered to have arisen by vertical descent if their three-dimen the proteins are related through evolution from a common sional structure also shows similarities. Members of the 55 ancestor. Algorithms well known to those skilled in the art, serine protease family of enzymes, including tissue plasmi such as Align, BLAST, Clustal W and others compare and nogen activator and elastase, are considered to have arisen by determine a raw sequence similarity or identity, and also Vertical descent from a common ancestor. Orthologs include determine the presence or significance of gaps in the sequence genes or their encoded gene products that through, for which can be assigned a weight or score. Such algorithms also example, evolution, have diverged in structure or overall 60 are known in the art and are similarly applicable for deter activity. For example, where one species encodes a gene mining nucleotide sequence similarity or identity. Parameters product exhibiting two functions and where such functions for sufficient similarity to determine relatedness are com have been separated into distinct genes in a second species, puted based on well known methods for calculating statistical the three genes and their corresponding products are consid similarity, or the chance of finding a similar match in a ran ered to be orthologs. For the production of a biochemical 65 dom polypeptide, and the significance of the match deter product, those skilled in the art would understand that the mined. 
orthologous gene harboring the metabolic activity to be intro
A computer comparison of two or more sequences can, if desired, also be optimized visually by those skilled in the art. Related gene products or proteins can be expected to have a high similarity, for example, 25% to 100% sequence identity. Proteins that are unrelated can have an identity which is essentially the same as would be expected to occur by chance, if a database of sufficient size is scanned (about 5%). Sequences between 5% and 24% may or may not represent sufficient homology to conclude that the compared sequences are related. Additional statistical analysis to determine the significance of such matches given the size of the data set can be carried out to determine the relevance of these sequences.
Exemplary parameters for determining relatedness of two or more sequences using the BLAST algorithm, for example, can be as set forth below. Briefly, amino acid sequence alignments can be performed using BLASTP version 2.0.8 (Jan. 5, 1999) and the following parameters: Matrix: 0 BLOSUM62; gap open: 11; gap extension: 1; x_dropoff: 50; expect: 10.0; wordsize: 3; filter: on. Nucleic acid sequence alignments can be performed using BLASTN version 2.0.6 (Sep. 16, 1998) and the following parameters: Match: 1; mismatch: -2; gap open: 5; gap extension: 2; x_dropoff: 50; expect: 10.0; wordsize: 11; filter: off. Those skilled in the art would know what modifications can be made to the above parameters to either increase or decrease the stringency of the comparison, for example, and determine the relatedness of two or more sequences.
As used herein, the term "homolog" refers to any ortholog, paralog, nonorthologous gene, or similar gene encoding an enzyme catalyzing a similar or substantially similar metabolic reaction, whether from the same or different species.
As used herein, the phrase "homologous recombination" refers to the process in which nucleic acid molecules with similar nucleotide sequences associate and exchange nucleotide strands. A nucleotide sequence of a first nucleic acid molecule that is effective for engaging in homologous recombination at a predefined position of a second nucleic acid molecule can therefore have a nucleotide sequence that facilitates the exchange of nucleotide strands between the first nucleic acid molecule and a defined position of the second nucleic acid molecule. Thus, the first nucleic acid can generally have a nucleotide sequence that is sufficiently complementary to a portion of the second nucleic acid molecule to promote nucleotide base pairing. Homologous recombination requires homologous sequences in the two recombining partner nucleic acids but does not require any specific sequences. Homologous recombination can be used to introduce a heterologous nucleic acid and/or mutations into the host genome. Such systems typically rely on sequence flanking the heterologous nucleic acid to be expressed that has enough homology with a target sequence within the host cell genome that recombination between the vector nucleic acid and the target nucleic acid takes place, causing the delivered nucleic acid to be integrated into the host genome. These systems and the methods necessary to promote homologous recombination are known to those of skill in the art.
It should be appreciated that the nucleic acid sequence of interest or the gene of interest may be derived from the genome of natural organisms. In some embodiments, genes of interest may be excised from the genome of a natural organism or from the host genome, for example E. coli. It has been shown that it is possible to excise large genomic fragments by in vitro enzymatic excision and in vivo excision and amplification. For example, the FLP/FRT site specific recombination system and the Cre/loxP site specific recombination systems have been efficiently used for excising large genomic fragments for the purpose of sequencing (Yoon, 1998). In some embodiments, excision and amplification techniques can be used to facilitate artificial genome or chromosome assembly. Genomic fragments may be excised from the chromosome of a chemoautotrophic organism and altered before being inserted into the host cell artificial genome or chromosome. In some embodiments, the excised genomic fragments can be assembled with engineered promoters and/or other gene expression elements and inserted into the genome of the host cell.
As used herein, the term "polypeptide" refers to a sequence of contiguous amino acids of any length. The terms "peptide," "oligopeptide," "protein" or "enzyme" may be used interchangeably herein with the term "polypeptide". In certain instances, "enzyme" refers to a protein having catalytic activities. As used herein, the terms "protein of interest," "POI," and "desired protein" refer to a polypeptide under study, or whose expression is desired by one practicing the methods disclosed herein. A protein of interest is encoded by its cognate gene of interest (GOI). The identity of a POI can be known or not known. A POI can be a polypeptide encoded by an open reading frame.
A "proteome" is the entire set of proteins expressed by a genome, cell, tissue or organism. More specifically, it is the set of expressed proteins in a given type of cells or an organism at a given time under defined conditions. Transcriptome is the set of all RNA molecules, including mRNA, rRNA, tRNA, and other non-coding RNA produced in one or a population of cells. Metabolome refers to the complete set of small-molecule metabolites (such as metabolic intermediates, hormones and other signaling molecules, and secondary metabolites) to be found within a biological sample, such as a single organism.
The term "fuse," "fused" or "link" refers to the covalent linkage between two polypeptides in a fusion protein. The polypeptides are typically joined via a peptide bond, either directly to each other or via an amino acid linker. Optionally, the peptides can be joined via non-peptide covalent linkages known to those of skill in the art.
As used herein, unless otherwise stated, the term "transcription" refers to the synthesis of RNA from a DNA template; the term "translation" refers to the synthesis of a polypeptide from an mRNA template. Translation in general is regulated by the sequence and structure of the 5' untranslated region (5'-UTR) of the mRNA transcript. One regulatory sequence is the ribosome binding site (RBS), which promotes efficient and accurate translation of mRNA. The prokaryotic RBS is the Shine-Dalgarno sequence, a purine-rich sequence of 5'-UTR that is complementary to the UCCU core sequence of the 3'-end of 16S rRNA (located within the 30S small ribosomal subunit). Various Shine-Dalgarno sequences have been found in prokaryotic mRNAs and generally lie about 10 nucleotides upstream from the AUG start codon. Activity of a RBS can be influenced by the length and nucleotide composition of the spacer separating the RBS and the initiator AUG. In eukaryotes, the Kozak sequence A/GCCACCAUGG, which lies within a short 5' untranslated region, directs translation of mRNA. An mRNA lacking the Kozak consensus sequence may also be translated efficiently in an in vitro system if it possesses a moderately long 5'-UTR that lacks stable secondary structure. While the E. coli ribosome preferentially recognizes the Shine-Dalgarno sequence, eukaryotic ribosomes (such as those found in reticulocyte lysate) can efficiently use either the Shine-Dalgarno or the Kozak ribosomal binding sites.
As used herein, the terms "promoter," "promoter element," or "promoter sequence" refer to a DNA sequence which when ligated to a nucleotide sequence of interest is capable of controlling the transcription of the nucleotide sequence of interest into mRNA. A promoter is typically, though not necessarily, located 5' (i.e., upstream) of a nucleotide sequence of interest whose transcription into mRNA it controls, and provides a site for specific binding by RNA polymerase and other transcription factors for initiation of transcription.
One should appreciate that promoters have modular architecture and that the modular architecture may be altered. Bacterial promoters typically include a core promoter element and additional promoter elements. The core promoter refers to the minimal portion of the promoter required to initiate transcription. A core promoter includes a Transcription Start Site, a binding site for RNA polymerases and general transcription factor binding sites. The "transcription start site" refers to the first nucleotide to be transcribed and is designated +1. Nucleotides downstream of the start site are numbered +1, +2, etc., and nucleotides upstream of the start site are numbered -1, -2, etc. Additional promoter elements are located 5' (i.e., typically 30-250 bp upstream of the start site) of the core promoter and regulate the frequency of the transcription. The proximal promoter elements and the distal promoter elements constitute specific transcription factor sites. In prokaryotes, a core promoter usually includes two consensus sequences, a -10 sequence and a -35 sequence, which are recognized by sigma factors (see, for example, Hawley, 1983). The -10 sequence (10 bp upstream from the first transcribed nucleotide) is typically about 6 nucleotides in length and is typically made up of the nucleotides adenosine and thymidine (also known as the Pribnow box). In some embodiments, the nucleotide sequence of the -10 sequence is 5'-TATAAT or may comprise 3 to 6 base pairs of the consensus sequence. The presence of this box is essential to the start of the transcription. The -35 sequence of a core promoter is typically about 6 nucleotides in length. The nucleotide sequence of the -35 sequence is typically made up of each of the four nucleosides. The presence of this sequence allows a very high transcription rate. In some embodiments, the nucleotide sequence of the -35 sequence is 5'-TTGACA or may comprise 3 to 6 base pairs of the consensus sequence. In some embodiments, the -10 and the -35 sequences are spaced by about 17 nucleotides. Eukaryotic promoters are more diverse than prokaryotic promoters and may be located several kilobases upstream of the transcription starting site. Some eukaryotic promoters contain a TATA box (e.g., containing the consensus sequence TATAAA or part thereof), which is located typically within 40 to 120 bases of the transcriptional start site. One or more upstream activation sequences (UAS), which are recognized by specific binding proteins, can act as activators of the transcription. These UAS sequences are typically found upstream of the transcription initiation site. The distance between the UAS sequences and the TATA box is highly variable and may be up to 1 kb.
As used herein, the term "vector" refers to any genetic element, such as a plasmid, phage, transposon, cosmid, chromosome, artificial chromosome, episome, virus, virion, etc., capable of replication when associated with the proper control elements and which can transfer gene sequences into or between cells. The vector may contain a marker suitable for use in the identification of transformed or transfected cells. For example, markers may provide antibiotic-resistant, fluorescent, enzymatic, as well as other traits. As a second example, markers may complement auxotrophic deficiencies or supply critical nutrients not in the culture media. Types of vectors include cloning and expression vectors. As used herein, the term "cloning vector" refers to a plasmid or phage DNA or other DNA sequence which is able to replicate autonomously in a host cell and which is characterized by one or a small number of restriction endonuclease recognition sites and/or sites for site-specific recombination. A foreign DNA fragment may be spliced into the vector at these sites in order to bring about the replication and cloning of the fragment. The term "expression vector" refers to a vector which is capable of expressing a gene that has been cloned into it. Such expression can occur after transformation into a host cell, or in IVPS systems. The cloned DNA is usually operably linked to one or more regulatory sequences, such as promoters, activator/repressor binding sites, terminators, enhancers and the like. The promoter sequences can be constitutive, inducible and/or repressible.
As used herein, the term "host" refers to any prokaryotic or eukaryotic (e.g., mammalian, insect, yeast, plant, bacterial, archaeal, avian, animal, etc.) cell or organism. The host cell can be a recipient of a replicable expression vector, cloning vector or any heterologous nucleic acid molecule. Host cells may be prokaryotic cells such as M. florum and E. coli, or eukaryotic cells such as yeast, insect, amphibian, or mammalian cells or cell lines. Cell lines refer to specific cells that can grow indefinitely given the appropriate medium and conditions. Cell lines can be mammalian cell lines, insect cell lines or plant cell lines. Exemplary cell lines can include tumor cell lines and stem cell lines. The heterologous nucleic acid molecule may contain, but is not limited to, a sequence of interest, a transcriptional regulatory sequence (such as a promoter, enhancer, repressor, and the like) and/or an origin of replication. As used herein, the terms "host," "host cell," "recombinant host" and "recombinant host cell" may be used interchangeably. For examples of such hosts, see Sambrook, 2001.
One or more nucleic acid sequences can be targeted for delivery to target prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms "transformation" and "transfection" are intended to refer to a variety of art-recognized techniques for introducing an exogenous nucleic acid sequence (e.g., DNA) into a target cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, electroporation, optoporation, injection and the like. Suitable transformation or transfection media include, but are not limited to, water, CaCl2, cationic polymers, lipids, and the like. Suitable materials and methods for transforming or transfecting target cells can be found in Sambrook, 2001, and other laboratory manuals. In certain instances, oligo concentrations of about 0.1 to about 0.5 micromolar (per oligo) can be used for transformation or transfection.
As used herein, the term "marker" or "reporter" refers to a gene or protein that can be attached to a regulatory sequence of another gene or protein of interest, so that upon expression in a host cell or organism, the reporter can confer certain characteristics that can be relatively easily selected, identified and/or measured. Reporter genes are often used as an indication of whether a certain gene has been introduced into or expressed in the host cell or organism. Examples of commonly used reporters include: antibiotic resistance genes, auxotrophic markers, β-galactosidase (encoded by the bacterial gene lacZ), luciferase (from lightning bugs), chloramphenicol acetyltransferase (CAT; from bacteria), GUS (β-glucuronidase; commonly used in plants) and green fluorescent protein (GFP; from jellyfish). Reporters or markers can be selectable or screenable. A selectable marker (e.g., antibiotic resistance gene, auxotrophic marker) is a gene that confers a trait suitable for artificial selection; typically host cells expressing the selectable marker are protected from a selective agent that is toxic or inhibitory to cell growth. A screenable marker (e.g., gfp, lacZ) generally allows researchers to distinguish between wanted cells (expressing the marker) and unwanted cells (not expressing the marker or expressing at insufficient level).
As used herein, the term "chemotroph" or "chemotrophic organism" refers to organisms that obtain energy from the oxidation of electron donors in their environment. As used herein, the term "chemoautotroph" or "chemoautotrophic organism" refers to organisms that produce complex organic compounds from simple inorganic carbon molecules using oxidation of inorganic compounds as an external source of energy. In contrast, "heterotrophs" or "heterotrophic organisms" refers to organisms that must use organic carbon for growth because they cannot convert inorganic carbon into organic carbon. Instead, heterotrophs obtain energy by breaking down the organic molecules they consume. Organisms that can use a mix of different sources of energy and carbon are mixotrophs or mixotrophic organisms which can alternate, e.g., between autotrophy and heterotrophy, between phototrophy and chemotrophy, between lithotrophy and organotrophy, or a combination thereof, depending on environmental conditions.
As used herein, the term "inorganic energy source", "electron donor", "source of reducing power" or "source of reducing equivalents" refers to chemical species, such as formate, formic acid, methane, carbon monoxide, carbonyl sulfide, carbon disulfide, hydrogen sulfide, bisulfide anion, thiosulfate, elemental sulfur, molecular hydrogen, ferrous iron, ammonia, cyanide ion, and/or hydrocyanic acid, with high potential electron(s) that can be donated to another chemical species with a concomitant release of energy (a process by which the electron donor undergoes "oxidation" and the other, recipient chemical species or "electron acceptor" undergoes "reduction"). Inorganic energy sources are generally but not always present external to the cell or biological organism. The term "reducing cofactor" refers to intracellular redox and energy carriers, such as NADH, NADPH, ubiquinol, menaquinol, cytochromes, flavins and/or ferredoxin, that can donate high energy electrons in reduction-oxidation reactions. The terms "reducing cofactor", "reduced cofactor" and "redox cofactor" can be used interchangeably.
As used herein, the term "inorganic carbon" or "inorganic carbon compound" refers to chemical species, such as carbon dioxide, carbon monoxide, formate, formic acid, carbonic acid, bicarbonate, carbon monoxide, carbonyl sulfide, carbon disulfide, cyanide ion and/or hydrocyanic acid, that contain carbon but lack the carbon-carbon bonds characteristic of organic carbon compounds. Inorganic carbon may be present in a gaseous form, such as carbon monoxide or carbon dioxide, or may be present in a liquid form, such as formate.
As used herein, the term "central metabolite" refers to organic carbon compounds, such as acetyl-coA, pyruvate, pyruvic acid, 3-hydroxypropionate, 3-hydroxypropionic acid, glycolate, glycolic acid, glyoxylate, glyoxylic acid, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, malate, malic acid, lactate, lactic acid, acetate, acetic acid, citrate and/or citric acid, that can be converted into carbon based products of interest by a host cell or organism. Central metabolites are generally restricted to those reduced organic compounds from which all or most cell mass components can be derived in a given host cell or organism. In some embodiments, the central metabolite is also the carbon product of interest, in which case no additional chemical conversion is necessary.
Reference to a particular chemical species includes not only that species but also water-solvated forms of the species, unless otherwise stated. For example, carbon dioxide includes not only the gaseous form (CO2) but also water-solvated forms, such as bicarbonate ion.
As used herein, the term "biosynthetic pathway" or "metabolic pathway" refers to a set of anabolic or catabolic biochemical reactions for converting (transmuting) one chemical species into another. Anabolic pathways involve constructing a larger molecule from smaller molecules, a process requiring energy. Catabolic pathways involve breaking down of larger molecules, often releasing energy. As used herein, the term "energy conversion pathway" refers to a metabolic pathway that transfers energy from an inorganic energy source to a reducing cofactor. The term "carbon fixation pathway" refers to a biosynthetic pathway that converts inorganic carbon, such as carbon dioxide, bicarbonate or formate, to reduced organic carbon, such as one or more carbon product precursors. The term "carbon product biosynthetic pathway" refers to a biosynthetic pathway that converts one or more carbon product precursors to one or more carbon based products of interest.
As used herein, the term "engineered chemoautotroph" or "engineered chemoautotrophic organism" refers to organisms that have been genetically engineered to convert inorganic carbon compounds, such as carbon dioxide or formate, to organic carbon compounds using energy derived from inorganic energy sources. The genetic modifications necessary to produce an engineered chemoautotroph comprise the introduction of heterologous energy conversion pathway(s) and/or carbon fixation pathway(s) into the host organism. The host organism can be an originally heterotrophic organism. As used herein, an engineered chemoautotroph need not derive its organic carbon compounds solely from inorganic carbon and need not derive its energy solely from inorganic energy sources. The term engineered chemoautotroph may also be used to refer to originally autotrophic or mixotrophic organisms that have been genetically engineered to include one or more energy conversion, carbon fixation and/or carbon product biosynthetic pathways in addition to or instead of their endogenous autotrophic capability. The term "engineer," "engineering" or "engineered," as used herein, refers to genetic manipulation or modification of biomolecules such as DNA, RNA and/or protein, or like technique commonly known in the biotechnology art.
As used herein, the term "carbon based products of interest" includes alcohols such as ethanol, propanol, isopropanol, butanol, octanol, fatty alcohols, fatty acid esters, wax esters; hydrocarbons and alkanes such as propane, octane, diesel, Jet Propellant 8; polymers such as terephthalate, 1,3-propanediol, 1,4-butanediol, polyols, polyhydroxyalkanoates (PHAs), polyhydroxybutyrates (PHBs), acrylate, adipic acid, epsilon-caprolactone, isoprene, caprolactam, rubber; commodity chemicals such as lactate, docosahexaenoic acid (DHA), 3-hydroxypropionate, γ-valerolactone, lysine, serine, aspartate, aspartic acid, sorbitol, ascorbate, ascorbic acid, isopentanol, lanosterol, omega-3 DHA, lycopene, itaconate, 1,3-butadiene, ethylene, propylene, succinate, citrate, citric acid, glutamate, malate, 3-hydroxypropionic acid (HPA), lactic acid, THF, gamma butyrolactone, pyrrolidones, hydroxybutyrate, glutamic acid, levulinic acid, acrylic acid, malonic acid; specialty chemicals such as carotenoids, isoprenoids, itaconic acid; biological sugars such as glucose, fructose, lactose, sucrose, starch, cellulose, hemicellulose, glycogen, xylose, dextrose, galactose, uronic acid, maltose, polyketides, or glycerol; central metabolites, such as acetyl-coA, pyruvate, pyruvic acid, 3-hydroxypropionate, 3-hydroxypropionic acid, glycolate, glycolic acid, glyoxylate, glyoxylic acid, dihydroxyacetone phosphate, glyceraldehyde-3-phosphate, malate, malic acid, lactate, lactic acid, acetate, acetic acid, citrate and/or citric acid, from which other carbon products can be made; pharmaceuticals and pharmaceutical intermediates such as 7-aminodesacetoxycephalosporonic acid, cephalosporin, erythromycin, polyketides, statins, paclitaxel, docetaxel, terpenes, peptides, steroids, omega fatty acids and other such suitable products of interest. Such products are useful in the context of biofuels, industrial and specialty chemicals, as intermediates used to make additional products, such as nutritional supplements, nutraceuticals, polymers, paraffin replacements, personal care products and pharmaceuticals.
As used herein, the term "hydrocarbon" refers to a chemical compound that consists of the elements carbon, hydrogen and optionally, oxygen. "Surfactants" are substances capable of reducing the surface tension of a liquid in which they are dissolved. They are typically composed of a water-soluble head and a hydrocarbon chain or tail. The water soluble group is hydrophilic and can either be ionic or nonionic, and the hydrocarbon chain is hydrophobic. The term "biofuel" is any fuel that derives from a biological source.
The accession numbers provided throughout this description are derived from the NCBI database (National Center for Biotechnology Information) maintained by the National Institutes of Health, USA. The accession numbers are provided in the database on Aug. 1, 2011. The Enzyme Classification Numbers (E.C.) provided throughout this description are derived from the KEGG Ligand database, maintained by the Kyoto Encyclopedia of Genes and Genomes, sponsored in part by the University of Tokyo. The E.C. numbers are provided in the database on Aug. 1, 2011.
Other terms used in the fields of recombinant nucleic acid technology, microbiology, metabolic engineering, and molecular and cell biology as used herein will be generally understood by one of ordinary skill in the applicable arts.
Electrolytic/Electrochemical Production of Hydrogen and Formate
Hydrogen gas and formate can be produced via the electrolysis of H2O and the electrochemical conversion of CO2, respectively (Whipple, 2010). Each has advantages and disadvantages as inorganic energy sources for the engineered chemoautotroph of the present invention.
Hydrogen gas mixtures with air are explosive across a wide range of hydrogen compositions. Hence, use of hydrogen gas as an inorganic energy source and oxygen gas as the terminal electron acceptor of an engineered chemoautotroph must necessarily be set up to cope with the resulting safety risk. To address this challenge, the reactor or fermentation conditions may be kept substantially anaerobic and alternative electron acceptors, such as nitrate, may be used.
Hydrogen is a gas with low water solubility, which creates mass transfer limitations when using hydrogen as an inorganic energy source for engineered chemoautotrophs (biological systems are aqueous). At large reactor or fermentor scales, achieving high rates of mass transfer from the gas to liquid phases is challenging (Example 11). There are new technologies being developed to address this issue (U.S. Pat. No. 7,923,227). Formate, due to its higher solubility in H2O, does not have this problem (Example 11).
The energy efficiency of electrolysis for production of hydrogen or electrochemical conversion of carbon dioxide impacts the overall energy efficiency of a bio-manufacturing process using an engineered chemoautotroph of the present invention. Electrolyzers achieve overall energy efficiencies of 56-73% at current densities of 110-300 mA/cm2 (alkaline electrolyzers) or 800-1600 mA/cm2 (PEM electrolyzers) (Whipple, 2010). In contrast, electrochemical systems to date have achieved moderate energy efficiencies or high current densities but not at the same time. Hence, additional technology improvements are needed for electrochemical production of formate.
Organisms or Host Cells for Engineering
The host cell or organism, as disclosed herein, may be chosen from eukaryotic or prokaryotic systems, such as bacterial cells (Gram-negative or Gram-positive), archaea, yeast cells (for example, Saccharomyces cerevisiae or Pichia pastoris), animal cells and cell lines (such as Chinese hamster ovary (CHO) cells), plant cells and cell lines (such as Arabidopsis T87 cells and Tobacco BY-2 cells), and/or insect cells and cell lines. Suitable cells and cell lines can also include those commonly used in laboratories and/or industrial applications. In some embodiments, host cells/organisms can be selected from Escherichia coli, Gluconobacter oxydans, Gluconobacter asaii, Achromobacter delmarvae, Achromobacter viscosus, Achromobacter lacticum, Agrobacterium tumefaciens, Agrobacterium radiobacter, Alcaligenes faecalis, Arthrobacter citreus, Arthrobacter tumescens, Arthrobacter paraffineus, Arthrobacter hydrocarboglutamicus, Arthrobacter oxydans, Aureobacterium saperdae, Azotobacter indicus, Brevibacterium ammoniagenes, Brevibacterium divaricatum, Brevibacterium lactofermentum, Brevibacterium flavum, Brevibacterium globosum, Brevibacterium fuscum, Brevibacterium ketoglutamicum, Brevibacterium helicolum, Brevibacterium pusillum, Brevibacterium testaceum, Brevibacterium roseum, Brevibacterium immariophilium, Brevibacterium linens, Brevibacterium protopharmiae, Corynebacterium acetophilum, Corynebacterium glutamicum, Corynebacterium callunae, Corynebacterium acetoacidophilum, Corynebacterium acetoglutamicum, Enterobacter aerogenes, Erwinia amylovora, Erwinia carotovora, Erwinia herbicola, Erwinia chrysanthemi, Flavobacterium peregrinum, Flavobacterium fucatum, Flavobacterium aurantinum, Flavobacterium rhenanum, Flavobacterium sewanense, Flavobacterium breve, Flavobacterium meningosepticum, Mesoplasma florum, Micrococcus sp. CCM825, Morganella morganii, Nocardia opaca, Nocardia rugosa, Planococcus eucinatus, Proteus rettgeri, Propionibacterium shermanii, Pseudomonas synxantha, Pseudomonas azotoformans, Pseudomonas fluorescens, Pseudomonas ovalis, Pseudomonas stutzeri, Pseudomonas acidovolans, Pseudomonas mucidolens, Pseudomonas testosteroni, Pseudomonas aeruginosa, Rhodococcus erythropolis, Rhodococcus rhodochrous, Rhodococcus sp. ATCC 15592, Rhodococcus sp. ATCC 19070, Sporosarcina ureae, Staphylococcus aureus, Vibrio metschnikovii, Vibrio tyrogenes, Actinomadura madurae, Actinomyces violaceochromogenes, Kitasatosporia parulosa, Streptomyces coelicolor, Streptomyces flavelus, Streptomyces griseolus, Streptomyces lividans, Streptomyces olivaceus, Streptomyces tanashiensis, Streptomyces virginiae, Streptomyces antibioticus, Streptomyces cacaoi, Streptomyces lavendulae, Streptomyces viridochromogenes, Aeromonas salmonicida, Bacillus subtilis, Bacillus pumilus, Bacillus circulans, Bacillus thiaminolyticus, Escherichia freundii, Microbacterium ammoniaphilum, Serratia marcescens, Salmonella enterica, Salmonella typhimurium, Salmonella schottmulleri, Xanthomonas citri, Saccharomyces spp. (e.g., Saccharomyces cerevisiae, Saccharomyces bayanus, Saccharomyces boulardii, Schizosaccharomyces pombe), Arabidopsis thaliana, Nicotiana tabacum, CHO cells, 3T3 cells, COS-7 cells, DuCaP cells, HeLa cells, LNCaP cells, THP1 cells, 293-T cells, Baby Hamster Kidney (BHK) cells, HKB cells, hybridoma cells, as well as bacteriophage, baculovirus, adenovirus, or any modifications and/or derivatives thereof. In certain embodiments, the genetically modified host cell is a Mesoplasma florum, E. coli, yeast, archaea, mammalian cells and cell lines, green plant cells and cell lines, or algae.
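As an illustration of the bacterial core promoter described in the definition of "promoter" earlier in this description (a -35 sequence of 5'-TTGACA and a -10 sequence of 5'-TATAAT spaced by about 17 nucleotides), the following is a minimal sketch of how a host or donor sequence might be scanned for candidate core promoters. It is not part of the disclosed methods. The function names, the mismatch tolerance and the 15 to 19 nucleotide spacer window are assumptions chosen for illustration only.

```python
# Consensus hexamers quoted in the definition of "promoter".
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_core_promoters(seq, max_mismatch=1, spacers=range(15, 20)):
    """Return (index of -35 box, spacer length, -35 box, -10 box) tuples.

    max_mismatch and spacers are illustrative assumptions; the text only
    states that the two boxes are spaced by about 17 nucleotides.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        box35 = seq[i:i + 6]
        if mismatches(box35, MINUS_35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer          # start of the candidate -10 box
            box10 = seq[j:j + 6]
            if len(box10) == 6 and mismatches(box10, MINUS_10) <= max_mismatch:
                hits.append((i, spacer, box35, box10))
    return hits


# Hypothetical test sequence: both consensus boxes separated by 17 nt.
example = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GGGGGG"
print(find_core_promoters(example))
```

A scan of this kind only flags candidates; whether a candidate actually drives transcription in a given host would still have to be determined experimentally.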
Non-limiting examples of algae that can be used in this aspect of the invention include: Botryococcus braunii, Neochloris oleoabundans, Scenedesmus dimorphus, Euglena gracilis, Nannochloropsis salina, Dunaliella tertiolecta, Tetraselmis chui, Isochrysis galbana, Phaeodactylum tricornutum, Pleurochrysis carterae, Prymnesium parvum, Tetraselmis suecica, or Spirulina species. Those skilled in the art would understand that the genetic modifications, including metabolic alterations exemplified herein, are described with reference to a suitable host organism such as E. coli and their corresponding metabolic reactions or a suitable source organism for desired nucleic acids such as genes for a desired metabolic pathway. However, given the complete genome sequencing of a wide variety of organisms and the high level of skill in the area of genomics, those skilled in the art would readily be able to apply the teachings and guidance provided herein to essentially all other host cells and organisms. For example, the E. coli metabolic modifications exemplified herein can readily be applied to other species by incorporating the same or analogous encoding nucleic acid from species other than the referenced species. Such genetic modifications include, for example, genetic alterations of species homologs, in general, and in particular, orthologs, paralogs or nonorthologous gene displacements.
In certain embodiments, the host cell or organism is a microorganism which includes prokaryotic and eukaryotic microbial species from the Domains Archaea, Bacteria and Eucarya, the latter including yeast and filamentous fungi, protozoa, algae, or higher Protista. The terms "microbial organisms", "microbial cells" and "microbes" are used interchangeably with the term microorganism.
In certain embodiments, host microbial organisms can be selected from, and the engineered microbial organisms generated in, for example, bacteria, yeast, fungus or any of a variety of other microorganisms applicable to fermentation processes. Exemplary bacteria include species selected from Escherichia coli, Klebsiella oxytoca, Anaerobiospirillum succiniciproducens, Acetobacter aceti, Actinobacillus succinogenes, Mannheimia succiniciproducens, Mesoplasma florum, Rhizobium etli, Bacillus subtilis, Corynebacterium glutamicum, Gluconobacter oxydans, Zymomonas mobilis, Lactococcus lactis, Lactobacillus plantarum, Cupriavidus necator (formerly Ralstonia eutropha), Streptomyces coelicolor, Clostridium ljungdahlii, Clostridium thermocellum, Clostridium acetobutylicum, Pseudomonas fluorescens, and Pseudomonas putida. Exemplary yeasts or fungi include species selected from Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces lactis, Kluyveromyces marxianus, Aspergillus terreus, Aspergillus niger, Penicillium chrysogenum and Pichia pastoris. E. coli is a particularly useful host organism since it is a well characterized microbial organism suitable for genetic engineering. Other particularly useful host organisms include yeast such as Saccharomyces cerevisiae.
In various aspects of the invention, the cells are genetically engineered or metabolically evolved, for example, for the purposes of optimized energy conversion and/or carbon fixation. The terms "metabolically evolved" or "metabolic evolution" relate to growth-based selection (metabolic evolution) of host cells that demonstrate improved growth (cell yield). Yet other suitable organisms include synthetic cells or cells produced by synthetic genomes (US Patent Publication Number 2007/0264688) and cell-like systems or synthetic cells (US Patent Publication Number 2007/0269862).
Exemplary genomes and nucleic acids include full and partial genomes of a number of organisms for which genome sequences are publicly available and can be used with the disclosed methods, such as, but not limited to, Aeropyrum pernix, Agrobacterium tumefaciens, Anabaena, Anopheles gambiae, Apis mellifera, Aquifex aeolicus, Arabidopsis thaliana, Archaeoglobus fulgidus, Ashbya gossypii, Bacillus anthracis, Bacillus cereus, Bacillus halodurans, Bacillus licheniformis, Bacillus subtilis, Bacteroides fragilis, Bacteroides thetaiotaomicron, Bartonella henselae, Bartonella quintana, Bdellovibrio bacteriovorus, Bifidobacterium longum, Blochmannia floridanus, Bordetella bronchiseptica, Bordetella parapertussis, Bordetella pertussis, Borrelia burgdorferi, Bradyrhizobium japonicum, Brucella melitensis, Brucella suis, Buchnera aphidicola, Burkholderia mallei, Burkholderia pseudomallei, Caenorhabditis briggsae, Caenorhabditis elegans, Campylobacter jejuni, Candida glabrata, Canis familiaris, Caulobacter crescentus, Chlamydia muridarum, Chlamydia trachomatis, Chlamydophila caviae, Chlamydophila pneumoniae, Chlorobium tepidum, Chromobacterium violaceum, Ciona intestinalis, Clostridium acetobutylicum, Clostridium perfringens, Clostridium tetani, Corynebacterium diphtheriae, Corynebacterium efficiens, Coxiella burnetii, Cryptosporidium hominis, Cryptosporidium parvum, Cyanidioschyzon merolae, Debaryomyces hansenii, Deinococcus radiodurans, Desulfotalea psychrophila, Desulfovibrio vulgaris, Drosophila melanogaster, Encephalitozoon cuniculi, Enterococcus faecalis, Erwinia carotovora, Escherichia coli, Fusobacterium nucleatum, Gallus gallus, Geobacter sulfurreducens, Gloeobacter violaceus, Guillardia theta, Haemophilus ducreyi, Haemophilus influenzae, Halobacterium, Helicobacter hepaticus, Helicobacter pylori, Homo sapiens, Kluyveromyces waltii, Lactobacillus johnsonii, Lactobacillus plantarum, Legionella pneumophila, Leifsonia xyli, Lactococcus lactis, Leptospira interrogans, Listeria innocua, Listeria monocytogenes, Magnaporthe grisea, Mannheimia succiniciproducens, Mycoplasma florum, Mesorhizobium loti, Methanobacterium thermoautotrophicum, Methanococcoides burtonii, Methanococcus jannaschii, Methanococcus maripaludis, Methanogenium frigidum, Methanopyrus kandleri, Methanosarcina acetivorans, Methanosarcina mazei, Methylococcus capsulatus, Mus musculus, Mycobacterium bovis, Mycobacterium leprae, Mycobacterium paratuberculosis, Mycobacterium tuberculosis, Mycoplasma gallisepticum, Mycoplasma genitalium, Mycoplasma mycoides, Mycoplasma penetrans, Mycoplasma pneumoniae, Mycoplasma pulmonis, Mycoplasma mobile, Nanoarchaeum equitans, Neisseria meningitidis, Neurospora crassa, Nitrosomonas europaea, Nocardia farcinica, Oceanobacillus iheyensis, Onion yellows phytoplasma, Oryza sativa, Pan troglodytes, Pasteurella multocida, Phanerochaete chrysosporium, Photorhabdus luminescens, Picrophilus torridus, Plasmodium falciparum, Plasmodium yoelii yoelii, Populus trichocarpa, Porphyromonas gingivalis, Prochlorococcus marinus, Propionibacterium acnes, Protochlamydia amoebophila, Pseudomonas aeruginosa, Pseudomonas putida, Pseudomonas syringae, Pyrobaculum aerophilum, Pyrococcus abyssi, Pyrococcus furiosus, Pyrococcus horikoshii, Pyrolobus fumarii, Ralstonia solanacearum, Rattus norvegicus, Rhodopirellula baltica, Rhodopseudomonas palustris, Rickettsia conorii, Rickettsia typhi, Rickettsia prowazekii, Rickettsia sibirica, Saccharomyces cerevisiae, Saccharomyces bayanus, Saccharomyces boulardii, Saccharopolyspora erythraea, Schizosaccharomyces pombe, Salmonella enterica, Salmonella typhimurium, Schizosaccharomyces pombe, Shewanella oneidensis, Shigella flexneri, Sinorhizobium meliloti, Staphylococcus aureus, Staphylococcus epidermidis, Streptococcus agalactiae, Streptococcus mutans, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus thermophilus, Streptomyces avermitilis, Streptomyces coelicolor, Sulfolobus solfataricus, Sulfolobus tokodaii, Synechococcus, Synechococcus elongatus, Synechocystis, Takifugu rubripes, Tetraodon nigroviridis, Thalassiosira pseudonana, Thermoanaerobacter tengcongensis, Thermoplasma acidophilum, Thermoplasma volcanium, Thermosynechococcus elongatus, Thermotoga maritima, Thermus thermophilus, Treponema denticola, Treponema pallidum, Tropheryma whipplei,
carbon product biosynthetic activity for one or more genes in related or distant species, including for example, homologs, orthologs, paralogs and nonorthologous gene displacements of known genes, and the replacement of gene homolog either within a particular engineered chemoautotroph or between different host cells for the engineered chemoautotroph is routine and well known in the art. Accordingly, the metabolic modifications enabling chemoautotrophic growth and production of carbon-based products described herein with reference to a particular organism such as E. coli can be readily
Ureaplasma urealyticum, Vibrio applied to other microorganisms, including prokaryotic and cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, eukaryotic organisms alike. Given the teachings and guidance Wigglesworthia glossinidia, Wolbachia pipientis, Wolinella provided herein, those skilled in the art would know that a succinogenes, Xanthomonas axonopodis, Xanthomonas metabolic modification exemplified in one organism can be campestris, Xvlella fastidiosa, Yarrowia lipolytica, Yersinia 15 applied equally to other organisms. pseudotuberculosis; and Yersinia pestis nucleic acids. In some instances, such as when an alternative energy In certain embodiments, sources of encoding nucleic acids conversion, carbon fixation or carbon product biosynthetic for enzymes for an energy conversion pathway, carbon fixa pathway exists in an unrelated species, chemoautotrophic tion pathway or carbon product biosynthetic pathway can growth and production of carbon-based products can be con include, for example, any species where the encoded gene ferred onto the host species by, for example, exogenous product is capable of catalyzing the referenced reaction. expression of a paralog or paralogs from the unrelated species Exemplary species for Such sources include, for example, that catalyzes a similar, yet non-identical metabolic reaction Aeropyrum permix, Aquifex aeolicus, Aquifex pyrophilus, to replace the referenced reaction. Because certain differ Candidatus Arcobacter sulfidicus, Candidatus Endorifia ences among metabolic networks exist between different persephone, Candidatus Nitrospira defluvii; Chlorobium 25 organisms, those skilled in the art would understand that the limicola, Chlorobium tepidum, Clostridium pasteurianum, actual gene usage between different organisms may differ. 
Desulfobacter hydrogenophilus, Desulfurobacterium therm However, given the teachings and guidance provided herein, olithotrophum, Geobacter metallireducens, those skilled in the art also would understand that the teach sp. NRC-1; Hydrogenimonas thermophila, Hydrogenivirga ings and methods of the invention can be applied to all micro strain 128-5-R1; Hydrogenobacter thermophilus, Hydro 30 bial organisms using the cognate metabolic modifications to genobaculum sp.YO4AAS1; Lebetimonas acidiphila Pd55"; those exemplified herein to construct a microbial organism in Leptospirillum ferriphilum, Leptospirillum ferrodiazotro a species of interest that would produce carbon-based prod phum, Leptospirillum rubarum, Magnetococcus marinus, ucts of interest from inorganic energy and inorganic carbon. Magnetospirillum magneticum, Mycobacterium bovis, It should be noted that various engineered Strains and/or Mycobacterium tuberculosis, Methylobacterium nodulans, 35 mutations of the organisms or cell lines discussed herein can Nautilia lithotrophica, Nautilia profindicola, Nautilia sp. also be used. strain AmN; Nitratifractorsalsuginis, Nitratiruptorsp. strain Methods for Identification and Selection of Candidate SB155-2: Persephonella marina, Rimcaris exoculata epi Enzymes for a Metabolic Activity of Interest symbiont, Streptomyces avermitilis, Streptomyces coeli In one aspect, the present invention provides a method for color, Sulfolobus avermitilis, Sulfolobus solfataricus, Sul 40 identifying candidate proteins or enzymes of interest capable folobus tokodai Sulfurihydrogenibium azorense, of performing a desired metabolic activity. Leveraging the Sulfurihydrogenibium sp. 
YO3AOP1; Sulfurihydrogenibium exponential growth of gene and genome sequence databases yellowstonense, Sulfurihydrogenibium subterraneum, Sulfil and the availability of commercial gene synthesis at reason rimonas autotrophica, Sulfurimonas denitrificans, Sulfuri able cost, Bayer and colleagues adopted a synthetic metage monas paralvinella, Sulfurovum lithotrophicum, Sulfurovum 45 nomics approach to bioinformatically search sequence data sp. strain NBC37-1; Thermocrinis ruber. Thermovibrio bases for homologous or similar enzymes, computationally ammonificans. Thermovibrio ruber. Thioreductor micati optimize their encoding gene sequences for heterologous soli; Nostoc sp. PCC 7120; Acidithiobacillus ferrooxidans, expression, synthesize the designed gene sequence, clone the Allochromatium vinosum, Aphanothece halophytica, Oscil synthetic gene into an expression vector and screen the result latoria limnetica, Rhodobacter capsulatus. Thiobacillus 50 ing enzyme for a desired function in E. coli or yeast Bayer, denitrificans, Cupriavidus necator (formerly Ralstonia 2009. However, depending on the metabolic activity or pro eutropha), Methanosarcina barkeri Methanosarcia mazei; tein of interest, there can be thousands of putative homologs Methanococcus maripaludis, Mycobacterium Smegmatis, in the publicly available sequence databases. Thus, it can be Burkholderia stabilis, Candida boidinii, Candida methylica, experimentally challenging or in Some cases infeasible to Pseudomonas sp. 101; Methylcoccus capsulatus, Mycobac 55 synthesize and Screen all possible homologs at reasonable terium gastri, Cenarchaeum symbiosum, Chloroflexus cost and within a reasonable timeframe. To address this chal aurantiacus, Erythrobacter sp. 
NAP1; Metallosphaera lenge, in one aspect, this invention provides an alternate sedula; gamma proteobacterium NOR51-B; marine gamma method for identifying and selecting candidate protein proteobacterium HTCC2080; Nitrosopumilus maritimus, sequences for a metabolic activity of interest. The method Roseiflexus castenholzii. Synechococcus elongatus; and the 60 comprises the following steps. First, for a desired metabolic like, as well as other exemplary species disclosed herein or activity, such as an enzyme-catalyzed step in an energy con available as source organisms for corresponding genes. How version, carbon fixation or carbon product biosynthetic path ever, with the complete genome sequence publicly available way, one or more enzymes of interest are identified. Typically, for now more than 4400 species (including viruses), includ the enzyme(s) of interest have been previously experimen ing 1701 microbial genomes and a variety of yeast, fungi, 65 tally validated to perform the desired activity, for example in plant, and mammalian genomes, the identification of genes the published scientific literature. In some embodiments, one encoding the requisite energy conversion, carbon fixation or or more of the enzymes of interest has been heterologously US 8,349,587 B2 29 30 expressed and experimentally demonstrated to be functional. described here, whose gene products catalyze a similar or Second, a bioinformatic search is performed on protein clas substantially similar metabolic reaction. Such modifications sification or grouping databases, such as Clusters of Ortholo can be done, for example, to increase flux through a metabolic gous Groups (COGs) Tatusov, 1997; Tatusov, 2003. 
Entrez pathway (for example, flux of energy or carbon), to reduce Protein Clusters (ProtClustDB) Klimke, 2009 and/or Inter 5 accumulation of toxic intermediates, to improve the kinetic Pro Zdobnov, 2001, to identify protein groupings that con properties of the pathway, and/or to otherwise optimize the tain one or more of the enzyme(s) of interest (or closely engineered chemoautotroph. Indeed, gene homologs for a related enzymes). If the enzyme(s) of interest contain mul particular metabolic activity may be preferable when confer tiple subunits, then the protein corresponding to a single ring chemoautrotrophic capability on a different host cell or Subunit, for example the catalytic Subunit or the largest Sub 10 organism. unit, is selected as being representative of the enzyme(s) of interest for the purposes of bioinformatic analysis. Third, a Methods for Design of Nucleic Acids Encoding Enzymes for systematic, expert-guided search is then performed to iden Heterologous Expression tify which database groupings are likely to contain a majority In one aspect, the present invention provides a computer of members whose metabolic activity is the same or similar as 15 program product for designing a nucleic acid that encodes a the protein(s) of interest. Fourth, the list of NCBI Protein protein or enzyme of interest that is codon optimized for the accession numbers corresponding to every members of each host cell or organism (the target species). The program can selected database grouping is then compiled and the corre reside on a hardware computer readable storage medium and sponding protein sequences are downloaded from the having a plurality of instructions which, when executed by a sequence databases. Protein sequences available from processor, cause the processor to perform operations. The Sources other than the public sequence databases may be program comprises the following operations. At each amino added to this set. 
Fifth, optionally, one or more outgroup acid position of the protein of interest, the codon is selected in protein sequences are identified and added to the set. Out which the rank order codon usage frequency of that codon in group proteins are proteins which may share some functional, the target species is the same as the rank order codon usage structural, or sequence similarities to the model enzyme(s) 25 frequency of the codon that occurs at that position in the but lack an essential feature of the enzyme(s) of interest or Source species gene. To select the desired codon at each desired metabolic activity. For example, the enzyme flavocy amino acid position, both the genetic code (the mapping of tochrome c (E.C. 1.8.2.3) is similar to sulfide-quinone oxi codons to amino acids Jukes, 1993) and codon frequency doreductase (E.C. 1.8.5.4) in that it oxidizes hydrogen sulfide table (the frequency with which each synonymous codon but it reduces cytochrome c instead of ubiquinone and thus 30 occurs in a genome or genome Grantham, 1980) for both the offers a useful outgroup during bioinformatic analysis of Source and target species are needed. For source species for sulfide-quinone oxidoreductases. Sixth, the complete set of which a complete genome sequence is available, the usage protein sequences are aligned with an sequence alignment frequency for each codon may be calculate simply by Sum program capable of aligning large numbers of sequences, ming the number of instances of that codon in all annotated such as MUSCLE Edgar, 2004a: Edgar, 2004b). Seventh, a 35 coding sequences, dividing by the total number of codons in tree is drawn based on the resulting MUSCLE alignment via that genome, and then multiplying by 1000. For source spe methods known to those skilled in the art, such as neighbor cies for which no complete genome is available, the usage joining Saitou, 1987 or UPGMA Sokal, 1958; Murtagh, frequency can be computed based on any available coding 1984. 
Eighth, different clades are selected from the tree so sequences or by using the codon frequency table of a closely that the number of clades equals the desired number of pro 40 related organism. The program then preferably standardizes teins for Screening. Finally, one protein from each clade is the start codonto ATG, the stop codonto TAA, and the second selected for gene synthesis and functional Screening based on and second last codons to one of twenty possible codons (one the following heuristics per amino acid). The program then Subjects the codon opti Preference is given to proteins that have been heterolo mized nucleic acid sequence to a series of checks to improve gously expressed and experimentally demonstrated to 45 the likelihood that the sequence can be synthesized via com have the desired metabolic activity. mercial gene synthesis and Subsequently manipulated via Preference is given to proteins that have been biochemi molecular biology Sambrook, 2001 and DNA assembly cally characterized to have the desired metabolic activity methods Knight, 2003: Knight, 2007: WO/2010/070295). previously. These checks comprise identifying if key restriction enzyme Preference is given to proteins from Source organisms for 50 recognition sites used in a DNA assembly standard or DNA which there is strong experimental or genomic evidence assembly method are present; if hairpins whose GC content that the organism has the desired metabolic activity. exceeds a threshold percentage. Such as 60%, and whose Preference is given to proteins in which the key catalytic, length exceeds a threshold number of base pairs, such as 10, binding and/or other signature residues are conserved are present; if sequence repeats are present; if any Subse with respect to the protein(s) of interest. 
55 quence between 100 and 150 nucleotides in length exceeds a Preference is given to protein from source organisms threshold GC content, such as 65%; if G or Chomopolymers whose optimal growth temperature is similar to that of greater than 5 nucleotides in length are present; and, option the host cell or organism. For example, if the host cell is ally, if any sequence motifs are present that might give rise to a mesophile, then the source organism is also a meso spurious transposon insertion sites, transcriptional or transla phile. 60 tional initiation or termination, mRNA secondary structure, Therefore, in constructing the engineered chemoautotroph RNase cleavage, and/or transcription factor binding. If the of the invention, those skilled in the art would understand that codon optimized nucleic acid sequence fails any of these by applying the teaching and guidance provided herein, it is checks, the program then iterates through all possible Syn possible to replace or augment particular genes within a meta onymous mutations and designs a new nucleic acid sequence bolic pathway, such as an energy conversion pathway, a car 65 that both passes all checks and minimizes the difference in bon fixation pathway, and/or a carbon product biosynthetic codon frequencies between the original and new nucleic acid pathway, with homologs identified using the methods Sequence. US 8,349,587 B2 31 32 Various implementations of the systems and techniques Program 210 may be a computer program capable of per described here can be realized in digital electronic circuitry, forming the processes and functions described above. Pro integrated circuitry, specially designed ASICs (application gram 210 may include various instructions and Subroutines, specific integrated circuits), computer hardware, firmware, which, when loaded into memory 206 and executed by pro software, and/or combinations thereof. 
These various imple cessor 204 cause processor 204 to perform various opera mentations can include one or more computer programs that tions, some or all of which may effectuate the methods, pro are executable and/or interpretable on a programmable sys cesses, and/or functions associated with the presently tem including at least one programmable processor, which disclosed embodiments. may be special or general purpose, coupled to receive data Although not shown, computer processing device 200 may 10 include various forms of input and output. The I/O may and instructions from, and to transmit data and instructions to, include network adapters, USB adapters, Bluetooth radios, a storage system, at least one input device, and at least one mice, keyboards, touchpads, displays, touch screens, LEDs, output device. Such computer programs (also known as pro vibration devices, speakers, microphones, sensors, or any grams, Software, software applications or code) may include other input or output device for use with computer processing machine instructions for a programmable processor, and may 15 device 200. be implemented in any form of programming language, Methods for Expression of Heterologous Enzymes including high-level procedural and/or object-oriented pro Composite nucleic acids can be constructed to include one gramming languages, and/or in assembly/machine lan or more energy conversion, carbon fixation and optionally guages. A computer program may be deployed in any form, carbon product biosynthetic pathway encoding nucleic acids including as a stand-alone program, or as a module, compo as exemplified herein. The composite nucleic acids can Sub nent, Subroutine, or other unit Suitable for use in a computing sequently be transformed or transfected into a suitable host environment. A computer program may be deployed to be organism for expression of one or more proteins of interest. 
executed or interpreted on one computer or on multiple com Composite nucleic acids can be constructed by operably link puters at one site, or distributed across multiple sites and ing nucleic acids encoding one or more standardized genetic interconnected by a communication network. 25 parts with protein(s) of interest encoding nucleic acids that A computer program may, in an embodiment, be stored on have also been standardized. Standardized genetic parts are a computer readable storage medium. A computer readable nucleic acid sequences that have been refined to conform to storage medium stores computer data, which data can include one or more defined technical standards, such as an assembly computer program code that is executed and/or interpreted by standard Knight, 2003: Shetty, 2008: Shetty, 2011). Stan a computer system or processor. By way of example, and not 30 dardized genetic parts can encode transcriptional initiation limitation, a computer readable medium may comprise com elements, transcriptional termination elements, translational puter readable storage media, for tangible or fixed storage of initiation elements, translational termination elements, pro data, or communication media for transient interpretation of tein affinity tags, protein degradation tags, protein localiza code-containing signals. Computer readable storage media, tion tags, selectable markers, replication elements, recombi may refer to physical or tangible storage (as opposed to sig 35 nation sites for integration onto the genome, and more. 
nals) and may include without limitation volatile and non Standardized genetic parts have the advantage that their func volatile, removable and non-removable media implemented tion can be independently validated and characterized Kelly, in any method or technology for the tangible storage of infor 2009 and then readily combined with other standardized mation Such as computer-readable instructions, data struc parts to produce functional nucleic acids Canton, 2008. By tures, program modules or other data. Computer readable 40 mixing and matching standardized genetic parts encoding storage media includes, but is not limited to, RAM, ROM, different expression control elements with nucleic acids EPROM, EEPROM, flash memory or other solid state encoding proteins of interest, transforming the resulting memory technology, CD-ROM, DVD, or other optical stor nucleic acid into a Suitable host cell and functionally screen age, magnetic cassettes, magnetic tape, magnetic disk storage ing the resulting engineered cell, the process of both achiev or other magnetic storage devices, or any other physical or 45 ing Soluble expression of proteins of interest and validating material medium which can be used to tangibly store the the function of those proteins is made dramatically faster. For desired information or data or instructions and which can be example, the set of standardized parts might comprise con accessed by a computer or processor. stitutive promoters of varying strengths Davis, 2011, ribo FIG. 2 shows a block diagram of a generic processing Some binding sites of varying strengths Anderson, 2007 and architecture, which may execute software applications and 50 protein degradation of tags of varying strengths Andersen, processes. Computer processing device 200 may be coupled 1998). to display 202 for graphical output. Processor 204 may be a For exogenous expression in E. coli or other prokaryotic computer processor capable of executing software. 
Typical cells, Some nucleic acids encoding proteins of interest can be examples of processor 204 are general-purpose computer modified to introduce solubility tags onto the protein of inter processors (such as Intel(R) or AMDR) processors), ASICs, 55 est to ensure soluble expression of the protein of interest. For microprocessors, any other type of processor, or the like. example, addition of the maltose binding protein to a protein Processor 204 may be coupled to memory 206, which may be of interest has been shown to enhance soluble expression in E. a volatile memory (e.g. RAM) storage medium for storing coli Sachdev, 1998; Kapust, 1999: Sachdev, 2000). Either instructions and/or data while processor 204 executes. Pro alternatively or in addition, chaperone proteins, such as cessor 204 may also be coupled to storage device 208, which 60 DnaK, DnaJ. GroES and GroEL may be either co-expressed may be a non-volatile storage medium such as a hard drive, or overexpressed with the proteins of interest, Such as FLASH drive, tape drive, DVDROM, or similar device. Pro RuBisCO Greene, 2007, to promote correct folding and gram 210 may be a computer program containing instructions assembly Martinez-Alonso, 2009; Martinez-Alonso, 2010. and/or data, and may be stored on storage device 208 and/or For exogenous expression in E. coli or other prokaryotic in memory 206, for example. In a typical scenario, processor 65 cells, some nucleic acid sequences in the genes or cDNAs of 204 may load some or all of the instructions and/or data of eukaryotic nucleic acids can encode targeting signals such as program 210 into memory 206 for execution. an N-terminal mitochondrial or other targeting signal, which US 8,349,587 B2 33 34 can be removed before transformation into prokaryotic host carbon fixation pathways use NADPH as the redox cofactor cells, if desired. 
For example, removal of a mitochondrial rather than NADH, such as the reductive pentose phosphate leader sequence led to increased expression in E. coli pathway and several variants of the 3-hydroxypropionate Hoffmeister, 2005. For exogenous expression in yeast or cycle. Accordingly, in certain aspects of the invention, the other eukaryotic cells, genes can be expressed in the cytosol 5 engineered chemoautotroph expresses a Burkholderia Stabi without the addition of leader sequence, or can be targeted to lis NADP-dependent formate dehydrogenase (E.C. 1.2.1.43, mitochondrion or other organelles, or targeted for secretion, ACF35003) or a homolog thereof. The homologs can be by the addition of a Suitable targeting sequence Such as a selected by any suitable methods known in the art or by the mitochondrial targeting or secretion signal Suitable for the methods described herein. This enzyme has been previously host cells. Thus, it is understood that appropriate modifica 10 shown to preferentially use NADP' as a cofactor Hatrongjit, tions to a nucleic acid sequence to remove or include a tar 2010). SEQID NO:1 represents the E. coli codon optimized geting sequence can be incorporated into an exogenous coding sequence for the fah gene of the present invention. In nucleic acid sequence to impart desirable properties. one aspect, the invention provides a nucleic acid molecule Energy Conversion from Inorganic Energy Sources to and homologs, variants and derivatives of SEQID NO:1. The Reduced Cofactors 15 nucleic acid sequence can be preferably 78%, 79%. 80%, In certain aspects, the engineered chemoautotroph of the 81-85%, 90-95%,96-98%, 99%, 99.9% or even higher iden present invention comprises one or more energy conversion tity to SEQ ID NO:1. 
The present invention also provides pathways to convert energy from one or more inorganic nucleic acids comprising or consisting of a sequence which is energy sources, such as formate, formic acid, carbon monox a codon optimized version of the wild-type fah gene. In ide, methane, molecular hydrogen, hydrogen Sulfide, bisul another embodiment, the invention provides a nucleic acid fide anion, thiosulfate, elemental Sulfur, ferrous iron, and/or encoding a polypeptide having the amino acid sequence of ammonia, to one or more reduced cofactors, such as NADH, Genbank accession ACF35003. Alternatively, enzymes that NADPH, reduced ferredoxins, quinols, reduced flavins, and naturally use NAD" can be engineered using established pro reduced cytochromes. An energy conversion pathway com tein engineering techniques to require NADP instead of prises the following enzymes (only some of which may be 25 NAD" Serov, 2002; Gul-Karaguler, 2001). exogenous depending on the host organism). Together, the In another embodiment, the formate dehydrogenase enzymes confer an energy conversion capability on the host reduces NAD". For example, formate dehydrogenase (E.C. cell or organism that the natural organism lacks. 1.2.1.2) can couple the oxidation of formate to carbon dioxide one or more redox enzymes to oxidize the inorganic energy with the reduction of NAD" to NADH. Exemplary FDH Source and transfer the electrons to a reducing cofactor 30 enzymes include Genbank accession numbers CAA57036, optionally, one or more proteins that serve as a reducing AAC49766 and NP 015033 or homologs thereof. SEQ ID cofactor and/or enzymes that can alter intracellular NO:2, SEQ ID NO:3 and SEQ ID NO:4 represent E. coli pools of reducing cofactors codon optimized coding sequence for each of these three optionally, one or more oxidoreductases or transhydroge FDHs, respectively, of the present invention. 
In one aspect, nases that can transfer electrons from high to lower 35 the invention provides nucleic acid molecules and homologs, energy redox cofactors (or between redox cofactors with variants and derivatives of SEQID NO:2, SEQID NO:3 and similar redox potentials) SEQID NO:4. The nucleic acid sequences can be preferably optionally, one or more transporters or channels to facili 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or tate uptake of extracellular inorganic energy sources by even higheridentity to SEQID NO:2, SEQID NO:3 and SEQ the engineered chemoautotroph. 40 ID NO:4. The present invention also provides nucleic acids In certain embodiments, the nucleic acids encoding the each comprising or consisting of a sequence which is a codon proteins and enzymes of a energy conversion pathway are optimized version of one of the wild-type fah genes. In introduced into a host cellor organism that does not naturally another embodiment, the invention provides nucleic acids contain all the energy conversion pathway enzymes. A par each encoding a polypeptide having the amino acid sequence ticularly useful organism for genetically engineering energy 45 of one of Genbank accession numbers CAA57036, conversion pathways is E. coli, which is well characterized in AAC49766 and NP O15033. terms of available genetic manipulation tools as well as fer In certain embodiments, the invention provides an engi mentation conditions. Following the teaching and guidance neered chemoautotrophthat can utilize formate and/or formic provided herein for introducing a sufficient number of encod acid as an inorganic energy source and produce reduced, low ing nucleic acids to generate a particular energy conversion 50 potential ferredoxin as the reducing cofactor. 
The reductive pathway, those skilled in the art would understand that the tricarboxylic acid cycle carbon fixation pathway is believed same engineering design also can be performed with respect to require a low potential ferredoxin for particular carboxy to introducing at least the nucleic acids encoding the energy lation steps Brugna-Guiral, 2003, Yoon, 1997: Ikeda, 2005. conversion pathway enzymes or proteins absent in the host The organisms Nautilia sp. Strain AmN, Nautilia profindi organism. Therefore, the introduction of one or more encod 55 cola, Nautilia lithotrophica 525 and Thermocrinis ruber are ing nucleic acids into the host organisms of the invention Such reported to grow on formate as the sole electron donorand use that the modified organism contains an energy conversion the reductive tricarboxylic acid cycle as their carbon fixation pathway can confer the ability to use inorganic energy to pathway Campbell, 2001; Smith, 2008; Campbell, 2009: make reducing cofactors, provided the modified organism has Miroshnichenko, 2002; Higler, 2007, thus implying that a suitable inorganic energy source. 60 each of these organisms have an energy conversion pathway In certain embodiments, the invention provides an engi from formate to reduced ferredoxin. To engineer a host cell neered chemoautotrophthat can utilize formate and/or formic for the utilization of formate and/or formic acid as the inor acid as an inorganic energy source. To engineer a host cell for ganic energy source and production of reduced ferredoxin as the utilization of formate and/or formic acid as the inorganic the reducing cofactor, in certain embodiments the present energy source, one or more formate dehydrogenases (FDH) 65 invention provides for the expression of formate dehydroge can be expressed. In a preferred embodiment, the formate nase capable of reducing low potential ferredoxin in the engi dehydrogenase reduces NADP". Some naturally occurring neered chemoautotroph. 
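By way of illustration only, the codon usage calculation and rank order codon selection described under Methods for Design of Nucleic Acids Encoding Enzymes for Heterologous Expression, which underlie codon optimized coding sequences such as SEQ ID NO:1 through SEQ ID NO:4, might be sketched as follows. This is a minimal sketch under stated assumptions and not the program of the invention: the function names, the alphabetical tie-breaking rule and the toy sequences are hypothetical, only the standard genetic code is considered, and the start and stop codon standardization and the synthesis checks described above are omitted.

```python
from collections import Counter, defaultdict

# Standard genetic code, enumerated in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def split_codons(cds):
    """Split an in-frame coding sequence into upper-case codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def usage_per_thousand(coding_sequences):
    """Instances of each codon / total codons * 1000, over all sequences."""
    counts = Counter()
    for cds in coding_sequences:
        counts.update(split_codons(cds))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons supplied")
    return {codon: 1000.0 * counts[codon] / total for codon in GENETIC_CODE}


def ranked_synonymous_codons(frequencies):
    """For each amino acid, list its codons from most to least used.

    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    by_amino_acid = defaultdict(list)
    for codon, amino_acid in GENETIC_CODE.items():
        by_amino_acid[amino_acid].append(codon)
    return {
        amino_acid: sorted(codons, key=lambda c: (-frequencies[c], c))
        for amino_acid, codons in by_amino_acid.items()
    }


def rank_matched_sequence(source_cds, source_freqs, target_freqs):
    """Replace each codon by the target-species codon of equal usage rank."""
    source_ranks = ranked_synonymous_codons(source_freqs)
    target_ranks = ranked_synonymous_codons(target_freqs)
    designed = []
    for codon in split_codons(source_cds):
        amino_acid = GENETIC_CODE[codon]
        rank = source_ranks[amino_acid].index(codon)
        designed.append(target_ranks[amino_acid][rank])
    return "".join(designed)


if __name__ == "__main__":
    # Toy coding sequences standing in for annotated genomes (hypothetical data).
    source_genes = ["ATGGCTGCTGCCTAA"]
    target_genes = ["ATGGCGGCGGCGGCATAA"]
    source_freqs = usage_per_thousand(source_genes)
    target_freqs = usage_per_thousand(target_genes)
    print(rank_matched_sequence(source_genes[0], source_freqs, target_freqs))
```

In this sketch the most used alanine codon of the toy source species is mapped to the most used alanine codon of the toy target species, the second most used to the second most used, and so on, so the encoded amino acid sequence is unchanged. In practice the frequency tables would be computed from all annotated coding sequences of each genome, or taken from a closely related organism, as described above.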
Such an enzyme would facilitate the US 8,349,587 B2 35 36 combination of an energy conversion pathway that utilizes particular the peptidase that cleaves the C-terminal end of the formate with a carbon fixation pathway based on the reduc large Subunit, tend to be very specific for their cognate hydro tive tricarboxylic acid cycle as an embodiment of the engi genase and can not be substituted by homologous hydroge neered chemoautotroph of the present invention. Exemplary nase maturation factors endogenous to the host cell. Hence, putative ferredoxin-dependent formate dehydrogenases functional heterologous expression of a NiFel-hydrogenase include (with Genbank accession numbers of the FDH sub requires expression of not only the Subunit proteins, such as units listed in parentheses) Nautilia profindicola AmH(YP the large and Small Subunit, but also one or more of the 0.02607699, YP 002607700, YP 002607701 and associated maturation factors, such as the peptidase. In a YP 002607702), Sulfurimonas denitrificans DSM 1251 preferred embodiment, the hydrogenase reduces ferredoxin (YP 394.410 and YP 394411), Caminibacter mediatlanti 10 (E.C. 1.12.7.2) and in particular a low potential ferredoxin cus TB-2 (ZP 01871216, ZP 01871217, ZP 01871218 capable of being used as the reducing cofactor for the car and ZP 01871219) and Methanococcus maripaludis strain boxylation steps of the reductive tricarboxylic acid cycle S2 (NP 988417 and NP 988418) or homologs thereof. In Yoon, 1997: Ikeda, 2005. The group 2a NiFel-hydrogena another embodiment, the invention provides nucleic acids ses are associated with reducing the ferredoxin needed for the each encoding a polypeptide having the amino acid sequence 15 reductive tricarboxylic acid cycle Brugna-Guiral, 2003; Vig of one of Genbank accession numbers YP 0026.07699, nais, 2007. 
Exemplary hydrogenases include (with Genbank YP 002607700, YP 002607701, YP 002607702, accession numbers of the hydrogenase subunits listed in YP 394.410, YP 394411, ZP 01871216, ZP 01871217, parentheses) Aquifex aeolicus Hydrogenase 3 (NP 213549 ZP 01871218, ZP 01871219, NP 988417 and and NP 213548); Hydrogenobacter thermophilus TK-6 NP 988418. Hup2 (YP 003432664 and YP 003432663); Hydrogeno A ferredoxin-reducing formate dehydrogenase (FDH) has baculum Sp. YO4AAS1 HYO44AAS1 14007 been previously purified from Clostridium pasteurianum W5 HYO44AAS1 1399 (YP 002122063 and Liu, 1984; however, no protein or nucleic acid sequence YP 002122062); Magnetococcus marinus Mmc1 2493/ information is available on the enzyme nor is there a publicly Mmc1 2494 (YP 866399 and YP 866400); Magnetospir available genome sequence for Clostridium pasteurianum as 25 illum magneticum AMB-1 amb1114/amb1115 (YP 420477 of Aug. 1, 2011. Based on the sequencing and bioinformatic and YP 420478); Methanococcus maripaludis S2 Hydroge analysis of the Clostridium pasteurianum genome, the nase B (NP 988273 and NP 988742); Methanosarcina sequence of a two putative subunits of a ferredoxin-depen barkeri str. fitsaro Ech (YP 303717, YP 303716, dent FDH (FdhF and FdhD) as well as two associated putative YP 303715, YP 303714, YP 303713 and YP 303712); ferredoxin -containing proteins were identified (EX 30 Methanosarcina mazei Gol Ech (NP 634344, NP 634345, ample 7). In one aspect, the invention provides nucleic acids NP 634346, NP 634347, NP 634348 and NP 634349); each encoding a polypeptide having the amino acid sequence Mycobacterium smegmatis str. MC2 155 Hydrogenase-2 of one of SEQID NO:5, SEQ ID NO:6, SEQ ID NO:7 and (YP 886615 and YP 88.6614), Nautilia profindicola AmH SEQ ID NO:8. In another embodiment, the invention pro NAMH 0573/NAMH 0572 (YP 002606989 and vides nucleic acids each encoding a polypeptide having the 35 YP 002606988), Nitratiruptor sp. 
SB155-2 Hup amino acid sequence of one of SEQID NO:5, SEQID NO:6, (YP 001356429 and YP 001356428); Persephonella SEQ ID NO:7 and SEQ ID NO:8 which have been codon marina EX-H1 PERMA 0914/PERMA 0915 optimized for the host organism, such as E. coli. Based on the (YP 002730701 and YP 002730702); Sulfurihydro Clostridium pasteurianum putative FDH subunits, additional genibium azorense AZ-Fu1 SULAZ 0749/SULAZ 0748 putative ferredoxin-dependent FDH were identified. Exem 40 (YP 002728734 and YP 002728733); Sulfurimonas deni plary ferredoxin-dependent FDH include (with Genbank trificans DSM 1251 Suden 1437/Suden 1436 accession numbers of the FDH subunits listed in parentheses) (YP 393949 and YP 393948); Sulfurovum sp NBC37-1 Clostridium beijerincki NCIMB 8052 (YP 001310874 and Hup (YP 001358971 and YP 001358972); Thermocrinis YP 001310871), Clostridium difficile 630 (YP 001089834 albus DSM 14484 Thal 1414/Thal 1413 and YP 001089833), Clostridium difficile CD196 45 (YP 003474170 and YP 0.03474,169); and homologs (YP 003216147 and YP 003216146), Clostridium difficile thereof. In an alternate embodiment, the hydrogenase reduces R20291 (YP 003219654 andYP 003219653) or homologs NADP" (E.C. 1.12.1.3). The group 3b and 3d NiFel-hydro thereof. In another embodiment, the invention provides genases are typically NAD(P) reducing hydrogenases from nucleic acids each encoding a polypeptide having the amino bacteria Vignais, 2007. Exemplary hydrogenases include acid sequence of one of Genbank accession numbers 50 (with Genbank accession numbers of the hydrogenase Sub YP 001310874, YP 001310871, YP 001089834, units listed in parentheses) Cupriavidus necator SH (NP YP 001089833, YP 003216147, YP 003216146, 942732, NP 942730, NP 942729, NP 942728 and YP 003219654 and YP 003219653. 
NP 942727) and Synechocystis sp PCC6803 bidirectional In certain embodiments, the invention provides an engi hydrogenase (NP 441418, NP 441417, NP 441415, neered chemoautotroph that can utilize molecular hydrogen 55 NP 441414 and NP 441411), and homologs thereof. In an as an inorganic energy source. To engineer a host cell for the alternate embodiment, the hydrogenase reduces NAD" (E.C. utilization of molecular hydrogen as an inorganic energy 1.12.1.2). Exemplary hydrogenases include (with the Gen Source, one or more hydrogenases can be expressed. For bank accession numbers of the hydrogenase Subunits listed in example, NiFel-hydrogenases are typically associated with parentheses) Cupriavidus necator SH without the HoxI sub the coupling of hydrogen oxidation to cofactor reduction 60 unit (NP 942730, NP 942729, NP 942728 and Vignais, 2004. These hydrogenases tend to be composed of NP 942727) and homologs thereof Burgdorf, 2005). at least a large and Small subunit and require several access In certain embodiments, the invention provides an engi sory genes for maturation including a peptidase Vignais, neered chemoautotrophthat can utilize hydrogen Sulfide as an 2004. Recently, there have been several published examples inorganic energy source. To engineer a host cell for the utili of heterologous expression of NiFel-hydrogenases in E. coli 65 Zation of hydrogen sulfide as the inorganic energy source, one Sun, 2010; Wells, 2011; Kim, 2011. Taken together, these or more sulfide-quinone oxidoreductases (SQR) can be results demonstrate that particular maturation proteins, in expressed. Sulfide-quinone oxidoreductase couples the oxi US 8,349,587 B2 37 38 dation of hydrogen Sulfide to the reduction of a quinone to the tives of SEQID NO:21 and SEQID NO:23. The nucleic acid corresponding quinol (E.C. 1.8.5.4). 
The Rhodobacter cap sequences can be preferably 78%, 79%, 80%, 81-85%, sulatus SQR has been functionally expressed in the heterolo 90-95%,96-98%,99%,99.9% oreven higher identity to SEQ gous host E. coli Schütz, 1997 and demonstrated to reduce ID NO:21 and SEQ ID NO:23. The present invention also ubiquinone Shibata, 2001. Exemplary SQR enzymes provides nucleic acids each comprising or consisting of a include NP 214500, NP 488552, NP 661023, sequence which is a codon optimized version of one of these YP 002426210, YP 003444098, YP 003576957, two wild-type ferredoxin genes. In another embodiment, the YP 315983, YP 866354, and homologs thereof. SEQ ID invention provides nucleic acids each encoding a polypeptide NO:9, SEQID NO:10, SEQID NO:11, SEQID NO:12, SEQ having the amino acid sequence of one of SEQID NO:22 and ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID 10 SEQID NO:24. NO:16 represent E. coli codon optimized coding sequence for In certain embodiments, the invention provides an engi each of these eight SQRs, respectively, of the present inven neered chemoautotroph that can transfer energy from one tion. In one aspect, the invention provides nucleic acid mol reduced cofactor to another. In one embodiment, a ferre ecules and homologs, variants and derivatives of SEQ ID doxin-NADP" reductase (FNR) is expressed. FNR can cata NO:9, SEQID NO:10, SEQID NO:11, SEQID NO:12, SEQ 15 lyze reversible electron transfer between the two-electron ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID carrier NADPH and the one-electron carrier ferredoxin (E.C NO:16. The nucleic acid sequences can be preferably 78%, 1.18.1.2). Exemplary FNR enzymes include the Hydrogeno 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even bacter thermophilus Fpr (Genbank accession BAH29712) higher identity to SEQ ID NO:9, SEQ ID NO:10, SEQ ID and homologs thereof Ikeda, 2009. In another embodiment, NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, a ferredoxin-NAD" reductase (E.C. 
1.18.1.3) and/or a NAD SEQ ID NO:15 and SEQ ID NO:16. The present invention (P) transhydrogenase (E.C. 1.6.1.1 or E.C. 1.6.1.2) is provides nucleic acids each comprising or consisting of a expressed. sequence which is a codon optimized version of one of the Carbon Fixation of Inorganic Carbon to Central Metabolites wild-type Sqr genes. In another embodiment, the invention In certain aspects, the engineered chemoautotroph of the provides nucleic acids each encoding a polypeptide having 25 present invention comprises one or more carbon fixation the amino acid sequence of one of Genbank accession num pathways to use energy from one or more reduced cofactors, bers NP 214500, NP 488552, NP 661023, such as NADH, NADPH, reduced ferredoxins, quinols, YP 002426210, YP 003444098, YP 003576957, reduced flavins, and reduced cytochromes, to convert inor YP 315983, YP 866354, and homologs thereof. Alterna ganic carbon, such as carbon dioxide, formate, or formic acid, tively, to engineer a host cell for the utilization of hydrogen 30 into central metabolites, such as acetyl-coA, pyruvate, gly Sulfide, one or more flavocytochrome c sulfide dehydrogena oxylate, glycolate and dihydroxyacetone phosphate. One or ses can be expressed. Flavocytochrome c sulfide dehydroge more of the carbon fixation pathways can be derived from nase is similar in structure to SQR but couples the oxidation naturally occurring carbon fixation pathways, such as the of hydrogen sulfide to the reduction of a cytochrome (E.C. Calvin-Benson-Bassham cycle or reductive pentose phos 1.8.2.3) Marcia, 2010). 35 phate cycle, the reductive tricarboxylic acid cycle, the Wood In certain embodiments, the invention provides an engi Ljungdhal or reductive acetyl-coA pathway, the 3-hydrox neered chemoautotrophthat expresses a protein that can serve ypropionate bicycle, 3-hydroxypropionate/4- as a reducing cofactor, Such as preferably ferredoxin or alter hydroxybutyrate cycle and the dicarboxylate/4- natively cytochrome c. 
In one embodiment, the ferredoxin is hydroxybutyrate cycle Higler, 2011. Alternatively, one or a low potential ferredoxin that can donate electrons to the 40 more of the carbon fixation pathways can be derived from carboxylation steps in the reductive tricarboxylic acid cycle synthetic metabolic pathways not found in nature, Such as Yoon, 1997: Ikeda, 2005. Exemplary ferredoxins include those enumerated by Bar-Even et al. Bar-Even, 2010. In AAA83524, YP 003433536, YP 003433535, certain embodiments, the nucleic acids encoding the proteins YP 304316, and homologs thereof SEQID NO:17, SEQID and enzymes of a carbon fixation pathway are introduced into NO:18, SEQ ID NO:19, SEQ ID NO:20 represent E. coli 45 a host cell or organism that does not naturally contain all the codon optimized coding sequence for each of these four ferre carbon fixation pathway enzymes. A particularly useful doxins, respectively, of the present invention. In one aspect, organism for genetically engineering carbon fixation path the invention provides nucleic acid molecules and homologs, ways is E. coli, which is well characterized in terms of avail variants and derivatives of SEQID NO:17, SEQID NO:18, able genetic manipulation tools as well as fermentation con SEQID NO:19, SEQID NO:20. The nucleic acid sequences 50 ditions. Following the teaching and guidance provided herein can be preferably 78%, 79%. 80%, 81-85%, 90-95%, for introducing a sufficient number of encoding nucleic acids 96-98%, 99%, 99.9% or even higher identity to SEQ ID to generate a particular carbon fixation pathway, those skilled NO:17, SEQIDNO:18, SEQID NO:19, SEQID NO:20. The in the art would understand that the same engineering design present invention also provides nucleic acids each comprising also can be performed with respect to introducing at least the or consisting of a sequence which is a codon optimized ver 55 nucleic acids encoding the carbon fixation pathway enzymes sion of one of the wild-type ferredoxin genes. 
In another or proteins absent in the host organism. Therefore, the intro embodiment, the invention provides nucleic acids each duction of one or more encoding nucleic acids into the host encoding a polypeptide having the amino acid sequence of organisms of the invention Such that the modified organism one of Genbank accession numbers AAA83524, contains a carbon fixation pathway can confer the ability to YP 003433536, YP 003433535 and YP 304316. Two 60 use inorganic carbon to make central metabolites, provided additional exemplary ferredoxins for which no Genbank the modified organism has a suitable inorganic energy source accession number has been assigned include SEQID NO:22 and energy conversion pathway. and SEQ ID NO:24. SEQ ID NO:21 and SEQ ID NO:23 In certain embodiments, the invention provides an engi represent E. coli codon optimized coding sequence for each neered chemoautotroph with a carbon fixation pathway of these two unannotated ferredoxins, respectively, of the 65 derived from the reductive tricarboxylic acid (rTCA) cycle. present invention. In one aspect, the invention provides The rTCA cycle is well known in the art and consists of nucleic acid molecules and homologs, variants and deriva approximately 11 reactions (FIG.3) Evans, 1966; Buchanan, US 8,349,587 B2 39 40 1990. For two of the reactions (reaction 1 and 7), there are YP 392.613; YP 001357517, YP 001357518; two known routes between the substrate and product and each YP 001357515 and YP 001357515; YP 001357066, route is catalyzed by different enzyme(s). The reactions in the YP 001357065, YP 001357068 and YP 001357067; and rTCA cycle are catalyzed by the following enzymes: ATP homologs thereof. ATP citrate lyases comprise 1-4 protein citrate lyase (E.C. 2.3.3.8) Sintsov, 1980: Kanao, 2002b: Subunits depending on the species. Exemplary ATP citrate citryl-CoA synthetase (E.C. 6.2.1.18) Aoshima, 2004a): cit lyases include AAC06486; YP 393085 and YP 393084; ryl-CoA lyase (E.C. 
4.1.3.34) Aoshima, 2004b; malate BAFT1501 and BAFT1502; BAF69766 and BAF69767; dehydrogenase (E.C. 1.1.1.37); fumarate dehydratase or ACX98447: AAM72322 and AAM72321; YP 002607124 fumarase (E.C. 4.2.1.2); fumarate reductase (E.C. 1.3.99.1); and YP 002607125; BAB21376 and BAB21375; and Succinyl-CoA synthetase (E.C. 6.2.1.5), 2-oxoglutarate Syn 10 homologs thereof. Exemplary citryl-coA synthetases include thase or 2-oxoglutarate:ferredoxin oxidoreductase (E.C. BAD17846 and BAD 17844. Exemplary citryl-coA lyases 1.2.7.3) Gehring, 1972; Yamamoto, 2010; isocitrate dehy include BAD 17841. drogenase (E.C. 1.1.1.41 or E.C. 1.1.1.42) Kanao, 2002a: In certain embodiments, the invention provides an engi 2-oxoglutarate carboxylase (E.C. 6.4.1.7) Aoshima, 2004c: neered chemoautotroph with a carbon fixation pathway Aoshima, 2006; oxalosuccinate reductase (E.C. 1.1.1.41) 15 derived from the 3-hydroxypropionate (3-HPA) bicycle. The Aoshima, 2004c: Aoshima, 2006; aconitrate hydratase 3-HPA bicycle is well known in the art and consists of 19 (E.C. 4.2.1.3); pyruvate synthase or pyruvate:ferredoxin oxi reactions catalyzed by 13 enzymes (FIG. 5) Holo, 1989; doreductase (E.C. 1.2.7.1); phosphoenolpyruvate synthetase Strauss, 1993: Eisenreich, 1993; Herter, 2002a: Zarzycki, (E.C. 2.7.9.2); phosphoenolpyruvate carboxylase (E.C. 2009; Zarzycki, 2011. The number of reactions in the meta 4.1.1.3.1). In one embodiment, the invention provides an engi bolic pathway exceeds the number of enzymes because par neered chemoautotroph comprising one or more exogenous ticular enzymes, such as malonyl-CoA reductase, propionyl proteins from the rTCA cycle conferring to the organism the CoA synthase, and malyl-CoA/B-methylmalyl-CoA/ ability to produce central metabolites from inorganic carbon, citramalyl-CoA lyase, are multi-functional enzymes that wherein the organism lacks the ability to fix carbon via the catalyze more than one reaction. Also, in Some species. Such rTCA cycle (for example, see FIG. 
4). For example, the one or 25 as Metallosphaera sedulla, the same enzyme can carboxylate more exogenous proteins can be selected from ATP citrate acetyl-CoA and propionyl-CoA. The reactions in the 3-HPA lyase, citryl-CoA synthetase, citryl-CoA lyase, malate dehy bicycle are catalyzed by the following enzymes: acetyl-CoA drogenase, fumarate dehydratase, fumarate reductase, Succi carboxylase (E.C. 6.4.1.2) Menendez, 1999: Higler, 2003: nyl-CoA synthetase, 2-oxoglutarate synthase, isocitrate malonyl-CoA reductase (E.C. 1.2.1.75 and E.C. 1.1.1.298) dehydrogenase, 2-oxoglutarate carboxylase, oxaloSuccinate 30 Higler, 2002; Alber, 2006: Rathnasingh, 2011; propionyl reductase, aconitrate hydratase, pyruvate synthase, phospho CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-) enolpyruvate synthetase, and phosphoenolpyruvate carboxy Alber, 2002: propionyl-CoA carboxylase (E.C. 6.4.1.3) lase. The host organism can also express two or more, three or Menendez, 1999; Higler, 2003; methylmalonyl-CoA epi more, four or more, five or more, and the like, including up to merase (E.C. 5.1.99.1); methylmalonyl-CoA mutase (E.C. all the protein and enzymes that confer the rTCA pathway. 35 5.4.99.2); succinyl-CoA: (S)-malate CoA transferase (E.C. For example, in the host organism E. coli, the exogenous 2.8.3.-) Friedmann, 2006; succinate dehydrogenase (E.C. enzymes comprise 2-oxoglutarate synthase and ATP citrate 1.3.5.1); fumarate hydratase (E.C. 4.2.1.2): (S)-malyl-CoA/ lyase. As a second example, in the host organism E. coli, the B-methylmalyl-CoA/(S)-citramalyl-CoA lyase (MMC lyase, exogenous enzymes comprise 2-oxoglutarate synthase, ATP E.C. 4.1.3.24 and E.C. 4.1.3.25) Herter, 2002b: Friedmann, citrate lyase and pyruvate synthase. Finally, as a third 40 2007; mesaconyl-C1-CoA hydratase or 3-methylmalyl example, in the host organism E. coli, the exogenous enzymes CoA dehydratase (E.C. 
4.2.1.-)Zarzycki, 2008; mesaconyl comprise 2-oxoglutarate synthase, ATP citrate lyase, pyru CoA C1-C4 CoA transferase (E.C. 2.8.3.-) Zarzycki, 2009: vate synthase, 2-oxoglutarate carboxylase and oxaloSucci mesaconyl-C4-CoA hydratase (E.C. 4.2.1.-) Zarzycki, nate reductase. In another embodiment, alternate enzymes 2009. In one embodiment, the invention provides an engi can be used that result in the same overall carbon fixation 45 neered chemoautotroph comprising one or more exogenous pathway. For example, the enzyme malate dehydrogenase proteins from the 3-HPA bicycle conferring to the organism (E.C. 1.1.1.39) can substitute for malate dehydrogenase and the ability to produce central metabolites from inorganic car phosphoenolpyruvate carboxylase. The enzymes 2-oxoglut bon, wherein the organism lacks the ability to fix carbon via arate synthase and pyruvate synthase can be difficult to dis the 3-HPA bicycle (for example, see FIG. 6). Methylmalonyl tinguish from sequence data alone. Both enzymes comprise 50 CoA epimerase activity has been reported in E. coli although 1-5 protein subunits depending on the species. Exemplary no corresponding gene or gene product has been identified pyruvate/2-oxoglutarate synthases include NP 213793, Evans, 1993. For E. coli ScpA to be active, vitamin B12 NP 213794, and NP 213795; NP 213818, NP 213819 must be present in culture medium or produced intracellu and NP 213820; AAD07654, AAD07655, AAD07656 and larly. For example, the one or more exogenous proteins can be AAD07653; ABK44257, ABK44258 and ABK44249; 55 selected from acetyl-CoA carboxylase, malonyl-CoA reduc ACD90193 and ACD90192: YP 001942282 and tase, propionyl-CoA synthase, propionyl-CoA carboxylase, YP 001942281; and homologs thereof. Exemplary 2-oxo methylmalonyl-CoA epimerase, methylmalonyl-CoA glutarate synthases include BAI69550 and BAI69551; mutase, Succinyl-CoA:(S)-malate CoA transferase. 
Succinate YP 003432753, YP 003432754, YP 003432755, dehydrogenase, fumarate hydratase, (S)-malyl-CoA/B-meth YP 003432756 and YP 003432757; YP 393565, 60 ylmalyl-CoA/(S)-citramalyl-CoA lyase, mesaconyl-C1-CoA YP 393566, YP 393567 and YP 393568; BAFT1539, hydratase, mesaconyl-CoA C1-C4 CoA transferase, and BAFT1540, BAF71541 and BAFT1538; BAF69954, mesaconyl-C4-CoA hydratase. The host organism can also BAF69955, BAF69956 and BAF69953: AAM71411 and express two or more, three or more, four or more, five or more, AAM71410; YP 002607621, YP 002607620, six or more, seven or more, and the like, including up to all the YP 002607619 and YP 002607622; CAA12243 and 65 protein and enzymes that confer the 3-HPA pathway. For CAD27440; and homologs thereof. Exemplary pyruvate syn example, in the host organism E. coli, the exogenous enzymes thases include YP 392.614, YP 392615, YP 392612 and comprise malonyl-CoA reductase, propionyl-CoA synthase, US 8,349,587 B2 41 42 acetyl-CoA/propionyl-CoA carboxylase, Succinyl-CoA:(S)- SEQID NO:45. The nucleic acid sequences can be preferably malate CoA transferase, and MMC lyase. As a second 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or example, in the host organism E. coli, the exogenous enzymes even higher identity to SEQID NO:32, SEQID NO:33, SEQ comprise malonyl-CoA reductase, propionyl-CoA synthase, ID NO:34, SEQID NO:35, SEQID NO:36, SEQID NO:37, acetyl-CoA/propionyl-CoA carboxylase, Succinyl-CoA:(S)- SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID malate CoA transferase, MMC lyase, and methylmalonyl NO:41, SEQID NO:42, SEQID NO:43, SEQID NO:44 and CoA epimerase. Finally, as a third example, in the host organ SEQID NO:45. The present invention provides nucleic acids ism E. 
coli, the exogenous enzymes comprise malonyl-CoA each comprising or consisting of a sequence which is a codon reductase, propionyl-CoA synthase, propionyl-CoA car optimized version of one of the wild-type acetyl-CoA/pro boxylase, succinyl-CoA:(S)-malate CoA transferase, MMC 10 pionyl-CoA carboxylase genes. In another embodiment, the lyase, methylmalonyl-CoA epimerase and methylmalonyl invention provides nucleic acids each encoding a polypeptide CoA mutase. Exemplary malonyl-coA reductases include having the amino acid sequence of one of Genbank accession ZP 049571.96, YP 001433009, ZP 01626393, numbers YP 001 191457, YP 001 190248, ZP 01039179 and YP 001636209, and homologs thereof. YP 001 190249, YP 00158606, YP 001581607, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID 15 YP 001581608, YP 876582, YP 876583, YP 876584, NO:28 and SEQID NO:29 represent E. coli codon optimized NP 280337, NP 279647, NP 280339, NP 280547 and coding sequence for each of these five malonyl-CoA reduc NP 280866. The enzyme succinyl-CoA:malate-CoA trans tases, respectively, of the present invention. In one aspect, the ferase is composed of two subunits, such as SmtA and SmtB invention provides nucleic acid molecules and homologs, in Chloroflexus aurantiacus. Exemplary Succinyl-CoA: variants and derivatives of SEQID NO:25, SEQID NO:26, malate-CoA transferase subunits include ABF 14399 and SEQ ID NO:27, SEQ ID NO:28 and SEQ ID NO:29. The ABF 14400, and homologs thereof. Exemplary MMC lyases nucleic acid sequences can be preferably 78%, 79%. 80%, include YP 0017633817, and homologs thereof. 81-85%,90-95%, 96-98%, 99%, 99.9% or even higher iden In certain embodiments, the invention provides an engi tity to SEQID NO:25, SEQID NO:26, SEQID NO:27, SEQ neered chemoautotroph with a carbon fixation pathway ID NO:28 and SEQ ID NO:29. The present invention also 25 derived from the ribulose monophosphate (RuMP) cycle. 
The provides nucleic acids each comprising or consisting of a RuMP cycle is well known in the art and consists of 9 reac sequence which is a codon optimized version of one of the tions (FIG. 7) Strom, 1974). Reactions 1 and 2 (FIG. 7) are wild-type malonyl-CoA reductase genes. In another embodi catalyzed by two separate enzymes in Some organisms and by ment, the invention provides nucleic acids each encoding a a bifunctional fusion enzyme in other organisms Yurimoto, polypeptide having the amino acid sequence of one of Gen 30 2009. The reactions in the RuMP cycle are catalyzed by the bank accession numbers ZP 0495.7196, YP 001433009, following enzymes: hexulose-6-phosphate synthase (HPS, ZP 01626393, ZP 01039179 and YP 001636209. Exem E.C. 4.1.2.43) Kemp, 1972; Kemp, 1974: 6-phospho-3- plary propionyl-CoA synthases include AAL47820, and hexuloisomerase (PHI, E.C. 5.3.1.27) Strom, 1974; Ferenci, homologs thereof. SEQ ID NO:30 represents the E. coli 1974; phosphofructokinase (PFK, E.C. 2.7.1.11); fructose codon optimized coding sequence for this propionyl-CoA 35 bisphosphate aldolase (FBA, E.C. 4.1.2.13); transketolase synthase of the present invention. In one aspect, the invention (TK, E.C. 2.2.1.1); transaldolase (TA, E.C. 2.2.1.2); 7, tran provides nucleic acid molecule and homologs, variants and sketolase (TK, E.C. 2.2.1.1); ribose 5-phosphate isomerase derivatives of SEQID NO:30. The nucleic acid sequence can (RPI, E.C. 5.3.1.6): ribulose-5-phosphate-3-epimerase (RPE, be preferably 78%, 79%. 80%, 81-85%, 90-95%, 96-98%, E.C. 5.1.3.1). In one embodiment, the invention provides an 99%, 99.9% or even higher identity to SEQ ID NO:30. 
The 40 engineered chemoautotroph comprising one or more exog present invention provides nucleic acids each comprising or enous proteins from the RuMP cycle conferring to the organ consisting of a sequence which is a codon optimized version ism the ability to produce central metabolites from inorganic of the wild-type propionyl-CoA synthase gene. In another carbon, wherein the organism lacks the ability to fix carbon embodiment, the invention provides a nucleic acid encoding via the RuMP cycle (for example, see FIG. 8). For example, a polypeptide having the amino acid sequence of SEQ ID 45 the one or more exogenous proteins can be selected from NO:31. The enzyme acetyl-CoA/propionyl-CoA carboxylase heXulose-6-phosphate synthase, 6-phospho-3-hexuloi is composed of three subunits: PccB. AccCand AccB. Exem Somerase, heXulose-6-phosphate synthase/6-phospho-3- plary acetyl-CoA/propionyl-CoA carboxylases include those hexuloisomerase fusion enzyme Orita, 2005: Orita, 2006; from Metallosphaera sedula DSM5348 (YP 001 191457, Orita, 2007, phosphofructokinase, fructose bisphosphate YP 001 190248, YP 001 190249): Nitrosopumilus mariti 50 aldolase, transketolase, transaldolase, transketolase, ribose FiliS SCM1 (YP 00158606, YP 001581607, 5-phosphate isomerase, and ribulose-5-phosphate-3-epime YP 001581608); Cenarchaeum symbiosum A (YP rase. The host organism can also express one or more, two or 876582, YP 876583, YP 876584); Halobacterium sp. more, three or more, and the like, including up to all the NRC-1 (NP 280337 or NP 279647; NP 280339 or protein and enzymes that confer the RuMP pathway. For NP 280547; NP 280866), and homologs thereof. SEQ ID 55 example, in the host organism E. coli, the exogenous enzymes NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, comprise hexulose-6-phosphate synthase and 6-phospho-3- SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID heXuloisomerase. As a second example, in the host organism NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, E. 
SEQ ID NO:43, SEQ ID NO:44 and SEQ ID NO:45 represent E. coli codon optimized coding sequence for each of these acetyl-CoA/propionyl-CoA carboxylase subunits, respectively, of the present invention. In one aspect, the invention provides nucleic acid molecules and homologs, variants and derivatives of SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44 and

coli, the exogenous enzymes comprise the bifunctional fusion enzyme hexulose-6-phosphate synthase/6-phospho-3-hexuloisomerase. Exemplary HPS enzymes include YP_115138, YP_115430 and BAA90546, and homologs thereof. SEQ ID NO:46 and SEQ ID NO:47 represent E. coli codon optimized coding sequence for HPS enzymes YP_115138 and YP_115430, respectively, of the present invention. In one aspect, the invention provides nucleic acid molecules and homologs, variants and derivatives of SEQ ID NO:46 and SEQ ID NO:47. The nucleic acid sequences can be preferably 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even higher identity to SEQ ID NO:46 and SEQ ID NO:47. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of one of the wild-type HPS genes. In another embodiment, the invention provides nucleic acids each encoding a polypeptide having the amino acid sequence of one of YP_115138 and YP_115430. Exemplary PHI enzymes include YP_115431 and BAA90545, and homologs thereof. SEQ ID NO:48 represents E. coli codon optimized coding sequence for PHI enzyme YP_115431 of the present invention. In one aspect, the invention provides nucleic acid molecule and homologs, variants and derivatives of SEQ ID NO:48. The nucleic acid sequence can be preferably 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even higher identity to SEQ ID NO:48. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of one of the wild-type PHI genes. In another embodiment, the invention provides nucleic acids each encoding a polypeptide having the amino acid sequence of YP_115431. Exemplary HPS-PHI enzymes include NP_143767 and YP_182888, and homologs thereof. SEQ ID NO:49 represents an E. coli codon optimized coding sequence for a fusion of the Mycobacterium gastri MB19 HPS enzyme (BAA90546) and PHI enzyme (BAA90545) of the present invention. In one aspect, the invention provides nucleic acid molecule and homologs, variants and derivatives of SEQ ID NO:49. The nucleic acid sequence can be preferably 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even higher identity to SEQ ID NO:49. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of one of the wild-type HPS and one of the wild-type PHI genes. In another embodiment, the invention provides nucleic acids each encoding a polypeptide having the amino acid sequence of SEQ ID NO:50.

In certain embodiments, the invention provides an engineered chemoautotroph whose carbon fixation pathway is the Calvin-Benson-Bassham cycle or reductive pentose phosphate (RPP) cycle. The Calvin cycle is well known in the art and consists of 13 reactions (FIG. 9) [Bassham, 1954]. The reactions in the RPP cycle are catalyzed by the following enzymes: ribulose bisphosphate carboxylase (RuBisCO, E.C. 4.1.1.39); phosphoglycerate kinase (PGK, E.C. 2.7.2.3); glyceraldehyde-3P dehydrogenase (phosphorylating) (GAPDH, E.C. 1.2.1.12 or E.C. 1.2.1.13); triose-phosphate isomerase (TPI, E.C. 5.3.1.1); fructose-bisphosphate aldolase (FBA, E.C. 4.1.2.13); fructose-bisphosphatase (FBPase, E.C. 3.1.3.11); transketolase (TK, E.C. 2.2.1.1); sedoheptulose-1,7-bisphosphate aldolase (SBA, E.C. 4.1.2.-); sedoheptulose bisphosphatase (SBPase, E.C. 3.1.3.37); transketolase (TK, E.C. 2.2.1.1); ribose-5-phosphate isomerase (RPI, E.C. 5.3.1.6); ribulose-5-phosphate-3-epimerase (RPE, E.C. 5.1.3.1); phosphoribulokinase (PRK, E.C. 2.7.1.19). In one embodiment, the invention provides an engineered chemoautotroph comprising one or more exogenous proteins from the RPP cycle conferring to the organism the ability to produce central metabolites from inorganic carbon, wherein the organism lacks the ability to fix carbon via the RPP cycle (for example, see FIG. 10). For example, the one or more exogenous proteins can be selected from ribulose bisphosphate carboxylase, phosphoglycerate kinase, glyceraldehyde-3P dehydrogenase (phosphorylating), triose-phosphate isomerase, fructose-bisphosphate aldolase, fructose-bisphosphatase, transketolase, sedoheptulose-1,7-bisphosphate aldolase, sedoheptulose bisphosphatase, transketolase, ribose-5-phosphate isomerase, ribulose-5-phosphate-3-epimerase and phosphoribulokinase. The host organism can also express two or more, three or more, four or more, and the like, including up to all the protein and enzymes that confer the RPP pathway. For example, in the host organism E. coli, the exogenous enzymes comprise ribulose bisphosphate carboxylase, sedoheptulose bisphosphatase and phosphoribulokinase. As a second example, in the host organism E. coli, the exogenous enzymes comprise ribulose bisphosphate carboxylase, NADPH-dependent glyceraldehyde-3P dehydrogenase, sedoheptulose bisphosphatase and phosphoribulokinase. Ribulose bisphosphate carboxylase has two distinct forms: Form I and Form II [Portis, 2007]. Form I is composed of four large subunit dimers and eight small subunits (L8S8) and has been expressed previously in heterologous hosts, such as Escherichia coli [Gatenby, 1985; Tabita, 1985; Gutteridge, 1986]. Exemplary RuBisCO subunits include YP_170840 and YP_170839, and homologs thereof. Extensive work has been done to attempt to optimize the function of RuBisCO [Parikh, 2006; Greene, 2007], and thus engineered RuBisCO enzymes may also be used in the present invention. Exemplary NADPH-dependent GAPDH enzymes include YP_400759, and homologs thereof. SEQ ID NO:51 represents an E. coli codon optimized coding sequence for this GAPDH of the present invention. In one aspect, the invention provides nucleic acid molecule and homologs, variants and derivatives of SEQ ID NO:51. The nucleic acid sequence can be preferably 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even higher identity to SEQ ID NO:51. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of one of the wild-type GAPDH genes. In another embodiment, the invention provides nucleic acids each encoding a polypeptide having the amino acid sequence of YP_400759. Exemplary SBPase enzymes include YP_399524, and homologs thereof. SEQ ID NO:52 represents an E. coli codon optimized coding sequence for this SBPase of the present invention. In one aspect, the invention provides nucleic acid molecule and homologs, variants and derivatives of SEQ ID NO:52. The nucleic acid sequence can be preferably 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even higher identity to SEQ ID NO:52. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of one of the wild-type SBPase genes. In another embodiment, the invention provides nucleic acids each encoding a polypeptide having the amino acid sequence of YP_399524. Exemplary PRK enzymes include YP_399994, and homologs thereof. SEQ ID NO:53 represents an E. coli codon optimized coding sequence for this PRK of the present invention. In one aspect, the invention provides nucleic acid molecule and homologs, variants and derivatives of SEQ ID NO:53. The nucleic acid sequence can be preferably 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even higher identity to SEQ ID NO:53. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of one of the wild-type PRK genes. In another embodiment, the invention provides nucleic acids each encoding a polypeptide having the amino acid sequence of YP_399994.

Production of Central Metabolites as the Carbon-Based Products of Interest

In certain embodiments, the engineered chemoautotroph of the present invention produces the central metabolites, including but not limited to citrate, malate, succinate, dihydroxyacetone, dihydroxyacetone phosphate, 3-hydroxypropionate, as the carbon-based products of interest. The engineered chemoautotroph produces central metabolites as an intermediate or product of the carbon fixation pathway or as an intermediate or product of host metabolism. In such cases, one or more transporters may be expressed in the engineered chemoautotroph to export the central metabolite from the cell. For example, one or more members of a family of enzymes known as C4-dicarboxylate carriers serve to export succinate from cells into the media [Janausch, 2002; Kim, 2007]. These central metabolites can be converted to other products (FIG. 11).

In some embodiments, the engineered chemoautotroph may interconvert between different central metabolites to produce alternate carbon-based products of interest. In one embodiment, the engineered chemoautotroph produces aspartate by expressing one or more aspartate aminotransferase (E.C. 2.6.1.1), such as Escherichia coli AspC, to convert oxaloacetate and L-glutamate to L-aspartate and 2-oxoglutarate.

In another embodiment, the engineered chemoautotroph produces dihydroxyacetone phosphate by expressing one or more dihydroxyacetone kinases (E.C. 2.7.1.29), such as C. freundii DhaK, to convert dihydroxyacetone and ATP to dihydroxyacetone phosphate.

In another embodiment, the engineered chemoautotroph produces serine as the carbon-based product of interest. The metabolic reactions necessary for serine biosynthesis include: phosphoglycerate dehydrogenase (E.C. 1.1.1.95), phosphoserine transaminase (E.C. 2.6.1.52), phosphoserine phosphatase (E.C. 3.1.3.3). Phosphoglycerate dehydrogenase, such as E. coli SerA, converts 3-phospho-D-glycerate and NAD+ to 3-phosphonooxypyruvate and NADH. Phosphoserine transaminase, such as E. coli SerC, interconverts between 3-phosphonooxypyruvate + L-glutamate and O-phospho-L-serine + 2-oxoglutarate. Phosphoserine phosphatase, such as E. coli SerB, converts O-phospho-L-serine to L-serine.

In another embodiment, the engineered chemoautotroph produces glutamate as the carbon-based product of interest. The metabolic reactions necessary for glutamate biosynthesis include glutamate dehydrogenase (E.C. 1.4.1.4, e.g., E. coli GdhA) which converts α-ketoglutarate, NH3 and NADPH to glutamate. Glutamate can subsequently be converted to various other carbon-based products of interest, e.g., according to the scheme presented in FIG. 12.

In another embodiment, the engineered chemoautotroph produces itaconate as the carbon-based product of interest. The metabolic reactions necessary for itaconate biosynthesis include aconitate decarboxylase (E.C. 4.1.1.6; such as that from A. terreus) which converts cis-aconitate to itaconate and CO2. Itaconate can subsequently be converted to various other carbon-based products of interest, e.g., according to the scheme presented in FIG. 12.

Production of Sugars as the Carbon-Based Products of Interest

Industrial production of chemical products from biological organisms is often accomplished using a sugar source, such as glucose or fructose, as the feedstock. Hence, in certain embodiments, the engineered chemoautotroph of the present invention produces sugars including glucose and fructose or sugar phosphates including triose phosphates (such as 3-phosphoglyceraldehyde and dihydroxyacetone-phosphate) as the carbon-based products of interest. Sugars and sugar phosphates may also be interconverted. For example, glucose-6-phosphate isomerase (E.C. 5.3.1.9; e.g., E. coli Pgi) may interconvert between D-fructose 6-phosphate and D-glucose-6-phosphate. Phosphoglucomutase (E.C. 5.4.2.2, e.g., E. coli Pgm) converts D-α-glucose-6-P to D-α-glucose-1-P. Glucose-1-phosphatase (E.C. 3.1.3.10; e.g., E. coli Agp) converts D-α-glucose-1-P to D-α-glucose. Aldose 1-epimerase (E.C. 5.1.3.3; e.g., E. coli GalM) converts D-β-glucose to D-α-glucose. The sugars or sugar phosphates may optionally be exported from the engineered chemoautotroph into the culture medium.

Sugar phosphates may be converted to their corresponding sugars via dephosphorylation that occurs either intra- or extracellularly. For example, phosphatases such as a glucose-6-phosphatase (E.C. 3.1.3.9) or glucose-1-phosphatase (E.C. 3.1.3.10) can be introduced into the engineered chemoautotroph of the present invention. Exemplary phosphatases include Homo sapiens glucose-6-phosphatase G6PC (P35575), Escherichia coli glucose-1-phosphatase Agp (P19926), E. cloacae glucose-1-phosphatase AgpE (Q6EV19) and Escherichia coli acid phosphatase YihX (P0A8Y3).

Sugar phosphates can be exported from the engineered chemoautotroph into the culture media via transporters. Transporters for sugar phosphates generally act as anti-porters with inorganic phosphate. An exemplary triose phosphate transporter includes A. thaliana triose-phosphate transporter APE2 (Genbank accession AT5G46110.4). Exemplary glucose-6-phosphate transporters include E. coli sugar phosphate transporter UhpT (NP_418122.1), A. thaliana glucose-6-phosphate transporter GPT1 (AT5G54800.1), A. thaliana glucose-6-phosphate transporter GPT2, or homologs thereof. Dephosphorylation of glucose-6-phosphate can also be coupled to glucose transport, such as Genbank accession numbers AAA16222, AAD19898, O43826.

Sugars can be diffusively effluxed from the engineered chemoautotroph into the culture media via permeases. Exemplary permeases include H. sapiens glucose transporter GLUT-1, -3, or -7 (P11166, P11169, Q6PXP3), S. cerevisiae hexose transporter HXT-1, -4, or -6 (P32465, P32467, P39003), Z. mobilis glucose uniporter Glf (P21906), Synechocystis sp. 1148 glucose/fructose:H+ symporter GlcP (T.C. 2.A.1.1.32; P15729) [Zhang, 1989], Streptomyces lividans major glucose (or 2-deoxyglucose) uptake transporter GlcP (T.C. 2.A.1.1.35; Q7BEC4) [van Wezel, 2005], Plasmodium falciparum hexose (glucose and fructose) transporter PfHT1 (T.C. 2.A.1.1.24; O97467), or homologs thereof. Alternatively, to enable active efflux of sugars from the engineered chemoautotroph, one or more active transporters may be introduced to the cell. Exemplary transporters include mouse glucose transporter GLUT 1 (AAB20846) or homologs thereof.

Preferably, to prevent buildup of other storage polymers from sugars or sugar phosphates, the engineered chemoautotrophs of the present invention are attenuated in their ability to build other storage polymers such as glycogen, starch, sucrose, and cellulose using one or more of the following enzymes: cellulose synthase (UDP forming) (E.C. 2.4.1.12), glycogen synthase e.g. glgA1, glgA2 (E.C. 2.4.1.21), sucrose phosphate synthase (E.C. 2.4.1.14), sucrose phosphorylase (E.C. 3.1.3.24), alpha-1,4-glucan lyase (E.C. 4.2.2.13), glycogen synthase (E.C. 2.4.1.11), 1,4-alpha-glucan branching enzyme (E.C. 2.4.1.18).

The invention also provides engineered chemoautotrophs that produce other sugars such as sucrose, xylose, maltose, pentose, rhamnose, galactose and arabinose according to the same principles. A pathway for galactose biosynthesis is shown (FIG. 13). The metabolic reactions in the galactose biosynthetic pathway are catalyzed by the following enzymes: alpha-D-glucose-6-phosphate ketol-isomerase (E.C. 5.3.1.9; e.g., Arabidopsis thaliana PGI1), D-mannose 6-phosphate ketol-isomerase (E.C. 5.3.1.8; e.g., Arabidopsis thaliana DIN9), D-mannose 6-phosphate 1,6-phosphomutase (E.C. 5.4.2.8; e.g., Arabidopsis thaliana ATPMM), mannose-1-phosphate guanylyltransferase (E.C. 2.7.7.22; e.g., Arabidopsis thaliana CYT), GDP-mannose 3,5-epimerase (E.C. 5.1.3.18; e.g., Arabidopsis thaliana GME), galactose 1-phosphate guanylyltransferase (E.C. 2.7.n.n; e.g., Arabidopsis thaliana VTC2), L-galactose 1-phosphate phosphatase (E.C. 3.1.3.n; e.g., Arabidopsis thaliana VTC4). In one embodiment, the invention provides an engineered chemoautotroph comprising one or more exogenous proteins from the galactose biosynthetic pathway.

The invention also provides engineered chemoautotrophs that produce sugar alcohols, such as sorbitol, as the carbon-based product of interest. In certain embodiments, the engineered chemoautotroph produces D-sorbitol from D-α-glucose and NADPH via the enzyme polyol dehydrogenase (E.C. 1.1.1.21; e.g., Saccharomyces cerevisiae GRE3).

The invention also provides engineered chemoautotrophs that produce sugar derivatives, such as ascorbate, as the carbon-based product of interest. In certain embodiments, the engineered chemoautotroph produces ascorbate from galactose via the enzymes L-galactose dehydrogenase (E.C. 1.1.1.122; e.g., Arabidopsis thaliana At4G33670) and L-galactonolactone oxidase (E.C. 1.3.3.12; e.g., Saccharomyces cerevisiae ATGLDH). Optionally, a catalase (E.C. 1.11.1.6; e.g., E. coli KatE) may be included to convert the waste product hydrogen peroxide to molecular oxygen.

The fermentation products according to the above aspect of the invention are sugars, which are exported into the media as a result of carbon fixation during chemoautotrophy. The sugars can also be reabsorbed later and fermented, directly separated, or utilized by a co-cultured organism. This approach has several advantages. First, the total amount of sugars the cell can handle is not limited by maximum intracellular concentrations because the end-product is exported to the media. Second, by removing the sugars from the cell, the equilibria of carbon fixation reactions are pushed towards creating more sugar. Third, during chemoautotrophy, there is no need to push carbon flow towards glycolysis. Fourth, the sugars are potentially less toxic than the fermentation products that would be directly produced.

Chemoautotrophic fixation of carbon dioxide may be followed by flux of carbon compounds to the creation and maintenance of biomass and to the storage of retrievable carbon in the form of glycogen, cellulose and/or sucrose. Glycogen is a polymer of glucose composed of linear alpha 1,4-linkages and branched alpha 1,6-linkages. The polymer is insoluble at degree of polymerization (DP) greater than about 60,000 and forms intracellular granules. Glycogen is synthesized in vivo via a pathway originating from glucose 1-phosphate. Its hydrolysis can proceed through phosphorylation to glucose phosphates; via the internal cleavage of polymer to maltodextrins; via the successive exo-cleavage to maltose; or via the concerted hydrolysis of polymer and maltodextrins to maltose and glucose. Hence, an alternative biosynthetic route to glucose and/or maltose is via the hydrolysis of glycogen which can optionally be exported from the cell as described above. There are a number of potential enzyme candidates for glycogen hydrolysis (Table 1).

In addition to the above, another mechanism is described to produce glucose biosynthetically. In certain embodiments, the present invention provides for cloned genes for glycogen hydrolyzing enzymes to hydrolyze glycogen to glucose and/or maltose and transport maltose and glucose from the cell. Preferred enzymes are set forth below in Table 1. Glucose is transported from the engineered chemoautotroph by a glucose/hexose transporter. This alternative allows the cell to accumulate glycogen naturally but adds enzyme activities to continuously return it to maltose or glucose units that can be collected as a carbon-based product.

TABLE 1
Enzymes for hydrolysis of glycogen

Enzyme | E.C. number | Function
α-amylase | 3.2.1.1 | endohydrolysis of 1,4-α-D-glucosidic linkages in polysaccharides
β-amylase | 3.2.1.2 | hydrolysis of 1,4-α-D-glucosidic linkages in polysaccharides so as to remove successive maltose units from the non-reducing ends of the chains
γ-amylase | 3.2.1.3 | hydrolysis of terminal 1,4-linked α-D-glucose residues successively from non-reducing ends of the chains with release of β-D-glucose
glucoamylase | 3.2.1.3 | hydrolysis of terminal 1,4-linked α-D-glucose residues successively from non-reducing ends of the chains with release of β-D-glucose
isoamylase | 3.2.1.68 | hydrolysis of (1->6)-α-D-glucosidic branch linkages in glycogen, amylopectin and their beta-limit dextrins
pullulanase | 3.2.1.41 | hydrolysis of (1->6)-α-D-glucosidic linkages in pullulan [a linear polymer of α-(1->6)-linked maltotriose units] and in amylopectin and glycogen, and the α- and β-limit dextrins of amylopectin and glycogen
amylomaltase | 2.4.1.25 | transfers a segment of a 1,4-α-D-glucan to a new position in an acceptor, which may be glucose or a 1,4-α-D-glucan (part of yeast debranching system)
amylo-α-1,6-glucosidase | 3.2.1.33 | debranching enzyme; hydrolysis of (1->6)-α-D-glucosidic branch linkages in glycogen phosphorylase limit dextrin
phosphorylase kinase | 2.7.11.19 | 2 ATP + phosphorylase b = 2 ADP + phosphorylase a
phosphorylase | 2.4.1.1 | (1,4-α-D-glucosyl)n + phosphate = (1,4-α-D-glucosyl)n-1 + α-D-glucose 1-phosphate

Production of Fermentative Products as the Carbon-Based Products of Interest

In certain embodiments, the engineered chemoautotroph of the present invention produces alcohols such as ethanol, propanol, isopropanol, butanol and fatty alcohols as the carbon-based products of interest.

In some embodiments, the engineered chemoautotroph of the present invention is engineered to produce ethanol via pyruvate fermentation. Pyruvate fermentation to ethanol is well known to those in the art and there are several pathways including the pyruvate decarboxylase pathway, the pyruvate synthase pathway and the pyruvate formate-lyase pathway (FIG. 14). The reactions in the pyruvate decarboxylase pathway are catalyzed by the following enzymes: pyruvate decarboxylase (E.C. 4.1.1.1) and alcohol dehydrogenase (E.C. 1.1.1.1 or E.C. 1.1.1.2). The reactions in the pyruvate synthase pathway are catalyzed by the following enzymes: pyruvate synthase (E.C. 1.2.7.1), acetaldehyde dehydrogenase (E.C. 1.2.1.10 or E.C. 1.2.1.5), and alcohol dehydrogenase (E.C. 1.1.1.1 or E.C. 1.1.1.2). The reactions in the pyruvate formate-lyase pathway are catalyzed by the following enzymes: pyruvate formate-lyase (E.C. 2.3.1.54), acetaldehyde dehydrogenase (E.C. 1.2.1.10 or E.C. 1.2.1.5), and alcohol dehydrogenase (E.C. 1.1.1.1 or E.C. 1.1.1.2).

In some embodiments, the engineered chemoautotroph of the present invention is engineered to produce lactate via pyruvate fermentation. Lactate dehydrogenase (E.C. 1.1.1.28) converts NADH and pyruvate to D-lactate. Exemplary enzymes include E. coli ldhA.

Currently, fermentative products such as ethanol, butanol, lactic acid, formate, acetate produced in biological organisms employ NADH-dependent processes. However, depending on the energy conversion pathways added to the engineered chemoautotroph, the cell may produce NADPH or reduced ferredoxin as the reducing cofactor. NADPH is used mostly for biosynthetic operations in biological organisms, e.g., for cell growth, division, and for building up chemical stores, such as glycogen, sucrose, and other macromolecules. Using natural or engineered enzymes that utilize NADPH or reduced ferredoxin as a source of reducing power instead of NADH would allow direct use of chemoautotrophic reducing power towards formation of normally fermentative byproducts. Accordingly, the present invention provides methods for producing fermentative products such as ethanol by expressing NADP-dependent or ferredoxin-dependent enzymes. NADP-dependent enzymes include alcohol dehydrogenase [NADP+] (E.C. 1.1.1.2) and acetaldehyde dehydrogenase [NAD(P)+] (E.C. 1.2.1.5). Exemplary NADP-dependent alcohol dehydrogenases include Moorella sp. HUC22-1 AdhA (YP_430754) [Inokuma, 2007], and homologs thereof.

In addition to providing exogenous genes or endogenous genes with novel regulation, the optimization of ethanol production in engineered chemoautotrophs preferably requires the elimination or attenuation of certain host enzyme activities. These include, but are not limited to, pyruvate oxidase (E.C. 1.2.2.2), D-lactate dehydrogenase (E.C. 1.1.1.28), acetate kinase (E.C. 2.7.2.1), phosphate acetyltransferase (E.C. 2.3.1.8), citrate synthase (E.C. 2.3.3.1), phosphoenolpyruvate carboxylase (E.C. 4.1.1.31). The extent to which these manipulations are necessary is determined by the observed byproducts found in the bioreactor or shake-flask. For instance, observation of acetate would suggest deletion of pyruvate oxidase, acetate kinase, and/or phosphotransacetylase enzyme activities. In another example, observation of D-lactate would suggest deletion of D-lactate dehydrogenase enzyme activities, whereas observation of succinate, malate, fumarate, oxaloacetate, or citrate would suggest deletion of citrate synthase and/or PEP carboxylase enzyme activities.

Production of Ethylene, Propylene, 1-Butene, 1,3-Butadiene, Acrylic Acid, Etc. as the Carbon-Based Products of Interest

In certain embodiments, the engineered chemoautotroph of the present invention produces ethylene, propylene, 1-butene, 1,3-butadiene and acrylic acid as the carbon-based products of interest. Ethylene and/or propylene may be produced by either (1) the dehydration of ethanol or propanol (E.C. 4.2.1.-), respectively or (2) the decarboxylation of acrylate or crotonate (E.C. 4.1.1.-), respectively. While many dehydratases exist in nature, none has been shown to convert ethanol to ethylene (or propanol to propylene, propionic acid to acrylic acid, etc.) by dehydration. Genes encoding enzymes in the 4.2.1.x or 4.1.1.x group can be identified by searching databases such as GenBank using the methods described above, expressed in any desired host (such as Escherichia coli, for simplicity), and that host can be assayed for the appropriate enzymatic activity. A high-throughput screen is especially useful for screening many genes and variants of genes generated by mutagenesis (i.e., error-prone PCR, synthetic libraries, chemical mutagenesis, etc.).

The ethanol dehydratase gene, after development to a suitable level of activity, can then be expressed in an ethanologenic organism to enable that organism to produce ethylene. For instance, coexpress native or evolved ethanol dehydratase gene into an organism that already produces ethanol, then test a culture by GC analysis of offgas for ethylene production that is significantly higher than without the added gene or via a high-throughput assay adapted from a colorimetric test [Larue, 1973]. It may be desirable to eliminate ethanol-export proteins from the production organism to prevent ethanol from being secreted into the medium and preventing its conversion to ethylene.

Alternatively, acryloyl-CoA can be produced as described above, and acryloyl-CoA hydrolases (E.C. 3.1.2.-), such as the acuN gene from Halomonas sp. HTNK1, can convert acryloyl-CoA into acrylate, which can be thermally decarboxylated to yield ethylene.

Alternatively, genes encoding ethylene-forming enzyme activities (EfE, E.C. 1.14.17.4) from various sources are expressed. Exemplary enzymes include Pseudomonas syringae pv. Phaseolicola (BAA02477), P. syringae pv. Pisi (AAD16443), Ralstonia solanacearum (CAD18680). Optimizing production may require further metabolic engineering (improving production of alpha-ketoglutarate, recycling succinate as two examples).

In some embodiments, the engineered chemoautotroph of the present invention is engineered to produce ethylene from methionine. The reactions in the ethylene biosynthesis pathway are catalyzed by the following enzymes: methionine adenosyltransferase (E.C. 2.5.1.6), 1-aminocyclopropane-1-carboxylate synthase (E.C. 4.4.1.14) and 1-aminocyclopropane-1-carboxylate oxidase (E.C. 1.14.17.4).

In some embodiments, the engineered chemoautotroph of the present invention is engineered to produce propylene as the carbon-based product of interest. In one embodiment, the engineered chemoautotroph is engineered to express one or more of the following enzymes: propionyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-), propionyl-CoA transferase (E.C. 2.8.3.1), aldehyde dehydrogenase (E.C. 1.2.1.3 or E.C. 1.2.1.4), alcohol dehydrogenase (E.C. 1.1.1.1 or E.C. 1.1.1.2), and alcohol dehydratase (E.C. 4.2.1.-). Propionyl-CoA synthase is a multi-functional enzyme that converts 3-hydroxypropionate, ATP and NADPH to propionyl-CoA. Exemplary propionyl-CoA synthases include AAL47820, and homologs thereof. SEQ ID NO:30 represents the E. coli codon optimized coding sequence for this propionyl-CoA synthase of the present invention. In one aspect, the invention provides nucleic acid molecule and homologs, variants and derivatives of SEQ ID NO:30. The nucleic acid sequence can be preferably 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even higher identity to SEQ ID NO:30. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of the wild-type propionyl-CoA synthase gene. In another embodiment, the invention provides a nucleic acid encoding a polypeptide having the amino acid sequence of SEQ ID NO:31. Propionyl-CoA transferase converts propionyl-CoA and acetate to acetyl-CoA and propionate. Exemplary enzymes include Ralstonia eutropha pct and homologs thereof. Aldehyde dehydrogenase converts propionate and NADPH to propanal. Alcohol dehydrogenase converts propanal and NADPH to 1-propanol. Alcohol dehydratase converts 1-propanol to propylene.

In another embodiment, E. coli thiolase atoB (E.C. 2.3.1.9) converts 2 acetyl-CoA into acetoacetyl-CoA, and C. acetobutylicum hbd (E.C. 1.1.1.157) converts acetoacetyl-CoA and NADH into 3-hydroxybutyryl-CoA. E. coli tesB (E.C. 3.1.2.20) or C. acetobutylicum ptb and buk (E.C. 2.3.1.19 and 2.7.2.7 respectively) convert 3-hydroxybutyryl-CoA into 3-hydroxybutyrate, which can be simultaneously decarboxylated and dehydrated to yield propylene. Optionally, the 3-hydroxybutyryl-CoA is polymerized to form poly(3-hydroxybutyrate), a solid compound which can be extracted from the fermentation medium and simultaneously depolymerized, hydrolyzed, dehydrated, and decarboxylated to yield propylene (U.S. patent application Ser. No. 12/527,714, 2008).

Production of Fatty Acids, their Intermediates and Derivatives as the Carbon-Based Products of Interest

In certain embodiments, the engineered chemoautotroph of the present invention produces fatty acids, their intermediates and their derivatives as the carbon-based products of interest. The engineered chemoautotrophs of the present invention can be modified to increase the production of acyl-ACP or acyl-CoA, to reduce the catabolism of fatty acid derivatives and intermediates, or to reduce feedback inhibition at specific points in the biosynthetic pathway used for fatty acid products. In addition to modifying the genes described herein, additional cellular resources can be diverted to over-produce fatty acids. For example the lactate, succinate and/or acetate pathways can be attenuated and the fatty acid biosynthetic pathway precursors acetyl-CoA and/or malonyl-CoA can be overproduced.

In one embodiment, the engineered chemoautotrophs of the present invention can be engineered to express certain fatty acid synthase activities (FAS), which is a group of peptides that catalyze the initiation and elongation of acyl chains [Marrakchi, 2002a]. The acyl carrier protein (ACP) and the enzymes in the FAS pathway control the length, degree of saturation and branching of the fatty acids produced, which can be attenuated or over-expressed. Such enzymes include accABCD, FabD, FabH, FabG, FabA, FabZ, FabI, FabK, FabL, FabM, FabB, FabF, and homologs thereof.

In another embodiment, the engineered chemoautotrophs of the present invention form fatty acid byproducts through ACP-independent pathways, for example, the pathway described recently by [Dellomonaco, 2011] involving reversal of beta oxidation. Enzymes involved in these pathways include such genes as atoB, fadA, fadB, fadD, fadE, fadI, fadK, fadJ, paaZ, ydiO, yfcY, yfcZ, ydiD, and homologs thereof.

In one aspect, the fatty acid biosynthetic pathway precursors acetyl-CoA and malonyl-CoA can be overproduced in the engineered chemoautotroph of the present invention. Several different modifications can be made, either in combination or individually, to the host cell to obtain increased acetyl CoA/malonyl CoA/fatty acid and fatty acid derivative production. To modify acetyl-CoA and/or malonyl-CoA production, the expression of acetyl-CoA carboxylase (E.C. 6.4.1.2) can be modulated. Exemplary genes include accABCD (AAC73296) or homologs thereof. To increase acetyl CoA production, the expression of several genes may be altered including pdh, panK, aceEF, (encoding the E1p dehydrogenase component and the E2p dihydrolipoamide acyltransferase component of the pyruvate and 2-oxoglutarate dehydrogenase complexes), fabH/fabD/fabG/acpP/fabF, and in some examples additional nucleic acid encoding fatty-acyl-CoA reductases and aldehyde decarbonylases. Exemplary enzymes include pdh (BAB34380, AAC73227, AAC73226), panK (also known as coaA, AAC76952), aceEF (AAC73227, AAC73226), fabH (AAC74175), fabD (AAC74176), fabG (AAC74177), acpP (AAC74178), fabF (AAC74179). Genes to be knocked-out or attenuated include fadE, gpsA, ldhA, pflB, adhE, pta, poxB, ackA, and/or ackB. Exemplary enzymes include fadE (AAC73325), gpsA (AAC76632), ldhA (AAC74462), pflB (AAC73989), adhE (AAC74323), pta (AAC75357), poxB (AAC73958), ackA (AAC75356), ackB (BAB81430), and homologs thereof.

Additional potential modifications include the following. To achieve fatty acid overproduction, lipase (E.C. 3.1.1.3) which produce triacylglycerides from fatty acids and glycerol and in some cases serves as a suppressor of fabA can be included in the engineered chemoautotroph of the present invention. Exemplary enzymes include Saccharomyces cerevisiae LipA (CAA89087), Saccharomyces cerevisiae TGL2 (CAA98876), and homologs thereof. To remove limitations on the pool of acyl-CoA, the D311E mutation in plsB (AAC77011) can be introduced.

To engineer an engineered chemoautotroph for the production of a population of fatty acid derivatives with homogeneous chain length, one or more endogenous genes can be attenuated or functionally deleted and one or more thioesterases can be expressed. Thioesterases (E.C. 3.1.2.14) generate acyl-ACP from fatty acid and ACP. For example, C10 fatty acids can be produced by attenuating endogenous C18 thioesterases (for example, E. coli tesA AAC73596 and P0ADA1, and homologs thereof), which uses C18:1-ACP, and expressing a C10 thioesterase, which uses C10-ACP, thus, resulting in a relatively homogeneous population of fatty acids that have a carbon chain length of 10. In another example, C14 fatty acid derivatives can be produced by attenuating endogenous thioesterases that produce non-C14 fatty acids and expressing the C14 thioesterase, which uses C14-ACP. In yet another example, C12 fatty acid derivatives can be produced by expressing thioesterases that use C12-ACP and attenuating thioesterases that produce non-C12 fatty acids. Exemplary C8:0 to C10:0 thioesterases include Cuphea hookeriana fatB2 (AAC49269) and homologs thereof. Exemplary C12:0 thioesterases include Umbellularia California fatB (Q41635) and homologs thereof. Exemplary C14:0 thioesterases include Cinnamonum camphorum fatB (Q39473). Exemplary C14:0 to C16:0 thioesterases include Cuphea hookeriana fatB3 (AAC49269). Exemplary C16:0 thioesterases include Arabidopsis thaliana fatB (CAA85388), Cuphea hookeriana fatB1 (Q39513) and homologs thereof. Exemplary C18:1 thioesterases include Arabidopsis thaliana fatA (NP_189147, NP_193041), Arabidopsis thaliana fatB (CAA85388), Bradyrhizobium japonicum fatA (CAC39106), Cuphea hookeriana fatA (AAC72883), Escherichia coli tesA (NP_415027) and homologs thereof. Acetyl CoA, malonyl CoA, and fatty acid overproduction can be verified using methods known in the art, for example by using radioactive precursors, HPLC, and GC-MS subsequent to cell lysis.

In yet another aspect, fatty acids of various lengths can be produced in the engineered chemoautotroph by expressing or overexpressing acyl-CoA synthase peptides (E.C. 2.3.1.86), which catalyzes the conversion of fatty acids to acyl-CoA. Some acyl-CoA synthase peptides, which are non-specific, accept other substrates in addition to fatty acids.

In yet another aspect, branched chain fatty acids, their intermediates and their derivatives can be produced in the engineered chemoautotroph as the carbon-based products of interest. By controlling the expression of endogenous and heterologous enzymes associated with branched chain fatty acid biosynthesis, the production of branched chain fatty acid intermediates including branched chain fatty acids can be enhanced. Branched chain fatty acid production can be achieved through the expression of one or more of the following enzymes [Kaneda, 1991]: branched chain amino acid aminotransferase to produce α-ketoacids from branched chain amino acids such as isoleucine, leucine and valine (E.C. 2.6.1.42), branched chain α-ketoacid dehydrogenase complexes which catalyzes the oxidative decarboxylation of α-ketoacids to branched chain acyl-CoA (bkd, E.C. 1.2.4.4) [Denoya, 1995], dihydrolipoyl dehydrogenase (E.C. 1.8.1.4), beta-ketoacyl-ACP synthase with branched chain acyl CoA specificity (E.C. 2.3.1.41) [Li, 2005], crotonyl-CoA reductase (E.C. 1.3.1.8, 1.3.1.85 or 1.3.1.86) [Han, 1997], and isobutyryl-CoA mutase (large subunit E.C. 5.4.99.2 and small subunit E.C. 5.4.99.13). Exemplary branched chain amino acid aminotransferases include E. coli ilvE (YP_026247), Lactococcus lactis ilvE (AAF34406), Pseudomonas putida ilvE (NP_745648), Streptomyces coelicolor ilvE (NP_629657), and homologs thereof. Branched chain α-ketoacid dehydrogenase complexes consist of E1α/β (decarboxylase), E2 (dihydrolipoyl transacylase) and E3 (dihydrolipoyl dehydrogenase) subunits. The industrial host E. coli has only the E3 component as a part of its pyruvate dehydrogenase complex (lpd, E.C. 1.8.1.4, NP_414658) and so it requires the E1α/β and E2 bkd proteins. Exemplary α-ketoacid dehydrogenase complexes include Streptomyces coelicolor bkdA1 (NP_628006) E1α

pathways. Enzymes that interfere with production of branched chain fatty acids include β-ketoacyl-ACP synthase II (E.C. 2.3.1.41) and β-ketoacyl-ACP synthase III (E.C. 2.3.1.41) with straight chain acyl CoA specificity. Exemplary enzymes for deletion include E. coli fabF (NP_415613) and fabH (NP_415609).

In yet another aspect, fatty acids, their intermediates and their derivatives with varying degrees of saturation can be produced in the engineered chemoautotroph as the carbon-based products of interest. In one aspect, hosts are engineered to produce unsaturated fatty acids by over-expressing β-ketoacyl-ACP synthase I (E.C. 2.3.1.41), or by growing the host at low temperatures (for example less than 37° C.). FabB has preference to cis-δ3-decenoyl-ACP and results in unsaturated fatty acid production in E. coli. Over-expression of FabB results in the production of a significant percentage of unsaturated fatty acids [de Mendoza, 1983]. These unsaturated fatty acids can then be used as intermediates in hosts that are engineered to produce fatty acids derivatives, such as fatty alcohols, esters, waxes, olefins, alkanes, and the like. Alternatively, the repressor of fatty acid biosynthesis, E. coli FabR (NP_418398), can be deleted, which can also result in increased unsaturated fatty acid production in E. coli Zhang,
(decarboxylase compo 2002. Further increase in unsaturated fatty acids is achieved nent), S. coelicolor bkdB2 (NP 628005) E1 B 25 by over-expression of heterologous trans-2, cis-3-decenoyl (decarboxylase component), S. coelicolor bkdA3 (NP ACP isomerase and controlled expression of trans-2-enoyl 638004) E2 (dihydrolipoyl transacylase); or S. coelicolor ACP reductase II Marrakchi, 2002b, while deleting E. coli bkdA2 (NP 733618) E1C. (decarboxylase component), S. Fabi (trans-2-enoyl-ACP reductase, E.C. 1.3.1.9, coelicolorbkdB2 (NP 628019) E13 (decarboxylase compo NP 415804) or homologs thereof in the host organism. nent), S. coelicolor bkdC2 (NP 628018) E2 (dihydrolipoyl 30 Exemplary 3-ketoacyl-ACP synthase I include Escherichia transacylase); or S. avermitilis bkdA (BAC72074) E1C. (de coli fabB (BAA16180) and homologs thereof. Exemplary carboxylase component), S. avermitilis bkdB (BAC72075) trans-2, cis-3-decenoyl-ACP isomerase include Streptococ E13 (decarboxylase component), S. avermitilis bkdC cus mutans UA159 FabM (DAA05501) and homologs (BAC72076) E2 (dihydrolipoyl transacylase); S. avermitilis thereof. Exemplary trans-2-enoyl-ACP reductase II include bkdF (E.C.1.2.4.4, BAC72088) E1C. (decarboxylase compo 35 Streptococcus pneumoniae R6 FabK (NP 357969) and nent), S. avermitilis bkdG (BAC72089) E1 B (decarboxylase homologs thereof. To increase production of monounsat component), S. avermitilis bkdH (BAC72090) E2 (dihydro urated fatty acids, the Sfa gene, Suppressor of FabA, can be lipoyl transacylase); B. subtilis bkdAA (NP 390288) E1C. over-expressed Rock, 1996. Exemplary proteins include (decarboxylase component), B. subtilis bkdAB (NP AAN79592 and homologs thereof. One of ordinary skill in 390288) E13 (decarboxylase component), B. subtilis bkdB 40 the art would appreciate that by attenuating fabA, or over (NP 390288) E2 (dihydrolipoyl transacylase); or P putida expressing fabB and expressing specific thioesterases (de bkdA1 (AAA65614) E1C. 
(decarboxylase component), P. scribed above), unsaturated fatty acids, their derivatives, and putida bkdA2 (AAA65615) El B (decarboxylase compo products having a desired carbon chain length can be pro nent). PputidabkdC (AAA65617) E2 (dihydrolipoyltransa duced. cylase); and homologs thereof. An exemplary dihydrolipoyl 45 In some examples the fatty acid or intermediate is produced dehydrogenase is E. colilpd(NP 414658) E3 and homologs in the cytoplasm of the cell. The cytoplasmic concentration thereof. Exemplary beta-ketoacyl-ACP synthases with can be increased in a number of ways, including, but not branched chain acyl CoA specificity include Streptomyces limited to, binding of the fatty acid to coenzyme A to forman coelicolor fabH1 (NP 626634), ACP (NP 626635) and acyl-CoA thioester. Additionally, the concentration of acyl fabF (NP 626636); Streptomyces avermitilis fabH3 (NP 50 CoAS can be increased by increasing the biosynthesis of CoA 823466), fabC3 (NP 823467), fabF (NP 823468); Bacillus in the cell. Such as by over-expressing genes associated with subtilis fabH A (NP 389015), fabH B (NP 388898), ACP pantothenate biosynthesis (panD) or knocking out the genes (NP 389474), fabF (NP 389016); Stenotrophomonas mal associated with glutathione biosynthesis (glutathione syn tophilia SmalDRAFT 0818 (ZP 01643059), thase). Small RAFT 0821 (ZP 01643063), SmalDRAFT 0822 55 Production of Fatty Alcohols as the Carbon-Based Products (ZP 01643064); Legionella pneumophila fabH (YP of Interest 123672), ACP (YP 123675), fabF (YP 123676); and In yet further aspects, hosts cells are engineered to convert homologs thereof. Exemplary crotonyl-CoA reductases acyl-CoA to fatty alcohols by expressing or overexpressing a include Streptomyces coelicolor ccr (NP 630556), Strepto fatty alcohol forming acyl-CoA reductase (FAR. E.C. myces cinnamonensis ccr (AAD53915), and homologs 60 1.1.1.), or an acyl-CoA reductases (E.C. 1.2.1.50) and alco thereof. 
Exemplary isobutyryl-CoA mutases include Strepto hol dehydrogenase (E.C. 1.1.1.1) or a combination of the myces coelicolor icmA & icmB (NP 629554 and foregoing to produce fatty alcohols from acyl-CoA. Herein NP 630904), Streptomyces cinnamomensis icmA and icmB after fatty alcohol forming acyl-CoA reductase (FAR. E.C. (AAC08713 and AJ24.6005), and homologs thereof. Addi 1.1.1.*), acyl-CoA reductases (E.C. 1.2.1.50) and alcohol tionally or alternatively, endogenous genes that normally lead 65 dehydrogenase (E.C. 1.1.1.1) are collectively referred to as to straight chain fatty acids, their intermediates, and deriva fatty alcohol forming peptides. Some fatty alcohol forming tives may be attenuated or deleted to eliminate competing peptides are non-specific and catalyze other reactions as well: US 8,349,587 B2 55 56 for example, some acyl-CoA reductase peptides accept other Aharoni, 2000, Saccharomyces cerevisiae Atfp 1 (NP substrates in addition to fatty acids. Exemplary fatty alcohol 015022), and homologs thereof. forming acyl-CoA reductases include Acinetobacter baylvi In some embodiments, one or more wax synthases (E.C. ADP1 acr1 (AAC45217), Simmondsia chinensis far 2.3.1.75) is expressed in the engineered chemoautotroph to (AAD38039), Mus musculus mfarl (AAHO7178), Mus mus- 5 produce fatty esters including waxes from acyl-CoA and culus mfar2 (AAH55759), Acinetobacter sp. M1 acrM1, alcohols as the carbon-based product of interest. Wax syn Homo sapiens hfar (AAT42129), and homologs thereof. Fatty thase peptides are capable of catalyzing the conversion of an alcohols can be used as Surfactants. acyl-thioester to fatty esters. Some wax synthase peptides can Many fatty alcohols are derived from the products of fatty catalyze other reactions, such as converting short chain acyl O CoAs and short chain alcohols to produce fatty esters. Meth acid biosynthesis. Hence, the production offatty alcohols can 1 ods to identify wax synthase activity are provided in U.S. Pat. 
be controlled by engineering fatty acid biosynthesis in the No. 7,118,896, which is herein incorporated by reference. engineered chemoautotroph. The chain length, branching and Medium-chain waxes that have low melting points, such as degree of saturation offatty acids and their intermediates can octyl octanoate and octyl decanoate, are good candidates for be altered using the methods described herein, thereby affect 15 biofuel to replace triglyceride-based biodiesel. Exemplary ing the nature of the resulting fatty alcohols. wax synthases include Acinetobacter baylvi ADP1 wsadp1, As mentioned above, through the combination of express Acinetobacter baylvi ADP1 wax-dgaT (AAO17391) ing genes that Support brFA synthesis and alcohol synthesis, Kalscheuer, 2003, Saccharomyces cerevisiae Eeb1 (NP branched chain alcohols can be produced. For example, when 015230), Saccharomyces cerevisiae YMR210w (NP an alcohol reductase such as Acrl from Acinetobacter baylvi 20 013937), Simmondsia chinensis acyltransferase ADP1 is coexpressed with a bkd operon, E. coli can synthe (AAD38041), Mus musculus Dgat214 (Q6E1M8), and size isopentanol, isobutanol or 2-methylbutanol. Similarly, homologs thereof. when Acrl is coexpressed with ccrificm genes, E. coli can In other aspects, the engineered chemoautotrophs are synthesize isobutanol. modified to produce a fatty ester-based biofuel by expressing Production of Fatty Esters as the Carbon-Based Products of 25 nucleic acids encoding one or more wax ester synthases in Interest order to confer the ability to synthesize a saturated, unsatur In another aspect, engineered chemoautotrophs produce ated, or branched fatty ester. In some embodiments, the wax various lengths of fatty esters (biodiesel and waxes) as the ester synthesis proteins include, but are not limited to: fatty carbon-based products of interest. Fatty esters can be pro acid elongases, acyl-CoA reductases, acyltransferases or wax duced from acyl-CoAs and alcohols. 
The alcohols can be 30 synthases, fatty acyl transferases, diacylglycerol acyltrans provided in the fermentation media, produced by the engi ferases, acyl-coA wax alcohol acyltransferases, bifunctional neered chemoautotroph itself or produced by a co-cultured wax ester synthase/acyl-CoA: diacylglycerol acyltransferase organism. selected from a multienzyme complex from Simmondsia In some embodiments, one or more alcohol O-acetyltrans chinensis, Acinetobacter sp. strain ADP1 (formerly Acineto ferases is expressed in the engineered chemoautotroph to 35 bacter calcoaceticus ADP1), Pseudomonas aeruginosa, produce fatty esters as the carbon-based product of interest. Fundibacter jadensis, Arabidopsis thaliana, or Alkaligenes Alcohol O-acetyltransferase (E.C. 2.3.1.84) catalyzes the eutrophus. In one embodiment, the fatty acid elongases, acyl reaction of acetyl-CoA and an alcohol to produce CoA and an CoA reductases or wax synthases are from a multienzyme acetic ester. In some embodiments, the alcohol O-acetyltrans complex from Alkaligenes eutrophus and other organisms ferase peptides are co-expressed with selected thioesterase 40 known in the literature to produce wax and fatty acid esters. peptides, FAS peptides and fatty alcohol forming peptides to Many fatty esters are derived from the intermediates and allow the carbon chain length, Saturation and degree of products offatty acid biosynthesis. Hence, the production of branching to be controlled. In other embodiments, the bkd fatty esters can be controlled by engineering fatty acid bio operon can be co-expressed to enable branched fatty acid synthesis in the engineered chemoautotroph. The chain precursors to be produced. 
45 length, branching and degree of Saturation of fatty acids and Alcohol O-acetyltransferase peptides catalyze other reac their intermediates can be altered using the methods tions such that the peptides accept other Substrates in addition described herein, thereby affecting the nature of the resulting to fatty alcohols or acetyl-CoA thioester. Other substrates fatty esters. include other alcohols and other acyl-CoA thioesters. Modi Additionally, to increase the percentage of unsaturated fication of Such enzymes and the development of assays for 50 fatty acid esters, the engineered chemoautotroph can also characterizing the activity of a particular alcohol O-acetyl overexpress Sfa which encodes a suppressor of fabA transferase peptides are within the scope of a skilled artisan. (AAN79592, AAC44390), B-ketoacyl-ACP synthase I (E.C. Engineered O-acetyltransferases and O-acyltransferases can 2.3.1.41, BAA 16180), and secC null mutant suppressors be created that have new activities and specificities for the (cold shock proteins) gns.A and gnsB (ABD18647 and donor acyl group or acceptor alcohol moiety. 55 AAC74076). In some examples, the endogenous fabF gene Alcohol acetyl transferases (AATs, E.C. 2.3.1.84), which can be attenuated, thus, increasing the percentage of palmi are responsible for acyl acetate production in various plants, toleate (C 16:1) produced. can be used to produce medium chain length waxes, such as Optionally a wax ester exporter such as a member of the octyl octanoate, decyl octanoate, decyl decanoate, and the FATP family is used to facilitate the release of waxes or esters like. Fatty esters, synthesized from medium chain alcohol 60 into the extracellular environment from the engineered (such as C6, C8) and medium chain acyl-CoA (or fatty acids, chemoautotroph. An exemplary wax esterexporter that can be such as C6 or C8) have a relative low melting point. 
For used is fatty acid (long chain) transport protein CG7400-PA, example, hexylhexanoate has a melting point of -55° C. and isoform. A from D. melanogaster (NP 524723), or homologs octyl octanoate has a melting point of-18 to - 17°C. The low thereof. melting points of these compounds make them good candi 65 The centane number (CN), Viscosity, melting point, and dates for use as biofuels. Exemplary alcohol acetyltrans heat of combustion for various fatty acid esters have been ferases include Fragariaxananassa SAAT (AAG 13130) characterized in for example, Knothe, 2005. Using the US 8,349,587 B2 57 58 teachings provided herein the engineered chemoautotroph phoslactomycin B gene cluster of Streptomyces sp. HK803 can be engineered to produce any one of the fatty acid esters Palaniappan, 2003 together with the acyl-CoA isomerase described in Knothe, 2005. (chch3 gene) Patton, 2000 from S. collinus, S. avermitilis or Production of Alkanes as the Carbon-Based Products of Inter S. coelicolor. Exemplary ansatrienin gene cluster enzymes est include AAC44655, AAF73478 and homologs thereof. In another aspect, engineered chemoautotrophs produce Exemplary phoslactomycin B gene cluster enzymes include alkanes of various chain lengths (hydrocarbons) as the car AAQ84158, AAQ84159, AAQ84160, AAQ84161 and bon-based products of interest. Many alkanes are derived homologs thereof. Exemplary chcB enzymes include from the products of fatty acid biosynthesis. Hence, the pro NP 629292, AAF73478 and homologs thereof. duction of alkanes can be controlled by engineering fatty acid 10 The genes (fabH, ACP and fabF) are sufficient to allow biosynthesis in the engineered chemoautotroph. The chain initiation and elongation of co-cyclic fatty acids, because they length, branching and degree of saturation of fatty acids and can have broad substrate specificity. 
In the event that coex their intermediates can be altered using the methods pression of any of these genes with the ansJKLM/chcAB or described herein. The chain length, branching and degree of pml.JKLM/chcAB genes does not yield cyFAs, fabH, ACP saturation of alkanes can be controlled through their fatty acid 15 and/or fabF homologs from microorganisms that make cyFAS biosynthesis precursors. can be isolated (e.g., by using degenerate PCR primers or In certain aspects, fatty aldehydes can be converted to heterologous DNA probes) and coexpressed. alkanes and CO in the engineered chemoautotroph via the Production of Halogenated Derivatives of Fatty Acids expression of decarbonylases Cheesbrough, 1984; Dennis, Genes are known that can produce fluoroacetyl-CoA from 1991. Exemplary enzymes include Arabidopsis thaliana fluoride ion. In one embodiment, the present invention allows cer1 (NP 171723), Oryza sativa cer1 CER1 (AAD29719) for production offluorinated fatty acids by combining expres and homologs thereof. sion of fluoroacetate-involved genes (e.g., fluorinase, nucle In another aspect, fatty alcohols can be converted to otide phosphorylase, fluorometabolite-specific aldolases, alkanes in the engineered chemoautotroph via the expression fluoroacetaldehyde dehydrogenase, and fluoroacetyl-CoA of terminal alcohol oxidoreductases as in Vibrio firmissii M1 25 synthase). Park, 2005. Transport/Efflux/Release of Fatty Acids and their Derivatives Production of Olefins as the Carbon-Based Products of Inter Also disclosed herein is a system for continuously produc est ing and exporting hydrocarbons out of recombinant host In another aspect, engineered chemoautotrophs produce microorganisms via a transport protein. Many transport and olefins (hydrocarbons) as the carbon-based products of inter 30 efflux proteins serve to excrete a large variety of compounds est. 
Olefins are derived from the intermediates and products and can be evolved to be selective for a particular type offatty of fatty acid biosynthesis. Hence, the production of olefins acid. Thus, in some embodiments an ABC transporter can be can be controlled by engineering fatty acid biosynthesis in the functionally expressed by the engineered chemoautotroph, so engineered chemoautotroph. Introduction of genes affecting that the organism exports the fatty acid into the culture the production of unsaturated fatty acids, as described above, 35 medium. In one example, the ABC transporter is an ABC can result in the production of olefins. Similarly, the chain transporter from Caenorhabditis elegans, Arabidopsis thala length of olefins can be controlled by expressing, overex nia, Alkaligenes eutrophus or Rhodococcus erythropolis or pressing or attenuating the expression of endogenous and homologs thereof. Exemplary transporters include heterologous thioesterases which control the chain length of AAU44368, NP 188746, NP 175557, AAN73268 or the fatty acids that are precursors to olefin biosynthesis. Also, 40 homologs thereof. by controlling the expression of endogenous and heterolo The transport protein, for example, can also be an efflux gous enzymes associated with branched chain fatty acid bio protein selected from: AcrAB (NP 414996.1, synthesis, the production of branched chain olefins can be NP 414995.1), ToIC (NP 417507.2) and AcrEF (NP enhanced. Methods for controlling the chain length and 417731.1, NP 417732.1) from E. coli, or t11 1618 (NP branching of fatty acid biosynthesis intermediates and prod 45 682408), t11 1619 (NP 682409), t110139 (NP 680930), ucts are described above. H1 1619 and U10139 from Thermosynechococcus elongatus Production of co-Cyclic Fatty Acids and their Derivatives as BP-I or homologs thereof. 
the Carbon-Based Products of Interest In addition, the transport protein can be, for example, a In another aspect, the engineered chemoautotroph of the fatty acid transport protein (FATP) selected from Drosophila present invention produces ()-cyclic fatty acids (cyFAS) as the 50 melanogaster; Caenorhabditis elegans, Mycobacterium carbon-based product of interest. To synthesize ()-cyclic fatty tuberculosis or Saccharomyces cerevisiae, Acinetobacter sp. acids (cyFAS), several genes need to be introduced and H01-N, any one of the mammalian FATPs or homologs expressed that provide the cyclic precursor cyclohexylcarbo thereof. The FATPs can additionally be resynthesized with nyl-CoA Cropp, 2000. The genes (fabH, ACP and fabF) can the membranous regions reversed in order to invert the direc then be expressed to allow initiation and elongation of co-cy 55 tion of Substrate flow. Specifically, the sequences of amino clic fatty acids. Alternatively, the homologous genes can be acids composing the hydrophilic domains (or membrane isolated from microorganisms that make cyFAS and domains) of the protein can be inverted while maintaining the expressed in E. coli. Relevant genes include bkdC, lpd, fabH, same codons for each particular amino acid. The identifica ACP fabF, fabH1, ACP, fabF, fabH3, fabC3, fabF, fabH A, tion of these regions is well known in the art. fabH B, ACP. 60 Production of Isoprenoids as the Carbon-Based Products of Expression of the following genes are sufficient to provide Interest cyclohexylcarbonyl-CoA in E. coli: ans.J. 
ansK, ansL, chcA In one aspect, the engineered chemoautotroph of the (1-cyclohexenylcarbonyl CoA reductase) and ansM from the present invention produces isoprenoids or their precursors ansatrienin gene cluster of Streptomyces collinus Chen, isopentenyl pyrophosphate (IPP) and its isomer, dimethylal 1999 or plmJK (5-enolpyruvylshikimate-3-phosphate syn 65 lyl pyrophosphate (DMAPP) as the carbon-based products of thase), plmL (acyl-CoA dehydrogenase), chcA (enoyl-(ACP) interest. There are two known biosynthetic pathways that reductase) and plmM (2,4-dienoyl-CoA reductase) from the synthesize IPP and DMAPP. Prokaryotes, with some excep US 8,349,587 B2 59 60 tions, use the mevalonate-independent or deoxyxylulose kinase (E.C. 2.7.4.2), mevalonate pyrophosphate decarboxy 5-phosphate (DXP) pathway to produce IPP and DMAPP lase (E.C. 4.1.1.33), isopentenyl pyrophosphate isomerase separately through a branch point (FIG. 15). Eukaryotes other (E.C. 5.3.3.2). In one embodiment, the engineered chemoau than plants use the mevalonate-dependent (MEV) isoprenoid totroph of the present invention expresses one or more pathway exclusively to convert acetyl-coenzyme A (acetyl enzymes from the MEV pathway. For example, one or more CoA) to IPP, which is subsequently isomerized to DMAPP exogenous proteins can be selected from acetyl-CoA thiolase, (FIG. 16). In general, plants use both the MEV and DXP HMG-CoA synthase, HMG-CoA reductase, mevalonate pathways for IPP synthesis. kinase, phosphomevalonate kinase, mevalonate pyrophos The reactions in the DXP pathway are catalyzed by the phate decarboxylase and isopentenyl pyrophosphate following enzymes: 1-deoxy-D-xylulose-5-phosphate Syn 10 isomerase. The host organism can also express two or more, thase (E.C. 2.2.1.7), 1-deoxy-D-xylulose-5-phosphate reduc three or more, four or more, and the like, including up to all toisomerase (E.C. 1.1.1.267), 4-diphosphocytidyl-2C-me the protein and enzymes that confer the MEV pathway. 
thyl-D-erythritol synthase (E.C. 2.7.7.60), Exemplary acetyl-CoA thiolases include NC 000913 4-diphosphocytidyl-2C-methyl-D-erythritol kinase (E.C. REGION: 232413 L.2325315, E. coli: D49362, Paracoccus 2.7.1.148), 2C-methyl-D-erythritol 2.4-cyclodiphosphate 15 denitrificans; L20428, S. cerevisiae; and homologs thereof. synthase (E.C. 4.6.1.12), (E)-4-hydroxy-3-methylbut-2-enyl Exemplary HMG-CoA synthases include NC 001145 diphosphate synthase (E.C. 1.17.7.1), isopentyl/dimethylal complement 19061 . . . 20536, S. cerevisiae; X96617, S. lyl diphosphate synthase or 4-hydroxy-3-methylbut-2-enyl cerevisiae; X83882, A. thaliana; AB037907, Kitasatospora diphosphate reductase (E.C. 1.17.1.2). In one embodiment, griseola; BT007302, H. sapiens; NC 002758, Locus tag the engineered chemoautotroph of the present invention SAV2546, GeneID 1 122571, S. aureus; and homologs expresses one or more enzymes from the DXP pathway. For thereof. Exemplary HMG-CoA reductases include example, one or more exogenous proteins can be selected NM 206548, D. melanogaster, NC 002758, Locus tag from 1-deoxy-D-xylulose-5-phosphate reductoisomerase, SAV2545, GeneID 1 122570, S. aureus; NM 204485, Gallus 4-diphosphocytidyl-2C-methyl-D-erythritol synthase, gallus; AB015627, Streptomyces sp. KO 3988: AF542543, 4-diphosphocytidyl-2C-methyl-D-erythritol kinase, 2C-me 25 Nicotiana attenuata; AB037907, Kitasatospora griseola; thyl-D-erythritol 2.4-cyclodiphosphate synthase, (E)-4-hy AX128213, providing the sequence encoding a truncated droxy-3-methylbut-2-enyl diphosphate synthase, and 4-hy HMGR, S. cerevisiae: NC 001145: complement 115734... droxy-3-methylbut-2-enyl diphosphate reductase. The host 1 18898, S. cerevisiae; and homologs thereof. Exemplary organism can also express two or more, three or more, four or mevalonate kinases include L77688, A. thaliana; X55875, S. more, and the like, including up to all the protein and enzymes 30 cerevisiae; and homologs thereof. 
Exemplary phosphomeva that confer the DXP pathway. Exemplary 1-deoxy-D-xylu lonate kinases include AF429385, Hevea brasiliensis; lose-5-phosphate synthases include E. coli DXs (AAC46162): NM 006556, H. sapiens; NC 001145 complement P. putida KT2440 DXs (AAN66154); Salmonella enterica 712315 . . . 713670, S. cerevisiae; and homologs thereof. Paratyphi, see ATCC 9150 DXs (AAV78186); Rhodobacter Exemplary mevalonate pyrophosphate decarboxylase sphaeroides 2.4.1 DXs (YP 353327); Rhodopseudomonas 35 include X97557, S. cerevisiae: AF290095, E. faecium: palustris CGA009 DxS (NP 94.6305); Xylella fastidiosa U49260. H. sapiens; and homologs thereof. Exemplary iso Temeculal DxS (NP 779493); Arabidopsis thaliana DXs pentenyl pyrophosphate isomerases include NC 000913, (NP 001078570 and/or NP 196699); and homologs 3031087 . . .3031635, E. coli: AF082326, Haematococcus thereof. Exemplary 1-deoxy-D-xylulose-5-phosphate reduc pluvialis; and homologs thereof. toisomerases include E. coli DXr (BAA32426); Arabidopsis 40 In some embodiments, the host cell produces IPP via the thaliana DXR (AAF73140); Pseudomonas putida KT2440 MEV pathway, either exclusively or in combination with the DXr (NP 743754 and/or Q88 MH4); Streptomyces coeli DXP pathway. In other embodiments, a host cells DXP path color A3(2) DXr (NP 629822); Rhodobacter sphaeroides way is functionally disabled so that the host cell produces IPP 2.4.1 DXr (YP 352764); Pseudomonas fluorescens Pfo-1 exclusively through a heterologously introduced MEV path DXr (YP 346389); and homologs thereof. Exemplary 45 way. The DXP pathway can be functionally disabled by dis 4-diphosphocytidyl-2C-methyl-D-erythritol synthases abling gene expression or inactivating the function of one or include E. coli Isp) (AAF43.207); Rhodobacter sphaeroides more of the DXP pathway enzymes. 
2.4.1 IspD (YP 352876); Arabidopsis thaliana ISPD (NP In some embodiments, the host cell produces IPP via the 565286); P putida KT2440 IspD (NP 743771); and DXP pathway, either exclusively or in combination with the homologs thereof. Exemplary 4-diphosphocytidyl-2C-me 50 MEV pathway. In other embodiments, a host cell's MEV thyl-D-erythritol kinases include E. coli IspE (AAF29530): pathway is functionally disabled so that the host cell produces Rhodobacter sphaeroides 2.4.1 Isp (YP 351828); and IPP exclusively through a heterologously introduced DXP homologs thereof. Exemplary 2C-methyl-D-erythritol 2,4- pathway. The MEV pathway can be functionally disabled by cyclodiphosphate synthases include E. coli IspF disabling gene expression or inactivating the function of one (AAF44656); Rhodobacter sphaeroides 2.4.1 IspF (YP 55 or more of the MEV pathway enzymes. 352877); P putida KT2440 IspF (NP 743775); and Provided herein is a method to produce isoprenoids in homologs thereof. Exemplary (E)-4-hydroxy-3-methylbut-2- engineered chemoautotrophs engineered with the isopente enyldiphosphate synthase include E. coli IspG (AAK53460); nyl pyrophosphate pathway enzymes. Some examples of iso P. putida KT2440 IspG (NP 743.014); Rhodobacter prenoids include: hemiterpenes (derived from 1 isoprene sphaeroides 2.4.1 IspG (YP 353044); and homologs 60 unit) Such as isoprene; monoterpenes (derived from 2 iso thereof. Exemplary 4-hydroxy-3-methylbut-2-enyl diphos prene units) Such as myrcene; sesquiterpenes (derived from 3 phate reductases include E. coli IspH (AAL38655); Pputida isoprene units) Such as amorpha-4,11-diene; diterpenes (de KT2440 IspH(NP 742768); and homologs thereof. rived from four isoprene units) Such as taxadiene; triterpenes The reactions in the MEV pathway are catalyzed by the (derived from 6 isoprene units) Such as squalene; tetraterpe following enzymes: acetyl-CoA thiolase, HMG-CoA syn 65 nes (derived from 8 isoprenoids) such as B-carotene; and thase (E.C. 
2.3.3.10), HMG-CoA reductase (E.C. 1.1.1.34), polyterpenes (derived from more than 8 isoprene units) Such mevalonate kinase (E.C. 2.7.1.36), phosphomevalonate as polyisoprene. The production of isoprenoids is also US 8,349,587 B2 61 62 described in some detail in the published PCT applications Exemplary (R)-limonene synthases include that from Citrus WO2007/139925 and WO/2007/140339. limon (AAM53946) and homologs thereof. Exemplary (4S)- In another embodiment, the engineered chemoautotroph of limonene synthases include that from Mentha spicata the present invention produces rubber as the carbon-based (AAC37366) and homologs thereof. product of interest via the isopentenyl pyrophosphate path 5 Production of Glycerol or 1.3-Propanediol as the Carbon way enzymes and cis-polyprenylcistransferase (E.C. Based Products of Interest 2.5.1.20) which converts isopentenyl pyrophosphate to rub In one aspect, the engineered chemoautotroph of the ber. The enzyme cis-polyprenylcistransferase may come present invention produces glycerol or 1,3-propanediol as the from, for example, Hevea brasiliensis. carbon-based products of interest (FIG. 17). The reactions in In another embodiment, the engineered chemoautotroph of 10 the glycerol pathway are catalyzed by the following enzymes: the present invention produce isopentanol as the carbon sn-glycerol-3-P dehydrogenase (E.C. 1.1.1.8 or E.C. based product of interest via the isopentenyl pyrophosphate 1.1.1.94) and sn-glycerol-3-phosphatase (E.C. 3.1.3.21). To pathway enzymes and isopentanol dikinase. produce 1.3.-propanediol, the following enzymes are also In another embodiment, the engineered chemoautotroph included: sn-glycerol-3-P. glycerol dehydratase (E.C. produces squalene as the carbon-based product of interest via 15 4.2.1.30) and 1,3-propanediol oxidoreductase (E.C. the isopentenyl pyrophosphate pathway enzymes, geranyl 1.1.1.202). Exemplary sn-glycerol-3-P dehydrogenases diphosphate synthase (E.C. 
2.5.1.1), farnesyl diphosphate synthase (E.C. 2.5.1.10) and squalene synthase (E.C. 2.5.1.21). Geranyl diphosphate synthase converts dimethylallyl pyrophosphate and isopentenyl pyrophosphate to geranyl diphosphate. Farnesyl diphosphate synthase converts geranyl diphosphate and isopentenyl diphosphate to farnesyl diphosphate. A bifunctional enzyme carries out the conversion of dimethylallyl pyrophosphate and two isopentenyl pyrophosphate to farnesyl pyrophosphate. Exemplary enzymes include Escherichia coli IspA (NP_414955) and homologs thereof. Squalene synthase converts two farnesyl pyrophosphate and NADPH to squalene. In another embodiment, the engineered chemoautotroph produces lanosterol as the carbon-based product of interest via the above enzymes, squalene monooxygenase (E.C. 1.14.99.7) and lanosterol synthase (E.C. 5.4.99.7). Squalene monooxygenase converts squalene, NADPH and O2 to (S)-squalene-2,3-epoxide. Exemplary enzymes include Saccharomyces cerevisiae Erg1 (NP_011691) and homologs thereof.

include Saccharomyces cerevisiae dar1 and homologs thereof. Exemplary sn-glycerol-3-phosphatases include Saccharomyces cerevisiae gpp2 and homologs thereof. Exemplary sn-glycerol-3-P. glycerol dehydratases include K. pneumoniae dhaB1-3. Exemplary 1,3-propanediol oxidoreductases include K. pneumoniae dhaT.
Production of 1,4-Butanediol or 1,3-Butadiene as the Carbon-Based Products of Interest
In one aspect, the engineered chemoautotroph of the present invention produces 1,4-butanediol or 1,3-butadiene as the carbon-based products of interest. The metabolic reactions in the 1,4-butanediol or 1,3-butadiene pathway are catalyzed by the following enzymes: succinyl-CoA dehydrogenase (E.C. 1.2.1.n; e.g., C. kluyveri SucD), 4-hydroxybutyrate dehydrogenase (E.C. 1.1.1.2; e.g., Arabidopsis thaliana GHBDH), aldehyde dehydrogenase (E.C. 1.1.1.n; e.g., E. coli AldH), 1,3-propanediol oxidoreductase (E.C. 1.1.1.202; e.g., K. pneumoniae DhaT), and optionally
Lanosterol synthase converts (S)-squalene-2,3-epoxide to lanosterol. Exemplary enzymes include Saccharomyces cerevisiae Erg7 (NP_011939) and homologs thereof.
In another embodiment, the engineered chemoautotroph of the present invention produces lycopene as the carbon-based product of interest via the isopentenyl pyrophosphate pathway enzymes, geranyl diphosphate synthase (E.C. 2.5.1.21, described above), farnesyl diphosphate synthase (E.C. 2.5.1.10, described above), geranylgeranyl pyrophosphate synthase (E.C. 2.5.1.29), phytoene synthase (E.C. 2.5.1.32), phytoene oxidoreductase (E.C. 1.14.99.n) and ζ-carotene oxidoreductase (E.C. 1.14.99.30). Geranylgeranyl pyrophosphate synthase converts isopentenyl pyrophosphate and farnesyl pyrophosphate to (all trans)-geranylgeranyl pyrophosphate. Exemplary geranylgeranyl pyrophosphate synthases include Synechocystis sp. PCC6803 crtE (NP_440010) and homologs thereof. Phytoene synthase converts 2 geranylgeranyl-PP to phytoene. Exemplary enzymes include Synechocystis sp. PCC6803 crtB (P37294). Phytoene oxidoreductase converts phytoene, 2 NADPH and 2 O2 to ζ-carotene. Exemplary enzymes include Synechocystis sp. PCC6803 crtI and Synechocystis sp. PCC6714 crtI (P21134). ζ-carotene oxidoreductase converts ζ-carotene, 2 NADPH and 2 O2 to lycopene. Exemplary enzymes include Synechocystis sp. PCC6803 crtO-2 (NP_441720).
In another embodiment, the engineered chemoautotroph of the present invention produces limonene as the carbon-based product of interest via the isopentenyl pyrophosphate pathway enzymes, geranyl diphosphate synthase (E.C. 2.5.1.21, described above) and one of (R)-limonene synthase (E.C. 4.2.3.20) and (4S)-limonene synthase (E.C. 4.2.3.16) which convert geranyl diphosphate to a limonene enantiomer.

alcohol dehydratase (E.C. 4.2.1.-). Succinyl-CoA dehydrogenase converts succinyl-CoA and NADPH to succinic semialdehyde and CoA. 4-hydroxybutyrate dehydrogenase converts succinic semialdehyde and NADPH to 4-hydroxybutyrate. Aldehyde dehydrogenase converts 4-hydroxybutyrate and NADH to 4-hydroxybutanal. 1,3-propanediol oxidoreductase converts 4-hydroxybutanal and NADH to 1,4-butanediol. Alcohol dehydratase converts 1,4-butanediol to 1,3-butadiene.
Production of Polyhydroxybutyrate as the Carbon-Based Products of Interest
In one aspect, the engineered chemoautotroph of the present invention produces polyhydroxybutyrate as the carbon-based products of interest (FIG. 18). The reactions in the polyhydroxybutyrate pathway are catalyzed by the following enzymes: acetyl-CoA:acetyl-CoA C-acetyltransferase (E.C. 2.3.1.9), (R)-3-hydroxyacyl-CoA:NADP+ oxidoreductase (E.C. 1.1.1.36) and polyhydroxyalkanoate synthase (E.C. 2.3.1.-). Exemplary acetyl-CoA:acetyl-CoA C-acetyltransferases include Ralstonia eutropha phaA. Exemplary (R)-3-hydroxyacyl-CoA:NADP+ oxidoreductases include Ralstonia eutropha phaB. Exemplary polyhydroxyalkanoate synthases include Ralstonia eutropha phaC. In the event that the host organism also has the capacity to degrade polyhydroxybutyrate, the corresponding degradation enzymes, such as poly[(R)-3-hydroxybutanoate] hydrolase (E.C. 3.1.1.75), may be inactivated. Hosts that lack the ability to naturally synthesize polyhydroxybutyrate generally also lack the capacity to degrade it, thus leading to irreversible accumulation of polyhydroxybutyrate if the biosynthetic pathway is introduced.
Intracellular polyhydroxybutyrate can be measured by solvent extraction and esterification of the polymer from whole cells. Typically, lyophilized biomass is extracted with methanol-chloroform with 10% HCl as a catalyst. The chloroform dissolves the polymer, and the methanol esterifies it in the presence of HCl.

enzyme that converts 3-hydroxypropionate, ATP and NADPH to propionyl-CoA. Exemplary propionyl-CoA synthases include AAL47820, and homologs thereof. SEQ ID
The resulting mixture is extracted with water to remove hydrophilic substances and the organic phase is analyzed by G.C.
Production of Lysine as the Carbon-Based Products of Interest
In one aspect, the engineered chemoautotroph of the present invention produces lysine as the carbon-based product of interest. There are several known lysine biosynthetic pathways. One lysine biosynthesis pathway is depicted in FIG. 19. The reactions in one lysine biosynthetic pathway are catalyzed by the following enzymes: aspartate aminotransferase (E.C. 2.6.1.1; e.g., E. coli AspC), aspartate kinase (E.C. 2.7.2.4; e.g., E. coli LysC), aspartate semialdehyde dehydrogenase (E.C. 1.2.1.11; e.g., E. coli Asd), dihydrodipicolinate synthase (E.C. 4.2.1.52; e.g., E. coli DapA), dihydrodipicolinate reductase (E.C. 1.3.1.26; e.g., E. coli DapB), tetrahydrodipicolinate succinylase (E.C. 2.3.1.117; e.g., E. coli DapD), N-succinyldiaminopimelate-aminotransferase (E.C.

NO:30 represents the E. coli codon optimized coding sequence for this propionyl-CoA synthase of the present invention. In one aspect, the invention provides nucleic acid molecule and homologs, variants and derivatives of SEQ ID NO:30. The nucleic acid sequence can be preferably 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even higher identity to SEQ ID NO:30. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of the wild-type propionyl-CoA synthase gene. In another embodiment, the invention provides a nucleic acid encoding a polypeptide having the amino acid sequence of SEQ ID NO:31.
Integration of Metabolic Pathways into Host Metabolism
The engineered chemoautotrophs of the invention can be produced by introducing expressible nucleic acids encoding one or more of the enzymes or proteins participating in one or more energy conversion, carbon fixation and, optionally, carbon product biosynthetic pathways.
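The percent-identity thresholds recited above for SEQ ID NO:30 (78% and higher) can be made concrete with a small calculation. The following is a minimal sketch and not the method of the invention: it assumes the two sequences have already been aligned to equal length with `-` marking gaps, and it counts a gap opposite a base as a mismatch, which is only one of several conventions in use. The function names and the toy sequences are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Assumption: both inputs are already aligned to the same length,
    with '-' marking gaps. Columns where both sequences have a gap
    are skipped; a gap opposite a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 78.0) -> bool:
    """True if the identity is at or above the threshold (78% is the lowest value in the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy sequences (hypothetical, not SEQ ID NO:30): one mismatch in ten positions.
print(percent_identity("ATGCATGCAT", "ATGCATGCAA"))
print(meets_threshold("ATG-", "ATGC"))
```

In practice the alignment itself would come from a dedicated alignment tool; only the counting step is shown here.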
2.6.1.17; e.g., E. coli ArgD), N-succinyl-L-diaminopimelate desuccinylase (E.C. 3.5.1.18; e.g., E. coli DapE), diaminopimelate epimerase (E.C. 5.1.1.7; E. coli DapF), diaminopimelate decarboxylase (E.C. 4.1.1.20; e.g., E. coli LysA). In one embodiment, the engineered chemoautotroph of the present invention expresses one or more enzymes from a lysine biosynthetic pathway. For example, one or more exogenous proteins can be selected from aspartate aminotransferase, aspartate kinase, aspartate semialdehyde dehydrogenase, dihydrodipicolinate synthase, dihydrodipicolinate reductase, tetrahydrodipicolinate succinylase, N-succinyldiaminopimelate-aminotransferase, N-succinyl-L-diaminopimelate desuccinylase, diaminopimelate epimerase, diaminopimelate decarboxylase, L,L-diaminopimelate aminotransferase (E.C. 2.6.1.83; e.g., Arabidopsis thaliana At4g33680), homocitrate synthase (E.C. 2.3.3.14; e.g., Saccharomyces cerevisiae LYS21), homoaconitase (E.C. 4.2.1.36; e.g., Saccharomyces cerevisiae LYS4, LYS3), homoisocitrate dehydrogenase (E.C. 1.1.1.87; e.g., Saccharomyces cerevisiae LYS12, LYS11, LYS10), 2-aminoadipate transaminase (E.C. 2.6.1.39; e.g., Saccharomyces cerevisiae ARO8), 2-aminoadipate reductase (E.C. 1.2.1.31; e.g., Saccharomyces cerevisiae LYS2, LYS5), aminoadipate semialdehyde-glutamate reductase (E.C. 1.5.1.10; e.g., Saccharomyces cerevisiae LYS9, LYS13), lysine-2-oxoglutarate reductase (E.C. 1.5.1.7; e.g., Saccharomyces cerevisiae LYS1). The host organism can also express two or more, three or more, four or more, and the like, including up to all the protein and enzymes that confer lysine biosynthesis.
Production of γ-Valerolactone as the Carbon-Based Product of Interest
In some embodiments, the engineered chemoautotroph of the present invention is engineered to produce γ-valerolactone as the carbon-based product of interest. One example γ-valerolactone biosynthetic pathway is shown in FIG. 20.

Depending on the host organism chosen for conferring a chemoautotrophic capability, nucleic acids for some or all of particular metabolic pathways can be expressed. For example, if a chosen host is deficient in one or more enzymes or proteins for desired metabolic pathways, then expressible nucleic acids for the deficient enzyme(s) or protein(s) are introduced into the host for subsequent exogenous expression. Alternatively, if the chosen host exhibits endogenous expression of some pathway genes, but is deficient in others, then an encoding nucleic acid is needed for the deficient enzyme(s) or protein(s) to achieve production of desired carbon products from inorganic energy and inorganic carbon. Thus, an engineered chemoautotroph of the invention can be produced by introducing exogenous enzyme or protein activities to obtain desired metabolic pathways or desired metabolic pathways can be obtained by introducing one or more exogenous enzyme or protein activities that, together with one or more endogenous enzymes or proteins, produces a desired product such as reduced cofactors, central metabolites and/or carbon-based products of interest.
Depending on the metabolic pathway constituents of a selected host microbial organism, the engineered chemoautotrophs of the invention can include at least one exogenously expressed metabolic pathway-encoding nucleic acid and up to all encoding nucleic acids for one or more energy conversion, carbon fixation and, optionally, carbon-based product pathways. For example, a RuMP-derived carbon fixation pathway can be established in a host deficient in a pathway enzyme or protein through exogenous expression of the corresponding encoding nucleic acid. In a host deficient in all enzymes or proteins of a metabolic pathway, exogenous expression of all enzyme or proteins in the pathway can be included, although it is understood that all enzymes or proteins of a pathway can be expressed even if the host contains
In one embodiment, the engineered chemoautotroph is engineered to express one or more of the following enzymes: propionyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-), beta-ketothiolase (E.C. 2.3.1.16; e.g., Ralstonia eutropha BktB), acetoacetyl-CoA reductase (E.C. 1.1.1.36; e.g., Ralstonia eutropha PhaB), 3-hydroxybutyryl-CoA dehydratase (E.C. 4.2.1.55; e.g., X. axonopodis Crt), vinylacetyl-CoA Δ-isomerase (E.C. 5.3.3.3; e.g., C. difficile AbfD), 4-hydroxybutyryl-CoA transferase (E.C. 2.8.3.-; e.g., C. kluyveri OrfA), 1,4-lactonase (E.C. 3.1.1.25; e.g., that from R. norvegicus). Propionyl-CoA synthase is a multi-functional

at least one of the pathway enzymes or proteins. For example, exogenous expression of all enzymes or proteins in a carbon fixation pathway derived from the 3-HPA bicycle can be included, such as the acetyl-CoA carboxylase, malonyl-CoA reductase, propionyl-CoA synthase, propionyl-CoA carboxylase, methylmalonyl-CoA epimerase, methylmalonyl-CoA mutase, succinyl-CoA:(S)-malate CoA transferase, succinate dehydrogenase, fumarate hydratase, (S)-malyl-CoA/β-methylmalyl-CoA/(S)-citramalyl-CoA lyase, mesaconyl-C1-CoA hydratase, mesaconyl-CoA C1-C4 CoA transferase, and mesaconyl-C4-CoA hydratase. Given the teachings and guidance provided herein, those skilled in the art would understand that the number of encoding nucleic acids to introduce in an expressible form can, at least, parallel the metabolic pathway deficiencies of the selected host microbial organism.

usually, but not exclusively, arise from metabolism endogenous to the host cell or organism.
A combination of different approaches may be used to identify candidate genetic modifications.
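The statement above, that the number of encoding nucleic acids to introduce can parallel the pathway deficiencies of the selected host, amounts to a set difference between the enzymes a pathway needs and the enzymes the host already supplies. The sketch below illustrates this with the 3-HPA bicycle enzymes named in the text. The host's endogenous enzyme set is a purely hypothetical input chosen for illustration, not a statement about any real organism, and the function name is likewise an assumption.

```python
# Enzymes of a 3-HPA-bicycle-derived carbon fixation pathway, as listed in the text.
THREE_HPA_BICYCLE = [
    "acetyl-CoA carboxylase",
    "malonyl-CoA reductase",
    "propionyl-CoA synthase",
    "propionyl-CoA carboxylase",
    "methylmalonyl-CoA epimerase",
    "methylmalonyl-CoA mutase",
    "succinyl-CoA:(S)-malate CoA transferase",
    "succinate dehydrogenase",
    "fumarate hydratase",
    "(S)-malyl-CoA/beta-methylmalyl-CoA/(S)-citramalyl-CoA lyase",
    "mesaconyl-C1-CoA hydratase",
    "mesaconyl-CoA C1-C4 CoA transferase",
    "mesaconyl-C4-CoA hydratase",
]


def enzymes_to_introduce(pathway, host_endogenous):
    """Return the pathway enzymes the host lacks, in pathway order."""
    endogenous = set(host_endogenous)
    return [enzyme for enzyme in pathway if enzyme not in endogenous]


# Hypothetical host profile (an assumption for this example only).
hypothetical_host = {
    "acetyl-CoA carboxylase",
    "succinate dehydrogenase",
    "fumarate hydratase",
}

missing = enzymes_to_introduce(THREE_HPA_BICYCLE, hypothetical_host)
print(len(missing), "encoding nucleic acids would be needed at minimum")
```

As the text notes, this count is a lower bound: all enzymes of a pathway can still be expressed exogenously even when the host already contains some of them.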
Genetic Engineering Methods for Optimization of Metabolic Pathways
In some embodiments, the engineered chemoautotrophs of the invention also can include other genetic modifications that facilitate or optimize production of a carbon-based product from an inorganic energy source and inorganic carbon or that confer other useful functions onto the host organism.
In one aspect, the expression levels of the proteins of interest of the energy conversion pathways, carbon fixation pathways and, optionally, carbon product biosynthetic pathways can be either increased or decreased by, for example, replacing or altering the expression control sequences with alternate expression control sequences encoded by standardized genetic parts. The exogenous standardized genetic parts can regulate the expression of either heterologous or endogenous genes of the metabolic pathway. Altered expression of the enzyme or enzymes and/or protein or proteins of a metabolic pathway can occur, for example, through changing gene position or gene order [Smolke, 2002b], altered gene copy number [Smolke, 2002a], replacement of endogenous, naturally occurring regulated promoters with constitutive or inducible synthetic promoters, mutation of the ribosome binding sites [Wang, 2009], or introduction of RNA secondary structural elements and/or cleavage sites [Smolke, 2000; Smolke, 2001].
In another aspect, some engineered chemoautotrophs of the present invention may require specific transporters to facilitate uptake of inorganic energy sources and/or inorganic carbon sources. In some embodiments, the engineered chemoautotrophs use formate as an inorganic energy source, inorganic carbon source or both.

Such approaches include, for example, metabolomics (which may be used to identify undesirable products and metabolic intermediates that accumulate inside the cell), metabolic modeling and isotopic labeling (for determining the flux through metabolic reactions contributing to hydrocarbon production), and conventional genetic techniques (for eliminating or substantially disabling unwanted metabolic reactions). For example, metabolic modeling provides a means to quantify fluxes through the cell's metabolic pathways and determine the effect of elimination of key metabolic steps. In addition, metabolomics and metabolic modeling enable better understanding of the effect of eliminating key metabolic steps on production of desired products.
To predict how a particular manipulation of metabolism affects cellular metabolism and synthesis of the desired product, a theoretical framework was developed to describe the molar fluxes through all of the known metabolic pathways of the cell. Several important aspects of this theoretical framework include: (i) a relatively complete database of known pathways, (ii) incorporation of the growth-rate dependence of cell composition and energy requirements, (iii) experimental measurements of the amino acid composition of proteins and the fatty acid composition of membranes at different growth rates and dilution rates and (iv) experimental measurements of side reactions which are known to occur as a result of metabolism manipulation. These new developments allow significantly more accurate prediction of fluxes in key metabolic pathways and regulation of enzyme activity [Keasling, 1999a; Keasling, 1999b; Martin, 2002; Henry, 2006].
Such types of models have been applied, for example, to
If formate uptake is limiting for either growth or production of carbon-based products of interest, then expression of one or more formate transporters in the engineered chemoautotroph of the present invention can alleviate this bottleneck. The formate transporters may be heterologous or endogenous to the host organism. Exemplary formate transporters include NP_415424 and NP_416987, and homologs thereof. SEQ ID NO:54 and SEQ ID NO:55 represent E. coli codon optimized coding sequence each of these two formate transporters, respectively, of the present invention. The present invention provides nucleic acids each comprising or consisting of a sequence which is a codon optimized version of one of the wild-type malonyl-CoA reductase genes. In another embodiment, the invention provides nucleic acids each encoding a polypeptide having the amino acid sequence of one of NP_415424 and NP_416987.
In addition, the invention provides an engineered chemoautotroph comprising a genetic modification conferring to the engineered chemoautotrophic microorganism an increased efficiency of using inorganic energy and inorganic carbon to produce carbon-based products of interest relative to the microorganism in the absence of the genetic modification. The genetic modification comprises one or more gene disruptions, whereby the one or more gene disruptions increase the efficiency of producing carbon-based products of interest from inorganic energy and inorganic carbon. In one aspect, the one or more gene disruptions target genes encoding competing reactions for inorganic energy, reduced cofactors, inorganic carbon, and/or central metabolites. In another aspect, the one or more gene disruptions target genes encoding competing reactions for intermediates or products of the energy conversion, carbon fixation, and/or carbon product biosynthetic pathways of interest. The competing reactions

analyze metabolic fluxes in organisms responsible for enhanced biological phosphorus removal in wastewater treatment reactors and in filamentous fungi producing polyketides [Pramanik, 1997; Pramanik, 1998a; Pramanik, 1998b; Pramanik, 1998c].
In some embodiments, the host organism may have native formate dehydrogenases or other enzymes that consume formate thereby competing with either energy conversion pathways that use formate as an inorganic energy source or carbon fixation pathways that use formate as an inorganic carbon source; hence, these competing formate consumption reactions may be disrupted to increase the efficiency of energy conversion and/or carbon fixation in the engineered chemoautotroph of the present invention. For example, in the host organism E. coli, there are three native formate dehydrogenases. Exemplary E. coli formate dehydrogenase genes for disruption include fdnG, fdnH, fdnI, fdoI, fdoH, fdoG and/or fdhF. Alternatively, since all three native formate dehydrogenases in E. coli require selenium and only those three enzymes require selenium, in a preferred embodiment, genes for selenium uptake and/or biosynthesis of selenocysteine, such as selA, selB, selC, and/or selD, are disrupted.
In other embodiments, the host organism may have native hydrogenases or other enzymes that consume molecular hydrogen thereby competing with energy conversion pathways that use hydrogen as an inorganic energy source. For example, in the host organism E. coli, there are four native hydrogenases although the fourth is not expressed to significant levels [Self, 2004]. Exemplary E. coli formate hydrogenase genes for disruption include hyaB, hybC, hycE, hyfi and fhlA. In another embodiment, a particular strain of the host organism can be selected that specifically lacks the competing reactions typically found in the species. For example, E. coli B strain BL21 (DE3) lacks formate and hydrogenase metabolism unlike E. coli K strains [Pinske, 2011].
In some embodiments, the host organism may have metabolic reactions that compete with reactions of the carbon fixation pathways in the engineered chemoautotroph of the present invention. For example, in the host organism E. coli, the tricarboxylic acid cycle generally runs in the oxidative direction during aerobic growth and as a split reductive and oxidative branches during anaerobic growth. Hence, E. coli has several endogenous reactions that may compete with desired reactions of an rTCA-derived carbon fixation pathway. Exemplary E. coli enzymes whose function are candidates for disruption include citrate synthase (competes with reaction 1 in FIG. 3), 2-oxoglutarate dehydrogenase (competes with reaction 6), isocitrate dehydrogenase (may compete with desired flux for reaction 7), isocitrate dehydrogenase phosphatase (competes with reaction 8), pyruvate dehydrogenase (competes with reaction 9).

cobalt-precorrin-3 (C7)-methyltransferase (E.C. 2.1.1.131), cobalt-precorrin-4 (C')-methyltransferase (E.C. 2.1.1.133), cobalt-precorrin 5A hydrolase (E.C. 3.7.1.12), cobalt-precorrin-5B (C)-methyltransferase (E.C. 2.1.1.195), cobalt-precorrin-6A reductase, cobalt-precorrin-6V (C)-methyltransferase (E.C. 2.1.1.-), cobalt-precorrin-7 (C')-methyltransferase (decarboxylating) (E.C. 2.1.1.196), cobalt-precorrin-8x methylmutase, cobyrinate a,c-diamide synthase (E.C. 6.3.5.11), cob(II)yrinate a,c-diamide reductase (E.C. 1.16.8.1), cob(I)yrinic acid a,c-diamide adenosyltransferase (E.C. 2.5.1.17), adenosyl-cobyrate synthase (E.C. 6.3.5.10), adenosylcobinamide phosphate synthase (E.C. 6.3.1.10), GTP:adenosylcobinamide-phosphate guanylyltransferase (E.C. 2.7.7.62), nicotinate-nucleotide dimethylbenzimidazole phosphoribosyltransferase (E.C. 2.4.2.21), adenosylcobinamide-GDP:α-ribazole-5-phosphate ribazoletransferase (E.C. 2.7.8.26) and adenosylcobalamine-5'-phosphate phosphatase (E.C. 3.1.3.73).
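The E. coli disruption candidates described above for an rTCA-derived carbon fixation pathway pair each endogenous enzyme with the pathway reaction it competes with. As a hedged illustration, that pairing can be held in a small lookup table. The enzyme names and reaction numbers are those given in the text for FIG. 3; the function name, and the idea of filtering by the reactions a particular design actually uses, are assumptions made for this sketch.

```python
# Endogenous E. coli enzyme -> rTCA pathway reaction (FIG. 3 numbering) it may compete with,
# as stated in the text.
COMPETING_ENZYMES = {
    "citrate synthase": 1,
    "2-oxoglutarate dehydrogenase": 6,
    "isocitrate dehydrogenase": 7,
    "isocitrate dehydrogenase phosphatase": 8,
    "pyruvate dehydrogenase": 9,
}


def disruption_candidates(reactions_in_design):
    """List enzymes whose competing reaction is used in the design (alphabetical order)."""
    wanted = set(reactions_in_design)
    return sorted(
        enzyme for enzyme, reaction in COMPETING_ENZYMES.items() if reaction in wanted
    )


# Example: a design that relies on reactions 1 and 9 only.
print(disruption_candidates([1, 9]))
```

Whether a listed enzyme should in fact be disrupted depends on the host and growth conditions; the table only records the candidates named in the text.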
In another aspect, some engineered chemoautotrophs of the present invention may require alterations to the pool of intracellular reducing cofactors for efficient growth and/or production of the carbon-based product of interest from inorganic energy and inorganic carbon. In some embodiments, the total pool of NAD+/NADH in the engineered chemoautotroph is increased or decreased by adjusting the expression level of nicotinic acid phosphoribosyltransferase (E.C. 2.4.2.11). Over-expression of either the E. coli or Salmonella gene pncB which encodes nicotinic acid phosphoribosyltransferase has been shown to increase total NAD+/NADH levels in E. coli [Wubbolts, 1990; Berrios-Rivera, 2002; San, 2002]. In another embodiment, the availability of intracellular NADPH can be also altered by modifying the engineered chemoautotroph to express an NADH:NADPH transhydrogenase [Sauer, 2004; Chin, 2011].

In addition, to allow for cobalt uptake and incorporation into Vitamin B12, the genes encoding the cobalt transporter are overexpressed. The exemplary cobalt transporter protein found in Salmonella enterica is overexpressed and is encoded by proteins ABC-type Co2+ transport system, permease component (CbiM, NP_460968), ABC-type cobalt transport system, periplasmic component (CbiN, NP_460967), and ABC-type cobalt transport system, permease component (CbiQ, NP_461989).
In some embodiments, the intracellular concentration (e.g., the concentration of the intermediate in the engineered chemoautotroph) of the metabolic pathway intermediate can be increased to further boost the yield of the final product. For example, by increasing the intracellular amount of a substrate (e.g., a primary substrate) for an enzyme that is active in the metabolic pathway, and the like.
In another embodiment, the total pool of ubiquinone in the engineered chemoautotroph is increased or decreased by adjusting the expression level of ubiquinone biosynthetic enzymes, such as p-hydroxybenzoate-polyprenyl pyrophosphate transferase and polyprenyl pyrophosphate synthetase. Overexpression of the corresponding E. coli genes ubiA and ispB increased the ubiquinone pool in E. coli [Zhu, 1995]. In another embodiment, the level of the redox cofactor ferredoxin in the engineered chemoautotroph can be increased or decreased by changing the expression control sequences that regulate its expression.
In another aspect, in addition to an inorganic energy and carbon source, some engineered chemoautotrophs may require specific nutrients or vitamin(s) for growth and/or production of carbon-based products of interest. For example, hydroxocobalamin, a vitamer of Vitamin B12, is a cofactor for particular enzymes of the present invention, such as methylmalonyl-CoA mutase (E.C. 5.4.99.2). Required nutrients are generally supplemented to the growth media during bench scale propagation of such organisms. However, such nutrients can be prohibitively expensive in the context of industrial scale bio-processing. In one embodiment of the present invention, the host cell is selected from an organism that naturally produces the required nutrient(s), such as Salmonella enterica or Pseudomonas denitrificans which naturally produces hydroxocobalamin. In an alternate embodiment, the need for a vitamin is obviated by modifying the engineered chemoautotroph to express a vitamin biosynthesis pathway [Roessner, 1995]. An exemplary biosynthesis pathway for hydroxocobalamin comprises the following enzymes: uroporphyrin-III C-methyltransferase (E.C. 2.1.1.107), precorrin-2 cobaltochelatase (E.C. 4.99.1.3), cobalt-precorrin-2 (C)-methyltransferase (E.C. 2.1.1.151),

In another aspect, the carbon-based products of interest are or are derived from the intermediates or products of fatty acid biosynthesis. To increase the production of waxes/fatty acid esters, and fatty alcohols, one or more of the enzymes of fatty acid biosynthesis can be over expressed or mutated to reduce feedback inhibition. Additionally, enzymes that metabolize the intermediates to make nonfatty-acid based products (side reactions) can be functionally deleted or attenuated to increase the flux of carbon through the fatty acid biosynthetic pathway thereby enhancing the production of carbon-based products of interest.
Growth-Based Selection Methods for Optimization of Engineered Carbon-Fixing Strains
Selective pressure provides a valuable means for testing and optimizing the engineered chemoautotrophs of the present invention. In some embodiments, the engineered chemoautotrophs of the invention can be evolved under selective pressure to optimize production of a carbon-based product from an inorganic energy source and inorganic carbon or that confer other useful functions onto the host organism. The ability of an optimized engineered chemoautotroph to replicate more rapidly than unmodified counterparts confirms the utility of the optimization. Similarly, the ability to survive and replicate in media lacking a required nutrient, such as Vitamin B12, confirms the successful implementation of a nutrient biosynthetic module. In some embodiments, the engineered chemoautotrophs can be cultured in the presence of inorganic energy source(s), inorganic carbon and a limiting amount of organic carbon. Over time, the amount of organic carbon present in the culture media is decreased in order to select for evolved strains that more efficiently utilize the inorganic energy and carbon.
Evolution can occur as a result of either spontaneous, natural mutation or by addition of mutagenic agents or conditions to live cells.
If desired, additional genetic variation can be introduced prior to or during selective pressure by treatment with mutagens, such as ultra-violet light, alkylators (e.g., ethyl methanesulfonate (EMS), methyl methane sulfonate (MMS), diethylsulfate (DES), and nitrosoguanidine (NTG, NG, MMG)), DNA intercalators (e.g., ethidium bromide), nitrous acid, base analogs, bromouracil, transposons and the like. The engineered chemoautotrophs can be propagated either in serial batch culture or in a turbidostat at a controlled growth rate.
Alternately or in addition to selective pressure, pathway activity can be monitored following growth under permissive (i.e., non-selective) conditions by measuring specific product output via various metabolic labeling studies (including radioactivity), biochemical analyses (Michaelis-Menten), gas chromatography-mass spectrometry (GC/MS), mass spectrometry, matrix assisted laser desorption ionization

normal cellular lifecycles carbon is used in cellular functions including producing lipids, saccharides, proteins, organic acids, and nucleic acids. Reducing the amount of carbon necessary for growth-related activities can increase the efficiency of carbon source conversion to output. This can be achieved by first growing engineered chemoautotrophs to a desired density, such as a density achieved at the peak of the log phase of growth. At such a point, replication checkpoint genes can be harnessed to stop the growth of cells. Specifically, quorum sensing mechanisms [Camilli, 2006; Venturi, 2006; Reading, 2006] can be used to activate genes such as p53, p21, or other checkpoint genes. Genes that can be activated to stop cell replication and growth in E. coli include umuDC genes, the over-expression of which stops the progression from stationary phase to exponential growth [Murli, 2000].
time-of-flight mass spectrometry (MALDI-TOF), capillary electrophoresis (CE), and high pressure liquid chromatography (HPLC).
To generate engineered chemoautotrophs with improved yield of central metabolites and/or carbon-based products of interest, metabolic modeling can be utilized to guide strain optimization. Modeling analysis allows reliable predictions of the effects on cell growth of shifting the metabolism towards more efficient production of central metabolites or products derived from central metabolites. Modeling can also be used to design gene knockouts that additionally optimize utilization of the energy conversion, carbon fixation and carbon product biosynthetic pathways. In some embodiments, modeling is used to select growth conditions that create selective pressure towards uptake and utilization of inorganic energy and inorganic carbon.

UmuC is a DNA polymerase that can carry out translesion synthesis over non-coding lesions, the mechanistic basis of most UV and chemical mutagenesis. The umuDC gene products are used for the process of translesion synthesis and also serve as a DNA damage checkpoint. UmuDC gene products include UmuC, UmuD, UmuD', UmuD'2C, UmuD'2 and UmuD2. Simultaneously, the carbon product biosynthetic pathway genes are activated, thus minimizing the need for replication and maintenance pathways to be used while the carbon-based product of interest is being made.
Alternatively, cell growth and product production can be achieved simultaneously. In this method, cells are grown in bioreactors with a continuous supply of inputs and continuous removal of product. Batch, fed-batch, and continuous fermentations are common and well known in the art and examples can be found in [Brock, 1989; Deshpande, 1992].
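Metabolic models of the kind discussed in this section are commonly stoichiometric: each internal metabolite is assumed to be at steady state, so the fluxes that produce it must balance the fluxes that consume it. The following is a minimal sketch of that mass-balance check on a toy network. The two metabolites, four reactions, coefficients and flux values are all hypothetical and chosen only for illustration; they do not describe any pathway of the invention.

```python
def net_rates(stoichiometry, fluxes):
    """Net production rate of each metabolite: sum of coefficient * flux over its reactions."""
    return {
        metabolite: sum(coeff * fluxes.get(reaction, 0.0) for reaction, coeff in row.items())
        for metabolite, row in stoichiometry.items()
    }


def is_steady_state(stoichiometry, fluxes, tol=1e-9):
    """True if every metabolite is balanced to within tol."""
    return all(abs(rate) <= tol for rate in net_rates(stoichiometry, fluxes).values())


# Toy network (hypothetical): energy conversion makes a reduced cofactor,
# carbon fixation spends two cofactors per central metabolite formed,
# and the central metabolite is split between growth and product synthesis.
TOY_NETWORK = {
    "reduced_cofactor": {"energy_conversion": 1.0, "carbon_fixation": -2.0},
    "central_metabolite": {"carbon_fixation": 1.0, "growth": -1.0, "product_synthesis": -1.0},
}

balanced = {
    "energy_conversion": 20.0,
    "carbon_fixation": 10.0,
    "growth": 6.0,
    "product_synthesis": 4.0,
}
print(is_steady_state(TOY_NETWORK, balanced))

# Removing product synthesis without re-routing the flux leaves the
# central metabolite accumulating, so the flux set is no longer feasible.
knockout = dict(balanced, product_synthesis=0.0)
print(net_rates(TOY_NETWORK, knockout)["central_metabolite"])
```

A full model would add flux bounds and an objective such as growth rate and solve for the flux distribution with a linear programming solver; only the steady-state constraint is shown here.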
An in silico Stoichiometric In a preferred embodiment, the engineered chemoau model of host organism metabolism and the metabolic path totroph is engineered such that the final product is released way(s) of interest can be constructed (see, for example, a 35 from the cell. In embodiments where the final product is model of the E. coli metabolic network Edwards, 2002). The released from the cell, a continuous process can be employed. resulting model can be used to compute phenotypic phase In this approach, a reactor with organisms producing desir planes for the engineered chemoautotrophs of the present able products can be assembled in multiple ways. In one invention. A phenotypic phase plane is a portrait of the acces embodiment, the reactor is operated in bulk continuously, sible growth states of an engineered chemoautotroph as a 40 with a portion of media removed and held in a less agitated function of imposed Substrate uptake rates. A particular engi environment such that an aqueous product can self-separate neered chemoautotroph, at particular uptake rates for limiting out with the product removed and the remainder returned to nutrients, may not grow as well as the phenotypic phase plane the fermentation chamber. In embodiments where the product predicts, but no strain should be able to grow better than does not separate into an aqueous phase, media is removed indicated by the phenotypic phase plane. Under a variety of 45 and appropriate separation techniques (e.g., chromatography, circumstances, it has been shown the modified E. coli strains distillation, etc.) are employed. evolve towards, and then along, the phenotypic phase plane, In an alternate embodiment, the product is not secreted by always in the direction of increasing growth rates Fong, the engineered chemoautotrophs. In this embodiment, a 2004. Thus, a phenotypic phase plane can be viewed as a batch-fed fermentation approach is employed. In Such cases, landscape of selective pressure. 
Strains in an environment 50 cells are grown under continued exposure to inputs (inorganic where a given nutrient uptake is positively correlated with energy and inorganic carbon) as specified above until the growth rate are predicted to evolve towards increased nutrient reaction chamber is Saturated with cells and product. A sig uptake. Conversely, strains in an environment where nutrient nificant portion to the entirety of the culture is removed, the uptake are inversely correlated with growth rate are predicted cells are lysed, and the products are isolated by appropriate to evolve away from nutrient uptake. 55 separation techniques (e.g., chromatography, distillation, fil Fermentation Conditions tration, centrifugation, etc.). The engineered chemoautotrophs of the present invention In certain embodiments, the engineered chemoautotrophs are cultured in a medium comprising inorganic energy of the invention can be sustained, cultured or fermented under Source(s), inorganic carbon Source(s) and any required nutri anaerobic or substantially anaerobic conditions. Briefly, ents. The culture conditions can include, for example, liquid 60 anaerobic conditions refers to an environment devoid of oxy culture procedures as well as fermentation and other large gen. Substantially anaerobic conditions include, for example, scale culture procedures. a culture, batch fermentation or continuous fermentation Such The production and isolation of carbon-based products of that the dissolved oxygen concentration in the medium interest can be enhanced by employing specific fermentation remains between 0 and 10% of saturation. Substantially techniques. One method for maximizing production while 65 anaerobic conditions also includes growing or resting cells in reducing costs is increasing the percentage of the carbon that liquid medium or on Solid agar inside a sealed chamber main is converted to carbon-based products of interest. During tained with an atmosphere of less than 1% oxygen. 
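By way of non-limiting illustration, the oxygen thresholds recited above can be expressed as a simple check. The following is a minimal sketch, assuming that dissolved oxygen is reported as percent of saturation and that a sealed chamber atmosphere is reported as percent oxygen. The function name, argument names and returned labels are hypothetical and are not part of the disclosure.

```python
def classify_oxygen_condition(dissolved_o2_pct=None, atmosphere_o2_pct=None):
    """Label a culture condition from a single oxygen reading.

    dissolved_o2_pct:  dissolved oxygen in the medium, as % of saturation.
    atmosphere_o2_pct: oxygen in a sealed chamber atmosphere, as %.
    Exactly one of the two readings must be supplied (illustrative assumption).
    """
    if (dissolved_o2_pct is None) == (atmosphere_o2_pct is None):
        raise ValueError("provide exactly one oxygen reading")

    reading = dissolved_o2_pct if dissolved_o2_pct is not None else atmosphere_o2_pct
    if reading < 0:
        raise ValueError("oxygen reading cannot be negative")

    # "Anaerobic" is described as an environment devoid of oxygen.
    if reading == 0:
        return "anaerobic"

    # Medium: dissolved oxygen between 0 and 10% of saturation.
    if dissolved_o2_pct is not None:
        return "substantially anaerobic" if dissolved_o2_pct <= 10 else "not anaerobic"

    # Sealed chamber: atmosphere of less than 1% oxygen.
    return "substantially anaerobic" if atmosphere_o2_pct < 1 else "not anaerobic"
```

For example, a medium held at 5% of oxygen saturation would be labeled substantially anaerobic under these assumptions, whereas a chamber atmosphere of exactly 1% oxygen would not.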
It is highly desirable to maintain anaerobic conditions in the fermenter to reduce the cost of the overall process.

If desired, the pH of the medium can be maintained at a desired pH, in particular neutral pH, such as a pH of around 7 by addition of a base, such as NaOH or other bases, or acid, as needed to maintain the culture medium at a desirable pH. The growth rate can be determined by measuring optical density using a spectrophotometer (600 nm), and the glucose uptake rate by monitoring carbon source depletion over time.

In another embodiment, the engineered chemoautotrophs can be cultured in the presence of an electron acceptor, for example, nitrate, in particular under substantially anaerobic conditions. It is understood that an appropriate amount of nitrate can be added to a culture to achieve a desired increase in biomass, for example, 1 mM to 100 mM nitrate, or lower or higher concentrations, as desired, so long as the amount added provides a sufficient amount of electron acceptor for the desired increase in biomass. Such amounts include, but are not limited to, 5 mM, 10 mM, 15 mM, 20 mM, 25 mM, 30 mM, 40 mM, 50 mM, as appropriate to achieve a desired increase in biomass.

In some embodiments, the engineered chemoautotrophs of the present invention are initially grown in culture conditions with a limiting amount of organic carbon to facilitate growth. Then, once the supply of organic carbon is exhausted, the engineered chemoautotrophs transition from heterotrophic to autotrophic growth relying on energy from an inorganic energy source to fix inorganic carbon in order to produce carbon-based products of interest. The organic carbon can be, for example, a carbohydrate source. Such sources include, for example, sugars such as glucose, xylose, arabinose, galactose, mannose, fructose and starch. Other sources of carbohydrate include, for example, renewable feedstocks and biomass. Exemplary types of biomasses that can be used as feedstocks in the methods of the invention include cellulosic biomass, hemicellulosic biomass and lignin feedstocks or portions of feedstocks. Such biomass feedstocks contain, for example, carbohydrate substrates useful as carbon sources such as glucose, xylose, arabinose, galactose, mannose, fructose and starch. Given the teachings and guidance provided herein, those skilled in the art would understand that renewable feedstocks and biomass other than those exemplified above also can be used for culturing the engineered chemoautotrophs of the invention. In some embodiments, the engineered chemoautotrophs are optimized for a two stage fermentation by regulating the expression of the carbon product biosynthetic pathway.

In one aspect, the percentage of input carbon atoms converted to hydrocarbon products is an efficient and inexpensive process. Typical efficiencies in the literature are less than about 5%. Engineered chemoautotrophs which produce hydrocarbon products can have greater than 1, 3, 5, 10, 15, 20, 25, and 30% efficiency. In one example engineered chemoautotrophs can exhibit an efficiency of about 10% to about 25%. In other examples, such microorganisms can exhibit an efficiency of about 25% to about 30%, and in other examples such engineered chemoautotrophs can exhibit >30% efficiency.

In some examples where the final product is released from the cell, a continuous process can be employed. In this approach, a reactor with engineered chemoautotrophs producing, for example, fatty acid derivatives, can be assembled in multiple ways. In one example, a portion of the media is removed and allowed to separate. Fatty acid derivatives are separated from the aqueous layer, which can, in turn, be returned to the fermentation chamber.

In another example, the fermentation chamber can enclose a fermentation that is undergoing a continuous reduction. In this instance, a stable reductive environment can be created. The electron balance would be maintained by the release of oxygen. Efforts to augment the NAD/H and NADP/H balance can also facilitate stabilizing the electron balance.

Consolidated Chemoautotrophic Fermentation

The above aspect of the invention is an alternative to directly producing final carbon-based product of interest as a result of chemoautotrophic metabolism. In this approach, carbon-based products of interest would be produced by leveraging other organisms that are more amenable to making any one particular product while culturing the engineered chemoautotroph for its carbon source. Consequently, fermentation and production of carbon-based products of interest can occur separately from carbon source production in a bioreactor.

In one aspect, the methods of producing such carbon-based products of interest include two steps. The first step includes using engineered chemoautotrophs to convert inorganic carbon to central metabolites or sugars such as glucose. The second step is to use the central metabolites or sugars as a carbon source for cells that produce carbon-based products of interest. In one embodiment, the two-stage approach comprises a bioreactor comprising engineered chemoautotrophs; a second reactor comprising cells capable of fermentation; wherein the engineered chemoautotrophs provide a carbon source such as glucose for cells capable of fermentation to produce a carbon-based product of interest. The second reactor may comprise more than one type of microorganism. The resulting carbon-based products of interest are subsequently separated and/or collected.

Preferably, the two steps are combined into a single-step process whereby the engineered chemoautotrophs convert inorganic energy and inorganic carbon directly into central metabolites or sugars such as glucose, and such organisms are capable of producing a variety of carbon-based products of interest.

The present invention also provides methods and compositions for sustained glucose production in engineered chemoautotrophs wherein these or other organisms that use the sugars are cultured using inorganic energy and inorganic carbon for use as a carbon source to produce carbon-based products of interest. In such embodiments, the host cells are capable of secreting the sugars, such as glucose, from within the cell to the culture media in continuous or fed-batch in a bioreactor.

Certain changes in culture conditions of engineered chemoautotrophs for the production of sugars can be optimized for growth. For example, conditions are optimized for inorganic energy source(s) and their concentration(s), inorganic carbon source(s) and their concentration(s), electron acceptor(s) and their concentrations, addition of supplements and nutrients. As would be apparent to those skilled in the art, the conditions sufficient to achieve optimum growth can vary depending upon location, climate, and other environmental factors, such as the temperature, oxygen concentration and humidity. Other adjustments may be required, for example, an organism's ability for carbon uptake. Increased inorganic carbon, such as in the form of carbon dioxide, may be introduced into a bioreactor by a gas sparger or aeration devices.

Advantages of consolidated chemoautotrophic fermentation include a process where there is separation of chemical end products, e.g., glucose, spatial separation between end products (membranes) and time. Additionally, unlike traditional or cellulosic biomass to biofuels production, pretreatment, saccharification and crop plowing are obviated.

The consolidated chemoautotrophic fermentation process produces continuous products. In preferred embodiments, the process involves direct conversion of inorganic energy and inorganic carbon to product from engineered front-end organisms to produce various products without the need to lyse the organisms. For instance, the organisms can utilize 3PGAL to make a desired fermentation product, e.g., ethanol. Such end products can be readily secreted as opposed to intracellular products such as oil and cellulose. In yet other embodiments, organisms produce sugars, which are secreted into the media and such sugars are used during fermentation with the same or different organisms or a combination of both.

Processing and Separation of Carbon-Based Products of Interest

The carbon-based products produced by the engineered chemoautotrophs during fermentation can be separated from the fermentation media. Known techniques for separating fatty acid derivatives from aqueous media can be employed. One exemplary separation process provided herein is a two-phase (bi-phasic) separation process. This process involves fermenting the genetically-engineered production hosts under conditions sufficient to produce, for example, a fatty acid, allowing the fatty acid to collect in an organic phase and separating the organic phase from the aqueous fermentation media. This method can be practiced in both a batch and continuous fermentation setting.

Bi-phasic separation uses the relative immiscibility of fatty acid to facilitate separation. A skilled artisan would appreciate that by choosing a fermentation media and the organic phase such that the fatty acid derivative being produced has a high log P value, even at very low concentrations the fatty acid can separate into the organic phase in the fermentation vessel.

When producing fatty acids by the methods described herein, such products can be relatively immiscible in the fermentation media, as well as in the cytoplasm. Therefore, the fatty acid can collect in an organic phase either intracellularly or extracellularly. The collection of the products in an organic phase can lessen the impact of the fatty acid derivative on cellular function and allows the production host to produce more product.

The fatty alcohols, fatty acid esters, waxes, and hydrocarbons produced as described herein allow for the production of homogeneous compounds with respect to other compounds wherein at least 50%, 60%, 70%, 80%, 90%, or 95% of the fatty alcohols, fatty acid esters, waxes and hydrocarbons produced have carbon chain lengths that vary by less than 4 carbons, or less than 2 carbons. These compounds can also be produced so that they have a relatively uniform degree of saturation with respect to other compounds, for example at least 50%, 60%, 70%, 80%, 90%, or 95% of the fatty alcohols, fatty acid esters, hydrocarbons and waxes are mono-, di-, or tri-unsaturated.

Detection and Analysis

Generally, the carbon-based products of interest produced using the engineered chemoautotrophs described herein can be analyzed by any of the standard analytical methods, e.g., gas chromatography (GC), mass spectrometry (MS), gas chromatography-mass spectrometry (GCMS), and liquid chromatography-mass spectrometry (LCMS), high performance liquid chromatography (HPLC), capillary electrophoresis, Matrix-Assisted Laser Desorption Ionization time-of-flight mass spectrometry (MALDI-TOF MS), nuclear magnetic resonance (NMR), near-infrared (NIR) spectroscopy, viscometry [Knothe, 1997; Knothe, 1999], titration for determining free fatty acids [Komers, 1997], enzymatic methods [Bailer, 1991], physical property-based methods, wet chemical methods, etc.

Carbon Fingerprinting

Biologically-produced carbon-based products, e.g., ethanol, fatty acids, alkanes, isoprenoids, represent a new commodity for fuels, such as alcohols, diesel and gasoline. Such biofuels have not been produced using biomass but use carbon dioxide as their carbon source. These new fuels may be distinguishable from fuels derived from petrochemical carbon on the basis of carbon-isotopic fingerprinting. Such products, derivatives, and mixtures thereof may be completely distinguished from their petrochemical derived counterparts on the basis of ¹⁴C (fM) and carbon-isotopic fingerprinting, indicating new compositions of matter.

There are three naturally occurring isotopes of carbon: ¹²C, ¹³C, and ¹⁴C. These isotopes occur in above-ground total carbon at fractions of 0.989, 0.011, and 10⁻¹², respectively. The isotopes ¹²C and ¹³C are stable, while ¹⁴C decays naturally with a half-life of 5730 years to ¹⁴N, a beta particle, and an anti-neutrino. The isotope ¹⁴C originates in the atmosphere, due primarily to neutron bombardment of ¹⁴N caused ultimately by cosmic radiation. Because of its relatively short half-life (in geologic terms), ¹⁴C occurs at extremely low levels in fossil carbon. Over the course of 1 million years without exposure to the atmosphere, just 1 part in 10⁵⁰ will remain ¹⁴C.

The ¹³C:¹²C ratio varies slightly but measurably among natural carbon sources. Generally these differences are expressed as deviations from the ¹³C:¹²C ratio in a standard material. The international standard for carbon is Pee Dee Belemnite, a form of limestone found in South Carolina, with a ¹³C fraction of 0.0112372. For a carbon source a, the deviation of the ¹³C:¹²C ratio from that of Pee Dee Belemnite is expressed as:

$\delta_a = (R_a/R_s) - 1$, where $R_a$ = ¹³C:¹²C ratio in the natural source, and $R_s$ = ¹³C:¹²C ratio in Pee Dee Belemnite, the standard.

For convenience, $\delta_a$ is expressed in parts per thousand, or ‰. A negative value of $\delta_a$ shows a bias toward ¹²C over ¹³C as compared to Pee Dee Belemnite. Table 2 shows $\delta_a$ and ¹⁴C fraction for several natural sources of carbon.

TABLE 2
¹³C:¹²C variations in natural carbon sources

Source               $-\delta_a$ (‰)   References
Underground coal     32.5              [Farquhar, 1989]
Fossil fuels         26                [Farquhar, 1989]
Ocean DIC            0-1.5             [Goericke, 1994; Ivlev, 2010]
Atmospheric CO2      6-8               [Ivlev, 2010; Farquhar, 1989]
Freshwater DIC*      6-14              [Dettman, 1999]
Pee Dee Belemnite    0                 [Ivlev, 2010]

*DIC = dissolved inorganic carbon.

Biological processes often discriminate among carbon isotopes. The natural abundance of ¹⁴C is very small, and hence discrimination for or against ¹⁴C is difficult to measure. Biological discrimination between ¹³C and ¹²C, however, is well documented. For a biological product p, we can define similar quantities to those above:

$\delta_p = (R_p/R_s) - 1$, where $R_p$ = ¹³C:¹²C ratio in the biological product, and $R_s$ = ¹³C:¹²C ratio in Pee Dee Belemnite, the standard.

Table 3 shows measured deviations in the ¹³C:¹²C ratio for some biological products that arise from carbon fixation by the Calvin cycle. Other carbon fixation pathways provide different "fingerprint" ¹³C:¹²C ratios.
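By way of non-limiting illustration, the deviation defined above and the radioactive decay of ¹⁴C can be computed as follows. This is a minimal sketch, assuming that the value 0.0112372 quoted for Pee Dee Belemnite is used as the standard's ¹³C:¹²C value $R_s$ and that the sample value is given on the same basis. The function and constant names are hypothetical and are not part of the disclosure.

```python
# Value quoted in the text for the Pee Dee Belemnite standard; treated here
# as R_s, the standard's 13C:12C value (an assumption of this sketch).
PDB_STANDARD = 0.0112372

# Half-life of 14C in years, as stated in the text.
C14_HALF_LIFE_YEARS = 5730.0


def delta_permil(r_sample, r_standard=PDB_STANDARD):
    """Return delta = (R_sample / R_standard) - 1, in parts per thousand."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta_permil_value, r_standard=PDB_STANDARD):
    """Invert delta_permil: recover R_sample from a delta given in permil."""
    return r_standard * (1.0 + delta_permil_value / 1000.0)


def c14_fraction_remaining(years):
    """Fraction of the original 14C remaining after `years` of decay."""
    return 0.5 ** (years / C14_HALF_LIFE_YEARS)


# Fossil fuels are listed in Table 2 with -delta_a = 26 permil.
fossil_ratio = ratio_from_delta(-26.0)
```

Under these assumptions a source with a negative deviation, such as the fossil fuel entry of Table 2, has a ¹³C:¹²C value below that of the standard, and exactly half of the ¹⁴C initially present remains after one half-life.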
TABLE 3
¹³C:¹²C variations in selected biological products

Product                                            $-\delta_p$ (‰)   -epsilon (‰)*   References
Plant sugar/starch from atmospheric CO2            18-28             10-20           [Ivlev, 2010]
Cyanobacterial biomass from marine DIC             18-31             16.5-31         [Goericke, 1994; Sakata, 1997]
Cyanobacterial lipid from marine DIC               39-40             37.5-40         [Sakata, 1997]
Algal lipid from marine DIC                        17-28             15.5-28         [Goericke, 1994; Abelson, 1961]
Algal biomass from freshwater DIC                  17-36             3-30            [Marty, 2008]
E. coli lipid from plant sugar                     15-27             near 0          [Monson, 1980]
Cyanobacterial lipid from fossil carbon            63.5-66           37.5-40         -
Cyanobacterial biomass from fossil carbon          42.5-57           16.5-31         -

*epsilon = fractionation by a biological process in its utilization of ¹²C versus ¹³C (see text)

Table 3 introduces a new quantity, epsilon. This is the discrimination by a biological process in its utilization of ¹²C vs. ¹³C. We define epsilon as follows: $\epsilon = (R_p/R_a) - 1$.

This quantity is very similar to $\delta_a$ and $\delta_p$, except we now compare the biological product directly to the carbon source rather than to a standard. Using epsilon, we can combine the bias effects of a carbon source and a biological process to obtain the bias of the biological product as compared to the standard. Solving for $\delta_p$, we obtain: $\delta_p = (\epsilon)(\delta_a) + \epsilon + \delta_a$, and, because $(\epsilon)(\delta_a)$ is generally very small compared to the other terms, $\delta_p \approx \delta_a + \epsilon$.

For a biological product having a production process with a known epsilon, we may therefore estimate $\delta_p$ by summing $\delta_a$ and epsilon. We assume that epsilon operates irrespective of the carbon source.

This has been done in Table 3 for cyanobacterial lipid and biomass produced from fossil carbon. As shown in the Tables above, cyanobacterial products made from fossil carbon (in the form of, for example, flue gas or other emissions) can have a higher $\delta_p$ than those of comparable biological products made from other sources, distinguishing them on the basis of composition of matter from these other biological products. In addition, any product derived solely from fossil carbon can have a negligible fraction of ¹⁴C, while products made from above-ground carbon can have a ¹⁴C fraction of approximately 10⁻¹².

Accordingly, in certain aspects, the invention provides various carbon-based products of interest characterized as $-\delta_p$ (‰) of about 63.5 to about 66 and -epsilon (‰) of about 37.5 to about 40. For carbon-based products that are derived from engineered autotrophs that make use of carbon fixation pathways other than the Calvin cycle, epsilon, and thus $\delta_p$, can vary, as previously described [Hayes, 2001].

Sequences Provided by the Invention

Table 4 provides a summary of SEQ ID NOS:1-60 disclosed herein.

TABLE 4
Sequences

SEQ ID NO   Sequence
1    Codon optimized Burkholderia stabilis NADP+ FDH gene
2    Codon optimized Candida methylica NAD+ FDH gene
3    Codon optimized Candida boidinii NAD+ FDH gene
4    Codon optimized Saccharomyces cerevisiae S288c NAD+ FDH gene
5    Clostridium pasteurianum putative ferredoxin-FDH FdhF subunit amino acid sequence
6    Clostridium pasteurianum putative ferredoxin-FDH FdhD subunit amino acid sequence
7    Clostridium pasteurianum putative FDH-associated ferredoxin domain containing protein 1 amino acid sequence
8    Clostridium pasteurianum putative FDH-associated ferredoxin domain containing protein 2 amino acid sequence
9    Codon optimized Aquifex aeolicus VF5 SQR gene
10   Codon optimized Nostoc sp. PCC 7120 SQR gene
11   Codon optimized Chlorobium tepidum TLS SQR gene
12   Codon optimized Acidithiobacillus ferrooxidans ATCC 23270 SQR gene
13   Codon optimized Allochromatium vinosum DSM 180 SQR gene
14   Codon optimized Rhodobacter capsulatus SB 1003 SQR gene
15   Codon optimized Thiobacillus denitrificans ATCC 25259 SQR gene
16   Codon optimized Magnetococcus sp. MC-1 SQR gene
17   Codon optimized Clostridium pasteurianum ferredoxin gene
18   Codon optimized Hydrogenobacter thermophilus TK-6 fdx1 gene
19   Codon optimized Hydrogenobacter thermophilus TK-6 fdx2 gene
20   Codon optimized Methanosarcina barkeri str. Fusaro ferredoxin gene
21   Codon optimized Aquifex aeolicus fdx7 gene
22   Aquifex aeolicus fdx7 amino acid sequence
23   Codon optimized Aquifex aeolicus fdx6 gene
24   Aquifex aeolicus fdx6 amino acid sequence
25   Codon optimized gamma-proteobacterium NOR51-B MCR gene
26   Codon optimized Roseiflexus castenholzii DSM 13941 MCR gene
27   Codon optimized marine gamma proteobacterium HTCC2080 MCR gene
28   Codon optimized Erythrobacter sp. NAP1 MCR gene
29   Codon optimized Chloroflexus aurantiacus J-10-fl MCR gene
30   Codon optimized Chloroflexus aurantiacus PCS gene
31   Chloroflexus aurantiacus PCS amino acid sequence
32   Codon optimized Metallosphaera sedula PccB gene
33   Codon optimized Metallosphaera sedula AccC gene
34   Codon optimized Metallosphaera sedula AccB gene
35   Codon optimized Nitrosopumilus maritimus SCM1 PccB gene
36   Codon optimized Nitrosopumilus maritimus SCM1 AccC gene
37   Codon optimized Nitrosopumilus maritimus SCM1 AccB gene
38   Codon optimized Cenarchaeum symbiosum A PccB gene
39   Codon optimized Cenarchaeum symbiosum A AccC gene
40   Codon optimized Cenarchaeum symbiosum A AccB gene
41   Codon optimized Halobacterium sp. NRC-1 PccB gene 1
42   Codon optimized Halobacterium sp. NRC-1 PccB gene 2
43   Codon optimized Halobacterium sp. NRC-1 PccB gene 1
44   Codon optimized Halobacterium sp. NRC-1 AccC gene 2
45   Codon optimized Halobacterium sp. NRC-1 AccB gene
46   Codon optimized Methylococcus capsulatus str. Bath HPS gene 1
47   Codon optimized Methylococcus capsulatus str. Bath HPS gene 2
48   Codon optimized Methylococcus capsulatus str. Bath PHI gene
49   Codon optimized Mycobacterium gastri MB19 HPS-PHI fusion gene
50   Mycobacterium gastri MB19 HPS-PHI fusion amino acid sequence
51   Codon optimized Synechococcus elongatus PCC 7942 GAPDH gene
52   Codon optimized Synechococcus elongatus PCC 7942 SBPase gene
53   Codon optimized Synechococcus elongatus PCC 7942 PRK gene
54   Codon optimized Escherichia coli FocA gene
55   Codon optimized Escherichia coli FocB gene
56   Plasmid 2430
57   Plasmid 2429
58   Plasmid 4767
59   Plasmid 4768
60   Plasmid 4986

EXAMPLES

The examples below are provided herein for illustrative purposes and are not intended to be restrictive.

Example 1

Identification and Selection of Candidate Sulfide:Quinone Oxidoreductase Enzymes

To identify candidate sulfide-quinone oxidoreductases (SQR) for the energy conversion pathway that uses hydrogen sulfide as an inorganic energy source, the Rhodobacter capsulatus SQR was selected as the model enzyme. The R. capsulatus SQR has been functionally expressed in the heterologous host E. coli [Schütz, 1997] and demonstrated to reduce ubiquinone [Shibata, 2001]. A search of the NCBI Protein Clusters database was performed using the search term 'sulfide quinone reductase' and 17 different protein clusters were identified as of Feb. 1, 2011 (CLSK2755575, CLSK2397089, CLSK2336986, CLSK2302249, CLSK2299965, CLSK943035, CLSK940594, CLSK917086, CLSK903971, CLSK892907, CLSK884384, CLSK871744, CLSK871685, CLSK870501, CLSK785404, CLSK767599, CLSK724710). The 17 protein clusters comprised 203 putative SQRs which were subsequently aligned using MUSCLE 3.8.31 using sequence YP_003443063 as an outgroup. The resulting alignment was imported into Geneious Pro 5.3.6 and a tree was made using a neighbor-joining method. Based on the alignment, any sequences containing less than four of six conserved residues were eliminated from the set. The six conserved residues were three conserved cysteines, two conserved histidines thought to be involved in quinone binding and the absence of a conserved aspartate that is characteristic of all glutathione reductase family of flavoproteins with the exception of SQRs [Griesbeck, 2000]. The resulting sequences were realigned using MUSCLE and a new tree was made. Representative sequences from each clade were selected as candidate SQRs.

Example 2

Engineered E. coli that Transfer Electrons from Formate to NADH or NADPH

Plasmids comprising a high copy number replication origin, chloramphenicol resistance marker and each of two different codon-optimized formate dehydrogenase (fdh) genes under the control of an rrnB-derived constitutive promoter were constructed using DNA assembly methods described in WO/2010/070295. The resulting plasmids 2430 (SEQ ID NO:56) and 2429 (SEQ ID NO:57) were transformed into E. coli using standard plasmid transformation techniques. As a negative control, an expression plasmid without any fdh gene was also constructed. As a positive control, purified NAD+-dependent FDH enzyme obtained from commercial sources was used.

Cultures propagating each of the plasmids were inoculated from glycerol stocks and grown overnight in a 24-well plate with fresh LB media supplemented with 34 µg/ml chloramphenicol at 37° C. The grown cultures were then diluted into 1 ml fresh media in a 96-well plate. Cells were pelleted by centrifugation for 10 minutes at 3000×g and the supernatant decanted. The cell pellets were resuspended in 100 µl complete B-PER (contains DNaseI and lysozyme). The assay reactions were prepared in a 96-well assay plate and contained the following: 100 µl of 200 mM potassium phosphate buffer, pH 7.0 (made by titering 200 mM dipotassium hydrogen phosphate into 200 mM potassium dihydrogen phosphate until the solution pH reached 7.0), 15 µl of 10 mM NAD(P)+ as appropriate, 20 µl cell lysate, and 30 µl 0.5 M sodium formate. The absorbance at 340 nm of each sample was measured every 20 seconds in a Spectramax Gemini Plus plate reader in order to monitor the reduction of NAD(P)+. The assay plate was maintained at a temperature of 37° C. The measured rates of NAD(P)+ reduction were normalized to the number of cells used to prepare the cell lysates. The assay results are shown in FIG. 21. From the assay data, the quantitative activities of each FDH can be computed as well as their cofactor preference (Table 5).

TABLE 5
Quantitative, measured activities of FDH

Plasmid            amol NADP+ / (min CFU)   amol NAD+ / (min CFU)   ln(NADP+/NAD+)
negative control   -0.05                    0.18
2430               21.37                    3.06                    1.9
2429               0.12                     9.79                    -4.4

Example 3

Engineered E. coli that Oxidizes Hydrogen Sulfide

Plasmids comprising a high copy number replication origin, chloramphenicol resistance marker and a codon-optimized sulfide-quinone oxidoreductase from Rhodobacter capsulatus (sqr) gene under the control of two different rrnB-derived constitutive promoters were constructed using DNA assembly methods described in WO/2010/070295. The resulting plasmids 4767 (SEQ ID NO:58) and 4768 (SEQ ID NO:59) were transformed into E. coli using standard plasmid transformation techniques. As a negative control, an expression plasmid without a constitutive promoter but including the sqr gene was also constructed.

Cultures propagating each of the plasmids were inoculated from glycerol stocks and grown for two days in an 8-well plate with fresh LB media supplemented with 34 µg/ml chloramphenicol at 30° C. Cells were pelleted by centrifugation for 10 minutes at 2500 rpm and the supernatant decanted. The cell pellets were resuspended in 2 ml of SQR assay buffer (5 g/L sodium chloride, 5 mM magnesium chloride hexahydrate, 1 mM calcium chloride dihydrate, 20 mM Tris-HCl, pH 7.5). The absorbance at 600 nm of a 100 µl aliquot of each resuspended culture was measured to monitor the cell density. The assay reactions were prepared in a 96-well plate containing 0, 100, 150, 200 µl of SQR assay buffer; 10 µl of 0.1 M sodium sulfide; and 200, 100, 50, and 0 µl of resuspended cells. The absorbance at 600 nm of each assay reaction was measured to monitor the cell density. The sampling reactions were prepared in a 96-well assay plate and contained the following: 90 µl of Tris-HCl, pH 7.5; 8 µl aliquot from sampling plate; and 8 µl Cline reagent [Cline, 1969]. The absorbance at 670 nm of each sampling reaction was measured to monitor the sulfide concentration. The assay results are shown in FIG. 22. Based on this data, we estimate the sulfide oxidation rates in the cell resuspensions to be between 2-3.5 mM hour⁻¹ or roughly 0.5-2.0 mmol sulfide g DCW⁻¹ hour⁻¹.

Example 4

Engineered E. coli Producing Propionyl-coA from 3-Hydroxypropionate

Plasmids comprising a high copy number replication origin, chloramphenicol resistance marker and a codon-optimized propionyl-coA synthase from Chloroflexus aurantiacus (pcs) gene under the control of two different rrnB-derived constitutive promoters were constructed using DNA assembly methods described in WO/2010/070295. The resulting plasmid 4986 (SEQ ID NO:60) was transformed into E. coli using standard plasmid transformation techniques. As a negative control, an expression plasmid without the pcs gene was also constructed.

Cultures propagating each of the plasmids were inoculated from glycerol stocks and grown overnight in a 24-well plate with fresh LB media supplemented with 34 µg/ml chloramphenicol at 37° C. Cells were pelleted by centrifugation and the supernatant decanted. The cell pellets were resuspended in 600 µl complete B-PER (contains DNaseI and lysozyme) and incubated for 30 minutes at 37° C. The assay reactions were prepared in a 96-well assay plate and contained the following: 71 µl of reaction buffer (3 mM ATP, 0.5 mM CoASH, 0.4 mM NADPH, 1×PCS buffer), 20 µl of cell lysate and 9 µl of a ten-fold dilution of chemically synthesized 3-hydroxypropionate (see below). The 1×PCS buffer contained 100 mM Tris-HCl, pH 7.6, 10 mM potassium chloride, 5 mM magnesium chloride hexahydrate, 2 mM 1,4-dithioerythritol. The absorbance at 340 nm of each assay reaction was measured every 12 seconds to monitor the oxidation of NADPH. As controls, the assay reaction containing lysate from a strain propagating plasmid 4986 was also assayed in the absence of each required substrate (ATP, CoASH, NADPH, 3-hydroxypropionate or 3-HPAA). The assay results are shown in FIG. 23.

The chemical 3-hydroxypropionate is used as a substrate in enzymatic assays of propionyl-coA synthase (PCS). 3-hydroxypropionate can be made via chemical synthesis from β-propiolactone via the following method. A solution is prepared containing 0.3 M technical grade β-propiolactone (Sigma Aldrich catalog number P-5648) and 2 M sodium hydroxide and incubated overnight at room temperature. The solution is then neutralized with either hydrochloric acid or phosphoric acid. The presence of the reaction product 3-hydroxypropionate can be confirmed via LC-MS. LC-MS can also reveal that no other measurable side-products are formed. Since the starting material, β-propiolactone, is highly bactericidal, but the product, 3-hydroxypropionate, is not, growth inhibition assays can also be used to demonstrate complete conversion of the starting material.

Example 5

Engineered E. coli with Reduced Competing Formate Uptake Activity

The formate uptake of a series of gene deletion strains of E. coli was analyzed so as to identify genes responsible for competing, endogenous formate uptake activity in E. coli. All deletion strains were obtained from the Keio collection [Baba, 2006]. The negative control was the absence of cells. Cultures were grown aerobically in LB medium supplemented with 50 mM formate overnight, harvested by centrifugation, resuspended in fresh LB medium with formate, and incubated for four hours to allow the cells to reenter growth phase. The cells were then resuspended in either M9 minimal medium with 50 mM formate as the sole carbon source (results shown in Table 6) or LB medium with 50 mM formate (results shown in Table 7). Assays for formate levels (as measured in mM of formate) were performed as described in Example 8 at different timepoints.

TABLE 6
Formate uptake by various deletion strains, minimal medium

Strain genotype    0     20    40    60    240
negative control   88    89    98    90    85
ΔfdhF              89    91    85    66    46
ΔfdnG              84    80    65    48    14
ΔfdoG              84    77    93    54    54
ΔselA              84    130   93    88    77
ΔselB              89    124   95    86    59

TABLE 7
Formate uptake by various deletion strains, rich medium

Strain genotype    0     20    40    60    240
negative control   68    74    74    64    70
ΔfdhF              81    76    74    66    62
ΔfdnG              73    74    66    57    28
ΔfdoG              77    74    69    63    64
ΔselA              77    78    76    72    78
ΔselB              72    46    67    60    76

Example 6

Assay Methods to Measure Hydrogenase Activity

The following assay can be used to measure hydrogenase enzyme activity in intact cells. All steps are performed in a Shel-labs Bactron IV anaerobic chamber containing anaerobic mixed gas (90% nitrogen gas, 5% hydrogen gas, 5% carbon dioxide). Cultures with and without hydrogenase activity are inoculated from single colonies on LB-agar plates and grown overnight in a 24-well plate with fresh LB media. An aliquot of each culture (1-2 ml) is pelleted by centrifugation and the supernatant decanted. The cells are then resuspended in 1-2 ml 50 mM Tris-HCl, pH 7.6. A very small amount of sodium dithionite is picked up with a pipette tip and dissolved into 100 µl of 50 mM Tris-HCl, pH 7.6. The assay reactions are prepared in a 96-well plate and contain the following: 100 µl resuspended cells and 100 µl 0.8 mM methyl viologen in 50 mM Tris-HCl, pH 7.6. The 96-well plate is then loaded into a Biochrom UVM340 spectrophotometric plate reader and the absorbance at 600 nm is measured at 45 second intervals. To validate the assay, we assayed E. coli strain 242 (K strain MG1655), strain 312 (B strain BL21 DE3 with pLysS plasmid) and strain 393 (B strain BL21 DE2 with genes tonA, hycE, hyaB and hybC deleted). E. coli K strains are known to have hydrogenase activity whereas B strains do not [Pinske, 2011]. Assay results are shown in FIG. 24.

Example 7

Identification and Sequencing of a Formate-Ferredoxin Oxidoreductase from Clostridium pasteurianum

A culture sample of Clostridium pasteurianum W5 (ATCC 6013) was obtained from the ATCC (genome size is 3.9 Mbp) [Fogel, 1999]. The strain was cultured under anaerobic conditions in reinforced clostridial medium (Difco). Four aliquots of 1 ml of culture were pelleted by centrifugation at 6000×g for 5 minutes and the supernatant removed by aspiration. Genomic DNA was isolated with the Wizard genomic DNA purification kit (Promega) according to the manufacturer's instructions for Gram-positive bacteria with the following exceptions. In the lysis step, 10 mg/L lysozyme in 10 mM Tris, 0.5 mM EDTA, pH 8.2 was used without any additional lysis enzymes. Also, 10 mM Tris, 0.5 mM EDTA, pH 8.2 was used in lieu of DNA rehybridization solution. The DNA yield was approximately 26 µg of DNA from 4 ml of culture.

TABLE 8
Endpoint A600 nm measurements of Strain 149

Growth conditions                 Average   Std Dev
Negative control                  0.0358    0.0003
M9 media + glucose                0.0363    0.0008
M9 media + glucose + glutamate    0.2155    0.0073
The genomic DNA was sequenced at the Harvard/ M9 media + glucose + proline O.1913 O.OO41 MGH sequencing facility. They prepared 160 bp inserts from M9 media + glucose + glutamate + proline O.2145 O.0049 the genomic DNA and obtained 300MM 75 bp paired end 10 reads on an Illumina HiSeq sequencer. The resulting coverage When grown under anaerobic conditions, E. coli runs a was 5000x. De novo assembly of the reads using Velvet branched version of the tricarboxylic acid cycle. Hence, the resulting in 170 contigs greater than 5 kb in length comprising glutamate/proline auxotrophy phenotype of strains such as 3.9 Mbp. The resulting contigs were analyzed by Glimmer 15 Strain 149 in which the iccd gene is rendered non-functional resulting in 3474 identified ORFs comprising 3.6 Mbp. A can be rescued by introduction of an exogenous, functional BLASTable database of amino acid sequences of all identi 2-oxoglutarate synthase (FIG. 26). fied ORFs was produced using NCBI BLAST formatdb tool and Subsequently a BLASTable contig database was gener Example 10 ated. Based on inspection of the BLAST results, two putative FDH subunits were identified (SEQ ID NO:5 and SEQ ID Methods for Growth-Based Selections for Formate NO:6) as well as two putative associated ferredoxin domain Utilization containing subunits (SEQID NO:7 and SEQID NO:8). Using a model of E. coli metabolism Edwards, 2002, the Example 8 25 phenotypic phase planes for E. coli under a variety of growth conditions were computed. The growth conditions examined Assay Methods to Measure Formate Uptake by included formate co-metabolism with a second, limiting Intact Cells organic carbon Source under both anaerobic and aerobic (i.e., unlimited oxygen uptake) conditions. The organic carbon The following assay can be used to measure formate levels 30 Sources examined include glucose, glycerol, malate. Succi in cultures thereby facilitating measurement of formate nate, acetate and glycolate. 
For each carbon Source, several in uptake by intact cells. Cultures are inoculated from glycerols silico genotypes were evaluated including (1) wild-type E. and grown overnight in a 24-well plate with fresh LB media coli, (2) E. coli with its native formate dehydrogenases (FDH) supplemented with the appropriate antibiotic as needed. The enzymes removed, (3) wild-type E. coli with a heterologous cultures are pelleted and an aliquot of the supernatant (300 ul) 35 NAD(P)-dependent FDH and (4) E. coli with native FDHs is saved. The assay reactions are prepared in a 96-well plate removed and a heterologous NAD(P)-dependent FDH. The and contain the following: 80 ul of 200 mM potassium phos purpose of the analysis was to identify growth conditions that phate buffer pH 7.0, 15ul of freshly prepared 100 mMNAD", created selective pressure for increased formate uptake and 35ul of culture supernatant, 20 ul of 100x dilution of pure utilization. Based on the computed phenotypic phase planes FDH enzyme purchased commercially. The 96-well plate is 40 (FIG. 27), increased formate uptake correlated with increased then loaded into a Spectramax spectrophotometric plate growth rates under aerobic growth conditions with a non reader and the absorbance at 340 nm is measured at 12 second fermentable inorganic carbon SOUC intervals preceded by 5 seconds of mixing. The rate of NADH (glycerold succinate-malate propionate-acetate glycolate). formation can be calculated from the rate of change in the Hence, this set of growth conditions is the preferred set of absorbance at 340 nm and varies with the level of formate in 45 conditions for growth-based selections for formate utiliza the sample (FIG. 25). tion. The model analysis also suggests that wildtype E. 
coli is capable of growth on formate as a sole carbon Source with a Example 9 predicted doubling time of 1.4 days and that inclusion of an exogenous NAD-dependent FDH reduces the doubling time Methods for Growth-Based Selections for 50 (FIG. 28). 2-Oxoglutarate Synthase Activity E. coli strains can be evolved for improved formate utili Zation either through repeated Subculturing or through con To select for functional 2-oxoglutarate synthase activity in tinuous culturing in a chemostat or turbidostat using the E. coli, the following growth-based selection can be used. A above culture conditions. strain with the gene encoding isocitrate dehydrogenase ren 55 dered non-functional is used such that the strain cannot make Example 11 2-oxoglutarate (a precursor to glutamate synthesis in the cell). Such a strain can only grow in glucose minimal media that is Computing Mass Transfer Limitations of Hydrogen Supplemented with either glutamate or proline (proline deg Versus Formate as an Inorganic Energy Source radation produces glutamate) Helling, 1971. Strain 149 60 (CGSC#4451) has the iccd-3 mutation rendering isocitrate The mass transfer limitations of hydrogen from the gas to dehydrogenase non-functional. Table 8 shows the results of liquid phase is illustrated here. For the purpose of this analy endpoint absorbance at 600 nm measurements of Strain 149 sis, an ideal engineered chemoautotrophthat has an unlimited grown under different conditions for 36 hours at 37°C. The capacity to (i) metabolize dissolved aqueous-phase hydrogen negative control is M9 media with glucose with no cells. All 65 and (ii) convert it and carbon dioxide to a desired fuel at 100% readings shown are an average of three measurement repli of the theoretical yield is assumed. Under these conditions, cates of the same culture. 
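As a rough numerical companion to this example, the sketch below evaluates the two best-case productivity relations it develops, P = m·Y·K_La·C* for hydrogen (limited by gas-liquid transfer) and P = m·Y·X·q for formate (limited by cellular uptake), together with the reactor volume V = t/P. It is illustrative only. The inputs are the representative values quoted in this example (24 mol of H₂ or formate per mol of isooctanol, C* = 0.75 mM, K_La between 100 and 800 h⁻¹, X = 50 g DCW L⁻¹, q = 0.0368 mol g DCW⁻¹ h⁻¹, and a target of 20800 g h⁻¹). The isooctanol molecular weight of 130.23 g/mol (C₈H₁₈O) and every function and variable name are assumptions introduced here, not part of the specification.

```python
"""Best-case fuel productivity on hydrogen versus formate (illustrative sketch)."""

M_ISOOCTANOL_G_PER_MOL = 130.23    # assumed molecular weight of isooctanol, C8H18O
Y_MAX_MOL_PER_MOL = 1.0 / 24.0     # 24 mol H2 (or formate) per mol isooctanol
H2_SOLUBILITY_MOL_PER_L = 0.75e-3  # C*, about 0.75 mM in pure water


def hydrogen_productivity(kla_per_h, c_star=H2_SOLUBILITY_MOL_PER_L):
    """P = m * Y * K_La * C* in g L^-1 h^-1 (liquid-phase H2 taken as zero)."""
    return M_ISOOCTANOL_G_PER_MOL * Y_MAX_MOL_PER_MOL * kla_per_h * c_star


def formate_productivity(biomass_g_per_l, uptake_mol_per_g_h):
    """P = m * Y * X * q in g L^-1 h^-1 (no gas-liquid transfer limit)."""
    return (M_ISOOCTANOL_G_PER_MOL * Y_MAX_MOL_PER_MOL
            * biomass_g_per_l * uptake_mol_per_g_h)


def reactor_volume(target_g_per_h, productivity_g_per_l_h):
    """V = t / P in litres."""
    if productivity_g_per_l_h <= 0:
        raise ValueError("productivity must be positive")
    return target_g_per_h / productivity_g_per_l_h


if __name__ == "__main__":
    target = 20800.0  # g/h, roughly 0.5 t per day
    p_formate = formate_productivity(50.0, 0.0368)
    for kla in (100.0, 800.0):
        p_h2 = hydrogen_productivity(kla)
        print(f"K_La={kla:5.0f} 1/h  P_H2={p_h2:.2f} g/L/h  "
              f"V_H2={reactor_volume(target, p_h2):.0f} L  "
              f"formate/H2={p_formate / p_h2:.1f}")
    print(f"P_formate={p_formate:.2f} g/L/h  "
          f"V_formate={reactor_volume(target, p_formate):.0f} L")
```

With K_La = 800 h⁻¹ the hydrogen case calls for a reactor on the order of the 6,400 L quoted in this example, and at 100 h⁻¹ the formate case comes out more than 15 times more productive, in line with the text.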
Under these conditions, the rate of fuel production per unit of reactor volume can depend solely on the rate at which hydrogen can be transferred from the gas phase to the liquid phase.

Fuel productivity $P$ in units of g L⁻¹ h⁻¹ can be expressed as the product of fuel molecular weight $m_F$, fuel molar yield on hydrogen $Y_{F/H}$, the biomass concentration in a bioreactor $X$, and the specific cellular uptake rate of hydrogen $q_H$, as shown in the equation below.

$$P = m_F \, Y_{F/H} \, X \, q_H$$

At steady state, the bulk hydrogen uptake rate $X q_H$ is equal to the rate of hydrogen transfer from gas to liquid, meaning the productivity can be expressed as in the equation below, where $C_H^*$ is the liquid-phase solubility of hydrogen, $C_H$ is the liquid-phase concentration of hydrogen, and $K_L a$ is the mass transfer coefficient for hydrogen transport from the gas phase (e.g., as bubbles sparged into the reactor) to the liquid. $K_L a$ is a complex function of reactor geometry, bubble size, superficial gas velocity, impeller speed, etc. and is best regarded as an empirical parameter that needs to be determined for a given bioreactor setup.

$$P = m_F \, Y_{F/H} \, K_L a \, (C_H^* - C_H)$$

Again, as a best-case scenario, an ideal engineered chemoautotroph capable of maintaining rapid hydrogen uptake rates even at vanishingly low hydrogen concentrations (i.e., that $q_H$ is not a function of $C_H$, even as $C_H$ tends to zero) is assumed. This assumption maximizes the fuel productivity at $P = m_F \, Y_{F/H} \, K_L a \, C_H^*$.

For a fixed production target $t$, say 0.5 t d⁻¹ (equivalent to 20800 g h⁻¹), the productivity $P$ determines the required reactor volume $V$ because $V = t/P$. Thus, both fuel productivity and reactor volumes, even assuming "perfect" organisms, are bounded by achievable $K_L a$ values, as shown in the equations below.

$$P = m_F \, Y_{F/H} \, C_H^* \, K_L a, \qquad V = \frac{t}{m_F \, Y_{F/H} \, C_H^* \, K_L a}$$

Maximal productivity corresponds to minimal reaction volumes, and occurs at maximal values of $m_F \, Y_{F/H} \, C_H^* \, K_L a$. The fuel yield cannot exceed the stoichiometric maximal yield. For the fuel isooctanol, the stoichiometric maximal yield is determined from the balanced chemical equation 8CO₂ + 24H₂ → C₈H₁₈O + 15H₂O, which shows that 24 moles of H₂ are required for each mole of isooctanol produced. At atmospheric pressure, $C_H^*$ is unlikely to greatly exceed 0.75 mM, the solubility of H₂ in pure water. Using these representative values for $m_F$, $Y_{F/H}$, $C_H^*$ and $t$, the relationships between $K_L a$ and $P$ as well as between $K_L a$ and $V$ are shown (FIG. 29).

Alternative electron donors have the potential to solve both the safety problem and the mass transfer problem presented by hydrogen. An ideal non-hydrogen vector for carrying electrical energy would share hydrogen's attractive characteristics, which include (a) a highly negative standard reduction potential, and (b) established high-efficiency technology for converting electricity into the vector. Unlike hydrogen, however, it would (c) have a low propensity to explode when mixed with air, and (d) have high water solubility under bio-compatible conditions. Formic acid, HCOOH, or its salts, satisfies these conditions. Formic acid is stoichiometrically equivalent to H₂ + CO₂, and formate has a standard reduction potential nearly identical to that of hydrogen. Since both formic acid and formate salts are highly soluble in water, the mass transfer limitations discussed above for hydrogen do not apply. However, a modified form of the fuel productivity equation, written for formic acid (A) instead of hydrogen (H), still applies, as shown below.

$$P = m_F \, Y_{F/A} \, X \, q_A$$

Unlike hydrogen-powered electrofuels bioproduction, limits on formate-powered fuel productivity $P$ stem only from the attainable yield, the biomass concentration in the reactor, and the specific uptake rate. We assume $Y_{F/A}$, the molar yield of fuel on formic acid, is the stoichiometric maximum, whose value is the same as for hydrogen, 0.0417 mol isooctanol (mol HCOOH)⁻¹ (1 mol per 24 mol). For high-cell density cultivations of E. coli, biomass concentrations of $X$ = 50 g DCW L⁻¹ are attainable, although these values have not been observed for growth on formate or in minimal medium. For Thiobacillus strain A2, naturally capable of growing on formate, observed values of $q_A$ were 0.0368 mol formate g DCW⁻¹ h⁻¹ [Kelly, 1979]. The representative values for $q_A$ and $X$ imply a maximal isooctanol productivity on formate of about 10 g L⁻¹ h⁻¹.

On the y-axis of FIG. 29, the range of reported $K_L a$ attainable in large-scale stirred-tank bioreactors is shown. Although there are many reports of higher $K_L a$ values in laboratory-scale reactors, during scale up the inevitable increase in volume-to-surface area ratios means that maintaining high $K_L a$ values is for practical purposes impossible. The maximum of the indicated range of 10-800 h⁻¹ translates to a best-case productivity of 4 g L⁻¹ h⁻¹, which implies a best-case reactor volume of 6,400 L. The best-case productivity on formate is 10 g L⁻¹ h⁻¹, implying a reactor volume less than half as large would be required to achieve the same production. Most sources that give $K_L a$ values for large scale reactors have values much closer to 100 h⁻¹, meaning the best-case productivity using formate as the inorganic energy source would be more than 15 times larger than on hydrogen.

Example 12

Engineered Organisms Producing Butanol

The enzyme beta-ketothiolase (R. eutropha PhaA or E. coli AtoB) (E.C. 2.3.1.16) converts 2 acetyl-CoA to acetoacetyl-CoA and CoA. Acetoacetyl-CoA reductase (R. eutropha PhaB) (E.C. 1.1.1.36) generates R-3-hydroxybutyryl-CoA from acetoacetyl-CoA and NADPH. Alternatively, 3-hydroxybutyryl-CoA dehydrogenase (C. acetobutylicum Hbd) (E.C. 1.1.1.30) generates S-3-hydroxybutyryl-CoA from acetoacetyl-CoA and NADH. Enoyl-CoA hydratase (E. coli MaoC or C. acetobutylicum Crt) (E.C. 4.2.1.17) generates crotonyl-CoA from 3-hydroxybutyryl-CoA. Butyryl-CoA dehydrogenase (C. acetobutylicum Bcd) (E.C. 1.3.99.2) generates butyryl-CoA and NAD(P)H from crotonyl-CoA. Alternatively, trans-enoyl-coenzyme A reductase (Treponema denticola Ter) (E.C. 1.3.1.86) generates butyryl-CoA from crotonyl-CoA and NADH. Butyrate CoA-transferase (R. eutropha Pct) (E.C. 2.8.3.1) generates butyrate and acetyl-CoA from butyryl-CoA and acetate. Aldehyde dehydrogenase (E. coli AdhE) (E.C. 1.2.1.{3,4}) generates butanal from butyrate and NADH. Alcohol dehydrogenase (E. coli adhE) (E.C. 1.1.1.{1,2}) generates 1-butanol from butanal and NADH or NADPH. Production of 1-butanol is conferred on the engineered host cell by expression of the above enzyme activities.

To create butanol-producing cells, host cells can be further engineered to express acetyl-CoA acetyltransferase (atoB) from E. coli K12, β-hydroxybutyryl-CoA dehydrogenase from Butyrivibrio fibrisolvens, crotonase from Clostridium beijerinckii, butyryl CoA dehydrogenase from Clostridium beijerinckii, CoA-acylating aldehyde dehydrogenase (ALDH) from Cladosporium fulvum, and adhE encoding an aldehyde-alcohol dehydrogenase of Clostridium acetobutylicum (or homologs thereof).

Example 13

Engineered Organisms Producing Acrylate

Enoyl-CoA hydratase (E. coli paaF) (E.C. 4.2.1.17) converts 3-hydroxypropionyl-CoA to acryloyl-CoA. Propionyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-) also converts 3-hydroxypropionyl-CoA to acryloyl-CoA (AAL47820, SEQ ID NO:30, SEQ ID NO:31). Acrylate CoA-transferase (R. eutropha pct) (E.C. 2.8.3.n) generates acrylate and acetyl-CoA from acryloyl-CoA and acetate.

Other Embodiments

The examples have focused on E. coli. Nevertheless, the key concept of using genetic engineering to convert a heterotroph into an engineered chemoautotroph is extensible to other, more complex organisms such as other prokaryotic or eukaryotic single cell organisms such as E. coli or S. cerevisiae, hosts suitable for scale up during fermentation, archaea, plant cells or cell lines, mammalian cells or cell lines, or insect cells or cell lines. Alternatively, the same energy conversion, carbon fixation and/or carbon product biosynthetic pathways described here may be used to enhance or augment the autotrophic capability of an organism that is natively autotrophic.

Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing and are therefore not limited in their application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.

Use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," or "having," "containing," "involving," and variations thereof herein, is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.

EQUIVALENTS

The present invention provides among other things novel methods and systems for synthetic biology. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.

INCORPORATION BY REFERENCE

All publications, patents and patent applications referenced in this specification are incorporated herein by reference in their entirety for all purposes to the same extent as if each individual publication, patent or patent application were specifically indicated to be so incorporated by reference.

REFERENCES CITED

Abelson PH, Hoering TC. Carbon isotope fractionation in formation of amino acids by photosynthetic organisms. Proc Natl Acad Sci. 1961; 47:623-32.
Aharoni A, Keizer LC, Bouwmeester HJ, Sun Z, Alvarez-Huerta M, Verhoeven HA, Blaas J, van Houwelingen AM, De Vos RC, van der Voet H, Jansen RC, Guis M, Mol J, Davis RW, Schena M, van Tunen AJ, O'Connell AP. Identification of the SAAT gene involved in strawberry flavor biogenesis by use of DNA microarrays. Plant Cell. 2000 May; 12(5):647-62.
Alber BE, Fuchs G. Propionyl-coenzyme A synthase from Chloroflexus aurantiacus, a key enzyme of the 3-hydroxypropionate cycle for autotrophic CO2 fixation. J Biol Chem. 2002 Apr. 5; 277(14):12137-43.
Alber B, Olinger M, Rieder A, Kockelkorn D, Jobst B, Hügler M, Fuchs G. Malonyl-coenzyme A reductase in the modified 3-hydroxypropionate cycle for autotrophic carbon fixation in archaeal Metallosphaera and Sulfolobus spp. J Bacteriol. 2006 December; 188(24):8551-9.
Andersen JB, Sternberg C, Poulsen LK, Bjorn SP, Givskov M, Molin S. New unstable variants of green fluorescent protein for studies of transient gene expression in bacteria. Appl Environ Microbiol. 1998 June; 64(6):2240-6.
Anderson JC, Voigt CA, Arkin AP. Environmental signal integration by a modular AND gate. Mol Syst Biol. 2007; 3:133.
Aoshima M, Ishii M, Igarashi Y. A novel enzyme, citryl-CoA synthetase, catalysing the first step of the citrate cleavage reaction in Hydrogenobacter thermophilus TK-6. Mol Microbiol. 2004 May; 52(3):751-61. (a)
Aoshima M, Ishii M, Igarashi Y. A novel enzyme, citryl-CoA lyase, catalysing the second step of the citrate cleavage reaction in Hydrogenobacter thermophilus TK-6. Mol Microbiol. 2004 May; 52(3):763-70. (b)
Aoshima M, Ishii M, Igarashi Y. A novel biotin protein required for reductive carboxylation of 2-oxoglutarate by isocitrate dehydrogenase in Hydrogenobacter thermophilus TK-6. Mol Microbiol. 2004 February; 51(3):791-8. (c)
Aoshima M, Igarashi Y. A novel oxalosuccinate-forming enzyme involved in the reductive carboxylation of 2-oxoglutarate in Hydrogenobacter thermophilus TK-6. Mol Microbiol. 2006 November; 62(3):748-59.
Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006; 2:2006.0008.
Bai FW, Anderson WA, Moo-Young M. Ethanol fermentation technologies from sugar and starch feedstocks. Biotechnol Adv. 2008 January-February; 26(1):89-105.
Bailer J, de Hueber K. Determination of saponifiable glycerol in "bio-diesel." Fresenius J Anal Chem. 1991; 340(3):186.
Bar-Even A, Noor E, Lewis NE, Milo R. Design and analysis of synthetic carbon fixation pathways. Proc Natl Acad Sci USA. 2010 May 11; 107(19):8889-94.
Bassham JA, Benson AA, Kay LD, Harris AZ, Wilson AT, Calvin M. The path of carbon in photosynthesis. XXI. The cyclic regeneration of carbon dioxide acceptor. J Am Chem Soc. 1954; 76:1760-70.
Bayer TS, Widmaier DM, Temme K, Mirsky EA, Santi DV, Voigt CA. Synthesis of methyl halides from biomass using engineered microbes. J Am Chem Soc. 2009 May 13; 131(18):6508-15.
Berrios-Rivera SJ, San KY, Bennett GN. The effect of NAPRTase overexpression on the total levels of NAD, the NADH/NAD+ ratio, and the distribution of metabolites in Escherichia coli. Metab Eng. 2002 July; 4(3):238-47.
Brock T. Biotechnology: A Textbook of Industrial Microbiology. Second Edition. Sinauer Associates, Inc. Sunderland, Mass. 1989.
Brugna-Guiral M, Tron P, Nitschke W, Stetter KO, Burlat B, Guigliarelli B, Bruschi M, Giudici-Orticoni MT. [NiFe] hydrogenases from the hyperthermophilic bacterium Aquifex aeolicus: properties, function, and phylogenetics. Extremophiles. 2003 April; 7(2):145-57.
Buchanan BB, Arnon DI. A reverse KREBS cycle in photosynthesis: consensus at last. Photosynth Res. 1990; 24:47-53.
Burgdorf T, van der Linden E, Bernhard M, Yin QY, Back JW, Hartog AF, Muijsers AO, de Koster CG, Albracht SP, Friedrich B. The soluble NAD+-reducing [NiFe]-hydrogenase from Ralstonia eutropha H16 consists of six subunits and can be specifically activated by NADPH. J Bacteriol. 2005 May; 187(9):3122-32.
Camilli A, Bassler BL. Bacterial small-molecule signaling pathways. Science. 2006 Feb. 24; 311(5764):1113-6.
Campbell BJ, Jeanthon C, Kostka JE, Luther GW 3rd, Cary SC. Growth and phylogenetic properties of novel bacteria belonging to the epsilon subdivision of the Proteobacteria enriched from Alvinella pompejana and deep-sea hydrothermal vents. Appl Environ Microbiol. 2001 October; 67(10):4566-72.
Campbell BJ, Smith JL, Hanson TE, Klotz MG, Stein LY, Lee CK, Wu D, Robinson JM, Khouri HM, Eisen JA, Cary SC. Adaptations to submarine hydrothermal environments exemplified by the genome of Nautilia profundicola. PLoS Genet. 2009 February; 5(2):e1000362.
Canton B, Labno A, Endy D. Refinement and standardization of synthetic biological parts and devices. Nat Biotechnol. 2008 July; 26(7):787-93.
Cheesbrough TM, Kolattukudy PE. Alkane biosynthesis by decarbonylation of aldehydes catalyzed by a particulate preparation from Pisum sativum. Proc Natl Acad Sci USA. 1984 November; 81(21):6613-7.
Chen S, von Bamberg D, Hale V, Breuer M, Hardt B, Müller R, Floss HG, Reynolds KA, Leistner E. Biosynthesis of ansatrienin (mycotrienin) and naphthomycin. Identification and analysis of two separate biosynthetic gene clusters in Streptomyces collinus Tü 1892. Eur J Biochem. 1999 April; 261(1):98-107.
Chin JW, Cirino PC. Improved NADPH supply for xylitol production by engineered Escherichia coli with glycolytic mutations. Biotechnol Prog. 2011 March-April; 27(2):333-41. doi:10.1002/btpr.559.
Cline JD. Spectrophotometric Determination of Hydrogen Sulfide in Natural Waters. Limnol Oceanogr. 1969; 14(3):454-8.
Cronan JE, LaPorte D. Tricarboxylic Acid Cycle and Glyoxylate Bypass. In A. Böck, R. Curtiss III, J. B. Kaper, P. D. Karp, F. C. Neidhardt, T. Nyström, J. M. Slauch, C. L. Squires, and D. Ussery (ed.), EcoSal - Escherichia coli and Salmonella: Cellular and Molecular Biology. http://www.ecosal.org. ASM Press, Washington, D.C. 2010 Mar. 12.
Cropp TA, Wilson DJ, Reynolds KA. Identification of a cyclohexylcarbonyl CoA biosynthetic gene cluster and application in the production of doramectin. Nat Biotechnol. 2000 September; 18(9):980-3.
Davis JH, Rubin AJ, Sauer RT. Design, construction and characterization of a set of insulated bacterial promoters. Nucleic Acids Res. 2011 February; 39(3):1131-41.
de Mendoza D, Klages Ulrich A, Cronan JE Jr. Thermal regulation of membrane fluidity in Escherichia coli. Effects of overproduction of beta-ketoacyl-acyl carrier protein synthase I. J Biol Chem. 1983 Feb. 25; 258(4):2098-101.
Dellomonaco C, Clomburg JM, Miller EN, Gonzalez R. Engineered reversal of the β-oxidation cycle for the synthesis of fuels and chemicals. Nature. 2011 Aug. 10; 476(7360):355-9.
Dennis MW, Kolattukudy PE. Alkane biosynthesis by decarbonylation of aldehyde catalyzed by a microsomal preparation from Botryococcus braunii. Arch Biochem Biophys. 1991 June; 287(2):268-75.
Denoya CD, Fedechko RW, Hafner EW, McArthur HA, Morgenstern MR, Skinner DD, Stutzman-Engwall K, Wax RG, Wernau WC. A second branched-chain alpha-keto acid dehydrogenase gene cluster (bkdFGH) from Streptomyces avermitilis: its relationship to avermectin biosynthesis and the construction of a bkdF mutant suitable for the production of novel antiparasitic avermectins. J Bacteriol. 1995 June; 177(12):3504-11.
Deshpande MV. Ethanol production from cellulose by coupled saccharification/fermentation using Saccharomyces cerevisiae and cellulase complex from Sclerotium rolfsii UV-8 mutant. Appl Biochem Biotechnol. 1992 September; 36(3):227-34.
Dettman DL, Reische AK, Lohmann KC. Controls on the stable isotope composition of seasonal growth bands in aragonitic fresh-water bivalves (unionidae). Geochim Cosmochim Acta. 1999; 63:1049-57.
Doolittle RF (Editor). Computer Methods for Macromolecular Sequence Analysis. Methods in Enzymology. 1996; 266:3-711.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004 Mar. 19; 32(5):1792-7. (a)
Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004 Aug. 19; 5:113. (b)
Edwards JS, Ramakrishna R, Palsson BO. Characterizing the metabolic phenotype: a phenotype phase plane analysis. Biotechnol Bioeng. 2002 Jan. 5; 77(1):27-36.
Eisenreich W, Strauss G, Werz U, Fuchs G, Bacher A. Retrobiosynthetic analysis of carbon fixation in the phototrophic eubacterium Chloroflexus aurantiacus. Eur J Biochem. 1993 Aug. 1; 215(3):619-32.
Evans MC, Buchanan BB, Arnon DI. A new ferredoxin-dependent carbon reduction cycle in a photosynthetic bacterium. Proc Natl Acad Sci USA. 1966 April; 55(4):928-34.
Evans CT, Sumegi B, Srere PA, Sherry AD, Malloy CR. [13C]propionate oxidation in wild-type and citrate synthase mutant Escherichia coli: evidence for multiple pathways of propionate utilization. Biochem J. 1993 May 1; 291(Pt 3):927-32.
Farquhar GD, Ehleringer JR, Hubick KT. Carbon isotope discrimination and photosynthesis. Annu Rev Plant Physiol Plant Mol Biol. 1989; 40:503-37.
Ferenci T, Strom T, Quayle JR. Purification and properties of 3-hexulose phosphate synthase and phospho-3-hexuloisomerase from Methylococcus capsulatus. Biochem J. 1974 December; 144(3):477-86.
Fogel GB, Collins CR, Li J, Brunk CF. Prokaryotic Genome Size and SSU rDNA Copy Number: Estimation of Microbial Relative Abundance from a Mixed Population. Microb Ecol. 1999 August; 38(2):93-113.
Fong SS, Palsson BO. Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat Genet. 2004 October; 36(10):1056-8.
Friedmann S, Steindorf A, Alber BE, Fuchs G. Properties of succinyl-coenzyme A:L-malate coenzyme A transferase and its role in the autotrophic 3-hydroxypropionate cycle of Chloroflexus aurantiacus. J Bacteriol. 2006 April; 188(7):2646-55.
Friedmann S, Alber BE, Fuchs G. Properties of R-citramalyl-coenzyme A lyase and its role in the autotrophic 3-hydroxypropionate cycle of Chloroflexus aurantiacus. J Bacteriol. 2007 April; 189(7):2906-14.
Gehring U, Arnon DI. Purification and properties of α-ketoglutarate synthase from a photosynthetic bacterium. J Biol Chem. 1972 Nov. 10; 247(21):6963-9.
Gerhold D, Rushmore T, Caskey CT. DNA chips: promising toys have become powerful tools. Trends Biochem Sci. 1999 May; 24(5):168-73.
Goericke R, Montoya JP, Fry B. Physiology of isotopic fractionation in algae and cyanobacteria. Chapter 9 in "Stable Isotopes in Ecology and Environmental Science". Blackwell Publishing. 1994.
Grantham R, Gautier C, Gouy M, Mercier R, Pavé A. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 1980 Jan. 11; 8(1):r49-r62.
Greene DN, Whitney SM, Matsumura I. Artificially evolved Synechococcus PCC6301 Rubisco variants exhibit improvements in folding and catalytic efficiency. Biochem J. 2007 Jun. 15; 404(3):517-24.
Griesbeck C, Hauska G, Schutz M. Biological Sulfide Oxidation: Sulfide-Quinone Reductase (SQR), the Primary Reaction. Recent Research Developments in Microbiology. 2000; 4:179-203.
Gul-Karaguler N, Sessions RB, Clarke AR, Holbrook JJ. A single mutation in the NAD-specific formate dehydrogenase from Candida methylica allows the enzyme to use NADP. Biotechnol Lett. 2001; 23(4):283-7.
Gutteridge S, Phillips AL, Kettleborough CA, Parry MAJ. Expression of bacterial Rubisco genes in Escherichia coli. Phil Trans R Soc Lond B 313:433-45.
Han L, Reynolds KA. A novel alternate anaplerotic pathway to the glyoxylate cycle in streptomycetes. J Bacteriol. 1997 August; 179(16):5157-64.
Hatrongjit R, Packdibamrung K. A novel NADP+-dependent formate dehydrogenase from Burkholderia stabilis 15516: Screening, purification and characterization. Enzyme Microb Technol. 2010 Jun. 7; 46(7):557-61.
Hawley DK, McClure WR. Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Res. 1983 Apr. 25; 11(8):2237-55.
Hayes JM. Fractionation of Carbon and Hydrogen Isotopes in Biosynthetic Processes. Rev Mineral Geochem. 2001 January; 43(1):225-77.
Helling RB, Kukora JS. Nalidixic acid-resistant mutants of Escherichia coli deficient in isocitrate dehydrogenase. J Bacteriol. 1971 March; 105(3):1224-6.
Henry CS, Jankowski MD, Broadbelt LJ, Hatzimanikatis V. Genome-scale thermodynamic analysis of Escherichia coli metabolism. Biophys J. 2006 Feb. 15; 90(4):1453-61.
Henstra AM, Sipma J, Rinzema A, Stams AJ. Microbiology of synthesis gas fermentation for biofuel production. Curr Opin Biotechnol. 2007 June; 18(3):200-6.
Herter S, Fuchs G, Bacher A, Eisenreich W. A bicyclic autotrophic CO2 fixation pathway in Chloroflexus aurantiacus. J Biol Chem. 2002 Jun. 7; 277(23):20277-83. (a)
Herter S, Busch A, Fuchs G. L-Malyl-coenzyme A lyase/beta-methylmalyl-coenzyme A lyase from Chloroflexus aurantiacus, a bifunctional enzyme involved in autotrophic CO2 fixation. J Bacteriol. 2002 November; 184(21):5999-6006. (b)
Ho NW, Chen Z, Brainard AP. Genetically engineered Saccharomyces yeast capable of effective cofermentation of glucose and xylose. Appl Environ Microbiol. 1998 May; 64(5):1852-9.
Hoffmeister M, Piotrowski M, Nowitzki U, Martin W. Mitochondrial trans-2-enoyl-CoA reductase of wax ester fermentation from Euglena gracilis defines a new family of enzymes involved in lipid synthesis. J Biol Chem. 2005 Feb. 11; 280(6):4329-38.
Holo H. Chloroflexus aurantiacus secretes 3-hydroxypropionate, a possible intermediate in the assimilation of CO2 and acetate. Arch Microbiol. 1989; 151(3):252-6.
Hügler M, Menendez C, Schägger H, Fuchs G. Malonyl-coenzyme A reductase from Chloroflexus aurantiacus, a key enzyme of the 3-hydroxypropionate cycle for autotrophic CO2 fixation. J Bacteriol. 2002 May; 184(9):2404-10.
Hügler M, Huber H, Molyneaux SJ, Vetriani C, Sievert SM. Autotrophic CO2 fixation via the reductive tricarboxylic acid cycle in different lineages within the phylum Aquificae: evidence for two ways of citrate cleavage. Environ Microbiol. 2007 January; 9(1):81-92.
Hügler M, Sievert SM. Beyond the Calvin cycle: autotrophic carbon fixation in the ocean. Ann Rev Mar Sci. 2011; 3:261-89.
Huisman GW, Gray D. Towards novel processes for the fine-chemical and pharmaceutical industries. Curr Opin Biotechnol. 2002 August; 13(4):352-8.
Ikeda T, Yamamoto M, Arai H, Ohmori D, Ishii M, Igarashi Y. Two tandemly arranged ferredoxin genes in the Hydrogenobacter thermophilus genome: comparative characterization of the recombinant [4Fe-4S] ferredoxins. Biosci Biotechnol Biochem. 2005 June; 69(6):1172-7.
Inokuma K, Nakashimada Y, Akahoshi T, Nishio N. Characterization of enzymes involved in the ethanol production of Moorella sp. HUC22-1. Arch Microbiol. 2007 July; 188(1):37-45.
Ivlev AA. Carbon isotope effects (13C/12C) in biological systems. Separation Sci Technol. 2010; 36:1819-1914.
Janausch IG, Zientz E, Tran QH, Kröger A, Unden G. C4-dicarboxylate carriers and sensors in bacteria. Biochim Biophys Acta. 2002 Jan. 17; 1553(1-2):39-56.
Jukes TH, Osawa S. Evolutionary changes in the genetic code. Comp Biochem Physiol B. 1993 November; 106(3):489-94.
Kalscheuer R, Steinbüchel A. A novel bifunctional wax ester synthase/acyl-CoA:diacylglycerol acyltransferase mediates wax ester and triacylglycerol biosynthesis in Acinetobacter calcoaceticus ADP1. J Biol Chem. 2003 Mar. 7; 278(10):8075-82.
Kalscheuer R, Stölting T, Steinbüchel A. Microdiesel: Escherichia coli engineered for fuel production. Microbiology. 2006 September; 152(Pt 9):2529-36.
Kanao T, Kawamura M, Fukui T, Atomi H, Imanaka T. Characterization of isocitrate dehydrogenase from the green sulfur bacterium Chlorobium limicola. A carbon dioxide-fixing enzyme in the reductive tricarboxylic acid cycle. Eur J Biochem. 2002 April; 269(7):1926-31. (a)
Kanao T, Fukui T, Atomi H, Imanaka T. Kinetic and biochemical analyses on the reaction mechanism of a bacterial ATP-citrate lyase. Eur J Biochem. 2002 July; 269(14):3409-16.
Knothe G. Dependence of biodiesel fuel properties on the structure of fatty acid alkyl esters. Fuel Process Technol. 2005; 86:1059-1070.
Kolkman JA, Stemmer WP. Directed evolution of proteins by exon shuffling. Nat Biotechnol. 2001 May; 19(5):423-8.
Komers K, Skopal F, Stloukal R. Determination of the neutralization number for biodiesel fuel production. Fett/Lipid. 1997; 99(2):52-54.
Larue TA, Kurz WG. Estimation of nitrogenase using a colorimetric determination for ethylene. Plant Physiol. 1973 June; 51(6):1074-5.
Li Y, Florova G, Reynolds KA. Alteration of the fatty acid
(b) profile of Streptomyces coelicolor by replacement of the Kaneda T. Iso- and anteiso-fatty acids in bacteria: biosynthe initiation enzyme 3-ketoacyl acyl carrier protein synthase sis, function, and taxonomic significance. Microbiol. Rev. 15 III (FabH).J. Bacteriol. 2005 June; 187(11):3795-9. 1991 June; 55(2):288-302. Liu C L. Mortenson L. E. Formate dehydrogenase of Kapust R. B. Waugh D S. Escherichia coli maltose-binding Clostridium pasteurianum. J Bacteriol. 1984 July; 159(1): protein is uncommonly effective at promoting the solubil 375-80. ity of polypeptides to which it is fused. Protein Sci. 1999 Marcia M, Ermler U, Peng G. Michel H. Anew structure August; 8(8): 1668-74. based classification of Sulfide:Quinone oxidoreductases. Keasling J D. Jones KL, Van Dien SJ. New Tools for Meta Proteins. 2010 April; 78(5):1073-83. bolic Engineering of Escherichia coli. Chapter 5 in Meta Marrakchi H. Zhang YM, Rock C.O. Mechanistic diversity bolic Engineering. Marcel Dekker. New York, N.Y. 1999. and regulation of Type II fatty acid synthesis. Biochem Soc (a) Trans. 2002 November; 30(Pt 6):1050-5. (a) Keasling J D. Gene-expression tools for the metabolic engi 25 Marrakchi H, Choi KH, Rock C O. A new mechanism for neering of bacteria. Trends Biotechnol. 1999 November; anaerobic unsaturated fatty acid formation in Streptococ 17(11):452-60. (b) cus pneumoniae. J Biol Chem. 2002 Nov. 22: 277(47): Kelly D P. Wood P. Gottschal J C, Kuenen J. G. Autotrophic 44809-16. (b) metabolism of formate by Thiobacillus strain A2. J Gen Martin VJJ. Smolke C, Keasling J D. Redesigning cells for Microbiol. 1979; 114:1-13. 30 production of complex organic molecules. ASM News. 2002: 68:336-343. Kelly JR, Rubin A.J. Davis J. H. Ajo-Franklin CM, Cumbers Martínez-Alonso M, Toledo-Rubio V, Noad R, Unzueta U, J. Czar M.J. de Mora K. Glieberman AL, Monie D D, Endy Ferrer-Miralles N. Roy P. Villaverde A. Rehosting of bac D. 
Measuring the activity of BioBrick promoters using an terial chaperones for high-quality protein production. Appl in vivo reference standard. J Biol Eng. 2009 Mar. 20; 3:4. 35 Environ Microbiol. 2009 December, 75(24):7850-4. Kemp MB. The hexose phosphate synthetase of Methylococ Martínez-Alonso M. Garcia-Fruitós E, Ferrer-Miralles N. cus capsulatus. Biochem J. 1972 April; 127(3):64P-65P. Rinas U, Villaverde A. Side effects of chaperone gene Kemp M. B. Hexose phosphate synthase from Methylcoccus co-expression in recombinant protein production. Microb capsulatus makes D-arabino-3-hexulose phosphate. Bio Cell Fact. 2010 Sep. 2:9:64. chem J. 1974 April; 139(1): 129-34. 40 Marty J. Planar D. Comparison of methods to determine algal Kim O B. Unden G. The L-tartrate/succinate antiporter TtdT o'C in freshwater. Limnol Oceanogr: Methods. 2008; (YgE) of L-tartrate fermentation in Escherichia coli. J 6:51-63. Bacteriol. 2007 March; 189(5):1597-603. Menendez C, Bauer Z, Huber H, Gadon N, Stetter K O, Kim JY, Jo B H, Cha H J. Production of biohydrogen by Fuchs G. Presence of acetyl coenzyme A (CoA) carboxy heterologous expression of oxygen-tolerant Hydrogen 45 lase and propionyl-CoA carboxylase in autotrophic Cre ovibrio marinus NiFel-hydrogenase in Escherichia coli. J narchaeota and indication for operation of a 3-hydroxypro Biotechnol. 2011 Jul. 20. pionate cycle in autotrophic carbon fixation. J Bacteriol. Klimke W. Agarwala R. Badretdin A, Chetvernin S, Ciufo S. 1999 February; 181(4): 1088-98. Fedorov B, Kiryutin B, O'Neill K, Resch W. Resenchuk S, Minshull J. Stemmer W P. Protein evolution by molecular Schafer S, Tolstoy I, Tatusova T. The National Center for 50 breeding. Curr Opin Chem. Biol. 1999 June; 3(3):284-90. Biotechnology Information’s Protein Clusters Database. Miroshnichenko ML, Kostrikina NA, L'Haridon S, Jeanthon Nucleic Acids Res. 2009 January; 37(Database issue): C, Hippe H. Stackebrandt E. Bonch-Osmolovskaya E A. D216-23. Nautilia lithotrophica gen. nov... 
sp. nov., a thermophilic Knight T. Idempotent Vector Design for Standard Assembly Sulfur-reducing epsilon-proteobacterium isolated from a of BiobrickS. DOI: 1721.1/21168. 55 deep-sea hydrothermal vent. Int J Syst Evol Microbiol. Knight T. BBF RFC10: Draft Standard for BioBrickTM bio 2002 July; 52(Pt 4): 1299-304. logical parts. DOI: 1721.1/45138. Mitsui R. Sakai Y.Yasueda H. Kato N. A novel operon encod Larkum A W. Limitations and prospects of natural photosyn ing formaldehyde fixation: the ribulose monophosphate thesis for bioenergy production. Curr Opin Biotechnol. pathway in the gram-positive facultative methylotrophic 2010 June; 21(3):271-6. 60 bacterium Mycobacterium gastri MB19. J Bacteriol. 2000 Knothe G: Dunn RO, Bagby M. O. Biodiesel: The use of February; 182(4):944-8. vegetable oils and their derivatives as alternative diesel Monson KD, Hayes J. M. Biosynthetic control of the natural fuels. Am Chem Soc Symp Series. 1997: 666:172-208. abundance of carbon 13 at specific positions within fatty Knothe G. Rapid monitoring of transesterification and assess acids in Escherichia coli. J Biol Chem. 1980; 255:11435 ing biodiesel fuel quality by NIR spectroscopy using a 65 41. fiber-optic probe. J Am Oil ChemSoc. 1999: 76(7):795 Moriya Y. Itoh M, Okuda S. Yoshizawa AC, Kanehisa M. 8OO. KAAS: an automatic genome annotation and pathway US 8,349,587 B2 93 94 reconstruction server. Nucleic Acids Res. 2007 July; Pramanik J. Trelstad PL Keasling J D. A flux-based stoichio 35(Web Server issue):W182-5. metric model of enhanced biological phosphorus removal Morweiser M. Kruse O, Hankamer B. Posten C. Develop metabolism. Wat SciTechnol. 1998: 37(4-5):609-13. (b) ments and perspectives of photobioreactors for biofuel pro Pramanik J. Trelstad PL. Schuler AJ, Jenkins D. Keasling J duction. Appl Microbiol Biotechnol. 2010 July; D. Development and validation of a flux-based stoichio 87(4): 1291-301. metric model for enhanced biological phosphorus removal Murli S. Opperman T. 
Smith BT, Walker G. C. A role for the metabolism. Water Res. 1998; 33(2):462-76. (C). umul C gene products of Escherichia coli in increasing Rathnasingh C. Raj S M, Lee Y. Catherine C, Ashok S, and resistance to DNA damage in stationary phase by inhibit Park S. Production of 3-hydroxypropionic acid via malo ing the transition to exponential growth. J Bacteriol. 2000 10 nyl-CoA pathway using recombinant Escherichia coli February; 182(4): 1127-35. strains. J Biotechnol 2011 Jun. 23. Murtagh, F. Complexities of Hierarchic Clustering Algo Reading NC, Sperandio V. Quorum sensing: the many lan rithms: the State of the Art. Computational Statistics Quar guages of bacteria. FEMS Microbiol Lett. 2006 January; terly. 1984; 1:101-13. 15 254(1): 1-11. Nature Genetics. 1999; 21 (1):1-60. Rock C.O., Tsay JT Heath R. Jackowski S. Increased unsat Ness J E. Del Cardayré S B, Minshull J, Stemmer W P. urated fatty acid production associated with a Suppressor of Molecular breeding: the natural approach to protein the fabA6(Ts) mutation in Escherichia coli. J. Bacteriol. design. Adv. Protein Chem. 2000: 55:261-92. 1996 September; 178(18):5382-7. OberJA. Sulfur. U.S. Geological Survey Minerals Report— Roessner CA, Spencer JB, Ozaki S. Min C, Atshaves B P. 2008. 2010; 74:1-17. Nayar P. Anousis N, Stolowich NJ, Holderman MT, Scott Orita I, Yurimoto H, Hirai R. Kawarabayasi Y. Sakai Y. Kato AI. Overexpression in Escherichia coli of 12 vitamin B12 N. The archaeon Pyrococcus horikoshii possesses a bifunc biosynthetic enzymes. Protein Expr Purif. 1995 April; tional enzyme for formaldehyde fixation via the ribulose 6(2):155-63. monophosphate pathway. J Bacteriol. 2005 June; 187(11): 25 Sachdev D, Chirgwin J M. Solubility of proteins isolated 3636-42. from inclusion bodies is enhanced by fusion to maltose Orita I, Sato T. Yurimoto H, Kato N, Atomi H, Imanaka T. binding protein or thioredoxin. Protein Expr Purif. 1998 Sakai Y. 
The ribulose monophosphate pathway substitutes February; 12(1): 122-32. for the missing pentose phosphate pathway in the archaeon Sachdev D, Chirgwin J M. Fusions to maltose-binding pro Thermococcus kodakaraensis. J Bacteriol. 2006 July; 188 30 tein: control of folding and solubility in protein purifica (13):4698-704. Orita I, Sakamoto N, Kato N, Yurimoto H, and Sakai Y. tion. Methods Enzymol. 2000: 326:312-21. Bifunctional enzyme fusion of 3-hexulose-6-phosphate Saitou N. Nei M. The neighbor-joining method: a new synthase and 6-phospho-3-hexuloisomerase. Appl Micro method for reconstructing phylogenetic trees. Mol Biol biol Biotechnol 2007 August: 76(2) 439-45. 35 Evol. 1987 July; 4(4):406-25. Palaniappan N, Kim BS, Sekiyama Y. Osada H. Reynolds K Sakata S. Hayes J. M. McTaggart A. R. Evans RA, Leckrone K A Enhancement and selective production of phoslactomy J. Togasaki R. K. Carbon isotopic fractionation associated cin B, a protein phosphatase IIa inhibitor, through identi with lipid biosynthesis by a cyanobacterium: relevance for fication and engineering of the corresponding biosynthetic interpretation of biomarker records. Geochim Cosmochim gene cluster. J Biol Chem. 2003 Sep. 12: 278(37):35552-7. 40 Acta. 1997; 61:5379-89. Park MO. New pathway for long-chain n-alkane synthesis Sambrook, J. Russell, D. Molecular Cloning: A Laboratory via 1-alcohol in Vibrio furnissii M1. J Bacteriol. 2005 Manual, Third Edition. CSHL Press. Cold Spring Harbor, February; 187(4): 1426-9. N.Y. 2001. Parikh MR, Greene DN, Woods KK, Matsumura I. Directed San KY, Bennett GN, Berrios-Rivera SJ, Vadali R.V.YangY evolution of RuBisCO hypermorphs through genetic selec 45 T. Horton E. Rudolph FB, Sariyar B, Blackwood K. Meta tion in engineered E. coli. Protein Eng Des Sel. 2006 bolic engineering through cofactor manipulation and its March; 19(3): 113-9. effects on metabolic flux redistribution in Escherichia coli. Patton SM, Cropp T.A., Reynolds K.A. A novel delta(3),delta Metab Eng. 
2002 April; 4(2):182-92. (2)-enoyl-CoA isomerase involved in the biosynthesis of Sauer U, Canonaco F. Heri S, Perrenoud A, Fischer E. The the cyclohexanecarboxylic acid-derived moiety of the 50 soluble and membrane-bound transhydrogenases Udha polyketide ansatrienin A. Biochemistry. 2000 Jun. 27: and PntAB have divergent functions in NADPH metabo 39(25):7595-604. lism of Escherichia coli. J Biol Chem. 2004 Feb. 20; 279 Pinske C, Bönn M. Krüger S. Lindenstrau? U. Sawers R G. (8):6613-9. Metabolic Deficiencies Revealed in the Biotechnologi Schena M (editor). DNA Microarrays: A Practical Approach. cally Important Model Bacterium Escherichia coli BL21 55 The Practical Approach Series, Oxford University Press, (DE3). PLoS One. 2011; 6(8):e22830. 1999. Portis A R Jr. Parry M A. Discoveries in Rubisco (Ribulose Schena M (editor). Microarray Biochip: Tools and Technol 1.5-bisphosphate carboxylase/oxygenase): a historical ogy. Eaton Publishing Company/BioTechniques Books perspective. Photosynth Res. 2007 October; 94(1): 121-43. Division. 2000. Pramanik J. Keasling J D. Stoichiometric model of Escheri 60 Schütz M, ShahakY. Padan E, and Hauska G. Sulfide-quinone chia coli metabolism: incorporation of growth-rate depen reductase from Rhodobacter capsulatus. Purification, dent biomass composition and mechanistic energy require cloning, and expression. J Biol Chem 1997 Apr. 11; 272 ments. Biotechnol Bioeng. 1997 Nov. 20:56(4):398-421. (15) 9890-4. Pramanik J. Keasling J D. Effect of Escherichia coli biomass Self W T Hasona A, Shanmugam KT. Expression and regu composition on central metabolic fluxes predicted by a 65 lation of a silent operon, hyf, coding for hydrogenase 4 stoichiometric model. Biotechnol Bioeng. 1998 Oct. 20; isoenzyme in Escherichia coli. J Bacteriol. 2004 January; 60(2):23.0-8. (a) 186(2):580-7. US 8,349,587 B2 95 96 Serov A E, Popova AS, Fedorchuk V. V. Tishkov VI. Engi Tatusov RL, Fedorova N D. Jackson J D, Jacobs A R. 
Kiryu neering of coenzyme specificity of formate dehydrogenase tin B, Koonin E. V. Krylov DM, Mazumder R, Mekhedov from Saccharomyces cerevisiae. Biochem J. 2002 Nov. 1; S L, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov A.V. 367(Pt3):841-7. Vasudevan S, Wolf Y I, Yin JJ, Natale DA. The COG Shetty R P. Endy D, Knight T F Jr. Engineering BioBrick 5 database: an updated version includes eukaryotes. BMC vectors from BioBrick parts. J Biol Eng. 2008 Apr. 14:2:5. Bioinformatics. 2003 Sep. 11; 4:41. Shetty R, Lizarazo M. Rettberg R, Knight T F. Assembly of van Wezel G. P. Mahr K, König M, Traag BA, Pimentel BioBrick standard biological parts using three antibiotic Schmitt E. F. Willimek A, Titgemeyer F. GlcP constitutes assembly. Methods Enzymol. 2011: 498:311-26. the major glucose uptake system of Streptomyces coeli Shibata Hand Kobayashi S. Sulfide oxidation in gram-nega 10 color A3(2). Mol. Microbiol. 2005 January: 55(2):624-36. tive bacteria by expression of the Sulfide-quinone reductase Venturi V. Regulation of quorum sensing in Pseudomonas. gene of Rhodobacter capsulatus and by electron transport FEMS Microbiol Rev. 2006 March; 30(2):274-91. to ubiquinone. Can J Microbiol 2001 September; 47(9) Vignais PM, Colbeau A. Molecular biology of microbial 855-60. 15 hydrogenases. Curr Issues Mol Biol. 2004 July; 6(2): 159 Shpaer EG. Gene Assist. Smith-Waterman and other database 88. similarity searches and identification of motifs. Methods Vignais PM, Billoud B. Occurrence, classification, and bio Mol Biol. 1997: 70:173-87. logical function of hydrogenases: an overview. Chem. Rev. Sintsov NV. Ivanovskii RN, and Kondrat'eva E N. ATP 2007 October; 107(10):4206-72. dependent citrate lyase in the green phototrophic bacte Wells MA, Mercer J, Mott RA, Pereira-Medrano AG, Burja rium, Chlorobium limicola. Mikrobiologia 1980 July A M, Radianingtyas H. Wright PC. Engineering a non August; 49(4) 514-6. native hydrogen production pathway into Escherichia coli Smith J. 
L., Campbell BJ, Hanson TE, Zhang CL, Cary S.C. via a cyanobacterial NiFe hydrogenase. Metab Eng. 2011 Nautilia profindicola sp. nov., a thermophilic, Sulfur-re July; 13(4):445-53. ducing epsilonproteobacterium from deep-sea hydrother 25 Wubbolts M G. Terpstra P. van Beilen J. B. Kingma J. mal vents. Int J Syst Evol Microbiol. 2008 July; 58(Pt Meesters HA, Witholt B. Variation of cofactor levels in 7): 1598-602. Escherichia coli. Sequence analysis and expression of the Smolke CD, Carrier TA, Keasling J D. Coordinated, differ pncB gene encoding nicotinic acid phosphoribosyltrans ential expression of two genes through directed mRNA ferase. J Biol Chem. 1990 Oct. 15:265(29):17665-72. cleavage and stabilization by secondary structures. Appl 30 Yamamoto M, Ikeda T. Arai H, Ishii M., and Igarashi Y. Car Environ Microbiol. 2000 December; 66(12):5399-405. boxylation reaction catalyzed by 2-oxoglutarate:ferre Smolke CD, Martin VJ, Keasling J D. Controlling the meta doxin oxidoreductases from Hydrogenobacter thermophi bolic flux through the carotenoid pathway using directed lus. Extremophiles 2010 January; 14(1) 79-85. mRNA processing and stabilization. Metab Eng. 2001 Yoon KS, Ishii M. Kodama T, Igarashi Y. Purification and October; 3(4):313-21. 35 characterization of pyruvate:ferredoxin oxidoreductase Smolke CD, Keasling J D. Effect of copy number and mRNA from Hydrogenobacter thermophilus TK-6. Arch Micro processing and stabilization on transcript and protein levels biol. 1997 May: 167(5):275-9. from an engineered dual-gene operon. Biotechnol Bioeng. YoonYG, ChoJH, Kim SC. CrefloxP-mediated excision and 2002 May 20; 78(4):412-24. (a) amplification of large segments of the Escherichia coli Smolke CD, Keasling J D. Effect of gene location, mRNA 40 genome. Genet Anal. 1998 January: 14(3):89-95. secondary structures, and RNase sites on expression of two Zarzycki J. Brecht V. Müller M. Fuchs G. Identifying the genes in an engineered operon. Biotechnol Bioeng. 
2002 missing steps of the autotrophic 3-hydroxypropionate CO Dec. 30; 80(7):762-76. (b) fixation cycle in Chloroflexus aurantiacus. Proc Natl Acad Sokal R. Michener, C. A Statistical Method for Evaluating Sci USA. 2009 Dec. 15: 106(50):21317-22. Systematic Relationships. University of Kansas Science 45 Zarzycki J. Fuchs G. Co-Assimilation of Organic Substrates Bulletin. 1958; 38:1409-38. via the Autotrophic 3-Hydroxypropionate Bi-Cycle in Strauss G, Fuchs G. Enzymes of a novel autotrophic CO, Chloroflexus aurantiacus. Appl Environ Microbiol. 2011 fixation pathway in the phototrophic bacterium Chlorof. Jul. 15. lexus aurantiacus, the 3-hydroxypropionate cycle. Eur J Zdobnov. E. M. Apweiler R. InterProScan—an integration Biochem. 1993 Aug. 1; 215(3):633-43. 50 platform for the signature-recognition methods in InterPro. Strom T. FerenciT, and Quayle J. R. The carbon assimilation Bioinformatics. 2001 September; 17(9):847-8. pathways of Methylococcus capsulatus, Pseudomonas Zhang C C, Durand MC, Jeanjean R. Joset F. Molecular and methanica and Methylosinus trichosporium (OB3B) dur genetical analysis of the fructose-glucose transport system ing growth on methane. Biochem J 1974 December; 144(3) in the cyanobacterium Synechocystis PCC6803. Mol. 465-76. 55 Microbiol. 1989 September; 3(9): 1221-9. SunJ, Hopkins RC, Jenney FE. McTernan PM, Adams MW. ZhangYM, Marrakchi H, Rock CO. The FabR (YijC) tran Heterologous expression and maturation of an NADP-de Scription factor regulates unsaturated fatty acid biosynthe pendent NiFel-hydrogenase: a key enzyme in biofuel pro sis in Escherichia coli. J. Biol Chem. 2002 May 3:277(18): duction. PLoS One. 2010 May 6:5(5):e10526. 15558-65. Tabita F. R. Small CL. Expression and assembly of active 60 Zhu X, Yuasa M, Okada K. Suzuki K, Nakagawa T. Kawa cyanobacterial ribulose-1,5-bisphosphate carboxylase/ mukai M. Matsuda H. 
Production of ubiquinone in oxygenase in Escherichia coli containing Stoichiometric Escherichia coli by expression of various genes respon amounts of large and small subunits. Proc Natl Acad Sci sible for ubiquinone biosynthesis. J Ferm Bioeng. 1995; USA. 1985 September; 82(18):6100-3. 79(5):493-5. Tatusov RL, Koonin E.V. Lipman DJ. A genomic perspective 65 Zweiger G. Knowledge discovery in gene-expression-mi on protein families. Science. 1997 Oct. 24; 278(5338):631 croarray data mining the information output of the genome. 7. Trends Biotechnol. 1999 November; 17(11):429-36.

alaccgt attg ccgaagcc cc galacct caaa citatgcgtga cc.gc.cggagt tdggtctgat 300

cacgtggatc taggcagc caatgaacgt aaaataa.cag talaccgaggit tactgggagt 360

aacgtggt ca gcgtagctga gcacgittatg gcgacaatcc tigtactitat ccgitalactac 420

aacggggg.tc at Cagcaa.gc gat caatggit gaatgggata t cqctggcgt agcaaagaac 480

gaatatgatt taggataa gattattagt accgtgggag ccgggcggat cqggitat ct 540

gtactggaac gtc.ttgtagc titt caatc.cg aaaaagcttic tdt attacga citat caagaa 600

ttgc.cggc.cg aagc.catcaa ticggcttaat gaa.gc.cticta agctgttcaa C9gcc.gcggg 660

gacatcgttc agcgcgttga gaagctggag gacatggtgg cqcagt caga tigt cqttaca 720

atcaattgtc. cqctacataa agact coaga gqcttgttta acaaaaaact tatat cocat 780

atgaaagatg gag cct at ct ttaaatact gcacgcggcg ctatttgcgt agcagaggac 840

gttgcc.gagg Ctgtaaaatc gggcaa.gctg gctggctatg gaggcgacgt gtgggacaaa 900

Caac Ctgcgc cca aggacca t ccttggcgt acaatggata acaaggacca C9taggaaat 960

gcgatgacgg ttcat atcag cigcacgagt Ctggatgcac agaag.cgitta togcagggg 1020

gtcaagaata t cottaattic ctatttitt ca aagaaatttg actatagacc ccaggatatic 1080

at agtgcaaa atggitt cata cqC cactaga gcttacggac aaaaaaagta a 1131

<210> SEQ ID NO 5
<211> LENGTH: 731
<212> TYPE: PRT
<213> ORGANISM: Clostridium pasteurianum

<400> SEQUENCE: 5

Met Tyr Lys Ile Lys Met His Cys Thr Gly Leu Leu Phe Cys Leu Ile 1 5 10 15

Gln Arg Ser Val Asn Met Glu Lys Lys Val Leu Thr Val Cys Pro Tyr 20 25 30

Cys Gly Ser Gly Cys Asn Leu Tyr Leu Val Val Glu Gly Gly Lys Val 35 40 45

Val Arg Ala Glu Pro Ala Lys Gly Arg Asn Asn Glu Gly Lys Leu Cys 50 55 60

Leu Lys Gly Tyr Tyr Gly Trp Asp Phe Leu Asn Asp Pro Lys Leu Leu 65 70 75 80

Thr Ser Arg Leu Lys Lys Pro Met Ile Arg Lys Asn Gly Val Leu Glu 85 90 95

Glu Val Ser Trp Asp Glu Ala Ile Lys Phe Thr Ala Glu Asn Leu Met 100 105 110

Lys Ile Lys Ala Gln Tyr Gly Pro Asp Ala Ile Met Gly Thr Gly Ser 115 120 125

Ala Arg Gly Pro Gly Asn Glu Pro Asn Tyr Ile Met Gln Lys Phe Met 130 135 140

Arg Ala Ala Ile Gly Thr Asn Asn Ile Asp His Cys Ala Arg Val Cys 145 150 155 160

His Gly Pro Ser Val Ala Gly Leu Asp Tyr Ser Leu Gly Gly Ala Ala 165 170 175

Met Ser Asn Ser Ile Pro Glu Ile Glu Asp Thr Asp Val Val Phe Val 180 185 190

Phe Gly Tyr Asn Pro Ser Glu Thr His Pro Ile Val Ala Arg Arg Ile 195 200 205

Val Lys Ala Arg Glu Lys Gly Ala Lys Ile Ile Val Ala Asp Pro Arg 210 215 220

Lys Ile Glu Thr Val Lys Ile Ser Asp Leu Trp Leu Gln Leu Lys Gly 225 230 235 240

Gly Thr Asn Met Ala Leu Val Asn Ala Leu Gly Asn Val Leu Ile Asn 245 250 255

Glu Glu Leu Tyr Asp Glu Phe Val Glu Asn Thr Glu Gly Phe 260 265 270

Glu Glu Tyr Glu Ala Val Lys Tyr Thr Pro Glu Ala Glu 275 285

Ile Thr Gly Val Ser Ala Glu Ile Arg Lys Ala Met Arg Ile 290 295 300

Tyr Ala Ala Lys Lys Ala Thr Ile Leu Tyr Gly Met Gly Val Cys 305 310 315

Gln Phe Ser Gln Ala Val Asp Val Val Lys Gly Leu Ala Ser Leu Ala 325 330 335

Leu Leu Thr Gly Asn Leu Gly Arg Pro Asn Val Gly Ile Gly Pro Val 340 345 350

Arg Gly Gln Asn Asn Val Gln Gly Thr Cys Asp Met Gly Val Leu Pro 355 360 365

Asn Arg Phe Pro Gly Gln Ser Val Thr Asp Glu Ala Arg Glu 370 375

Lys Phe Glu Ala Trp Gly Val Leu Ser Asp Arg Val Gly Tyr 385 390 395 400

Phe Leu Thr Glu Val Pro His Val Leu Glu Asp Ile 405 415

Ala Ile Phe Gly Glu Asp Pro Ala Gln Ser Asp Pro Asn Ala 420 425 430

Ala Glu Val Arg Glu Ala Leu Asp Ile Asp Phe Val Ile Val Gln 435 440 445

Asp Ile Phe Met Asn Thr Ala Leu His Ala Asp Val Val Leu Pro 450 455 460

Ala Thr Ser Trp Gly Glu His Asp Gly Val Tyr Ser Ala Ala Asp Arg 465 470

Ser Phe Gln Arg Ile Arg Ala Val Glu Pro Met Gly Glu Ala 485 490 495

Asp Asp Trp Glu Ile Ile Glu Ile Ser Thr Ala Met Gly Pro 500 505

Met His Tyr Asn Asn Thr Glu Glu Ile Trp Asn Glu Met Arg Ser Leu 515 525

Pro Phe Ala Gly Ala Ser Met Glu Gln Gly 530 535 540

Ala Val Pro Trp Pro Cys Thr Ser Glu Asp Pro Gly Thr Asp Tyr 545 550 555 560

Leu Asp Asp Gly Phe Met Thr Asn Gly Arg Gly Lys Leu 565 575

Phe Ala Glu Trp Arg His Pro Phe Leu Thr Asp Glu 585 590

Pro Leu Val Leu Ser Thr Val Arg Glu Gly His Tyr Ser Val Arg 595 605

Thr Met Thr Gly Asn Arg Thr Leu Leu Ala Asp Glu Pro 610 615

Gly Ile Glu Ile Ser Val Glu Asp Lys Glu Leu Asn Ile Lys 625 630 635 640

Asp Gln Glu Leu Val Thr Val Ser Ser Arg Arg Gly Ile Ile Thr 645 650 655

Arg Ala Ala Val Ala Glu Arg Val Lys Lys Gly Ala Thr Tyr Met Thr 660 665 670

Gln Trp Trp Val Gly Ala Cys Asn Glu Leu Thr Ile Asp Ser Leu 675 680 685

Asp Pro Ile Ser Lys Thr Pro Glu Phe Lys Cys Ala Val Val 690 695 700

Glu Arg Ile Asp Gln Gln Ala Glu Gln Glu Ile Glu Glu Arg 705 710 715 720

Ser Ser Leu Lys Gln Met Ala Glu 725 730

<210> SEQ ID NO 6
<211> LENGTH: 211
<212> TYPE: PRT
<213> ORGANISM: Clostridium pasteurianum

<400> SEQUENCE: 6

Met Asp Arg Phe Lys Thr Ala Val Ile Leu Ala Gly Gly Ser Ser 1 10 15

Arg Met Gly Phe Asp Gln Phe Leu Lys Ile Gly Glu Lys Arg Leu 25 30

Met Asp Ile Leu Ile Asn Glu Ile Glu Glu Phe Gln Asp Ile Ile 35 40 45

Ile Val Thr Asn Lys Pro Lys Glu Ser Leu Ser Cys 50 55 60

Arg Ile Val Ser Asp Glu Ile Glu Ser Gln Gly Pro Leu Ser Gly Ile 65 70

His Ile Gly Leu Lys Glu Ser Ser Lys Ala Phe Ile Ala 85 90 95

Asp Met Pro Lys Val Asn Ile Pro Ile Arg Met Glu 105 110

Glu Leu Ile Thr Asp Ala Asp Ala Val Thr Glu Ala Cys 115 120 125

Arg Met Gln Pro Phe Asn Ala Phe Ser Glu Val Phe Lys 130 135 140

Ile Glu Asp Leu Leu Arg Glu Gly Arg Ser Met Phe Ser Phe Ile 145 150 155 160

Asn Ile Ile Asn Thr His Phe Ile Asp Glu Asp Thr Ala Lys Tyr 165 170 175

Asn Asp Phe Asn Met Phe Phe Asn Leu Asn Thr Pro Glu Asp Leu 180 185 190

Asp Phe Gln Val Leu Tyr Asn Pro Lys Asn Met Asp Asn 195 200 205

Ile Glu 210

<210> SEQ ID NO 7
<211> LENGTH: 191
<212> TYPE: PRT
<213> ORGANISM: Clostridium pasteurianum

<400> SEQUENCE: 7

Met Arg Asn Phe Ile Lys Leu Phe Leu Tyr Arg Leu Ser Gly Lys Val 1 5 15

Gly Lys Ala Met Ser Arg Glu Val Asn Ser Phe Val Ile Gly Asp Ala 25

Ser Cys Val Gly Arg Ala Glu Val Ala Cys Phe Lys Ala 35 40 45

His Ser Asn Arg Glu Glu Ser Ser Pro Ile Phe Val Lys Gly Lys 50 55 60

Arg Arg Asp Ile Ile Thr Arg Ile His Val Val Asn Glu Lys Phe 65 70

Ser Val Pro Val Gln Arg Gln Glu Asp Ala Pro Ala Asn 85 90 95

Ala Pro Val Gly Ala Ile Glu Lys Glu His Val Leu Val Val 105 110

Glu Glu Glu Leu Cys Ile Gly Cys Ala Val Met Ala Pro 115 120 125

Phe Gly Ala Ile Glu Val Lys Arg Ser Glu Glu Val Arg Val 130 135 140

Ala Asp Leu Arg Asn Arg Asp Thr Ala Val 145 150 155 160

Glu Ile Ser Lys Ala Leu Leu Phe Asp Pro Val Lys Glu 165 170 175

Arg Gln Arg Asn Ile Asp Thr Val Asn Asn Leu Ile Asp Asp 180 185 190

<210> SEQ ID NO 8
<211> LENGTH:
<212> TYPE: PRT
<213> ORGANISM: Clostridium pasteurianum

<400> SEQUENCE: 8

Met Thr Asn Leu Cys His Phe His Arg Gln Arg Glu Glu Arg Ile Ile 1 15

Met Asn Ser Phe Val Ile Ala Asn Pro Ile Gly Lys 25 30

Thr Glu Ala Gly Ala Met Ala His Ser Glu Lys Asn Ile Leu 35 40 45

Asn Arg Ser Asp Glu Leu Phe Asn Pro Arg Leu Val Ile 50 55 60

Lys Thr Asp Val Thr Ala Pro Val Met Cys Arg His Glu Asn 65 70

Ser Pro Ala Ser Val Pro Asn Gly Ser Ile Thr Asn Lys Glu 85 90 95

Gly Val Val Leu Ile Asn Gln Asp Thr Ile Gly Lys Ser Cys 100 105 110

Met Val Ala Pro Phe Gly Ala Ile Asn Leu Ile Val Gln Gln Asp 115 120 125

Gly Glu Gly Ala Ile Thr Gln Ser Gly Leu Lys Thr Asp Gly 130 135 140

Lys Glu Ile Ile His Lys Glu Ile Val Ala Asn Asp Leu 145 150 155 160

Ile Glu Arg Asp Gly Pro Ala Cys Val Glu Val Pro Thr 165 175

Glu Ala Leu Arg Leu Val Ser Gly Glu Asp Ile Glu Glu Ser Ile Lys 180 185 190

Glu Arg Glu Ala Ala Ala Leu Gly Leu Ser Arg Ile Gly 195 200 205

<210> SEQ ID NO 9
<211> LENGTH: 1293