<<

US 2015 0037853A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2015/0037853 A1 Fischer et al. (43) Pub. Date: Feb. 5, 2015

(54) METHODS AND SYSTEMIS FOR Publication Classification CHEMOAUTOTROPHC PRODUCTION OF ORGANIC COMPOUNDS (51) Int. Cl. CI2P 7/40 (2006.01) (71) Applicant: Ginkgo BioWorks, Inc., Boston, MA CI2P 7/16 (2006.01) (US) CI2N 15/70 (2006.01) (52) U.S. Cl. (72) Inventors: Curt R. Fischer, Cambridge, MA (US); CPC. CI2P 7/40 (2013.01): CI2N 15/70 (2013.01); Austin J. Che, Cambridge, MA (US); CI2P 7/16 (2013.01) Reshma P. Shetty, Boston, MA (US); USPC ...... 435/136: 435/252.33:435/160 Jason R. Kelly, Cambridge, MA (US) (21) Appl. No.: 14/354,354 (57) ABSTRACT (22) PCT Fled: Oct. 30, 2012 The present disclosure identifies pathways, mechanisms, sys tems and methods to confer chemoautotrophic production of (86) PCT NO.: PCT/US12A62S4O carbon-based products of interest, such as Sugars, alcohols, chemicals, amino acids, polymers, fatty acids and their S371 (c)(1), derivatives, hydrocarbons, isoprenoids, and intermediates (2) Date: Apr. 25, 2014 thereof, in organisms such that these organisms efficiently convertinorganic carbon to organic carbon-based products of Related U.S. Application Data interest using inorganic energy, Such as formate, and in par (63) Continuation of application No. 13/285,919, filed on ticular the use of organisms for the commercial production of Oct. 31, 2011, now Pat. No. 8,349,587. various carbon-based products of interest. Patent Application Publication Feb. 5, 2015 Sheet 1 of 29 US 2015/0037853 A1

Forfate Patent Application Publication Feb. 5, 2015 Sheet 2 of 29 US 2015/0037853 A1

Sissy:3:

Figure 2 Patent Application Publication Feb. 5, 2015 Sheet 3 of 29 US 2015/0037853 A1

yruvate O 9 Phosphoenolpyruvate Acetyl-CoA co2 11 b Oxaloacetate Citryl-CoA

2 a s itrate

7b Fumarate co, locatea Oxalosuccinate Succinate 2-oxostant Hco, t CO2 Succinyl-CoA 6

Figure 3 Patent Application Publication Feb. 5, 2015 Sheet 4 of 29 US 2015/0037853 A1

Figure 4 Patent Application Publication Feb. 5, 2015 Sheet 5 of 29 US 2015/0037853 A1

glyoxylate mayl-CoA -2 acetyl-CoA pyruvate Oa malate 1 – HCO 1OC 9 2 malonyl-CoA citramalyl-CoA

fumarate 7 2 t 8 23-hydroxypropionate mesaconyl-C4-CoA Succinate 3 1st 2 propionyl-CoA mesaconyl-C1-CoA succinyl-CoA 6 methylmalonyl-CoA 4 Ob methylmalyl-CoA HCO glyoxylate

Figure 5 Patent Application Publication Feb. 5, 2015 Sheet 6 of 29 US 2015/0037853 A1

Figure 6 Patent Application Publication Feb. 5, 2015 Sheet 7 of 29 US 2015/0037853 A1

to 1 ner - Hexose 6-P

FructOSe 6-P

Fructose 1,6-bis-P Dihydroxyacetone-P - 4 6 5 Glyceraldehyde 3-P Erythrose 4-P Fructose 6-P

FructOSe 6-P Xylulose 5-P

Figure 7 Patent Application Publication Feb. 5, 2015 Sheet 8 of 29 US 2015/0037853 A1

Figure 8 Patent Application Publication Feb. 5, 2015 Sheet 9 of 29 US 2015/0037853 A1

Sedoheptulose 1.7-bis-P-Sedoheptulose 7-P Ribose 5-P

Erythrose 4-P Xylulose 5-P

Fructose 6-P le. Ribose 5-P

13

Fructose 1,6-bis-P Ribulose 1,5-bis-P

Dihydroxyacetone 3-P- --Glyceraldehyde 3-P- -Glycerate 1,3-bis-P- -Glycerate 3-P

Figure 9 Patent Application Publication Feb. 5, 2015 Sheet 10 of 29 US 2015/0037853 A1

Figure 10 Patent Application Publication Feb. 5, 2015 Sheet 11 of 29 US 2015/0037853 A1

Succinate 3-hydroxypropionate reductases dehydratase 2 reductases 4-hydroxybutyrate acrylic acid 1,3-propanediol

reductases eStefaSe etransaminase eSterase 1,4-butanediolates Y-butyrolactone acrylamide acrylate esters tetrahydrofuran

Figure 11 Patent Application Publication Feb. 5, 2015 Sheet 12 of 29 US 2015/0037853 A1

glutamate ... itaconic aci : synthase 8 3 reductases s 5s, Y poly(glutamic acid) tSE C c cy 8 5 w s

s w h g C s . t S w s 2-Me-1,4-butanediol glutaric acid o w dehydratase 2-Me-1,4-butanediamine

esterases 3-Me-tetrahydrofuran t ->Y-valerolactone

v 1,5-pentanediol

Figure 12 Patent Application Publication Feb. 5, 2015 Sheet 13 of 29 US 2015/0037853 A1

D-alpha-glucose-6-P 1. beta-D-fructose-6-P 2. D-mannose-6-P 3. D-mannose-1-P 4. GDP-mannose 5. GDP-L-galactose 6. L-galactose-1-P 7. L-galactose

Figure 13 Patent Application Publication Feb. 5, 2015 Sheet 14 of 29 US 2015/0037853 A1

yruvate CO2 tomato- 3 5 acetyl-CoA

4

Figure 14 Patent Application Publication Feb. 5, 2015 Sheet 15 of 29 US 2015/0037853 A1

glyceraldehyde 3-P pyruvate

1-deoxy-D-xylulose 5-P (DOXP) 2. 2-C-methylerythritol 4-P (MEP) 3. 4-diphophocytidyl-2-C-methylerythritol (CDP-ME) 4. 4-diphophocytidyl-2-C-methylerythritol 2-P (CDP-MEP) 5. 2-C-methyl-D-erythritol 2,4-cyclopyrophosphate (MEcPP) 6. (E)-4-Hydroxy-3-methyl-but-2-enyl pyrophosphate (HMB-PP)

isopenty pyrophosphate (IPP) dimethylallyl pyrophosphate (DMAPP)

Figure 15 Patent Application Publication Feb. 5, 2015 Sheet 16 of 29 US 2015/0037853 A1

acetyl-CoA 1-acetoacetyl-CoA

3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) 3. mevalonic acid 4. mevalonate-5-P 5. mevalonate-5-pyrophosphate 6 —-coa isopenty pyrophosphate (IPP) 7. dimethylallyl pyrophosphate (DMAPP)

Figure 16 Patent Application Publication Feb. 5, 2015 Sheet 17 of 29 US 2015/0037853 A1

dihydroxyacetone-P 1. sn-glycerol-3-P 2. glycerol 3. 3-hydroxypropanal 4. 1,3-propanediol

Figure 17 Patent Application Publication Feb. 5, 2015 Sheet 18 of 29 US 2015/0037853 A1

2 acetyl-CoA

acetoacetyl-CoA 2 3-hydroxybutyryl-CoA 3 t (addition onto a growing chain) polyhydroxybutyrate

Figure 18 Patent Application Publication Feb. 5, 2015 Sheet 19 of 29 US 2015/0037853 A1

L-glutamate Xaloacetate

L-aspartate 2-oxoglutarate

L-aspartyl-4-P 3. L-aspartate semialdehyde 4. (S)-2,3-dihydrodipicolinate 5. (S)-2,3,4,5-tetrahydrodipicolinate 6. N-succinyl-2-amino-6-ketopimelate 7. N-succinyl-L,L-2,6-diaminopimelate 8. L,L-diaminopimelate 9. meso-diaminopimelate 0. L-lysine

Figure 19 Patent Application Publication Feb. 5, 2015 Sheet 20 of 29 US 2015/0037853 A1

3-hydroxypropionyl-CoA 1. propionyl-CoA 2 acetyl-CoA 3-ketovaleryl-CoA 3. (R)-3-hydroxyvaleryl-CoA 4. 3-pentenoyl-CoA 5. 4-hydroxypentanoyl-CoA a Cetate acetyl-CoA 4-hydroxypentanoate 7. Y-valerolactone

Figure 20 Patent Application Publication Feb. 5, 2015 Sheet 21 of 29 US 2015/0037853 A1

Negative contro, NADP+ Negative control, NAD+ 4. * Pasrid 2:30, NADP+ * * Plasmid 2438, NA+ 12 * Passniid 2:29, NAD+ * * Plasmid 2428, NAD+ Positive contris, A+ 1 :

rt c. 0.8 E 0.6 w t c. O.4

O.2

O O 50 OO 150 200 250 300 350 4OO 450 500 550 600 650 700 750 Time (s)

Figure 21 Patent Application Publication Feb. 5, 2015 Sheet 22 of 29 US 2015/0037853 A1

Negative control

& wePlasmid 4768

Time (min)

Figure 22 Patent Application Publication Feb. 5, 2015 Sheet 23 of 29 US 2015/0037853 A1

14 **** Negative control in Rasic A986 12 . :: Plasmid 4986 to AP * Plasmid 4988, 3d CoAS “Plasmid 4988, to NADPH ^plasmid 4986, a 3-HFA

E O.8

O.4

O2 is

O O 2CO 4 OO 6OC 80 OOO 1200 400 1600 18OO 2COO Time (s)

Figure 23 Patent Application Publication Feb. 5, 2015 Sheet 24 of 29 US 2015/0037853 A1

Stain 2:2 Stair 24, 2-fold ::itic WStrait 242, 4-fold citation ŠStrait 32 & Stair 392

2.500E-03 ;

y - 3.48E {4x + 1.28 E-03 2.000E-03 Rs. 9.4.8

1,500E-03

y 59E-Cax + 7.19.83 1.000E-03

5.000E-04 - y - 7, 1 E-35x - 3.58E-04:

(.000E+00 y is 400E-35x - 3.163-34

-5.000E-04 »oxw x: : 8,83E-6x S.E.C.

-1990E-93 icoscycross OS 5 S. Time (mins)

Figure 24 Patent Application Publication Feb. 5, 2015 Sheet 25 of 29 US 2015/0037853 A1

9 8 Standards

7 -MM Fit

Formate, mM

Figure 25 Patent Application Publication Feb. 5, 2015 Sheet 26 of 29 US 2015/0037853 A1

GltA AspC xaloacetate-Ho-Citraten n Aspartate Malate SOCitrate AspA FumEB an Fumarate 2-oxoglutarate & FrCABCD CO2 Succinateco S

Figure 26 Patent Application Publication Feb. 5, 2015 Sheet 27 of 29 US 2015/0037853 A1

s

.

i

res fit 88.3 F-833i

a. Patent Application Publication Feb. 5, 2015 Sheet 28 of 29 US 2015/0037853 A1

& 33i: initiatig 8&i Ekx. forcea firi.(miring E85. Ex_2e)x & Frisis:imagig Ew-hr- Existiev. W firrig . Ji

is 38

Figure 28 Patent Application Publication Feb. 5, 2015 Sheet 29 of 29 US 2015/0037853 A1

, Sys,sy. is Y.is

Ss Ss

S. &St.

a. 8 & 2 x 18 8 Maximum isooctanol productivity, gll hr

Figure 29 US 2015/003 7853 A1 Feb. 5, 2015

METHODS AND SYSTEMIS FOR (twenty minutes), reflective of low total productivities. In CHEMOAUTOTROPHC PRODUCTION OF addition, techniques for genetic manipulation (homologous ORGANIC COMPOUNDS recombination, transformation or transfection of nucleic acid molecules, and recombinant gene expression) are inefficient, STATEMENT REGARDING GOVERNMENT time-consuming, laborious or non-existent. LICENSE RIGHTS 0006. Accordingly, the ability to endow an otherwise het 0001. This invention was made with government support erotrophic organism with chemoautotrophic capability would under contract number DE-AR0000091 awarded by U.S. significantly enable more energy- and carbon-efficient pro Department of Energy, Office of ARPA-E. The government duction of carbon-based products of interest. Alternatively, the ability to add one or more additional or alternative path has certain rights in the invention. ways for chemoautotrophic capability to an autotrophic or CROSS REFERENCE TO RELATED mixotrophic organism would enhance its ability to produce APPLICATIONS carbon-based products on interest. 0002 This application claims priority to and the benefit of SUMMARY U.S. patent application Ser. No. 13/285,919, filed on Oct. 31, 2011, which is incorporated herein by reference in its entirety. 0007 Systems and methods of the present invention pro vide for efficient production of renewable energy and other TECHNICAL FIELD carbon-based products of interest (e.g., fuels, Sugars, chemi cals) frominorganic carbon (e.g., greenhouse gas) using inor 0003. The invention relates to systems, mechanisms and ganic energy. As such, the present invention materially con methods to confer chemoautotrophic production of carbon tributes to the development of renewable energy and/or based products to a heterotrophic organism to efficiently con energy conservation, as well as greenhouse gas emission Vert inorganic carbon into various carbon-based products reduction. Furthermore, systems and methods of the present using chemical energy, and in particular the use of Such invention can be used in the place of traditional methods of organism for the commercial production of various carbon producing chemicals such as olefins (e.g., ethylene, propy based products of interest. The invention also relates to sys lene), which are traditionally derived from petroleum in a tems, mechanisms and methods to confer additional and/or process that generates toxic by-products that are recognized alternative pathways for chemoautotrophic production of car as hazardous waste pollutants and harmful to the environ bon-based products to an organism that is already autotrophic ment. As such, the present invention can additionally avoid or mixotrophic. the use of petroleum and the generation of Such toxic by products, and thus materially enhances the quality of the BACKGROUND environment by contributing to the maintenance of basic life 0004 Heterotrophs are biological organisms that utilize Sustaining natural elements such as air, water and/or soil by energy from organic compounds for growth and reproduc avoiding the generation of hazardous waste pollutants in the tion. Commercial production of various carbon-based prod form of petroleum-derived by-products in the production of ucts of interest generally relies on heterotrophic organisms various chemicals. that ferment Sugar from crop biomass such as corn or Sugar 0008. In certain aspect, the invention described herein pro cane as their energy and carbon source Bai, 2008. An alter vides an organism engineered to confer chemoautotrophic native to fermentation-based bio-production is the production production of various carbon-based products of interest from of carbon-based products of interest from photosynthetic inorganic carbon and inorganic energy. The engineered organisms, such as plants, algae and cyanobacteria, that organism comprises a modular metabolic architecture derive their energy from Sunlight and their carbon from car encompassing three metabolic modules. The first module bon dioxide to support growth U.S. Pat. No. 7.981,647. comprises one or more energy conversion pathways that use However, the algae-based production of carbon-based prod energy from an inorganic energy source, such as formate, ucts of interest relies on the relatively inefficient process of formic acid, methane, carbon monoxide, carbonyl sulfide, photosynthesis to supply the reducing power needed for pro carbon disulfide, hydrogen sulfide, bisulfide anion, thiosul duction of organic compounds from carbon dioxide Larkum, fate, elemental Sulfur, molecular hydrogen, ferrous iron, 2010. Moreover, commercial production of carbon-based ammonia, cyanide ion, and/or hydrocyanic acid, to produce products of interest using photosynthetic organisms relies on reduced cofactors inside the cell, such as NADH, NADPH, reliable and consistent exposure to light to achieve the high ubiquinol, menaquinol, cytochromes, flavins and/or ferre productivities needed for economic feasibility; hence, photo doxin. The second module comprises one or more carbon bioreactor design remains a significant technical challenge fixation pathways that use energy from reduced cofactors to Morweiser, 2010. convert inorganic carbon, such as carbon dioxide, carbon 0005 Chemoautotrophs are biological organisms that uti monoxide, formate, formic acid, carbonic acid, bicarbonate, lize energy from inorganic energy sources such as molecular carbon monoxide, carbonyl sulfide, carbon disulfide, cyanide hydrogen, hydrogen Sulfide, ammonia or ferrous iron, and ion and/or hydrocyanic acid, to central metabolites, such as carbon dioxide to produce all organic compounds necessary acetyl-coA, pyruvate, , 3-hydropropionate, 3-hy for growth and reproduction. Existing, naturally-occurring droxypropionic acid, glycolate, glycolic acid, glyoxylate, chemoautotrophs are poorly Suited for industrial bio-process glyoxylic acid, dihydroxyacetone phosphate, glyceralde ing and have therefore not demonstrated commercial viability hyde-3-phosphate, malate, malic acid, lactate, lactic acid, for this purpose. Such organisms have long doubling times acetate, acetic acid, citrate and/or citric acid. Optionally, the (minimum of approximately one hour for Thiomicrospira third module comprises one or more carbon product biosyn crunogena but generally much longer) relative to industrial thetic pathways that convert central metabolites into desired ized heterotrophic organisms such as Escherichia coli products, such as carbon-based products of interest. Carbon US 2015/003 7853 A1 Feb. 5, 2015

based products of interest include but are not limited to alco carboxylase, malonyl-CoA reductase, propionyl-CoA syn hols, fatty acids, fatty acid derivatives, fatty alcohols, fatty thase, propionyl-CoA carboxylase, methylmalonyl-CoA epi acid esters, wax esters, hydrocarbons, alkanes, polymers, merase, methylmalonyl-CoA mutase, Succinyl-CoA:(S)- fuels, commodity chemicals, specialty chemicals, caro malate CoA . Succinate dehydrogenase, fumarate tenoids, isoprenoids, Sugars, Sugar phosphates, central hydratase, (S)-malyl-CoA/B-methylmalyl-CoA/(S)-citrama metabolites, pharmaceuticals and pharmaceutical intermedi lyl-CoA , mesaconyl-C1-CoA hydratase or B-methyl ates. malyl-CoA dehyratase, mesaconyl-CoA C1-C4 CoA trans 0009. The resulting engineered chemoautotroph of the ferase and mesaconyl-C4-CoA hydratase. invention is capable of efficiently synthesizing carbon-based 0016. In some embodiments, the carbon fixation pathway products of interest from inorganic carbon using inorganic can be at least partially engineered and can be derived from energy. The invention also provides energy conversion path the ribulose monophosphate (RuMP) cycle. In one embodi ways, carbon fixation pathways and carbon product biosyn ment, said carbon fixation pathway can include one or more thetic pathways for conferring chemoautotrophic production of heXulose-6-phosphate synthase, 6-phospho-3-hexuloi of the carbon-based product of interest upon the host organ Somerase, heXulose-6-phosphate synthase/6-phospho-3- ism where the organism lacks the ability to efficiently pro heXuloisomerase fusion , phosphofructokinase, fruc duce carbon-based products of interest frominorganic carbon tose bisphosphate aldolase, transketolase, transaldolase, using inorganic energy. The invention also provides methods transketolase, ribose 5-phosphate and ribulose-5- for culturing the engineered chemoautotroph to Support effi phosphate-3-epimerase. cient chemoautotrophic production of carbon-based products 0017. In some embodiments, said carbon fixation pathway of interest. can be at least partially engineered and can be derived from 0010. In one aspect, the present invention provides an the Calvin-Benson-Bassham cycle or the reductive pentose engineered cell for producing a carbon-based product of phosphate (RPP) cycle. For example, the carbon fixation interest. The engineered cell includes an at least partially pathway can include one or more of ribulose bisphosphate engineered energy conversion pathway having at least one of carboxylase, phosphoglycerate kinase, glyceraldehyde-3P a recombinant formate dehydrogenase and a recombinant dehydrogenase (phosphorylating), triose-phosphate Sulfide-quinone introduced into a host cell, isomerase, fructose-bisphosphate aldolase, fructose-bispho wherein said energy conversion pathway is capable of using sphatase, transketolase, Sedoheptulose-1.7-bisphosphate energy from oxidation to produce a reduced . The aldolase, Sedoheptulose bisphosphatase, transketolase, engineered cell also includes a carbon fixation pathway that is ribose-5-phosphate isomerase, ribulose-5-phosphate-3-epi capable of converting inorganic carbon to a central metabolite merase and phosphoribulokinase. using energy from the reduced cofactor. The engineered cell 0018. In certain embodiments, said carbon fixation path further includes, optionally, a carbon product biosynthetic way can be at least partially engineered and can be derived pathway that is capable of converting the central metabolite from the reductive tricarboxylic acid (rTCA) cycle. In some into a carbon-based product of interest. embodiments, the carbon fixation pathway can include one or 0011. In certain embodiments, the recombinant formate more of: ATP citrate lyase, citryl-CoA synthetase, citryl-CoA dehydrogenase reduces NADP". For example, the recombi lyase, malate dehydrogenase, fumarate dehydratase, fuma nant formate dehydrogenase can be encoded by SEQ ID rate reductase, Succinyl-CoA synthetase, 2-oxoglutarate: NO:1, or a homolog thereof having at least 80% sequence ferredoxin oxidoreductase, isocitrate dehydrogenase, 2-oxo identity thereto. In some embodiments, the recombinant for glutarate carboxylase, oxaloSuccinate reductase, aconitate mate dehydrogenase reduces NAD". In an example, the hydratrase, pyruvate:ferredoxin oxidoreductase, phospho recombinant formate dehydrogenase can be encoded by any enolpyruvate synthetase and phosphoenolpyruvate carboxy one of SEQID NOS:2-4, or a homolog thereofhaving at least lase. 80% sequence identity thereto. In other embodiments, the recombinant formate dehydrogenase reduces ferredoxin. As BRIEF DESCRIPTION OF THE FIGURES an example, the recombinant formate dehydrogenase can be 0019 FIG. 1 is an overview of modular architecture of an encoded by one or more of SEQID NOS:5-8, or a homolog engineered chemoautotroph. An engineered chemoautotroph thereof having at least 80% sequence identity thereto. comprises three metabolic modules. (1) In Module 1, one or 0012. In certain embodiments, the recombinant sulfide more energy conversion pathways that use energy from an quinone oxidoreductase reduces quinone. For example, the extracellular inorganic energy source. Such as formate, recombinant Sulfide-quinone oxidoreductase can be encoded hydrogen sulfide, molecular hydrogen, or ferrous iron, to by any one of SEQID NOS:9-16, or a homolog thereofhaving produce reduced cofactors inside the cell, such as NADH, at least 80% sequence identity thereto. NADPH, reduced ferredoxin and/or reduced quinones or 0013. In some embodiments, the energy conversion path cytochromes. Depicted examples of energy conversion path way includes the recombinant formate dehydrogenase and ways include formate dehydrogenase (FDH), hydrogenase the energy from oxidation is from formate oxidation. The (Hase), and Sulfide-quinone oxidoreductase (SQR). (2) In energy conversion pathway can also include the recombinant Module 2, one or more carbon fixation pathways that use Sulfide-quinone oxidoreductase and the energy from oxida energy from reduced cofactors to reduce and convert inor tion can be from hydrogen Sulfide oxidation. ganic carbon, such as carbon dioxide, formate and formalde 0014. In various embodiments, the inorganic carbon is one hyde, to central metabolites, such as acetyl-coA, pyruvate, or more of formate and carbon dioxide. glycolate, glyoxylate, and dihydroxyacetone phosphate. 0015. In certain embodiments, the carbon fixation path Depicted examples of carbon fixation pathways include the way can be at least partially engineered and can be derived 3-hydroxypropionate cycle (3-HPA), the reverse or reductive from the 3-hydroxypropionate (3-HPA) bicycle. The carbon tricarboxylic acid cycle (rTCA), and the ribulose monophos fixation pathway can include one or more of acetyl-CoA phate pathway (RuMP). (3) Optionally, in Module 3, one or US 2015/003 7853 A1 Feb. 5, 2015 more carbon product biosynthetic pathways that convert cen 0024 FIG. 6 depicts example metabolic reactions and tral metabolites into desired products, such as carbon-based needed to engineer a carbon fixation pathway products of interest. Since there are many possible carbon derived from the 3-hydroxypropionate (3-HPA) bicycle into based products of interest, no individual pathways are the heterotroph Escherichia coli. Reactions in black are depicted. reported to occur in the wildtype host cell E. coli. Reactions in 0020 FIG. 2 is a block diagram of a computing architec dark gray must be added to complete the 3-HPA bicycle ture. derived carbon fixation cycle in E. coli. The carbon input to 0021 FIG.3 depicts the metabolic reactions of the reduc the pathway is bicarbonate (HCO) and the carbon output of tive tricarboxylic acid cycle Evans, 1966; Buchanan, 1990; the pathway is glyoxylate. The desired net flow of carbon is Higler, 2011. Each reaction is numbered. For certain reac indicated by the wide, light gray arrow. Metabolites are tions, such as reaction 1 and 7, there are two possible routes shown in bold and enzyme abbreviations areas follows: PCC, denoted by a and b, each of which is catalyzed by different propionyl-CoA carboxylase; MCR, malonyl-CoA reductase: enzyme(s). Enzymes catalyzing each reaction are as follows: PCS, propionyl-CoA synthase; MCE, methylmalonyl-CoA 1a, ATP citrate lyase (E.C. 2.3.3.8); 1b, citryl-CoA synthetase epimerase: ScpA. E. coli methylmalonyl-CoA mutase: SDH, (E.C. 6.2.1.18) and citryl-CoA lyase (E.C. 4.1.3.34); 2. E. coli succinate dehydrogenase; Fum A/FumB/FumC, three malate dehydrogenase (E.C. 1.1.1.37): 3, fumarate dehy E. coli fumarate hydratases; SmtAB, succinyl-CoA:(S)- dratase or fumarase (E.C. 4.2.1.2); 4 fumarate reductase malate CoA transferase: MMC lyase, (S)-malyl-CoA/B-me (E.C. 1.3.99.1); 5, succinyl-CoA synthetase (E.C. 6.2.1.5); 6. thylmalyl-CoA/(S)-citramalyl-CoA lyase. Note that methyl 2-oxoglutarate synthase or 2-oxoglutarate:ferredoxin oxi malonyl-CoA epimerase activity has been reported in E. coli doreductase (E.C. 1.2.7.3): 7a, isocitrate dehydrogenase although no corresponding gene or gene product has been (E.C. 1.1.1.41 or E.C. 1.1.1.42); 7b, 2-oxoglutarate carboxy identified Evans, 1993. lase (E.C. 6.4.1.7) and oxalosuccinate reductase (E.C. 1.1.1. (0025 FIG. 7 depicts the metabolic reactions of the ribu 41): 8, aconitate hydratrase (E.C. 4.2.1.3); 9, pyruvate syn lose monophosphate cycle Strom, 1974. In metabolite thase or pyruvate:ferredoxin oxidoreductase (E.C. 1.2.7.1); names, —P denotes phosphate. Each reaction is numbered. 10, phosphoenolpyruvate synthetase (E.C.2.7.9.2): 11, phos Enzymes catalyzing each reaction areas follows: 1, heXulose phoenolpyruvate carboxylase (E.C. 4.1.1.3.1). 6-phosphate synthase (E.C. 4.1.2.43): 2, 6-phospho-3-hexu 0022 FIG. 4 depicts example metabolic reactions and loisomerase (E.C. 5.3.1.27); 3. phosphofructokinase (E.C. enzymes needed to engineer a carbon fixation pathway 2.7.1.11): 4, fructose bisphosphate aldolase (E.C. 4.1.2.13): derived from the reductive tricarboxylic acid (rTCA) cycle 5, transketolase (E.C.2.2.1.1); 6, transaldolase (E.C.2.2.1.2): into the heterotroph Escherichia coli. Reactions in black are 7, transketolase (E.C. 2.2.1.1): 8, ribose 5-phosphate known to occur in the wildtype host cell E. coli when grown isomerase (E.C. 5.3.1.6): 9, ribulose-5-phosphate-3-epime in microaerobic or anaerobic conditions Cronan, 2010. rase (E.C. 5.1.3.1). Reactions in dark gray must be added to complete the rTCA 0026 FIG. 8 depicts example metabolic reactions and derived carbon fixation cycle in E. coli. The carbon input to enzymes needed to engineer a carbon fixation pathway the pathway is carbon dioxide (CO) and the carbon outputs derived from the ribulose monophosphate (RuMP) cycle into of the pathway are acetyl-coA and/or pyruvate. The desired the heterotroph Escherichia coli. Reactions in black occur in net flow of carbon is indicated by the wide, light gray arrow. the wildtype host cell E. coli. Reactions in dark gray must be Metabolites are shown in bold and enzyme abbreviations are added to complete the RuMP cycle-derived carbon fixation as follows: AspC, aspartate aminotransferase; MDH, malate cycle in E. coli. The carbon input to the pathway is formal dehydrogenase; AspA, aspartate ammonia-lyase, FumB. dehyde and the carbon output of the pathway is dihydroxy fumarase B; FRD, fumarate reductase; STK, succinate thioki acetone-phosphate. The desired net flow of carbon is indi nase; OGOR, 2-oxoglutarate:ferredoxin oxidoreductase: cated by the wide, light gray arrow. For simplicity, a series of IDH, isocitrate dehydrogenase; ACN, aconitase; ACL, ATP rearrangement reactions that regenerate ribulose-5-phos citrate lyase; POR, pyruvate:ferredoxin oxidoreductase. phate and all occur natively in E. coli are denoted by a single 0023 FIG. 5 depicts the metabolic reactions of the 3-hy arrow. Metabolites are shown in bold with —P denoting droxypropionate bicycle Holo, 1989: Strauss, 1993: Eisen phosphate. Enzyme abbreviations are as follows: HPS, hexu reich, 1993; Herter, 2002a: Zarzycki, 2009; Zarzycki, 2011. lose-6-phosphate synthase; PHI, 6-phospho-3-hexuloi Each reaction is numbered. In some cases, multiple different somerase; PFK, phosphofructokinase. reactions, such as reactions 10a, 10b and 10c, are catalyzed by (0027 FIG.9 depicts the metabolic reactions of the Calvin the same multi-functional enzyme. Enzymes catalyzing each Benson-Bassham cycle or the reductive pentose phosphate reaction are as follows: 1, acetyl-CoA carboxylase (E.C. 6.4. (RPP) cycle Bassham, 1954. In metabolite names, —P 1.2); 2. malonyl-CoA reductase (E.C. 1.2.1.75 and E.C. 1.1. denotes phosphate. Each reaction is numbered. Enzymes 1.298): 3, propionyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- catalyzing each reaction are as follows: 1, ribulose bisphos and E.C. 1.3.1.-): 4, propionyl-CoA carboxylase (E.C. 6.4.1. phate carboxylase (E.C. 4.1.1.39): 2, phosphoglycerate 3); 5, methylmalonyl-CoA epimerase (E.C. 5.1.99.1); 6. kinase (E.C. 2.7.2.3): 3, glyceraldehyde-3P dehydrogenase methylmalonyl-CoA mutase (E.C. 5.4.99.2): 7. Succinyl (phosphorylating) (E.C. 1.2.1.12 or E.C. 1.2.1.13): 4, triose CoA:(S)-malate CoA transferase (E.C. 2.8.3.-); 8, succinate phosphate isomerase (E.C. 5.3.1.1); 5, fructose-bisphosphate dehydrogenase (E.C. 1.3.5.1); 9, fumarate hydratase (E.C. aldolase (E.C. 4.1.2.13); 6, fructose-bisphosphatase (E.C. 4.2.1.2): 10abc. (S)-malyl-CoA/B-methylmalyl-CoA/(S)-cit 3.1.3.11); 7, transketolase (E.C. 2.2.1.1): 8, sedoheptulose-1, ramalyl-CoA lyase (E.C. 4.1.3.24 and E.C. 4.1.3.25): 11, 7-bisphosphate aldolase (E.C. 4.1.2.-); 9, sedoheptulose bis mesaconyl-C1-CoA hydratase or B-methylmalyl-CoA phosphatase (E.C.3.1.3.37); 10, transketolase (E.C. 2.2.1.1); dehyratase (E.C. 4.2.1.-); 12, mesaconyl-CoA C1-C4 CoA 11, ribose-5-phosphate isomerase (E.C. 5.3.1.6); 12, ribu transferase (E.C. 2.8.3.-); 13, mesaconyl-C4-CoA hydratase lose-5-phosphate-3-epimerase (E.C. 5.1.3.1): 13, phosphor (E.C. 4.2.1.-). ibulokinase (E.C. 2.7.1.19). US 2015/003 7853 A1 Feb. 5, 2015

0028 FIG. 10 depicts example metabolic reactions and metabolite names, —P denotes phosphate. Each reaction is enzymes needed to engineer a carbon fixation pathway numbered. Enzymes catalyzing each reaction are as follows: derived from the Calvin-Benson-Bassham cycle or the reduc 1, acetyl-CoA thiolase; 2. HMG-CoA synthase (E.C. 2.3.3. tive pentose phosphate (RPP) cycle into the heterotroph 10): 3, HMG-CoA reductase (E.C. 1.1.1.34): 4, mevalonate Escherichia coli. Reactions in black occur in the wildtype kinase (E.C. 2.7.1.36); 5, phosphomevalonate kinase (E.C. host cell E. coli. Reactions in dark gray must be added to 2.7.4.2): 6, mevalonate pyrophosphate decarboxylase (E.C. complete the RPP cycle-derived carbon fixation cycle in E. 4.1.1.33); 7, isopentenyl pyrophosphate isomerase (E.C. 5.3. coli. The carbon input to the pathway is carbon dioxide and 3.2). the carbon output of the pathway is dihydroxyacetone-phos 0035 FIG. 17 depicts the metabolic reactions of the glyc phate. The desired net flow of carbon is indicated by the wide, erol/1,3-propanediol biosynthetic pathway for production of light gray arrow. Metabolites are shown in bold with —P glycerol or 1,3-propanediol. In metabolite names, —P denoting phosphate. Enzyme abbreviations are as follows: denotes phosphate. Each reaction is numbered. Enzymes RuBisCO, ribulose bisphosphate carboxylase; PGK, phos catalyzing each reaction are as follows: 1, sn-glycerol-3-P phoglycerate kinase; GAPDH, NADPH-dependent glyceral dehydrogenase (E.C. 1.1.1.8 or 1.1.1.94); 2, sn-glycerol-3- dehyde-3P dehydrogenase (phosphorylating); TPI, triose phosphatase (E.C. 3.1.3.21): 3, sn-glycerol-3-R glycerol phosphate isomerase; FBA, fructose-bisphosphate aldolase; dehydratase (E.C. 4.2.1.30); 4, 1,3-propanediol oxidoreduc FBPase, fructose-bisphosphatase; TK, transketolase; SBA, tase (E.C. 1.1.1.202). sedoheptulose-1.7-bisphosphate aldolase; SBPase, sedohep 0036 FIG. 18 depicts the metabolic reactions of the poly tulose bisphosphatase; RPI, ribose-5-phosphate isomerase: hydroxybutyrate biosynthetic pathway. Each reaction is num RPE, ribulose-5-phosphate-3-epimerase: PRK, phosphoribu bered. Enzymes catalyzing each reaction are as follows: 1. lokinase. acetyl-CoA:acetyl-CoA C-acetyltransferase (E.C. 2.3.1.9): 0029 FIG. 11 provides a schematic to convert succinate or 2. (R)-3-hydroxyacyl-CoA:NADP+ oxidoreductase (E.C. 3-hydroxypropionate to various chemicals. 1.1.1.36): 3, polyhydroxyalkanoate synthase (E.C. 2.3.1.-). 0030 FIG. 12 provides a schematic of glutamate or ita 0037 FIG. 19 depicts the metabolic reactions of one lysine conic acid conversion to various chemicals. biosynthesis pathway. In metabolite names, -P denotes 0031 FIG. 13 depicts the metabolic reactions of a galac phosphate. Each reaction is numbered. Enzymes catalyzing tose biosynthetic pathway. In metabolite names, —P denotes each reaction are as follows: 1, aspartate aminotransferase phosphate. Each reaction is numbered. Enzymes catalyzing (E.C. 2.6.1.1): 2, aspartate kinase (E.C. 2.7.2.4): 3, aspartate each reaction areas follows: 1, alpha-D-glucose-6-phosphate semialdehyde dehydrogenase (E.C. 1.2.1.11): 4, dihydrodi ketol-isomerase (E.C. 5.3.1.9): 2, D-mannose-6-phosphate picolinate synthase (E.C. 4.2.1.52); 5, dihydrodipicolinate ketol-isomerase (E.C. 5.3.1.8); 3. D-mannose 6-phosphate reductase (E.C. 1.3.1.26); 6, tetrahydrodipicolinate succiny 1.6-phosphomutase (E.C. 5.4.2.8); 4, mannose-1-phosphate lase (E.C. 2.3.1.117): 7. N-succinyldiaminopimelate-ami guanylyltransferase (E.C. 2.7.7.22); 5, GDP-mannose 3.5- notransferase (E.C. 2.6.1.17): 8, N-succinyl-L-diaminopime epimerase (E.C. 5.1.3.18); 6, galactose-1-phosphate guany late desuccinylase (E.C. 3.5.1.18); 9, diaminopimelate lyltransferase (E.C. 2.7.n.n); 7, L-galactose 1-phosphate epimerase (E.C. 5.1.1.7); 10, diaminopimelate decarboxylase phosphatase (E.C. 3.1.3.n). (E.C. 4.1.1.20). 0032 FIG. 14 depicts different fermentation pathways 0038 FIG. 20 depicts the metabolic reactions of the from pyruvate to ethanol. Each reaction is numbered. Y-Valerolactone biosynthetic pathway. Each reaction is num Enzymes catalyzing each reaction are as follows: 1. pyruvate bered. Enzymes catalyzing each reaction are as follows: 1. decarboxylase (E.C. 4.1.1.1): 2, alcoholdehydrogenase (E.C. propionyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.1.1.1): 3, pyruvate-formate lyase (E.C. 2.3.1.54); 4, acetal 1.3.1.-); 2, beta-ketothiolase (E.C. 2.3.1.16): 3, acetoacetyl dehyde dehydrogenase (E.C. 1.2.1.10); 5, pyruvate synthase CoA reductase (E.C. 1.1.1.36): 4, 3-hydroxybutyryl-CoA (E.C. 1.2.7.1). dehydratase (E.C. 4.2.1.55); 5, vinylacetyl-CoA A-isomerase 0033 FIG. 15 depicts the metabolic reactions of the meva (E.C. 5.3.3.3.); 6, 4-hydroxybutyryl-CoA transferase (E.C. lonate-independent pathway (also known as the non-meva 2.8.3.-); 7, 1.4-lactonase (E.C.3.1.1.25). lonate pathway or deoxyxylulose 5-phosphate (DXP) path 0039 FIG. 21 depicts the spectrophotometric assay results way) for production of isopentenyl pyrophosphate (IPP) and of in vitro formate dehydrogenase (FDH) assays for strains its isomer dimethylallyl pyrophosphate (DMAPP). In propagating plasmid 2430, plasmid 2429 as well as positive metabolite names, —P denotes phosphate. Each reaction is and negative control. The positive control is commercially numbered. Enzymes catalyzing each reaction are as follows: available purified NAD"-dependent FDH enzyme. The nega 1, 1-deoxy-D-xylulose-5-phosphate synthase (E.C. 2.2.1.7); tive control is a strain propagating a plasmid without an 2, 1-deoxy-D-xylulose-5-phosphate reductoisomerase (E.C. FDH-encoding gene. For each Strain, assay results are shown 1.1.1.267): 3, 4-diphosphocytidyl-2C-methyl-D-erythritol with for both NADP and NAD" as the cofactor, as indicated. synthase (E.C. 2.7.7.60): 4, 4-diphosphocytidyl-2C-methyl The reduction of either NADP" or NAD" is monitored by D-erythritol kinase (E.C. 2.7.1.148); 5, 2C-methyl-D-eryth measuring the absorbance at 340 nm. ritol 2.4-cyclodiphosphate synthase (E.C. 4.6.1.12): 6, (E)-4- 0040 FIG.22 depicts the spectrophotometric assay results hydroxy-3-methylbut-2-enyl diphosphate synthase (E.C. of Sulfide oxidation assays for strain propagating plasmid 1.17.7.1): 7, isopentyl/dimethylallyl diphosphate synthase or 4767, plasmid 4768 and a negative control plasmid (a plasmid 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (E.C. without a constitutive promoter upstream of the Sqr gene). 1.17.1.2). Depletion of sulfide over time is monitored by measuring the 0034 FIG.16 depicts the metabolic reactions of the meva absorbance at 670 nm after treatment of the samples with lonate pathway (also known as the HMG-CoA reductase Cline reagent Cline, 1969. pathway) for production of isopentenyl pyrophosphate (IPP) 0041 FIG. 23 depicts the spectrophotometric assay results and its isomer dimethylallyl pyrophosphate (DMAPP). In of in vitro propionyl-CoA synthase (PCS) assays for strain US 2015/003 7853 A1 Feb. 5, 2015 propagating plasmid 4986 as well as a negative control plas 0049 Inorganic energy sources together with inorganic mid containing no pcs gene. For the Strain propagating plas carbon represent an alternative feedstock to Sugar or light plus mid 4986, assay results are shown with all required substrates carbon dioxide for the production of carbon-based products as well as control reactions that omit one of the required of interest. There exist non-biological routes to convert inor substrates, as indicated. The oxidation of NADPH is moni ganic energy sources and inorganic carbon to chemicals and tored by measuring the absorbance at 340 nm. fuels of interest. For example, the Fischer-Tropsch process 0042 FIG. 24 depicts hydrogenase assay results for consumes carbon monoxide and hydrogen gas generated strains 242 (at three different dilutions), 312 and 392. Hydro from gasification of coal or biomass to produce methanol or genase activity is measured by monitoring the reduction of mixed hydrocarbons as fuels U.S. Pat. No. 1,746,464. The the electron acceptor methyl viologen; hence, the y axis is drawbacks of Fischer-Tropsch processes are: 1) a lack of denoted in umol of reduced methyl viologen. product selectivity, which results in difficulties separating 0043 FIG. 25 depicts a standard curve correlating the rate desired products; 2) catalyst sensitivity to poisoning; 3) high of NADH formation by a commercially available formate energy costs due to high temperatures and pressures required; dehydrogenase as a function of formate concentration in the and 4) the limited range of products available at commercially sample. competitive costs. Without the advent of carbon sequestration technologies that can operate at Scale, the Fischer-TropSch 0044 FIG. 26 depicts the branched tricarboxylic acid process is widely considered to be an environmentally costly cycle run by E. coli when grown under anaerobic conditions. method for generating liquid fuels. Alternatively, processes If the gene encoding isocitrate dehydrogenase (Icd) is ren that rely on naturally occurring microbes that convert synthe dered non-functional (denoted by Xs), then synthesis of sis gas or syngas, a mixture of primarily molecular hydrogen 2-oxoglutarate is restored through introduction of a func and carbon monoxide that can be obtained via gasification of tional 2-oxoglutarate synthase (OGOR, bold gray arrow). any organic feedstock, Such as coal, coal oil, natural gas, Metabolite names are denoted in bold. biomass, or waste organic matter, to products such as ethanol, 0045 FIG. 27 depicts computed phenotypic phase planes acetate, methane, or molecular hydrogen are available Hen for E. coli strains with the native formate dehydrogenases stra, 2007. However, these naturally occurring microbes can deleted in either the absence (A and C) or presence (Band D) produce only a very restricted set of products, are limited in of an exogenous NAD-dependent formate dehydrogenase. their efficiencies, lack established tools for genetic manipu The growth conditions are aerobic with dual carbon sources lation, and are sensitive to their end products at high concen of formate and either glucose (A and B) or glycolate (C and trations. Finally, there is some work to introduce Syngas ulti D). lization into industrial microbial hosts U.S. Pat. No. 7.803, 0046 FIG. 28 depict computed phenotypic phase planes 589; however, these processes have yet to be demonstrated at during growth on formate as a sole carbon source for wildtype commercial scale and are limited to using syngas as the feed E. coli (FIG. 28A), E. coli with native formate dehydrogena stock. ses deleted (FIG. 28B) and E. coli with native formate dehy drogenases deleted and an exogenous NAD-dependent for 0050. In some embodiments, the invention provides for the use of an inorganic energy source. Such as molecular mate dehydrogenase added (FIG. 28C). hydrogen or formate, derived from electrolysis. There is tre 0047 FIG. 29 depicts the required mass transfer coeffi mendous commercial activity towards the goal of renewable cient (Ka) and required reactor volume for 0.5 tid of fuel and/or carbon-neutral energy from Solar Voltaic, geothermal, production, as a function of maximum fuel productivity for wind, nuclear, hydroelectric and more. However, most of isooctanol, assuming fuel production from inorganic energy these technologies produce electricity and are thus limited in source H and inorganic carbon source CO for an ideal engi use to the electrical grid Whipple, 2010. Furthermore, at neered chemoautotroph. On the y axis, the typical range of least some of these renewable energy sources such as Solar Kain large-scale stirred-tank bioreactors is denoted (A). On and wind suffer from being intermittent and unreliable. The the X axis, reported natural formate uptake rates at industrially lack of practical, large scale electricity storage technologies relevant culture densities is denoted (B). limits how much of the electricity demand can be shifted to renewable sources. The ability to store electrical energy in DETAILED DESCRIPTION chemical form, such as in carbon-based products of interest, 0048. The present invention relates to developing and would both offer a means for large-scale electricity storage using engineered chemoautotrophs capable of utilizing and allow renewable electricity to meet energy demand from energy frominorganic energy sources and inorganic carbon to the transportation sector. Renewable electricity combined produce a desired product. The invention provides for the with electrolysis, such as the electrochemical production of engineering of a heterotrophic organism, for example, hydrogen from water for example, WO/2009/154753, Escherichia coli or other organism suitable for commercial WO/2010/0421.97, WO/2010/028262 and WO/2011/ large-scale production of fuels and chemicals, that can effi 028264 or formate/formic acid from carbon dioxide for ciently utilize inorganic energy sources and inorganic carbon example, WO/2007/041872, opens the possibility of a sus as a Substrate for growth (a chemoautotroph) and for chemical tainable, renewable Supply of the inorganic energy source as production provides cost-advantaged processes for manufac one aspect of the present invention. turing of carbon based products of interest. The organisms 0051. In some embodiments, the invention provides for can be optimized and tested rapidly and at reasonable costs. the use of an inorganic energy source. Such as hydrogen The invention further provides for the engineering of an Sulfide or molecular hydrogen, derived from waste streams. autotrophic organism to include one or more additional or For example, hydrogen Sulfide is present in waste streams alternative pathways for utilization of inorganic energy arising from both hydrodesulfurization processes used during Sources and inorganic carbon to produce central metabolites oil recovery and desulfurization of natural gas. Indeed, cur for growth and/or other desired products. rently many oil companies stockpile elemental Sulfur (the US 2015/003 7853 A1 Feb. 5, 2015 oxidation product of hydrogen sulfide) since worldwide pro module may be instantiated via one or more possible biosyn duction exceeds demand Ober, 2010. As lower quality oil thetic pathways. For example, in Module 1, there are several deposits with higher sulfur contents (5% w/w) open up to possible energy conversion pathways, such as those based on drilling, the expectation is that global Sulfur supply will con formate dehydrogenase (e.g., E.C. 1.2.1.2. E.C. 1.2.1.43. tinue to grow. As a second example, hydrogen and carbon E.C. 1.1.5.6, E.C. 1.2.2.1 or E.C. 1.2.2.3), ferredoxin-depen dioxide are off-gas by-products of clostridial acetone-bu dent formate dehydrogenase, hydrogenase (e.g., E.C. 1.12.1. tanol-ethanol fermentations. 2. E.C. 1.12.1.3, or E.C. 1.12.7.2), sulfide-quinone oxi 0052. In some embodiments, the invention provides for doreductase (e.g., E.C. 1.8.5.4), flavocytochrome c sulfide the use of an inorganic carbon Source. Such as carbon dioxide, dehydrogenase (e.g., E.C. 1.8.2.3), ferredoxin-NADP+ derived from waste streams. For example, carbon dioxide is a reductase (e.g., E.C. 1.18.1.2), ferredoxin-NAD" reductase component of synthesis gas, the major product of gasification (e.g., E.C. 1.18.1.3), NAD(P)+ transhydrogenase (e.g., E.C. of coal, coal oil, natural gas, and of carbonaceous materials Such as biomass materials, including agricultural crops and 1.6.1.1 or E.C. 1.6.1.2), NADH:ubiquinone oxidoreductase I residues, and waste organic matter. Additional sources (e.g., E.C. 1.6.5.3). As a second example, in Module 2, there include, but are not limited to, production of carbon dioxide are several possible naturally occurring carbon fixation path as a byproduct in ammonia and hydrogen plants, where meth ways, such as the Calvin-Benson-Bassham cycle or reductive ane is converted to carbon dioxide; combustion of wood and pentose phosphate cycle, the reductive tricarboxylic acid fossil fuels; production of carbon dioxide as a byproduct of cycle, the Wood-Ljungdhal or reductive acetyl-coA pathway, fermentation of sugar in the brewing of beer, whisky and other the 3-hydroxypropionate bicycle or 3-hydroxypropionate/ alcoholic beverages, or other fermentative processes; thermal malyl-CoA cycle, 3-hydroxypropionate/4-hydroxybutyrate decomposition of limestone, CaCOs, in the manufacture of cycle and the dicarboxylate/4-hydroxybutyrate cycle Hi lime, CaO; production of carbon dioxide as byproduct of gler, 2011 as well as many possible synthetic carbon fixation Sodium phosphate manufacture; and directly from natural pathways Bar-Even, 2010. As a final example, in Module 3, carbon dioxide springs, where it is produced by the action of there are numerous possible carbon-based products of inter acidified water on limestone or dolomite. As a second est, each of which has one or more corresponding biosyn example, formaldehyde is an oxidation product of methanol thetic pathways. Every combination of energy conversion or methane. Methanol can be prepared from Synthesis gas or pathway, carbon fixation pathway and, optionally, carbon reductive conversion of carbon dioxide and hydrogen by product biosynthetic pathway, when expressed in a het chemical synthetic processes. Methane is a major component erotrophic or autotrophic host cell or organism, represents a of natural gas and can also be obtained from renewable bio different embodiment of the present invention. It should be a SS. noted, however, that only certain embodiments of Module 1 0053. In one embodiment, the invention provides for the may be paired with a particular embodiment of Module 2. For inorganic energy source and the inorganic carbon coming example, the reductive tricarboxlic acid cycle likely requires from the same chemical species, such as formate or formic acid. Formate is oxidized by an energy conversion pathway to a low potential ferredoxin for particular carbon dioxide fixa generate reduced cofactor and carbon dioxide. The carbon tion steps in the pathway. Thus, the energy conversion path dioxide can then be used as the inorganic carbon Source. way paired with the reductive tricarboxylic acid cycle must be 0054 The invention provides for the expression of one or capable of generating reduced low potential ferredoxin, Such more exogenous proteins or enzymes in the host cell, thereby as using a ferredoxin-reducing formate dehydrogenase or a conferring biosynthetic pathway(s) to utilize inorganic ferredoxin-reducing hydrogenase (E.C. 1.12.7.2). Similarly, energy sources and inorganic carbon to produce reduced only certain embodiments of carbon fixation pathways pro organic compounds. In a preferred embodiment, the present duce the necessary precursors for a particular carbon product invention provides for a modular architecture for the metabo biosynthetic pathway. For example, fatty acid biosynthetic lism of the engineered chemoautotroph comprising the fol pathways require acetyl-coA and malonyl-coA to be gener lowing three metabolic modules (FIG. 1). ated products from the carbon fixation pathway. 0055. In Module 1, one or more energy conversion path 0059. The invention is described herein with general ref ways that use energy from an extracellular inorganic erence to the metabolic reaction, reactant or product thereof. energy source. Such as formate, hydrogen sulfide, or with specific reference to one or more nucleic acids or molecular hydrogen, or ferrous iron, to produce reduced genes encoding an enzyme associated with or catalyzing, or a cofactors inside the cell, such as NADH, NADPH, protein associated with, the referenced metabolic reaction, reduced ferredoxin and/or reduced quinones or cyto reactant or product. Unless otherwise expressly stated herein, chromes. those skilled in the art would understand that reference to a 0056. In Module 2, one or more carbon fixation path reaction also constitutes reference to the reactants and prod ways that use energy from reduced cofactors to reduce ucts of the reaction. Similarly, unless otherwise expressly and convert inorganic carbon, such as carbon dioxide or stated herein, reference to a reactant or product also refer formate, to central metabolites, such as acetyl-coA, ences the reaction, and reference to any of these metabolic pyruvate, glycolate, glyoxylate, and dihydroxyacetone constituents also references the gene or genes encoding the phosphate. enzymes that catalyze or proteins involved in the referenced 0057 Optionally, in Module 3, one or more carbon reaction, reactant or product. Likewise, given the well-known product biosynthetic pathways that convert central fields of metabolic biochemistry, enzymology and genomics, metabolites into desired products, such as carbon-based reference herein to a gene or encoding nucleic acid also products of interest. constitutes a reference to the corresponding encoded enzyme 0058. A key advantage of a modular architecture for the and the reaction it catalyzes or a protein associated with the metabolism of an engineered chemoautotroph is that each reaction as well as the reactants and products of the reaction. US 2015/003 7853 A1 Feb. 5, 2015

DEFINITIONS product (e.g., promoter sequence, polyadenylation sequence, 0060. As used herein, the terms “nucleic acids.”99 “nucleic&g termination sequence, enhancer sequence, etc.). For example, acid molecule' and “polynucleotide' may be used inter genes involved in the cis,cis-muconic acid biosynthesis path changeably and include both single-stranded (SS) and double way can be genes of interest. It should be noted that non stranded (ds) RNA, DNA and RNA:DNA hybrids. As used coding regions are generally untranslated but can be involved herein the terms “nucleic acid”, “nucleic acid molecule', in the regulation of transcription and/or translation. "polynucleotide, “oligonucleotide'. “oligomer' and "oligo' 0065. As used herein, the term “genome' refers to the are used interchangeably and are intended to include, but are whole hereditary information of an organism that is encoded not limited to, a polymeric form of nucleotides that may have in the DNA (or RNA for certain viral species) including both various lengths, including either deoxyribonucleotides or coding and non-coding sequences. In various embodiments, ribonucleotides, or analogs thereof. For example, oligos may the term may include the chromosomal DNA of an organism be from 5 to about 200 nucleotides, from 10 to about 100 and/or DNA that is contained in an organelle Such as, for nucleotides, or from 30 to about 50 nucleotides long. How example, the mitochondria or chloroplasts and/or extrachro ever, shorter or longer oligonucleotides may be used. Oligos mosomal plasmid and/or artificial chromosome. A "native for use in the present invention can be fully designed. A gene' or 'endogenous gene refers to a gene that is native to nucleic acid molecule may encode a full-length polypeptide the host cell with its own regulatory sequences whereas an or a fragment of any length thereof, or may be non-coding. “exogenous gene' or "heterologous gene' refers to any gene 0061 Nucleic acids can refer to naturally-occurring or that is not a native gene, comprising regulatory and/or coding synthetic polymeric forms of nucleotides. The oligos and sequences that are not native to the host cell. In some embodi nucleic acid molecules of the present invention may be ments, a heterologous gene may comprise mutated sequences formed from naturally-occurring nucleotides, for example or part of regulatory and/or coding sequences. In some forming deoxyribonucleic acid (DNA) or ribonucleic acid embodiments, the regulatory sequences may be heterologous (RNA) molecules. Alternatively, the naturally-occurring oli or homologous to a gene of interest. A heterologous regula gonucleotides may include structural modifications to alter tory sequence does not function in nature to regulate the same their properties, such as in peptide nucleic acids (PNA) or in gene(s) it is regulating in the transformed host cell. "Coding locked nucleic acids (LNA). The terms should be understood sequence” refers to a DNA sequence coding for a specific to include equivalents, analogs of either RNA or DNA made amino acid sequence. As used herein, “regulatory sequences from nucleotide analogs and as applicable to the embodiment refer to nucleotide sequences located upstream (5' non-cod being described, single-stranded or double-stranded poly ing sequences), within, or downstream (3' non-coding nucleotides. Nucleotides useful in the invention include, for sequences) of a coding sequence, and which influence the example, naturally-occurring nucleotides (for example, ribo transcription, RNA processing or stability, or translation of nucleotides or deoxyribonucleotides), or natural or synthetic the associated coding sequence. Regulatory sequences may modifications of nucleotides, or artificial bases. Modifica include promoters, ribosome binding sites, translation leader tions can also include phosphorothioated bases for increased sequences, RNA processing site, effector (e.g., activator, stability. repressor) binding sites, stem-loop structures, and so on. 0062) Nucleic acid sequences that are “complementary’ 0066. As described herein, a genetic element may be any are those that are capable of base-pairing according to the coding or non-coding nucleic acid sequence. In some standard Watson-Crick complementarity rules. As used embodiments, a genetic element is a nucleic acid that codes herein, the term "complementary sequences' means nucleic for an amino acid, a peptide or a protein. Genetic elements acid sequences that are substantially complementary, as may may be operons, genes, gene fragments, promoters, exons, be assessed by the nucleotide comparison methods and algo introns, regulatory sequences, or any combination thereof. rithms set forth below, or as defined as being capable of Genetic elements can be as short as one or a few codons or hybridizing to the polynucleotides that encode the protein may be longer including functional components (e.g. encod Sequences. ing proteins) and/or regulatory components. In some embodi 0063 As used herein, the term “gene' refers to a nucleic ments, a genetic element includes an entire open reading acid that contains information necessary for expression of a frame of a protein, or the entire open reading frame and one or polypeptide, protein, or untranslated RNA (e.g., rRNA, more (or all) regulatory sequences associated therewith. One tRNA, anti-sense RNA). When the gene encodes a protein, it skilled in the art would appreciate that the genetic elements includes the promoter and the structural gene open reading can be viewed as modular genetic elements or genetic mod frame sequence (ORF), as well as other sequences involved in ules. For example, a genetic module can comprise a regula expression of the protein. When the gene encodes an untrans tory sequence or a promoter or a coding sequence or any lated RNA, it includes the promoter and the nucleic acid that combination thereof. In some embodiments, the genetic ele encodes the untranslated RNA. ment includes at least two different genetic modules and at 0064. The term “gene of interest” (GOD) refers to any least two recombination sites. In eukaryotes, the genetic ele nucleotide sequence (e.g., RNA or DNA), the manipulation ment can comprise at least three modules. For example, a of which may be deemed desirable for any reason (e.g., has genetic module can be a regulator sequence or a promoter, a the relevant activity for a biosynthetic pathway, confer coding sequence, and a polyadenlylation tail or any combi improved qualities and/or yields, expression of a protein of nation thereof. In addition to the promoter and the coding interest in a host cell, expression of a ribozyme, etc.), by one sequences, the nucleic acid sequence may comprises control of ordinary skill in the art. Such nucleotide sequences modules including, but not limited to a leader, a signal include, but are not limited to, coding sequences of structural sequence and a transcription terminator. The leader sequence genes (e.g., reporter genes, selection marker genes, onco is a non-translated region operably linked to the 5' terminus of genes, drug resistance genes, growth factors, etc.), and non the coding nucleic acid sequence. The signal peptide coding sequences which do not encode an mRNA or protein sequence codes for an amino acid sequence linked to the US 2015/003 7853 A1 Feb. 5, 2015

amino terminus of the polypeptide which directs the polypep acid residue (e.g., similar in steric and/or electronic nature), tide into the cells secretion pathway. then the molecules can be referred to as homologous (similar) 0067. As generally understood, a codon is a series of three at that position. Expression as a percentage of homology, nucleotides (triplets) that encodes a specific amino acid resi similarity, or identity refers to a function of the number of due in a polypeptide chain or for the termination of translation identical or similar amino acids at positions shared by the (stop codons). There are 64 different codons (61 codons compared sequences. Expression as a percentage of homol encoding for amino acids plus 3 stop codons) but only 20 ogy, similarity, or identity refers to a function of the number different translated amino acids. The overabundance in the of identical or similar amino acids at positions shared by the number of codons allows many amino acids to be encoded by compared sequences. Various alignment algorithms and/or more than one codon. Different organisms (and organelles) programs may be used, including FASTA, BLAST, or often show particular preferences or biases for one of the ENTREZ FASTA and BLAST are available as apart of the several codons that encode the same amino acid. The relative GCG sequence analysis package (University of Wisconsin, frequency of codon usage thus varies depending on the organ Madison, Wis.), and can be used with, e.g., default settings. ism and organelle. In some instances, when expressing a ENTREZ is available through the National Center for Bio heterologous gene in a host organism, it is desirable to modify technology Information, National Library of Medicine, the gene sequence so as to adapt to the codons used and codon National Institutes of Health, Bethesda, Md. In one embodi usage frequency in the host. In particular, for reliable expres ment, the percent identity of two sequences can be deter sion of heterologous genes it may be preferred to use codons mined by the GCG program with a gap weight of 1, e.g., each that correlate with the hosts tRNA level, especially the amino acid gap is weighted as if it were a singleamino acid or tRNAs that remain charged during starvation. In addition, nucleotide mismatch between the two sequences. Other tech codons having rare cognate tRNA’s may affect protein fold niques for alignment are described Doolittle, 1996. Prefer ing and translation rate, and thus, may also be used. Genes ably, an alignment program that permits gaps in the sequence designed in accordance with codon usage bias and relative is utilized to align the sequences. The Smith-Waterman is one tRNA abundance of the host are often referred to as being type of algorithm that permits gaps in sequence alignments “optimized for codon usage, which has been shown to Shpaer, 1997. Also, the GAP program using the Needleman increase expression level. Optimal codons also help to and Wunsch alignment method can be utilized to align achieve faster translation rates and high accuracy. In general, sequences. An alternative search strategy uses MPSRCH codon optimization involves silent mutations that do not software, which runs on a MASPAR computer. MPSRCH result in a change to the amino acid sequence of a protein. uses a Smith-Waterman algorithm to score sequences on a 0068 Genetic elements or genetic modules may derive massively parallel computer. from the genome of natural organisms or from Synthetic 0070. As used herein, an “ortholog” is a gene or genes that polynucleotides or from a combination thereof. In some are related by vertical descent and are responsible for sub embodiments, the genetic elements modules derive from dif stantially the same or identical functions in different organ ferent organisms. Genetic elements or modules useful for the isms. For example, mouse epoxide and human methods described herein may be obtained from a variety of epoxide hydrolase can be considered orthologs for the bio sources such as, for example, DNA libraries, BAC (bacterial logical function of hydrolysis of epoxides. Genes are related artificial chromosome) libraries, de novo chemical synthesis, by Vertical descent when, for example, they share sequence or excision and modification of a genomic segment. The similarity of Sufficient amount to indicate they are homolo sequences obtained from Such sources may then be modified gous, or related by evolution from a common ancestor. Genes using standard molecular biology and/or recombinant DNA can also be considered orthologs if they share three-dimen technology to produce polynucleotide constructs having sional structure but not necessarily sequence similarity, of a desired modifications for reintroduction into, or construction sufficient amount to indicate that they have evolved from a of a large product nucleic acid, including a modified, par common ancestor to the extent that the primary sequence tially synthetic or fully synthetic genome. Exemplary meth similarity is not identifiable. Genes that are orthologous can ods for modification of polynucleotide sequences obtained encode proteins with sequence similarity of about 25% to from a genome or library include, for example, site directed 100% amino acid sequence identity. Genes encoding proteins mutagenesis; PCR mutagenesis; inserting, deleting or Swap sharing an amino acid similarity less that 25% can also be ping portions of a sequence using restriction enzymes option considered to have arisen by vertical descent if their three ally in combination with ligation; in vitro or in vivo homolo dimensional structure also shows similarities. Members of gous recombination; and site-specific recombination; or the serine protease family of enzymes, including tissue plas various combinations thereof. In other embodiments, the minogen activator and elastase, are considered to have arisen genetic sequences useful in accordance with the methods by Vertical descent from a common ancestor. Orthologs described herein may be synthetic oligonucleotides or poly include genes or their encoded gene products that through, for nucleotides. Synthetic oligonucleotides or polynucleotides example, evolution, have diverged in structure or overall may be produced using a variety of methods known in the art. activity. For example, where one species encodes a gene 0069. In some embodiments, genetic elements share less product exhibiting two functions and where such functions than 99%, less than 95%, less than 90%, less than 80%, less have been separated into distinct genes in a second species, than 70% sequence identity with a native or natural nucleic the three genes and their corresponding products are consid acid sequences. Identity can each be determined by compar ered to be orthologs. For the production of a biochemical ing a position in each sequence which may be aligned for product, those skilled in the art would understand that the purposes of comparison. When an equivalent position in the orthologous gene harboring the metabolic activity to be intro compared sequences is occupied by the same base or amino duced or disrupted is to be chosen for construction of the acid, then the molecules are identical at that position; when non-naturally occurring microorganism. An example of the equivalent site occupied by the same or a similar amino orthologs exhibiting separable activities is where distinct US 2015/003 7853 A1 Feb. 5, 2015 activities have been separated into distinct gene products the art. Related gene products or proteins can be expected to between two or more species or within a single species. A have a high similarity, for example, 25% to 100% sequence specific example is the separation of elastase proteolysis and identity. Proteins that are unrelated can have an identity which plasminogen proteolysis, two types of serine protease activ is essentially the same as would be expected to occur by ity, into distinct molecules as plasminogen activator and chance, if a database of sufficient size is scanned (about 5%). elastase. A second example is the separation of mycoplasma Sequences between 5% and 24% may or may not represent 5'-3' exonuclease and Drosophila DNA polymerase III activ Sufficient homology to conclude that the compared sequences ity. The DNA polymerase from the first species can be con are related. Additional statistical analysis to determine the sidered an ortholog to either or both of the exonuclease or the significance of such matches given the size of the data set can polymerase from the second species and vice versa. be carried out to determine the relevance of these sequences. 0071. In contrast, as used herein, “paralogs are homologs Exemplary parameters for determining relatedness of two or related by, for example, duplication followed by evolutionary more sequences using the BLAST algorithm, for example, divergence and have similar or common, but not identical can be as set forth below. Briefly, amino acid sequence align functions. Paralogs can originate or derive from, for example, ments can be performed using BLASTP version 2.0.8 (Jan.5, the same species or from a different species. For example, 1999) and the following parameters: Matrix: 0 BLOSUM62: microsomal epoxide hydrolase (epoxide hydrolase I) and gap open: 11, gap extension: 1; X dropoff 50; expect: 10.0; soluble epoxide hydrolase (epoxide hydrolase II) can be con wordsize: 3; filter: on. Nucleic acid sequence alignments can sidered paralogs because they represent two distinct be performed using BLASTN version 2.0.6 (Sep. 16, 1998) enzymes, co-evolved from a common ancestor, that catalyze and the following parameters: Match: 1; mismatch: -2, gap distinct reactions and have distinct functions in the same open: 5; gap extension: 2; x dropoff 50; expect: 10.0; word species. Paralogs are proteins from the same species with size: 11; filter: off. Those skilled in the art would know what significant sequence similarity to each other suggesting that modifications can be made to the above parameters to either they are homologous, or related through co-evolution from a increase or decrease the stringency of the comparison, for common ancestor. Groups of paralogous protein families example, and determine the relatedness of two or more include HipA homologs, luciferase genes, peptidases, and Sequences. others. 0074 As used herein, the term “homolog” refers to any 0072. As used herein, a “nonorthologous gene displace ortholog, paralog, nonorthologous gene, or similar gene ment' is a nonorthologous gene from one species that can encoding an enzyme catalyzing a similar or Substantially Substitute for a referenced gene function in a different spe similar metabolic reaction, whether from the same or differ cies. Substitution includes, for example, being able to per ent species. form Substantially the same or a similar function in the spe 0075. As used herein, the phrase “homologous recombi cies of origin compared to the referenced function in the nation” refers to the process in which nucleic acid molecules different species. Although generally, a nonorthologous gene with similar nucleotide sequences associate and exchange displacement may be identifiable as structurally related to a nucleotide Strands. A nucleotide sequence of a first nucleic known gene encoding the referenced function, less structur acid molecule that is effective for engaging in homologous ally related but functionally similar genes and their corre recombination at a predefined position of a second nucleic sponding gene products nevertheless still fall within the acid molecule can therefore have a nucleotide sequence that meaning of the term as it is used herein. Functional similarity facilitates the exchange of nucleotide strands between the first requires, for example, at least some structural similarity in the nucleic acid molecule and a defined position of the second or binding region of a nonorthologous gene prod nucleic acid molecule. Thus, the first nucleic acid can gener uct compared to a gene encoding the function sought to be ally have a nucleotide sequence that is Sufficiently comple Substituted. Therefore, a nonorthologous gene includes, for mentary to a portion of the second nucleic acid molecule to example, a paralog or an unrelated gene. promote nucleotide base pairing. Homologous recombina 0.073 Orthologs, paralogs and nonorthologous gene dis tion requires homologous sequences in the two recombining placements can be determined by methods well known to partner nucleic acids but does not require any specific those skilled in the art. For example, inspection of nucleic sequences. Homologous recombination can be used to intro acid or amino acid sequences for two polypeptides can reveal duce a heterologous nucleic acid and/or mutations into the sequence identity and similarities between the compared host genome. Such systems typically rely on sequence flank sequences. Based on Such similarities, one skilled in the art ing the heterologous nucleic acid to be expressed that has can determine if the similarity is sufficiently high to indicate enough homology with a target sequence within the host cell the proteins are related through evolution from a common genome that recombination between the vector nucleic acid ancestor. Algorithms well known to those skilled in the art, and the target nucleic acid takes place, causing the delivered such as Align, BLAST, Clustal W and others compare and nucleic acid to be integrated into the host genome. These determine a raw sequence similarity or identity, and also systems and the methods necessary to promote homologous determine the presence or significance of gaps in the sequence recombination are known to those of skill in the art. which can be assigned a weight or score. Such algorithms also 0076. It should be appreciated that the nucleic acid are known in the art and are similarly applicable for deter sequence of interest or the gene of interest may be derived mining nucleotide sequence similarity or identity. Parameters from the genome of natural organisms. In some embodi for sufficient similarity to determine relatedness are com ments, genes of interest may be excised from the genome of puted based on well known methods for calculating statistical a natural organism or from the host genome, for example E. similarity, or the chance of finding a similar match in a ran coli. It has been shown that it is possible to excise large dom polypeptide, and the significance of the match deter genomic fragments by in vitro enzymatic excision and in vivo mined. A computer comparison of two or more sequences excision and amplification. For example, the FLP/FRT site can, if desired, also be optimized visually by those skilled in specific recombination system and the CrefloxP site specific US 2015/003 7853 A1 Feb. 5, 2015 recombination systems have been efficiently used for exci eukaryotic ribosomes (such as those found in retic lysate) can sion large genomic fragments for the purpose of sequencing efficiently use either the Shine-Dalgarno or the Kozak ribo Yoon, 1998. In some embodiments, excision and amplifi Somal binding sites. cation techniques can be used to facilitate artificial genome or I0081. As used herein, the terms “promoter,” “promoter chromosome assembly. Genomic fragments may be excised element,” or “promoter sequence” refer to a DNA sequence from the chromosome of a chemoautotrophic organism and which when ligated to a nucleotide sequence of interest is altered before being inserted into the host cell artificial capable of controlling the transcription of the nucleotide genome or chromosome. In some embodiments, the excised sequence of interest into mRNA. A promoter is typically, genomic fragments can be assembled with engineered pro though not necessarily, located 5' (i.e., upstream) of a nucle moters and/or other gene expression elements and inserted otide sequence of interest whose transcription into mRNA it into the genome of the host cell. controls, and provides a site for specific binding by RNA 0077. As used herein, the term “polypeptide” refers to a polymerase and other transcription factors for initiation of sequence of contiguous amino acids of any length. The terms transcription. "peptide.” “oligopeptide.” “protein’ or “enzyme” may be I0082 One should appreciate that promoters have modular used interchangeably herein with the term “polypeptide'. In architecture and that the modular architecture may be altered. certain instances, “enzyme” refers to a protein having cata Bacterial promoters typically include a core promoter ele lytic activities. As used herein, the terms “protein of interest.” ment and additional promoter elements. The core promoter “POI. and “desired protein’ refer to a polypeptide under refers to the minimal portion of the promoter required to study, or whose expression is desired by one practicing the initiate transcription. A core promoter includes a Transcrip methods disclosed herein. A protein of interest is encoded by tion Start Site, a for RNA polymerases and gen its cognate gene of interest (GOI). The identity of a POI can eral transcription factor binding sites. The “transcription start be known or not known. A POI can be a polypeptide encoded site' refers to the first nucleotide to be transcribed and is by an open reading frame. designated +1. Nucleotides downstream the start site are 0078. A “proteome' is the entire set of proteins expressed numbered +1, +2, etc., and nucleotides upstream the start site by a genome, cell, tissue or organism More specifically, it is are numbered -1, -2, etc. Additional promoter elements are the set of expressed proteins in a given type of cells or an located 5' (i.e., typically 30-250 bp upstream of the start site) organism at a given time under defined conditions. Transcrip of the core promoter and regulate the frequency of the tran tome is the set of all RNA molecules, including mRNA, Scription. The proximal promoter elements and the distal rRNA, tRNA, and other non-coding RNA produced in one or promoter elements constitute specific transcription factor a population of cells. Metabolome refers to the complete set site. In prokaryotes, a core promoter usually includes two of small-molecule metabolites (such as metabolic intermedi consensus sequences, a -10 sequence or a -35 sequence, ates, hormones and other signaling molecules, and secondary which are recognized by Sigma factors (see, for example, metabolites) to be found within a biological sample, such as a Hawley, 1983). The -10 sequence (10 bp upstream from the single organism. first transcribed nucleotide) is typically about 6 nucleotides in length and is typically made up of the nucleotides adenosine 007.9 The term “fuse.” “fused or “link” refers to the cova and thymidine (also known as the Pribnow box). In some lent linkage between two polypeptides in a fusion protein. embodiments, the nucleotide sequence of the -10 sequence is The polypeptides are typically joined via a peptide bond, 5'-TATAAT or may comprise 3 to 6 bases pairs of the consen either directly to each other or via an amino acid linker. Sus sequence. The presence of this box is essential to the start Optionally, the peptides can be joined via non-peptide cova of the transcription. The -35 sequence of a core promoter is lent linkages known to those of skill in the art. typically about 6 nucleotides in length. The nucleotide 0080. As used herein, unless otherwise stated, the term sequence of the -35 sequence is typically made up of the each “transcription” refers to the synthesis of RNA from a DNA of the four nucleosides. The presence of this sequence allows template; the term “translation” refers to the synthesis of a a very high transcription rate. In some embodiments, the polypeptide from an mRNA template. Translation in general nucleotide sequence of the -35 sequence is 5'-TTGACA or is regulated by the sequence and structure of the 5' untrans may comprise 3 to 6 bases pairs of the consensus sequence. In lated region (5'-UTR) of the mRNA transcript. One regula Some embodiments, the -10 and the -35 sequences are tory sequence is the ribosome binding site (RBS), which spaced by about 17 nucleotides. Eukaryotic promoters are promotes efficient and accurate translation of mRNA. The more diverse than prokaryotic promoters and may be located prokaryotic RBS is the Shine-Dalgarno sequence, a purine several kilobases upstream of the transcription starting site. rich sequence of 5'-UTR that is complementary to the UCCU Some eukaryotic promoters contain a TATA box (e.g. con core sequence of the 3'-end of 16S rRNA (located within the taining the consensus sequence TATAAA or part thereof), 30S small ribosomal subunit). Various Shine-Dalgarno which is located typically within 40 to 120 bases of the sequences have been found in prokaryotic mRNAS and gen transcriptional start site. One or more upstream activation erally lie about 10 nucleotides upstream from the AUG start sequences (UAS), which are recognized by specific binding codon. Activity of a RBS can be influenced by the length and proteins can act as activators of the transcription. Theses UAS nucleotide composition of the spacer separating the RBS and sequences are typically found upstream of the transcription the initiator AUG. In eukaryotes, the Kozak sequence A/GC initiation site. The distance between the UAS sequences and CACCAUGG, which lies within a short 5' untranslated the TATA box is highly variable and may be up to 1 kb. region, directs translation of mRNA. An mRNA lacking the I0083. As used herein, the term “vector” refers to any Kozak consensus sequence may also be translated efficiently genetic element, such as a plasmid, phage, transposon, in an invitro systems if it possesses a moderately long 5'-UTR cosmid, chromosome, artificial chromosome, episome, virus, that lacks stable secondary structure. While E. coli ribosome virion, etc., capable of replication when associated with the preferentially recognizes the Shine-Dalgarno sequence, proper control elements and which can transfer gene US 2015/003 7853 A1 Feb. 5, 2015

sequences into or between cells. The vector may contain a an indication of whether a certain gene has been introduced marker suitable for use in the identification of transformed or into or expressed in the host cell or organism. Examples of transfected cells. For example, markers may provide antibi commonly used reporters include: antibiotic resistance otic resistant, fluorescent, enzymatic, as well as other traits. genes, auxotropic markers, B-galactosidase (encoded by the As a second example, markers may complement auxotrophic bacterial gene lacZ), luciferase (from lightning bugs), deficiencies or Supply critical nutrients not in the culture chloramphenicol acetyltransferase (CAT; from bacteria), media. Types of vectors include cloning and expression vec GUS (B-glucuronidase; commonly used in plants) and green tors. As used herein, the term "cloning vector refers to a fluorescent protein (GFP; from jelly fish). Reporters or mark plasmid or phage DNA or other DNA sequence which is able to replicate autonomously in a host cell and which is charac ers can be selectable or screenable. A selectable marker (e.g., terized by one or a small number of restriction endonuclease antibiotic resistance gene, auxotropic marker) is a gene con recognition sites and/or sites for site-specific recombination. fers a trait suitable for artificial selection; typically host cells A foreign DNA fragment may be spliced into the vector at expressing the selectable marker is protected from a selective these sites in order to bring about the replication and cloning agent that is toxic or inhibitory to cell growth. A screenable of the fragment. The term “expression vector” refers to a marker (e.g., gfp, lacZ) generally allows researchers to dis vector which is capable of expressing of a gene that has been tinguish between wanted cells (expressing the marker) and cloned into it. Such expression can occur after transformation unwanted cells (not expressing the marker or expressing at into a host cell, or in IVPS systems. The cloned DNA is insufficient level). usually operably linked to one or more regulatory sequences, I0087. As used herein, the term “chemotroph” or Such as promoters, activator/repressor binding sites, termina “chemotrophic organism” refers to organisms that obtain tors, enhancers and the like. The promoter sequences can be energy from the oxidation of electron donors in their environ constitutive, inducible and/or repressible. ment. As used herein, the term “chemoautotroph” or 0084 As used herein, the term “host” refers to any “chemoautotrophic organism” refers to organisms that pro prokaryotic or eukaryotic (e.g., mammalian, insect, , duce complex organic compounds from simple inorganic car plant, bacterial, archaeal, avian, animal, etc.) cell or organ bon molecules using oxidation of inorganic compounds as an ism. The host cell can be a recipient of a replicable expression external source of energy. In contrast, "heterotrophs” or “het vector, cloning vector or any heterologous nucleic acid mol erotrophic organisms' refers to organisms that must use ecule. Host cells may be prokaryotic cells such as M. florum organic carbon for growth because they cannot convert inor and E. coli, or eukaryotic cells such as yeast, insect, amphib ganic carbon into organic carbon. Instead, heterotrophs ian, or mammalian cells or cell lines. Cell lines refer to spe obtain energy by breaking down the organic molecules they cific cells that can grow indefinitely given the appropriate consume. Organisms that can use a mix of different sources of medium and conditions. Cell lines can be mammalian cell energy and carbon are mixotrophs or mixotrophic organisms lines, insect cell lines or plant cell lines. Exemplary cell lines which can alternate, e.g., between autotrophy and heterotro can include tumor cell lines and stem cell lines. The heterolo phy, between phototrophy and chemotrophy, between gous nucleic acid molecule may contain, but is not limited to, a sequence of interest, a transcriptional regulatory sequence lithotrophy and organotrophy, or a combination thereof, (such as a promoter, enhancer, repressor, and the like) and/or depending on environmental conditions. an origin of replication. As used herein, the terms "host.” I0088 As used herein, the term “inorganic energy source', "host cell,” “recombinant host' and “recombinant host cell’ “electron donor”, “source of reducing power” or “source of may be used interchangeably. For examples of such hosts, see reducing equivalents' refers to chemical species, such as Sambrook, 2001. formate, formic acid, methane, carbon monoxide, carbonyl 0085. One or more nucleic acid sequences can be targeted sulfide, carbon disulfide, hydrogen sulfide, bisulfide anion, for delivery to target prokaryotic or eukaryotic cells via con thiosulfate, elemental Sulfur, molecular hydrogen, ferrous ventional transformation or transfection techniques. As used iron, ammonia, cyanide ion, and/or hydrocyanic acid, with herein, the terms “transformation' and “transfection' are high potential electron(s) that can be donated to another intended to refer to a variety of art-recognized techniques for chemical species with a concomitant release of energy (a introducing an exogenous nucleic acid sequence (e.g., DNA) process by which the electron donor undergoes “oxidation into a target cell, including calcium phosphate or calcium and the other, recipient chemical species or “electron accep chloride co-precipitation, DEAE-dextran-mediated transfec tor” undergoes “reduction'). Inorganic energy sources are tion, lipofection, electroporation, optoporation, injection and generally but not always present external to the cell or bio the like. Suitable transformation or transfection media logical organism. The term “reducing cofactor” refers to include, but are not limited to, water, CaCl2, cationic poly intracellular redox and energy carriers, such as NADH, mers, lipids, and the like. Suitable materials and methods for NADPH, ubiquinol, menaquinol, cytochromes, flavins and/ transforming or transfecting target cells can be found in or ferredoxin, that can donate high energy electrons in reduc Sambrook, 2001, and other laboratory manuals. In certain tion-oxidation reactions. The terms “reducing cofactor. instances, oligo concentrations of about 0.1 to about 0.5 “reduced cofactor” and “redox cofactor” can be used inter micromolar (per oligo) can be used for transformation or changeably. transfection. I0089. As used herein, the term “inorganic carbon” or I0086. As used herein, the term “marker” or “reporter” “inorganic carbon compound” refers to chemical species, refers to a gene or protein that can be attached to a regulatory Such as carbon dioxide, carbon monoxide, formate, formic sequence of another gene or protein of interest, so that upon acid, carbonic acid, bicarbonate, carbon monoxide, carbonyl expression in a host cell or organism, the reporter can confer Sulfide, carbon disulfide, cyanide ion and/or hydrocyanic certain characteristics that can be relatively easily selected, acid, that contains carbon but lacks the carbon-carbon bounds identified and/or measured. Reporter genes are often used as characteristic of organic carbon compounds. Inorganic car US 2015/003 7853 A1 Feb. 5, 2015 bon may be present in a gaseous form, such as carbon mon isopropanol, butanol, octanol, fatty alcohols, fatty acid esters, oxide or carbon dioxide, or may be present in a liquid form, wax esters; hydrocarbons and alkanes such as propane, Such as formate. octane, diesel, Jet Propellant 8, polymers such as terephtha 0090. As used herein, the term “central metabolite” refers late, 1,3-propanediol. 1,4-butanediol, polyols, polyhydroxy to organic carbon compounds, Such as acetyl-coA, pyruvate, alkanoates (PHAs), polyhydroxybutyrates (PHBs), acrylate, pyruvic acid, 3-hydropropionate, 3-hydroxypropionic acid, adipic acid, epsilon-caprolactone, isoprene, caprolactam, glycolate, glycolic acid, glyoxylate, glyoxylic acid, dihy rubber, commodity chemicals such as lactate, docosa droxyacetone phosphate, glyceraldehyde-3-phosphate, hexaenoic acid (DHA), 3-hydroxypropionate, Y-Valerolac malate, malic acid, lactate, lactic acid, acetate, acetic acid, tone, lysine, serine, aspartate, aspartic acid, Sorbitol, ascor citrate and/or citric acid, that can be converted into carbon bate, ascorbic acid, isopentenol, lanosterol, omega-3 DHA. based products of interest by a host cell or organism. Central lycopene, itaconate, 1,3-butadiene, ethylene, propylene, Suc metabolites are generally restricted to those reduced organic cinate, citrate, citric acid, glutamate, malate, 3-hydroxypri compounds from which all or most cell mass components can onic acid (HPA), lactic acid, THF, gamma butyrolactone, be derived in a given host cell or organism. In some embodi pyrrolidones, hydroxybutyrate, glutamic acid, levulinic acid, ments, the central metabolite is also the carbon product of acrylic acid, malonic acid; specialty chemicals such as caro interest in which case no additional chemical conversion is tenoids, isoprenoids, itaconic acid; biological Sugars such as necessary. glucose, fructose, lactose, Sucrose, starch, cellulose, hemicel 0091 Reference to a particular chemical species includes lulose, glycogen, Xylose, dextrose, galactose, uronic acid, not only that species but also water-solvated forms of the maltose, polyketides, or glycerol; central metabolites, such as species, unless otherwise stated. For example, carbon dioxide acetyl-coA, pyruvate, pyruvic acid, 3-hydropropionate, 3-hy includes not only the gaseous form (CO) but also water droxypropionic acid, glycolate, glycolic acid, glyoxylate, Solvated forms, such as bicarbonate ion. glyoxylic acid, dihydroxyacetone phosphate, glyceralde 0092. As used herein, the term “biosynthetic pathway' or hyde-3-phosphate, malate, malic acid, lactate, lactic acid, “metabolic pathway' refers to a set of anabolic or catabolic acetate, acetic acid, citrate and/or citric acid, from which biochemical reactions for converting (transmuting) one other carbon products can be made; pharmaceuticals and chemical species into another. Anabolic pathways involve pharmaceutical intermediates Such as 7-aminodesacetox constructing a larger molecule from Smaller molecules, a ycephalosporonic acid, cephalosporin, erythromycin, process requiring energy. Catabolic pathways involve break polyketides, statins, paclitaxel, docetaxel, terpenes, peptides, ing down of larger molecules, often releasing energy. As used steroids, omegafatty acids and other Such suitable products of herein, the term “energy conversion pathway refers to a interest. Such products are useful in the context of biofuels, metabolic pathway that transfers energy from an inorganic industrial and specialty chemicals, as intermediates used to energy source to a reducing cofactor. The term "carbon fixa make additional products, such as nutritional Supplements, tion pathway' refers to a biosynthetic pathway that converts neutraceuticals, polymers, paraffin replacements, personal inorganic carbon, such as carbon dioxide, bicarbonate or care products and pharmaceuticals. formate, to reduced organic carbon, such as one or more (0095. As used herein, the term “hydrocarbon” refers a carbon product precursors. The term “carbon product biosyn chemical compound that consists of the elements carbon, thetic pathway” refers to a biosynthetic pathway that converts hydrogen and optionally, oxygen. "Surfactants' are Sub one or more carbon product precursors to one or more carbon stances capable of reducing the Surface tension of a liquid in based products of interest. which they are dissolved. They are typically composed of a 0093. As used herein, the term “engineered chemoau water-solublehead and a hydrocarbon chain or tail. The water totroph” or “engineered chemoautotrophic organism' refers soluble group is hydrophilic and can either be ionic or non to organisms that have been genetically engineered to convert ionic, and the hydrocarbon chain is hydrophobic. The term inorganic carbon compounds, Such as carbon dioxide or for “biofuel is any fuel that derives from a biological source. mate, to organic carbon compounds using energy derived 0096. The accession numbers provided throughout this from inorganic energy sources. The genetic modifications description are derived from the NCBI database (National necessary to produce an engineered chemoautotroph com Center for Biotechnology Information) maintained by the prise the introduction of heterologous energy conversion National Institute of Health, USA. The accession numbers are pathway(s) and/or carbon fixation pathway(s) into the host provided in the database on Aug. 1, 2011. The Enzyme Clas organism. The host organism can be originally heterotrophic sification Numbers (E.C.) provided throughout this descrip organism. As used herein, an engineered chemoautotroph tion are derived from the KEGG Ligand database, maintained need not derive its organic carbon compounds solely from by the Kyoto Encyclopedia of Genes and Genomics, spon inorganic carbon and need not derive its energy solely from sored in part by the University of Tokyo. The E.C. numbers inorganic energy sources. The term engineered chemoau are provided in the database on Aug. 1, 2011. totroph may also be used to refer to originally autotrophic or 0097. Other terms used in the fields of recombinant mixotrophic organisms that have been genetically engineered nucleic acid technology, microbiology, metabolic engineer to include one or more energy conversion, carbon fixation ing, and molecular and cell biology as used herein will be and/or carbon product biosynthetic pathways in addition or generally understood by one of ordinary skill in the appli instead of its endogenous autotrophic capability. The term cable arts. “engineer,” “engineering or “engineered as used herein, refers to genetic manipulation or modification of biomol Electrolytic/Electrochemical Production of Hydrogen and ecules such as DNA, RNA and/or protein, or like technique Formate commonly known in the biotechnology art. 0.098 Hydrogen gas and formate can be produced via the 0094. As used herein, the term “carbon based products of electrolysis of HO and the electrochemical conversion CO, interest” refers to include alcohols such as ethanol, propanol, respectively Whipple, 2010. Each has advantages and dis US 2015/003 7853 A1 Feb. 5, 2015 advantages as inorganic energy sources for the engineered vobacterium breve, Flavobacterium meningosepticum, chemoautotroph of the present invention. Mesoplasma florum, Micrococcus sp. CCM825, Morganella 0099 Hydrogen gas mixtures with air are explosive across morgani, Nocardia opaca, Nocardia rugosa, Planococcus a wide range of hydrogen compositions. Hence, use of hydro eucinatus, Proteus rettgeri, Propionibacterium shermanii, gen gas as an inorganic energy source and oxygen gas as the Pseudomonas Synxantha, Pseudomonas azotoformans, terminal electron acceptor of an engineered chemoautotroph Pseudomonas fluorescens, Pseudomonas Pseudomonas must necessarily be set up to cope with the resulting safety Stutzeri, Pseudomonas acidovolans, Pseudomonas mucidol risk. To address this challenge, the reactor or fermentation ens, Pseudomonas testosteroni, Pseudomonas aeruginosa, conditions may be kept Substantially anaerobic and alterna Rhodococcus erythropolis, Rhodococcus rhodochirous, tive electron acceptors, such as nitrate, may be used. Rhodococcus sp. ATCC 15592, Rhodococcus sp. ATCC 0100 Hydrogen is a gas with low water solubility which 19070, Sporosarcina ureae, Staphylococcus aureus, Vibrio creates mass transfer limitations when using hydrogen as an metschnikovii, Vibrio tyrogenes, Actinomadura madurae, inorganic energy source for engineered chemoautotrophs Actinomyces violaceochromogenes, Kitasatosporia paru (biological systems are aqueous). At large reactor or fermen losa, Streptomyces coelicolor; Streptomyces flavelus, Strep tor scales, high rates of mass transfer from the gas to liquid tomyces griseolus, Streptomyces lividans, Streptomyces Oli phases is challenging (Example 11). There are new technolo vaceus, Streptomyces tanashiensis, Streptomyces virginiae, gies being developed to address this issue U.S. Pat. No. Streptomyces antibioticus, Streptomyces cacaoi, Streptomy 7.923.227). Formate, due to its higher solubility in HO, does ces lavendulae, Streptomyces viridochromogenes, Aeromo not have this problem (Example 11). nas Salmonicida, Bacillus subtilis, Bacillus pumilus, Bacillus 0101 The energy efficiency of electrolysis for production circulans, Bacillus thiaminolyticus, Escherichia freundii, of hydrogen or electrochemical conversion of carbon dioxide Microbacterium ammoniaphilum, Serratia marcescens, Sal impacts the overall energy efficiency of a bio-manufacturing monella enterica, Salmonella typhimurium, Salmonella process using an engineered chemoautotroph of the present schottmulleri, Xanthomonas citri, Sacchromyces spp. (e.g., invention. Electrolyzers achieve overall energy efficiencies of Saccharomyces cerevisiae, Saccharomyces bavanus, Saccha 56-73% at current densities of 110-300 mA/cm (alkaline romyces boulardii, Schizosaccharomyces pombe), Arabidop electrolyzers) or 800-1600 mA/cm (PEM electrolyzers) sis thaliana, Nicotiana tabacum, CHO cells, 3T3 cells, Whipple, 2010. In contrast, electrochemical systems to date COS-7 cells, DuCaP cells, HeLa cells, LNCap cells, THP1 have achieved moderate energy efficiencies or high current cells, 293-T cells, Baby Hamster Kidney (BHK) cells, HKB densities but not at the same time. Hence, additional technol cells, hybridoma cells, as well as bacteriophage, baculovirus, ogy improvements are needed for electrochemical production adenovirus, or any modifications and/or derivatives thereof. of formate. In certain embodiments, the genetically modified host cell is a Mesoplasma florum, E. coli, yeast, archaea, mammalian Organisms or Host Cells for Engineering cells and cell lines, green plant cells and cell lines, or algae. Non-limiting examples of algae that can be used in this aspect 0102 The host cell or organism, as disclosed herein, may of the invention include: Botryococcus braunii; Neochloris be chosen from eukaryotic or prokaryotic systems, such as Oleoabundans, Scenedesmus dimorphus, Euglena gracilis, bacterial cells (Gram-negative or Gram-positive), archaea, Nannochloropsis Salina, Dunaliella tertiolecta, Tetraselinis yeast cells (for example, Saccharomyces cereviseae or Pichia chui, Isochrysis galbana, Phaeodactylum tricornutum, Pleu pastoris), animal cells and cell lines (such as Chinese hamster rochrysis carterae, Prynnesium parvum, Tetraselmis ovary (CHO) cells), plant cells and cell lines (such as Arabi suecica; or Spirulina species. Those skilled in the art would dopsis T87 cells and Tabacco BY-2 cells), and/or insect cells understand that the genetic modifications, including meta and cell lines. Suitable cells and cell lines can also include bolic alterations exemplified herein, are described with ref those commonly used in laboratories and/or industrial appli erence to a Suitable host organism Such as E. coli and their cations. In some embodiments, host cells/organisms can be corresponding metabolic reactions or a suitable source organ selected from Escherichia coli, Gluconobacter oxydans, Glu ism for desired nucleic acids such as genes for a desired conobacter Achromobacter delmarvae, Achromobacter vis metabolic pathway. However, given the complete genome cosus, Achronobacter lacticum, Agrobacterium tumefaciens, sequencing of a wide variety of organisms and the high level Agrobacterium radiobacter, Alcaligenes faecalis, Arthro of skill in the area of genomics, those skilled in the art would bacter citreus, Arthrobacter tumescens, Arthrobacter paraf readily be able to apply the teachings and guidance provided fineus, Arthrobacter hydrocarboglutamicus, Arthrobacter herein to essentially all other host cells and organisms. For Oxydans, Aureobacterium Saperdae, Azotobacter indicus, example, the E. coli metabolic modifications exemplified Brevibacterium ammoniagenes, divaricatum, Brevibacte herein can readily be applied to other species by incorporat rium lactofermentum, Brevibacterium flavum, Brevibacte rium globosum, Brevibacterium fuscum, Brevibacterium ing the same or analogous encoding nucleic acid from species ketoglutamicum, Brevibacterium helicolum, Brevibacterium other than the referenced species. Such genetic modifications pusillum, Brevibacterium testaceum, Brevibacterium include, for example, genetic alterations of species homologs, roseum, Brevibacterium immariophilium, Brevibacterium in general, and in particular, orthologs, paralogs or non linens, Brevibacterium protopharmiae, Corynebacterium orthologous gene displacements. acetophilum, Corynebacterium glutamicum, Corynebacte 0103) In certain embodiments, the host cell or organism is rium callunae, Corynebacterium acetoacidophilum, Coryne a microorganism which includes prokaryotic and eukaryotic bacterium acetoglutamicum, Enterobacter aerogenes, microbial species from the Domains Archaea, Bacteria and Erwinia amylovora, Erwinia carotovora, Erwinia herbicola, Eucarya, the latter including yeast and filamentous fungi, Erwinia chrysanthemi, Flavobacterium peregrinum, Fla protozoa, algae, or higher Protista. The terms “microbial vobacterium ficatum, Flavobacterium aurantinum, Fla organisms”, “microbial cells' and “microbes’ are used inter vobacterium rhenanum, Flavobacterium sewanense, Fla changeably with the term microorganism. US 2015/003 7853 A1 Feb. 5, 2015

0104. In certain embodiments, host microbial organisms sulfurreducens, Gloeobacter violaceus, Guillardia theta, can be selected from, and the engineered microbial organisms Haemophilus ducreyi, Haemophilus influenzae, Halobacte generated in, for example, bacteria, yeast, fungus or any of a rium, Helicobacter hepaticus, Helicobacter pylori; Homo variety of other microorganisms applicable to fermentation sapiens, Kluyveromyces waltii; Lactobacillus johnsonii, processes. Exemplary bacteria include species selected from Lactobacillus plantarum, Legionella pneumophila, Leifso Escherichia coli, Klebsiella Oxytoca, Anaerobiospirillum nia xyli, Lactococcus lactis, Leptospira interrogans, Listeria succiniciproducens, Acetobacter aceti, Actinobacillus succi innocua, Listeria monocytogenes, Magnaporthe grisea, nogenes, Mannheimia succiniciproducens, Mesoplasma flo Mannheimia succiniciproducens, Mesoplasma florum, rum, Rhizobium etli, Bacillus subtilis, Corynebacterium Mesorhizobium loci, Methanobacterium thermoautotrophi glutamicum, Gluconobacter Oxydans, Zymomonas mobilis, cum, Methanococcoides burtonii; Methanococcus jann Lactococcus lactis, Lactobacillus plantarum, Cupriavidus aschii; Methanococcus maripaludis, Methanogenium frigi necator (formerly Ralstonia eutropha), Streptomyces coeli dum, Methanopyrus kandleri, Methanosarcina acetivorans, color; Clostridium ljungdahlii, Clostridium thermocellum, Methanosarcina mazei Methylococcus capsulatus, Mus Clostridium acetobutylicum, Pseudomonas fluorescens, and musculus, Mycobacterium Bovis, Mycobacterium leprae, Pseudomonasputida. Exemplary or fungi include spe Mycobacterium paratuberculosis, Mycobacterium tubercu cies selected from Saccharomyces cerevisiae, Schizosaccha losis, Mycoplasma gallisepticum, Mycoplasma genitalium, romyces pombe, Kluyveromyces lactis, Kluyveromyces marx Mycoplasma mycoides, Mycoplasma penetrans, Myco ianus, Aspergillus terreus, Aspergillus niger, Penicillium plasma pneumoniae, Mycoplasma pulmonis, Mycoplasma chrysogenium and Pichia pastoris. E. coli is a particularly mobile, Nanoarchaeum equitans, Neisseria meningitidis, useful host organisms since it is a well characterized micro Neurospora crassa, Nitrosomonas europaea, Nocardia far bial organism Suitable for genetic engineering. Other particu cinica, Oceanobacillus iheyensis, Onions yellows phyto larly useful host organisms include yeast Such as Saccharo plasma, Oryza sativa, Pan troglodytes, Pasteurella multo myces cerevisiae. cida, Phanerochaete chrysosporium, Photorhabdus 0105. In various aspects of the invention, the cells are luminescens, Picrophilus torridus, Plasmodium falciparum, genetically engineered or metabolically evolved, for Plasmodium voelii voelii, Populus trichocarpa, Porphy example, for the purposes of optimized energy conversion romonas gingivalis Prochlorococcus marinus, Propionibac and/or carbon fixation. The terms “metabolically evolved’ or terium acnes, Protochlamydia amoebophila, Pseudomonas “metabolic evolution relates to growth-based selection aeruginosa, Pseudomonas putida, Pseudomonas Syringae, (metabolic evolution) of host cells that demonstrate improved Pyrobaculum aerophilum, Pyrococcus abyssi Pyrococcus growth (cell yield). Yet other suitable organisms include Syn firiosus, Pyrococcus horikoshii, Pyrolobus fumarii; Ralsto thetic cells or cells produced by synthetic genomes US nia Solanacearum, Rattus norvegicus, Rhodopirellula bal Patent Publication Number 2007/0264688 and cell-like sys tica, Rhodopseudomonas palustris, Rickettsia Conorii, Rick tems or synthetic cells US Patent Publication Number 2007/ ettsia typhi; Rickettsia prowaZeki. Rickettsia Sibirica, 0269862). Saccharomyces cerevisiae, Saccharomyces bavanus, Sac charomyces boulardii; Saccharopolyspora erythraea, 0106 Exemplary genomes and nucleic acids include full Schizosaccharomyces pombe, Salmonella enterica, Salmo and partial genomes of a number of organisms for which nella typhimurium, Schizosaccharomyces pombe, genome sequences are publicly available and can be used Shewanella Oneidensis, Shigella flexneria. Sinorhizobium with the disclosed methods, such as, but not limited to, Aero melilotii Staphylococcus aureus, Staphylococcus epidermi pyrum permix, Agrobacterium tumefaciens, Anabaena, dis, Streptococcus agalactiae, Streptococcus mutans, Strep Anopheles gambiae, Apis mellifera, Aquifex aeolicus, Ara tococcus pneumoniae, Streptococcus pyogenes, Streptococ bidopsis thaliana, Archaeoglobus fulgidus, Ashbya gossypii; cus thermophilus, Streptomyces avermitilis, Streptomyces Bacillus anthracia, Bacillus cereus, Bacillus halodurans, coelicolor, Sulfolobus solfataricus, Sulfolobus tokodai Syn Bacillus licheniformis, Bacillus subtilis, Bacteroides fragi echococcus, Synechoccous elongates, Synechocystis, Tak lis, Bacteroides thetaiotaOmicron, Bartonella hemselae, Bar ifugu rubripes, Tetraodon nigroviridis, Thalassiosira pseud tonella quintana, Baello vibrio bacteriovirus, Bifidobacte Onana, Thermoanaerobacter tengcongensis, Thermoplasma rium longum, Blochmannia floridanus, Bordetella acidophilum, Thermoplasma volcanium, Thermosynechoc bronchiseptica, Bordetella parapertussis, Bordetella pertus occus elongatus. Thermotagoa maritima, Thermus thermo sis, Borrelia burgdorferi; Bradyrhizobium japonicum, Bru philus, Treponema denticola, Treponema pallidum, Tro cella melitensis, Brucella suis, Buchnera aphidicola, pheryma whipplei. Ureaplasma urealyticum, Vibrio Burkholderia mallei. Burkholderia pseudomalilei, Cae norhabditis briggsae, Caenorhabditis elegans, Campylo cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, bacter jejuni, Candida glabrata, Canis familiaris, Caulo Wigglesworthia glossinidia, Wolbachia pipientis, Wolinella bacter Crescentus, Chlamydia muridarum, Chlamydia succinogenes, Xanthomonas axonopodis, Xanthomonas trachomatis, Chlamydophila caviae, Chlamydophila pneu campestris, Xylella fastidiosa, Yarrowia lipolytica, Yersinia moniae, Chlorobium tepidum, Chronobacterium violaceum, pseudotuberculosis; and Yersinia pestis nucleic acids. Cioma intestinalis, Clostridium acetobutylicum, Clostridium 0107. In certain embodiments, sources of encoding perfingens, Clostridium tetania Corynebacterium diphthe nucleic acids for enzymes for an energy conversion pathway, riae, Corynebacterium efficiens, Coxiella burnetii, carbon fixation pathway or carbon product biosynthetic path Cryptosporidium hominis, Cryptosporidium parvum, Cyan way can include, for example, any species where the encoded idioschyzon merolae, Debaryomyces hansenii; Deinococcus gene product is capable of catalyzing the referenced reaction. radiodurans, Desulfotalea psychrophila, Desulfovibrio vul Exemplary species for Such sources include, for example, garis, Drosophila melanogaster, Encephalitozoon cuniculi: Aeropyrum permix, Aquifex aeolicus, Aquifex pyrophilus, Enterococcus faecalis, Erwinia carotovora, Escherichia Candidatus Arcobacter sulfidicus; Candidatus Endorifia coli. Fusobacterium nucleaturn, Gallus gallus, Geobacter persephone; Candidatus Nitrospira defluvii; Chlorobium US 2015/003 7853 A1 Feb. 5, 2015

limicola, Chlorobium tepidum, Clostridium pasteurianum, reaction to replace the referenced reaction. Because certain Desulfobacter hydrogenophilus, Desulfurobacterium therm differences among metabolic networks exist between differ olithotrophum, Geobacter metallireducens, Halobacterium ent organisms, those skilled in the art would understand that sp. NRC-1, Hydrogenimonas thermophila, Hydrogenivirga the actual gene usage between different organisms may differ. strain 128-5-R1, Hydrogenobacter thermophilus, Hydro However, given the teachings and guidance provided herein, genobaculum sp.YO4AAS1, Lebetimonas acidiphila Pd55"; those skilled in the art also would understand that the teach Leptospirillum ferriphilum, Leptospirillum ferrodiazotro ings and methods of the invention can be applied to all micro phum, Leptospirillum rubarum, Magnetococcus marinus, bial organisms using the cognate metabolic modifications to Magnetospirillum magneticum, Mycobacterium bovis, those exemplified herein to construct a microbial organism in Mycobacterium tuberculosis, Methylobacterium nodulans, a species of interest that would produce carbon-based prod Nautilia lithotrophica, Nautilia profindicola, Nautilia sp. ucts of interest from inorganic energy and inorganic carbon. strain AmN; Nitratifractorsalsuginis, Nitratiruptorsp. strain 0109. It should be noted that various engineered strains SB155-2, Persephonella marina, Rimcaris exoculata epi and/or mutations of the organisms or cell lines discussed symbiont, Streptomyces avermitilis, Streptomyces coeli herein can also be used. color, Sulfolobus avermitilis, Sulfolobus solfataricus, Sul folobus tokodai Sulfurihydrogenibium azorense, Methods for Identification and Selection of Candidate Sulfurihydrogenibium sp. Y03AOP1. Sulfurihydrogenibium Enzymes for a Metabolic Activity of Interest yellowstonense, Sulfurihydrogenibium subterraneum, Sulfil 0110. In one aspect, the present invention provides a rimonas autotrophica, Sulfurimonas denitrificans, Sulfuri method for identifying candidate proteins or enzymes of monas paralvinella, Sulfurovum lithotrophicum, Sulfurovum interest capable of performing a desired metabolic activity. sp. strain NBC37-1. Thermocrinis ruber. Thermovibrio Leveraging the exponential growth of gene and genome ammonificans. Thermovibrio ruber. Thioreductor micati sequence databases and the availability of commercial gene soli; Nostoc sp. PCC 7120, Acidithiobacillus ferrooxidans, synthesis at reasonable cost, Bayer and colleagues adopted a Allochromatium vinosum, Aphanothece halophytica, Oscil synthetic metagenomics approach to bioinformatically latoria limnetica, Rhodobacter capsulatus. Thiobacillus search sequence databases for homologous or similar denitrificans, Cupriavidus necator (formerly Ralstonia enzymes, computationally optimize their encoding gene eutropha), Methanosarcina barkeri Methanosarcia mazei; sequences for heterologous expression, synthesize the Methanococcus maripaludis, Mycobacterium Smegmatis, designed gene sequence, clone the synthetic gene into an Burkholderia stabilis, Candida boidinii, Candida methylica, expression vector and screen the resulting enzyme for a Pseudomonas sp. 101, Methylcoccus capsulatus, Mycobac desired function in E. coli or yeast Bayer, 2009. However, terium gastri, Cenarchaeum symbiosum, Chloroflexus depending on the metabolic activity or protein of interest, aurantiacus, Erythrobacter sp. NAP1, Metallosphaera there can be thousands of putative homologs in the publicly sedula; gamma proteobacterium NOR51-B; marine gamma available sequence databases. Thus, it can be experimentally proteobacterium HTCC2080, Nitrosopumilus maritimus, challenging or in some cases infeasible to synthesize and Roseiflexus castenholzii. Synechococcus elongatus; and the screen all possible homologs at reasonable cost and within a like, as well as other exemplary species disclosed herein or reasonable timeframe. To address this challenge, in one available as source organisms for corresponding genes. How aspect, this invention provides an alternate method for iden ever, with the complete genome sequence publicly available tifying and selecting candidate protein sequences for a meta for now more than 4400 species (including viruses), includ bolic activity of interest. The method comprises the following ing 1701 microbial genomes and a variety of yeast, fungi, steps. First, for a desired metabolic activity, such as an plant, and mammalian genomes, the identification of genes enzyme-catalyzed step in an energy conversion, carbon fixa encoding the requisite energy conversion, carbon fixation or tion or carbon product biosynthetic pathway, one or more carbon product biosynthetic activity for one or more genes in enzymes of interest are identified. Typically, the enzyme(s) of related or distant species, including for example, homologs, interest have been previously experimentally validated to per orthologs, paralogs and nonorthologous gene displacements form the desired activity, for example in the published scien of known genes, and the replacement of gene homolog either tific literature. In some embodiments, one or more of the within an particular engineered chemoautotroph or between enzymes of interest has been heterologously expressed and different host cells for the engineered chemoautotroph is experimentally demonstrated to be functional. Second, a bio routine and well known in the art. Accordingly, the metabolic informatic search is performed on protein classification or modifications enabling chemoautotrophic growth and pro grouping databases, such as Clusters of Orthologous Groups duction of carbon-based products described herein with ref (COGs) Tatusov, 1997: Tatusov, 2003), Entrez Protein Clus erence to a particular organism Such as E. coli can be readily ters (ProtClustDB) Klimke, 2009 and/or InterPro Zdob applied to other microorganisms, including prokaryotic and nov, 2001, to identify protein groupings that contain one or eukaryotic organisms alike. Given the teachings and guidance more of the enzyme(s) of interest (or closely related provided herein, those skilled in the art would know that a enzymes). If the enzyme(s) of interest contain multiple Sub metabolic modification exemplified in one organism can be units, then the protein corresponding to a single Subunit, for applied equally to other organisms. example the catalytic Subunit or the largest Subunit, is 0108. In some instances, such as when an alternative selected as being representative of the enzyme(s) of interest energy conversion, carbon fixation or carbon product biosyn for the purposes of bioinformatic analysis. Third, a system thetic pathway exists in an unrelated species, chemoau atic, expert-guided search is then performed to identify which totrophic growth and production of carbon-based products database groupings are likely to contain a majority of mem can be conferred onto the host species by, for example, exog bers whose metabolic activity is the same or similar as the enous expression of a paralog or paralogs from the unrelated protein(s) of interest. Fourth, the list of NCBI Protein acces species that catalyzes a similar, yet non-identical metabolic sion numbers corresponding to every members of each US 2015/003 7853 A1 Feb. 5, 2015

Selected database grouping is then compiled and the corre Methods for Design of Nucleic Acids Encoding Enzymes for sponding protein sequences are downloaded from the Heterologous Expression sequence databases. Protein sequences available from 0117. In one aspect, the present invention provides a com Sources other than the public sequence databases may be puter program product for designing a nucleic acid that added to this set. Fifth, optionally, one or more outgroup encodes a protein or enzyme of interest that is codon opti protein sequences are identified and added to the set. Out mized for the host cell or organism (the target species). The group proteins are proteins which may share some functional, program can reside on a hardware computer readable storage structural, or sequence similarities to the model enzyme(s) medium and having a plurality of instructions which, when but lack an essential feature of the enzyme(s) of interest or executed by a processor, cause the processor to perform desired metabolic activity. For example, the enzyme flavocy operations. The program comprises the following operations. tochrome c (E.C. 1.8.2.3) is similar to sulfide-quinone oxi At each amino acid position of the protein of interest, the doreductase (E.C. 1.8.5.4) in that it oxidizes hydrogen sulfide codon is selected in which the rank order codon usage fre but it reduces cytochrome c instead of ubiquinone and thus quency of that codon in the target species is the same as the offers a useful outgroup during bioinformatic analysis of rank order codon usage frequency of the codon that occurs at sulfide-quinone . Sixth, the complete set of that position in the source species gene. To select the desired protein sequences are aligned with an sequence alignment codon at each amino acid position, both the genetic code (the program capable of aligning large numbers of sequences, mapping of codons to amino acids Jukes, 1993) and codon such as MUSCLE Edgar, 2004a: Edgar, 2004b). Seventh, a frequency table (the frequency with which each synonymous tree is drawn based on the resulting MUSCLE alignment via codon occurs in a genome or genome Grantham, 1980) for methods known to those skilled in the art, such as neighbor both the source and target species are needed. For source joining Saitou, 1987 or UPGMA Sokal, 1958; Murtagh, species for which a complete genome sequence is available, 1984. Eighth, different clades are selected from the tree so the usage frequency for each codon may be calculate simply that the number of clades equals the desired number of pro by Summing the number of instances of that codon in all teins for screening. Finally, one protein from each clade is annotated coding sequences, dividing by the total number of Selected for gene synthesis and functional screening based on codons in that genome, and then multiplying by 1000. For the following heuristics Source species for which no complete genome is available, the usage frequency can be computed based on any available 0111 Preference is given to proteins that have been coding sequences or by using the codon frequency table of a heterologously expressed and experimentally demon closely related organism. The program then preferably stan strated to have the desired metabolic activity. dardizes the start codon to ATG, the stop codon to TAA, and 0112 Preference is given to proteins that have been the second and second last codons to one of twenty possible biochemically characterized to have the desired meta codons (one per amino acid). The program then subjects the bolic activity previously. codon optimized nucleic acid sequence to a series of checks to improve the likelihood that the sequence can be synthesized 0113 Preference is given to proteins from source organ via commercial gene synthesis and subsequently manipu isms for which there is strong experimental or genomic lated via molecular biology Sambrook, 2001 and DNA evidence that the organism has the desired metabolic assembly methods Knight, 2003: Knight, 2007; WO/2010/ activity. 070295). These checks comprise identifying if key restriction I0114 Preference is given to proteins in which the key enzyme recognition sites used in a DNA assembly standard or catalytic, binding and/or other signature residues are DNA assembly method are present; if hairpins whose GC conserved with respect to the protein(s) of interest. content exceeds a threshold percentage, such as 60%, and whose length exceeds a threshold number of base pairs, such I0115 Preference is given to protein from source organ as 10, are present; if sequence repeats are present; if any isms whose optimal growth temperature is similar to that subsequence between 100 and 150 nucleotides in length of the host cellor organism. For example, if the host cell exceeds a threshold GC content, such as 65%; if G or C is a mesophile, then the source organism is also a meso homopolymers greater than 5 nucleotides in length are phile. present; and, optionally, if any sequence motifs are present 0116. Therefore, in constructing the engineered chemoau that might give rise to spurious transposon insertion sites, totroph of the invention, those skilled in the art would under transcriptional or translational initiation or termination, Stand that by applying the teaching and guidance provided mRNA secondary structure, RNase cleavage, and/or tran herein, it is possible to replace or augment particular genes Scription factor binding. If the codon optimized nucleic acid within a metabolic pathway, such as an energy conversion sequence fails any of these checks, the program then iterates pathway, a carbon fixation pathway, and/or a carbon product through all possible synonymous mutations and designs a biosynthetic pathway, with homologs identified using the new nucleic acid sequence that both passes all checks and methods described here, whose gene products catalyze a minimizes the difference in codon frequencies between the similar or substantially similar metabolic reaction. Such original and new nucleic acid sequence. modifications can be done, for example, to increase flux 10118 Various implementations of the systems and tech through a metabolic pathway (for example, flux of energy or niques described here can be realized in digital electronic carbon), to reduce accumulation of toxic intermediates, to circuitry, integrated circuitry, specially designed ASICs (ap improve the kinetic properties of the pathway, and/or to oth plication-specific integrated circuits), computer hardware, erwise optimize the engineered chemoautotroph. Indeed, firmware, software, and/or combinations thereof. These vari gene homologs for a particular metabolic activity may be ous implementations can include one or more computer pro preferable when conferring chemoautrotrophic capability on grams that are executable and/or interpretable on a program a different host cell or organism. mable system including at least one programmable processor, US 2015/003 7853 A1 Feb. 5, 2015

which may be special or general purpose, coupled to receive tions, some or all of which may effectuate the methods, pro data and instructions from, and to transmit data and instruc cesses, and/or functions associated with the presently tions to, a storage system, at least one input device, and at disclosed embodiments. least one output device. Such computer programs (also 0.122 Although not shown, computer processing device known as programs, software, Software applications or code) 200 may include various forms of input and output. The I/O may include machine instructions for a programmable pro may include network adapters, USB adapters, Bluetooth cessor, and may be implemented in any form of programming radios, mice, keyboards, touchpads, displays, touch screens, language, including high-level procedural and/or object-ori LEDs, vibration devices, speakers, microphones, sensors, or ented programming languages, and/or in assembly/machine any other input or output device for use with computer pro languages. A computer program may be deployed in any cessing device 200. form, including as a stand-alone program, or as a module, component, Subroutine, or other unit Suitable for use in a Methods for Expression of Heterologous Enzymes computing environment. A computer program may be I0123 Composite nucleic acids can be constructed to deployed to be executed or interpreted on one computer or on include one or more energy conversion, carbon fixation and multiple computers at one site, or distributed across multiple optionally carbon product biosynthetic pathway encoding sites and interconnected by a communication network. nucleic acids as exemplified herein. The composite nucleic 0119) A computer program may, in an embodiment, be acids can Subsequently be transformed or transfected into a stored on a computer readable storage medium. A computer Suitable host organism for expression of one or more proteins readable storage medium stores computer data, which data of interest. Composite nucleic acids can be constructed by can include computer program code that is executed and/or operably linking nucleic acids encoding one or more stan interpreted by a computer system or processor. By way of dardized genetic parts with protein(s) of interest encoding example, and not limitation, a computer readable medium nucleic acids that have also been standardized. Standardized may comprise computer readable storage media, for tangible genetic parts are nucleic acid sequences that have been or fixed storage of data, or communication media for transient refined to conform to one or more defined technical standards, interpretation of code-containing signals. Computer readable such as an assembly standard Knight, 2003: Shetty, 2008; storage media, may refer to physical or tangible storage (as Shetty, 2011. Standardized genetic parts can encode tran opposed to signals) and may include without limitation Vola Scriptional initiation elements, transcriptional termination tile and non-volatile, removable and non-removable media elements, translational initiation elements, translational ter implemented in any method or technology for the tangible mination elements, protein affinity tags, protein degradation storage of information Such as computer-readable instruc tags, protein localization tags, selectable markers, replication tions, data structures, program modules or other data. Com elements, recombination sites for integration onto the puter readable storage media includes, but is not limited to, genome, and more. Standardized genetic parts have the RAM, ROM, EPROM, EEPROM, flash memory or other advantage that their function can be independently validated solid state memory technology, CD-ROM, DVD, or other and characterized Kelly, 2009 and then readily combined optical storage, magnetic cassettes, magnetic tape, magnetic with other standardized parts to produce functional nucleic disk storage or other magnetic storage devices, or any other acids Canton, 2008. By mixing and matching standardized physical or material medium which can be used to tangibly genetic parts encoding different expression control elements store the desired information or data or instructions and with nucleic acids encoding proteins of interest, transforming which can be accessed by a computer or processor. the resulting nucleic acid into a suitable host cell and func tionally screening the resulting engineered cell, the process of 0120 FIG. 2 shows a block diagram of a generic process both achieving soluble expression of proteins of interest and ing architecture, which may execute Software applications validing the function of those proteins is made dramatically and processes. Computer processing device 200 may be faster. For example, the set of standardized parts might com coupled to display 202 for graphical output. Processor 204 prise constitutive promoters of varying strengths Davis, may be a computer processor capable of executing Software. 2011, ribosome binding sites of varying strengths Ander Typical examples of processor 204 are general-purpose com son, 2007 and protein degradation of tags of varying puter processors (such as Intel(R) or AMDR processors), strengths Andersen, 1998. ASICs, microprocessors, any other type of processor, or the 0.124 For exogenous expression in E. coli or other like. Processor 204 may be coupled to memory 206, which prokaryotic cells, some nucleic acids encoding proteins of may be a volatile memory (e.g. RAM) storage medium for interest can be modified to introduce solubility tags onto the storing instructions and/or data while processor 204 executes. protein of interest to ensure soluble expression of the protein Processor 204 may also be coupled to storage device 208, of interest. For example, addition of the maltose binding which may be a non-volatile storage medium such as a hard protein to a protein of interest has been shown to enhance drive, FLASH drive, tape drive, DVDROM, or similar device. soluble expression in E. coli Sachdev, 1998; Kapust, 1999; Program 210 may be a computer program containing instruc Sachdev, 2000. Either alternatively or in addition, chaperone tions and/or data, and may be stored on Storage device 208 proteins, such as DnaK, DnaJ. GroES and GroEL may be and/or in memory 206, for example. In a typical scenario, either co-expressed or overexpressed with the proteins of processor 204 may load some or all of the instructions and/or interest, such as RuBisCO Greene, 2007, to promote correct data of program 210 into memory 206 for execution. folding and assembly Martinez-Alonso, 2009; Martinez 0121 Program 210 may be a computer program capable of Alonso, 2010. performing the processes and functions described above. Pro 0.125 For exogenous expression in E. coli or other gram 210 may include various instructions and Subroutines, prokaryotic cells, some nucleic acid sequences in the genes or which, when loaded into memory 206 and executed by pro cDNAs of eukaryotic nucleic acids can encode targeting sig cessor 204 cause processor 204 to perform various opera nals such as an N-terminal mitochondrial or other targeting US 2015/003 7853 A1 Feb. 5, 2015

signal, which can be removed before transformation into for the utilization of formate and/or formic acid as the inor prokaryotic host cells, if desired. For example, removal of a ganic energy source, one or more formate dehydrogenases mitochondrial leader sequence led to increased expression in (FDH) can be expressed. In a preferred embodiment, the E. coli Hoffmeister, 2005. For exogenous expression in formate dehydrogenase reduces NADP". Some naturally yeast or other eukaryotic cells, genes can be expressed in the occurring carbon fixation pathways use NADPH as the redox cytosol without the addition of leader sequence, or can be cofactor rather than NADH, such as the reductive pentose targeted to mitochondrion or other organelles, or targeted for phosphate pathway and several variants of the 3-hydroxypro secretion, by the addition of a Suitable targeting sequence pionate cycle. Accordingly, in certain aspects of the inven Such as a mitochondrial targeting or secretion signal Suitable tion, the engineered chemoautotroph expresses a Burkhold for the host cells. Thus, it is understood that appropriate eria stabilis NADP-dependent formate dehydrogenase modifications to a nucleic acid sequence to remove or include (E.C. 1.2.1.43, ACF35003) or a homolog thereof. The a targeting sequence can be incorporated into an exogenous homologs can be selected by any Suitable methods known in nucleic acid sequence to impart desirable properties. the art or by the methods described herein. This enzyme has Energy Conversion from Inorganic Energy Sources to been previously shown to preferentially use NADP as a Reduced Cofactors cofactor Hatrongjit, 2010. SEQ ID NO:1 represents the E. 0126. In certain aspects, the engineered chemoautotroph coli codon optimized coding sequence for the fah gene of the of the present invention comprises one or more energy con present invention. In one aspect, the invention provides a version pathways to convert energy from one or more inor nucleic acid molecule and homologs, variants and derivatives ganic energy sources, such as formate, formic acid, carbon of SEQ ID NO:1. The nucleic acid sequence can have pref monoxide, methane, molecular hydrogen, hydrogen Sulfide, erably 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, bisulfide anion, thiosulfate, elemental sulfur, ferrous iron, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even and/or ammonia, to one or more reduced cofactors, such as higher identity to SEQID NO:1. The present invention also NADH, NADPH, reduced ferredoxins, quinols, reduced fla provides nucleic acids comprising or consisting of a sequence Vins, and reduced cytochromes. An energy conversion path which is a codon optimized version of the wild-typefah gene. way comprises the following enzymes (only some of which In another embodiment, the invention provides a nucleic acid may be exogenous depending on the host organism). encoding a polypeptide having the amino acid sequence of Together, the enzymes confer an energy conversion capability Genbank accession ACF35003, or homologs thereof having on the host cell or organism that the natural organism lacks. 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, I0127 one or more redox enzymes to oxidize the inor 80%. 81-85%,90-95%,96-98%, 99%, 99.9% or even higher ganic energy source and transfer the electrons to a reduc identity thereto. Alternatively, enzymes that naturally use ing cofactor NAD" can be engineered using established protein engineer I0128 optionally, one or more proteins that serve as a ing techniques to require NADP instead of NAD" Serov, reducing cofactor and/or enzymes that can alter intrac 2002; Gul-Karaguler, 2001. ellular pools of reducing cofactors I0133. In another embodiment, the formate dehydrogenase I0129 optionally, one or more oxidoreductases or tran reduces NAD". For example, formate dehydrogenase (E.C. shydrogenases that can transfer electrons from high to 1.2.1.2) can couple the oxidation of formate to carbon dioxide lower energy redox cofactors (or between redox cofac with the reduction of NAD" to NADH. Exemplary FDH tors with similar redox potentials) enzymes include Genbank accession numbers CAA57036, 0.130 optionally, one or more transporters or channels AAC49766 and NP 015033 or homologs thereof. SEQ ID to facilitate uptake of extracellular inorganic energy NO:2, SEQ ID NO:3 and SEQ ID NO:4 represent E. coli Sources by the engineered chemoautotroph. codon optimized coding sequence for each of these three 0131. In certain embodiments, the nucleic acids encoding FDHs, respectively, of the present invention. In one aspect, the proteins and enzymes of a energy conversion pathway are the invention provides nucleic acid molecules and homologs, introduced into a host cellor organism that does not naturally variants and derivatives of SEQID NO:2, SEQID NO:3 and contain all the energy conversion pathway enzymes. A par SEQID NO:4. The nucleic acid sequences can have prefer ticularly useful organism for genetically engineering energy ably 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, conversion pathways is E. coli, which is well characterized in 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even terms of available genetic manipulation tools as well as fer higher identity to SEQID NO:2, SEQID NO:3 and SEQID mentation conditions. Following the teaching and guidance NO:4. The present invention also provides nucleic acids each provided herein for introducing a sufficient number of encod comprising or consisting of a sequence which is a codon ing nucleic acids to generate a particular energy conversion optimized version of one of the wild-type fah genes. In pathway, those skilled in the art would understand that the another embodiment, the invention provides nucleic acids same engineering design also can be performed with respect each encoding a polypeptide having the amino acid sequence to introducing at least the nucleic acids encoding the energy of one of Genbank accession numbers CAA57036, conversion pathway enzymes or proteins absent in the host AAC49766 and NP 015033, or homologs thereof having organism. Therefore, the introduction of one or more encod 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, ing nucleic acids into the host organisms of the invention Such 80%. 81-85%,90-95%,96-98%, 99%, 99.9% or even higher that the modified organism contains an energy conversion identity thereto. pathway can confer the ability to use inorganic energy to I0134. In certain embodiments, the invention provides an make reducing cofactors, provided the modified organism has engineered chemoautroph that can utilize formate and/or for a suitable inorganic energy source. mic acid as an inorganic energy source and produce reduced, 0.132. In certain embodiments, the invention provides an low potential ferredoxin as the reducing cofactor. The reduc engineered chemoautroph that can utilize formate and/or for tive tricarboxylic acid cycle carbon fixation pathway is mic acid as an inorganic energy source. To engineer a host cell believed to require a low potential ferredoxin for particular US 2015/003 7853 A1 Feb. 5, 2015

carboxylation steps Brugna-Guiral, 2003, Yoon, 1997: ficile CD196 (YP 003216147 and YP 003216146), Ikeda, 2005. The organisms Nautilia sp. strain AmN, Nau Clostridium difficile 820291 (YP 003219654 and tilia profundicola, Nautilia lithotrophica 525 and Ther YP 003219653) or homologs thereof. In another embodi mocrinis ruber are reported to grow on formate as the sole ment, the invention provides nucleic acids each encoding a electron donor and use the reductive tricarboxylic acid cycle polypeptide having the amino acid sequence of one of Gen as their carbon fixation pathway Campbell, 2001; Smith, bank accession numbers YP 001310874, YP 001310871, 2008; Campbell, 2009; Miroshnichenko, 2002: Higler, YP 001089834, YP 001089833, YP 003216147, 2007, thus implying that each of these organisms have an YP 003216146, YP 003219654 and YP 003219653, or energy conversion pathway from formate to reduced ferre homologs thereofhaving 70%, 71%, 72%, 73%, 74%, 75%, doxin. To engineer a host cell for the utilization of formate 76%, 77%, 78%, 79%, 80%,81-85%,90-95%,96-98%, 99%, and/or formic acid as the inorganic energy source and pro 99.9% or even higher identity thereto. duction of reduced ferredoxin as the reducing cofactor, in 0.136. In certain embodiments, the invention provides an certain embodiments the present invention provides for the engineered chemoautotroph that can utilize molecular hydro expression of formate dehydrogenase capable of reducing gen as an inorganic energy source. To engineer a host cell for low potential ferredoxin in the engineered chemoautotroph. the utilization of molecular hydrogen as an inorganic energy Such an enzyme would facilitate the combination of an Source, one or more hydrogenases can be expressed. For energy conversion pathway that utilizes formate with a car example, NiFel-hydrogenases are typically associated with bon fixation pathway based on the reductive tricarboxylic the coupling of hydrogen oxidation to cofactor reduction acid cycle as an embodiment of the engineered chemoau Vignais, 2004. These hydrogenases tend to be composed of totroph of the present invention. Exemplary putative ferre at least a large and Small Subunit and require several accessory doxin-dependent formate dehydrogenases include (with genes for maturation including a peptidase Vignais, 2004. Genbank accession numbers of the FDH subunits listed in Recently, there have been several published examples of het parentheses) Nautilia profundicola AmH (YP 002607699, erologous expression of NiFel-hydrogenases in E. coli Sun, YP 002607700,YP 002607701 andYP 002607702), Sul 2010; Wells, 2011; Kim, 2011. Taken together, these results furimonas denitrificans DSM 1251 (YP 394.410 and demonstrate that particular maturation proteins, in particular YP 394411), Caminibacter mediatlanticus TB-2 (ZP the peptidase that cleaves the C-terminal end of the large 01871216, ZP 01871217, ZP 01871218 and Subunit, tend to be very specific for their cognate hydrogenase ZP 01871219) and Methanococcus maripaludis strain S2 and can not be substituted by homologous hydrogenase matu (NP 988417 and NP 988418) or homologs thereof. In ration factors endogenous to the host cell. Hence, functional another embodiment, the invention provides nucleic acids heterologous expression of a NiFel-hydrogenase requires each encoding a polypeptide having the amino acid sequence expression of not only the Subunit proteins, such as the large of one of Genbank accession numbers YP 0026.07699, and Small Subunit, but also one or more of the associated YP 002607700, YP 002607701, YP 002607702, maturation factors, such as the peptidase. In a preferred YP 394.410, YP 394411, ZP 01871216, ZP 01871217, embodiment, the hydrogenase reduces ferredoxin (E.C. 1.12. ZP 01871218, ZP 01871219, NP 988417 and 7.2) and in particular a low potential ferredoxin capable of NP 988418, or homologs thereof having 70%, 71%, 72%, being used as the reducing cofactor for the carboxylation 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, steps of the reductive tricarboxylic acid cycle Yoon, 1997: 90-95%, 96-98%, 99%, 99.9% or even higher identity Ikeda, 2005. The group 2a NiFel-hydrogenases are associ thereto. ated with reducing the ferredoxin needed for the reductive 0135 A ferredoxin-reducing formate dehydrogenase tricarboxylic acid cycle Brugna-Guiral, 2003; Vignais, (FDH) has been previously purified from Clostridium pas 2007. Exemplary hydrogenases include (with Genbank teurianum W5 Liu, 1984; however, no protein or nucleic accession numbers of the hydrogenase subunits listed in acid sequence information is available on the enzyme nor is parentheses) Aquifex aeolicus Hydrogenase 3 (NP 213549 there a publicly available genome sequence for Clostridium and NP 213548); Hydrogenobacter thermophilus TK-6 pasteurianum as of Aug. 1, 2011. Based on the sequencing Hup2 (YP 003432664 and YP 003432663); Hydrogeno and bioinformatic analysis of the Clostridium pasteurianum baculum sp. YO4AAS 1 HY044AAS1 1400/ genome, the sequence of a two putative subunits of a ferre HYO44AAS1 1399 (YP 002122063 and doxin-dependent FDH (FdhF and FdhD) as well as two asso YP 002122062); Magnetococcus marinus Mmc1 2493/ ciated putative ferredoxin domain-containing proteins were Mmc1 2494 (YP 866399 and YP 866400); Magnetospir identified (Example 7). In one aspect, the invention provides illum magneticum AMB-1 amb1114/amb1115 (YP 420477 nucleic acids each encoding a polypeptide having the amino and YP 420478); Methanococcus maripaludis S2 Hydroge acid sequence of one of SEQID NO:5, SEQID NO:6, SEQ nase B (NP 988273 and NP 988742); Methanosarcina ID NO:7 and SEQ ID NO:8. In another embodiment, the barkeri str. fusaro Ech (YP 303717, YP 303716, invention provides nucleic acids each encoding a polypeptide YP 303715, YP 303714, YP 303713 and YP 303712); having the amino acid sequence of one of SEQID NO:5, SEQ Methanosarcina mazei Gol Ech (NP 634344, NP 634345, ID NO:6, SEQID NO:7 and SEQID NO:8 which have been NP 634346, NP 634347, NP 634348 and NP 634349); codon optimized for the host organism, Such as E. coli. Based Mycobacterium smegmatis str. MC2 155 Hydrogenase-2 on the Clostridium pasteurianum putative FDH subunits, (YP 886615 and YP 88.6614), Nautilia profindicola AmH additional putative ferredoxin-dependent FDH were identi NAMH 0573/NAMH 0572 (YP 002606989 and fied. Exemplary ferredoxin-dependent FDH include (with YP 002606988), Nitratiruptor sp. SB155-2 Hup Genbank accession numbers of the FDH subunits listed in (YP 001356429 and YP 001356428); Persephonella parentheses) Clostridium beijerincki NCIMB 8052 marina EX-H1 PERMA 0914/PERMA 0915 (YP 001310874 and YP 001310871), Clostridium difficile (YP 002730701 and YP 002730702); Sulfurihydro 630 (YP 001089834 andYP 001089833), Clostridium dif: genibium azorense AZ-Fu1 SULAZ 0749/SULAZ 0748 US 2015/003 7853 A1 Feb. 5, 2015 20

(YP 002728734 and YP 002728733); Sulfurimonas deni oxidation of hydrogen sulfide to the reduction of a cyto trificans DSM 1251 Suden 1437/Suden 1436 chrome (E.C. 1.8.2.3) Marcia, 2010. (YP 393949 and YP 393948); Sulfurovum sp NBC37-1 Hup (YP 001358971 and YP 001358972); Thermocrinis 0.138. In certain embodiments, the invention provides an albus DSM 14484 Thal 1414/Thal 1413 (YP 0.03474-170 engineered chemoautotroph that expresses a protein that can and YP 003474169); and homologs thereof. In an alternate serve as a reducing cofactor. Such as preferably ferredoxin or embodiment, the hydrogenase reduces NADP' (E.C. 1.12.1. alternatively cytochrome c. In one embodiment, the ferre 3). The group 3b and 3d NiFel-hydrogenases are typically doxin is a low potential ferredoxin that can donate electrons to NAD(P)" reducing hydrogenases from bacteria Vignais, the carboxylation steps in the reductive tricarboxylic acid 2007. Exemplary hydrogenases include (with Genbank cycle Yoon, 1997: Ikeda, 2005. Exemplary ferredoxins accession numbers of the hydrogenase subunits listed in include AAA83524, YP 003433536, YP 003433535, parentheses) Cupriavidus necator SH (NP 942732, YP 304316, and homologs thereof. SEQID NO:17, SEQID NP 942730, NP 942729, NP 942728 and NP 942727) NO:18, SEQ ID NO:19, SEQ ID NO:20 represent E. coli and Synechocystis sp. PCC6803 bidirectional hydrogenase codon optimized coding sequence for each of these four ferre (NP 441418, NP 441417, NP 441415, NP 441414 and doxins, respectively, of the present invention. In one aspect, NP 441411), and homologs thereof. In an alternate embodi the invention provides nucleic acid molecules and homologs, ment, the hydrogenase reduces NAD" (E.C. 1.12.1.2). Exem variants and derivatives of SEQID NO:17, SEQID NO:18, plary hydrogenases include (with the Genbank accession SEQID NO:19, SEQID NO:20. The nucleic acid sequences numbers of the hydrogenase Subunits listed in parentheses) can have preferably 70%, 71%, 72%, 73%, 74%, 75%, 76%, Cupriavidus necator SH without the HoxI subunit (NP 77%, 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 942730, NP 942729, NP 942728 and NP 942727) and 99.9% or even higher identity to SEQ ID NO:17, SEQ ID homologs thereof Burgdorf, 2005. NO:18, SEQID NO:19, SEQID NO:20. The present inven 0.137 In certain embodiments, the invention provides an tion also provides nucleic acids each comprising or consisting engineered chemoautotroph that can utilize hydrogen Sulfide ofa sequence which is a codon optimized version of one of the as an inorganic energy source. To engineer a host cell for the wild-type ferredoxin genes. In another embodiment, the utilization of hydrogen Sulfideas the inorganic energy source, invention provides nucleic acids each encoding a polypeptide one or more sulfide-quinone oxidoreductases (SQR) can be having the amino acid sequence of one of Genbank accession expressed. Sulfide-quinone oxidoreductase couples the oxi numbers AAA83524, YP 003433536, YP 003433535 and dation of hydrogen sulfide to the reduction of a quinone to the YP 304316, or homologs thereof having 70%, 71%, 72%, corresponding quinol (E.C. 1.8.5.4). The Rhodobacter cap 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, sulatus SQR has been functionally expressed in the heterolo 90-95%, 96-98%, 99%, 99.9% or even higher identity gous host E. coli Schütz, 1997 and demonstrated to reduce ubiquinone Shibata, 2001. Exemplary SQR enzymes thereto. Two additional exemplary ferredoxins for which no include NP 214500, NP 488552, NP 661023, Genbank accession number has been assigned include SEQ YP 002426210, YP 003444098, YP 003576957, ID NO:22 and SEQID NO:24. SEQID NO:21 and SEQID YP 315983, YP 866354, and homologs thereof. SEQ ID NO:23 represent E. coli codon optimized coding sequence for NO:9, SEQID NO:10, SEQID NO:11, SEQID NO:12, SEQ each of these two unannotated ferredoxins, respectively, of ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID the present invention. In one aspect, the invention provides NO:16 represent E. coli codon optimized coding sequence for nucleic acid molecules and homologs, variants and deriva each of these eight SQRs, respectively, of the present inven tives of SEQID NO:21 and SEQID NO:23. The nucleic acid tion. In one aspect, the invention provides nucleic acid mol sequences can have preferably 70%, 71%, 72%, 73%, 74%, ecules and homologs, variants and derivatives of SEQ ID 75%, 76%, 77%, 78%, 79%, 80%,81-85%,90-95%,96-98%, NO:9, SEQID NO:10, SEQID NO:11, SEQID NO:12, SEQ 99%, 99.9% or even higher identity to SEQ ID NO:21 and ID NO:13, SEQ ID NO:14, SEQ ID NO:15 and SEQ ID SEQID NO:23. The present invention also provides nucleic NO:16. The nucleic acid sequences can have preferably 70%, acids each comprising or consisting of a sequence which is a 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, codon optimized version of one of these two wild-type ferre 81-85%,90-95%, 96-98%, 99%, 99.9% or even higher iden doxin genes. In another embodiment, the invention provides tity to SEQID NO:9, SEQID NO:10, SEQID NO:11, SEQ nucleic acids each encoding a polypeptide having the amino ID NO:12, SEQID NO:13, SEQID NO:14, SEQ ID NO:15 acid sequence of one of SEQID NO:22 and SEQID NO:24, and SEQID NO:16. The present invention provides nucleic or homologs thereof having 70%, 71%, 72%, 73%, 74%, acids each comprising or consisting of a sequence which is a 75%, 76%, 77%, 78%, 79%, 80%,81-85%,90-95%,96-98%, codon optimized version of one of the wild-type Sqr genes. In 99%, 99.9% or even higher identity thereto. another embodiment, the invention provides nucleic acids 0.139. In certain embodiments, the invention provides an each encoding a polypeptide having the amino acid sequence engineered chemoautotrophthat can transfer energy from one of one of Genbank accession numbers NP 214500, reduced cofactor to another. In one embodiment, a ferre NP 488552, NP 661023, YP 0024262.10, doxin-NADP" reductase (FNR) is expressed. FNR can cata YP 003444098, YP 003576957, YP 315983, lyze reversible electron transfer between the two-electron YP 866354, or homologs thereof having 70%, 71%, 72%, carrier NADPH and the one-electron carrier ferredoxin (E.C 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, 1.18.1.2). Exemplary FNR enzymes include the Hydrogeno 90-95%, 96-98%, 99%, 99.9% or even higher identity bacter thermophilus Fpr (Genbank accession BAH29712) thereto. Alternatively, to engineer a host cell for the utilization and homologs thereof Ikeda, 2009. In another embodiment, of hydrogen sulfide, one or more flavocytochrome c sulfide a ferredoxin-NAD" reductase (E.C. 1.18.1.3) and/or a NAD dehydrogenases can be expressed. Flavocytochrome c sulfide (P) transhydrogenase (E.C. 1.6.1.1 or E.C. 1.6.1.2) is dehydrogenase is similar in structure to SQR but couples the expressed. US 2015/003 7853 A1 Feb. 5, 2015

Carbon Fixation of Inorganic Carbon to Central Metabolites 1.31). In one embodiment, the invention provides an engi 0140. In certain aspects, the engineered chemoautotroph neered chemoautotroph comprising one or more exogenous of the present invention comprises one or more carbon fixa proteins from the rTCA cycle conferring to the organism the tion pathways to use energy from one or more reduced cofac ability to produce central metabolites from inorganic carbon, tors, such as NADH, NADPH, reduced ferredoxins, quinols, wherein the organism lacks the ability to fix carbon via the reduced flavins, and reduced cytochromes, to convert inor rTCA cycle (for example, see FIG. 4). For example, the one or ganic carbon, such as carbon dioxide, formate, or formic acid, more exogenous proteins can be selected from ATP citrate into central metabolites, such as acetyl-coA, pyruvate, gly lyase, citryl-CoA synthetase, citryl-CoA lyase, malate dehy oxylate, glycolate and dihydroxyacetone phosphate. One or drogenase, fumarate dehydratase, fumarate reductase, Succi more of the carbon fixation pathways can be derived from nyl-CoA synthetase, 2-oxoglutarate synthase, isocitrate naturally occurring carbon fixation pathways, such as the dehydrogenase, 2-oxoglutarate carboxylase, oxaloSuccinate Calvin-Benson-Bassham cycle or reductive pentose phos reductase, aconitate hydratrase, pyruvate synthase, phospho phate cycle, the reductive tricarboxylic acid cycle, the Wood enolpyruvate synthetase, and phosphoenolpyruvate carboxy Ljungdhal or reductive acetyl-coA pathway, the 3-hydrox lase. The host organism can also express two or more, three or ypropionate bicycle, 3-hydroxypropionate/4- more, four or more, five or more, and the like, including up to hydroxybutyrate cycle and the dicarboxylate/4- all the protein and enzymes that confer the rTCA pathway. hydroxybutyrate cycle Higler, 2011. Alternatively, one or For example, in the host organism E. coli, the exogenous more of the carbon fixation pathways can be derived from enzymes comprise 2-oxoglutarate synthase and ATP citrate synthetic metabolic pathways not found in nature, such as lyase. As a second example, in the host organism E. coli, the those enumerated by Bar-Even et al. Bar-Even, 2010. In exogenous enzymes comprise 2-oxoglutarate synthase, ATP certain embodiments, the nucleic acids encoding the proteins citrate lyase and pyruvate synthase. Finally, as a third and enzymes of a carbon fixation pathway are introduced into example, in the host organism E. coli, the exogenous enzymes a host cell or organism that does not naturally contain all the comprise 2-oxoglutarate synthase, ATP citrate lyase, pyru carbon fixation pathway enzymes. A particularly useful vate synthase, 2-oxoglutarate carboxylase and OxaloSucci organism for genetically engineering carbon fixation path nate reductase. In another embodiment, alternate enzymes ways is E. coli, which is well characterized in terms of avail can be used that result in the same overall carbon fixation able genetic manipulation tools as well as fermentation con pathway. For example, the enzyme malate dehydrogenase ditions. Following the teaching and guidance provided herein (E.C. 1.1.1.39) can substitute for malate dehydrogenase and for introducing a sufficient number of encoding nucleic acids phosphoenolpyruvate carboxylase. The enzymes 2-oxoglut to generate a particular carbon fixation pathway, those skilled arate synthase and pyruvate synthase can be difficult to dis in the art would understand that the same engineering design tinguish from sequence data alone. Both enzymes comprise also can be performed with respect to introducing at least the 1-5 protein Subunits depending on the species. Exemplary nucleic acids encoding the carbon fixation pathway enzymes pyruvate/2-oxoglutarate synthases include NP 213793, or proteins absent in the host organism. Therefore, the intro NP 213794, and NP 213795; NP 213818, NP 213819 duction of one or more encoding nucleic acids into the host and NP 213820; AAD07654, AAD07655, AAD07656 and organisms of the invention Such that the modified organism AAD07653; ABK44257, ABK44258 and ABK44249; contains a carbon fixation pathway can confer the ability to ACD90193 and ACD90192: YP 001942282 and use inorganic carbon to make central metabolites, provided YP 001942281; and homologs thereof. Exemplary 2-oxo the modified organism has a Suitable inorganic energy source glutarate synthases include BAI69550 and BAI69551; and energy conversion pathway. YP 003432753, YP 003432754, YP 003432755, 0141. In certain embodiments, the invention provides an YP 003432756 and YP 003432757; YP 393565, engineered chemoautotroph with a carbon fixation pathway YP 393566, YP 393567 and YP 393568; BAFT1539, derived from the reductive tricarboxylic acid (rTCA) cycle. BAFT1540, BAF71541 and BAFT1538; BAF69954, The rTCA cycle is well known in the art and consists of BAF69955, BAF69956 and BAF69953: AAM71411 and approximately 11 reactions (FIG.3) Evans, 1966; Buchanan, AAM71410; YP 002607621, YP 002607620, 1990. For two of the reactions (reaction 1 and 7), there are YP 002607619 and YP 002607622; CAA12243 and two known routes between the substrate and product and each CAD27440; and homologs thereof. Exemplary pyruvate syn route is catalyzed by different enzyme(s). The reactions in the thases include YP 392.614, YP 392615, YP 392612 and rTCA cycle are catalyzed by the following enzymes: ATP YP 392.613; YP 001357517, YP 001357518; citrate lyase (E.C. 2.3.3.8) Sintsov, 1980: Kanao, 2002b: YP 001357515 and YP 001357515; YP 001357066, citryl-CoA synthetase (E.C. 6.2.1.18) Aoshima, 2004a): cit YP 001357065, YP 001357068 and YP 001357067; and ryl-CoA lyase (E.C. 4.1.3.34) Aoshima, 2004b; malate homologs thereof. ATP citrate comprise 1-4 protein dehydrogenase (E.C. 1.1.1.37); fumarate dehydratase or Subunits depending on the species. Exemplary ATP citrate fumarase (E.C. 4.2.1.2); fumarate reductase (E.C. 1.3.99.1); lyases include AAC06486; YP 393085 and YP 393084; Succinyl-CoA synthetase (E.C. 6.2.1.5), 2-oxoglutarate Syn BAFT1501 and BAFT1502; BAF69766 and BAF69767; thase or 2-oxoglutarate:ferredoxin oxidoreductase (E.C. 1.2. ACX98447: AAM72322 and AAM72321; YP 002607124 7.3) Gehring, 1972; Yamamoto, 2010; isocitrate dehydro and YP 002607125; BAB21376 and BAB21375; and genase (E.C. 1.1.1.41 or E.C. 1.1.1.42) Kanao, 2002a: homologs thereof. Exemplary citryl-coA synthetases include 2-oxoglutarate carboxylase (E.C. 6.4.1.7) Aoshima, 2004c: BAD17846 and BAD 17844. Exemplary citryl-coA lyases Aoshima, 2006; oxalosuccinate reductase (E.C. 1.1.1.41) include BAD 17841. Aoshima, 2004c: Aoshima, 2006; aconitate hydratrase 0142. In certain embodiments, the invention provides an (E.C. 4.2.1.3); pyruvate synthase or pyruvate:ferredoxin oxi engineered chemoautotroph with a carbon fixation pathway doreductase (E.C. 1.2.7.1); phosphoenolpyruvate synthetase derived from the 3-hydroxypropionate (3-HPA) bicycle. The (E.C. 2.7.9.2); phosphoenolpyruvate carboxylase (E.C. 4.1. 3-HPA bicycle is well known in the art and consists of 19 US 2015/003 7853 A1 Feb. 5, 2015 22 reactions catalyzed by 13 enzymes (FIG. 5) Holo, 1989; ZP 01039179 and YP 001636209, and homologs thereof. Strauss, 1993: Eisenreich, 1993; Herter, 2002a: Zarzycki, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID 2009; Zarzycki, 2011. The number of reactions in the meta NO:28 and SEQID NO:29 represent E. coli codon optimized bolic pathway exceeds the number of enzymes because par coding sequence for each of these five malonyl-CoA reduc ticular enzymes, such as malonyl-CoA reductase, propionyl tases, respectively, of the present invention. In one aspect, the CoA synthase, and malyl-CoA/B-methylmalyl-CoA/ invention provides nucleic acid molecules and homologs, citramalyl-CoA lyase, are multi-functional enzymes that variants and derivatives of SEQID NO:25, SEQID NO:26, catalyze more than one reaction. Also, in Some species. Such SEQ ID NO:27, SEQ ID NO:28 and SEQ ID NO:29. The as Metallosphaera sedulla, the same enzyme can carboxylate nucleic acid sequences can have preferably 70%, 71%, 72%, acetyl-CoA and propionyl-CoA. The reactions in the 3-HPA 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, bicycle are catalyzed by the following enzymes: acetyl-CoA 90-95%,96-98%,99%,99.9% oreven higher identity to SEQ carboxylase (E.C. 6.4.1.2) Menendez, 1999: Higler, 2003: ID NO:25, SEQID NO:26, SEQID NO:27, SEQID NO:28 malonyl-CoA reductase (E.C. 1.2.1.75 and E.C. 1.1.1.298) and SEQ ID NO:29. The present invention also provides Higler, 2002; Alber, 2006: Rathnasingh, 2011; propionyl nucleic acids each comprising or consisting of a sequence CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-) which is a codon optimized version of one of the wild-type Alber, 2002; propionyl-CoA carboxylase (E.C. 6.4.1.3) malonyl-CoA reductase genes. In another embodiment, the Menendez, 1999; Higler, 2003; methylmalonyl-CoA epi invention provides nucleic acids each encoding a polypeptide merase (E.C. 5.1.99.1); methylmalonyl-CoA mutase (E.C. having the amino acid sequence of one of Genbank accession 5.4.99.2); succinyl-CoA: (S)-malate CoA transferase (E.C. numbers ZP 04957 196, YP 001433009, ZP 01626393, 2.8.3.-) Friedmann, 2006; succinate dehydrogenase (E.C. ZP 01039179 and YP 001636209, or homologs thereof 1.3.5.1); fumarate hydratase (E.C. 4.2.1.2): (S)-malyl-CoA/ having 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, B-methylmalyl-CoA/(S)-citramalyl-CoA lyase (MMC lyase, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even E.C. 4.1.3.24 and E.C. 4.1.3.25) Herter, 2002b: Friedmann, higher identity thereto. Exemplary propionyl-CoA synthases 2007; mesaconyl-C1-CoA hydratase or 3-methylmalyl include AAL47820, and homologs thereof. SEQID NO:30 CoA dehyratase (E.C. 4.2.1.-) Zarzycki, 2008; mesaconyl represents the E. coli codon optimized coding sequence for CoA C1-C4 CoA transferase (E.C. 2.8.3.-) Zarzycki, 2009: this propionyl-CoA synthase of the present invention. In one mesaconyl-C4-CoA hydratase (E.C. 4.2.1.-) Zarzycki, aspect, the invention provides nucleic acid molecule and 2009. In one embodiment, the invention provides an engi homologs, variants and derivatives of SEQ ID NO:30. The neered chemoautotroph comprising one or more exogenous nucleic acid sequence can have preferably 70%, 71%, 72%, proteins from the 3-HPA bicycle conferring to the organism 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, the ability to produce central metabolites from inorganic car 90-95%,96-98%,99%,99.9% oreven higher identity to SEQ bon, wherein the organism lacks the ability to fix carbon via ID NO:30. The present invention provides nucleic acids each the 3-HPA bicycle (for example, see FIG. 6). Methylmalonyl comprising or consisting of a sequence which is a codon CoA epimerase activity has been reported in E. coli although optimized version of the wild-type propionyl-CoA synthase no corresponding gene or gene product has been identified gene. In another embodiment, the invention provides a Evans, 1993. For E. coli ScpA to be active, vitamin B12 nucleic acid encoding a polypeptide having the amino acid must be present in culture medium or produced intracellu sequence of SEQ ID NO:31, or homologs thereof having larly. For example, the one or more exogenous proteins can be 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, selected from acetyl-CoA carboxylase, malonyl-CoA reduc 80%. 81-85%,90-95%,96-98%, 99%, 99.9% or even higher tase, propionyl-CoA synthase, propionyl-CoA carboxylase, identity thereto. The enzyme acetyl-CoA/propionyl-CoA car methylmalonyl-CoA epimerase, methylmalonyl-CoA boxylase is composed of three subunits: Pech3. AccC and mutase, Succinyl-CoA: (S)-malate CoA transferase. Succinate AccB. Exemplary acetyl-CoA/propionyl-CoA carboxylases dehydrogenase, fumarate hydratase, (S)-malyl-CoA/B-meth include those from Metallosphaera sedula DSM5348 (YP ylmalyl-CoA/(S)-citramalyl-CoA lyase, mesaconyl-C1-CoA 001 191457, YP 001 190248, YP 001190249); Nitros hydratase, mesaconyl-CoA C1-C4 CoA transferase, and opumilus maritimus SCM1 (YP 00158606, mesaconyl-C4-CoA hydratase. The host organism can also YP 001581607, YP 001581608); Cenarchaeum symbio express two or more, three or more, four or more, five or more, sum A (YP 876582, YP 876583, YP 876584); Halobac six or more, seven or more, and the like, including up to all the terium sp. NRC-1 (NP 280337 or NP 279647; protein and enzymes that confer the 3-HPA pathway. For NP 280339 or NP 280547; NP 280866), and homologs example, in the host organism E. coli, the exogenous enzymes thereof. SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, comprise malonyl-CoA reductase, propionyl-CoA synthase, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID acetyl-CoA/propionyl-CoA carboxylase, Succinyl-CoA:(S)- NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, malate CoA transferase, and MMC lyase. As a second SEQID NO:42, SEQID NO:43, SEQID NO:44 and SEQID example, in the host organism E. coli, the exogenous enzymes NO:45 represent E. coli codon optimized coding sequence for comprise malonyl-CoA reductase, propionyl-CoA synthase, each of these acetyl-CoA?propionyl-CoA carboxylase sub acetyl-CoA/propionyl-CoA carboxylase, Succinyl-CoA:(S)- units, respectively, of the present invention. In one aspect, the malate CoA transferase, MMC lyase, and methylmalonyl invention provides nucleic acid molecules and homologs, CoA epimerase. Finally, as a third example, in the host organ variants and derivatives of SEQID NO:32, SEQID NO:33, ism E. coli, the exogenous enzymes comprise malonyl-CoA SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID reductase, propionyl-CoA synthase, propionyl-CoA car NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, boxylase, succinyl-CoA:(S)-malate CoA transferase, MMC SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID lyase, methylmalonyl-CoA epimerase and methylmalonyl NO:44 and SEQID NO:45. The nucleic acid sequences can CoA mutase. Exemplary malonyl-coA reductases include have preferably 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, ZP 049571.96, YP 001433009, ZP 01626393, 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or US 2015/003 7853 A1 Feb. 5, 2015

even higher identity to SEQID NO:32, SEQID NO:33, SEQ YP 115138 and YP 115430, respectively, of the present ID NO:34, SEQID NO:35, SEQID NO:36, SEQID NO:37, invention. In one aspect, the invention provides nucleic acid SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID molecules and homologs, variants and derivatives of SEQID NO:41, SEQID NO:42, SEQID NO:43, SEQID NO:44 and NO:46 and SEQID NO:47. The nucleic acid sequences can SEQID NO:45. The present invention provides nucleic acids have preferably 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, each comprising or consisting of a sequence which is a codon 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or optimized version of one of the wild-type acetyl-CoA/pro even higher identity to SEQID NO:46 and SEQID NO:47. pionyl-CoA carboxylase genes. In another embodiment, the The present invention provides nucleic acids each comprising invention provides nucleic acids each encoding a polypeptide or consisting of a sequence which is a codon optimized ver having the amino acid sequence of one of Genbank accession sion of one of the wild-type HPS genes. In another embodi numbers YP 001 191457, YP 001 190248, ment, the invention provides nucleic acids each encoding a YP 001 190249, YP 00158606, YP 001581607, polypeptide having the amino acid sequence of one of YP 001581608, YP 876582, YP 876583, YP 876584, YP 115138 and YP 115430, or homologs thereof having NP 280337, NP 279647, NP 280339, NP 280547 and 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, NP 280866, or homologs thereof having 70%, 71%, 72%, 80%. 81-85%,90-95%,96-98%, 99%, 99.9% or even higher 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, identity thereto. Exemplary PHI enzymes include 90-95%, 96-98%, 99%, 99.9% or even higher identity YP 115431 and BAA90545, and homologs thereof. SEQID thereto. The enzyme succinyl-CoA:malate-CoA transferase NO:48 represent E. coli codon optimized coding sequence for is composed of two subunits, such as SmtA and SmtB in PHI enzyme YP 115431 of the present invention. In one Chloroflexus aurantiacus. Exemplary Succinyl-CoA:malate aspect, the invention provides nucleic acid molecule and CoA transferase subunits include ABF 14399 and ABF 14400, homologs, variants and derivatives of SEQ ID NO:48. The and homologs thereof. Exemplary MMC lyases include nucleic acid sequence can have preferably 78%, 79%. 80%, YP 0017633817, and homologs thereof. 81-85%, 90-95%,96-98%, 99%, 99.9% or even higher iden tity to SEQID NO:48. The present invention provides nucleic 0143. In certain embodiments, the invention provides an acids each comprising or consisting of a sequence which is a engineered chemoautotroph with a carbon fixation pathway codon optimized version of one of the wild-type PHI genes. derived from the ribulose monophosphate (RuMP) cycle. The In another embodiment, the invention provides nucleic acids RuMP cycle is well known in the art and consists of 9 reac each encoding a polypeptide having the amino acid sequence tions (FIG. 7) Strom, 1974). Reactions 1 and 2 (FIG. 7) are catalyzed by two separate enzymes in some organisms and by ofYP 115431, or homologs thereofhaving 70%, 71%, 72%, a bifunctional fusion enzyme in other organisms Yurimoto, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, 2009). The reactions in the RUMP cycle are catalyzed by the 90-95%, 96-98%, 99%, 99.9% or even higher identity following enzymes: hexulose-6-phosphate synthase (HPS, thereto. Exemplary HPS-PHI enzymes include NP 143767 E.C. 4.1.2.43) Kemp, 1972; Kemp, 1974; 6-phospho-3- andYP 182888, and homologs thereof. SEQID NO:49 rep hexuloisomerase (PHI, E.C. 5.3.1.27) Strom, 1974; Ferenci, resents an E. coli codon optimized coding sequence for a 1974; phosphofructokinase (PFK, E.C. 2.7.1.11); fructose fusion of the Mycobacterium gastri MB 19 HPS enzyme bisphosphate aldolase (FBA, E.C. 4.1.2.13); transketolase (BAA90546) and PHI enzyme (BAA90545) of the present (TK, E.C. 2.2.1.1); transaldolase (TA, E.C. 2.2.1.2); 7, tran invention. In one aspect, the invention provides nucleic acid sketolase (TK, E.C. 2.2.1.1); ribose 5-phosphate isomerase molecule and homologs, variants and derivatives of SEQID (RPI, E.C. 5.3.1.6): ribulose-5-phosphate-3-epimerase (RPE, NO:49. The nucleic acid sequence can have preferably 70%, E.C. 5.1.3.1). In one embodiment, the invention provides an 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, engineered chemoautotroph comprising one or more exog 81-85%, 90-95%,96-98%, 99%, 99.9% or even higher iden enous proteins from the RuMP cycle conferring to the organ tity to SEQID NO:49. The present invention provides nucleic ism the ability to produce central metabolites from inorganic acids each comprising or consisting of a sequence which is a carbon, wherein the organism lacks the ability to fix carbon codon optimized version of one of the wild-type HPS and one via the RuMP cycle (for example, see FIG. 8). For example, of the wild-type PHI genes. In another embodiment, the the one or more exogenous proteins can be selected from invention provides nucleic acids each encoding a polypeptide heXulose-6-phosphate synthase, 6-phospho-3-hexuloi having the amino acid sequence of SEQ ID NO:50, or Somerase, heXulose-6-phosphate synthase/6-phospho-3- homologs thereofhaving 70%, 71%, 72%, 73%, 74%, 75%, hexuloisomerase fusion enzyme Orita, 2005: Orita, 2006; 76%, 77%, 78%, 79%, 80%,81-85%,90-95%,96-98%, 99%, Orita, 2007, phosphofructokinase, fructose bisphosphate 99.9% or even higher identity thereto. aldolase, transketolase, transaldolase, transketolase, ribose 0144. In certain embodiments, the invention provides an 5-phosphate isomerase, and ribulose-5-phosphate-3-epime engineered chemoautotroph comprising a carbon fixation rase. The host organism can also express one or more, two or pathway derived from the RuMP cycle, as described above, more, three or more, and the like, including up to all the and in which formaldehyde is produced from formate. The protein and enzymes that confer the RuMP pathway. For conversion of formate to formyl-coenzyme A (formyl-CoA) example, in the host organism E. coli, the exogenous enzymes can be catalyzed by the enzyme acetyl-CoA synthetase (ACS) comprise hexulose-6-phosphate synthase and 6-phospho-3- operating on anon-cognate Substrate see, e.g., WO 2012/ heXuloisomerase. As a second example, in the host organism 037413. The conversion of formyl-CoA to formaldehyde can E. coli, the exogenous enzymes comprise the bifunctional be catalyzed by the enzyme (acylating) acetaldehyde dehy fusion enzyme heXulose-6-phosphate synthase/6-phospho-3- drogenase (ADH), operating on a non-cognate Substrate. hexuloisomerase. Exemplary HPS enzymes include Exemplary ACS enzymes include AAC77039, and homologs YP 115138, YP 115430 and BAA90546, and homologs thereof. SEQID NO:61 represents recoded coding sequence thereof. SEQID NO:46 and SEQID NO.47 represent E. coli for ACS enzyme AAC77039 of the present invention. In one codon optimized coding sequence for HPS enzymes aspect, the invention provides nucleic acid molecules and US 2015/003 7853 A1 Feb. 5, 2015 24 homologs, variants and derivatives of SEQ ID NO:61. The RPP pathway. For example, in the host organism E. coli, the nucleic acid sequences can have preferably 70%, 71%, 72%, exogenous enzymes comprise ribulose bisphosphate car 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, boxylase, Sedoheptulose bisphosphatase and phosphoribu 90-95%,96-98%,99%,99.9% or even higher identity to SEQ lokinase. As a second example, in the host organism E. coli, ID NO:61. The present invention provides nucleic acids each the exogenous enzymes comprise ribulose bisphosphate car comprising or consisting of a sequence which is a codon boxylase, NADPH-dependent glyceraldehyde-3P dehydro optimized version of one of the wild-type ACS genes. In genase, Sedoheptulose bisphosphatase and phosphoribuloki another embodiment, the invention provides nucleic acids nase. Ribulose bisphosphate carboxylase has two distinct each encoding a polypeptide having the amino acid sequence forms: Form I and Form II Portis, 2007. Form I is composed of AAC77039, or homologs thereofhaving 70%, 71%, 72%, of four large subunit dimers and eight small subunits (LSs) 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81-85%, and has been expressed previously in heterologous hosts, 90-95%, 96-98%, 99%, 99.9% or even higher identity thereto. Exemplary ADH enzymes include NP 464704, such as Escherichia coli Gatenby, 1985; Tabita, 1985; Gut NP 415757, AAD31841, CAA43226, and homologs teridge, 1986. Exemplary RuBisCO subunits include thereof. SEQID NO:62 represents an E. coli codon optimized YP 170840 andYP 170839, and homologs thereof. Exten coding sequence for ADH enzyme AAC77039 of the present sive work has been done to attempt to optimize the function of invention. In one aspect, the invention provides nucleic acid RuBisCO Parikh, 2006: Greene, 2007), and thus engineered molecules and homologs, variants and derivatives of SEQID RuBisCO enzymes may also be used in the present invention. NO:62. The nucleic acid sequences can have preferably 70%, Exemplary NADPH-dependent GAPDH enzymes include 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, YP 400759, and homologs thereof. SEQ ID NO:51 repre 81-85%,90-95%, 96-98%, 99%, 99.9% or even higher iden sents an E. coli codon optimized coding sequence for this tity to SEQID NO:62. The present invention provides nucleic GAPDH of the present invention. In one aspect, the invention acids each comprising or consisting of a sequence which is a provides nucleic acid molecule and homologs, variants and codon optimized version of one of the wild-type ADH genes. derivatives of SEQID NO:51. The nucleic acid sequence can In another embodiment, the invention provides nucleic acids have preferably 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, each encoding a polypeptide having the amino acid sequence 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or of NP 464704, NP 415757, AAD31841 or CAA43226, or even higher identity to SEQID NO:51. The present invention homologs thereof having 70%, 71%, 72%, 73%, 74%, 75%, provides nucleic acids each comprising or consisting of a 76%, 77%, 78%, 79%, 80%,81-85%,90-95%,96-98%, 99%, sequence which is a codon optimized version of one of the 99.9% or even higher identity thereto. wild-type GAPDH genes. In another embodiment, the inven 0145. In certain embodiments, the invention provides an tion provides nucleic acids each encoding a polypeptide hav engineered chemoautotroph whose carbon fixation pathway ing the amino acid sequence of YP 400759, or homologs is the Calvin-Benson-Bassham cycle or reductive pentose thereofhaving 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, phosphate (RPP) cycle. The Calvin cycle is well known in the 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or art and consists of 13 reactions (FIG.9) Bassham, 1954. The even higher identity thereto. Exemplary SBPase enzymes reactions in the RPP cycle are catalyzed by the following include YP 399524, and homologs thereof. SEQID NO:52 enzymes: ribulose bisphosphate carboxylase (RuBisCO, E.C. represents an E. coli codon optimized coding sequence for 4.1.1.39); phosphoglycerate kinase (PGK, E.C. 2.7.2.3); this SBPase of the present invention. In one aspect, the inven glyceraldehyde-3P dehydrogenase (phosphorylating) tion provides nucleic acid molecule and homologs, variants (GAPDH, E.C. 1.2.1.12 or E.C. 1.2.1.13); triose-phosphate and derivatives of SEQID NO:52. The nucleic acid sequence isomerase (TPI, E.C. 5.3.1.1); fructose-bisphosphate aldo can have preferably 70%, 71%, 72%, 73%, 74%, 75%, 76%, lase (FBA, E.C. 4.1.2.13); fructose-bisphosphatase (FBPase, 77%, 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, E.C. 3.1.3.11); transketolase (TK, E.C. 2.2.1.1); sedoheptu 99.9% or even higher identity to SEQID NO:52. The present lose-1.7-bisphosphate aldolase (SBA, E.C. 4.1.2.-); sedohep invention provides nucleic acids each comprising or consist tulose bisphosphatase (SBPase, E.C.3.1.3.37); transketolase ing of a sequence which is a codon optimized version of one (TK, E.C. 2.2.1.1); ribose-5-phosphate isomerase (RPI, E.C. of the wild-type SBPase genes. In another embodiment, the 5.3.1.6); ribulose-5-phosphate-3-epimerase (RPE. E.C. 5.1. invention provides nucleic acids each encoding a polypeptide 3.1); phosphoribulokinase (PRK, E.C. 2.7.1.19). In one having the amino acid sequence ofYP 399524, or homologs embodiment, the invention provides an engineered chemoau thereofhaving 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, totroph comprising one or more exogenous proteins from the 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or RPP cycle conferring to the organism the ability to produce even higher identity thereto. Exemplary PRK enzymes central metabolites from inorganic carbon, wherein the include YP 399994, and homologs thereof. SEQID NO:53 organism lacks the ability to fix carbon via the RPP cycle (for represents an E. coli codon optimized coding sequence for example, see FIG. 10). For example, the one or more exog this PRK of the present invention. In one aspect, the invention enous proteins can be selected from ribulose bisphosphate provides nucleic acid molecule and homologs, variants and carboxylase, phosphoglycerate kinase, glyceraldehyde-3P derivatives of SEQID NO:53. The nucleic acid sequence can dehydrogenase (phosphorylating), triose-phosphate have preferably 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, isomerase, fructose-bisphosphate aldolase, fructose-bispho 78%, 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or sphatase, transketolase, Sedoheptulose-1.7-bisphosphate even higher identity to SEQID NO:53. The present invention aldolase, Sedoheptulose bisphosphatase, transketolase, provides nucleic acids each comprising or consisting of a ribose-5-phosphate isomerase, ribulose-5-phosphate-3-epi sequence which is a codon optimized version of one of the merase and phosphoribulokinase. The host organism can also wild-type PRK genes. In another embodiment, the invention express two or more, three or more, four or more, and the like, provides nucleic acids each encoding a polypeptide having including up to all the protein and enzymes that confer the the amino acid sequence ofYP 399994, or homologs thereof US 2015/003 7853 A1 Feb. 5, 2015

having 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, Production of Sugars as the Carbon-Based Products of 79%, 80%, 81-85%, 90-95%, 96-98%, 99%, 99.9% or even Interest higher identity thereto. 0152 Industrial production of chemical products from Production of Central Metabolites as the Carbon-Based biological organisms is often accomplished using a Sugar Products of Interest Source. Such as glucose or fructose, as the feedstock. Hence, in certain embodiments, the engineered chemoautotroph of 0146 In certain embodiments, the engineered chemoau the present invention produces Sugars including glucose and totroph of the present invention produces the central metabo fructose or Sugar phosphates including triose phosphates lites, including but not limited to citrate, malate. Succinate, (such as 3-phosphoglyceraldehyde and dihydroxyacetone fumarate, dihydroxyacetone, dihydroxyacetone phosphate, phosphate) as the carbon-based products of interest. Sugars 3-hydroxypropionate, pyruvate, as the carbon-based products and Sugar phosphates may also be interconverted. For of interest. The engineered chemoautotroph produces central example, glucose-6-phosphate isomerase (E.C. 5.3.1.9; e.g., metabolites as an intermediate or product of the carbon fixa E. coli Pgi) may interconvert between D-fructose 6-phos tion pathway or as a intermediate or product of host metabo phate and D-glucose-6-phosphate. Phosphoglucomutase lism. In such cases, one or more transporters may be (E.C. 5.4.2.2; e.g., E. coli Pgm) converts D-O-glucose-6-P to expressed in the engineered chemoautotroph to export the D-O-glucose-1-P. Glucose-1-phosphatase (E.C. 3.1.3.10; central metabolite from the cell. For example, one or more e.g., E. coli Agp) converts D-O-glucose-1-P to D-O-glucose. members of a family of enzymes known as C4-dicarboxylate Aldose 1-epimerase (E.C. 5.1.3.3; e.g., E. coli GalM) D-B- carriers serve to export Succinate from cells into the media glucose to D-O-glucose. The Sugars or Sugar phosphates may Janausch, 2002; Kim, 2007. These central metabolites can optionally be exported from the engineered chemoautotroph be converted to other products (FIG. 11). into the culture medium. 0147 In some embodiments, the engineered chemoau 0153 Sugar phosphates may be converted to their corre totroph may interconvert between different central metabo sponding Sugars via dephosphorylation that occurs either lites to produce alternate carbon-based products of interest. In intra- or extracellularly. For example, phosphatases such as a one embodiment, the engineered chemoautotroph produces glucose-6-phosphatase (E.C. 3.1.3.9) or glucose-1-phos aspartate by expressing one or more aspartate aminotrans phatase (E.C.3.1.3.10) can be introduced into the engineered ferase (E.C. 2.6.1.1), such as Escherichia coli AspC, to con chemoautotroph of the present invention. Exemplary phos Vert oxaloacetate and L-glutamate to L-aspartate and 2-oxo phatases include Homo sapiens glucose-6-phosphatase glutarate. G6PC (P35575), Escherichia coli glucose-1-phosphatase 0148. In another embodiment, the engineered chemoau Agp (P19926), E. cloacae glucose-1-phosphatase AgpE totroph produces dihydroxyacetone phosphate by expressing (Q6EV19) and Escherichia coli acid phosphatase YihX one or more dihydroxyacetone kinases (E.C. 2.7.1.29), such (POA8Y3). as C. freundii Dhak, to convert dihydroxyacetone and ATP to 0154 Sugar phosphates can be exported from the engi dihydroxyacetone phosphate. neered chemoautotroph into the culture media via transport 0149. In another embodiment, the engineered chemoau ers. Transporters for Sugar phosphates generally act as anti totroph produces serine as the carbon-based product of inter porters with inorganic phosphate. An exemplary triose est. The metabolic reactions necessary for serine biosynthesis phosphate transporter includes A. thaliana triose-phosphate include: phosphoglycerate dehydrogenase (E.C. 1.1.1.95), transporter APE2 (Genbank accession AT5G46110.4). phosphoserine transaminase (E.C. 2.6.1.52), phosphoserine Exemplary glucose-6-phosphate transporters include E. coli phosphatase (E.C. 3.1.3.3). Phosphoglycerate dehydroge sugar phosphate transporter UhpT (NP 418122.1), A. nase, Such as E. coli SerA, converts 3-phospho-D-glycerate thaliana glucose-6-phosphate transporter GPT1 and NAD" to 3-phosphonooxypyruvate and NADH. Phos (AT5G54800.1), A. thaliana glucose-6-phosphate trans phoserine transaminase. Such as E. coli SerC, interconverts porter GPT2, or homologs thereof. Dephosphorylation of between 3-phosphonooxypyruvate--L-glutamate and glucose-6-phosphate can also be coupled to glucose trans O-phospho-L-serine-2-oxoglutarate. Phosphoserine phos port, Such as Genbank accession numbers AAA16222, phatase, Such as E. coli SerB, converts O-phospho-L-serine to AAD19898, O43826. L-serine. 0155 Sugars can be diffusively effluxed from the engi 0150. In another embodiment, the engineered chemoau neered chemoautotroph into the culture media via permeases. totroph produces glutamate as the carbon-based product of Exemplary permeases include H. sapiens glucose transporter interest. The metabolic reactions necessary for glutamate bio GLUT-1, -3, or -7 (P11166, P11169, Q6PXP3), S. cerevisiae synthesis include glutamate dehydrogenase (E.C. 1.4.1.4: hexose transporter HXT-1, -4, or -6 (P32465, P32467, e.g., E. coli Gdh A) which converts C.-ketoglutarate, NH and P39003), Z. mobilis glucose uniporter Glf (P21906), Syn NADPH to glutamate. Glutamate can subsequently be con echocystis sp. 1148 glucose/fructose:H" symporter GlcP Verted to various other carbon-based products of interest, e.g., (T.C. 2.A.1.1.32: P15729) Zhang, 19897, Streptomyces liv according to the scheme presented in FIG. 12. idans major glucose (or 2-deoxyglucose) uptake transporter 0151. In another embodiment, the engineered chemoau GlcP (T.C. 2.A.1.1.35; Q7BEC4) van Wezel, 2005), Plas totroph produces itaconate as the carbon-based product of modium falciparum hexose (glucose and fructose)transporter interest. The metabolic reactions necessary for itaconate bio PfHT1 (T.C. 2.A.1.1.24: O97467), or homologs thereof. synthesis includeaconitate decarboxylase (E.C. 4.1.1.6. Such Alternatively, to enable active efflux of sugars from the engi as that from A. terreus) which converts cis-aconitate to ita neered chemoautotroph, one or more active transporters may conate and CO. Itaconate can Subsequently be converted to be introduced to the cell. Exemplary transporters include various other carbon-based products of interest, e.g., accord mouse glucose transporter GLUT 1 (AAB20846) or ing to the scheme presented in FIG. 12. homologs thereof. US 2015/003 7853 A1 Feb. 5, 2015 26

0156 Preferably, to prevent buildup of other storage poly e.g., E. coli Kate) may be included to convert the waste mers from Sugars or Sugar phosphates, the engineered produce hydrogen peroxide to molecular oxygen. chemoautotrophs of the present invention are attenuated in 0160 The fermentation products according to the above their ability to build other storage polymers such as glycogen, aspect of the invention are Sugars, which are exported into the starch, Sucrose, and cellulose using one or more of the fol media as a result of carbon fixation during chemoautotrophy. lowing enzymes: cellulose synthase (UDP forming) (E.C. The sugars can also be reabsorbed later and fermented, 2.4.1.12), glycogen synthase e.g. glgA1, glgA2 (E.C. 2.4.1. directly separated, or utilized by a co-cultured organism. This 21). Sucrose phosphate synthase (E.C. 2.4.1.14). Sucrose approach has several advantages. First, the total amount of phosphorylase (E.C. 3.1.3.24), alpha-1,4-glucan lyase (E.C. Sugars the cell can handle is not limited by maximum intra 4.2.2.13), glycogen synthase (E.C. 2.4.1.11), 1,4-alpha-glu cellular concentrations because the end-product is exported can branching enzyme (E.C. 2.4.1.18). to the media. Second, by removing the Sugars from the cell, 0157. The invention also provides engineered chemoau the equilibria of carbon fixation reactions are pushed towards totrophs that produce other Sugars such as Sucrose, Xylose, creating more Sugar. Third, during chemoautotrophy, there is lactose, maltose, pentose, rhamnose, galactose and arabinose no need to push carbon flow towards glycolysis. Fourth, the according to the same principles. A pathway for galactose Sugars are potentially less toxic than the fermentation prod biosynthesis is shown (FIG. 13). The metabolic reactions in ucts that would be directly produced. the galactose biosynthetic pathway are catalyzed by the fol 0.161 Chemoautotrophic fixation of carbon dioxide may lowing enzymes: alpha-D-glucose-6-phosphate ketol be followed by flux of carbon compounds to the creation and isomerase (E.C. 5.3.1.9; e.g., Arabidopsis thaliana PGI1), maintenance of biomass and to the storage of retrievable D-mannose-6-phosphate ketol-isomerase (E.C. 5.3.1.8; e.g., carbon in the form of glycogen, cellulose and/or Sucrose. Arabidopsis thaliana DIN9), D-mannose 6-phosphate 16 Glycogen is a polymer of glucose composed of linear alpha phosphomutase (E.C. 5.4.2.8, e.g., Arabidopsis thaliana 1.4-linkages and branched alpha 1.6-linkages. The polymeris ATPMM), mannose-1-phosphate guanylyltransferase (E.C. insoluble at degree of polymerization (DP) greater than about 2.7.7.22; e.g., Arabidopsis thaliana CYT), GDP-mannose 60,000 and forms intracellular granules. Glycogen in synthe 3.5-epimerase (E.C. 5.1.3.18: e.g., Arabidopsis thaliana sized in vivo via a pathway originating from glucose 1-phos GME), galactose-1-phosphate guanylyltransferase (E.C. 2.7. phate. Its hydrolysis can proceed through phosphorylation to n.n; e.g., Arabidopsis thaliana VTC2), L-galactose 1-phos glucose phosphates; via the internal cleavage of polymer to phate phosphatase (E.C. 3.1.3.n; e.g., Arabidopsis thaliana maltodextrins; via the Successive exo-cleavage to maltose; or VTC4). In one embodiment, the invention provides an engi via the concerted hydrolysis of polymer and maltodextrins to neered chemoautotroph comprising one or more exogenous maltose and glucose. Hence, an alternative biosynthetic route proteins from the galactose biosynthetic pathway. to glucose and/or maltose is via the hydrolysis of glycogen 0158. The invention also provides engineered chemoau which can optionally be exported from the cell as described totrophs that produce Sugar alcohols, such as Sorbitol, as the above. There area number of potential enzyme candidates for carbon-based product of interest. In certain embodiments, the glycogen hydrolysis (Table 1). engineered chemoautotroph produces D-sorbitol from D-C.- (0162. In addition to the above, another mechanism is glucose and NADPH via the enzyme polyol dehydrogenase described to produce glucose biosynthetically. In certain (E.C. 1.1.1.21; e.g., Saccharomyces cerevisiae GRE3). embodiments, the present invention provides for cloned 0159. The invention also provides engineered chemoau genes for glycogen hydrolyzing enzymes to hydrolyze gly totrophs that produce Sugar derivatives, such as ascorbate, as cogen to glucose and/or maltose and transport maltose and the carbon-based product of interest. In certain embodiments, glucose from the cell. Preferred enzymes are set forth below the engineered chemoautotroph produces ascorbate from in Table 1. Glucose is transported from the engineered galactose via the enzymes L-galactose dehydrogenase (E.C. chemoautotroph by a glucose/hexose transporter. This alter 1.1.1.122; e.g., Arabidopsis thaliana AtáG33670) and L-ga native allows the cell to accumulate glycogen naturally but lactonolactone oxidase (E.C. 1.3.3.12; e.g., Saccharomyes adds enzyme activities to continuously return it to maltose or cerevisiae ATGLDH). Optionally, a catalase (E.C. 1.11.1.6: glucose units that can be collected as a carbon-based product. TABLE 1. Enzymes for hydrolysis of glycogen

E.C. Enzyme number Function C-amylase 3.2.1.1 endohydrolysis of 14-C-D-glucosidic linkages in polysaccharides 3-amylase 3.2.1.2 hydrolysis of 1,4-C-D-glucosidic linkages in polysaccharides so as to remove Successive maltose units from the non-reducing ends of the chains Y-amylase 3.21.3 hydrolysis of terminal 1.4-linked C-D-glucose residues successively rom non-reducing ends of the chains with release off-D-glucose glucoamylase 3.21.3 hydrolysis of terminal 1.4-linked C-D-glucose residues successively rom non-reducing ends of the chains with release off-D-glucose isoamylase 3.21.68 hydrolysis of (1->6)-C-D-glucosidic branch linkages in glycogen, amylopectin and their beta-limit dextrins pullulanase 3.2.1.41 hydrolysis of (1->6)-C-D-glucosidic linkages in pullulan a linear polymer of C-(1->6)-linked maltotriose units and in amylopectin and glycogen, and the C- and 3-limit dextrins of amylopectin and glycogen US 2015/003 7853 A1 Feb. 5, 2015

TABLE 1-continued Enzymes for hydrolysis of glycogen

E.C. Enzyme number Function amylomaltase 24.125 transfers a segment of a 14-C-D-glucan to a new position in an acceptor, which may be glucose or a 14-C-D-glucan (part of yeast debranching system) amylo-C-1,6-glucosidase 3.2.1.33 debranching enzyme: hydrolysis of (1->6)-C-D-glucosidic branch linkages in glycogen phosphorylase limit dextrin phosphorylase kinase 2.7.11.19 2 ATP + phosphorylase b = 2 ADP + phosphorylase a phosphorylase 2.4.1.1 (1,4-C-D-glucosyl) + phosphate = (14-C-D-glucosyl) + C-D-glucose-1-phosphate

Production of Fermentative Products as the Carbon-Based NADP-dependent alcohol dehydrogenases include Products of Interest Moorella sp. HUC22-1 Adha (YP 430754) Inokuma, 2007, and homologs thereof. 0163. In certain embodiments, the engineered chemoau 0167. In addition to providing exogenous genes or endog totroph of the present invention produces alcohols such as enous genes with novel regulation, the optimization of etha ethanol, propanol, isopropanol, butanol and fatty alcohols as nol production in engineered chemoautotrophs preferably the carbon-based products of interest. requires the elimination or attenuation of certain host enzyme 0164. In some embodiments, the engineered chemoau activities. These include, but are not limited to, pyruvate totroph of the present invention is engineered to produce oxidase (E.C. 1.2.2.2), D-lactate dehydrogenase (E.C. 1.1.1. ethanol via pyruvate fermentation. Pyruvate fermentation to 28), acetate kinase (E.C. 2.7.2.1), phosphate acetyltrans ethanol is well know to those in the art and there are several ferase (E.C. 2.3.1.8), citrate synthase (E.C. 2.3.3.1), phospho pathways including the pyruvate decarboxylase pathway, the enolpyruvate carboxylase (E.C. 4.1.1.3.1). The extent to pyruvate synthase pathway and the pyruvate formate-lyase which these manipulations are necessary is determined by the pathway (FIG. 14). The reactions in the pyruvate decarboxy observed byproducts found in the bioreactor or shake-flask. lase pathway are catalyzed by the following enzymes: pyru For instance, observation of acetate would suggest deletion of vate decarboxylase (E.C. 4.1.1.1) and alcoholdehydrogenase pyruvate oxidase, acetate kinase, and/or phosphotransacety (E.C. 1.1.1.1 or E.C. 1.1.1.2). The reactions in the pyruvate lase enzyme activities. In another example, observation of synthase pathway are catalyzed by the following enzymes: D-lactate would suggest deletion of D-lactate dehydrogenase pyruvate synthase (E.C. 1.2.7.1), acetaldehyde dehydroge enzyme activities, whereas observation of Succinate, malate, nase (E.C. 1.2.1.10 or E.C. 1.2.1.5), and alcohol dehydroge fumarate, oxaloacetate, or citrate would suggest deletion of nase (E.C. 1.1.1.1 or E.C. 1.1.1.2). The reactions in the pyru citrate synthase and/or PEP carboxylase enzyme activities. vate formate-lyase pathway are catalyzed by the following enzymes: pyruvate formate-lyase (E.C. 2.3.1.54), acetalde Production of Ethylene, Propylene, 1-Butene, 1,3-Butadiene, hyde dehydrogenase (E.C. 1.2.1.10 or E.C. 1.2.1.5), and alco Acrylic Acid, Etc. as the Carbon-Based Products of Interest holdehydrogenase (E.C. 1.1.1.1 or E.C. 1.1.1.2). 0.168. In certain embodiments, the engineered chemoau 0.165. In some embodiments, the engineered chemoau totroph of the present invention produces ethylene, propy totroph of the present invention is engineered to produce lene, 1-butene, 1,3-butadiene and acrylic acid as the carbon lactate via pyruvate fermentation. Lactate dehydrogenase based products of interest. Ethylene and/or propylene may be (E.C. 1.1.1.28) converts NADH and pyruvate to D-lactate. produced by either (1) the dehydration of ethanol or propanol Exemplary enzymes include E. colildhA. (E.C. 4.2.1.-), respectively or (2) the decarboxylation of acry 0166 Currently, fermentative products such as ethanol, late or crotonate (E.C. 4.1.1.-), respectively. While many butanol, lactic acid, formate, acetate produced in biological dehydratases exist in nature, none has been shown to convert organisms employ a NADH-dependent processes. However, ethanol to ethylene (or propanol to propylene, propionic acid depending on the energy conversion pathways added to the to acrylic acid, etc.) by dehydration. Genes encoding engineered chemoautotroph, the cell may produce NADPH enzymes in the 4.2.1.x or 4.1.1.x group can be identified by or reduced ferredoxin as the reducing cofactor. NADPH is searching databases such as GenBank using the methods used mostly for biosynthetic operations in biological organ described above, expressed in any desired host (such as isms, e.g., cell for growth, division, and for building up Escherichia coli, for simplicity), and that host can be assayed chemical stores, such as glycogen, Sucrose, and other macro for the appropriate enzymatic activity. A high-throughput molecules. Using natural or engineered enzymes that utilize screen is especially useful for screening many genes and NADPH or reduced ferredoxin as a source of reducing power variants of genes generated by mutagenesis (i.e., error-prone instead of NADH would allow direct use of chemoau PCR, synthetic libraries, chemical mutagenesis, etc.). totrophic reducing power towards formation of normally fer 0169. The ethanol dehydratase gene, after development to mentative byproducts. Accordingly, the present invention a suitable level of activity, can then be expressed in an etha provides methods for producing fermentative products Such nologenic organism to enable that organism to produce eth as ethanol by expressing NADP-dependent or ferredoxin ylene. For instance, coexpress native or evolved ethanol dehy dependent enzymes. NADP-dependent enzymes include dratase gene into an organism that already produces ethanol, alcoholdehydrogenase NADP' (E.C. 1.1.1.2) and acetalde then test a culture by GC analysis of offgas for ethylene hyde dehydrogenase NAD(P) (E.C. 1.2.1.5). Exemplary production that is significantly higher than without the added US 2015/003 7853 A1 Feb. 5, 2015 28 gene or via a high-throughput assay adapted from a colori 3-hydroxybutyrate, which can be simultaneously decarboxy metric test Larue, 1973. It may be desirable to eliminate lated and dehydrated to yield propylene. Optionally, the 3-hy ethanol-export proteins from the production organism to pre droxybutyryl-CoA is polymerized to form poly(3-hydroxy vent ethanol from being secreted into the medium and pre butyrate), a solid compound which can be extracted from the venting its conversion to ethylene. fermentation medium and simultaneously depolymerizied, 0170 Alternatively, acryloyl-CoA can be produced as hydrolyzed, dehydrated, and decarboxyated to yield propy described above, and acryloyl-CoA (E.C.3.1.2.-). lene (U.S. patent application Ser. No. 12/527,714, 2008). such as the acuN gene from Halomonas sp. HTNK1, can Production of Fatty Acids, their Intermediates and Deriva convert acryloyl-CoA into acrylate, which can be thermally tives as the Carbon-Based Products of Interest decarboxylated to yield ethylene. 0.175. In certain embodiments, the engineered chemoau 0171 Alternatively, genes encoding ethylene-forming totroph of the present invention produces fatty acids, their enzyme activities (EfE. E.C. 1.14.17.4) from various sources intermediates and their derivatives as the carbon-based prod are expressed. Exemplary enzymes include Pseudomonas ucts of interest. The engineered chemoautotrophs of the syringae pv. Phaseolicola (BAA02477), P. syringae pv. Pisi present invention can be modified to increase the production (AAD16443), Ralstoma Solanacearum (CAD18680). Opti of acyl-ACP or acyl-CoA, to reduce the catabolism of fatty mizing production may require further metabolic engineering acid derivatives and intermediates, or to reduce feedback (improving production of alpha-ketogluterate, recycling Suc inhibition at specific points in the biosynthetic pathway used cinate as two examples). for fatty acid products. In addition to modifying the genes 0172. In some embodiments, the engineered chemoau described herein, additional cellular resources can be diverted totroph of the present invention is engineered to produce to over-produce fatty acids. For example the lactate. Succinate ethylene from methionine. The reactions in the ethylene bio and/or acetate pathways can be attenuated and the fatty acid synthesis pathway are catalyzed by the following enzymes: biosynthetic pathway precursors acetyl-CoA and/or malonyl methionine adenosyltransferase (E.C. 2.5.1.6), 1-aminocy CoA can be overproduced. clopropane-1-carboxylate synthase (E.C. 4.4.1.14) and 0176). In one embodiment, the engineered chemoautotro 1-aminocyclopropane-1-carboxylate oxidase (E.C. 1.14.17. phs of the present invention can be engineered to express 4). certain fatty acid synthase activities (FAS), which is a group 0173. In some embodiments, the engineered chemoau of peptides that catalyze the initiation and elongation of acyl totroph of the present invention is engineered to produce chains Marrakchi, 2002a). The acyl carrier protein (ACP) propylene as the carbon-based product of interest. In one and the enzymes in the FAS pathway control the length, embodiment, the engineered chemoautotroph is engineered degree of Saturation and branching of the fatty acids pro to express one or more of the following enzymes: propionyl duced, which can be attenuated or over-expressed. Such CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-). enzymes include accABCD, FabD, FabH, FabG, FabA, propionyl-CoA transferase (E.C. 2.8.3.1), dehydro FabZ, Fabi, FabK, FabL, FabM, FabB, FabF, and homologs genase (E.C. 1.2.1.3 or E.C. 1.2.1.4), alcohol dehydrogenase thereof. (E.C. 1.1.1.1 or E.C. 1.1.1.2), and alcohol dehydratase (E.C. 0177. In another embodiment, the engineered chemoau 4.2.1.-). Propionyl-CoA synthase is a multi-functional totrophs of the present invention form fatty acid byproducts enzyme that converts 3-hydroxypropionate, ATP and through ACP-independent pathways, for example, the path NADPH to propionyl-CoA. Exemplary propionyl-CoA syn way described recently by Dellomonaco, 2011 involving thases include AAL47820, and homologs thereof. SEQ ID reversal of beta oxidation. Enzymes involved in these path NO:30 represents the E. coli codon optimized coding ways include such genes as atoB, fadA, fadB, fadD, fadE, sequence for this propionyl-CoA synthase of the present fadI, fadK, fad.J., paaZydio, yfcY, yfc7, ydiD, and homologs invention. In one aspect, the invention provides nucleic acid thereof. molecule and homologs, variants and derivatives of SEQID 0178. In one aspect, the fatty acid biosynthetic pathway NO:30. The nucleic acid sequence can have preferably 70%, precursors acetyl-CoA and malonyl-CoA can be overpro 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, duced in the engineered chemoautotroph of the present inven 81-85%,90-95%, 96-98%, 99%, 99.9% or even higher iden tion. Several different modifications can be made, either in tity to SEQID NO:30. The present invention provides nucleic combination or individually, to the host cell to obtain acids each comprising or consisting of a sequence which is a increased acetyl CoA/malonyl CoA/fatty acid and fatty acid codon optimized version of the wild-type propionyl-CoA derivative production. To modify acetyl-CoA and/or malo synthase gene. In another embodiment, the invention pro nyl-CoA production, the expression of acetyl-CoA carboxy vides a nucleic acid encoding a polypeptide having the amino lase (E.C. 6.4.1.2) can be modulated. Exemplary genes acid sequence of SEQID NO:31. Propionyl-CoA transferase include accABCD (AAC73296) or homologs thereof. To converts propionyl-CoA and acetate to acetyl-CoA and pro increase acetyl CoA production, the expression of several pionate. Exemplary enzymes include Ralstonia eutropha pct genes may be altered including pdh, panK, aceEF, (encoding and homologs thereof. Aldehyde dehydrogenase converts the Elp dehydrogenase component and the E2p dihydroli propionate and NADPH to propanal. Alcohol dehydrogenase poamide acyltransferase component of the pyruvate and converts propanal and NADPH to 1-propanol. Alcohol dehy 2-oxoglutarate dehydrogenase complexes), fabH/fab)/fabG/ dratase converts 1-propanol to propylene. acplp/fabF, and in Some examples additional nucleic acid 0.174. In another embodiment, E. coli thiolase atoB (E.C. encoding fatty-acyl-CoA reductases and aldehyde decarbo 2.3.1.9) converts 2 acetyl-CoA into acetoacetyl-CoA, and C. nylases. Exemplary enzymes include pdh (BAB34380, acetobutyllicum hbd (E.C. 1.1.1.157) converts acetoacetyl AAC73227, AAC73226), panK (also known as coaA, CoA and NADH into 3-hydroxybutyryl-CoA. E. coli tesB AAC76952), aceEF (AAC73227, AAC73226), fabH (EC 3.1.2.20) or C. acetobutylicum ptb and buk (E.C. 2.3.1.19 (AAC74175), fablD (AAC74.176), fabG (AAC74177), acpp and 2.7.2.7 respectively) convert 3-hydroxybutyryl-CoA into (AAC74178), fabF (AAC74179). US 2015/003 7853 A1 Feb. 5, 2015 29

0179 Genes to be knocked-out or attenuated include fadE, intermediates including branched chain fatty acids can be gpSA, ldha, pflb, adhE, pta, poXB, ackA, and/or ackB. Exem enhanced. Branched chain fatty acid production can be plary enzymes include fadE (AAC73325), gspa achieved through the expression of one or more of the fol (AAC76632), ldha (AAC74462), pflb (AAC73989), adhE lowing enzymes Kaneda, 1991: branched chain amino acid (AAC74323), pta (AAC75357), poxB (AAC73958), ackA aminotransferase to produce O-ketoacids from branched (AAC75356), ackB (BAB81430), and homologs thereof. chain amino acids such as isoleucine, leucine and Valine (E.C. 0180 Additional potential modifications include the fol 2.6.1.42), branched chain C-ketoacid dehydrogenase com lowing. To achieve fatty acid overproduction, lipase (E.C. plexes which catalyzes the oxidative decarboxylation of 3.1.1.3) which produce triacylglyerides from fatty acids and C.-ketoacids to branched chain acyl-CoA (bkd, E.C. 1.2.4.4) glycerol and in Some cases serves as a Suppressor of fabA can Denoya, 1995, dihydrolipoyl dehydrogenase (E.C. 1.8.1.4), be included in the engineered chemoautotroph of the present beta-ketoacyl-ACP synthase with branched chain acyl CoA invention. Exemplary enzymes include Saccharomyces cer specificity (E.C. 2.3.1.41) Li, 2005, crotonyl-CoA reduc evisiae LipA (CAA89087), Saccharomyces cerevisiae TGL2 tase (E.C. 1.3.1.8, 1.3.1.85 or 1.3.1.86) Han, 1997), and CAA98876, and homologs thereof. To remove limitations on isobutyryl-CoA mutase (large subunit E.C. 5.4.99.2 and the pool of acyl-CoA, the D311E mutation in plsB small subunit E.C. 5.4.99.13). Exemplary branched chain (AAC77011) can be introduced. amino acid aminotransferases include E. coli ilvE (YP 0181. To engineer an engineered chemoautotroph for the 026247), Lactococcus lactis ilvE (AAF34406), Pseudomo production of a population of fatty acid derivatives with nas putida ilvE (NP 745648), Streptomyces coelicolor ilvE homogeneous chain length, one or more endogenous genes (NP 629657), and homologs thereof. Branched chain C-ke can be attenuated or functionally deleted and one or more toacid dehydrogenase complexes consist of El C/B (decar thioesterases can be expressed. Thioesterases (E.C.3.1.2.14) boxylase), E2 (dihydrolipoyl transacylase) and E3 (dihydro generate acyl-ACP from fatty acid and ACP. For example, lipoyl dehydrogenase) subunits. The industrial host E. coli C10 fatty acids can be produced by attenuating endogenous has only the E3 component as a part of its pyruvate dehydro C18 thioesterases (for example, E. coli tesAAAC73596 and genase complex (lpd, E.C. 1.8.1.4, NP 414658) and so it POADA1, and homologs thereof), which uses C18:1-ACP, requires the E1C/B and E2 bkd proteins. Exemplary C.-ke and expressing a C10 thioesterase, which uses C10-ACP, toacid dehydrogenase complexes include Streptomyces coeli thus, resulting in a relatively homogeneous population of color bkdA1 (NP 628006) E1C. (decarboxylase compo fatty acids that have a carbon chain length of 10. In another nent), S. coelicolor bkdB2 (NP 628005) E1 B example, C14 fatty acid derivatives can be produced by (decarboxylase component), S. coelicolor bkdA3 (NP attenuating endogenous thioesterases that produce non-C14 638004) E2 (dihydrolipoyl transacylase); or S. coelicolor fatty acids and expressing the C14 thioesterase, which uses bkdA2 (NP 733618) E1C. (decarboxylase component), S. C14-ACP. In yet another example, C12 fatty acid derivatives coelicolorbkdB2 (NP 628019) E13 (decarboxylase compo can be produced by expressing thioesterases that use C12 nent), S. coelicolor bkdC2 (NP 628018) E2 (dihydrolipoyl ACP and attenuating thioesterases that produce non-C12 fatty transacylase); or S. avermitilis bkdA (BAC72074) E1C. (de acids. Exemplary C8:0 to C10:0 thioesterases include carboxylase component), S. avermitilis bkdB (BAC72075) Cuphea hookeriana fat32 (AAC49269) and homologs E13 (decarboxylase component), S. avermitilis bkdC thereof. Exemplary C12:0 thioesterases include Umbellu (BAC72076) E2 (dihydrolipoyl transacylase); S. avermitilis laria California fath3 (Q41635) and homologs thereof. Exem bkdF (E.C.1.2.4.4, BAC72088) E1C. (decarboxylase compo plary C14:0thioesterases include Cinnamonum camphorum nent), S. avermitilis bkdG (BAC72089) E1 B (decarboxylase fath3 (Q39473). Exemplary C14:0 to C16:0 thioesterases component), S. avermitilis bkdH (BAC72090) E2 (dihydro include Cuphea hookeriana fat33 (AAC49269). Exemplary lipoyl transacylase); B. subtilis bkdAA (NP 390288) E1C. C16:0 thioesterases include Arabidopsis thaliana fat3 (decarboxylase component), B. subtilis bkdAB (NP (CAA85388), Cuphea hookeriana fat31 (Q39513) and 390288) E13 (decarboxylase component), B. subtilis bkdB homologs thereof. Exemplary C18:1 thioesterases include (NP 390288) E2 (dihydrolipoyl transacylase); or P putida Arabidopsis thaliana fatA (NP 189147, NP 193041), Ara bkdA1 (AAA65614) E1C. (decarboxylase component), P. bidopsis thaliana fath3 (CAA85388), Bradyrhizobium putida bkdA2 (AAA65615) E13 (decarboxylase compo japonicum fatA (CAC39106), Cuphea hookeriana fatA nent), PputidabkdC (AAA65617) E2 (dihydrolipoyltransa (AAC72883), Escherichia coli tesA (NP 415027) and cylase); and homologs thereof. An exemplary dihydrolipoyl homologs thereof. Acetyl CoA, malonyl CoA, and fatty acid dehydrogenase is E. colilpd(NP 414658) E3 and homologs overproduction can be verified using methods known in the thereof. Exemplary beta-ketoacyl-ACP synthases with art, for example by using radioactive precursors, HPLC, and branched chain acyl CoA specificity include Streptomyces GC-MS subsequent to cell lysis. coelicolor fabH1 (NP 626634), ACP (NP 626635) and 0182. In yet another aspect, fatty acids of various lengths fabF (NP 626636); Streptomyces avermitilis fabH3 (NP can be produced in the engineered chemoautotroph by 823466), fabC3 (NP 823467), fabF (NP 823468); Bacillus expressing or overexpressing acyl-CoA synthase peptides subtilis fabH A (NP 389015), fabH B (NP 388898), ACP (E.C. 2.3.1.86), which catalyzes the conversion offatty acids (NP 389474), fabF (NP 389016); Stenotrophomonas mal to acyl-CoA. Some acyl-CoA synthase peptides, which are tophilia SmalDRAFTO818 (ZP 01643059), non-specific, accept other Substrates in addition to fatty acids. SmalDRAFT0821 (ZP 01643063), SmalDRAFT0822 0183 In yet another aspect, branched chain fatty acids, (ZP 01643064); Legionella pneumophila fabH (YP their intermediates and their derivatives can be produced in 123672), ACP (YP 123675), fabF (YP 123676); and the engineered chemoautotroph as the carbon-based products homologs thereof. Exemplary crotonyl-CoA reductases of interest. By controlling the expression of endogenous and include Streptomyces coelicolor ccr (NP 630556), Strepto heterologous enzymes associated with branched chain fatty myces cinnamonensis ccr (AAD53915), and homologs acid biosynthesis, the production of branched chain fatty acid thereof. Exemplary isobutyryl-CoA mutases include Strepto US 2015/003 7853 A1 Feb. 5, 2015 30 myces coelicolor icmA & icmB (NP 629554 and pressing a fatty alcohol forming acyl-CoA reductase (FAR, NP 630904), Streptomyces cinnamomensis icmA and icmB E.C. 1.1.1.), or an acyl-CoA reductases (E.C. 1.2.1.50) and (AAC08713 and AJ24.6005), and homologs thereof. Addi alcoholdehydrogenase (E.C. 1.1.1.1) or a combination of the tionally or alternatively, endogenous genes that normally lead foregoing to produce fatty alcohols from acyl-CoA. Herein to straight chain fatty acids, their intermediates, and deriva after fatty alcohol forming acyl-CoA reductase (FAR. E.C. tives may be attenuated or deleted to eliminate competing 1.1.1.*), acyl-CoA reductases (E.C. 1.2.1.50) and alcohol pathways. Enzymes that interfere with production of dehydrogenase (E.C. 1.1.1.1) are collectively referred to as branched chain fatty acids include B-ketoacyl-ACP synthase fatty alcohol forming peptides. Some fatty alcohol forming II (E.C. 2.3.1.41) and B-ketoacyl-ACP synthase III (E.C. 2.3. peptides are non-specific and catalyze other reactions as well: 1.41) with Straight chain acyl CoA specificity. Exemplary for example, some acyl-CoA reductase peptides accept other enzymes for deletion include E. coli fabF (NP 415613) and substrates in addition to fatty acids. Exemplary fatty alcohol fabH (NP 415609). forming acyl-CoA reductases include Acinetobacter baylvi 0184. In yet another aspect, fatty acids, their intermediates ADP1 acrl (AAC45217), Simmondsia chinensis far and their derivatives with varying degrees of saturation can be (AAD38039), Mus musculus mfarl (AAHO7178), Mus mus produced in the engineered chemoautotroph as the carbon culus mfar2 (AAH55759), Acinetobacter sp. M1 acrM1, based products of interest. In one aspect, hosts are engineered Homo sapiens hfar (AAT42129), and homologs thereof. Fatty to produce unsaturated fatty acids by over-expressing B-ke alcohols can be used as Surfactants. toacyl-ACP synthase I (E.C. 2.3.1.41), or by growing the host 0187. Many fatty alcohols are derived from the products of at low temperatures (for example less than 37°C.). FabB has fatty acid biosynthesis. Hence, the production of fatty alco preference to cis-ödecenoyl-ACP and results in unsaturated hols can be controlled by engineering fatty acid biosynthesis fatty acid production in E. coli. Over-expression of FabB in the engineered chemoautotroph. The chain length, branch results in the production of a significant percentage of unsat ing and degree of saturation of fatty acids and their interme urated fatty acids de Mendoza, 1983. These unsaturated diates can be altered using the methods described herein, fatty acids can then be used as intermediates in hosts that are thereby affecting the nature of the resulting fatty alcohols. engineered to produce fatty acids derivatives, such as fatty 0188 As mentioned above, through the combination of alcohols, esters, waxes, olefins, alkanes, and the like. Alter expressing genes that Support brFA synthesis and alcohol natively, the repressor offatty acid biosynthesis, E. coli FabR synthesis, branched chain alcohols can be produced. For (NP 418398), can be deleted, which can also result in example, when an alcohol reductase Such as Acrl from Acine increased unsaturated fatty acid production in E. coli Zhang, tobacter baylvi ADP1 is coexpressed with a bkd operon, E. 2002. Further increase in unsaturated fatty acids is achieved coli can synthesize isopentanol, isobutanol or 2-methyl by over-expression of heterologous trans-2, cis-3-decenoyl butanol. Similarly, when Acrl is coexpressed with ccrificm ACP isomerase and controlled expression of trans-2-enoyl genes, E. coli can synthesize isobutanol. ACP reductase II Marrakchi, 2002b, while deleting E. coli Fabi (trans-2-enoyl-ACP reductase, E.C. 1.3.1.9, Production of Fatty Esters as the Carbon-Based Products of NP 415804) or homologs thereof in the host organism. Interest Exemplary 3-ketoacyl-ACP synthase I include Escherichia coli fabB (BAA16180) and homologs thereof. Exemplary 0189 Inanother aspect, engineered chemoautotrophs pro trans-2, cis-3-decenoyl-ACP isomerase include Streptococ duce various lengths of fatty esters (biodiesel and waxes) as cus mutans UA159 FabM (DAA05501) and homologs the carbon-based products of interest. Fatty esters can be thereof. Exemplary trans-2-enoyl-ACP reductase II include produced from acyl-CoAs and alcohols. The alcohols can be Streptococcus pneumoniae R6 FabK (NP 357969) and provided in the fermentation media, produced by the engi homologs thereof. To increase production of monounsat neered chemoautotroph itself or produced by a co-cultured urated fatty acids, the Sfa gene, Suppressor of FabA, can be organism. over-expressed Rock, 1996. Exemplary proteins include 0190. In some embodiments, one or more alcohol AAN79592 and homologs thereof. One of ordinary skill in O-acetyltransferases is expressed in the engineered chemoau the art would appreciate that by attenuating fabA, or over totroph to produce fatty esters as the carbon-based product of expressing fabB and expressing specific thioesterases (de interest. Alcohol O-acetyltransferase (E.C. 2.3.1.84) cata scribed above), unsaturated fatty acids, their derivatives, and lyzes the reaction of acetyl-CoA and an alcohol to produce products having a desired carbon chain length can be pro CoA and an acetic ester. In some embodiments, the alcohol duced. O-acetyltransferase peptides are co-expressed with selected 0185. In some examples the fatty acid or intermediate is thioesterase peptides, FAS peptides and fatty alcohol forming produced in the cytoplasm of the cell. The cytoplasmic con peptides to allow the carbon chain length, Saturation and centration can be increased in a number of ways, including, degree of branching to be controlled. In other embodiments, but not limited to, binding of the fatty acid to coenzyme A to the bkd operon can be co-expressed to enable branched fatty form an acyl-CoA thioester. Additionally, the concentration acid precursors to be produced. of acyl-CoAS can be increased by increasing the biosynthesis 0191 Alcohol O-acetyltransferase peptides catalyze other of CoA in the cell. Such as by over-expressing genes associ reactions such that the peptides accept other Substrates in ated with pantothenate biosynthesis (panD) or knocking out addition to fatty alcohols or acetyl-CoA thioester. Other sub the genes associated with glutathione biosynthesis (glu strates include other alcohols and other acyl-CoA thioesters. tathione synthase). Modification of such enzymes and the development of assays for characterizing the activity of a particular alcohol Production of Fatty Alcohols as the Carbon-Based Products O-acetyltransferase peptides are within the scope of a skilled of Interest artisan. Engineered O-acetyltransferases and O-acyltrans 0186. In yet further aspects, hosts cells are engineered to ferases can be created that have new activities and specifici convert acyl-CoA to fatty alcohols by expressing or overex ties for the donor acyl group or acceptor alcohol moiety. US 2015/003 7853 A1 Feb. 5, 2015

0.192 Alcohol acetyl (AATs, E.C. 2.3.1.84), (AAN79592, AAC44390), B-ketoacyl-ACP synthase I (E.C. which are responsible for acyl acetate production in various 2.3.1.41, BAA 16180), and secC null mutant suppressors plants, can be used to produce medium chain length waxes, (cold shock proteins) gns.A and gnsB (ABD18647 and Such as octyl octanoate, decyl octanoate, decyl decanoate, AAC74076). In some examples, the endogenous fabF gene and the like. Fatty esters, synthesized from medium chain can be attenuated, thus, increasing the percentage of palmi alcohol (such as C6, C8) and medium chain acyl-CoA (or toleate (C 16:1) produced. fatty acids, such as C6 or C8) have a relative low melting 0.197 Optionally a wax ester exporter such as a member of point. For example, hexylhexanoate has a melting point of the FATP family is used to facilitate the release of waxes or –55°C. and octyloctanoate has a melting point of-18 to -17° esters into the extracellular environment from the engineered C. The low melting points of these compounds make them chemoautotroph. An exemplary wax esterexporter that can be good candidates for use as biofuels. Exemplary alcohol used is fatty acid (long chain) transport protein CG7400-PA, acetyltransferases include Fragariaxananassa SAAT isoform. A from D. melanogaster (NP 524723), or homologs (AAG 13130) Aharoni, 20007, Saccharomyces cerevisiae thereof. Atfp1 (NP 015022), and homologs thereof. 0193 In some embodiments, one or more wax synthases 0198 The centane number (CN), viscosity, melting point, (E.C. 2.3.1.75) is expressed in the engineered chemoau and heat of combustion for various fatty acid esters have been totroph to produce fatty esters including waxes from acyl characterized in for example, Knothe, 2005. Using the CoA and alcohols as the carbon-based product of interest. teachings provided herein the engineered chemoautotroph Wax synthase peptides are capable of catalyzing the conver can be engineered to produce any one of the fatty acid esters sion of an acyl-thioester to fatty esters. Some wax synthase described in Knothe, 2005. peptides can catalyze other reactions, such as converting short chain acyl-CoAS and short chain alcohols to produce fatty Production of Alkanes as the Carbon-Based Products of esters. Methods to identify wax synthase activity are provided Interest in U.S. Pat. No. 7,118,896, which is herein incorporated by reference. Medium-chain waxes that have low melting points, 0199. In another aspect, engineered chemoautotrophs pro Such as octyl octanoate and octyl decanoate, are good candi duce alkanes of various chain lengths (hydrocarbons) as the dates for biofuel to replace triglyceride-based biodiesel. carbon-based products of interest. Many alkanes are derived Exemplary wax synthases include Acinetobacter baylvi from the products of fatty acid biosynthesis. Hence, the pro ADP1 wsadp1, Acinetobacter baylvi ADP1 wax-dgaT duction of alkanes can be controlled by engineering fatty acid (AAO17391) Kalscheuer, 20037, Saccharomyces cerevisiae biosynthesis in the engineered chemoautotroph. The chain Eeb1 (NP 015230), Saccharomyces cerevisiae YMR210w length, branching and degree of Saturation of fatty acids and (NP 013937), Simmondsia chinensis acyltransferase their intermediates can be altered using the methods (AAD38041), Mus musculus Dgat214 (Q6E1M8), and described herein. The chain length, branching and degree of homologs thereof. saturation of alkanes can be controlled through their fatty acid 0194 In other aspects, the engineered chemoautotrophs biosynthesis precursors. are modified to produce a fatty ester-based biofuel by 0200. In certain aspects, fatty can be converted expressing nucleic acids encoding one or more wax ester to alkanes and CO in the engineered chemoautotroph via the synthases in order to confer the ability to synthesize a satu expression of decarbonylases Cheesbrough, 1984; Dennis, rated, unsaturated, or branched fatty ester. In some embodi 1991. Exemplary enzymes include Arabidopsis thaliana ments, the wax ester synthesis proteins include, but are not cer1 (NP 171723), Oryza sativa cer1 CER1 (AAD29719) limited to: fatty acid elongases, acyl-CoA reductases, acyl and homologs thereof. transferases or wax synthases, fatty acyl transferases, dia 0201 In another aspect, fatty alcohols can be converted to cylglycerol acyltransferases, acyl-coA wax alcohol acyl alkanes in the engineered chemoautotroph via the expression transferases, bifunctional wax ester synthase/acyl-CoA: of terminal alcohol oxidoreductases as in Vibrio firmissii M1 diacylglycerol acyltransferase selected from a multienzyme Park, 2005). complex from Simmondsia chinensis, Acinetobacter sp. strain ADP1 (formerly Acinetobacter calcoaceticus ADP1), Pseudomonas aeruginosa, Fundibacter jadensis, Arabidop Production of Olefins as the Carbon-Based Products of sis thaliana, or Alkaligenes eutrophus. In one embodiment, Interest the fatty acid elongases, acyl-CoA reductases or wax syn 0202 In another aspect, engineered chemoautotrophs pro thases are from a multienzyme complex from Alkaligenes duce olefins (hydrocarbons) as the carbon-based products of eutrophus and other organisms known in the literature to interest. Olefins are derived from the intermediates and prod produce wax and fatty acid esters. ucts of fatty acid biosynthesis. Hence, the production of ole 0.195. Many fatty esters are derived from the intermediates fins can be controlled by engineering fatty acid biosynthesis and products offatty acid biosynthesis. Hence, the production in the engineered chemoautotroph. Introduction of genes of fatty esters can be controlled by engineering fatty acid affecting the production of unsaturated fatty acids, as biosynthesis in the engineered chemoautotroph. The chain described above, can result in the production of olefins. Simi length, branching and degree of saturation of fatty acids and larly, the chain length of olefins can be controlled by express their intermediates can be altered using the methods ing, overexpressing or attenuating the expression of endog described herein, thereby affecting the nature of the resulting enous and heterologous thioesterases which control the chain fatty esters. length of the fatty acids that are precursors to olefin biosyn 0196. Additionally, to increase the percentage of unsatur thesis. Also, by controlling the expression of endogenous and ated fatty acid esters, the engineered chemoautotroph can also heterologous enzymes associated with branched chain fatty overexpress Sfa which encodes a suppressor of fabA acid biosynthesis, the production of branched chain olefins US 2015/003 7853 A1 Feb. 5, 2015 32 can be enhanced. Methods for controlling the chain length or homologs thereof. Exemplary transporters include and branching of fatty acid biosynthesis intermediates and AAU44368, NP 188746, NP 175557, AAN73268 or products are described above. homologs thereof. Production of co-Cyclic Fatty Acids and their Derivatives as 0208. The transport protein, for example, can also be an the Carbon-Based Products of Interest efflux protein selected from: AcrAB (NP 414996.1, 0203. In another aspect, the engineered chemoautotroph NP 414995.1), ToIC (NP 417507.2) and AcrEF (NP of the present invention produces ()-cyclic fatty acids (cyFAS) 417731.1, NP 417732.1) from E. coli, or t11 1618 (NP as the carbon-based product of interest. To synthesize ()-cy 682408), t11 1619 (NP 682409), t110139 (NP 680930), clic fatty acids (cyFAS), several genes need to be introduced H1 1619 and U10139 from Thermosynechococcus elongatus and expressed that provide the cyclic precursor cyclohexyl BP-I or homologs thereof. carbonyl-CoA Cropp, 2000. The genes (fabH, ACP and 0209. In addition, the transport protein can be, for fabF) can then be expressed to allow initiation and elongation example, a fatty acid transport protein (FATP) selected from of ()-cyclic fatty acids. Alternatively, the homologous genes Drosophila melanogaster; Caenorhabditis elegans, Myco bacterium tuberculosis or Saccharomyces cerevisiae, Acine can be isolated from microorganisms that make cyFAS and tobacter sp. H01-N, any one of the mammalian FATPs or expressed in E. coli. Relevant genes include bkdC, lpd, fabH, homologs thereof. The FATPs can additionally be resynthe ACP fabF, fabH1, ACP, fabF, fabH3, fabC3, fabF, fabH A, sized with the membranous regions reversed in order to invert fabH B, ACP. the direction of substrate flow. Specifically, the sequences of 0204 Expression of the following genes are sufficient to amino acids composing the hydrophilic domains (or mem provide cyclohexylcarbonyl-CoA in E. coli: ans.J. ansK, brane domains) of the protein can be inverted while maintain ansL, chcA (1-cyclohexenylcarbonyl CoA reductase) and ing the same codons for each particular amino acid. The ansM from the ansatrienin gene cluster of Streptomyces col linus Chen, 1999 or plm.JK (5-enolpyruvylshikimate-3- identification of these regions is well known in the art. phosphate synthase), plmL (acyl-CoA dehydrogenase), chcA Production of Isoprenoids as the Carbon-Based Products of (enoyl-(ACP) reductase) and plmM (2,4-dienoyl-CoA reduc Interest tase) from the phoslactomycin B gene cluster of Streptomyces sp. HK803 Palaniappan, 2003 together with the acyl-CoA 0210. In one aspect, the engineered chemoautotroph of the isomerase (chcB gene) Patton, 2000 from S. collinus, S. present invention produces isoprenoids or their precursors avermitilis or S. coelicolor. Exemplary ansatrienin gene clus isopentenyl pyrophosphate (IPP) and its isomer, dimethylal ter enzymes include AAC44655, AAF73478 and homologs lyl pyrophosphate (DMAPP) as the carbon-based products of thereof. Exemplary phoslactomycin B gene cluster enzymes interest. There are two known biosynthetic pathways that include AAQ84158, AAQ84159, AAQ84160, AAQ84161 synthesize IPP and DMAPP. Prokaryotes, with some excep and homologs thereof. Exemplary chcB enzymes include tions, use the mevalonate-independent or deoxyxylulose NP 629292, AAF73478 and homologs thereof. 5-phosphate (DXP) pathway to produce IPP and DMAPP separately through a branch point (FIG. 15). Eukaryotes other 0205 The genes (fabH, ACP and fabF) are sufficient to than plants use the mevalonate-dependent (MEV) isoprenoid allow initiation and elongation of co-cyclic fatty acids, pathway exclusively to convert acetyl-coenzyme A (acetyl because they can have broad substrate specificity. In the event CoA) to IPP, which is subsequently isomerized to DMAPP that coexpression of any of these genes with the ansJKLM/ (FIG. 16). In general, plants use both the MEV and DXP chcAB or pml.JKLM/chcAB genes does not yield cyFAs, pathways for IPP synthesis. fabH, ACP and/or fabF homologs from microorganisms that 0211. The reactions in the DXP pathway are catalyzed by make cyFAS can be isolated (e.g., by using degenerate PCR the following enzymes: 1-deoxy-D-xylulose-5-phosphate primers or heterologous DNA probes) and coexpressed. synthase (E.C. 2.2.1.7), 1-deoxy-D-xylulose-5-phosphate reductoisomerase (E.C. 1.1.1.267), 4-diphosphocytidyl-2C Production of Halogenated Derivatives of Fatty Acids methyl-D-erythritol synthase (E.C. 2.7.7.60), 4-diphospho 0206 Genes are known that can produce fluoroacetyl cytidyl-2C-methyl-D-erythritol kinase (E.C. 2.7.1.148), CoA from fluoride ion. In one embodiment, the present inven 2C-methyl-D-erythritol 2.4-cyclodiphosphate synthase (E.C. tion allows for production of fluorinated fatty acids by com 4.6.1.12), (E)-4-hydroxy-3-methylbut-2-enyl diphosphate bining expression of fluoroacetate-involved genes (e.g., synthase (E.C. 1.17.7.1), isopentyl/dimethylallyl diphos fluorinase, nucleotide phosphorylase, fluorometabolite-spe phate synthase or 4-hydroxy-3-methylbut-2-enyl diphos cific aldolases, fluoroacetaldehyde dehydrogenase, and fluo phate reductase (E.C. 1.17.1.2). In one embodiment, the engi roacetyl-CoA synthase). neered chemoautotroph of the present invention expresses one or more enzymes from the DXP pathway. For example, Transport/Efflux/Release of Fatty Acids and their Derivatives one or more exogenous proteins can be selected from 0207 Also disclosed herein is a system for continuously 1-deoxy-D-xylulose-5-phosphate reductoisomerase, producing and exporting hydrocarbons out of recombinant 4-diphosphocytidyl-2C-methyl-D-erythritol synthase, host microorganisms via a transport protein. Many transport 4-diphosphocytidyl-2C-methyl-D-erythritol kinase, 2C-me and efflux proteins serve to excrete a large variety of com thyl-D-erythritol 2.4-cyclodiphosphate synthase, (E)-4-hy pounds and can be evolved to be selective for a particular type droxy-3-methylbut-2-enyl diphosphate synthase, and 4-hy offatty acid. Thus, in some embodiments an ABC transporter droxy-3-methylbut-2-enyl diphosphate reductase. The host can be functionally expressed by the engineered chemoau organism can also express two or more, three or more, four or totroph, so that the organism exports the fatty acid into the more, and the like, including up to all the protein and enzymes culture medium. In one example, the ABC transporter is an that confer the DXP pathway. Exemplary 1-deoxy-D-xylu ABC transporter from Caenorhabditis elegans, Arabidopsis lose-5-phosphate synthases include E. coli DXs (AAC46162): thalania, Alkaligenes eutrophus or Rhodococcus erythropolis P. putida KT2440 DXs (AAN66154); Salmonella enterica US 2015/003 7853 A1 Feb. 5, 2015 33

Paratyphi, see ATCC 9150 DXs (AAV78186); Rhodobacter include AF429385, Hevea brasiliensis; NM 006556, H. sphaeroides 2.4.1 DXs (YP 353327); Rhodopseudomonas sapiens; NC 001145 complement 712315 . . . 713670, S. palustris CGA009 DxS (NP 94.6305); Xylella fastidiosa cerevisiae; and homologs thereof. Exemplary mevalonate Temeculal DxS (NP 779493); Arabidopsis thaliana DXs pyrophosphate decarboxylase include X97557, S. cerevisiae: (NP 001078570 and/or NP 196699); and homologs AF290095, E. faecium; U49260, H. sapiens; and homologs thereof. Exemplary 1-deoxy-D-xylulose-5-phosphate reduc thereof. Exemplary isopentenyl pyrophosphate toisomerases include E. coli DXr (BAA32426); Arabidopsis include NC 000913, 3031087 . . . 3031635, E. coli: thaliana DXR (AAF73140); Pseudomonas putida KT2440 AF082326, Haematococcus pluvialis; and homologs thereof. DXr (NP 743754 and/or Q88MH4); Streptomyces coeli 0213. In some embodiments, the host cell produces IPP color A3(2) DXr (NP 629822); Rhodobacter sphaeroides via the MEV pathway, either exclusively or in combination 2.4.1 DXr (YP 352764); Pseudomonas fluorescens Pfo-1 with the DXP pathway. In other embodiments, a host cells DXr (YP 346389); and homologs thereof. Exemplary DXP pathway is functionally disabled so that the host cell 4-diphosphocytidyl-2C-methyl-D-erythritol synthases produces IPP exclusively through a heterologously intro include E. coli Isp) (AAF43.207); Rhodobacter sphaeroides duced MEV pathway. The DXP pathway can be functionally 2.4.1 IspD (YP 352876); Arabidopsis thaliana ISPD (NP disabled by disabling gene expression or inactivating the 565286); P putida KT2440 IspD (NP 743771); and function of one or more of the DXP pathway enzymes. homologs thereof. Exemplary 4-diphosphocytidyl-2C-me 0214. In some embodiments, the host cell produces IPP thyl-D-erythritol kinases include E. coli IspE (AAF29530): via the DXP pathway, either exclusively or in combination Rhodobacter sphaeroides 2.4.1 Isp (YP 351828); and with the MEV pathway. In other embodiments, a host cells homologs thereof. Exemplary 2C-methyl-D-erythritol 2,4- MEV pathway is functionally disabled so that the host cell cyclodiphosphate synthases include E. coli IspF produces IPP exclusively through a heterologously intro (AAF44656); Rhodobacter sphaeroides 2.4.1 IspF (YP duced DXP pathway. The MEV pathway can be functionally 352877); P putida KT2440 IspF (NP 743775); and disabled by disabling gene expression or inactivating the homologs thereof. Exemplary (E)-4-hydroxy-3-methylbut-2- function of one or more of the MEV pathway enzymes. enyldiphosphate synthase include E. coli IspG (AAK53460); 0215 Provided herein is a method to produce isoprenoids P. putida KT2440 IspG (NP 743.014); Rhodobacter in engineered chemoautotrophs engineered with the isopen sphaeroides 2.4.1 IspG (YP 353044); and homologs tenyl pyrophosphate pathway enzymes. Some examples of thereof. Exemplary 4-hydroxy-3-methylbut-2-enyl diphos isoprenoids include: hemiterpenes (derived from 1 isoprene phate reductases include E. coli IspH (AAL38655); Pputida unit) such as isoprene; monoterpenes (derived from 2 iso KT2440 IspH (NP 742768); and homologs thereof. prene units) Such as myrcene or limonene; sesquiterpenes 0212. The reactions in the MEV pathway are catalyzed by (derived from 3 isoprene units) Such as amorpha-4,11-diene, the following enzymes: acetyl-CoA thiolase, HMG-CoA bisabolene or farmesene; diterpenes (derived from four iso synthase (E.C. 2.3.3.10), HMG-CoA reductase (E.C. 1.1.1. prene units) such as taxadiene; sesterterpenes (derived from 5 34), mevalonate kinase (E.C. 2.7.1.36), phosphomevalonate isoprene units); triterpenes (derived from 6 isoprene units) kinase (E.C. 2.7.4.2), mevalonate pyrophosphate decarboxy Such as squalene; sesquarterpenes (derived from 7 isoprene lase (E.C. 4.1.1.33), isopentenyl pyrophosphate isomerase units); tetraterpenes (derived from 8 isoprene units) Such as (E.C. 5.3.3.2). In one embodiment, the engineered chemoau B-carotene or lycopene; and polyterpenes (derived from more totroph of the present invention expresses one or more than 8 isoprene units) Such as polyisoprene. The production enzymes from the MEV pathway. For example, one or more of isoprenoids is also described in some detail in the pub exogenous proteins can be selected from acetyl-CoA thiolase, lished PCT applications WO2007/139925 and WO/2007/ HMG-CoA synthase, HMG-CoA reductase, mevalonate 140339. kinase, phosphomevalonate kinase, mevalonate pyrophos 0216. In another embodiment, the engineered chemoau phate decarboxylase and isopentenyl pyrophosphate totroph of the present invention produces rubber as the car isomerase. The host organism can also express two or more, bon-based product of interest via the isopentenyl pyrophos three or more, four or more, and the like, including up to all phate pathway enzymes and cis-polyprenylcistransferase the protein and enzymes that confer the MEV pathway. (E.C. 2.5.1.20) which converts isopentenyl pyrophosphate to Exemplary acetyl-CoA thiolases include NC 000913 rubber. The enzyme cis-polyprenylcistransferase may come REGION: 232413 L.2325315, E. coli: D49362, Paracoccus from, for example, Hevea brasiliensis. denitrificans; L20428, S. cerevisiae; and homologs thereof. 0217. In another embodiment, the engineered chemoau Exemplary HMG-CoA synthases include NC 001145 totroph of the present invention produce isopentanol as the complement 19061 . . . 20536, S. cerevisiae: X96617, S. carbon-based product of interest via the isopentenyl pyro cerevisiae; X83882, A. thaliana; AB037907, Kitasatospora phosphate pathway enzymes and isopentanol dikinase. griseola; BT007302, H. sapiens; NC 002758, Locus tag 0218. In another embodiment, the engineered chemoau SAV2546, GeneID 1 122571, S. aureus; and homlogs thereof. totroph produces squalene as the carbon-based product of Exemplary HMG-CoA reductases include NM 206548, D. interest via the isopentenyl pyrophosphate pathway enzymes, melanogaster, NC 002758, Locus tag SAV2545, GeneID geranyl diphosphate synthase (E.C. 2.5.1.1), farnesyl diphos 1122570, S. aureus; NM 204485, Gallus gallus; AB015627, phate synthase (E.C. 2.5.1.10) and squalene synthase (E.C. Streptomyces sp. KO 3988: AF542543, Nicotiana attenuata: 2.5.1.21). Geranyl diphosphate synthase converts dimethyla AB037907, Kitasatospora griseola; AX128213, providing llyl pyrophosphate and isopentenyl pyrophosphate to geranyl the sequence encoding a truncated HMGR. S. cerevisiae: diphosphate. Farnesyl diphosphate synthase converts geranyl NC 001145: complement 115734 ... 1 18898, S. cerevisiae: diphosphate and isopentenyl diphosphate to farnesyl diphos and homologs thereof. Exemplary mevalonate kinases phate. A bifunctional enzyme carries out the conversion of include L77688, A. thaliana: X55875, S. cerevisiae; and dimethylallyl pyrophosphate and two isopentenyl pyrophos homologs thereof. Exemplary phosphomevalonate kinases phate to farnesyl pyrophosphate. Exemplary enzymes include US 2015/003 7853 A1 Feb. 5, 2015 34

Escherichia coli Isp A (NP 414955) and homologs thereof. glycerol dehydratases include K. pneumoniae dhaB1-3. Squalene synthase converts two farnesyl pyrophosphate and Exemplary 1,3-propanediol oxidoreductase include K. pneu NADPH to squalene. In another embodiment, the engineered moniae dhaT. chemoautotroph produces lanosterol as the carbon-based product of interest via the above enzymes, squalene Production of 1,4-Butanediol or 1,3-Butadiene as the monooxygenase (E.C. 1.14.99.7) and lanosterol synthase Carbon-Based Products of Interest (E.C. 5.4.99.7). Squalene monooxygenase converts squalene, 0222. In one aspect, the engineered chemoautotroph of the NADPH and O. to (S)-squalene-2,3-epoxide. Exemplary present invention produces 1,4-butanediolor 1,3-butanediene enzymes include Saccharomyces cerevisiae Ergl (NP as the carbon-based products of interest. The metabolic reac 01 1691) and homologs thereof. Lanosterol synthase converts tions in the 1,4-butanediol or 1,3-butadiene pathway are cata (S)-squalene-2,3-epoxide to lanosterol. Exemplary enzymes lyzed by the following enzymes: Succinyl-CoA dehydroge include Saccharomyces cerevisiae Erg7 (NP 011939) and nase (E.C. 1.2.1.n; e.g., C. kluyveri SucD), homologs thereof. 4-hydroxybutyrate dehydrogenase (E.C. 1.1.1.2; e.g., Arabi 0219. In another embodiment, the engineered chemoau dopsis thaliana GHBDH), aldehyde dehydrogenase (E.C. totroph of the present invention produces lycopene as the 1.1.1.n; e.g., E. coli AldH), 1,3-propanediol oxidoreductase carbon-based product of interest via the isopentenyl pyro (E.C. 1.1.1.202; e.g., K. pneumoniae DhaT), and optionally phosphate pathway enzymes, geranyl diphosphate synthase alcohol dehydratase (E.C. 4.2.1.-). Succinyl-CoA dehydro (E.C. 2.5.1.21, described above), farnesyl diphosphate syn genase converts succinyl-CoA and NADPH to succinic semi thase (E.C. 2.5.1.10, described above), geranylgeranyl pyro aldehyde and CoA. 4-hydroxybutyrate dehydrogenase con phosphate synthase (E.C. 2.5.1.29), phytoene synthase (E.C. verts succinic semialdehyde and NADPH to 2.5.1.32), phytoene oxidoreductase (E.C. 1.14.99.n) and 4-hydroxybutyrate. Aldehyde dehydrogenase converts 4-hy -caroteine oxidoreductase (E.C. 1.14.99.30). Geranylgeranyl droxybutyrate and NADH to 4-hydroxybutanal. 1,3-pro pyrophosphate synthase converts isopentenyl pyrophosphate panediol oxidoreductase converts 4-hydroxybutanal and and farnesyl pyrophosphate to (all trans)-geranylgeranyl NADH to 1,4-butanediol. Alcohol dehydratase converts 1,4- pyrophosphate. Exemplary geranylgeranyl pyrophosphate butanediol to 1,3-butadiene. synthases include Synechocystis sp. PCC6803 crtE (NP 440010) and homologs thereof. Phytoene synthase converts 2 Production of Polyhydroxybutyrate as the Carbon-Based geranylgeranyl-PP to phytoene. Exemplary enzymes include Products of Interest Synechocystis sp. PCC6803 crtB (P37294). Phytoene oxi 0223) In one aspect, the engineered chemoautotroph of the doreductase converts phytoene, 2 NADPH and 20 to -caro present invention produces polyhydroxybutyrate as the car tene. Exemplary enzymes include Synechocystis sp. bon-based products of interest (FIG. 18). The reactions in the PCC6803 crtI and Synechocystis sp. PCC6714 crtI (P21134). polyhydroxybutyrate pathway are catalyzed by the following -caroteine oxidoreductase converts -carotene, 2 NADPH enzymes: acetyl-CoA:acetyl-CoA C-acetyltransferase (E.C. and 20 to lycopene. Exemplary enzymes include Syn 2.3.1.9), (R)-3-hydroxyacyl-CoA:NADP+ oxidoreductase echocystis sp. PCC6803 crtO-2 (NP 441720). (E.C. 1.1.1.36) and polyhydroxyalkanoate synthase (E.C. 0220. In another embodiment, the engineered chemoau 2.3.1.-). Exemplary acetyl-CoA:acetyl-CoA C-acetyltrans totroph of the present invention produces limonene as the ferases include Ralstonia eutropha phaA. Exemplary (R)-3- carbon-based product of interest via the isopentenyl pyro hydroxyacyl-CoA:NADP+ oxidoreductases include Ralsto phosphate pathway enzymes, geranyl diphosphate synthase nia eutropha phaB. Exemplary polyhydroxyalkanoate (E.C. 2.5.1.21, described above) and one of (R)-limonene synthase include Ralstonia eutropha phaC. In the event that synthase (E.C. 4.2.3.20) and (4S)-limonene synthase (E.C. the host organism also has the capacity to degrade polyhy 4.2.3.16) which convert geranyl diphosphate to a limonene droxybutyrate, the corresponding degradation enzymes. Such enantiomer. Exemplary (R)-limonene synthases include that as poly(R)-3-hydroxybutanoatehydrolase (E.C. 3.1.1.75), from Citrus limon (AAM53946) and homologs thereof. may be inactivated. Hosts that lack the ability to naturally Exemplary (4S)-limonene synthases include that from Men synthesize polyhydroxybutyrate generally also lack the tha spicata (AAC37366) and homologs thereof. capacity to degrade it, thus leading to irreversible accumula tion of polyhydroxybutyrate if the biosynthetic pathway is introduced. Production of Glycerol or 1.3-Propanediol as the 0224 Intracellular polyhydroxybutyrate can be measured Carbon-Based Products of Interest by solvent extraction and esterification of the polymer from 0221. In one aspect, the engineered chemoautotroph of the whole cells. Typically, lyophilized biomass is extracted with present invention produces glycerol or 1,3-propanediol as the methanol-chloroform with 10% HCl as a catalyst. The chlo carbon-based products of interest (FIG. 17). The reactions in roform dissolves the polymer, and the methanolesterifies it in the glycerol pathway are catalyzed by the following enzymes: the presence of HC1. The resulting mixture is extracted with sn-glycerol-3-P dehydrogenase (E.C. 1.1.1.8 or E.C. 1.1.1. water to remove hydrophilic Substances and the organic phase 94) and sn-glycerol-3-phosphatase (E.C. 3.1.3.21). To pro is analyzed by GC.. duce 1,3-propanediol, the following enzymes are also included: sn-glycerol-3-P. glycerol dehydratase (E.C. 4.2.1. Production of Lysine as the Carbon-Based Products of 30) and 1,3-propanediol oxidoreductase (E.C. 1.1.1.202). Interest Exemplary sn-glycerol-3-P dehydrogenases include Saccha 0225. In one aspect, the engineered chemoautotroph of the romyces cerevisiae darl and homologs thereof. Exemplary present invention produces lysine as the carbon-based prod sn-glycerol-3-phosphatases include Saccharomyces cerevi uct of interest. There are several known lysine biosynthetic siae gpp2 and homologs thereof. Exemplary sn-glycerol-3-P. pathways. One lysine biosynthesis pathway is depicted in US 2015/003 7853 A1 Feb. 5, 2015

FIG. 19. The reactions in one lysine biosynthetic pathway are 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, catalyzed by the following enzymes: aspartate aminotrans 81-85%, 90-95%,96-98%, 99%, 99.9% or even higher iden ferase (E.C. 2.6.1.1; e.g. E. coli AspC), aspartate kinase (E.C. tity to SEQID NO:30. The present invention provides nucleic 2.7.2.4; e.g., E. coli LySC), aspartate semialdehyde dehydro acids each comprising or consisting of a sequence which is a genase (E.C. 1.2.1.11; e.g., E. coli Asd), dihydrodipicolinate codon optimized version of the wild-type propionyl-CoA synthase (E.C. 4.2.1.52; e.g., E. coli DapA), dihydrodipicoli synthase gene. In another embodiment, the invention pro nate reductase (E.C. 1.3.1.26; e.g., E. coli Dapb), tetrahy vides a nucleic acid encoding a polypeptide having the amino drodipicolinate succinylase (E.C. 2.3.1.117; e.g., E. coli acid sequence of SEQID NO:31. DapD), N-Succinyldiaminopimelate-aminotransferase (E.C. Integration of Metabolic Pathways into Host Metabolism 2.6.1.17; e.g., E. coli Arg)), N-Succinyl-L-diaminopimelate 0227. The engineered chemoautotrophs of the invention desuccinylase (E.C. 3.5.1.18; e.g., E. coli DaphE), diami can be produced by introducing expressible nucleic acids nopimelate epimerase (E.C. 5.1.1.7. E. coli Dapf), diami encoding one or more of the enzymes or proteins participat nopimelate decarboxylase (E.C. 4.1.1.20; e.g., E. coli LySA). ing in one or more energy conversion, carbon fixation and, In one embodiment, the engineered chemoautotroph of the optionally, carbon product biosynthetic pathways. Depend present invention expresses one or more enzymes from a ing on the host organism chosen for conferring a chemoau lysine biosynthetic pathway. For example, one or more exog totrophic capability, nucleic acids for some or all of particular enous proteins can be selected from aspartate aminotrans metabolic pathways can be expressed. For example, if a cho ferase, aspartate kinase, aspartate semialdehyde dehydroge Sen host is deficient in one or more enzymes or proteins for nase, dihydrodipicolinate synthase, dihydrodipicolinate desired metabolic pathways, then expressible nucleic acids reductase, tetrahydrodipicolinate Succinylase, N-Succinyl for the deficient enzyme(s) or protein(s) are introduced into diaminopimelate-aminotransferase, N-Succinyl-L-diami the host for Subsequent exogenous expression. Alternatively, nopimelate desuccinylase, diaminopimelate epimerase, if the chosen host exhibits endogenous expression of some diaminopimelate decarboxylase, L,L-diaminopimelate ami pathway genes, but is deficient in others, then an encoding notransferase (E.C. 2.6.1.83; e.g., Arabidopsis thaliana nucleic acid is needed for the deficient enzyme(s) or protein At4.g33680), homocitrate synthase (E.C. 2.3.3.14: e.g., Sac (s) to achieve production of desired carbon products from charomyces cerevisiae LYS21), homoaconitase (E.C. 4.2.1. inorganic energy and inorganic carbon. Thus, an engineered 36; e.g., Saccharomyces cerevisiae LYS4, LYS3), homoisoci chemoautotroph of the invention can be produced by intro trate dehydrogenase (E.C. 1.1.1.87; e.g., Saccharomyces ducing exogenous enzyme or protein activities to obtain cerevisiae LYS12, LYS11, LYS10), 2-aminoadipate tran desired metabolic pathways or desired metabolic pathways saminase (E.C. 2.6.1.39; e.g., Saccharomyces cerevisiae can be obtained by introducing one or more exogenous ARO8), 2-aminoadipate reductase (E.C. 1.2.1.31; e.g., Sac enzyme or protein activities that, together with one or more charomyces cerevisiae LYS2, LYS5), aminoadipate semial endogenous enzymes or proteins, produces a desired product dehyde-glutamate reductase (E.C. 1.5.1.10: e.g., Saccharo Such as reduced cofactors, central metabolites and/or carbon myces cerevisiae LYS9, LYS13), lysine-2-oxoglutarate based products of interest. reductase (E.C. 1.5.1.7; e.g., Saccharomyces cerevisiae 0228 Depending on the metabolic pathway constituents LYS1). The host organism can also express two or more, three of a selected host microbial organism, the engineered or more, four or more, and the like, including up to all the chemoautotrophs of the invention can include at least one protein and enzymes that confer lysine biosynthesis. exogenously expressed metabolic pathway-encoding nucleic acid and up to all encoding nucleic acids for one or more Production of Y-Valerolactone as the Carbon-Based Product energy conversion, carbon fixation and, optionally, carbon of Interest based product pathways. For example, a RuMP-derived car 0226. In some embodiments, the engineered chemoau bon fixation pathway can be established in a host deficient in totroph of the present invention is engineered to produce a pathway enzyme or protein through exogenous expression Y-Valerolactone as the carbon-based product of interest. One of the corresponding encoding nucleic acid. In a host deficient example Y-Valerolactone biosynthetic pathway is shown in in all enzymes or proteins of a metabolic pathway, exogenous FIG. 20. In one embodiment, the engineered chemoautotroph expression of all enzyme or proteins in the pathway can be is engineered to express one or more of the following included, although it is understood that all enzymes or pro enzymes: propionyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- teins of a pathway can be expressed even if the host contains and E.C. 1.3.1.-), beta-ketothiolase (E.C. 2.3.1.16; e.g., Ral at least one of the pathway enzymes or proteins. For example, stonia eutropha BktB), acetoacetyl-CoA reductase (E.C. 1.1. exogenous expression of all enzymes or proteins in a carbon 1.36; e.g., Ralstonia eutropha PhaB), 3-hydroxybutyryl-CoA fixation pathway derived from the 3-HPA bicycle can be dehydratase (E.C. 4.2.1.55; e.g., X. axonopodis Crt), viny included, such as the acetyl-CoA carboxylase, malonyl-CoA lacetyl-CoA A-isomerase (E.C. 5.3.3.3; e.g., C. difficile reductase, propionyl-CoA synthase, propionyl-CoA car Abfl D), 4-hydroxybutyryl-CoA transferase (E.C. 2.8.3.-; e.g., boxylase, methylmalonyl-CoA epimerase, methylmalonyl C. kluyveri OrfA), 1.4-lactonase (E.C.3.1.1.25; e.g., that from CoA mutase, Succinyl-CoA: (S)-malate CoA transferase, Suc R. norvegicus). Propionyl-CoA synthase is a multi-functional cinate dehydrogenase, fumarate hydratase, (S)-malyl-CoA/ enzyme that converts 3-hydroxypropionate, ATP and B-methylmalyl-CoA/(S)-citramalyl-CoA lyase, mesaconyl NADPH to propionyl-CoA. Exemplary propionyl-CoA syn C1-CoA hydratase, mesaconyl-CoA C1-C4 CoA transferase, thases include AAL47820, and homologs thereof. SEQ ID and mesaconyl-C4-CoA hydratase. Given the teachings and NO:30 represents the E. coli codon optimized coding guidance provided herein, those skilled in the art would sequence for this propionyl-CoA synthase of the present understand that the number of encoding nucleic acids to intro invention. In one aspect, the invention provides nucleic acid duce in an expressible form can, at least, parallel the meta molecule and homologs, variants and derivatives of SEQID bolic pathway deficiencies of the selected host microbial NO:30. The nucleic acid sequence can have preferably 70%, organism. US 2015/003 7853 A1 Feb. 5, 2015 36

Genetic Engineering Methods for Optimization of Metabolic usually, but not exclusively, arise from metabolism endog Pathways enous to the host cell or organism. 0229. In some embodiments, the engineered chemoau 0233. A combination of different approaches may be used totrophs of the invention also can include other genetic modi to identify candidate genetic modifications. Such approaches fications that facilitate or optimize production of a carbon include, for example, metabolomics (which may be used to based product from an inorganic energy source and inorganic identify undesirable products and metabolic intermediates carbon or that confer other useful functions onto the host that accumulate inside the cell), metabolic modeling and iso organism. topic labeling (for determining the flux through metabolic 0230. In one aspect, the expression levels of the proteins of reactions contributing to hydrocarbon production), and con interest of the energy conversion pathways, carbon fixation ventional genetic techniques (for eliminating or Substantially pathways and, optionally, carbon product biosynthetic path disabling unwanted metabolic reactions). For example, meta ways can be either increased or decreased by, for example, bolic modeling provides a means to quantify fluxes through replacing or altering the expression control sequences with the cells metabolic pathways and determine the effect of alternate expression control sequences encoded by standard elimination of key metabolic steps. In addition, metabolom ized genetic parts. The exogenous standardized genetic parts ics and metabolic modeling enable better understanding of can regulate the expression of either heterologous or endog the effect of eliminating key metabolic steps on production of enous genes of the metabolic pathway. Altered expression of desired products. the enzyme or enzymes and/or protein or proteins of a meta 0234. To predict how a particular manipulation of metabo bolic pathway can occur, for example, through changing gene lism affects cellular metabolism and synthesis of the desired position or gene order Smolke, 2002b, altered gene copy product, a theoretical framework was developed to describe number Smolke, 2002a, replacement of a endogenous, the molar fluxes through all of the known metabolic pathways naturally occurring regulated promoters with constitutive or of the cell. Several important aspects of this theoretical frame inducible synthetic promoters, mutation of the ribosome work include: (i) a relatively complete database of known binding sites Wang, 2009, or introduction of RNA second pathways, (ii) incorporation of the growth-rate dependence of ary structural elements and/or cleavage sites Smolke, 2000; cell composition and energy requirements, (iii) experimental Smolke, 2001. measurements of the amino acid composition of proteins and 0231. In another aspect, Some engineered chemoautotro the fatty acid composition of membranes at different growth phs of the present invention may require specific transporters rates and dilution rates and (iv) experimental measurements to facilitate uptake of inorganic energy sources and/or inor of side reactions which are known to occur as a result of ganic carbon Sources. In some embodiments, the engineered metabolism manipulation. These new developments allow chemoautotrophs use formate as an inorganic energy source, significantly more accurate prediction of fluxes in key meta inorganic carbon source or both. If formate uptake is limiting bolic pathways and regulation of enzyme activity Keasling, for either growth or production of carbon-based products of 1999a; Keasling, 1999b; Martin, 2002: Henry, 2006). interest, then expression of one or more formate transporters 0235 Such types of models have been applied, for in the engineered chemoautotroph of the present invention example, to analyze metabolic fluxes in organisms respon can alleviate this bottleneck. The formate transporters may be sible for enhanced biological phosphorus removal in waste heterologous or endogenous to the host organism. Exemplary water treatment reactors and in filamentous fungi producing formate transporters include NP 415424 and NP 416987, polyketides Pramanik, 1997; Pramanik, 1998a; Pramanik, and homologs thereof. SEQ ID NO:54 and SEQID NO:55 1998b; Pramanik, 1998c). represent E. coli codon optimized codig sequence each of 0236. In some embodiments, the host organism may have these two formate transporters, respectively, of the present native formate dehydrogenases or other enzymes that con invention. The present invention provides nucleic acids each Sume formate thereby competing with either energy conver comprising or consisting of a sequence which is a codon sion pathways that use formate as an inorganic energy source optimized version of one of the wild-type malonyl-CoA or carbon fixation pathways that use formate as an inorganic reductase genes. In another embodiment, the invention pro carbon Source; hence, these competing formate consumption vides nucleic acids each encoding a polypeptide having the reactions may be disrupted to increase the efficiency of amino acid sequence of one of NP 415424 and NP 416987. energy conversion and/or carbon fixation in the engineered 0232. In addition, the invention provides an engineered chemoautotroph of the present invention. For example, in the chemoautotroph comprising a genetic modification confer host organism E. coli, there are three native formate dehydro ring to the engineered chemoautotrophic microorganism an genases. Exemplary E. coli formate dehydrogenase genes for increased efficiency of using inorganic energy and inorganic disruption include fanG, fanH, fan I, floI, floH, floG and/or carbon to produce carbon-based products of interest relative fah F. Alternatively, since all three native formate dehydroge to the microorganism in the absence of the genetic modifica nases in E. coli require selenium and only those three tion. The genetic modification comprises one or more gene enzymes require selenium, in a preferred embodiment, genes disruptions, whereby the one or more gene disruptions for selenium uptake and/or biosynthesis of selenocysteine, increase the efficiency of producing carbon-based products of such as selA, selB, selC, and/or selD, are disrupted. interest from inorganic energy and inorganic carbon. In one 0237. In other embodiments, the host organism may have aspect, the one or more gene disruptions target genes encod native hydrogenases or other enzymes that consume molecu ing competing reactions for inorganic energy, reduced cofac lar hydrogen thereby competing with energy conversion path tors, inorganic carbon, and/or central metabolites. In another ways that use hydrogen as an inorganic energy source. For aspect, the one or more gene disruptions target genes encod example, in the host organism E. coli, there are four native ing competing reactions for intermediates or products of the hydrogenases although the fourth is not expressed to signifi energy conversion, carbon fixation, and/or carbon product cant levels Self, 2004. Exemplary E. coli formate hydroge biosynthetic pathways of interest. The competing reactions nase genes for disruption include hyaB, hybC, hycE, hyfi US 2015/003 7853 A1 Feb. 5, 2015 37 and fhlA. In another embodiment, a particular strain of the engineered chemoautotroph to express a vitamin biosynthesis host organism can be selected that specifically lacks the com pathway Roessner, 1995. An exemplary biosynthesis path peting reactions typical found in the species. For example, E. way for hydroxocobalamin comprises the following coli B strain BL21 (DE3) lacks formate and hydrogenase enzymes: uroporphyrin-III C-methyltransferase (E.C. 2.1.1. metabolism unlike E. coli K strains Pinske, 2011. 107), precorrin-2 cobaltochelatase (E.C. 4.99.1.3), cobalt 0238. In some embodiments, the host organism may have precorrin-2 (C)-methyltransferase (E.C.2.1.1.151), cobalt metabolic reactions that compete with reactions of the carbon precorrin-3 (C7)-methyltransferase (E.C. 2.1.1.131), cobalt fixation pathways in the engineered chemoautotroph of the precorrin-4 (C')-methyltransferase (E.C.2.1.1.133), cobalt present invention. For example, in the host organism E. coli, precorrin 5A hydrolase (E.C.3.7.1.12), cobalt-precorrin-5B the tricarboxylic acid cycle generally runs in the oxidative (C)-methyltransferase (E.C. 2.1.1.195), cobalt-precorrin direction during aerobic growth and as a split reductive and 6A reductase, cobalt-precorrin-6V (C)-methyltransferase oxidative branches during anaerobic growth. Hence, E. coli (E.C. 2.1.1.-), cobalt-precorrin-7 (C)-methyltransferase has several endogenous reactions that may compete with (decarboxylating) (E.C. 2.1.1.196), cobalt-precorrin-8X desired reactions of an rTCA-derived carbon fixation path methylmutase, cobyrinate A.C-diamide synthase (E.C. 6.3.5. way. Exemplary E. coli enzymes whose function are candi 11), cob(II)yrinate a,c-diamide reductase (E.C. 1.16.8.1), cob dates for disruption include citrate synthase (competes with (I)yrinic acid a,c-diamide adenosyltransferase (E.C. 2.5.1. reaction 1 in FIG. 3), 2-oxoglutarate dehydrogenase (com 17), adenosyl-cobyrate synthase (E.C. 6.3.5.10), petes with reaction 6), isocitrate dehydrogenase (may com adenosylcobinamide phosphate synthase (E.C. 6.3.1.10), pete with desired flux for reaction 7), isocitrate dehydroge GTP:adenosylcobinamide-phosphate guanylyltransferase nase phosphatase (competes with reaction 8), pyruvate (E.C. 2.7.7.62), nicotinate-nucleotide dimethylbenzimida dehydrogenase (competes with reaction 9). Zole phosphoribosyltransferase (E.C. 2.4.2.21), adenosylco 0239. In another aspect, some engineered chemoautotro binamide-GDP:O-ribazole-5-phosphate ribazoletransferase phs of the present invention may require alterations to the (E.C. 2.7.8.26) and adenosylcobalamine-5'-phosphate phos pool of intracellular reducing cofactors for efficient growth phatase (E.C.3.1.3.73). In addition, to allow for cobalt uptake and/or production of the carbon-based product of interest and incorporation into Vitamin B12, the genes encoding the from inorganic energy and inorganic carbon. In some cobalt transporter are overexpressed. The exemplary cobalt embodiments, the total pool of NAD+/NADH in the engi transporter protein found in Salmonella enterica is overex neered chemoautotroph is increased or decreased by adjust pressed and is encoded by proteins ABC-type Co" transport ing the expression level of nicotinic acid phosphoribosyl system, permease component (CbiM, NP 460968), ABC transferase (E.C. 2.4.2.11). Over-expression of either the E. type cobalt transport system, periplasmic component (CbiN. coli or Salmonella gene pncB which encodes nicotinic acid NP 460967), and ABC-type cobalt transport system, per phosphoribosyltransferase has been shown to increase total mease component (CbiQ, NP 461989). NAD+/NADH levels in E. coli Wubbolts, 1990; Berrios 0241. In some embodiments, the intracellular concentra River, 2002; San, 2002. In another embodiment, the avail tion (e.g., the concentration of the intermediate in the engi ability of intracellular NADPH can be also altered by modi neered chemoautotroph) of the metabolic pathway interme fying the engineered chemoautotroph to express an NADH: diate can be increased to further boost the yield of the final NADPH transhydrogenase Sauer, 2004; Chin, 2011. In product. For example, by increasing the intracellular amount another embodiment, the total pool of ubiquinone in the engi of a substrate (e.g., a primary Substrate) for an enzyme that is neered chemoautotroph is increased or decreased by adjust active in the metabolic pathway, and the like. ing the expression level of ubiquinone biosynthetic enzymes, 0242. In another aspect, the carbon-based products of Such as p-hydroxybenzoate-polyprenyl pyrophosphate trans interest are or are derived from the intermediates or products ferase and polyprenyl pyrophosphate synthetase. Overex of fatty acid biosynthesis. To increase the production of pression of the corresponding E. coli genes ubiA and ispB waxes/fatty acid esters, and fatty alcohols, one or more of the increased the ubiquinone pool in E. coli Zhu, 1995. In enzymes of fatty acid biosynthesis can be over expressed or another embodiment, the level of the redox cofactor ferre mutated to reduce feedback inhibition. Additionally, enzymes doxin in the engineered chemoautotroph can be increased or that metabolize the intermediates to make nonfatty-acid decreased by changing the expression control sequences that based products (side reactions) can be functionally deleted or regulate its expression. attenuated to increase the flux of carbon through the fatty acid 0240. In another aspect, in addition to an inorganic energy biosynthetic pathway thereby enhancing the production of and carbon source, some engineered chemoautotrophs may carbon-based products of interest. require a specific nutrients or vitamin(s) for growth and/or production of carbon-based products of interest. For Growth-Based Selection Methods for Optimization of example, hydroxocobalamin, a vitamer of Vitamin B12, is a Engineered Carbon-Fixing Strains cofactor for particular enzymes of the present invention, Such 0243 Selective pressure provides a valuable means for as methylmalonyl-CoA mutase (E.C. 5.4.99.2). Required testing and optimizing the engineered chemoautotrophs of nutrients are generally supplemented to the growth media the present invention. In some embodiments, the engineered during bench scale propagation of such organisms. However, chemoautotrophs of the invention can be evolved under selec such nutrients can be prohibitively expensive in the context of tive pressure to optimize production of a carbon-based prod industrial scale bio-processing. In one embodiment of the uct from an inorganic energy source and inorganic carbon or present invention, the host cell is selected from an organism that confer other useful functions onto the host organism. The that naturally produces the required nutrient(s). Such as Sal ability of an optimized engineered chemoautotroph to repli monella enterica or Pseudomonas denitrificans which natu cate more rapidly than unmodified counterparts confirms the rally produces hydroxocobalamin. In an alternate embodi utility of the optimization. Similarly, the ability to survive and ment, the need for a vitamin is obviated by modifying the replicate in media lacking a required nutrient, such as Vitamin US 2015/003 7853 A1 Feb. 5, 2015

B12, confirms the Successful implementation of a nutrient ment where nutrient uptake are inversely correlated with biosynthetic module. In some embodiments, the engineered growth rate are predicted to evolve away from nutrient chemoautotrophs can be cultured in the presence of inorganic uptake. energy source(s), inorganic carbon and a limiting amount of organic carbon. Over time, the amount of organic carbon Fermentation Conditions present in the culture media is decreased in order to select for 0247 The engineered chemoautotrophs of the present evolved strains that more efficiently utilize the inorganic invention are cultured in a medium comprising inorganic energy and carbon. energy source(s), inorganic carbon source(s) and any 0244 Evolution can occur as a result of either spontane required nutrients. The culture conditions can include, for ous, natural mutation or by addition of mutagenic agents or example, liquid culture procedures as well as fermentation conditions to live cells. If desired, additional genetic variation and other large scale culture procedures. can be introduced prior to or during selective pressure by 0248. The production and isolation of carbon-based prod treatment with mutagens. Such as ultra-violet light, alkylators ucts of interest can be enhanced by employing specific fer e.g., ethyl methanesulfonate (EMS), methyl methane sul mentation techniques. One method for maximizing produc fonate (MMS), diethylsulfate (DES), and nitrosoguanidine tion while reducing costs is increasing the percentage of the (NTG, NG, MMG), DNA intercalcators (e.g., ethidium bro carbon that is converted to carbon-based products of interest. mide), nitrous acid, base analogs, bromouracil, transposonsm During normal cellular lifecycles carbon is used in cellular and the like. The engineered chemoautotrophs can be propa functions including producing lipids, Saccharides, proteins, gated either in serial batch culture or in a turbidostat as a organic acids, and nucleic acids. Reducing the amount of controlled growth rate. carbon necessary for growth-related activities can increase the efficiency of carbon Source conversion to output. This can 0245 Alternately or in addition to selective pressure, path be achieved by first growing engineered chemoautotrophs to way activity can be monitored following growth under per a desired density, such as a density achieved at the peak of the missive (i.e., non-selective) conditions by measuring specific log phase of growth. At Such a point, replication checkpoint product output via various metabolic labeling studies (includ genes can be harnessed to stop the growth of cells. Specifi ing radioactivity), biochemical analyses (Michaelis cally, quorum sensing mechanisms Camilli, 2006; Venturi, Menten), gas chromatography-mass spectrometry (GC/MS), 2006: Reading, 2006 can be used to activate genes such as mass spectrometry, matrix assisted laser desorption ioniza p53, p21, or other checkpoint genes. Genes that can be acti tion time-of-flight mass spectrometry (MALDI-TOF), capil vated to stop cell replication and growth in E. coli include lary electrophoresis (CE), and high pressure liquid chroma umu C genes, the over-expression of which stops the pro tography (HPLC). gression from stationary phase to exponential growth Murli, 0246 To generate engineered chemoautrophs with 2000. UmuC is a DNA polymerase that can carry out trans improved yield of central metabolites and/or carbon-based lesion synthesis over non-coding lesions—the mechanistic products of interest, metabolic modeling can be utilized to basis of most UV and chemical mutagenesis. The umu DC guide strain optimization. Modeling analysis allows reliable gene products are used for the process of translesion synthesis predictions of the effects on cell growth of shifting the and also serve as a DNA damage checkpoint. Umu C gene metabolism towards more efficient production of central products include UmuC, Umu), umul)', Umul'C, Umulo' metabolites or products derived from central metabolites. and UmuD. Simultaneously, the carbon product biosyn Modeling can also be used to design gene knockouts that thetic pathway genes are activated, thus minimizing the need additionally optimize utilization of the energy conversion, for replication and maintenance pathways to be used while carbon fixation and carbon product biosynthetic pathways. In the carbon-based product of interest is being made. Some embodiments, modeling is used to select growth con 0249. Alternatively, cell growth and product production ditions that create selective pressure towards uptake and ulti can be achieved simultaneously. In this method, cells are lization of inorganic energy and inorganic carbon. An in silico grown in bioreactors with a continuous Supply of inputs and Stoichiometric model of host organism metabolism and the continuous removal of product. Batch, fed-batch, and con metabolic pathway(s) of interest can be constructed (see, for tinuous fermentations are common and well known in the art example, a model of the E. coli metabolic network Edwards, and examples can be found in Brock, 1989: Deshpande, 2002). The resulting model can be used to compute pheno 1992). typic phase planes for the engineered chemoautotrophs of the 0250 In a preferred embodiment, the engineered present invention. A phenotypic phase plane is a portrait of chemoautotroph is engineered Such that the final product is the accessible growth states of an engineered chemoau released from the cell. In embodiments where the final prod totroph as a function of imposed substrate uptake rates. A uct is released from the cell, a continuous process can be particular engineered chemoautotroph, at particular uptake employed. In this approach, a reactor with organisms produc rates for limiting nutrients, may not grow as well as the ing desirable products can be assembled in multiple ways. In phenotypic phase plane predicts, but no strain should be able one embodiment, the reactor is operated in bulk continuously, to grow better than indicated by the phenotypic phase plane. with a portion of media removed and held in a less agitated Under a variety of circumstances, it has been shown the environment such that an aqueous product can self-separate modified E. coli strains evolve towards, and then along, the out with the product removed and the remainder returned to phenotypic phase plane, always in the direction of increasing the fermentation chamber. In embodiments where the product growth rates Fong, 2004. Thus, a phenotypic phase plane does not separate into an aqueous phase, media is removed can be viewed as a landscape of selective pressure. Strains in and appropriate separation techniques (e.g., chromatography, an environment where a given nutrient uptake is positively distillation, etc.) are employed. correlated with growth rate are predicted to evolve towards 0251. In an alternate embodiment, the product is not increased nutrient uptake. Conversely, strains in an environ secreted by the engineered chemoautotrophs. In this embodi US 2015/003 7853 A1 Feb. 5, 2015 39 ment, a batch-fed fermentation approach is employed. In Such totrophs of the invention. In some embodiments, the engi cases, cells are grown under continued exposure to inputs neered chemoautotrophs are optimized for a two stage fer (inorganic energy and inorganic carbon) as specified above mentation by regulating the expression of the carbon product until the reaction chamber is saturated with cells and product. biosynthetic pathway. A significant portion to the entirety of the culture is removed, 0256 In one aspect, the percentage of input carbon atoms the cells are lysed, and the products are isolated by appropri converted to hydrocarbon products is an efficient and inex ate separation techniques (e.g., chromatography, distillation, pensive process. Typical efficiencies in the literature are filtration, centrifugation, etc.). ~<5%. Engineered chemoautotrophs which produce hydro 0252. In certain embodiments, the engineered chemoau carbon products can have greater than 1, 3, 5, 10, 15, 20, 25, totrophs of the invention can be sustained, cultured or fer and 30% efficiency. In one example engineered chemoau mented under anaerobic or Substantially anaerobic condi totrophs can exhibit an efficiency of about 10% to about 25%. tions. Briefly, anaerobic conditions refers to an environment In other examples, such microorganisms can exhibit an effi devoid of oxygen. Substantially anaerobic conditions ciency of about 25% to about 30%, and in other examples include, for example, a culture, batch fermentation or con such engineered chemoautotrophs can exhibit >30% effi tinuous fermentation Such that the dissolved oxygen concen ciency. tration in the medium remains between 0 and 10% of satura 0257. In some examples where the final product is tion. Substantially anaerobic conditions also includes released from the cell, a continuous process can be employed. growing or resting cells in liquid medium or on Solid agar In this approach, a reactor with engineered chemoautrophs inside a sealed chamber maintained with an atmosphere of producing for example, fatty acid derivatives, can be less than 1% oxygen. It is highly desirable to maintain anaero assembled in multiple ways. In one example, a portion of the bic conditions in the fermenter to reduce the cost of the overall media is removed and allowed to separate. Fatty acid deriva process. tives are separated from the aqueous layer, which can in turn, 0253) If desired, the pH of the medium can be maintained be returned to the fermentation chamber. at a desired pH, in particular neutral pH, such as a pH of 0258. In another example, the fermentation chamber can around 7 by addition of a base, such as NaOH or other bases, enclose a fermentation that is undergoing a continuous reduc or acid, as needed to maintain the culture medium at a desir tion. In this instance, a stable reductive environment can be able pH. The growth rate can be determined by measuring created. The electron balance would be maintained by the optical density using a spectrophotometer (600 nm), and the release of oxygen. Efforts to augment the NAD/H and glucose uptake rate by monitoring carbon Source depletion NADP/H balance can also facilitate instabilizing the electron over time. balance. 0254. In another embodiment, the engineered chemoau totrophs can be cultured in the presence of an electron accep Consolidated Chemoautotrophic Fermentation tor, for example, nitrate, in particular under Substantially anaerobic conditions. It is understood that an appropriate 0259. The above aspect of the invention is an alternative to amount of nitrate can be added to a culture to achieve a directly producing final carbon-based product of interest as a desired increase in biomass, for example, 1 mM to 100 mM result of chemoautotrophic metabolism. In this approach, nitrate, or lower or higher concentrations, as desired, so long carbon-based products of interest would be produced by as the amount added provides a sufficient amount of electron leveraging other organisms that are more amenable to making acceptor for the desired increase in biomass. Such amounts any one particular product while culturing the engineered include, but are not limited to, 5 mM, 10 mM, 15 mM, 20 mM, chemoautotroph for its carbon Source. Consequently, fermen 25 mM, 30 mM, 40 mM, 50 mM, as appropriate to achieve a tation and production of carbon-based products of interest desired increase in biomass. can occur separately from carbon Source production in a 0255. In some embodiments, the engineered chemoau bioreactor. totrophs of the present invention are initially grown in culture 0260. In one aspect, the methods of producing such car conditions with a limiting amount of organic carbon to facili bon-based products of interest include two steps. The first tate growth. Then, once the Supply of organic carbon is step includes using engineered chemoautotrophs to convert exhausted, the engineered chemoautotrophs transition from inorganic carbon to central metabolites or Sugars such as heterotrophic to autotrophic growth relying on energy from glucose. The second-step is to use the central metabolites or an inorganic energy sources to fixinorganic carbon in order to Sugars as a carbon Source for cells that produce carbon-based produce carbon-based products of interest. The organic car products of interest. In one embodiment, the two-stage bon can be, for example, a carbohydrate Source. Such sources approach comprises a bioreactor comprising engineered include, for example, Sugars such as glucose, Xylose, arabi chemoautotrophs; a second reactor comprising cells capable nose, galactose, mannose, fructose and starch. Other sources of fermentation; wherein the engineered chemoautotrophs of carbohydrate include, for example, renewable feedstocks provides a carbon Source Such as glucose for cells capable of and biomass. Exemplary types of biomasses that can be used fermentation to produce a carbon-based product of interest. as feedstocks in the methods of the invention include cellu The second reactor may comprise more than one type of losic biomass, hemicellulosic biomass and lignin feedstocks microorganism. The resulting carbon-based products of inter or portions of feedstocks. Such biomass feedstocks contain, est are Subsequently separated and/or collected. for example, carbohydrate Substrates useful as carbon sources 0261 Preferably, the two steps are combined into a single Such as glucose, Xylose, arabinose, galactose, mannose, fruc step process whereby the engineered chemoautotrophs con tose and starch. Given the teachings and guidance provided Vert inorganic energy and inorganic carbon and directly into herein, those skilled in the art would understand that renew central metabolites or Sugars such as glucose and Such organ able feedstocks and biomass other than those exemplified isms are capable of producing a variety of carbon-based prod above also can be used for culturing the engineered chemoau ucts of interest. US 2015/003 7853 A1 Feb. 5, 2015 40

0262 The present invention also provides methods and duced has a high logP value, even at very low concentrations compositions for Sustained glucose production in engineered the fatty acid can separate into the organic phase in the fer chemoautotrophs wherein these or other organisms that use mentation vessel. the Sugars are cultured using inorganic energy and inorganic 0268 When producing fatty acids by the methods carbon for use as a carbon Source to produce carbon-based described herein, such products can be relatively immiscible products of interest. In such embodiments, the host cells are in the fermentation media, as well as in the cytoplasm. There capable of secreting the Sugars, such as glucose from within fore, the fatty acid can collect in an organic phase either the cell to the culture media in continuous or fed-batch in a intracellularly or extracellularly. The collection of the prod bioreactor. ucts in an organic phase can lessen the impact of the fatty acid 0263. Certain changes in culture conditions of engineered derivative on cellular function and allows the production host chemoautrophs for the production of Sugars can be optimized to produce more product. for growth. For example, conditions are optimized for inor 0269. The fatty alcohols, fatty acid esters, waxes, and ganic energy source(s) and their concentration(s), inorganic hydrocarbons produced as described herein allow for the carbon source(s)and their concentration(s), electronacceptor production of homogeneous compounds with respect to other (s) and their concentrations, addition of Supplements and compounds wherein at least 50%, 60%, 70%, 80%, 90%, or nutrients. As would be apparent to those skilled in the art, the 95% of the fatty alcohols, fatty acid esters, waxes and hydro conditions sufficient to achieve optimum growth can vary carbons produced have carbon chain lengths that vary by less depending upon location, climate, and other environmental than 4 carbons, or less than 2 carbons. These compounds can factors, such as the temperature, oxygen concentration and also be produced so that they have a relatively uniform degree humidity. Other adjustments may be required, for example, of saturation with respect to other compounds, for example at an organism’s ability for carbon uptake. Increased inorganic least 50%, 60%, 70%, 80%, 90%, or 95% of the fatty alco carbon, Such as in the form of carbon dioxide, may be intro hols, fatty acid esters, hydrocarbons and waxes are mono-, duced into a bioreactor by a gas sparger or aeration devices. di-, or tri-unsaturated. 0264. Advantages of consolidated chemoautotrophic fer mentation include a process where there is separation of chemical end products, e.g., glucose, spatial separation Detection and Analysis between end products (membranes) and time. Additionally, 0270 Generally, the carbon-based products of interest unlike traditional or cellulosic biomass to biofuels produc produced using the engineered chemoautotrophs described tion, pretreatment, saccharification and crop plowing are herein can be analyzed by any of the standard analytical obviated. methods, e.g., gas chromatography (GC), mass spectrometry 0265. The consolidated chemoautrophic fermentation (MS) gas chromatography-mass spectrometry (GCMS), and process produces continuous products. In preferred embodi liquid chromatography-mass spectrometry (LCMS), high ments, the process involves direct conversion of inorganic performance liquid chromatography (HPLC), capillary elec energy and inorganic carbon to product from engineered trophoresis, Matrix-Assisted Laser Desorption Ionization front-end organisms to produce various products without the time-of-flight mass spectrometry (MALDI-TOF MS), need to lyse the organisms. For instance, the organisms can nuclear magnetic resonance (NMR), near-infrared (NIR) utilize 3PGAL to make a desired fermentation product, e.g., spectroscopy, viscometry Knothe, 1997: Knothe, 1999. ethanol. Such end products can be readily secreted as opposed titration for determining free fatty acids Komers, 1997, to intracellular products such as oil and cellulose. In yet other enzymatic methods Bailer, 1991, physical property-based embodiments, organisms produce Sugars, which are secreted methods, wet chemical methods, etc. into the media and Such Sugars are used during fermentation with the same or different organisms or a combination of Carbon Fingerprinting both. 0271 Biologically-produced carbon-based products, e.g., Processing and Separation of Carbon-Based Products of ethanol, fatty acids, alkanes, isoprenoids, represent a new Interest commodity for fuels, such as alcohols, diesel and gasoline. Such biofuels have not been produced using biomass but use 0266 The carbon-based products produced by the engi carbon dioxide as its carbon source. These new fuels may be neered chemoautotrophs during fermentation can be sepa distinguishable from fuels derived form petrochemical car rated from the fermentation media. Known techniques for bon on the basis of carbon-isotopic fingerprinting. Such prod separating fatty acid derivatives from aqueous media can be ucts, derivatives, and mixtures thereof may be completely employed. One exemplary separation process provided distinguished from their petrochemical derived counterparts herein is a two-phase (bi-phasic) separation process. This on the basis of ''C (fM) and carbon-isotopic fingerprinting, process involves fermenting the genetically-engineered pro indicating new compositions of matter. duction hosts under conditions sufficient to produce for 0272. There are three naturally occurring isotopes of car example, a fatty acid, allowing the fatty acid to collect in an bon: 'C, C, and C. These isotopes occur in above-ground organic phase and separating the organic phase from the total carbon at fractions of 0.989, 0.011, and 10', respec aqueous fermentation media. This method can be practiced in tively. The isotopes 'Cand 'Care stable, while ''C decays both a batch and continuous fermentation setting. naturally with a half-life of 5730 years to 'N, a beta particle, 0267 Bi-phasic separation uses the relative immisciblity and an anti-neutrino. The isotope C originates in the atmo of fatty acid to facilitate separation. A skilled artisan would sphere, dueprimarily to neutron bombardment of 'N caused appreciate that by choosing a fermentation media and the ultimately by cosmic radiation. Because of its relatively short organic phase Such that the fatty acid derivative being pro half-life (in geologic terms), ''C occurs at extremely low US 2015/003 7853 A1 Feb. 5, 2015

levels in fossil carbon. Over the course of 1 million years 0279 Table 3 introduces a new quantity, epsilon. This is without exposure to the atmosphere, just 1 part in 10 will the discrimination by a biological process in its utilization of remain ''C. (0273. The 'C:''C ratio varies slightly but measurably 'C vs. C. We define epsilon as follows: epsilon=(R/R)-1. among natural carbon Sources. Generally these differences (0280. This quantity is very similar to 8, and 8, except we are expressed as deviations from the 'C:'C ratio in a stan now compare the biological product directly to the carbon dard material. The international standard for carbon is Pee Source rather than to a standard. Using epsilon, we can com Dee Belemnite, a form of limestone found in South Carolina, with a 'C fraction of 0.01 12372. For a carbon source a, the bine the bias effects of a carbon source and a biological deviation of the 'C: 'Cratio from that of Pee Dee Belemnite process to obtain the bias of the biological product as com is expressed as: pared to the standard. Solving for 8, we obtain: 8 (epsilon) (Ö)+epsilon--ó, and, because (epsilon)(ö) is generally very small compared to the other terms, 8-8,+epsilon. 0274 where R =''C:''C ratio in the natural source, and R='C:''C ratio in Pee Dee Belemnite, the standard. 0281 For a biological product having a production pro 0275 For convenience, 6, is expressed in parts per thou cess with a known epsilon, we may therefore estimate 8, by sand, or %0. A negative value of 8 shows a bias toward 'C Summing 6 and epsilon. We assume that epsilon operates over'C as compared to Pee Dee BelemniteTable 2 shows 8, and ''C fraction for several natural sources of carbon. irrespective of the carbon source. 0282 This has been done in Table 3 for cyanobacterial TABLE 2 lipid and biomass produced from fossil carbon. As shown in 'C:'C variations in natural carbon sources the Tables above, cyanobacterial products made from fossil carbon (in the form of for example, flue gas or other emis Source -ö (%o) References sions) can have a higher 8, than those of comparable biologi Underground coal 32.5 Farquhar, 1989 cal products made from other sources, distinguishing them on Fossil fuels 26 Farquhar, 1989 the basis of composition of matter from these other biological Ocean DIC O-15 Goericke, 1994: Ivlev, 2010 Atmospheric CO2 6-8 Ivlev, 2010; Farquhar, 1989 products. In addition, any product derived solely from fossil Freshwater DIC* 6-14 Dettman, 1999 carbon can have a negligible fraction of C, while products Pee Dee Belemnite O Ivlev, 2010 made from above-ground carbon can have a ''C fraction of *DIC = dissolved inorganic carbon. approximately 10°. 0276 Biological processes often discriminate among car (0283 Accordingly, in certain aspects, the invention pro bonisotopes. The natural abundance of ''C is very small, and hence discrimination for or against ''C is difficult to measure. vides various carbon-based products of interest characterized Biological discrimination between C and 'C, however, is as -6(%0) of about 63.5 to about 66 and -epsilon(%.o) of about well-documented. For a biological product p, we can define 37.5 to about 40. For carbon-based products that are derived similar quantities to those above: from engineered autotrophs that make use of carbon fixation pathways other than the Calvin cycle, epsilon, and thus Ó can vary, as previously described Hayes, 2001. (0277 where R=1: C:''C ratio in the biological product, and R='C:"'C ratio in Pee Dee Belemnite, the standard. (0278 Table 3 shows measured deviations in the 'C:''C Sequences Provided by the Invention ratio for some biological products that arise from carbon fixation by the Calvin cycle. Other carbon fixation pathways (0284 Table 4 provides a summary of SEQID NOS:1-60 provide different “fingerprint” 'C: 'Cratios. disclosed herein. TABLE 3 C:"'C variations in selected biological products. -ö. -epsilon Product (%o) (%o)* References Plant Sugarf starch from atmospheric CO2 18-28 10-20 Ivlev, 2010) Cyanobacterial biomass from marine DIC 18-31 16.5-31 Goericke, 1994: Sakata, 1997 Cyanobacterial lipid from marine DIC 39-40 37.5-40 Sakata, 1997 Algal lipid from marine DIC 17-28 15.5-28 Goericke, 1994: Abelseon, 1961) Algal biomass from freshwater DIC 17-36 3-30 Marty, 2008) E. coilipid from plant Sugar 15-27 near O Monson, 1980 Cyanobacterial lipid from fossil carbon 63.5-66 37.5-40 - Cyanobacterial biomass from fossil 42.5-57 16.5-31 – carbon *epsilon = fractionation by a biological process in its utilization of 'C versus 'C (see text) US 2015/003 7853 A1 Feb. 5, 2015 42

TABLE 4 Sequences SEQ ID NO Sequence Codon optimized Burkholderia stabilis NADP' FDH gene Codon optimized Candida methylica NAD' FDH gene Codon optimized Candida boidini NAD' FDH gene Codon optimized Saccharomyces cerevisiae S288c NAD' FDH gene Clostridium pasteurianum putative ferredoxin-FDH FolhF subunit amino acid sequence Clostridium pasteurianum putative ferredoxin-FDH FolhD subunit amino acid sequence Clostridium pasteuriant in putative FDH-associated ferredoxin domain containing protein 1 amino acid sequence Clostridium pasteuriant in putative FDH-associated ferredoxin domain containing protein 2 amino acid sequence Codon optimized Aquifex aeolicus VF5 SQR gene 10 Codon op imized Nostoc sp. PCC 7120 SQR gene 11 Codon o imized Chlorobium tepidum TLS SQR gene 12 Codon o imized Acidithiobacilius ferrooxidans ATCC 23270 SQR gene 13 Codon optimized Allochromatium vinosum DSM 180 SQR gene 14 Codon optimized Rhodobacter capsulatus SB 1003 SQR gene 15 Codon optimized Thiobacilius denitrificans ATCC 25259 SQR gene 16 Codon optimized Magnetococcus sp. MC-1 SQR gene 17 Codon optimized Clostridium pasteuriant in ferredoxin gene 18 Codon o imized Hydrogenobacter thermophilus TK-6 fix1 gene 19 Codon o imized Hydrogenobacter thermophilus TK-6 flx2 gene 2O Codon o imized Methanosarcina barkeri str. Fusaro ferredoxin gene 21 Codon optimized Aquifex aeolicits fix7 gene 22 Aquifex aeolicits fix7 amino acid sequence 23 Codon optimized Aquifex aeolicits fix6 gene 24 Aquifex aeolicits fix6 amino acid sequence 25 Codon optimized gamma-proteobacterium NOR51-B MCR gene 26 Codon optimized Roseiflexus castenholzii DSM 13941 MCR gene 27 Codon o imized marine gamme proteobacterium HTCC2080 MCR gene 28 Codon o imized Erythrobacter sp. NAP1 MCR gene 29 Codon op imized Chloroflexus aurantiacus J-10-fl MCR gene 30 Codon optimized Chloroflexus aurantiacus PCS gene 31 Chloroflexits attrantiacus PCS amino acid sequence 32 Codon optimized Metaliosphaera sedula Pech gene 33 Codon optimized Metaliosphaera seditia AccC gene 34 Codon optimized Metaliosphaera seditia AccB gene 35 Codon optimized Nitrosopumilus maritimus SCM1 PccB gene 36 Codon o ptimized Nitrosopumilus maritimus SCM1 AccC gene 37 Codon o ptimized Nitrosopumilus maritimus SCM1 AccB gene 38 Codon o ptimized Cenarchaeum symbiosum A Pech3 gene 39 Codon optimized Cenarchael in symbiostin AAccC gene 40 Codon optimized Cenarchael in symbiostin AAccB gene 41 Codon optimized Halobacterium sp. NRC-1 PccB gene 1 42 Codon optimized Halobacterium sp. NRC-1 PccB gene 2 43 Codon optimized Halobacterium sp. NRC-1 PccB gene 1 Codon optimized Halobacterium sp. NRC-1 AccC gene 2 45 Codon o imized Halobacterium sp. NRC-1 AccB gene 46 Codon o imized Methylcoccus capsulatus str. Bath HPS gene 1 47 Codon o imized Methylcoccus capsulatus str. Bath HPS gene 2 48 Codon optimized Methylcoccus capsulatus str. Bath PHI gene 49 Codon optimized Mycobacterium gastri MB19 HPS-PHI fusion gene 50 Mycobacterium gastri MB19 HPS-PHI fusion amino acid sequence 51 Codon optimized Synechococcus elongatus PCC 7942 GAPDH gene 52 Codon optimized Synechococcus elongatus PCC 7942 SBPase gene 53 Codon optimized Synechococcus elongatus PCC 7942 PRK gene S4 Codon optimized Escherichia coli FocA gene 55 Codon optimized Escherichia coli Foch3 gene 56 Plasmid 2430 57 Plasmid 2429 58 Plasmid 4767 59 Plasmid 4768 60 Plasmid 4986 61 Codon optimized Escherichia coli ACS gene 62 Codon optimized Listeria monocytogenes ADH gene 63 Plasmid 94.63 64 Plasmid 94.62 65 Plasmid 20566 66 Plasmid 27439 US 2015/003 7853 A1 Feb. 5, 2015

EXAMPLES and contained the following: 100 ul of 200 mM potassium phosphate buffer, pH 7.0 (made by titering 200 mM dipotas 0285. The examples below are provided herein for illus sium hydrogen phosphate into 200 mM potassium dihydro trative purposes and are not intended to be restrictive. gen phosphate until the solution pH reached 7.0), 15 ul of 10 Example 1 mM NAD(P)" as appropriate, 20 ulcell lysate, and 30 Jul 0.5 M sodium formate. The absorbance at 340 nm of each sample was measured every 20 seconds in a Spectramax Gemini Plus Identification and Selection of Candidate plate reader in order to monitor the reduction of NAD(P)". Sulfide:Quinone Oxidoreductase Enzymes The assay plate was maintained at a temperature of 37°C. The 0286 To identify candidate sulfide-quinone oxidoreduc measured rates of NAD(P)" reduction were normalized to the tases (SQR) for the energy conversion pathway that uses number of cells used to prepare the cell lysates. The assay hydrogen Sulfide as an inorganic energy source, the Rhodo results are shown in FIG. 21. From the assay data, the quan bacter capsulatus SQR was selected as the model enzyme. titative activities of each FDH can be computed as well as The R. capsulatus SQR has been functionally expressed in the their cofactor preference (Table 5). heterologous host E. coli Schütz, 1997 and demonstrated to reduce ubiquinone Shibata, 2001. A search of the NCBI TABLE 5 Protein Clusters database was performed using the search term "sulfide quinone reductase' and 17 different protein Quantitative, measured activities of FDH clusters were identified as of Feb. 1, 2011 (CLSK2755575, amol NADP amol NADP In(NADP'? CLSK2397089, CLSK2336986, CLSK2302249, Plasmid min CFU min CFU NAD') CLSK2299965, CLSK943035, CLSK940594, negative -0.05 O.18 CLSK917086, CLSK903971, CLSK892.907, CLSK884384, control CLSK871744, CLSK871685, CLSK870501, CLSK785404, 2430 21.37 3.06 1.9 CLSK767599, CLSK724710). The 17 protein clusters com 2429 O.12 9.79 -4.4 prised 203 putative SQRs which were subsequently aligned using MUSCLE 3.8.31 using sequenceYP 003443063 as an outgroup. The resulting alignment was imported into Example 3 Geneious Pro 5.3.6 and a tree was made using a neighbor joining method. Based on the alignment, any sequences con Engineered E. coli that Oxidizes Hydrogen Sulfide taining less than four of six conserved residues were elimi nated from the set. The six conserved residues were three 0289 Plasmids comprising a high copy number replica conserved cysteines, two conserved histidines thought to be tion origin, chloramphenicol resistance marker and a codon involved in quinone binding and the absence of a conserved optimized sulfide-quinone oxidoreductase from Rhodobacter aspartate that is characteristic of all glutathion reductase fam capsulatus (sqr) gene under the control of two different rrnB ily of flavoproteins with the exception of SQRS Griesbeck, derived constitutive promoters were constructed using DNA 2000. The resulting sequences were realigned using assembly methods described in WO/2010/070295. The resulting plasmids 4767 (SEQID NO:58) and 4768 (SEQID MUSCLE and a new tree was made. Representative NO:59) were transformed into E. coli using standard plasmid sequences from each Glade were selected as candidate SQRs. transformation techniques. As a negative control, an expres Example 2 sion plasmid without a constitutive promoter but including the Sqr gene was also constructed. Engineered E. coli that Transfer Electrons from 0290 Cultures propagating each of the plasmids were Formate to NADH or NADPH inoculated from glycerol Stocks and grown for two days in an 8-well plate with fresh LB media supplemented with 34 0287 Plasmids comprising a high copy number replica ug/ml chloramphenicol at 30° C. Cells were pelleted by cen tion origin, chloramphenicol resistance marker and each of trifugation for 10 minutes at 2500 rpm and the supernatant two different codon-optimized formate dehydrogenase (fdh) decanted. The cell pellets were resuspended in 2 ml of SQR genes under the control of an irrnB-derived constitutive pro assay buffer (5 g/L sodium chloride, 5 mM magnesium chlo moter were constructed using DNA assembly methods ride hexahydrate, 1 mM calcium chloride dihydrate, 20 mM described in WO/2010/070295. The resulting plasmids 2430 Tris-HCl, pH 7.5). The absorbance at 600 nm of a 100 ul (SEQID NO:56) and 2429 (SEQID NO:57) and transformed aliquot of each resuspended culture was measured to monitor into E. coli using standard plasmid transformation tech the cell density. The assay reactions were prepared in a niques. As a negative control, an expression plasmid without 96-well plate containing 0, 100, 150, 200 ul of SQR assay any fah gene was also constructed. As a positive control, buffer; 10 ul of 0.1M sodium sulfide; and 200, 100, 50, and 0 purified NAD-dependent FDH enzyme obtained from com ul of resuspended cells. The absorbance at 600 nm of each mercial Sources was used. assay reaction was measured to monitor the cell density. The 0288 Cultures propagating each of the plasmids were sampling reactions were prepared in a 96-well assay plate and inoculated from glycerol stocks and grown overnight in a contained the following: 90 ul of Tris-HCl, pH 7.5; 8 ul 24-well plate with fresh LB media supplemented with 34 aliquot from sampling plate; and 8 ul Cline reagent Cline, ug/ml chloramphenicol at 37°C. The grown cultures were 1969. The absorbance at 670 nm of each sampling reaction then diluted into 1 ml fresh media in a 96-well plate. Cells was measured to monitor the Sulfide concentration. The assay were pelleted by centrifugation for 10 minutes at 3000xg and results are shown in FIG. 22. Based on this data, we estimate the supernatant decanted. The cell pellets were resuspended the sulfide oxidation rates in the cell resuspensions to be in 100 ul complete B-PER (contains DNasel and lysozyme). between 2-3.5 mM hour' or roughly 0.5-2.0 mmol sulfide g The assay reactions were prepared in a 96-well assay plate DCW hour'. US 2015/003 7853 A1 Feb. 5, 2015 44

Example 4 and incubated for four hours to allow the cells to reenter growth phase. The cells were then resuspended in either M9 Engineered E. coli Producing Propionyl-coA from minimal medium with 50 mM formate as the sole carbon 3-Hydroxypropionate source (results shown in Table 6) or LB medium with 50 mM formate (results shown in Table 7). Assays for formate levels 0291 Plasmids comprising a high copy number replica (as measured in mM of formate) were performed as described tion origin, chloramphenicol resistance marker and a codon in Example 8 at different timepoints. optimized propionyl-coA synthase from Chloroflexus auran tiacus (pcs) gene under the control of two different rrnB TABLE 6 derived constitutive promoters were constructed using DNA assembly methods described in WO/2010/070295. The Formate uptake by various deletion strains, minimal medium resulting plasmid 4986 (SEQ ID NO:60) was transformed into E. coli using standard plasmid transformation tech Strain genotype O 2O 40 60 240 niques. As a negative control, an expression plasmid without negative control 88 89 98 90 85 AfdhF 89 91 85 66 46 the pcs gene was also constructed. AfdnG 84 8O 65 48 14 0292 Cultures propagating each of the plasmids were AfdoG 84 77 93 S4 S4 inoculated from glycerol stocks and grown overnight in a AselA 84 130 93 88 77 24-well plate with fresh LB media supplemented with 34 AselB 89 124 95 86 59 ug/ml chloramphenicol at 37° C. Cells were pelleted by cen trifugation and the Supernatant decanted. The cell pellets were resuspended in 600 ul complete B-PER (contains TABLE 7 DNasel and lysozyme) and incubated for 30 minutes at 37°C. The assay reactions were prepared in a 96-well assay plate Formate uptake by various deletion strains, rich medium and contained the following: 71 ul of reaction buffer (3 mM ATP, 0.5 mM CoASH, 0.4 mMNADPH, 1XPCS buffer), 20 pil Strain genotype O 2O 40 60 240 of cell lysate and 9 ul of a ten-fold dilution of chemically negative control 68 74 74 64 70 synthesized 3-hydroxypropionate (see below). The 1xPCS AfdhF 81 76 74 66 62 AfdnG 73 74 66 57 28 buffer contained 100 mM Tris-HCl, pH 7.6, 10 mM potas AfdoG 77 74 69 63 64 sium chloride, 5 mM magnesium chloride hexahydrate, 2 AselA 77 78 76 72 78 mM 1,4-dithioerythritol. The absorbance at 340 nm of each AselB 72 46 67 60 76 assay reaction was measured every 12 seconds to monitor the oxidation of NADPH. As controls, the assay reaction contain lysate from a strain propagating plasmid 4986 was also Example 6 assayed in the absence of each required substrate (ATP, Assay Methods to Measure Hydrogenase Activity CoASH, NADPH, 3-hydroxypropionate or 3-HPAA). The 0295 The following assay can be used to measure hydro assay results are shown in FIG. 23. genase enzyme activity in intact cells. All steps are performed 0293. The chemical 3-hydroxypropionate is used a sub in a Shel-labs Bactron IV anaerobic chamber containing strate in enzymatic assays of propionyl-coA synthase (PCS). anaerobic mixed gas (90% nitrogen gas, 5% hydrogen gas, 3-hydroxypropionate can be made via chemical synthesis 5% carbon dioxide). Cultures with and without hydrogenase from B-propiolactone via the following method. A solution is activity are inoculated from single colonies on LB-agar plates prepared containing 0.3 M technical grade 3-propiolactone and grown overnight in a 24-well plate with fresh LB media. (Sigma Aldrich catalog number P-5648) and 2 M sodium An aliquot of each culture (1-2 ml) is pelleted by centrifuga hydroxide and incubated overnight at room temperature. The tion and the Supernatant decanted. The cells are then resus solution is then neutralized with either hydrochloric acid or pended in 1-2 ml 50 mM Tris-HCl, pH 7.6. A very small phosphoric acid. The presence of the reaction product 3-hy amount of sodium dithionite is picked up with a pipette tip droxypropionate can be confirmed via LC-MS. LC-MS can and dissolved into 100 ul of 50 mM Tris-HCl, pH 7.6. The also reveal that no other measurable side-products are assay reactions are prepared in a 96-well plate and contain the formed. Since the starting material, B-propiolactone, is following: 100 ul resuspended cells and 100 ul 0.8 mM highly bacteriocidal, but the product, 3-hydroxypropionate, methyl viologen in 50 mM Tris-HCl, pH 7.6. The 96-well is not, growth inhibition assays can also be used to demon plate is then loaded into a Biochrom UVM340 spectrophoto strate complete conversion of the starting material. metric plate reader and the absorbance at 600 nm is measured at 45 second intervals. To validate the assay, we assayed E. Example 5 coli strain 242 (K strain MG 1655), strain 312 (B strain BL21 DE3 with plysSplasmid) and strain 393 (B strain BL21 DE2 with genes tonA, hycE, hyaB and hybC deleted). E. coli K Engineered E. coli with Reduced Competing strains are known to have hydrogenase activity whereas B Formate Uptake Activity strains do not Pinske, 2011. Assay results are shown in FIG. 0294 The formate uptake of a series of gene deletion 24. strains of E. coli were analyzed as to identify genes respon Example 7 sible for competing, endogenous formate uptake activity in E. coli. All deletion strains were obtained from the Keio collec Identification and Sequencing of a tion Baba, 2006. The negative control was the absence of Formate-Ferredoxin Oxidoreductase from cells. Cultures were grown aerobically in LB medium supple Clostridium pasteurianum mented with 50 mM formate overnight, harvested by cen 0296. A culture sample of Clostridium pasteurianum W5 trifugation, resuspended in fresh LB medium with formate, (ATCC 6013) was obtained from the ATCC (genome size is US 2015/003 7853 A1 Feb. 5, 2015

3.9 Mbp). Fogel, 1999. The strain was cultured under dering isocitrate dehydrogenase non-functional. Table 8 anaerobic conditions in reinforced clostridial medium shows the results of endpoint absorbance at 600 nm measure (Difico). Four aliquots of 1 ml of culture were pelleted by ments of Strain 149 grown under different conditions for 36 centrifugation at 6000xg for 5 minutes and the supernatant hours at 37°C. The negative control is M9 media with glucose removed by aspiration. Genomic DNA was isolated with the with no cells. All readings shown are an average of three Wizard genomic DNA purification kit (Promega) according measurement replicates of the same culture. to the manufacturers instructions for Gram-positive bacteria with the following exceptions. In the lysis step, 10 mg/L TABLE 8 lysozyme in 10 mM Tris, 0.5 mM EDTA, pH 8.2 was used without any additional lysis enzymes. Also, 10 mM Tris, 0.5 Endpoint A600 nm measurements of Strain 149 mM EDTA, pH 8.2 was used in lieu of DNA rehybridization solution. The DNA yield was approximate 26 lug of DNA Growth conditions Average Std Dev from 4 ml of culture. The genomic DNA was sequenced at the Negative control O.O358 O.OOO3 Harvard/MGH sequencing facility. They prepared 160 bp M9 media + glucose O.O363 O.OOO8 M9 media + glucose + glutamate 0.2155 O.OO73 inserts from the genomic DNA and obtained 300 MM 75 bp M9 media + glucose + proline O.1913 O.OO41 paired end reads on an Illumina HiSeq sequencer. The result M9 media + glucose + glutamate + proline O.2145 O.0049 ing coverage was 5000x. Denovo assembly of the reads using Velvet resulting in 170 contigs greater than 5 kb in length comprising 3.9 Mbp. The resulting contigs were analyzed by 0299. When grown under anaerobic conditions, E. coli Glimmer resulting in 3474 identified ORFs comprising 3.6 runs a branched version of the tricarboxylic acid cycle. Mbp. A BLASTable database of amino acid sequences of all Hence, the glutamate/proline auxotrophy phenotype of identified ORFs was produced using NCBI BLAST formatdb tool and subsequently a BLASTable contig database was strains such as Strain 149 in which the iccd gene is rendered generated. Based on inspection of the BLAST results, two non-functional can be rescued by introduction of an exog putative FDH subunits were identified (SEQ ID NO:5 and enous, functional 2-oxoglutarate synthase (FIG. 26). SEQID NO:6) as well as two putative associated ferredoxin domain containing subunits (SEQ ID NO:7 and SEQ ID Example 10 NO:8). Methods for Growth-Based Selections for Formate Example 8 Utilization Assay Methods to Measure Formate Uptake by Intact Cells 0300. Using a model of E. coli metabolism Edwards, 2002, the phenotypic phase planes for E. coli under a variety 0297. The following assay can be used to measure formate of growth conditions were computed. The growth conditions levels in cultures thereby facilitating measurement of formate examined included formate co-metabolism with a second, uptake by intact cells. Cultures are inoculated from glycerols limiting organic carbon source under both anaerobic and and grown overnight in a 24-well plate with fresh LB media supplemented with the appropriate antibiotic as needed. The aerobic (i.e., unlimited oxygen uptake) conditions. The cultures are pelleted and an aliquot of the supernatant (300 ul) organic carbon Sources examined include glucose, glycerol, is saved. The assay reactions are prepared in a 96-well plate malate. Succinate, acetate and glycolate. For each carbon and contain the following: 80 ul of 200 mM potassium phos Source, several in silico genotypes were evaluated including phate buffer pH 7.0, 15ul of freshly prepared 100 mMNAD", (1) wild-type E. coli, (2) E. coli with its native formate dehy 35ul of culture supernant, 20Ll of 100x dilution of pure FDH drogenases (FDH) enzymes removed, (3) wild-type E. coli enzyme purchased commercially. The 96-well plate is then with a heterologous NAD(P)-dependent FDH and (4) E. coli loaded into a Spectramax spectrophotometric plate reader with native FDHs removed and a heterologous NAD(P)- and the absorbance at 340 nm is measured at 12 second dependent FDH. The purpose of the analysis was to identify intervals preceded by 5 seconds of mixing. The rate of NADH growth conditions that created selective pressure for formation can be calculated from the rate of change in the increased formate uptake and utilization. Based on the com absorbance at 340 nm and varies with the level of formate in puted phenotypic phase planes (FIG. 27), increased formate the sample (FIG. 25). uptake correlated with increased growth rates under aerobic growth conditions with a non-fermentable inorganic carbon Example 9 SOUC (glycerold succinate-malate propionate-acetate glycolate). Methods for Growth-Based Selections for Hence, this set of growth conditions is the preferred set of 2-Oxoglutarate Synthase Activity conditions for growth-based selections for formate utiliza 0298 To select for functional 2-oxoglutarate synthase tion. The model analysis also suggests that wildtype E. coli is activity in E. coli, the following growth-based selection can capable of growth on formate as a sole carbon Source with a be used. A strain with the gene encoding isocitrate dehydro predicted doubling time of 1.4 days and that inclusion of an genase rendered non-functional is used such that the strain exogenous NAD-dependent FDH reduces the doubling time cannot make 2-oxoglutarate (a precursor to glutamate Syn (FIG. 28). thesis in the cell). Such a strain can only grow in glucose 0301 E. coli strains can be evolved for improved formate minimal media that is Supplemented with either glutamate or utilization either through repeated Subculturing or through proline (proline degradation produces glutamate) Helling, continuous culturing in a chemostat or turbidostat using the 1971). Strain 149 (CGSCH4451) has the iccd-3 mutation ren above culture conditions. US 2015/003 7853 A1 Feb. 5, 2015 46

Example 11 0308 Alternative electron donors have the potential to solve both the safety problem and the mass transfer problem Computing Mass Transfer Limitations of Hydrogen presented by hydrogen. An ideal non-hydrogen vector for Versus Formate as an Inorganic Energy Source carrying electrical energy would share hydrogen's attractive 0302) The mass transfer limitations of hydrogen from the characteristics, which include (a) a highly negative standard gas to liquid phase is illustrated here. For the purpose of this reduction potential, and (b) established high-efficiency tech analysis, an ideal engineered chemoautotroph that has an nology to for converting electricity into the vector. Unlike unlimited capacity to (i) metabolize dissolved aqueous-phase hydrogen, however, it would (c) have a low propensity to hydrogen and (ii) convert it and carbon dioxide to a desired explode when mixed with air, and (d) have high water solu fuel at 100% of the theoretical yield is assumed. Under these bility under bio-compatible conditions. Formic acid, conditions, the rate of fuel production per unit of reactor HCOOH, or its salts, satisfies these conditions. Formic acid is Volume can depend solely on the rate at which hydrogen can Stoichiometrically equivalent to H+CO, and formate has as be transferred from the gas phase to the liquid phase. standard reduction potential nearly identical to that of hydro 0303. Fuel productivity P in units of g-L'h' can be gen. Since both formic acid and formate salts are highly expressed as the product of fuel molecular weight m, fuel soluble in water, the mass transfer limitations discussed molar yield on hydrogen Y, the biomass concentration in a above for hydrogen do not apply. However, a modified form bioreactor X, and the specific cellular uptake rate of hydrogen of the fuel productivity equation, written for formic acid (A) q, as shown in the equation below. instead of hydrogen (H), still applies, as shown below. P-in-YX 4 0309 Unlike hydrogen-powered electrofuels bioproduc (0304) At steady state, the bulkhydrogen uptake rate X, is tion, limits on formate-powered fuel productivity P stem only equal to the rate of hydrogen transfer from gas to liquid, from the attainable yield, the biomass concentration in the meaning the productivity can be expressed as in the equation reactor, and the specific uptake rate. We assume Y, the below, where C* is the liquid-phase solubility of hydrogen, molar yield of fuel on formic acid, is the stoichiometric maxi C is the liquid-phase concentration of hydrogen, and Ka is mum, whose value is the same as for hydrogen, 0.0467 mol the mass transfer coefficient for hydrogen transport from the isooctanol (mol HCOOH). For high-cell density cultiva gas phase (e.g., as bubbles sparged into the reactor) to the tions of E. coli, biomass concentrations of X=50 g|DCW L' liquid. Ka is a complex function of reactor geometry, bubble are attainable, although these values have not been observed size, Superficial gas Velocity, impeller speed, etc. and is best for growth on formate or in minimal medium. For Thiobacil regarded as an empirical parameter that needs to be deter lus strain A2, naturally capable of growing on formate, mined for a given bioreactor setup. observed values of were 0.0368 mol formate g|DCW'h' P=m-YKia (C-C) Kelly, 1979. The representative values for q and X imply a 0305 Again, as a best-case scenario, an ideal engineered maximal isooctanol productivity on formate of about 10 g-L chemoautotroph capable of maintaining rapid hydrogen 1-h'. uptake rates even at Vanishingly low hydrogen concentrations 0310. On the y-axis of FIG. 29, the range of reported Ka (i.e. that q is not a function of C, even as C, tends to Zero) is attainable in large-scale stirred-tank bioreactors is shown. assumed. This assumption maximizes the fuel productivity at Although there are many reports of higher Ka values in Pm.Y.KaC*. laboratory-scale reactors, during scale up the inevitable (0306 For a fixed production targett, say 0.5 t d' (equiva increase in Volume-to-Surface area ratios means that main lent to 20800 g h"), the productivity P determines the taining high Ka values is for practical purposes impossible. required reactor volume V because V=t/P. Thus, both fuel The maximum of the indicated range of 10-800h' translates productivity and reactor Volumes, even assuming “perfect” to a best-case productivity of 4 g. L'h', which implies a best-case reactor volume of 6,400 L. The best-case produc organisms, are bounded by achieveable Ka values, as shown tivity on formate is 10 gL'h', implying a reactor volume in the equations below. less than half as large would be required to achieve the same production. Most sources that give Ka values for large scale reactors have values much closer to 100 h", meaning the P = (mf YEHC)KLa best-case productivity using formate as the inorganic energy Source would be more than 15 times larger than on hydrogen. Example 12 0307 Maximal productivity corresponds to minimal reac Engineered Organisms Producing Butanol tion Volumes, and occurs at maximal values of my HCKa. The fuel yield cannot exceed the stoichiometric 0311. The enzyme beta-ketothiolase (R. eutropha PhaA or maximal yield. For the fuel isooctanol, the stoichiometric E. coli AtoB) (E.C. 2.3.1.16) converts 2 acetyl-CoA to maximal yield is determined from the balanced chemical acetoacetyl-CoA and CoA. Acetoacetyl-CoA reductase (R. equation 8CO+24H-->CHsO+15H2O, which shows that eutropha PhaE) (E.C. 1.1.1.36) generates R-3-hydroxybu 24 moles of H are required for each mole of isooctanol tyryl-CoA from acetoacetyl-CoA and NADPH. Alternatively, produced. At atmospheric pressure, C is unlikely to greatly 3-hydroxybutyryl-CoA dehydrogenase (C. acetobutyllicum exceed 0.75 mM, the solubility of H in pure water. Using Hbd) (E.C. 1.1.1.30) generates S-3-hydroxybutyryl-CoA these representative values for representative values form, from acetoacetyl-CoA and NADH. Enoyl-CoA hydratase (E. YC and t, the relationships between Ka and Pas well as coli MaoC or C. acetobutylicum Crt) (E.C. 4.2.1.17) gener between Ka and t are shown (FIG. 29). ates crotonyl-CoA from 3-hydroxybutyryl-CoA. Butyryl US 2015/003 7853 A1 Feb. 5, 2015 47

CoA dehydrogenase (C. acetobutylicum Bcd) (E.C. 1.3.99.2) containing 20 g L' of xylose. In parallel, E. coli NEB10? generates butyryl-CoA and NAD(P)H from crotonyl-CoA. cells harboring plasmid 9462 were grown overnight with Alternatively, trans-enoyl-coenzyme A reductase (Tre selection in Luria Broth (LB) medium containing 20 g L' of ponema denticola Ter) (E.C. 1.3.1.86) generates butyryl-CoA Xylose. from crotonyl-CoA and NADH. Butyrate CoA-transferase 0317 Both E. coli cultures were harvested by centrifuga (R. eutropha Pct) (E.C. 2.8.3.1) generates butyrate and acetyl tion, and cell pellets were lysed by resuspension in 0.1 culture CoA from butyryl-CoA and acetate. Aldehyde dehydroge volumes of a buffer containing DNAse I (8 U. mL), nase (E. coli AdhE) (E.C. 1.2.1. {3,4}) generates butanal lysozyme (>1 mg mL), dithioerythritol (0.5 mM), and Tris from butyrate and NADH. Alcohol dehydrogenase (E. coli buffer (20 mM, pH 7.5) followed by rapid freeze-thaw (3 adhE) (E.C. 1.1.1. {1,2}) generates 1-butanol from butanal cycles using liquid nitrogen and at warm water bath). Lysates and NADH, NADPH. Production of 1-butanol is conferred by were clarified by centrifugation for 5 min at >4000 g. the engineered host cell by expression of the above enzyme 0318. The lysates were mixed by combining 20 uL of each activities. into the well of a standard 96-well flat-bottom assay plate. 0312 To create butanol-producing cells, host cells can be The plate was incubated at 30 C. In parallel, lysates from E. further engineered to express acetyl-CoA acetyltransferase coli cultures expressing a metabolically inert gfp gene as a (atoB) from E. coli K12, B-hydroxybutyryl-CoA dehydroge negative control were prepared in an identical fashion. nase from Butyrivibrio Fibrisolvens, crotonase from “Blank” lysates made from the lysis reagent only—i.e. with Clostridium beijerinckii, butyryl CoA dehydrogenase from no cells—were also included as a control. Clostridium beijerinckii, CoA-acylating aldehyde dehydro 0319 A reaction mixture was added to the lysates or lysate genase (ALDH) from Cladosporium fulvum, and adhE mixtures at time Zero so that the final Volume in the well was encoding an aldehyde-alcoholdehydrogenase of Clostridium 200 uL and the final concentration of (non-lysate derived) acetobutylicum (or homologs thereof). reactants was: coenzyme A, 0.5 mM, adenosine triphosphate (ATP), 10 mM, ribulose-5-phosphate (Ru5P), 1 mM, nicotine Example 13 adenine dinucleotide (NAD 1 mM, magnesium sulfate, 5 mM; potassium phosphate buffer pH 7.0, >150 mM; formal Engineered Organisms Producing Acrylate dehyde; 5 mM. The formaldehyde stock solution was previ 0313 Enoyl-CoA hydratase (E. coli paaF) (E.C. 4.2.1.17) ously prepared by autoclaving 240 mg of paraformaldehyde converts 3-hydroxypropionyl-CoA to acryloyl-CoA. Propio powder suspended in 8 mL of pure water at 121 C in a sealed nyl-CoA synthase (E.C. 6.2.1.-, E.C. 4.2.1.- and E.C. 1.3.1.-) septum vial until it was solubilized. also converts 3-hydroxypropionyl-CoA to acryloyl-CoA 0320 In parallel, a separate reaction mixture was prepared (AAL47820, SEQ ID NO:30, SEQ ID NO:31). Acrylate with an identical composition, except that 'C-enriched CoA-transferase (R. eutropha pct) (E.C. 2.8.3.n) generates paraformaldehyde (>99% isotopic purity; Cambridge Isotope acrylate--acetyl-CoA from acryloyl-CoA and acetate. Laboratories, Massachusetts USA) was used as a formalde hyde Source. Example 14 0321. At 0 minutes, 30 minutes, and 120 minutes after the start of the enzyme reactions, 40 uL of the reaction mixture Conversion of Formaldehyde to Central Metabolic was withdrawn from the assay plate and mixed with 160 uL of Intermediates by Lysates of Recombinant E. coli a quenching Solution consisting of 0.1 M formic acid in 40% Cells Viv methanol, 40% w/v acetonitrile, and 20% w/v water. 0314. The hexulose-6-phosphate isomerase (HPS) Samples were vacuum-aspirated to dryness in preparation for enzyme YP 115430 and 6-phospho-3-hexuloisomerase detection by liquid chromatography electrospray ionization (PHI) enzymeYP 115431 were recoded for expression in E. mass spectrometry (LC-ESI-MS) of fructose-6-phosphate coli using the algorithm described in 00109 above and/or pool sizes and fructose-1,6-bisphosphate pool sizes. elsewhere in the present application. Briefly, the algorithm 0322 LC-ESI-MS analysis was carried out on a Thermo attempts to (a) preserve codon rank order frequency in the Q-Exactive LC-ESI-MS system capable of mass determina Source organism (Methylococcus capsulatus) and the target tion to within 5 ppm. Metabolites were eluted from a 100-by organism (E. coli); (b) eliminate undesired restriction endo 2.1 mm hybrid reverse-phase chromatography column with nuclease recognition sequences in the re-coded gene 2.6 um beads (Accucore aO, Thermo Scientific) with a linear sequence; and (c) avoid undesired DNA or RNA secondary gradient consisting of 15 mMacetic acid in ultrapure water as structure in the re-coded gene or its transcript. The resulting the weak solvent and methanol as the strong solvent and nucleotide sequences are provided as SEQ ID NO.47 and introduced to the mass spectrometer via a HESI-III ESI SEQ ID NO:48, respectively. The codon-optimized genes Source. Elution and column reequilibration was carried out were obtained via commercial gene synthesis. under uPLC conditions at a flow rate of 500 uL/min and total 0315 Plasmids encoding a high copy number replication run time of 7 minutes using an Accela 1250 uPLC pump and origin, an antibiotic resistance marker and either a codon Accela OpenAS autosampler. During autosampling, samples optimized heXulose-6-phosphate isomerase from M. capsu were maintained at 4C, while the column was kept at 30 C. latus under the control of a constitutive promoter or a codon ESI source and mass spectrometer acquisition settings were optimized 6-phospho-3-hexuloisomerase from M. capsulatus optimized and operated in both negative and positive polari were constructed using DNA assembly methods described in ties, using a panel of pure standards of metabolites of interest. WO/2010/07025. The resulting plasmids 94.63 (SEQ ID Full MS scans were performed at a resolution of 70,000 over NO:63) and 9462 (SEQID NO:64) were transformed into E. a mass range of 70-900 m/z, allowing for a minimum of 15-20 coli using standard plasmid transformation techniques. scans across each extracted ion chromatogram for absolute 0316 E. coli NEB10/B cells harboring plasmid 9463 were and relative quantitation under uPLC conditions. When grown overnight with selection in Luria Broth (LB) medium needed, tandem MS/MS scans were also performed in both US 2015/003 7853 A1 Feb. 5, 2015 48 targeted and data dependent schemes to obtain additional rrnB-derived constitutive promoter was constructed using structural information via HCD-induced fragmentation of DNA assembly methods described in WO/2010/07025. The intact precursor ions. Metabolite feature identification and resulting plasmid 20566 (SEQ ID NO:65) was transformed full scan quantitation were performed using integration and into E. coli using standard plasmid transformation tech alignment algorithms in Xcalibur (Thermo) and XCMS niques. (Scripps Research Institute). 0326 E. coli cells harboring plasmid 20566 were grown in 0323 Fructose-6-phosphate was detected in negative culture and lysed by the methods described in Example 14. mode as the CHOPanion at m/z 259.0224 Da+/-5 ppm. Lysate-based enzyme reactions were started as described in The M+1 C isotopologue of fructose-6-phosphate (1-'C- Example 14, except that sodium formate 30 mM was used in F6P) was detected at m/z =260.0258 Da+/-5 ppm. A time place of formaldehyde and 30 mM of sodium 'C-formate course showing incorporation of carbon derived from form (>99 atom '% isotopic purity; Cambridge Isotope Laborato aldehyde into fructose-6-phosphate (F6P) is shown in Table ries) was used in place of 'C formaldehyde. 9. The time is in units of minutes and the metabolites are in units of peak area (counts). H'CHO denotes formaldehyde 0327 Samples were withdrawn for LC-ESI-MS analysis and HCHO denotes "C-enriched paraformaldehyde. The as described in Example 14. results shows that carbon from formaldehyde is converted to 0328. Formyl coenzyme A (formyl-CoA) was detected in the native E. coli metabolite fructose-6-phosphate in an HPS negative mode as the CH-N-OPS anion at m/z. 794. and PHI-dependent manner 1028 Da. The M+1 °C isotopologue of formyl-CoA (1-'C- TABLE 9 HPS- and PHI-dependent conversion of formaldehyde H2CHO) to fructose-6-phosphate (FOP with H2CHO with HCHO with no formaldehyde Time F6P 1-C-F6P F6P 1-C-F6P F6P 1-C-F6P HPSPHI O 2.6E--06 NF 1.OE-06 2.OE-06 6.1E--OS NF mixture 3O 4.9E-07 4.1E--06 4.4E--O6 4.7E--O7 6.OE--O6 2.9E-05 12O 31E--07 2.OE-06 5.3E--06 2.7E--07 2.5E-07 3.1E--O6 GFP control O 34E--05 NF 4.OE-05 NF 4.5E-05 NF lysate 3O 11E--O6 NF 11E--O6 NF 1.7E--O6 NF 120 2.9E-06 NF 2.8E--O6 1.2E--04 S2E--O6 NF

Example 15 formyl-CoA) was detected at m/z795.1062 Da. Total counts detected for the m/z =794.1028 ion increased sharply in a time Conversion of Formate to Formyl-CoA in Lysates in reaction mixtures to which sodium formate 30 mM Derived from Recombinant E. coli Cells Expressing Acetyl-CoA Synthetase (H'°COO) was added. Total counts detected for the m/Z=795.1062 ion increased sharply in time only in reaction 0324. The E. coli acetyl-CoA synthetase (ACS) enzyme mixtures to which sodium 30 mM'C-formate (HCOO) AAC77039 was recoded using the algorithm described in was added. In control reactions using lysates from cells not 00109 above, and/or elsewhere in the presentapplication, to expressing acs, no corresponding increases were observed. In eliminate undesired restriction endonuclease recognition sequences in the re-coded gene sequence and avoid undesired control reactions using lysis buffer and no cellular material, DNA or RNA secondary structure in the re-coded gene or its no corresponding increase was observed. The data are shown transcript. The resulting nucleotide sequences is provided as in Table 10. The time is inunits of minutes and the metabolites SEQID NO:61. The codon-optimized gene was obtained via are in units of peak area (counts). Formyl-CoA is denoted by commercial gene synthesis. f-CoA and the M+1 C isotopologue of formyl-CoA is 0325 A plasmid encoding a medium copy number repli denoted by 1-'C-f-CoA. The results demonstrate that for cation origin, an antibiotic resistance marker and the recoded mate was converted to formyl-CoA in a formate- and ACS acetyl-CoA synthetase from E. coli under the control of an dependent manner. TABLE 10 ACS-dependent conversion of formate to formyl-CoA with H2COO with HCOO with no formate

Time f-CoA 1-C-f-CoA f-CoA 1-C-f-CoA f-CoA 1-C-f-CoA

Plasmid O 2.OE-08 5.9E-07 15E-08 4SE-07 4.5E-07 1.OE-09 20566 30 2.6E--08 6.9E-07 S.SE-07 7.5E-07 1.7E--08 4.6E--07 lysate 120 1.3E--09 3.7E--08 4SE-07 1.OE-09 7.9E-07 21E--O7 GFP O 2.OE-08 S.6E--07 15E-08 4.4E--07 1.7E--08 4.9E-07 control 30 2.OE-08 S.6E--07 15E-08 4SE-07 14E--08 3.9E-07 lysate 120 18E--08 4.8E--O7 1.4E--08 4.4E--07 15E-08 4.2E--07 US 2015/003 7853 A1 Feb. 5, 2015 49

Example 16 Other Embodiments Interconversion of Formyl-CoA and Formaldehyde 0334. The examples have focused on E. coli. Nevertheless, by Lysates Derived from Recombinant E. coli Cells the key concept of using genetically engineering to converta Expressing an Acylating Aldehyde Dehydrogenase heterotroph into an engineered chemoautotrophis extensible to other, more complex organisms such as other prokaryotic 0329. The Listeria monocytogenes acetaldehyde dehydro or eukaryotic single cell organisms such as E. coli or S. genase, acylating (ADH) enzyme NP 464704 was recoded cerevisiae, hosts suitable for scale up during fermentation, for expression in E. coli using the algorithm described in archaea, plant cells or cell lines, mammalian cells or cell 00109 and Example 14 above, and/or elsewhere in the lines, or insect cells or cell lines. Alternatively, the same present application. The resulting nucleotide sequences is energy conversion, carbon fixation and/or carbon product provided as SEQID NO:62. biosynthetic pathways described here may be used to enhance 0330. A plasmid encoding a high copy number replication or augment the autotrophic capability of an organism that is origin, an antibiotic resistance marker and the recoded ADH natively autotrophic. under the control of an isopropyl B-D-1-thiogalactopyrano side (IPTG)-inducible bacteriophage T7-based promoter was 0335 Various aspects of the present invention may be used designed by us and synthesized via commercial gene synthe alone, in combination, or in a variety of arrangements not sis. The resulting plasmid 27439 (SEQID NO:66) was trans specifically discussed in the embodiments described in the formed into E. coli using standard plasmid transformation foregoing and is therefore not limited in its application to the techniques. details and arrangement of components set forth in the fore 0331 Cells were grown as described in Example 14, going description or illustrated in the drawings. For example, except that after 2 hr of incubation in the overnight growth aspects described in one embodiment may be combined in medium, 1 mM of IPTG was added to induce gene expres any manner with aspects described in other embodiments. S1O. 0336 Use of ordinal terms such as “first,” “second, 0332 Lysates were prepared and reactions were initiated “third,' etc., in the claims to modify a claim element does not as described in Example 14, except (i) that ATP and ribulose by itself connote any priority, precedence, or order of one 5-phosphate were omitted from the reaction and (ii) time claim element over another or the temporal order in which point samples were taken after 0, 3, and 10 minutes after acts of a method are performed, but are used merely as labels starting the reactions. to distinguish one claim element having a certain name from 0333 Formyl-CoA was detected as described in Example another element having a same name (but for use of the 15. Total counts detected for the m/z-794.1028 ion increased ordinal term) to distinguish the claim elements. sharply in time in reaction mixtures to which formaldehyde 5 0337 Also, the phraseology and terminology used herein mM was added. Total counts detected for the m/z-795. 1062 is for the purpose of description and should not be regarded as ion increased sharply with time only in reaction mixtures to limiting. The use of “including.” “comprising,” or “having.” which C formaldehyde 5 mM was added. In control reac “containing.” “involving.” and variations thereof herein, is tions using lysates from cells that were not induced to express meant to encompass the items listed thereafter and equiva ADH, no corresponding increases were observed. LC-ESI lents thereofas well as additional items. MS detection of both NAD and NADH (in negative mode using m/z 662.1018 and m/z =664. 1175, respectively) con EQUIVALENTS firmed that formyl-CoA formation was linked to NAD deple tion and NADH formation (data not shown). The data are 0338. The present invention provides among other things shown in Table 11. The time is in units of minutes and the novel methods and systems for synthetic biology. While spe metabolites are in units of peak area (counts). Formyl-CoA is cific embodiments of the subject invention have been dis denoted by f-CoA, and the M+1 °C isotopologue of formyl cussed, the above specification is illustrative and not restric CoA is denoted by 1-'C-f-CoA. The results show that ADH tive. Many variations of the invention will become apparent to expressed in E. coli lysates effects the interconversion of those skilled in the art upon review of this specification. The formaldehyde and NAD with formyl-CoA and NADH. full scope of the invention should be determined by reference TABLE 11

ADH-dependent interconversion of formaldehyde and formyl-CoA

with HCHO with HCHO with no formaldehyde

Time f-CoA 1-C-f-CoA f-CoA 1-C-f-CoA f-CoA 1-C-f-CoA induced O 5.3E--06 NF NF NF 4.5E-05 NF NF 10 1.7E--08 4.4E--07 NF 11E--08 NF NF uninduced O NF NF 3 2.9E-06 NF 10 NF NF US 2015/003 7853 A1 Feb. 5, 2015 50 to the claims, along with their full scope of equivalents, and 0351 Bai F. W. Anderson WA, Moo-Young M. Ethanol the specification, along with Such variations. fermentation technologies from Sugar and starch feed stocks. Biotechnol Adv. 2008 January-February; 26(1):89 INCORPORATION BY REFERENCE 105. 0352 Bailer J. de Hueber K. Determination of saponifi 0339 All publications, patents and patent applications ref able glycerol in “bio-diesel.” Fresenius JAnal Chem. 1991; erenced in this specification are incorporated herein by refer 340(3):186. ence in their entirety for all purposes to the same extent as if 0353 Bar-Even A, Noor E. Lewis NE, Milo R. Design and each individual publication, patent or patent application were analysis of synthetic carbon fixation pathways. Proc Natl specifically indicated to be so incorporated by reference. AcadSci USA. 2010 May 11; 107(19):8889-94. 0354 Bassham J A, Benson A A. Kay L. D. Harris A Z. REFERENCES CITED Wilson AT, Calvin M. The path of carbon in photosynthe sis. XXI. The cyclic regeneration of carbon dioxide accep 0340 Abelseon PH, Hoering T C. Carbon isotope frac tor. JAm ChemSoc. 1954; 76:1760-70. tionation in formation of amino acids by photosynthetic 0355 Bayer TS, Widmaier DM, Temme K, Mirsky EA, organisms. Proc Natl Acad Sci. 1961; 47:623-32. Santi D V. Voigt CA. Synthesis of methyl halides from 0341 Aharoni A, Keizer L. C. Bouwmeester H.J. Sun Z. biomass using engineered microbes. JAm ChemSoc. 2009 Alvarez-Huerta M, Verhoeven HA, Blaas J. van Houwelin May 13; 131(18):6508-15. gen AM, De Vos RC, Vander Voet H. Jansen RC, Guis M. 0356 Berríos-Rivera SJ, San KY, Bennett GN. The effect Mol J, Davis RW, Schena M, van Tunen AJ, O'Connell A of NAPRTase overexpression on the total levels of NAD. P. Identification of the SAAT gene involved in strawberry the NADH/NAD+ ratio, and the distribution of metabolites flavor biogenesis by use of DNA microarrays. Plant Cell. in Escherichia coli. Metab Eng. 2002 July; 4(3):238-47. 2000 May; 12(5):647-62. 0357 Brock T. Biotechnology: A Textbook of Industrial 0342 Alber B E, Fuchs G Propionyl-coenzyme A syn Microbiology. Second Edition. Sinauer Associates, Inc. thase from Chloroflexus aurantiacus, a key enzyme of the Sunderland, Mass. 1989. 3-hydroxypropionate cycle for autotrophic CO, fixation. J 0358 Brugna-Guiral M, Tron P. Nitschke W. Stetter KO, Biol Chem. 2002 Apr. 5; 277(14): 12137-43. Burlat B, Guigliarelli B, Bruschi M, Giudici-Orticoni MT. (0343 Alber B, Olinger M, Rieder A, Kockelkorn D, Jobst NiFe hydrogenases from the hyperthermophilic bacte B, Higler M, and Fuchs G Malonyl-coenzyme A reductase rium Aquifex aeolicus: properties, function, and phyloge in the modified 3-hydroxypropionate cycle for autotrophic netics. Extremophiles. 2003 April; 7(2):145-57. carbon fixation in archaeal Metallosphaera and Sulfolobus 0359 Buchanan B B, Amon DI. A reverse KREBS cycle spp. J Bacteriol 2006 December; 188(24) 8551-9. in photosynthesis: consensus at last. Photosynth Res. 1990; 24:47-53. (0344 Andersen J B, Sternberg C. Poulsen L. K. Bjorn SP, 0360 BurgdorfT, vander Linden E. Bernhard M. Yin QY. Givskov M. Molin S. New unstable variants of green fluo Back J. W. Hartog A F, Muijsers A. O. de Koster C G: rescent protein for studies of transient gene expression in Albracht S P Friedrich B. The soluble NAD"-Reducing bacteria. Appl Environ Microbiol. 1998 June: 64(6):2240 NiFel-hydrogenase from Ralstonia eutropha H16 con 6. sists of six subunits and can be specifically activated by 0345 Anderson J C, Voigt CA, Arkin A R Environmental NADPH. J. Bacteriol. 2005 May; 187(9): 3122-32. signal integration by a modular AND gate. Mol Syst Biol. 0361 Camilli A, Bassler B L. Bacterial small-molecule 2007; 3:133. signaling pathways. Science. 2006 Feb. 24; 311 (5764): 0346 Aoshima M. Ishii M., and Igarashi Y. A novel 1113-6. enzyme, citryl-CoA synthetase, catalysing the first step of 0362 Campbell BJ, Jeanthon C, Kostka J. E. Luther GW the citrate cleavage reaction in Hydrogenobacter thermo 3", Cary S.C. Growth and phylogenetic properties of novel philus TK-6. Mol Microbiol 2004 May: 52(3) 751-61. (a) bacteria belonging to the epsilon subdivision of the Pro 0347 Aoshima M. Ishii M., and Igarashi Y. A novel teobacteria enriched from Alvinella pompejana and deep enzyme, citryl-CoA lyase, catalysing the second step of the sea hydrothermal vents. Appl Environ Microbiol. 2001 citrate cleavage reaction in Hydrogenobacter thermophilus October; 67(10):4566-72. TK-6. Mol Microbiol 2004 May: 52(3) 763-70. (b) 0363 Campbell BJ, Smith J. L. Hanson T E. Klotz, MG, Stein L. Y. Lee C K, Wu D, Robinson J M, Khouri HM, 0348 Aoshima M, Ishii M., and Igarashi Y. A novel biotin Eisen JA, Cary S. C. Adaptations to submarine hydrother protein required for reductive carboxylation of 2-oxoglut mal environments exemplified by the genome of Nautilia arate by isocitrate dehydrogenase in Hydrogenobacter profundicola. PLoS Genet. 2009 February: 5(2): thermophilus TK-6. Mol Microbiol 2004 February: 51(3) e1OOO362. 791-8 (C) 0364 Canton B. Labno A, Endy D. Refinement and stan 0349 Aoshima Mand IgarashiY. A novel oxalosuccinate dardization of synthetic biological parts and devices. Nat forming enzyme involved in the reductive carboxylation of Biotechnol. 2008 July; 26(7):787-93. 2-oxoglutarate in Hydrogenobacter thermophilus TK-6. 0365 Cheesbrough TM, Kolattukudy PE. Alkane biosyn Mol Microbiol 2006 November; 62(3) 748-59. thesis by decarbonylation of aldehydes catalyzed by a par 0350 Baba T, Ara T. Hasegawa M, Takai Y, Okumura Y. ticulate preparation from Pisum sativum. Proc Natl Acad Baba M., Datsenko KA, Tomita M, Wanner B L, Mori H. Sci USA. 1984 November; 81 (21):6613-7. Construction of Escherichia coli K-12 in-frame, single 0366 Chen S, von Bamberg D. Hale V. Breuer M, Hardt B, gene knockout mutants: the Keio collection. Mol Syst Biol. Müller R, Floss H G, Reynolds KA, Leistner E. Biosyn 2006: 2:2006.0008. thesis of ansatrienin (mycotrienin) and naphthomycin. US 2015/003 7853 A1 Feb. 5, 2015

Identification and analysis of two separate biosynthetic 0381 Edwards J S. Ramakrishna R, Palsson B O. Charac gene clusters in Streptomyces collinus TU. 1892. Eur J terizing the metabolic phenotype: a phenotype phase plane Biochem. 1999 April; 261 (1):98-107. analysis. Biotechnol Bioeng. 2002 Jan. 5; 77(1):27-36. 0367 Chin J. W. Chino PC. Improved NADPH supply for 0382 Eisenreich W. Strauss G. WerZU, Fuchs G, Bacher xylitol production by engineered Escherichia coli with A. Retrobiosynthetic analysis of carbon fixation in the glycolytic mutations. Biotechnol Prog. 2011 March-April; phototrophic eubacterium Chloroflexus aurantiacus. Eur J 27(2):333-41. doi:10.1002/btprS59. Biochem. 1993 Aug. 1; 215(3):619-32. 0368 Cline J. D. Spectrophotometric Determination of 0383 Evans M C, Buchanan B B, Arnon D. I. A new Hydrogen Sulfide in Natural Waters. Limnol Oceanogr. ferredoxin-dependent carbon reduction cycle in a photo 1969; 14(3):454-8. synthetic bacterium. Proc Natl AcadSci USA. 1966 April; 0369 Cronan J. E. LaPorte D. Tricarboxylic Acid Cycle 55(4):928-34. and Glyoxylate Bypass. In A. Bock, R. Curtiss III, J. B. (0384 Evans CT, Sumegi B, Srere PA, Sherry AD, Malloy Kaper, P. D. Karp, F. C. Neidhardt, T. Nystrom, J. M. C R. 13Cpropionate oxidation in wild-type and citrate Slauch, C. L. Squires, and D. Ussery (ed.), EcoSal—Es synthase mutant Escherichia coli: evidence for multiple cherichia coli and Salmonella: Cellular and Molecular pathways of propionate utilization. Biochem J. 1993 May Biology. http://www.ecosal.org. ASM Press, Washington, 1: 291 (Pt3):927-32. D.C. 2010 Mar. 12. (0385 Farquhar G D, Ehleringer J R, and Hubick K. T. 0370 CroppTA, Wilson DJ, Reynolds KA. Identification Carbon isotope discrimination and photosynthesis. Annu of a cyclohexylcarbonyl CoA biosynthetic gene cluster and Rev Plant Physiol Plant Mol Biol. 1989: 40:503-37. application in the production of doramectin. Nat Biotech 0386 Ferenci T. Strom T. and Quayle J. R. Purification and nol. 2000 September; 18(9):980-3. properties of 3-hexulose phosphate synthase and phospho 0371 Davis J. H. Rubin AJ, Sauer RT. Design, construc 3-hexuloisomerase from Methylococcus capsulatus. Bio tion and characterization of a set of insulated bacterial promoters. Nucleic Acids Res. 2011 February; chem J 1974 December; 144(3)477-86. 39(3):1131-41. (0387 Fogel G. B. Collins CR, Li J, Brunk C F. Prokaryotic 0372 de Mendoza D, Klages Ulrich A, Cronan J E Jr. GenomeSize and SSU rNACopy Number: Estimation of Thermal regulation of membrane fluidity in Escherichia Microbial Relative Abundance from a Mixed Population. coli. Effects of overproduction of beta-ketoacyl-acyl car Microb Ecol. 1999 August;38(2):93-113. rier protein synthase I. J Biol Chem. 1983 Feb. 25:258(4): (0388 Fong S S, Palsson B O. Metabolic gene-deletion 2098-101. strains of Escherichia coli evolve to computationally pre 0373) Dellomonaco C, Clomburg JM, Miller EN, Gonza dicted growth phenotypes. Nat Genet. 2004 October; lez R. Engineered reversal of the B-oxidation cycle for the 36(10): 1056-8. synthesis of fuels and chemicals. Nature. 2011 Aug. 10; (0389 Friedmann S, SteindorfA, AlberBE, Fuchs G Prop 476(7360):355-9. erties of Succinyl-coenzyme A:L-malate coenzyme A 0374 Dennis M. W. Kolattukudy PE. Alkane biosynthesis transferase and its role in the autotrophic 3-hydroxypropi by decarbonylation of aldehyde catalyzed by a microsomal onate cycle of Chloroflexus aurantiacus. J Bacteriol. 2006 preparation from Botryococcus braunii. Arch Biochem April; 188(7):2646-55. Biophys. 1991 June: 287(2):268-75. 0390 Friedmann S, Alber B E, Fuchs G. Properties of 0375 Denoya CD, Fedechko RW. Hafner E.W. McArthur R-citramalyl-coenzyme A lyase and its role in the HA, Morgenstern MR, Skinner D D, Stutzman-Engwall autotrophic 3-hydroxypropionate cycle of Chloroflexus K. Wax RG, Wemau WC. A second branched-chain alpha aurantiacus. J Bacteriol. 2007 April; 189(7):2906-14. keto acid dehydrogenase gene cluster (bkdFGH) from 0391 Gehring U and Arnon DI. Purification and proper Streptomyces avermitilis: its relationship to avermectin ties of -ketoglutarate synthase from a photosynthetic bac biosynthesis and the construction of abkdF mutant suitable terium. J Biol Chem 1972 Nov. 10; 247(21) 6963-9. for the production of novel antiparasitic avermectins. J 0392 Gerhold D, Rushmore T, Caskey CT. DNA chips: Bacteriol. 1995 June; 177(12):3504-11. promising toys have become powerful tools. Trends Bio 0376 Deshpande MV. Ethanol production from cellulose chem Sci. 1999 May; 24(5):168-73. by coupled saccharification/fermentation using Saccharo 0393 Goericke R, Montoya J. P. Fry B. Physiology of myces cerevisiae and cellulase complex from Sclerotium isotopic fractionation in algae and cyanobacteria. Chapter rolfsii UV-8 mutant. Appl Biochem Biotechnol. 1992 Sep 9 in "Stable Isotopes in Ecology and Environmental Sci tember: 36(3):227-34. ence”. Blackwell Publishing. 1994. 0377 Dettman DL, Reische AK, Lohmann KC. Controls 0394 Grantham R, Gautier C, Gouy M, Mercier R, Pavé on the stable isotope composition of seasonal growth bands A. Codon catalog usage and the genome hypothesis. in aragonitic fresh-water bivalves (unionidae). Geochim Nucleic Acids Res. 1980 Jan. 11; 8(1):r49-rô2. Cosmochim Acta. 1999; 63:1049-57. 0395 Greene D N, Whitney S M, Matsumura I. Artifi 0378 Doolittle, RF (Editor). Computer Methods for Mac cially evolved Synechococcus PCC6301 Rubisco variants romolecular Sequence Analysis. Methods in Enzymology. exhibit improvements in folding and catalytic efficiency. 1996; 266:3-711. Biochem J. 2007 Jun. 15:404(3):517-24. 0379 Edgar R.C. MUSCLE: multiple sequence alignment 0396 Griesbeck C, Hauska G, Schutz M. Biological Sul with high accuracy and high throughput. Nucleic Acids fide Oxidation: Sulfide-Quinone Reductase (SQR), the Pri Res. 2004 Mar. 19; 32(5):1792-7. (a) mary Reaction. Recent Research Developments in Micro 0380 Edgar R C. MUSCLE: a multiple sequence align biology. 2000; 4:179-203. ment method with reduced time and space complexity. 0397 Gul-Karaguler N, Session R B, Clarke A R, Hol BMC Bioinformatics. 2004 Aug. 19: 5:113. (b) brook J.J. A single mutation in the NAD-specific formate US 2015/003 7853 A1 Feb. 5, 2015 52

dehydrogenase from Candida methylica allows the 0414 Huisman GW. Gray D. Towards novel processes for enzyme to use NADP. Biotechnol Lett. 2001; 23(4):283-7. the fine-chemical and pharmaceutical industries. Curr 0398 Gutteridge S. Phillips A L. Kettleborough CA, Opin Biotechnol. 2002 August; 13(4):352-8. Parry M A J. Expression of bacterial Rubisco genes in 0415 Ikeda T. Yamamoto M, Arai H, Ohmori D, Ishii M., Escherichia coli. Phil Trans R Soc Lond B 313:433-45. Igarashi Y. Two tandemly arranged ferredoxin genes in the 0399 Han L, Reynolds K.A. A novel alternate anaplerotic Hydrogenobacter thermophilus genome: comparative pathway to the glyoxylate cycle in Streptomycetes. J Bac characterization of the recombinant 4Fe-4S ferredoxins. teriol. 1997 August; 179(16):5157-64. Biosci Biotechnol Biochem. 2005 June: 69(6):1172-7. 0400 Hatrongjit R, Packdibamrung K. A novel NADP+- 0416) Inokuma K. Nakashimada Y. Akahoshi T. Nishio N. dependent formate dehydrogenase from Burkholderia sta Characterization of enzymes involved in the ethanol pro bilis 15516: Screening, purification and characterization. duction of Moorella sp. HUC22-1. Arch Microbiol. 2007 Enzyme Microb Technol. 2010 Jun. 7:46(7):557-61. July; 188(1):37-45. 0401 Hawley DK, McClure W R. Compilation and analy 0417 Ivlev A A. Carbon isotope effects (C/°C) in bio sis of Escherichia coli promoter DNA sequences. Nucleic logical systems. Separation SciTechnol. 2010; 36:1819 Acids Res. 1983 Apr. 25; 11 (8):2237-55. 1914. Janausch Zientz E. Tran Q H. Kroger A. Unden G 0402 Hayes J. M. Fractionation of Carbon and Hydrogen C4-dicarboxylate carriers and sensors in bacteria. Biochim Isotopes in Biosynthetic Processes. Rev Mineral Biophys Acta. 2002 Jan. 17: 1553(1-2):39-56. Geochem. 2001 January; 43(1):225-77. 0418 Jukes T H, Osawa S. Evolutionary changes in the 0403. Helling R B, Kukora J S. Nalidixic acd-resistant genetic code. Comp Biochem Physiol B. 1993 November; mutants of Escherichia coli deficient in isocitrate dehydro 106(3):489-94. genase. J Bacteriol. 1971 March; 105(3):1224-6. 0419 Kalscheuer R. Steinbichel A. A novel bifunctional 04.04 Henry C S, Jankowski M. D. Broadbelt LJ, Hatzi wax ester synthase/acyl-CoA:diacylglycerol acyltrans manikatis V. Genome-scale thermodynamic analysis of ferase mediates wax ester and triacylglycerol biosynthesis Escherichia coli metabolism. Biophys J. 2006 Feb. 15: in Acinetobacter calcoaceticus ADP1. J Biol Chem. 2003 90(4): 1453-61. Mar. 7: 278(10):8075-82. 04.05 Henstra A M, Sipma J. Rinzema A, Stams A J. 0420 Kalscheuer R. Stolting T. Steinbüchel A. Microdie Microbiology of synthesis gas fermentation for biofuel sel: Escherichia coli engineered for fuel production. production. Curr Opin Biotechnol. 2007 June; 18(3):200 Microbiology. 2006 September; 152(Pt 9):2529-36. 6. 0421 Kanao T. Kawamura M., Fukui T, Atomi H, and 0406. Herter S, Fuchs G, Bacher A, Eisenreich W. A bicy Imanaka T. Characterization of isocitrate dehydrogenase clic autotrophic CO, fixation pathway in Chloroflexus from the green sulfur bacterium Chlorobium limicola. A aurantiacus. J Biol Chem. 2002 Jun. 7:277 (23):20277-83. carbon dioxide-fixing enzyme in the reductive tricarboxy (a) 04.07 Herter S. Busch A, Fuchs G L-Malyl-coenzyme A lic acid cycle. Eur J. Biochem 2002 April; 269(7) 1926-31. lyase/beta-methylmalyl-coenzyme A lyase from Chlorof. (a) lexus aurantiacus, a bifunctional enzyme involved in 0422 Kanao T. Fukui T. Atomi H, and Imanaka T. Kinetic autotrophic CO fixation. J Bacteriol. 2002 November; and biochemical analyses on the reaction mechanism of a 184(21):5999-6006. (b) bacterial ATP-citrate lyase. Eur J Biochem 2002 July; 269 0408 Ho N. W. Chen Z, Brainard A R Genetically engi (14)3409-16. (b) neered Saccharomyces yeast capable of effective cofer 0423 Kaneda T. Iso- and anteiso-fatty acids in bacteria: mentation of glucose and Xylose. Appl Environ Microbiol. biosynthesis, function, and taxonomic significance. Micro 1998 May: 64(5):1852-9. biol Rev. 1991 June; 55(2):288-302. 04.09 Hoffmeister M, Piotrowski M. Nowitzki U, Martin 0424 Kapust RB, Waugh D S. Escherichia coli maltose W. Mitochondrial trans-2-enoyl-CoA reductase of wax binding protein is uncommonly effective at promoting the ester fermentation from Euglena gracilis defines a new solubility of polypeptides to which it is fused. Protein Sci. family of enzymes involved in lipid synthesis. J Biol Chem. 1999 August; 8(8):1668-74. 2005 Feb. 11; 280(6):4329-38. 0425 Keasling J D, Jones K L. Van Dien SJ. New Tools 0410 Holo H. Chloroflexus aurantiacus secretes 3-hy for Metabolic Engineering of Escherichica coli. Chapter 5 droxypropionate, a possible intermediate in the assimila in Metabolic Engineering. Marcel Dekker. New York, N.Y. tion of CO, and acetate. Arch Microbiol. 1989; 151(3): 1999. (a) 252-6. 0426 Keasling J D. Gene-expression tools for the meta 0411 Higler M, Menendez. C. Schagger H, Fuchs G bolic engineering of bacteria. Trends Biotechnol. 1999 Malonyl-coenzyme A reductase from Chloroflexus auran November; 17(14452-60. (b) tiacus, a key enzyme of the 3-hydroxypropionate cycle for 0427 Kelly D P. Wood P. Gottschal J C, Kuenen J. G. autotrophic CO, fixation. J Bacteriol. 2002 May: 184(9): Autotrophic metabolism of formate by Thiobacillus strain 2404-10. A2. J Gen Microbiol. 1979; 114:1-13. 0412 Higler M, Huber H, Molyneaux S.J., Vetriani C, 0428 Kelly J R, Rubin AJ, Davis J. H., Ajo-Franklin CM, Sievert S. M. Autotrophic CO, fixation via the reductive Cumbers J. Czar M.J. de Mora K, Glieberman AL, Monie tricarboxylic acid cycle in different lineages within the D D, Endy D. Measuring the activity of BioBrick promot phylum Aquificae: evidence for two ways of citrate cleav ers using an in vivo reference standard. J Biol Eng. 2009 age. Environ Microbiol. 2007 January: 9(1):81-92. Mar. 20; 3:4. 0413 Higler M, Sievert S M. Beyond the Calvin cycle: 0429 Kemp M. B. The hexose phosphate synthetase of autotrophic carbon fixation in the ocean. Ann Rev Mar Sci. Methylococcus capsulatus. Biochem J. 1972 April; 127 2011; 3:261-89. (3):64P-65P. US 2015/003 7853 A1 Feb. 5, 2015

0430 Kemp MB. Hexose phosphate synthase from Meth 0448 Martinez-Alonso M, Toledo-Rubio V. Noad R, ylcoccus capsulatus makes D-arabino-3-hexulose phos Unzueta U, Ferrer-Miralles N, Roy P. Villaverde A. phate. Biochem J. 1974 April; 139(1): 129-34. Rehosting of bacterial chaperones for high-quality protein 0431 Kim O B. Unden G. The L-tartrate? succinate anti production. Appl Environ Microbiol. 2009 December; porter TtdT (YgE) of L-tartrate fermentation in Escheri 75(24):7850-4. chia coli. J. Bacteriol. 2007 March; 189(5):1597–603. 0449 Martinez-Alonso M. Garcia-Fruitos E. Ferrer 0432 Kim JY, Jo B H, Cha HJ. Production of biohydro Miralles N. Rinas U, Villaverde A. Side effects of chaper gen by heterologous expression of oxygen-tolerant Hydro one gene co-expression in recombinant protein production. genovibrio marinus NiFel-hydrogenase in Escherichia Microb Cell Fact. 2010 Sep. 2:9:64. coli. J. Biotechnol. 2011 Jul. 20. 0450 Marty J. Planas D. Comparison of methods to deter 0433 Klimke W. Agarwala R. Badretdin A, Chetvernin S, mine algalo"Cin freshwater. Linmol Oceanogr: Methods. Ciufo S, Fedorov B, Kiryutin B, O'Neill K, Resch W. 2008; 6:51-63. Resenchuk S, Schafer S, Tolstoy I, Tatusova T. The 0451 Menendez C, Bauer Z, Huber H, Gadon N, Stetter National Center for Biotechnology Information’s Protein KO, Fuchs G. Presence of acetyl coenzyme A (CoA) Clusters Database. Nucleic Acids Res. 2009 January; carboxylase and propionyl-CoA carboxylase in 37(Database issue): D216-23. autotrophic Crenarchaeota and indication for operation of 0434 Knight T. Idempotent Vector Design for Standard a 3-hydroxypropionate cycle in autotrophic carbon fixa Assembly of Biobricks. DOI: 1721.1/21168. tion. J Bacteriol. 1999 February; 181(4): 1088-98. 0435 Knight T. BBF RFC10: Draft Standard for Bio 0452 Minshull J, Stemmer W. P. Protein evolution by Brick(R) biological parts. DOI: 1721.1/45138. molecular breeding. Curr Opin Chem Biol. 1999 June; 0436 Larkum A W. Limitations and prospects of natural 3(3):284-90. photosynthesis for bioenergy production. Curr Opin Bio 0453 Miroshnichenko ML, Kostrikina NA, L’Haridon S. technol. 2010 June; 21(3):271-6. Jeanthon C, Hippe H. Stackebrandt E. Bonch-Os 0437. Knothe G, Dunn RO, Bagby M. O. Biodiesel: The molovskaya E. A. Nautilia lithotrophica gen. nov... sp. nov., use of vegetable oils and their derivatives as alternative a thermophilic Sulfur-reducing epsilon-proteobacterium diesel fuels. Am Chem Soc Symp Series. 1997: 666:172 isolated from a deep-sea hydrothermal vent. IntJSyst Evol 208. Microbiol. 2002 July; 52(Pt 4): 1299-304. 0438 Knothe G. Rapid monitoring of transesterification 0454 Mitsui R, Sakai Y. Yasueda H. Kato N. A novel and assessing biodiesel fuel quality by NIR spectroscopy operon encoding formaldehyde fixation: the ribulose using a fiber-optic probe. JAm Oil ChemSoc. 1999: 76(7): monophosphate pathway in the gram-positive facultative 795-800. methylotrophic bacterium Mycobacterium gastri MB19. J 0439 Knothe G. Dependence of biodiesel fuel properties Bacteriol. 2000 February; 182(4):944-8. on the structure of fatty acid alkyl Esters. Fuel Process 0455 Monson KD, Hayes J.M. Biosynthetic control of the Technol. 2005: 86:1059-1070. natural abundance of carbon 13 at specific positions within 0440 Kolkman JA, Stemmer W. P. Directed evolution of fatty acids in Escherichia coli. J. Biol Chem. 1980; 255: proteins by exon shuffling. Nat Biotechnol. 2001 May; 11435-41. 19(5):423-8. 0441 Komers K, Skopal F. Stloukal R. Determination of 0456 Moriya Y. Itoh M, Okuda S. Yoshizawa AC, Kane the neutralization number for biodiesel fuel production. hisa M. KAAS: an automatic genome annotation and path Fett/Lipid. 1997: 99(2):52-54. way reconstruction server. Nucleic Acids Res. 2007 July; 0442 Lame TA, Kurz. W. G. Estimation of nitrogenase 35(Web Server issue):W182-5. using a colorimetric determination for ethylene. Plant 0457. Morweiser M. Kruse 0, Hankamer B, Posten C. Physiol. 1973 June; 51(6):1074-5. Developments and perspectives of photobioreactors for 0443 Li Y., Florova G, Reynolds K. A. Alteration of the biofuel production. Appl Microbiol Biotechnol. 2010 July; fatty acid profile of Streptomyces coelicolor by replace 87(4): 1291-301. ment of the initiation enzyme 3-ketoacyl acyl carrier pro 04.58 MurliS, Opperman T. Smith BT, Walker G.C. A role tein synthase III (FabH). J Bacteriol. 2005 June; 187(11): for the umul)C gene products of Escherichia coli in 3795-9. increasing resistance to DNA damage in stationary phase 0444 Liu CL, Mortenson L. E. Formate dehydrogenase of by inhibiting the transition to exponential growth. J Bac Clostridium pasteurianum. J Bacteriol. 1984 July; teriol. 2000 February; 182(4): 1127-35. 159(4375-80. 0459 Murtagh, F. Complexities of Hierarchic Clustering 0445 Marcia M, Ermler U, Peng G, Michel H. Anew Algorithms: the State of the Art. Computational Statistics structure-based classification of Sulfide: quinone oxi Quarterly. 1984; 1:101-13. doreductases. Proteins. 2010 April; 78(5):1073-83. 0460 Nature Genetics. 1999; 21 (1):1-60. 0446 Marrakchi H. Zhang Y M. Rock C.O. Mechanistic 0461) Ness JE, Del Cardayre SB, Minshull J, Stemmer W diversity and regulation of Type II fatty acid synthesis. P. Molecular breeding: the natural approach to protein Biochem Soc Trans. 2002 November; 30(Pt6):1050-5. (a) design. Adv. Protein Chem. 2000: 55:261-92. Marrakchi H, Choi KH, Rock C.O. A new mechanism for 0462) Ober J.A. Sulfur. U.S. Geological Survery Minerals anaerobic unsaturated fatty acid formation in Streptococ Report 2008. 2010; 74:1-17. cus pneumoniae. J Biol Chem. 2002 Nov. 22: 277(47): 0463 Orita I, Yurimoto H, Hirai R. Kawarabayasi Y. Sakai 44809-16. (b) Y. Kato N. The archaeon Pyrococcus horikoshii possesses 0447 Martin VJJ. Smolke C, Keasling J D. Redesigning a bifunctional enzyme for formaldehyde fixation via the cells for production of complex organic molecules. ASM ribulose monophosphate pathway. J Bacteriol. 2005 June; News. 2002: 68:336-343. 187(11):3636-42. US 2015/003 7853 A1 Feb. 5, 2015 54

0464 Orita I, Sato T. Yurimoto H, Kato N, Atomi H, 0479. Roessner C A, Spencer J B, Ozaki S, Min C, Imanaka T. Sakai Y. The ribulose monophosphate pathway Atshaves B P Nayar P. Anousis N, Stolowich NJ, Holder Substitutes for the missing pentose phosphate pathway in man MT. Scott AI. Overexpression in Escherichia coli of the archaeon Thermococcus kodakaraensis. J Bacteriol. 12 vitamin B12 biosynthetic enzymes. Protein Expr Purif 2006 July; 188(13):4698-704. 1995 April; 6(2):155-63. 0465. Orita I, Sakamoto N. Kato N.Yurimoto H, and Sakai 0480 Sachdev D, Chirgwin J M. Solubility of proteins Y. Bifunctional enzyme fusion of 3-hexulose-6-phosphate isolated from inclusion bodies is enhanced by fusion to synthase and 6-phospho-3-hexuloisomerase. Appl Micro maltose-binding protein or thioredoxin. Protein Expr biol Biotechnol 2007 August: 76(2) 439-45. Purif. 1998 February; 12(1): 122-32. 0466 Palaniappan N, Kim B. S. Sekiyama Y. Osada H, 0481 Sachdev D, Chirgwin J M. Fusions to maltose-bind Reynolds K. A. Enhancement and selective production of ing protein: control of folding and solubility in protein phoslactomycin B, a protein phosphatase IIa inhibitor, purification. Methods Enzymol. 2000: 326:312-21. through identification and engineering of the correspond 0482 Saitou N. Nei M. The neighbor-joining method: a ing biosynthetic gene cluster. J Biol Chem. 2003 Sep. 12; new method for reconstructing phylogenetic trees. Mol 278(37):35552-7. Biol Evol. 1987 July; 4(4):406-25. 0467 Park MO. New pathway for long-chain n-alkane 0483 Sakata S. Hayes J. M. McTaggart A. R. Evans R A, synthesis via 1-alcohol in Vibrio furnissii M1. J Bacteriol. Leckrone K.J. Togasaki RK. Carbon isotopic fractionation 2005 February; 187(4): 1426-9. associated with lipid biosynthesis by a cyanobacterium: 0468 Parikh MR, Greene DN, Woods KK, Matsumura I. relevance for interpretation of biomarker records. Directed evolution of RuBisCO hypermorphs through Geochim Cosmochim Acta. 1997: 61:5379-89. genetic selection in engineered E. coli. Protein Eng Des 0484 Sambrook, J. Russell, D. Molecular Cloning: A Sel. 2006 March; 19(3): 113-9. Laboratory Manual, Third Edition. CSHL Press. Cold 0469 Patton S M, Cropp T A, Reynolds K. A. A novel Spring Harbor, N.Y. 2001. delta(3),delta(2)-enoyl-CoA isomerase involved in the 0485 San KY, Bennett G. N. Berrios-Rivera SJ, Vadali R biosynthesis of the cyclohexanecarboxylic acid-derived V.YangYT, Horton E. Rudolph FB, Sariyar B, Blackwood moiety of the polyketide ansatrienin.A. Biochemistry. 2000 K. Metabolic engineering through cofactor manipulation Jun. 27:39(25):7595-604. and its effects on metabolic flux redistribution in Escheri 0470 Pinske C, Bonn M. Kruger S, LindenstrauBU, Saw chia coli. Metab Eng. 2002 April; 4(2):182-92. ers RG Metabolic Deficiences Revealed in the Biotechno 0486 Sauer U, Canonaco F. Heri S. Perrenoud A, Fischer logically Important Model Bacterium Escherichia coli E. The Soluble and membrane-bound transhydrogenases BL21 (DE3). PLoS One. 2011; 6(8):e22830. Udha and PntAB have divergent functions in NADPH 0471 Portis A R Jr. Parry M A. Discoveries in Rubisco metabolism of Escherichia coli. J. Biol Chem. 2004 Feb. (Ribulose 1.5-bisphosphate carboxylase/oxygenase): a 20: 279(8):6613-9. historical perspective. Photosynth Res. 2007 October; 0487 Schena M. (editor). DNA Microarrays: A Practical 94(1):121-43. Approach. The Practical Approach Series, Oxford Univer 0472. Pramanik J. Keasling J D. Stoichiometric model of sity Press, 1999. Escherichia coli metabolism: incorporation of growth-rate 0488 Schena M (editor). Microarray Biochip: Tools and dependent biomass composition and mechanistic energy Technology. Eaton Publishing Company/BioTechniques requirements. Biotechnol Bioeng. 1997 Nov. 20; 56(4): Books Division. 2000. 398-421. 0489 Schütz M, Shahak Y. Padan E, and Hauska G Sul 0473 Pramanik J. Keasling J D. Effect of Escherichia coli fide-quinone reductase from Rhodobacter capsulatus. biomass composition on central metabolic fluxes predicted Purification, cloning, and expression. J Biol Chem 1997 by a stoichiometric model. Biotechnol Bioeng. 1998 Oct. Apr. 11; 272(15) 9890-4. 20; 60(2):230-8. (a) 0490 Self W T Hasona A, Shanmugam KT. Expression 0474 Pramanik J. Trelstad PL Keasling J D. A flux-based and regulation of a silent operon, hyf, coding for hydroge Stoichiometric model of enhanced biological phosphorus nase 4 isoenzyme in Escherichia coli. J. Bacteriol. 2004 removal metabolism. Wat SciTechnol. 1998: 37(4-5):609 January: 186(2):580-7. 13. (b) 0491 Serov A E, Popova AS, Fedorchuk VV. Tishkov V 0475 Pramanik J. Trelstad P L. Schuler AJ, Jenkins D, I. Engineering of coenzyme specificity of formate dehy Keasling J D. Development and validation of a flux-based drogenase from Saccharomyces cerevisiae. Biochem J. Stoichiometric model for enhanced biological phosphorus 2002 Nov. 1; 367(Pt3):841-7. removal metabolism. Water Res. 1998; 33(2):462-76. (C). 0492. Shetty R P. Endy D, Knight T F Jr. Engineering 0476 Rathnasingh C. RajS M, Lee Y. Catherine C, Ashok BioBrick vectors from BioBrick parts. J Biol Eng. 2008 S, and Park S. Production of 3-hydroxypropionic acid via Apr. 14; 2:5. malonyl-CoA pathway using recombinant Escherichia 0493. Shetty R, Lizarazo M, Rettberg R, Knight T F. coli strains. J Biotechnol 2011 Jun. 23. Assembly of BioBrick standard biological parts using 0477 Reading NC, Sperandio V. Quorum sensing: the three antibiotic assembly. Methods Enzymol. 2011: 498: many languages of bacteria. FEMS Microbiol Left. 2006 311-26. January; 254(1): 1-11. 0494 Shibata H and Kobayashi S. Sulfide oxidation in 0478 Rock C O, Tsay J T, Heath R. Jackowski S. gram-negative bacteria by expression of the Sulfide Increased unsaturated fatty acid production associated quinone reductase gene of Rhodobacter capsulatus and by with a suppressor of the fabA6(Ts) mutation in Escherichia electron transport to ubiquinone. Can J Microbiol 2001 coli. J. Bacteriol. 1996 September; 178(18):5382-7. September; 47(9) 855-60. US 2015/003 7853 A1 Feb. 5, 2015

0495. Shpaer E. G. GeneAssist. Smith-Waterman and constitutes the major glucose uptake system of Streptomy other database similarity searches and identification of ces coelicolor A3(2). Mol Microbiol. 2005 January: 55(2): motifs. Methods Mol Biol. 1997: 70:173-87. 624-36. 0496 Sintsov NV, Ivanovskii RN, and Kondrat'eva EN. 0510 Venturi V. Regulation of quorum sensing in ATP-dependent citrate lyase in the green phototrophic Pseudomonas. FEMS Microbiol Rev. 2006 March; 30(2): bacterium, Chlorobium limicola. Mikrobiologia 1980 274-91. July-August; 49(4) 514-6. 0511 Vignais P M, Colbeau A. Molecular biology of 0497 Smith J. L., Campbell BJ, Hanson TE, Zhang C L, microbial hydrogenases. Curr Issues Mol Biol. 2004 July; Cary S.C. Nautilia profindicola sp. nov., a thermophilic, 6(2): 159-88. Sulfur-reducing epsilonproteobacterium from deep-sea 0512 Vignais PM, Billoud B. Occurrence, classification, hydrothermal vents. IntJ Syst Evol Microbiol. 2008 July; and biological function of hydrogenases: an overview. 58(Pt 7): 1598-602. Chem Rev. 2007 October; 107(10):4206-72. 0498 Smolke CD, Carrier T A, Keasling J. D. Coordi 0513 Wells MA, Mercer J, Mott RA, Pereira-Medrano A nated, differential expression of two genes through G, Burja AM, Radianingtyas H. Wright PC. Engineering directed mRNA cleavage and stabilization by secondary a non-native hydrogen production pathway into Escheri structures. Appl Environ Microbiol. 2000 December; chia coli via a cyanobacterial NiFe hydrogenase. Metab 66(12):5399-405. Eng. 2011 July; 13(4):445-53. 0499 Smolke CD, Martin VJ, Keasling J D. Controlling 0514 Wubbolts MG: Terpstra P. van Beilen J. B. Kingma the metabolic flux through the carotenoid pathway using J. Meesters HA, Witholt B. Variation of cofactor levels in directed mRNA processing and stabilization. Metab Eng. Escherichia coli. Sequence analysis and expression of the 2001 October; 3(4):313-21. pncB gene encoding nicotinic acid phosphoribosyltrans (0500 Smolke CD, Keasling J. D. Effect of copy number ferase. J Biol Chem. 1990 Oct. 15:265(29):17665-72. and mRNA processing and stabilization on transcript and 0515 Yamamoto M, Ikeda T. Arai H, Ishii M., and Igarashi protein levels from an engineered dual-gene operon. Bio Y. Carboxylation reaction catalyzed by 2-oxoglutarate: technol Bioeng. 2002 May 20; 78(4):412-24. (a) ferredoxin oxidoreductases from Hydrogenobacter ther 0501 Smolke CD, Keasling J D. Effect of gene location, mophilus. Extremophiles 2010 January: 14(1) 79-85. mRNA secondary structures, and RNase sites on expres 0516. Yoon KS, Ishii M. KodamaT, Igarashi Y. Purifica sion of two genes in an engineered operon. Biotechnol tion and characterization of pyruvate:ferredoxin oxi Bioeng. 2002 Dec. 30: 80(7):762-76. (b) doreductase from Hydrogenobacter thermophilus TK-6. 0502 Sokal R, Michener, C. A Statistical Method for Arch Microbiol. 1997 May: 167(5):275-9. Evaluating Systematic Relationships. University of Kan 0517. Yoon Y G, Cho J. H. Kim S. C. Cre/loxP-mediated sas Science Bulletin. 1958; 38:1409-38. excision and amplification of large segments of the 0503 Strauss G, Fuchs G Enzymes of a novel autotrophic Escherichia coli genome. Genet Anal. 1998 January; CO fixation pathway in the phototrophic bacterium Chlo 14(3):89-95. roflexus aurantiacus, the 3-hydroxypropionate cycle. Eur J 0518. Zarzycki.J. Brecht V. Milner M, Fuchs GIdentifying Biochem. 1993 Aug. 1; 215(3):633-43. the missing steps of the autotrophic 3-hydroxypropionate 0504 Strom T. Ferenci T, and Quayle J. R. The carbon CO, fixation cycle in Chloroflexus aurantiacus. Proc Natl assimilation pathways of Methylococcus capsulatus, AcadSci USA. 2009 Dec. 15: 106(50):21317-22. Pseudomonas methanica and Methylosinus trichosporium 0519 Zarzycki J. Fuchs G. Co-Assimilation of Organic (OB3B) during growth on methane. Biochem J 1974 Substrates via the Autotrophic 3-Hydroxypropionate Bi December; 144(3) 465-76. Cycle in Chloroflexus aurantiacus. Appl Environ Micro 0505 Sun J, Hopkins RC, Jenney F. E. McTernan PM, biol. 2011 Jul. 15. Adams MW. Heterologous expression and maturation of 0520. Zdobnov. E. M. Apweiler R. InterProScan an inte an NADP-dependent NiFel-hydrogenase: a key enzyme gration platform for the signature-recognition methods in in biofuel production. PLoS One. 2010 May 6; 5(5): InterPro. Bioinformatics. 2001 September; 17(9):847-8. e10526. 0521 Zhang C C, Durand M. C. Jeanjean R. Joset F. 0506 Tabita F R. Small CL. Expression and assembly of Molecular and genetical analysis of the fructose-glucose active cyanobacterial ribulose-1,5-bisphosphate carboxy transport system in the cyanobacterium Synechocystis lase/oxygenase in Escherichia coli containing Stoichio PCC6803. Mol Microbiol. 1989 September; 3(9): 1221-9. metric amounts of large and small subunits. Proc Natl Acad 0522 Zhang Y M, Marrakchi H, Rock C O. The FabR Sci USA. 1985 September; 82(18):6100-3. (YiC) transcription factor regulates unsaturated fatty acid 0507 Tatusov R L. Koonin E. V. Lipman DJ. A genomic biosynthesis in Escherichia coli. J. Biol Chem. 2002 May 3: perspective on protein families. Science. 1997 Oct. 24: 277(18): 15558-65. 278(5338):631-7. 0523 Zhu X, Yuasa M, Okada K. Suzuki K, Nakagawa T. 0508 Tatusov R L. Fedorova N D, Jackson J D, Jacobs. A Kawamukai M. Matsuda H. Production of ubiquinone in R. Kiryutin B. Koonin E. V. Krylov D M, Mazumder R, Escherichia coli by expression of various genes respon Mekhedov S L. Nikolskaya A N. Rao BS, Smirnov S. sible for ubiquinone biosynthesis. J Ferm Bioeng. 1995; Sverdlov A.V.Vasudevan S, WolfY I, Yin JJ, Natale DA. 79(5):493-5. The COG database: an updated version includes eukary 0524 Zweiger G. Knowledge discovery in gene-expres otes. BMC Bioinformatics. 2003 Sep. 11; 4:41. Sion-microarray data mining the information output of the 0509 van Wezel G. P. Mahr K, König M, Traag BA, genome. Trends Biotechnol. 1999 November; 17(11):429 Pimentel-Schmitt E. F. Willimek A, Titgemeyer F. GlcP 36.

US 2015/003 7853 A1 Feb. 5, 2015 59

- Continued

Met Ser Asn Ser Ile Pro Glu Ile Glu Asp Thr Asp Val Val Phe Val 18O 185 19 O Phe Gly Tyr Asn Pro Ser Glu Thr His Pro Ile Val Ala Arg Arg Ile 195 2OO 2O5 Val Lys Ala Arg Glu Lys Gly Ala Lys Ile Ile Val Ala Asp Pro Arg 21 O 215 22O Lys Ile Glu Thir Wall Lys Ile Ser Asp Lieu. Trp Lieu Gln Lieu Lys Gly 225 23 O 235 24 O Gly. Thir Asn Met Ala Lieu Val Asn Ala Lieu. Gly Asn Val Lieu. Ile Asn 245 250 255 Glu Glu Lieu. Tyr Asp Glu Lys Phe Val Glu Asn. Cys Thr Glu Gly Phe 26 O 265 27 O Glu Glu Tyr Lys Glu Ala Wall Lys Llys Tyr Thr Pro Glu Tyr Ala Glu 27s 28O 285 Lys Ile Thr Gly Val Ser Ala Glu Tyr Ile Arg Lys Ala Met Arg Ile 29 O 295 3 OO Tyr Ala Lys Ala Lys Lys Ala Thir Ile Lieu. Tyr Gly Met Gly Val Cys 3. OS 310 315 32O Glin Phe Ser Glin Ala Val Asp Val Val Lys Gly Lieu Ala Ser Lieu Ala 3.25 330 335 Lieu. Lieu. Thr Gly Asn Lieu. Gly Arg Pro Asn Val Gly Ile Gly Pro Val 34 O 345 35. O Arg Gly Glin Asn. Asn Val Glin Gly. Thir Cys Asp Met Gly Val Lieu Pro 355 360 365 Asn Arg Phe Pro Gly Tyr Glin Ser Val Thr Asp Glu Lys Ala Arg Glu 37 O 375 38O Llys Phe Glu Lys Ala Trp Gly Wall Lys Lieu. Ser Asp Arg Val Gly Tyr 385 390 395 4 OO Phe Lieu. Thr Glu Val Pro Llys His Val Lieu Lys Glu Asp Llys Ile Llys 4 OS 41O 415 Ala Tyr Tyr Ile Phe Gly Glu Asp Pro Ala Glin Ser Asp Pro Asn Ala 42O 425 43 O Ala Glu Val Arg Glu Ala Lieu. Asp Llys Ile Asp Phe Val Ile Val Glin 435 44 O 445 Asp Ile Phe Met Asn Llys Thr Ala Lieu. His Ala Asp Val Val Lieu Pro 450 45.5 460 Ala Thir Ser Trp Gly Glu. His Asp Gly Val Tyr Ser Ala Ala Asp Arg 465 470 47s 48O Ser Phe Glin Arg Ile Arg Lys Ala Val Glu Pro Met Gly Glu Ala Lys 485 490 495 Asp Asp Trp Glu Ile Ile Cys Glu Ile Ser Thr Ala Met Gly Tyr Pro SOO 505 51O Met His Tyr Asn Asn Thr Glu Glu Ile Trp Asn Glu Met Arg Ser Lieu. 515 52O 525

Cys Pro Llys Phe Ala Gly Ala Ser Tyr Glu Lys Met Glu Lys Glin Gly 53 O 535 54 O

Ala Val Pro Trp Pro Cys Thir Ser Glu Glu Asp Pro Gly Thr Asp Tyr 5.45 550 555 560 Lieu. Tyr Asp Asp Gly Llys Phe Met Thr Glu Asn Gly Arg Gly Lys Lieu. 565 st O sts US 2015/003 7853 A1 Feb. 5, 2015 60

- Continued Phe Ala Cys Glu Trp Arg His Pro Phe Glu Lieu. Thr Asp Glu Lys Tyr 58O 585 59 O Pro Leu Val Lieu Ser Thr Val Arg Glu Ile Gly His Tyr Ser Val Arg 595 6OO 605 Thir Met Thr Gly Asn. Cys Arg Thr Lieu Gln Lys Lieu Ala Asp Glu Pro 610 615 62O Gly Tyr Ile Glu Ile Ser Val Glu Asp Ala Lys Glu Lieu. Asn. Ile Llys 625 630 635 64 O Asp Glin Glu Lieu Val Thr Val Ser Ser Arg Arg Gly Lys Ile Ile Thr 645 650 655 Arg Ala Ala Val Ala Glu Arg Val Lys Lys Gly Ala Thr Tyr Met Thr 660 665 67 O Tyr Glin Trp Trp Val Gly Ala Cys Asn. Glu Lieu. Thir Ile Asp Ser Lieu. 675 68O 685 Asp Pro Ile Ser Lys Thr Pro Glu Phe Llys Tyr Cys Ala Val Llys Val 69 O. 695 7 OO Glu Arg Ile Lys Asp Glin Gln Lys Ala Glu Glin Glu Ile Glu Glu Arg 7 Os 71O 71s 72O Tyr Ser Ser Lieu Lys Lys Gln Met Lys Ala Glu 72 73 O

<210s, SEQ ID NO 6 &211s LENGTH: 211 212. TYPE PRT <213> ORGANISM: Clostridium pasteurianum

<4 OOs, SEQUENCE: 6 Met Asp Arg Phe Llys Thr Ala Val Ile Lieu Ala Gly Gly Lys Ser Ser 1. 5 1O 15 Arg Met Gly Phe Asp Llys Glin Phe Lieu Lys Ile Gly Glu Lys Arg Lieu 2O 25 3O Met Asp Ile Lieu. Ile Asn. Glu Ile Lys Glu Glu Phe Glin Asp Ile Ile 35 4 O 45 Ile Val Thr Asn Llys Pro Llys Glu Tyr Lys Ser Lieu. Tyr Lys Ser Cys SO 55 6 O Arg Ile Val Ser Asp Glu Ile Glu Ser Glin Gly Pro Lieu. Ser Gly Ile 65 70 7s 8O His Ile Gly Lieu Lys Glu Ser Lys Ser Lys Tyr Ala Tyr Phe Ile Ala 85 90 95 Cys Asp Met Pro Llys Val Asn Ile Pro Tyr Ile Arg Tyr Met Lys Glu 1OO 105 11 O Glu Lieu. Ile Llys Thr Asp Ala Asp Ala Cys Val Thr Glu Ala Gly Cys 115 12 O 125

Arg Met Gln Pro Phe Asn Ala Phe Tyr Ser Lys Glu Val Phe Tyr Lys 13 O 135 14 O

Ile Glu Asp Lieu. Lieu. Arg Glu Gly Lys Arg Ser Met Phe Ser Phe Ile 145 150 155 160 Asn. Ile Ile Asn. Thir His Phe Ile Asp Glu Asp Thr Ala Lys Llys Tyr 1.65 17O 17s

Asn Lys Asp Phe Asn Met Phe Phe Asn Lieu. Asn Thr Pro Glu Asp Lieu 18O 185 19 O

Lys Asp Phe Glin Val Llys Lieu. Tyr Asn Pro Lys Asn Met Asp Lys Asn 195 2OO 2O5 US 2015/003 7853 A1 Feb. 5, 2015 61

- Continued

Ile Glu Lys 21 O

<210s, SEQ ID NO 7 &211s LENGTH: 191 212. TYPE : PRT <213> ORGANISM: Clostridium pasteurianum

<4 OOs, SEQUENCE: 7

Met Arg Asn Phe Ile Lell Phe Luell Tyr Arg Lell Ser Gly Lys Wall 1. 5 15

Gly Lys Ala Met Ser Glu Wall Asn Ser Phe Wall Ile Gly Asp Ala 25 3O

Ser Lys Cys Wall Gly Arg Ala Glu Wall Ala Cys Phe Ala 35 4 O 45

His Ser Asn Arg Glu Glu Ser Ser Pro Ile Phe Wall SO 55 6 O

Arg Arg Asp Ile Ile Thir Arg Ile His Wall Wall Asn Glu Phe 65 70

Ser Wall Pro Wall Glin Arg Glin Glu Asp Ala Pro Ala Asn 85 90 95

Ala Pro Wall Gly Ala Ile Glu Glu His Wall Luell Wall Wall 105 11 O

Glu Glu Glu Lieu Cys Ile Gly Cys Ala Cys Wall Met Ala Pro 115 12 O 125

Phe Gly Ala Ile Glu Wall Lys Arg Ser Glu Glu Wall Arg Wall 13 O 135 14 O

Ala Asp Lell Arg Asn Arg Asp Thir Ala Wall 145 150 155 160

Glu Ile Ser Lys Ala Luell Luell Phe Asp Pro Wall Lys Glu 1.65 17O 17s

Arg Glin Arg Asn Ile Asp Thir Wall Asn ASn Lell Ile Asp Asp 18O 185 19 O

<210s, SEQ ID NO 8 &211s LENGTH: 212. TYPE : PRT &213s ORGANISM: Clostridium pasteurianum

<4 OOs, SEQUENCE:

Met Thir Asn Lieu. Cys His Phe His Arg Glin Arg Glu Glu Arg Ile Ile 1. 1O 15

Met Asn Ser Phe Wall Ile Ala Asn Pro Ile Gly 2O 25 3O

Thir Glu Ala Gly Ala Met Ala His Ser Glu Lys Asn Ile Luell 35 4 O 45

Asn Arg Ser Asp Glu Lell Phe Asn Pro Arg Lell Wall Ile SO 55 6 O

Lys Thir Asp Wall Thir Ala Pro Wall Met Cys Arg His Glu Asn 65 70

Ser Pro Ala Ser Wall Pro Asn Gly Ser Ile Thir Asn Lys Glu 85 90 95

Gly Wall Wall Luell Ile Asn Glin Asp Thir Ile Gly Lys Ser 1OO 105 11 O

US 2015/003 7853 A1 Feb. 5, 2015 69

- Continued

<4 OOs, SEQUENCE: 22 Met Ala Lys Lieu Lys Thr Met Val Asp Glin Glu Thr Cys Thr Ala Cys 1. 5 1O 15 Glu Lieu. Cys Tyr Asp Arg Val Pro Glu Val Tyr Lys Asn Arg Gly Asp 2O 25 3O Gly Ile Ala Asp Val Val Lys Cys Asp Ile Lys Asp Glu Glu Asp His 35 4 O 45 Cys Trp Met Ile Val Pro Glu Gly Lieu. Glu Asp Glu Val Arg Glu Val SO 55 6 O Glu Glu Glu. Cys Pro Ser Gly Ser Ile Ile Val Glu Glu Lieu. Glu Glu 65 70 7s 8O

<210s, SEQ ID NO 23 &211s LENGTH: 231 &212s. TYPE: DNA <213> ORGANISM: Aquifex aeolicus <4 OOs, SEQUENCE: 23 atggggittaa aggtgcgtgt taccaggat acgtgcacgg cct gcgagct gtgctatgac 6 O cgtataccgg aagtatt caa aaacgcaggc gatggcattg Cagatgttgt aaaatgcgat 12 O atagaagatg atgaaggctg. Ctggatgata gtgc.cggaag gcctggagga ggaagttcag 18O gaagtggcgg atgagtgc cc gagtggcago attatagttg aggaagaata a 231

<210s, SEQ ID NO 24 &211s LENGTH: 76 212. TYPE: PRT <213> ORGANISM: Aquifex aeolicus <4 OOs, SEQUENCE: 24 Met Gly Lieu Lys Val Arg Val Asp Glin Asp Thr Cys Thr Ala Cys Glu 1. 5 1O 15 Lieu. Cys Tyr Asp Arg Ile Pro Glu Val Phe Lys Asn Ala Gly Asp Gly 2O 25 3O Ile Ala Asp Val Val Lys Cys Asp Ile Glu Asp Asp Glu Gly Cys Trp 35 4 O 45 Met Ile Val Pro Glu Gly Lieu. Glu Glu Glu Val Glin Glu Val Ala Asp SO 55 6 O Glu Cys Pro Ser Gly Ser Ile Ile Val Glu Glu Glu 65 70 7s

<210s, SEQ ID NO 25 &211s LENGTH: 3633 &212s. TYPE: DNA <213s ORGANISM: Unknown 22 Os. FEATURE: <223> OTHER INFORMATION: gamma-proteobacterium

<4 OOs, SEQUENCE: 25 atgalacacag aaacticgcac caccagoggc ggacggttgc atgacaaggt ggittatcttg 6 O accggcgcgg ccggaaac at tagdtac attagc.cggit CCttgctg.cg cgaaggggcg 12 O aatttggtaa taccgggcg gaatgaaccg aaattacagg catttgttgga aggtttagtt 18O gaggagggitt ttgatcgtga caatat atta attgc catcg gtgactic.cgc gaaggcggat 24 O atatgtc.gcg aaatcgittaa gogogactgtt aat catttcq gcaa.cattga tigt cctdgtt 3OO