US008530207B2

(12) United States Patent
     Watts et al.

(10) Patent No.: US 8,530,207 B2
(45) Date of Patent: Sep. 10, 2013

(54) PHOTOSYNTHETIC MICROORGANISMS COMPRISING EXOGENOUS PROKARYOTIC ACYL-ACP THIOESTERASES AND METHODS FOR PRODUCING FATTY ACIDS

(75) Inventors: Kevin Watts, Minneapolis, MN (US); Rekha Seshadri, San Diego, CA (US); Toby Richardson, San Diego, CA (US)

(73) Assignee: ExxonMobil Research and Engineering Company, Annandale, NJ (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 13/324,623

(22) Filed: Dec. 13, 2011

(65) Prior Publication Data
     US 2012/0164700 A1    Jun. 28, 2012

     Related U.S. Application Data
(60) Provisional application No. 61/426,555, filed on Dec. 23, 2010.

(51) Int. Cl.
     C12P 7/64 (2006.01)
     C12N 1/00 (2006.01)
     C12N 1/12 (2006.01)
     C12N 1/20 (2006.01)

(52) U.S. Cl.
     USPC ...... 435/134; 435/243; 435/257.2; 435/252.3

(58) Field of Classification Search
     None
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,447,858 A        9/1995   Key et al. ......... 435/172.3
5,451,513 A        9/1995   Maliga et al. ...... 435/172.3
5,455,167 A       10/1995   Voelker et al. ..... 435/172.3
5,545,817 A        8/1996   McBride et al. ..... 800/205
5,545,818 A        8/1996   McBride et al. ..... 800/205
5,639,952 A        6/1997   Quail et al. ....... 800/205
5,654,495 A        8/1997   Voelker et al. ..... 800/250
5,661,017 A        8/1997   Dunahay et al. ..... 435/172.3
5,689,044 A       11/1997   Ryals et al. ....... 800/205
5,750,385 A        5/1998   Shewmaker et al. ... 435/172.3
5,851,796 A       12/1998   Schatz ............. 435/69.1
6,379,945 B1       4/2002   Jepson et al. ...... 435/243
6,410,828 B1       6/2002   Armstrong et al. ... 800/287
7,135,290 B2      11/2006   Dillon ............. 435/6
7,294,506 B2      11/2007   Daniell et al. ..... 435/320.1
2009/0298143 A1   12/2009   Roessler et al. .... 435/134
2011/0020883 A1    1/2011   Roessler et al. .... 435/134

FOREIGN PATENT DOCUMENTS

WO   WO 95/16783           6/1995
WO   WO 00/62601          10/2000
WO   WO 03/091413         11/2003
WO   WO 2005/005643        1/2005
WO   WO 2007/133558       11/2007
WO   WO 2007/136762       11/2007
WO   WO 2008/100251        8/2008
WO   WO 2008/119082 A2 *  10/2008
WO   WO 2008/151149       12/2008
WO   WO 2009/036385        3/2009
WO   WO 2009/076559        6/2009
WO   WO 2009/111513        9/2009
WO   WO 2010/019813        2/2010
WO   WO 2010/022090        2/2010
WO   WO 2010/118410       10/2010

OTHER PUBLICATIONS

Pfam family Acyl-ACP TE (PF01643), obtained from pfam.sanger.ac.uk/family/PF01643 on Jul. 17, 2012, 3 pages.*
GenBank Accession No. ABK74560, Dec. 2006, 1 page.*
GenBank Accession No. EFB38222, Dec. 2009, 2 pages.*
Abe, J., et al. (2008), "Expression of exogenous genes under the control of endogenous HSP70 and CAB promoters in the Closterium peracerosum-strigosum-littorale complex", Plant Cell Physiol. 49(4): 625-632.
Altschul, S., et al. (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Research, 25(17): 3389-3402.
Angermayr, A., et al. (2009), "Energy biotechnology with cyanobacteria", Current Opinion in Biotechnology, 20: 257-263.
Bateman, A., et al. (2000), "The pfam protein families database", Nucleic Acids Research, 28(1): 263-266.
Bateman, A., et al. (2004), "The pfam protein families database", Nucleic Acids Research, 32: Database Issue: D138-D141.
Browse, J., et al. (1991), "Glycerolipid synthesis, biochemistry and regulation", Plant Molecular Biology, 42: 467-506.
Cantu, D., et al. (2010), "Thioesterases: A new perspective based on their primary and tertiary structures", Protein Science, 19: 1281-1295.
Chan, D., et al. (2010), "Current understanding of fatty acid biosynthesis and the acyl carrier protein", Biochem J. 430: 1-19.
Copeland, A., et al. (2007), "Clostridium thermocellum ATCC 27405, complete genome", Genbank Accession CP000568.1. Retrieved on Mar. 21, 2012. Retrieved from the internet at URL: http://www.ncbi.nlm.nih.gov/nuccore/125712750?sat=11&satkey=9897002.
Davies, H., (1993), "Medium chain acyl-ACP hydrolysis activities of developing oilseeds", Phytochemistry 33(6): 1353-1356.
Dillon, S., et al. (2004), "The hotdog fold: wrapping up a superfamily of thioesterases and dehydratases", BMC Bioinformatics, 5: 109.
Dörmann, P., et al. (1994), "Specificities of the acyl-acyl carrier protein (ACP) thioesterase and glycerol-3-phosphate acyltransferase for octadecenoyl-ACP isomers", Plant Physiol. 104: 839-844.

(Continued)

Primary Examiner: David J Steadman
(74) Attorney, Agent, or Firm: Harness, Dickey & Pierce, P.L.C.

(57) ABSTRACT

The described invention provides genetically engineered photosynthetic microorganisms expressing prokaryotic acyl-ACP thioesterases and methods of using the genetically engineered photosynthetic microorganisms for producing free fatty acids and/or fatty acid derivatives.

21 Claims, 5 Drawing Sheets

(56) References Cited

OTHER PUBLICATIONS

Finn, R., et al. (2006), "Pfam: clans, web tools and services", Nucleic Acids Research, 34: Database Issue 34: D247-D251.
Finn, R., et al. (2010), "The pfam protein families database", Nucleic Acids Research, 38: Database Issue 38: D211-D222.
Gibson, S., et al. (1994), "Use of transgenic plants and mutants to study the regulation and function of lipid composition", Plant, Cell and Environment, 17: 627-637.
Hallmann, A., et al. (1997), "Gene replacement by homologous recombination in the multicellular green alga Volvox carteri", Proc. Natl. Acad. Sci. USA, 94: 7469-7474.
Henikoff, S., et al. (1992), "Amino acid substitution matrices from protein blocks", Proc. Natl. Acad. Sci. USA, 89: 10915-10919.
Ikawa, M., et al. (1994), "Lipids of cyanobacterium Aphanizomenon flos-aquae and inhibition of Chlorella growth", Journal of Chemical Ecology, 20(9): 2429-2436.
International Search Report for PCT/US11/64646 dated Apr. 11, 2012.
Iwai, M., et al. (2004), "Improved genetic transformation of the thermophilic cyanobacterium, Thermosynechococcus elongatus BP-1", Plant Cell Physiol. 45(2): 171-175.
Jones, A., et al. (1995), "Palmitoyl-acyl carrier protein (ACP) thioesterase and the evolutionary origin of plant acyl-ACP thioesterases", The Plant Cell 7: 359-371.
Kindle, K., et al. (1989), "Stable nuclear transformation of Chlamydomonas using the Chlamydomonas gene for nitrate reductase", The Journal of Cell Biology, 109 (6, Part 1): 2589-2601.
Knutzon, D., et al. (1992), "Isolation and characterization of two safflower oleoyl-acyl carrier protein thioesterase cDNA clones", Plant Physiol. 100: 1751-1758.
Löhden, I., et al. (1988), "Role of plastidial acyl-acyl carrier protein: glycerol 3-phosphate acyltransferase and acyl-acyl carrier protein hydrolase in channeling the acyl flux through the prokaryotic and eukaryotic pathway", Planta 176: 506-512.
Mayer, K., et al. (2005), "A structural model of the plant acyl-acyl carrier protein thioesterase FatB comprises two helix/4-stranded sheet domains, the N-terminal domain containing residues that affect specificity and the C-terminal domain containing catalytic residues", The Journal of Biological Chemistry, 280(5): 3621-3627.
Mayer, K., et al. (2007), "Identification of amino acid residues involved in substrate specificity of plant acyl-ACP thioesterases
Méndez-Alvarez, S., et al. (1994), "Transformation of Chlorobium limicola by a plasmid that confers the ability to utilize thiosulfate", Journal of Bacteriology, 176(23): 7395-7397.
Murata, N., et al. (1995), "Acyl-lipid desaturases and their importance in the tolerance and acclimatization to cold of cyanobacteria", Biochem J. 308: 1-8.
Ohnuma, M., et al. (2008), "Polyethylene glycol (PEG)-mediated transient gene expression in a red alga, Cyanidioschyzon merolae 10D", Plant Cell Physiol. 49(1): 117-120.
Pearson, W., et al. (1988), "Improved tools for biological sequence comparison", Proc. Natl. Acad. Sci. USA, 85: 2444-2448.
Perrone, C., et al. (1998), "The Chlamydomonas IDA7 locus encodes a 140 kDa dynein intermediate chain required to assemble the I1 inner arm complex", Molecular Biology of the Cell, 9: 3351-3365.
Ramesh, V., et al. (2004), "A simple method for chloroplast transformation in Chlamydomonas reinhardtii", Methods in Molecular Biology, 274: 301-307.
Ravindran, C., et al. (2006), "Electroporation as a tool to transfer the plasmid pRL489 in Oscillatoria MKU277", Journal of Microbiological Methods, 66: 174-176.
Schroda, M., et al. (2000), "The HSP70A promoter as a tool for improved expression of transgenes in Chlamydomonas", The Plant Journal 21(2): 121-131.
Schweizer, E., et al. (2004), "Microbial Type I Fatty Acid Synthases (FAS): major players in a network of cellular FAS systems", Microbiology and Molecular Biology Reviews 68(3): 501-517.
Smith, T., et al. (1981), "Comparison of biosequences", Advances in Applied Mathematics 2: 482-489.
Sonnhammer, E., et al. (1998), "Pfam: multiple sequence alignments and HMM-profiles of protein domains", Nucleic Acids Research 26(1): 320-322.
Steinbrenner, J., et al. (2006), "Transformation of the green alga Haematococcus pluvialis with a phytoene desaturase for accelerated astaxanthin biosynthesis", Applied and Environmental Microbiology 72(12): 7477-7484.
Tan, C., et al. (2005), "Establishment of a micro-particle bombardment transformation system for Dunaliella salina", The Journal of Microbiology 43: 361-365.
Voelker, T., et al. (1994), "Alteration of the specificity and regulation of fatty acid synthesis of Escherichia coli by expression of a plant medium-chain acyl-acyl carrier protein thioesterase", Journal of Bacteriology 176(23): 7320-7327.
Xu, J., et al. (2007), "Evolution of Symbiotic Bacteria in the Distal Human Intestine", Uniprot A6LDN7_9PORP. Retrieved on Mar. 21, 2012. Retrieved from the internet at

U.S. Patent    Sep. 10, 2013    Sheet 2 of 5    US 8,530,207 B2

[FIGURE 2: drawing not recoverable from the extracted text; legible labels: "Prokaryotic", "fragment"]

[FIGURE 3 (Sheet 3 of 5): drawing not recoverable from the extracted text]

[FIGURE 4 (Sheet 4 of 5): drawing not recoverable from the extracted text]

[Sheet 5 of 5: drawing not recoverable from the extracted text; legible label: "thioesterases in 6803"]

FIGURE 5

PHOTOSYNTHETIC MICROORGANISMS COMPRISING EXOGENOUS PROKARYOTIC ACYL-ACP THIOESTERASES AND METHODS FOR PRODUCING FATTY ACIDS

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit of priority to U.S. provisional patent application 61/426,555 of the same title filed Dec. 23, 2010, which is hereby incorporated by reference in its entirety.

REFERENCE TO A SEQUENCE LISTING

This application contains references to amino acid sequences and/or nucleic acid sequences which have been submitted concurrently herewith as the sequence listing text file entitled "2010EM384 (PMO001) sequences.TXT", file size 62.2 KiloBytes (KB), created on Dec. 12, 2011. The aforementioned sequence listing is hereby incorporated by reference in its entirety pursuant to 37 C.F.R. § 1.52(e)(5).

FIELD OF THE INVENTION

The invention relates to compositions and methods for producing free fatty acids and/or fatty acid derivatives in microorganisms, including photosynthetic microorganisms such as cyanobacteria and microalgae.

BACKGROUND

1. Biofuels

Biofuels represent renewable energy sources from living organisms, such as higher plants, fungi, or bacteria. Photosynthetic life forms capture light energy and subsequently convert it into the free energy of organic compounds based on fixed CO2, using water as the ultimate electron donor. Currently, two major technologies are employed for generating biofuels using phototrophic organisms: first, plant-based biofuel production via fermentation of the plant's sugar content to ethanol and, second, to a much lesser extent, algae-derived biodiesel production through lipid extraction of biomass from large-scale cultures (Angermayr et al., 2009, Curr Opin Biotechnol, 20(3): 257-263).

2. Fatty Acids

Fatty acids are carboxylic acids with hydrocarbon chains of 4 to 36 carbons. In some fatty acids, this chain is fully saturated (meaning it contains no double bonds) and unbranched; others contain one (monounsaturated) or more double bonds (polyunsaturated). A few contain three-carbon rings or hydroxyl groups. A simplified nomenclature for these compounds specifies the chain length and number of double bonds, separated by a colon; the 16-carbon saturated palmitic acid is abbreviated 16:0, and the 18-carbon oleic acid, with one double bond, is 18:1. The positions of any double bonds are specified by superscript numbers following Δ (delta); a 20-carbon fatty acid with one double bond between C-9 and C-10 (C-1 being the carboxyl carbon), and another between C-12 and C-13, is designated 20:2(Δ9,12), for example. The most commonly occurring fatty acids have even numbers of carbon atoms in an unbranched chain of 12 to 24 carbons. The even number of carbons results from the mode of synthesis of these compounds, which involves condensation of acetate (two-carbon) units (Lehninger et al., Principles of Biochemistry, Vol. 1, Macmillan, 2005).

The position of double bonds in unsaturated fatty acids also is regular; in most monounsaturated fatty acids, the double bond is between C-9 and C-10 (Δ9), and the other double bonds of polyunsaturated fatty acids are generally Δ12 and Δ15. The double bonds of polyunsaturated fatty acids are almost never conjugated (alternating single and double bonds), but commonly are separated by a methylene group (-CH=CH-CH2-CH=CH-). The physical properties of the fatty acids, and of compounds that contain them, are largely determined by the length and degree of unsaturation of the hydrocarbon chain, i.e., the longer the fatty acyl chain and the fewer the double bonds, the lower the solubility in water (Lehninger et al., Principles of Biochemistry, Volume 1, Macmillan, 2005).

2.1 Fatty Acid Composition of Algae and Cyanobacteria

Algae are at the base of the trophic ladder of aquatic ecosystems, providing energy and essential nutrients for primary consumers. The major acyl lipid classes in algae are typically phospholipids (e.g., phosphatidylcholine, phosphatidylethanolamine, phosphatidylserine, phosphatidylglycerol, phosphatidylinositol), glycolipids (e.g., monogalactosyldiglycerol, digalactosyl glycerol, sulfolipids), triacylglycerols, sterol esters, and free fatty acids. Phospholipids are structural constituents of cellular membranes, whereas glycolipids are major components of the thylakoid membrane in chloroplasts. Triacylglycerols produced by some algal species may be present as intracellular storage material and can occur as clearly visible oil droplets. Sterol esters are normally minor lipid constituents in plants (Arts and Wainman, Lipids in Freshwater Ecosystems, Springer, 1998).

Cyanobacterial cells are observed to resemble chloroplasts of eukaryotic plants in terms of membrane structure and glycerolipid composition. There are three types of membrane in the cyanobacterial cells, namely, the plasma membrane, the outer membrane, and the thylakoid membranes. The thylakoid membranes are closed systems and are separated from the plasma membrane. This architecture corresponds to that of the eukaryotic chloroplasts, which have inner and outer envelope membranes and thylakoid membranes (Murata and Wada, Biochem. J., 1995, 308: 1-8).

The major nonpigment lipids of cyanobacteria have been identified as monogalactosyldiacylglycerol (MGDG), digalactosyldiacylglycerol (DGDG), phosphatidylglycerol, and sulfoquinovosyldiacylglycerol. Other lipids observed to occur in lesser amounts include fatty acids, sterols, hydrocarbons, and heterocyst glycolipids (glycosides of long chain diols, triols, and hydroxy acids). Cyanobacteria are not known to contain phosphatidylcholine, -ethanolamine, -serine, -inositol, and diphosphatidylglycerol (cardiolipin) (Ikawa et al., J. Chem. Ecol., 1994, 20: 2429-2436).

2.2 Fatty Acid Biosynthesis

The irreversible formation of malonyl-CoA from acetyl-CoA is catalyzed by acetyl-CoA carboxylase in what is considered to be the first committed step in fatty acid biosynthesis (FIG. 1). Acetyl-CoA carboxylase contains biotin as its prosthetic group, covalently bound in amide linkage to the ε-amino group of a Lys residue on one of the three subunits of the enzyme molecule. The carboxyl group, derived from bicarbonate (HCO3-), is first transferred to biotin in an ATP-dependent reaction. The biotinyl group serves as a temporary carrier of CO2, transferring it to acetyl-CoA in the second step to yield malonyl-CoA (Lehninger et al., Principles of Biochemistry, Volume 1, Macmillan, 2005).

In contrast to heterotrophic bacteria, such as E. coli, which have to metabolize a carbon source from the media into acetyl-CoA in order to initiate fatty acid synthesis, in cyanobacteria the precursor for fatty acid synthesis, i.e., acetyl-CoA, comes directly from the Calvin-Benson cycle, which fixes carbon dioxide using energy and reducing power provided by the light reactions of photosynthesis.

The reaction sequence by which the long chains of carbon atoms in fatty acids are assembled consists of four steps: (1) condensation; (2) reduction; (3) dehydration; and (4) reduction. The saturated acyl group produced during this set of reactions is recycled to become the substrate in another condensation with an activated malonyl group. With each passage through the cycle, the fatty acyl chain is extended by two carbons. In many cells, chain elongation terminates when the chain reaches 16 carbons, and the product (palmitate, 16:0) leaves the cycle. The methyl and carboxyl carbon atoms of the acetyl group become C-16 and C-15, respectively, of the palmitate; the rest of the carbon atoms are derived from malonyl-CoA. All of the reactions in the synthetic process are catalyzed by a multienzymatic complex, the fatty acid synthase (Lehninger et al., Principles of Biochemistry, Volume 1, Macmillan, 2005).

2.3. The Elongation Cycle in Fatty Acid Synthesis

Fatty acid synthesis represents a central, conserved process by which acyl chains are produced for utilization in a number of end-products such as biological membranes. The enzyme system, which catalyzes the synthesis of saturated long-chain fatty acids from acetyl-CoA, malonyl-CoA, and NADPH, is called the fatty acid synthase (FAS) (FIG. 1). Fatty acid synthases (FASs) can be divided into two classes, type I and II, which are primarily present in eukaryotes and in bacteria and plants, respectively. They are characterized by being composed of either large multifunctional polypeptides in the case of type I or consisting of discretely expressed mono-functional proteins in the type II system (Chan D. and Vogel H., Biochem J., 2010, 430(1): 1-19). The fatty acid synthase contains six catalytic activities and contains beta-ketoacyl synthase (KS), acetyl/malonyl transacylase (AT/MT), beta-hydroxyacyl dehydratase (DH), enoyl reductase (ER), beta-ketoacyl reductase (KR), acyl carrier protein (ACP), and thioesterase (TE) (Chirala and Wakil, Lipids, 2004, 39(11): 1045-53). It has been shown that the reactions leading to fatty acid synthesis in higher organisms are very much like those of bacteria (Berg et al., Biochemistry, 6th ed., Macmillan, 2008).

Fatty acid biosynthesis is initiated by the fatty acid synthase component enzyme acetyltransferase loading the acyl primer, usually acetate, from coenzyme A (CoA) to a specific binding site on fatty acid synthase (FAS). At the end of the process, termination of chain elongation occurs by removing the product from the fatty acid synthase (FAS) either by transesterification to an appropriate acceptor or by hydrolysis. The respective enzymes are usually palmitoyltransferase and thioesterase. The reaction sequence between initiation and termination involves the elongation of enzyme-bound intermediates by several iterative cycles of a distinct set of reaction steps. Each cycle includes (i) malonyl-transacylation from CoA to the enzyme by malonyl transferase; (ii) condensation of acyl-enzyme with enzyme-bound malonate to 3-ketoacyl-enzyme by 3-ketoacyl synthase; (iii) reduction of the 3-keto- to the 3-hydroxyacyl intermediate by ketoacyl reductase; (iv) dehydration of 3-hydroxyacyl enzyme to 2,3-trans-enoate by dehydratase; and (v) finally, reduction of the enoate to the saturated acyl-enzyme by enoyl reductase. The prosthetic group, 4'-phosphopantetheine, plays a central role in substrate binding, processing of intermediates, and communicating of intermediates between the various catalytic centers of fatty acid synthase (FAS). This cofactor is bound covalently to a specific serine hydroxyl group of the ACP domain or, depending on the FAS system, to the ACP component of FAS. In some bacteria, the iterative sequence of elongation cycles may be interrupted at a chain length of 10 carbons by one cycle involving an intrinsic isomerase converting the 2-trans- into the 3-cis-decenoyl intermediate, which is subsequently not reduced but further elongated to long-chain monounsaturated fatty acids (Schweizer and Hofmann, Microbiol Mol Biol Rev., 2004, 68(3): 501-17).

3. Acyl Carrier Protein (ACP)

The acyl carrier protein (ACP), the cofactor protein that covalently binds fatty acyl intermediates via a phosphopantetheine linker during the synthesis process, is central to fatty acid synthesis. It is a highly conserved protein that carries acyl intermediates during fatty acid synthesis. ACP supplies acyl chains for lipid and lipoic acid synthesis, as well as for quorum sensing, bioluminescence and toxin activation. Furthermore, ACPs or PCPs (peptidyl carrier proteins) also are utilized in polyketide and non-ribosomal peptide synthesis, which produce important secondary metabolites, such as the lipopeptide antibiotic daptomycin and the iron-carrying siderophore enterobactin (Chan and Vogel, Biochem. J., 2010, 430: 1-19).

In yeast and mammals, ACP exists as a separate domain within a large multifunctional fatty acid synthase polyprotein (type I FAS), whereas it is a small monomeric protein in bacteria and plants (type II FAS) (Byers and Gong, Biochem Cell Biol., 2007, 85(6): 649-62).

In E. coli, ACP is highly abundant, comprising approximately 0.25% of all soluble proteins, and it represents one of four major protein-protein interaction hubs, the others being DNA and RNA polymerases as well as ribosome-associated proteins. In type I FAS systems, ACP is part of large, multi-domain polypeptides that also carry the other protein domains for FA synthesis in a linear fashion. Although the architecture and sequence identity of the type I FAS systems are different from the type II dissociated enzymes, many of the functional units in these complexes are similar. On the other hand, other domains, such as the enoyl reductase and dehydratase enzymes, vary significantly between the type Ia, Ib and II systems (Chan and Vogel, Biochem. J., 2010, 430: 1-19).

4. Acyl-ACP Thioesterases

The major termination reaction of fatty acid biosynthesis is catalyzed by acyl-acyl carrier protein (acyl-ACP) thioesterases in eukaryotes. Previous studies have shown that the acyl-ACP thioesterase enzyme terminates acyl elongation of a fatty acyl group by hydrolyzing an acyl group on a fatty acid. In plants, an acyl-ACP thioesterase terminates the acyl elongation process by hydrolysis of the acyl-ACP thioester; free fatty acid then is released from the fatty acid synthase. In E. coli, the long-chain acyl group is transferred directly from ACP to glycerol-3-phosphate by a glycerol-3-phosphate acyltransferase, and free fatty acids normally are not found as intermediates in lipid biosynthesis. As in most other organisms, the major end products of the plant and E. coli fatty acid synthase are usually 16- or 18-carbon fatty acids. Chain length is determined by the 3-ketoacyl-ACP synthases I and II and the glycerol-3-phosphate acyltransferase in E. coli (Voelker and Davies, J. Bacteriol., 1994, 176(23): 7320-7327).

4.1. Plant Acyl-ACP Thioesterases

Acyl-ACP thioesterases have been studied extensively in plants. In plants, de novo fatty acid synthesis occurs in the stroma of plastids, where the acyl chains are covalently bound to a soluble acyl carrier protein (ACP) during the extension cycles. Carbon chain elongation can be terminated by transferring the acyl group to glycerol 3-phosphate, thereby retaining it in the plastidial "prokaryotic" lipid biosynthesis pathway. Alternately, specific thioesterases can intercept the prokaryotic pathway by hydrolyzing the newly formed acyl-ACP into free fatty acids and ACP. Subsequently, the free fatty acids exit the plastids by an undetermined mechanism and supply the "eukaryotic" lipid biosynthesis pathway. The latter is located in the endoplasmic reticulum and is responsible for the formation of phospholipids, triglycerides, and other neutral lipids. By catalyzing the first committed step in the eukaryotic lipid biosynthesis pathway in plant cells, acyl-ACP thioesterases play a crucial role in the distribution of de novo synthesized acyl groups between the two pathways (Löhden and Frentzen, Planta, 1988, 176: 506-512; Browse and Somerville, Plant Mol. Biol., 1991, 42: 467-506; and Gibson et al., Plant Cell Environ., 1994, 17: 627-637).

Acyl-ACP thioesterases play an essential role in chain termination during de novo fatty acid synthesis and in the channeling of carbon flux between the two lipid biosynthesis pathways in plants. There are two distinct but related thioesterase gene classes in higher plants, termed FatA and FatB. FatA encodes a C18:1-ACP thioesterase. In contrast, FatB encodes thioesterases preferring acyl-ACPs having saturated acyl groups (Jones et al., Plant Cell, 1995, 7(3): 359-71).

Among prokaryotes, acyl groups exiting the dissociable fatty acid synthase are transferred directly from ACP to polar lipids. In contrast, plants must also release sufficient fatty acid from ACP to supply the extraplastidial compartments. Analysis of cloned plant thioesterases suggested that plants possess individual thioesterases with specificity either for C18:1 or for one or more saturated fatty acids (Vance et al., Biochemistry of Lipids, Lipoproteins, and Membranes, Elsevier, 1996). The most prominent thioesterase in most plants has a strong preference for C18:1-ACP, making C18:1 the fatty acid most available for extraplastidial glycerolipid synthesis. Several plant species that produce storage oils containing large amounts of fatty acids having an acyl chain length from 8 to 14 carbons contain thioesterases specific for those acyl chain lengths. By removing acyl groups from ACP prematurely, the medium-chain thioesterase simultaneously prevents further chain elongation and releases fatty acids for triacylglycerol synthesis outside the plastids. Thus, by regulating expression of different thioesterases, plants can both fine tune and radically modify the exported fatty acid pool (Vance et al., Biochemistry of Lipids, Lipoproteins, and Membranes, Elsevier, 1996).

The acyl-ACP hydrolytic specificities of five FatA representatives from three families have been measured in vitro after heterologous expression in E. coli. All FatA thioesterases appeared to be C18:1 specific, with minor activities on C18:0 and C16:0 substrates (Knutzon et al., Plant Physiol., 1992, 100(4): 1751-1758; Dörmann et al., Plant Physiol., 1994, 104(3): 839-844). In contrast to the conserved nature of FatA, the specificities of FatB enzymes show high variability. The California bay Umbellularia californica (Uc) FatB1 has a strong preference for C12:0-ACP (and a modest preference for C14:0-ACP; Voelker and Davies, J. Bacteriol., 1994, 176(23): 7320-7327). A C. hookeriana thioesterase, encoded by ChFatB2, has been characterized and found to hydrolyze C8:0- and C10:0-ACP; it is the enzyme involved in C8:0 and C10:0 fatty acid production for the storage lipids in seed. In addition, the specificity of a FatB representative from elm (U. americana) seed, Ua FatB1, shows that this enzyme is involved in C10:0 production for the elm storage lipids (80 mol % 10:0; Davies, H., Phytochemistry, 1993, 33: 1353-1356). Medium chain-preferring FatB representatives may be common, if not universal, components of the fatty acid synthases specialized for medium-chain production in oilseeds (Jones and Davies, Plant Cell, 1995, 7(3): 359-371). FatA and FatB plant acyl-ACP thioesterase enzymes contain a targeting peptide at the N-terminus, which transports the expressed enzymes to the chloroplast, where the biosynthesis of fatty acid occurs in plants. When expressed in bacteria, these higher plant thioesterase genes generally are N-terminally truncated in order to remove the chloroplast targeting peptide (Jones and Davies, Plant Cell, 1995, 7(3): 359-371). E. coli and many other prokaryotes do not have acyl-ACP thioesterases (Jones and Davies, Plant Cell, 1995, 7(3): 359-371), but some prokaryotes do have acyl-CoA thioesterases that cleave acyl chains from coenzyme A, which serves as a cofactor in fatty acid degradation.

Current understanding of the role of the acyl-CoA thioesterases in fatty acid metabolism is incomplete. The E. coli acyl-CoA thioesterase TesA (Thioesterase I), for example, is a periplasmic enzyme, but whether it functions in lipid synthesis, recycling, or degradation is unclear. Genes encoding N-terminally truncated TesA or N-terminally truncated TesA variants have been expressed in bacteria. These proteins lack a secretion sequence, and remain in the cytoplasm, where they are able to cleave acyl-ACPs.

For example, U.S. Pat. No. 5,455,167 discloses genes and constructs for expressing genes encoding higher plant acyl-ACP thioesterases, as well as a construct for expressing a gene encoding the Vibrio harveyi LuxD acyl transferase (YP_001448362.1 GI:156977456), belonging to pfam PF02273, in higher plants. PCT Publication No. WO2007/136762 discloses recombinant microorganisms engineered for the fermentative production of fatty acid derivatives, such as, inter alia, fatty alcohols and wax esters, in which the host strain can express a higher plant thioesterase or the E. coli TesA acyl-CoA thioesterase. PCT Publication No. WO2008/100251 describes methods for engineering microorganisms that include genes encoding synthetic cellulosomes to produce hydrocarbon products (which may be, inter alia, alkanes, alkenes, alkynes, dienes, fatty acids, isoprenoids, fatty alcohols, fatty acid esters, polyhydroxyalkanoates, organic acids, and the like). The microorganism that contains one or more exogenous nucleic acid sequences encoding a synthetic cellulosome can also include an exogenous thioesterase gene, such as the E. coli TesA acyl-CoA thioesterase or a plant thioesterase gene, which can be expressed in the host cells.

Other applications disclosing microorganisms, including algae, engineered to express heterologous acyl-ACP thioesterases from higher plants or acyl-CoA thioesterases from E. coli for the production of various compounds, including, inter alia, fatty acids or fatty acid derivatives, include PCT Publication Nos. WO2008/151149, WO2009/076559, and WO2009/036385, as well as PCT Publication Nos. WO2009/111513, WO2010/022090, and WO2010/118410.

SUMMARY

One aspect of the invention relates to a photosynthetic microorganism that includes a recombinant nucleic acid molecule (e.g., a recombinant gene) that encodes a prokaryotic acyl-acyl carrier protein (acyl-ACP) thioesterase and produces at least one free fatty acid and/or fatty acid derivative, e.g., by expressing the gene encoding the prokaryotic acyl-ACP thioesterase.

In most embodiments, the photosynthetic microorganism that includes a recombinant nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase can produce at least one free fatty acid and/or fatty acid derivative, in which the amount of at least one free fatty acid and/or derivative produced by the photosynthetic microorganism can be greater than the amount produced by a photosynthetic microorganism that does not include the recombinant nucleic acid molecule encoding the prokaryotic acyl-ACP thioesterase.
For photosynthetic microorganism can produce a fatty acid hav example, the photosynthetic microorganism that include the ing an acyl chain length of 12, 14, and/or 16 carbons and/or a recombinant nucleic acid molecule encoding the prokaryotic fatty acid derivative having a total number of carbons from 11 acyl-ACP thioesterase can produce at least 5 mg per liter, for to 32. In some embodiments, the photosynthetic microorgan example at least 10 mg per liter, per liter, at least 20 mg per ism contains a nucleic acid molecule that includes nucleotide liter, at least 30 mg per liter, at least 40 mg per liter, or at least sequence SEQID NO:3 or SEQID NO:4. 50 mg per liter, of free fatty acids and/or derivatives, for In some embodiments, the nucleic acid molecule encoding example, produced over a period from six hours to ten days. the prokaryotic acyl-ACP thioesterase can be stably inte Additionally or alternately, at least one free fatty acid and/ grated into a chromosome of the photosynthetic microorgan or derivative produced by the photosynthetic microorganism 10 ism. Additionally or alternately, the nucleic acid encoding the that includes a recombinant gene that encodes a prokaryotic prokaryotic acyl-ACP thioesterase can be in an autonomously acyl-ACP thioesterase can have an acyl chain length from 8 to replicating episome. For example, the nucleic acid encoding 24 carbons, for example, an acyl chain length from 8 to 18 the prokaryotic acyl-ACP thioesterase present on an episome carbons or an acyl chain length from 12 to 16 carbons. For and/or integrated into the genome of the photosynthetic example, at least one free fatty acid and/or derivative pro 15 microorganism can be an exogenous nucleic acid molecule duced by the photosynthetic microorganism can have an acyl introduced into the host microorganism (or a progenitor of the chain length of 8, 10, 12, 14, 16, 18, 20, 22, and/or 24 carbons. 
host microorganism), and can also be a recombinant nucleic In cases where the fatty acid derivative comprises a wax ester, acid molecule produced by genetic engineering. the wax ester comprises Achain carbons, as well as acyl chain Further additionally or alternately, the genetically engi carbons (B chain carbons), where the B chain can include 8. neered photosynthetic microorganism can include an expres 10, 12, 14, 16, 18, 20, 22, and/or 24 carbons. In cases where sion construct that includes the nucleic acid molecule encod the fatty acid derivative comprises one or more compounds ing the prokaryotic acyl-ACP thioesterase and one or more that do not exhibit a carbonyl group (e.g., fatty alcohols, additional sequences that regulate expression of the acyl alkanes, and alkenes), the “acyl chain length of Such com ACP thioesterase gene. For example, the expression construct pounds should be understood to correspondhereinto the total 25 can include a promoter operative in the host cells, where the number of carbons in those molecules. promoter can be, for example, a bacterial, viral, phage, or Further additionally or alternately, the photosynthetic eukaryotic promoter. Alternately, the promoter can be a syn microorganism that includes at least one recombinant gene thetic promoter. Further, a promoter in an expression con encoding a prokaryotic acyl-ACP thioesterase can produce at struct that includes a gene encoding an acyl-ACP thioesterase least one fatty acid derivative, such as, but not limited to, one 30 can be a constitutive promoter, or, in alternate embodiments, or more fatty aldehydes, fatty alcohols, wax esters, alkanes, can be an inducible promoter. For example, the inducible alkenes, and/or a combination thereof. 
For example, the pho promoter can be controlled by or a lactose analogue, tosynthetic microorganism can produce at least one fatty acid and/or can be controlled by light and can be, for example, a derivative having a total number of carbons from 7 to 36, for secA promoter, an rbc promoter, a psaAB promoter, or apsbA example, from 7 to 34 or from 11 to 32 carbons. Additionally 35 promoter. or alternately, at least one fatty acid derivative produced by Still further additionally or alternately, the photosynthetic the photosynthetic microorganism can have a total number of microorganism of the described invention that includes a carbons of 7,8,9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, recombinant nucleic acid molecule encoding a prokaryotic 22, 23, 24, 26, 28, 30, 32, 34, and/or 36. acyl-ACP thioesterase can further comprise a recombinant Still further additionally or alternately, at least 30 wt %, for 40 nucleic acid molecule encoding an acetyl-CoA carboxylase example at least 40 wt %, at least 50 wt %, or at least 60 wt %, and/or a recombinant nucleic acid molecule encoding a B-ke of the free fatty acids and/or derivatives produced by the toacyl synthase (KAS). Yet further additionally or alternately, photosynthetic microorganism that includes an recombinant the photosynthetic microorganism of the described invention gene encoding a prokaryotic acyl-ACP thioesterase can be can have attenuated/disrupted expression of one or more free fatty acids having an acyl chain length of 8, 10, 12, 14, 16. 45 genes encoding acyl-ACP synthase, acyl-CoA synthase, acyl and/or 18 carbons and/or fatty acid derivatives having a total CoA dehydrogenase, glycerol-3-phosphate dehydrogenase, number of carbons of 7,8,9, 10, 11, 12, 13, 14, 15, 16, 17, 18, acetaldehyde-CoA dehydrogenase, pyruvate dehydrogenase, 19, 20, 21, 22, 23, 24, 26, 28, 30, 32, 34, and/or 36. or acetate kinase. 
For example, any of these genes can be Yet further additionally or alternately, the genetically engi knocked out by insertional mutagenesis and/or downregu neered photosynthetic microorganism provided herein can 50 lated via RNA interference or via antisense RNA-mediated include a recombinant nucleic acid molecule encoding a gene silencing. prokaryotic acyl-ACP thioesterase that is a member of Pfam The genetically engineered photosynthetic microorganism family PF01643. Yet still further additionally or alternately, in any of the embodiments provided herein can be, for the photosynthetic microorganism can encode a prokaryotic example, a microalga. For example, the photosynthetic acyl-ACP thioesterase that includes Pfam domain PFO1643, 55 microorganism can be a species of microalgal genus includ and the photosynthetic microorganism produces a fatty acid ing, but not limited to, Achnanthes, Amphiprora, Amphora, having an acyl chain length of 12, 14, and/or 16 carbons Ankistrodesmus, Asteromonas, Boekelovia, Borodinella, Bot and/or a fatty acid derivative having a total number of carbons ryococcus, Bracteococcus, Chaetoceros, Carteria, Chlamy from 11 to 32. 
domonas, Chlorococcum, Chlorogonium, Chlorella, Alternately or in addition, the genetically engineered 60 Chroomonas, Chrysosphaera, Cricosphaera, Crypthecod microorganism can be a photosynthetic organism and/or can inium, Cryptomonas, Cyclotella, Dunaliella, Ellipsoidon, include a recombinant nucleic acid molecule encoding a Emiliania, Eremosphaera, Ernodesmius, Euglena, Franceia, prokaryotic acyl-ACP thioesterase having at least 70%, for Fragilaria, Gloeothamnion, Haematococcus, Halocafeteria, example at least 75%, at least 80%, at least 85%, at least 90%, Hymenomonas, Isochrysis, Lepocinclis, Micractinium, at least 95%, at least 96%, at least 97%, at least 98%, at least 65 Monoraphidium, Nannochloris, Nannochloropsis, Navicula, 99%, or about 100%, amino acid sequence identity to SEQID Neochloris, Nephrochloris, Nephroselmis, Nitzschia, NO:1 or SEQIDNO:2. Further additionally or alternately, the Ochromonas, Oedogonium, Oocystis, Ostreococcus, Pav US 8,530,207 B2 10 lova, Parachlorella, Pascheria, Phaeodactylum, Phagus, tionally or alternately, a culture can be provided with at least Picochlorum, Platymonas, Pleurochrysis, Pleurococcus, one source of inorganic carbon, Such as, for example, bicar Prototheca, Pseudochlorella, Pseudoneochloris, Pyramino bonate or carbon dioxide (CO2), and/or the photosynthetic nas, Pyrobotrys, Scenedesmus, Skeletonema, Spyrogyra, Sti microorganisms in the culture can be exposed to light for at chococcus, Tetraselmis, Thalassiosira, Viridiella, and Volvox. 5 least a portion of the culturing period. More particularly, the photosynthetic microorganism can Additionally, a free fatty acid and/or derivative can be be a prokaryotic microorganism. For example, the photosyn isolated from the culture, e.g., from the cells, the growth thetic microorganism can be a species of cyanobacterial media, or the whole culture. 
For example, the isolation can be genus, including, but not limited to, Agmenellum, Anabaena, by organic extraction of whole and/or lysed cells, via removal Anabaenopsis, Anacystis, Aphanizomenon, Arthrospira, 10 of free fatty acids and/or derivatives as precipitates (e.g., from Asterocapsa, Borzia, Calothrix, Chamaesiphon, Chloroglo the upper layer of the culture media, also termed "skim eopsis, Chroococcidiopsis, Chroococcus, Crinalium, Cyano ming'), through the use of particulate adsorbents, bubbles, bacterium, Cyanobium, Cyanocystis, Cyanospira, Cyanoth and/or matrices that can bind the fatty acids or fatty acid eCe, Cylindrospermopsis, Cylindrospermum, derivatives, or combinations thereof. Dactylococcopsis, Dermocarpella, Fischerella, Fremyella, 15 Geitleria, Geitlerinema, Gloeobacter; Gloeocapsa, Gloeoth BRIEF DESCRIPTION OF THE DRAWINGS ece, Halospirulina, Iyengariella, Leptolyngbya, Limnothrix, Lyngbya, Microcoleus, Microcystis, Myxosarcina, Nodu FIG. 1 shows a schematic diagram of the biosynthetic laria, Nostoc, Nostochopsis, Oscillatoria, Phormidium, pathway for producing free fatty acids and fatty acid deriva Planktothrix, Pleurocapsa, Prochlorococcus, Prochloron, tives. Prochlorothrix, Pseudanabaena, Rivularia, Schizothrix, Scy FIG. 2 shows the physical map of the expression vector tonema, Spirulina, Stanieria, Starria, Stigonema, Symploca, (pSGI-YC) containing the prokaryotic acyl-ACP thioesterase Synechococcus, Synechocystis, Thermosynechococcus, Toly genes (EMRE031 (SEQ ID NO:3) and EMRE032 (SEQ ID pothrix, Trichodesmium, Tvchonema, and Xenococcus. NO:4)) of the invention. According to another aspect, the present invention pro 25 FIG. 
3 shows the profile of the free fatty acids (FFA) vides a culture for producing a free fatty acid and/or derivative isolated from the culture of Escherichia coli K19 expressing comprising a population of photosynthetic microorganisms the EMRE031 gene (SEQID NO:3) and the profile of the free that can comprise a recombinant nucleic acid molecule fatty acids (FFA) isolated from the culture of a control encoding a prokaryotic acyl-ACP thioesterase. In certain Escherichia coli K19 strain without the EMRE031 gene (Ec embodiments, the growth media of the culture does not 30 control). include a reduced carbon Source, or at least a substantial FIG. 4 is a bar chart illustrating the free fatty acids isolated amount of a reduced carbon source, where a substantial from the culture of Escherichia coli K19 expressing the amount is an amount that can Support growth of the culture in EMRE032 gene (SEQ ID NO:4) and the profile of the free the absence of another energy source. fatty acids (FFA) isolated from the culture of a control In one preferred embodiment, the microorganisms in the 35 Escherichia coli K19 strain without the EMRE032 gene (Ec culture of the present invention can produce (and optionally control). but preferably release and/or secrete) at least one free fatty FIG. 5 is a bar chart illustrating the free fatty acids pro acid and/or fatty acid derivative. Additionally or alternately, duced by the photosynthetic cyanobacterium Synechocystis the microorganisms in the culture can produce a greater sp. PCC 6803 (labeled as 31YC63) expressing the EMRE031 amount of a fatty acid and/or fatty acid derivative than a 40 acyl-ACP thioesterase (SEQID NO: 1) and by the photosyn culture of the same photosynthetic microorganism that does thetic cyanobacteria Synechocystis sp. PCC 6803 (labeled as not include a recombinant nucleic acid molecule encoding a 32YC63) expressing the EMRE032 acyl-ACP thioesterase prokaryotic acyl-ACP thioesterase, in which the culture is (SEQID NO:2). 
identical in other respects. Further additionally or alternately, the microorganisms in the culture can includes a recombinant 45 DETAILED DESCRIPTION nucleic acid molecule encoding an acyl-ACP thioesterase, in which the culture can further include at least 5 mg per liter, for The described invention provides a composition and example at least 10 mg per liter, at least 20 mg per liter, at least method for producing one or more free fatty acids and/or 30 mg per liter, at least 40 mg per liter, or at least 50 mg per derivatives thereof comprising expressing a prokaryotic acyl liter, of free fatty acids and/or fatty acid derivatives, for 50 ACP thioesterase in a microorganism (e.g., by expressing a example, produced over a period from six hours to ten days. recombinant gene encoding a prokaryotic acyl-ACP The fatty acids and/or fatty acid derivatives can be present in thioesterase). the media, for example, as precipitates in or on, at or near the Surface of the media, associated with the media vessel as GLOSSARY droplets, including Suspended droplets (e.g., an emulsion), as 55 a relatively immiscible layer floating on top of the aqueous The abbreviations used herein for amino acids are those culture medium, as a "scum, film, gel, semi-solid, colloid, abbreviations which are conventionally used: fine particulate, particulate, Solid, or aggregate that may be A=Ala-Alanine: R=Arg-Arginine; N=Asn-Asparagine; dispersed, Suspended, or entrained within the culture D-Asp-Aspartic acid; C=Cys-Cysteine: medium, associated with the cells of the photosynthetic 60 Q-Gln-Glutamine; E-Glu-Glutamic acid; G=Gly-Glycine; microorganism, phase separated in Some other fashion, or a H=His=Histidine; I=Ile=lsoleucine; L=Leu-Leucine; combination thereof. 
K-Lys-Lysine; M=Met=Methionine: Additionally or alternately, the growth medium of the cul F=Phe-Phenylalanine: P=Pro-Proline; S-Ser-Serine: ture may not include a Substantial amount of a reduced carbon T=Thr=Threonine; W=Trp-Tryptophan: Y=Tyr=Tyrosine: Source, where a Substantial amount is an amount that can 65 V=Val-Valine. The amino acids may be L- or D-amino acids. Support growth of the culture in the absence of another energy An amino acid may be replaced by a synthetic amino acid Source that can be used by the microorganisms. Further, addi which is altered, for example, so as to increase the half-life of US 8,530,207 B2 11 12 the peptide or to modify the activity of the peptide, or to complex organic compounds (carbohydrates, fats, and pro increase the bioavailability of the peptide. teins) from simple inorganic molecules, which include car The phrase “conservative amino acid substitution' or “con bon dioxide and other nonreduced sources of carbons, such as servative mutation” as used herein refers to the replacement bicarbonate, using energy from light (by photosynthesis). of one amino acid by another amino acid with a common "Phototrophic growth' is growth using light as an energy property. A functional way to define common properties Source, and does not require a reduced carbon Source Such as between individual amino acids is to analyze the normalized a Sugar, carbohydrate, organic acid, amino acid, protein, lipid, frequencies of amino acid changes between corresponding etc. proteins of homologous organisms (Schulz, G. E. and R. H. The term “biofuel.” as used herein, refers to any fuel that is Schirmer, Principles of Protein Structure, Springer-Verlag). 10 obtained from a renewable biological resource. 
According to Such analyses, groups of amino acids can be The term “carbon source, as used herein, refers to a com defined where amino acids within a group exchange prefer pound that provides carbon needed for biosynthesis of new entially with each other, and therefore resemble each other organic molecules by a cell or microorganism. most in their impact on the overall protein structure (Schulz, The term “clade.” as used herein, refers to a group of G. E. and R. H. Schirmer, Principles of Protein Structure, 15 biological taxa or species that share features inherited from a Springer-Verlag). Examples of amino acid groups defined in common ancestor. A clade includes an ancestral lineage and this manner can include: a “charged/polar group, including all the descendants of that ancestor. The term clade is used Glu, Asp, ASn, Gln, Lys, Arg, and His; an 'aromatic or cyclic also to refer to a grouping of genes or proteins by relatedness group, including Pro, Phe, Tyr, and Trp; and an “aliphatic (homology) of their sequences. group' including Gly, Ala, Val, Leu, Ile, Met, Ser, Thr, and A gene that is "codon-optimized' for expression in an Cys. Within each group, subgroups can also be identified. For organism is a gene whose nucleotide sequence has been example, the group of charged/polar amino acids can be Sub altered with respect to the original nucleotide sequence. Such divided into Sub-groups including: the “positively-charged that one or more codons of the nucleotide sequence has been Sub-group.’ comprising Lys, Arg and His; the “negatively changed to a different codon that encodes the same amino charged sub-group comprising Glu and Asp; and the “polar 25 acid, in which the new codon is used more frequently in genes Sub-group' comprising ASn and Gln. In another example, the of the organism of interest than the original codon. 
The aromatic or cyclic group can be sub-divided into Sub-groups degeneracy of the genetic code provides that all amino acids including: the "nitrogen ring Sub-group comprising Pro, except for methionine and tryptophan are encoded by more His, and Trp; and the “phenyl sub-group' comprising Phe and than one codon. For example, arginine, leucine, and serine are Tyr. In another further example, the aliphatic group can be 30 encoded by six different codons; and glycine, alanine, Valine, Sub-divided into Sub-groups including: the “large aliphatic threonine, and proline are encoded by four different codons. non-polar sub-group.’ comprising Val, Leu, and Ile; the “ali Many organisms use certain codons to encode a particular phatic slightly-polar Sub-group.’ comprising Met, Ser, Thr, amino acid more frequently than others. Without limiting any and Cys; and the 'small-residue sub-group.’ comprising Gly aspects of the invention to any particular mechanism, it is and Ala. Examples of conservative mutations include amino 35 believed that some tRNAs for a given amino acid are more acid substitutions of amino acids within the Sub-groups prevalent than others within a particular organism, and genes above, such as, but not limited to: requiring a rare tRNA for translation of the encoded protein Alanine (A), Serine (S). Threonine (T): may be expressed at a low level due in part to a limiting Aspartic Acid (D), Glutamic Acid (E): amount of the rare tRNA. Thus, for adequate or optimal levels Asparagine (N). Glutamic Acid (Q); 40 of expression of an encoded protein, a gene may be "codon Arginine (R), Lysine (K); optimized to change one or more codons to new codons Isoleucine (I), Leucine (L), Methionine (M), Valine (V); (“preferred codons') that are among those used more fre and quently in the genes of the host organism (referred to as the Phenylalanine (F), Tyrosine (Y), Tryptophan (W). "codon preference of the organism). 
As used in the context The term “acyl-acyl carrier protein thioesterase' or “acyl 45 of the invention, a "codon-optimized' gene or nucleic acid ACP thioesterase, as used herein, refers to a thioesterase molecule of the invention need not have every codon altered enzyme that hydrolyzes an acyl-ACP ester linkage in prefer to conform to the codon preference of the intended host ence to other Substrates, such as an acyl-CoA substrate and/or organism, nor is it required that altered codons of a "codon a hydroxybenzoyl-CoA substrate (e.g., 4-hydroxybenzoyl optimized gene or nucleic acid molecule be changed to the CoA, 2,5-dihydroxybenzoyl-CoA, or the like), and can 50 most prevalent codon used by the organism of interest. For include an acyl-ACP thioesterase belonging to Protein family example, a codon-optimized gene may have one or more (Pfam) PFO1643 (at pfam.cgb.ki.se/; at pfamjanelia.org/; at codons changed to codons that are used more frequently that pfam.sanger.ac.uk). the original codon(s), whether or not they are used most The term “attenuate.” as used herein, means to weaken or frequently in the organism to encode a particular amino acid. reduce in force, intensity, activity, effect, or quantity. 55 The term “controllable regulatory element' or “regulatory The term “autotroph”, as used herein, refers to an organism element, as used herein, refers to nucleic acid sequences that produces complex organic compounds (carbohydrates, capable of effecting the expression of the nucleic acids, or the fats, and proteins) from simple inorganic molecules using peptide or protein product thereof. Controllable regulatory energy from light (by photosynthesis) or inorganic chemical elements may be operably linked to the nucleic acids, pep reactions. They are typically able to make their own food. 60 tides, or proteins of the present invention. The controllable Some autotrophs can fix carbon dioxide. 
regulatory elements, such as, but not limited to, control The term 'autotrophic, as used herein, refers to an organ sequences, need not be contiguous with the nucleic acids, ism that is capable of producing complex organic compounds peptides, or proteins whose expression they control as long as (carbohydrates, fats, and proteins) from simple inorganic they function to direct the expression thereof. Thus, for molecules using energy from light (by photosynthesis) and/or 65 example, intervening untranslated yet transcribed sequences inorganic chemical reactions. The term "photoautotrophic.” may be present between a promoter sequence and a nucleic as used herein, refers to an organism capable of producing acid of the present invention and the promoter sequence may US 8,530,207 B2 13 14 still be considered “operably linked to the coding sequence. an acyl moiety containing at least four carbons (for example, Other such control sequences include, but are not limited to, at least 6 carbons, for example at least 8 carbons), in which the enhancer sequences, sequences regulating translation, acyl moiety (i) is covalently linked to a hydrogen atom, (ii) sequences regulating mRNA stability, polyadenylation sig has an ionic charge, to which a counterion can be associated nals, termination signals, and ribosome binding sites. (even if loosely and/or solvent-separated), and/or (iii) is oth The term, “endogenous,” as used herein, refers to sub erwise associated (not covalently) with a moiety other than stances originating or produced within an organism. An hydrogen, for example, through an ester bond, Such that a free "endogenous gene or protein is a gene or protein residing in fatty acid is relatively easily transformable into the corre a species that is also derived from that species. sponding acid form or the corresponding ionic form (e.g., The term “native' is used herein to refer to nucleic acid 10 through hydrogen-bonding or the like). 
Nonlimiting sequences oramino acid sequences as they naturally occur in examples of counterions can include metals salts (such as the host. The term “non-native' is used herein to refer to calcium, sodium, potassium, aluminum, iron, and the like, nucleic acid sequences or amino acid sequences that do not and combinations thereof), other inorganic ions (such as occur naturally in the host. A nucleic acid sequence or amino ammonium, mono-, di-, tri-, and tetra-alkylammonium, Sul acid sequence that has been removed from a host cell. Sub 15 fonium, phosphonium, and the like, and combinations jected to laboratory manipulation, and reintroduced into a thereof), organic ions (such as carbocations), and the like, and host cell is considered “non-native.” combinations thereof. The term “free fatty acids as used An “episome' is a nucleic acid molecule that is not inte herein also refers to fatty acids which are not covalently grated into the chromosome or chromosomes of the cell and bound to any other moiety with the exception of hydrogen replicates autonomously in a cell. An "episomal nucleic acid (bound by the carboxylic acid group). For example, a free molecule or sequence is a gene, nucleic acid molecule, or fatty acid is not bound to other molecules such as ACP, coen nucleic acid sequence that is integrated into an episome. An Zyme A (CoA), or glycerol (for example, as part of a triglyc example of an episome is a plasmid, which is a circular DNA eride, diglyceride, monoglyceride, or phospholipid mol molecule outside of the chromosome(s) that includes an ori ecule). Free fatty acids contain a carboxyl group (-COOH), gin of replication and replicates autonomously within the cell. 25 which can be ionized into an anionic carboxylate form “Expression construct” refers to a nucleic acid that has (R—COO ; R: hydrocarbons). 
been generated via human intervention, including by recom Fatty acids can have an even or an odd number of carbon binant means or direct chemical synthesis, with a series of atoms (e.g., heptadecanoic-C17) and can also have branched specified nucleic acid elements that permit transcription and/ chains (e.g., isopalmitic acid, anteisononadecanoic acid) or or translation of a particular nucleic acid in a host cell. The 30 carbocyclic units (e.g., Sterculic acid, chaulmoogric acid). expression construct can be part of a plasmid, virus, or nucleic In some fatty acids, the hydrocarbon chain is fully satu acid fragment. rated (meaning contains no double bonds) and unbranched; The term "exogenous as used herein, refers to a Substance others contain one (monounsaturated) or more double bonds or molecule originating or produced outside of an organism. (unsaturated). A simplified nomenclature for these com The term "exogenous gene' or “exogenous nucleic acid mol 35 pounds specifies the chain length and number of double ecule, as used herein, refers to a nucleic acid that codes for bonds, separated by a colon; the 16-carbon Saturated palmitic the expression of an RNA and/or protein that has been intro acid is abbreviated 16:0, and the 18-carbon oleic acid, with duced (“transformed') into a cell or a progenitor of the cell. one double bond, is 18:1. The positions of any double bonds An exogenous gene may be from a different species (and so a are specified by SuperScript numbers following A (delta); a "heterologous' gene) or from the same species (and so a 40 20-carbon fatty acid with one double bond between C-9 and “homologous' gene), relative to the cell being transformed. A C-10 (C-1 being the carboxyl carbon), and another between transformed cell may be referred to as a recombinant cell. An C-12 and C-13, is designated 20:2 (A9,12), for example. 
The "endogenous nucleic acid molecule, gene, or protein can most commonly occurring fatty acids have even numbers of represent the organism’s own gene or protein as it is naturally carbonatoms in an unbranched chain of 12 to 24 carbons. The produced by the organism. 45 even number of carbons results from the mode of synthesis of The term “expressing or “expression.” as used herein, these compounds, which involves condensation of acetate means the transcription and translation of a nucleic acid mol (two-carbon) units. The position of double bonds is also regu ecule by a cell. Expression can be, for example, constitutive lar; in most monounsaturated fatty acids, the double bond is or regulated. Such as, by an inducible promoter (e.g., lac between C-9 and C-10 (A9), and other double bonds of poly , which can be triggered by Isopropyl B-D-1-thioga 50 unsaturated fatty acids are generally A12 and A15. The double lactopyranoside (IPTG)). bonds of almost all naturally-occurring unsaturated fatty The term “fatty acid, as used herein, is meant to refer to a acids are in the cis configuration. (Lehninger et al., Principles non-esterified a carboxylic acid having an alkyl chain of at of Biochemistry, Vol. 1, Macmillan, 2005) least 3 carbons (that is, having an acyl chain of at least 4 Examples of Saturated fatty acids include, but are not lim carbons) or its corresponding carboxylate anion, denoted as 55 ited to, butanoic (butyric) acid (C4), hexanoic (caproic) acid RCOOH or RCOO respectively, where R is an alkyl chain (C6), octanoic (caprylic) acid (C8), decanoic (capric) acid of between 3 and 23 carbons. 
A “free fatty acid” is substantially unassociated, e.g., with a protein, within or outside an organism (e.g., globular and/or micellular storage within an organism, without esterification, can still qualify as a free fatty acid). Thus, a free fatty acid according to the present invention need not necessarily be a strict acid or be structurally “free”, but a free fatty acid specifically does not include an acyl moiety whose carboxylate oxygen is covalently linked to any other moiety besides a hydrogen atom, meaning that fatty acid esters are specifically not included in free fatty acids. However, a free fatty acid can advantageously include

(C10), dodecanoic (lauric) acid (C12), tetradecanoic (myristic) acid (C14), hexadecanoic (palmitic) acid (C16), octadecanoic (stearic) acid (C18), eicosanoic (arachidic) acid (C20), docosanoic (behenic) acid (C22), and tetracosanoic (lignoceric) acid (C24). Examples of unsaturated fatty acids include, but are not limited to, myristoleic acid (C14:1, cis-Δ9), palmitoleic acid (C16:1, cis-Δ9), sapienic acid (C16:1, cis-Δ6), oleic acid (C18:1, cis-Δ9), linoleic acid (C18:2, cis-Δ9, cis-Δ12), α-linolenic acid (C18:3, cis-Δ9, cis-Δ12, cis-Δ15), γ-linolenic acid (C18:3, cis-Δ6, cis-Δ9, cis-Δ12), arachidonic acid (C20:4, cis-Δ5, cis-Δ8, cis-Δ11, cis-Δ14), eicosapentaenoic acid (C20:5, cis-Δ5, cis-Δ8, cis-Δ11, cis-Δ14, cis-Δ17), erucic acid (C22:1, cis-Δ13), and docosahexaenoic acid (C22:6, cis-Δ4, cis-Δ7, cis-Δ10, cis-Δ13, cis-Δ16, cis-Δ19). Long chain fatty acids also can be made from more readily available shorter chain fatty acids (C12-C18) by appropriate chain extension procedures.

Nonlimiting examples of naturally-occurring branched chain fatty acids include the iso fatty acids (mainly with an even number of carbon atoms) and the anteiso fatty acids (mainly with an odd number of carbon atoms), polymethyl branched acids in bacterial lipids, and phytol-based acids.

The most common cyclic acids contain a cyclopropane, cyclopropene, or cyclopentene unit. Cyclopropane acids occur in bacterial membrane phospholipids and are mainly C17 or C19 (lactobacillic) acids. The cyclopropane unit, like a cis double bond, introduces a discontinuity in the molecule and increases fluidity in the membrane.

The physical properties of the fatty acids, and of compounds that contain them, are determined largely by the length and degree of unsaturation of the hydrocarbon chain. The nonpolar hydrocarbon chain accounts for the poor solubility of fatty acids in water. The longer the fatty acyl chain and the fewer the double bonds, the lower the solubility in water. The carboxylic acid group is polar (and ionized at neutral pH) and accounts for the slight solubility of short chain fatty acids in water. The melting points of fatty acids and of compounds that contain them are also strongly influenced by the length and degree of unsaturation of the hydrocarbon chain. In the fully saturated compounds, free rotation around each of the carbon-carbon bonds gives the hydrocarbon chain great flexibility; the most stable conformation is the fully extended form, in which the steric hindrance of neighboring atoms is minimized. These molecules can pack together tightly in nearly crystalline arrays, with atoms all along their lengths in van der Waals contact with the atoms of neighboring molecules. A cis double bond forces a kink in the hydrocarbon chain. Fatty acids with one or several of such kinks cannot pack together as tightly as fully saturated fatty acids, and their interactions with each other are therefore weaker. Because it takes less thermal energy to disorder these poorly ordered arrays of unsaturated fatty acids, they have lower melting points than saturated fatty acids of the same chain length (Lehninger et al., Principles of Biochemistry, Vol. 1, Macmillan, 2005).

The term “fatty acid derivative,” as used herein, refers to an organic molecule derived from a fatty acid. Examples of fatty acid derivatives include, but are not limited to, C1-C5 fatty acid esters such as fatty acid methyl esters and fatty acid ethyl esters, wax esters, fatty alcohols, fatty aldehydes, alkanes, and alkenes.

The term “fatty alcohol,” as used herein, refers to an alcohol made from a fatty acid or fatty acid derivative and having the formula ROH, where R is a hydrocarbon chain. The hydrocarbon chain of the fatty alcohol can be straight or branched. The hydrocarbon chain can be saturated or unsaturated.

The term “fatty aldehyde,” as used herein, refers to an aldehyde made from a fatty acid or fatty acid derivative and having the formula RCHO, where R is a hydrocarbon chain. The hydrocarbon chain of the fatty aldehyde can be saturated or unsaturated.

The term “gene,” as used herein, refers to a nucleic acid molecule that encodes a protein or functional RNA (for example, a tRNA). A gene can include regions that do not encode the final protein or RNA product, such as 5' or 3' untranslated regions, introns, ribosome binding sites, promoter or enhancer regions, or other associated and/or regulatory sequence regions.

The terms “gene expression” and “expression” are used interchangeably herein to refer to the process by which inheritable information from a gene, such as a DNA sequence, is made into a functional gene product, such as protein or RNA.

The term “genetic engineering,” as used herein, refers to the use of molecular biology methods to manipulate nucleic acid sequences and introduce nucleic acid molecules into host organisms. The term “genetically engineered,” as used herein, means a cell that has been subjected to recombinant DNA manipulations, such as the introduction of an exogenous nucleic acid molecule, resulting in a cell that is in a form not found originally in nature.

The term “growth,” as used herein, refers to a process of becoming larger, longer or more numerous, or can indicate an increase in size, number, or volume of cells in a cell population.

The term “heterotrophic,” as used herein, refers to requiring reduced carbon substrates for growth.

The term “heterotroph,” as used herein, refers to an organism that does not produce its own food and must acquire some of its nutrients from the environment, e.g., in the form of reduced carbon.

A “homolog” of a gene or protein refers to its functional equivalent in another species.

The term “hydrocarbon,” as used herein, refers to any of the organic compounds made up exclusively of hydrogen and carbon in various ratios.

The term “hybridization” refers to the binding of two single stranded nucleic acid molecules to each other through base pairing. Nucleotides will bind to their complement under normal conditions, so two perfectly complementary strands will bind (or anneal) to each other readily. However, due to the different molecular geometries of the nucleotides, a single inconsistency between the two strands will make binding between them more energetically unfavorable. Measuring the effects of base incompatibility by quantifying the rate at which two strands anneal can provide information as to the similarity in base sequence between the two strands being annealed.

The term “inducer,” as used herein, refers to a molecule that can initiate the transcription of a gene, which is controlled by an inducible promoter.

The term “inducible promoter,” as used herein, refers to a promoter whose activity in promoting transcription of a gene to which it is operably linked is controlled by an environmental condition (e.g., temperature, light, or the like) or the presence of a factor such as a specific compound or biomolecule. The term “constitutive promoter” refers to a promoter whose activity is maintained at a relatively constant level in all cells of an organism with little or no regard to cell environmental conditions (such as the concentration of a substrate).

The terms “inhibiting”, “inhibit,” and “inhibition,” as used herein, refer to reducing the amount or rate of a process, to stopping the process entirely, or to decreasing, limiting, or blocking the action or function thereof. Inhibition may include a reduction or decrease of the amount, rate, action, function, or process by at least 5%, for example at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99%, when compared to a reference substance, wherein the reference substance is a substance that is not inhibited.

“Inorganic carbon” is a carbon-containing compound or molecule that cannot be used as an energy source by an organism. Typically “inorganic carbon” is in the form of CO2 (carbon dioxide), carbonic acid, bicarbonate, or carbonate, which cannot be oxidized for energy or used as a source of reducing power by organisms.

The term “insertional mutagenesis,” as used herein, refers to a mutagenesis of DNA by the insertion of exogenous DNA into a gene.

The term “microorganism” refers to a living organism so small in size that it is only visible with the aid of a microscope.

The term “mixotrophic,” as used herein, refers to cells or organisms capable of using a mix of different sources of energy and carbon, for example, using phototrophy (meaning growth using energy from light) and chemotrophy (meaning growth using energy by the oxidation of electron donors), or between chemical autotrophy and heterotrophy.
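By way of non-limiting illustration, the shorthand used earlier in this section for unsaturated fatty acids, for example linoleic acid (C18:2, cis-Δ9, cis-Δ12), can be read as a carbon count, a double bond count, and one geometry/position entry per double bond. The following Python sketch parses that shorthand. The names `FattyAcid` and `parse_shorthand` are hypothetical and are not part of the disclosure, and the consistency checks (one position per double bond, each position falling within the chain) are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FattyAcid:
    """Hypothetical container for the shorthand, e.g. C18:2, cis-Δ9, cis-Δ12."""
    carbons: int        # total number of carbon atoms in the acyl chain
    double_bonds: int   # number of carbon-carbon double bonds
    positions: tuple    # (geometry, carbon index) pairs, one per double bond


# "C18:2" or, for a saturated acid, just "C16" (assumed form for this sketch).
_HEAD = re.compile(r"^C(\d+)(?::(\d+))?$")
# "cis-Δ9" or "trans-Δ9"; Δn counts carbons from the carboxyl end.
_BOND = re.compile(r"^(cis|trans)-Δ(\d+)$")


def parse_shorthand(text: str) -> FattyAcid:
    """Parse a string such as 'C18:2, cis-Δ9, cis-Δ12' into a FattyAcid."""
    parts = [part.strip() for part in text.split(",")]
    head = _HEAD.match(parts[0])
    if head is None:
        raise ValueError(f"unrecognized chain descriptor: {parts[0]!r}")
    carbons = int(head.group(1))
    double_bonds = int(head.group(2) or 0)

    positions = []
    for part in parts[1:]:
        bond = _BOND.match(part)
        if bond is None:
            raise ValueError(f"unrecognized double bond descriptor: {part!r}")
        positions.append((bond.group(1), int(bond.group(2))))

    # Assumed sanity checks: one position per double bond, all inside the chain.
    if len(positions) != double_bonds:
        raise ValueError("number of positions does not match double bond count")
    if any(not 1 <= index < carbons for _, index in positions):
        raise ValueError("double bond position lies outside the carbon chain")
    return FattyAcid(carbons, double_bonds, tuple(positions))


if __name__ == "__main__":
    print(parse_shorthand("C18:2, cis-Δ9, cis-Δ12"))
```

A saturated acid written as a bare chain length, such as `C16`, parses to zero double bonds and an empty position tuple, while an entry whose double bond count and position list disagree is rejected with a `ValueError`.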
The term “nucleic acid, as used herein, refers to a deox 10 yribonucleotide or ribonucleotide polymer in either single- or The term "isolate.” as used herein, refers to a process of double-stranded form, and unless otherwise limited, encom obtaining a substance, molecule, protein, peptide, nucleic passes known analogues having the essential nature of natural acid, or antibody that is substantially free of other substances nucleotides in that they hybridize to single-stranded nucleic with which it is ordinarily found in nature or in vivo systems acids in a manner similar to naturally occurring nucleotides to an extent practical and appropriate for its intended use. 15 (e.g., peptide nucleic acids). The term "isolated’ refers to a material, such as a nucleic The term “nucleotide, as used herein, refers to a chemical acid, a peptide, or a protein, which is: (1) Substantially or compound that consists of a heterocyclic base, a Sugar, and essentially free from components that normally accompany one or more phosphate groups. In the most common nucle or interact with it as found in its naturally occurring environ otides the base is a derivative of purine or pyrimidine, and the ment, or (2) if the material is in its natural environment, the sugar is the pentose deoxyribose or ribose. Nucleotides are material has been synthetically (non-naturally) altered by the monomers of nucleic acids, with three or more bonding deliberate human intervention to a composition and/or placed together in order to form a nucleic acid. Nucleotides are the at a location in the cell (e.g., genome or Subcellular organelle) structural units of RNA, DNA, and several cofactors, includ not native to a material found in that environment. 
The term ing, but not limited to, CoA, FAD, DMN, NAD, and NADP “substantially or essentially free’ is used to refer to a material, 25 The purines include adenine (A), and guanine (G); the pyri which is at least 80% free, for example at least 90% free, at midines include cytosine (C), thymine (T), and uracil (U). least 95% free, or at least 99% free (with percentages being The term “operably linked as used herein, refers to a weight percentages only when applicable) from components functional linkage between a genetic regulatory element or that normally accompany or interact with it as found in its region and a second nucleic acid sequence, wherein the naturally occurring environment. The isolated material 30 genetic regulatory element or region promotes, inhibits, ter optionally comprises material not found with the material in minates, initiates, or mediates transcription, translation, turn its natural environment. over, processing, or transport, of the nucleic acid sequence The term "heterologous,” as used herein, refers to nucleic corresponding to the second sequence. acids derived from a different species than that into which The term “origin of replication, as used herein, refers to a they are introduced or than they reside in through genetic 35 particular sequence in a genome, chromosome, or episome at engineering of the organism or its ancestor. A heterologous which replication of DNA is initiated. protein is derived from a species other than that is produced in The term “open reading frame,” as used herein, refers to a or introduced into. A heterologous nucleic acid sequence, sequence of nucleotides in a DNA molecule that encodes a gene, or protein, is a nucleic acid sequence, gene, or protein sequence of amino acids uninterrupted by a stop codon that derived from an organism other than that it is introduced into 40 has the potential to encode at least a portion of a peptide or or resides in. protein. 
A complete open reading frame starts with a start When referring to gene regulatory elements, "heterolo codon (typically ATG), is followed by a string of codons each gous' refers to a gene regulatory element that is operably of which encodes an amino acid, and ends with a stop codon linked to a gene with which it is not associated in nature. The (TAA, TAG or TGA). Open reading frames often can be term "heterologous expression, as used herein, means that a 45 confirmed by matching their sequences to a database of heterologous nucleic acid encoding a protein (e.g., an sequenced genes or expressed sequence tags (ESTs). enzyme) is put into a cell that does not normally make (i.e., The term “overexpressed, as used herein, refers to express) that protein. increased quantity of a gene or gene product relative to a The term “lactose analogue, as used herein, refers to a quantity of the gene or gene product under normal conditions. compound used as a Substitute for lactose, wherein the glu 50 The term "peptide.” as used herein, refers to a biopolymer cose moiety of lactose is replaced by another chemical group. formed from the linking together, in a defined order, of amino Examples of a lactose analogue include, but are not limited to, acids. The link between one amino acid residue and the next isopropyl-f-D-thio-galactoside (IPTG), phenyl-f-D-galac is known as an amide or peptide bond. The term “polypep tose (phenyl-Gal), and allolactose. tide, as used herein, refers to a single chain of amino acids, The term “lipid,” as used herein, refers to a chemically 55 and a “protein’ refers to one or more polypeptides. The terms diverse group of compounds, the common and defining fea polypeptide, peptide, and protein are also inclusive of modi ture being their insolubility in water. 
fications including, but not limited to, glycosylation, lipid The term “metabolic engineering, as used herein, gener attachment, Sulfation, gamma-carboxylation of glutamic acid ally refers to the targeted and purposeful alteration of meta residues, hydroxylation and ADP-ribosylation. Polypeptides bolic pathways found in an organism in order to better under 60 may not be entirely linear. For instance, polypeptides may be stand and utilize cellular pathways for chemical branched as a result of ubiquitination, and they may be cir transformation, energy transduction, and Supramolecular cular, with or without branching, generally as a result of assembly. posttranslational events, including natural processing event The term “metabolic intermediate, as used herein, refers and events brought about by human manipulation which do to a precursor molecule produced by a series of enzymatic 65 not occur naturally. reactions, which is altered by the Subsequent enzymatic reac The term “Pfam” refers to a large collection of protein tions. domains and protein families maintained by the Pfam Con US 8,530,207 B2 19 20 sortium and available at several sponsored world wide web indicated, the term includes reference to the specified sites, including: pfam.sanger.ac.uk/ (Welcome Trust, Sanger sequence as well as the complementary sequence thereof. Institute); pfam.sbc.su.se/(Stockholm Bioinformatics Cen Thus, DNAS or RNAs with backbones modified for stability ter); pfamjanelia.org/(Janelia Farm, Howard Hughes Medi or for other reasons are “polynucleotides” as that term is cal Institute); pfamjouy.inra.fr/(Institut national de la intended herein. Moreover, DNAS or RNAs comprising Recherche Agronomique); and pfam.ccbb.re.kr/. 
A recent unusual bases, such as inosine, or modified bases, such as release of Pfam is Pfam 24.0 (October 2009, 11912 families) tritylated bases, to name just two examples, are polynucle based on the UniProt protein database release 15.6, a com otides as the term is used herein. It will be appreciated that a posite of Swiss-Prot release 57.6 and TrEMBL release 40.6. great variety of modifications have been made to DNA and Pfam domains and families are identified using multiple 10 RNA that serve many useful purposes are known to those of sequence alignments and hidden Markov models (HMMs). skill in the art. The term polynucleotide, as it is employed Pfam-A families, which are based on high quality assign herein, embraces such chemically, enzymatically or meta ments, are generated by a curated seed alignment using rep bolically modified forms of polynucleotides, as well as the resentative members of a protein family and profile hidden chemical forms of DNA and RNA characteristic of viruses Markov models based on the seed alignment. All identified 15 and cells, including among other things, simple and complex sequences belonging to the family are then used to automati cells. cally generate a full alignment for the family (Sonnhammeret The term “primer' refers to a nucleic acid molecule which, al. (1998) Nucleic Acids Research 26: 320-322; Bateman et when hybridized to a strand of DNA or RNA, is capable of al. (2000) Nucleic Acids Research 26: 263-266: Bateman et serving as the substrate to which nucleotides are added in the al. (2004) Nucleic Acids Research 32, Database Issue: D138 synthesis of an extension product in the presence of a Suitable D141; Finn et al. (2006) Nucleic Acids Research Database polymerization agent (e.g., a DNA polymerase). In some Issue 34: D247-251; Finn et al. (2010) Nucleic Acids cases, the primer is sufficiently long to uniquely hybridize to Research Database Issue 38: D211-222). By accessing the a specific region of a DNA or RNA strand. 
pfam database, for example, using any of the above-reference The term “promoter,” as used herein, refers to a region of websites, protein sequences can be queried against the hidden 25 DNA proximal to the start site of transcription, which is Markov models (HMMs) using HMMER homology search involved in recognition and binding of RNA polymerase and software (e.g., HMMER3, hmmer.janelia.org/). Significant other proteins to initiate transcription. A given promoter may matches that identify a queried protein as being in a pfam work in concert with other regulatory regions (enhancers, family (or as having a particular pfam domain) are those in silencers, boundary elements/insulators) in order to direct the which the bit score is greater than or equal to the gathering 30 level of transcription of a given gene. threshold for the Pfam domain. The term "gathering threshold The term “lac promoter as used herein, refers to a pro (GA)” or 'gathering cut-off” as used herein, refers to a search moter of the , whose transcription activity is threshold value used to build a full alignment. The gathering repressed by a repressor protein (i.e., the LacI protein threshold is the minimum score that a sequence must attain in encoded by the lad gene) but relieved by an inducer, such as, order to belong the full alignment of a Pfam entry. The gath 35 lactose or analogues thereof (e.g., isopropyl-B-D-thiogalac ering threshold for the Acyl-ACP thioesterase family toside (IPTG)). The inducer binds to the repressor protein and (PFO1643) is 20.3. Expectation values (evalues) can also be prevents it from repressing gene transcription. 
used as a criterion for inclusion of a queried protein in a pfam The term “tac promoter,” as used herein, refers to a strong or for determining whether a queried protein has a particular hybrid promoter composed of the position -35 region of the pfam domain, where low e values (much less than 1.0, for 40 trp promoter and the position -10 region of the lacUV5 pro example less than 0.1, or less than or equal to 0.01) represent moter/operator. Expression of the tac promoter is repressed low probabilities that a match is due to chance. by the LacI protein. The lacIq allele is a promoter mutation The term “phototroph,” as used herein, refers to an organ that increases the intracellular concentration of the LacI ism which uses Sunlight as its primary energy source. “Pho repressor, resulting in strong repression of tac promoter. The totrophic' growth or culture means growth or culture in 45 transcriptional activity of the tac promoter is controlled by a which the organisms use light, and not organic molecules, for lactose or analogues thereof. energy. The term “trc promoter,” as used herein, refers to a hybrid The term "photosynthetic microorganism as used herein, promoter sequence of the lac and trp promoters. The tran includes, but is not limited to, all algae, microalgae, and scriptional activity of the trc promoter also is controlled by photosynthetic bacteria, which can grow phototrophically. 50 lactose or analogues thereof. One example of a trc promoteris The term “plasmid, as used herein, refers to a DNA mol the trcY promoter (SEQID NO:9). ecule that is separate from, and can replicate independently The term “recombination, as used herein, refers to the of the chromosomal DNA of a cell. It is double stranded and, process by which pieces of DNA are broken apart and recom in many cases, circular. bined. 
The term “homologous recombination, as used The term “polypeptide' is used herein to refer to a peptide 55 herein, refers to a type of genetic recombination in which containing from about 10 to more than about 1000 amino nucleotide sequences are exchanged between two similar or acids. identical molecules of DNA. The term “polynucleotide' or “nucleic acid molecule' A “recombinant’ or “engineered nucleic acid molecule is refers to a deoxyribopolynucleotide, ribopolynucleotide, or a nucleic acid molecule that has been altered through human an analog thereof that has the essential nature of a natural 60 manipulation. As non-limiting examples, a recombinant deoxyribopolynucleotide or ribonucleotide in that it hybrid nucleic acid molecule includes any nucleic acid molecule izes, under Stringent hybridization conditions, to Substan that: 1) has been partially or fully synthesized or modified in tially the same nucleotide sequence as naturally occurring vitro, for example, using chemical or enzymatic techniques nucleotides and/or allow translation into the same amino (e.g., by use of chemical nucleic acid synthesis, or by use of acid(s) as the naturally occurring nucleotide(s). A polynucle 65 enzymes for the replication, polymerization, digestion (exo otide may be full-length or a Subsequence of a native or nucleolytic or endonucleolytic), ligation, reverse transcrip heterologous structural or regulatory gene. Unless otherwise tion, transcription, base modification (including, e.g., methy US 8,530,207 B2 21 22 lation), integration or recombination (including homologous or deletions (i.e., gaps) compared to the reference sequence and site-specific recombination) of nucleic acid molecules); (which does not comprise additions or deletions) for optimal 2) includes conjoined nucleotide sequences that are not con alignment of the two sequences. 
Generally, the comparison joined in nature, 3) has been engineered using molecular window is at least 20 contiguous nucleotides in length, and cloning techniques such that it lacks one or more nucleotides optionally can be at least 30 contiguous nucleotides in length, with respect to the naturally occurring nucleic acid molecule for example at least 40 contiguous nucleotides in length, at sequence, and/or 4) has been manipulated using molecular least 50 contiguous nucleotides in length, at least 100 con cloning techniques such that it has one or more sequence tiguous nucleotides in length, or longer. Those of skill in the changes or rearrangements with respect to the naturally art understand that to avoid a high similarity to a reference occurring nucleic acid sequence. As non-limiting examples, a 10 sequence due to inclusion of gaps in the polynucleotide cDNA is a recombinant DNA molecule, as is any nucleic acid sequence, a gap penalty typically is introduced and is Sub molecule that has been generated by in vitro polymerase tracted from the number of matches. reaction(s), or to which linkers have been attached, or that has Methods of alignment of sequences for comparison are been integrated into a vector. Such as a cloning vector or well-known in the art. Optimal alignment of sequences for expression vector. 15 comparison may be conducted by the local homology algo When applied to organisms, the term recombinant, engi rithm of Smith and Waterman, Adv. Appl. Math. 2:482 neered, or genetically engineered refers to organisms that (1981); by the homology alignment algorithm of Needleman have been manipulated by introduction of an exogenous or and Wunsch, J. Mol. Biol. 48:443 (1970); by the search for recombinant nucleic acid sequence into the organism, and similarity method of Pearson and Lipman, Proc. Natl. Acad. includes organisms having gene knockouts, targeted muta Sci. 
85:2444 (1988); by computerized implementations of tions and gene replacement, promoter replacement, deletion, these algorithms, including, but not limited to: CLUSTAL in or insertion, as well as organisms having exogenous genes the PC/Gene program by Intelligenetics, Mountain View, that have been introduced into the organism. An exogenous or Calif.; GAP, BESTFIT, BLAST, FASTA, and TFASTA in the recombinant nucleic acid molecule can be integrated into the Wisconsin Genetics Software Package, Genetics Computer recombinant/genetically engineered organism's genome or in 25 Group (GCG), 575 Science Dr. Madison, Wis., USA; the other instances may not be integrated into the recombinant/ CLUSTAL program is well described by Higgins and Sharp, genetically engineered organism's genome. Gene 73:237-244 (1988); Higgins and Sharp, CABIOS The term “recombinant protein, as used herein, refers to a 5:151-153 (1989); Corpet, et al., Nucleic Acids Research protein produced by genetic engineering. 16:10881-90 (1988); Huang, et al., Computer Applications in The term “recombinase, as used herein, refers to an 30 the Biosciences 8:155-65 (1992), and Pearson, et al., Meth enzyme that catalyzes genetic recombination. ods in Molecular Biology 24:307-331 (1994). The BLAST “Reduced carbon” or a “reduced carbon compound” or family of programs, which can be used for database similarity “reduced carbon source” refers to a carbon-based molecule searches, includes: BLASTN for nucleotide query sequences that includes carbon and hydrogen and can be used as an against nucleotide database sequences; BLASTX for nucle energy source by an organism, either through oxidation or 35 otide query sequences against protein database sequences; glycolysis. 
Non-limiting examples of reduced carbon are Sug BLASTP for protein query sequences against protein data ars (including polysaccharides and starch), alcohols (includ base sequences; TBLASTN for protein query sequences ing glycerol and Sugar alcohols), forms of organic acids (e.g., against nucleotide database sequences; and TBLASTX for acetate, citrate. Succinate, etc.), amino acids, proteins, lipids, nucleotide query sequences against nucleotide database and fatty acids. Reduced carbon is sometimes referred to as 40 sequences. See, Current Protocols in Molecular Biology, "organic carbon.” Chapter 19, Ausubel, et al., Eds. Greene Publishing and The term “regulatory sequence' (also referred to as a Wiley-Interscience, New York (1995). “regulatory region' or “regulatory element) refers to a pro Unless otherwise stated, sequence identity/similarity val moter, enhancer, 5' untranslated region, 3' untranslated ues provided herein refer to the value obtained using the region, ribosome binding site, or other segment of DNA or 45 BLAST 2.0 suite of programs using default parameters. Alts RNA that regulate expression of a proximal gene. chul et al., Nucleic Acids Res. 25:3389-3402 (1997). Soft The terms “amino acid residue' and “amino acid' are used ware for performing BLAST analyses is publicly available, interchangeably to refer to an amino acid that is incorporated e.g., through the National Centerfor Biotechnology-Informa into a protein, a polypeptide, or a peptide, including, but not tion at ncbi.nlm.nih.gov. This algorithm involves first identi limited to, a naturally occurring amino acid and known ana 50 fying high scoring sequence pairs (HSPs) by identifying short logs of natural amino acids that can function in a similar words of length W in the query sequence, which either match manner as naturally occurring amino acids. 
or satisfy some positive-valued threshold score T when The following terms are used herein to describe the aligned with a word of the same length in a database sequence relationships between two or more nucleic acids or sequence. T is referred to as the neighborhood word score polynucleotides: (a) “reference sequence', (b) “comparison 55 threshold (Altschulet al., supra). These initial neighborhood window', (c) “sequence identity', (d) "percentage of word hits act as seeds for initiating searches to find longer sequence identity”, and (e) “substantial identity”. HSPs containing them. The word hits then are extended in The term “reference sequence” refers to a sequence used as both directions along each sequence for as far as the cumu a basis for sequence comparison. A reference sequence may lative alignment score can be increased. Cumulative scores be a subset or the entirety of a specified sequence; for 60 are calculated using, for nucleotide sequences, the parameters example, as a segment of a full-length cDNA or gene M (reward score for a pair of matching residues; always-0) sequence, or the complete cDNA or gene sequence. and N (penalty score for mismatching residues; always.<0). The term "comparison window' refers to a contiguous and For amino acid sequences, a scoring matrix is used to calcu specified segment of a polynucleotide sequence, wherein the late the cumulative score. Extension of the word hits in each polynucleotide sequence may be compared to a reference 65 direction are halted when: the cumulative alignment score sequence and wherein the portion of the polynucleotide falls off by the quantity X from its maximum achieved value: sequence in the comparison window may comprise additions the cumulative score goes to Zero or below, due to the accu US 8,530,207 B2 23 24 mulation of one or more negative-scoring residue alignments; tions at which the identical nucleic acid base or amino acid or the end of either sequence is reached. 
The BLAST algo residue occurs in both sequences to yield the number of rithm parameters W. T. and X determine the sensitivity and matched positions, dividing the number of matched positions speed of the alignment. The BLASTN program (for nucle by the total number of positions in the window of comparison, otide sequences) uses as defaults a word length (W) of 11, an and multiplying the result by 100 to yield the percentage of expectation (E) of 10, a cutoff of 100, M-5, N=-4, and a sequence identity. Unless otherwise stated, 96 homology of a comparison of both strands. For amino acid sequences, the sequence is across the entire length of the query sequence (the BLASTP program uses as defaults a word length (W) of 3, an comparison window). expectation (E) of 10, and the BLOSUM62 scoring matrix The term “substantial identity” of polynucleotide (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA, 1989, 10 sequences means that a polynucleotide comprises a sequence 89:10915). that has at least 70% sequence identity, for example at least In addition to calculating percent sequence identity, the 80% sequence identity, at least 85% sequence identity, at least BLAST algorithm also performs a statistical analysis of the 90% sequence identity, at least 95% sequence identity, at least similarity between two sequences (see, e.g., Karlin & Alts 96% sequence identity, at least 97% sequence identity, at least chul, Proc. Natl. Acad. Sci. USA, 1993,90:5873-5787). One 15 98% sequence identity, or at least 99% sequence identity, measure of similarity provided by the BLAST algorithm is compared to a reference sequence using one of the alignment the smallest sum probability (P(N)), which provides an indi programs described using standard parameters. One of skill cation of the probability by which a match between two will recognize that these values may be adjusted appropri nucleotide or amino acid sequences would occur by chance. 
ately to determine corresponding identity of proteins encoded BLAST searches assume that proteins may be modeled as by two nucleotide sequences by taking into account codon random sequences. However, many real proteins comprise degeneracy, amino acid similarity, reading frame positioning regions of nonrandom sequences, which may be homopoly and the like. Substantial identity of amino acid sequences for meric tracts, short-period repeats, or regions enriched in one these purposes normally means sequence identity of at least or more amino acids. Such low-complexity regions may be 60%, for example at least 70%, at least 80%, at least 85%, at aligned between unrelated proteins even though other regions 25 least 90%, or at least 95%. Another indication that nucleotide of the protein are entirely dissimilar. A number of low-com sequences are Substantially identical is if two molecules plexity filter programs may be employed to reduce Such low hybridize to each other under stringent conditions. However, complexity alignments. For example, the SEG (Wooten and nucleic acids that do not hybridize to each other under strin Federhen, Comput. Chem., 1993, 17:149-163) and XNU gent conditions are still Substantially identical if the polypep (Claverie and States, Comput. Chem., 1993, 17:191-201) 30 tides that they encode are substantially identical. This may low-complexity filters may be employed alone or in combi occur, e.g., when a copy of a nucleic acid is created using the nation. maximum codon degeneracy permitted by the genetic code. 
As used herein, “sequence identity” or “identity” in the One indication that two nucleic acid sequences are substan context of two nucleic acid or polypeptide sequences refers to tially identical is that the polypeptide that the first nucleic acid the residues in the two sequences that are the same when 35 encodes is immunologically cross reactive with the polypep aligned for maximum correspondence over a specified com tide encoded by the second nucleic acid. parison window. When percentage of sequence identity is The terms “substantial identity” in the context of a peptide used in reference to proteins it is recognized that residue indicates that a peptide comprises a sequence with at least positions that are not identical often differ by conservative 70% sequence identity to a reference sequence, for example amino acid substitutions, i.e., where amino acid residues are 40 at least 80%, at least 85%, at least 90%, at least 95%, at least substituted for otheramino acid residues with similar chemi 96%, at least 97%, at least 98%, or at least 99% sequence cal properties (e.g., charge and/or hydrophobicity) and there identity to the reference sequence, over a specified compari fore do not change the functional properties of the molecule. son window. Optionally, optimal alignment is conducted Where sequences differ in conservative substitutions, the per using the homology alignment algorithm of Needleman and cent sequence identity may be adjusted upwards to correct for 45 Wunsch, J. Mol. Biol. 48:443 (1970). An indication that two the conservative nature of the substitution. Sequences that peptide sequences are substantially identical is that one pep differ by such conservative substitutions are said to have tide is immunologically reactive with antibodies raised “sequence similarity” or “similarity.” Means for making this against the second peptide. Thus, a peptide is Substantially adjustment are well-known to those of skill in the art. 
Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., according to the algorithm of Meyers and Miller, Computer Applic. Biol. Sci., 1988, 4:11-17, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif., USA).

As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) relative to the reference sequence (which does not comprise additions or

identical to a second peptide, for example, where the two peptides differ only by a conservative substitution. Peptides which are “substantially similar” share sequences as noted above except that residue positions that are not identical may differ by conservative amino acid changes.

A “variant” of a gene or nucleic acid sequence is a sequence having at least 65% identity with the referenced gene or nucleic acid sequence, and can include one or more base deletions, additions, or substitutions with respect to the referenced sequence. The differences in the sequences may be the result of changes, either naturally or by design, in sequence or structure. Natural changes may arise during the course of normal replication or duplication in nature of the particular nucleic acid sequence. Designed changes may be specifically designed and introduced into the sequence for specific purposes. Such specific changes may be made in vitro using a variety of mutagenesis techniques.
Such sequence variants generated specifically may be referred to as “mutants” of the original sequence.

deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of posi

A “variant” of a peptide or protein is a peptide or protein sequence that varies at one or more amino acid positions with respect to the reference peptide or protein. A variant can be a naturally-occurring variant or can be the result of spontaneous, induced, or genetically engineered mutation(s) to the nucleic acid molecule encoding the variant peptide or protein. A variant peptide can also be a chemically synthesized variant.

A “conservative variant” of a polypeptide is a polypeptide having one or more conservative amino acid substitutions with respect to the reference polypeptide, in which the activity, substrate affinity, binding affinity of the polypeptide does not substantially differ from that of the reference polypeptide.

A skilled artisan likewise can produce polypeptide variants having single or multiple amino acid substitutions, deletions, additions, and/or replacements. These variants may include, inter alia: (a) variants in which one or more amino acid residues are substituted with conservative or non-conservative amino acids; (b) variants in which one or more amino acids are added; (c) variants in which at least one amino acid includes a substituent group; (d) variants in which amino acid residues from one species are substituted for the corresponding residue in another species, either at conserved or non-conserved positions; and (e) variants in which a target protein is fused with another peptide or polypeptide such as a fusion partner, a protein tag or other chemical moiety, that may confer useful properties to the target protein, such as, for example, an epitope for an antibody. The techniques for obtaining such variants, including genetic (suppressions, deletions, mutations, etc.), chemical, and enzymatic techniques are known to the skilled artisan.

Molecular Biology (NC-IUBMB) into EC (enzyme commission) 3.1.2.1 to EC 3.1.2.27, as well as EC 3.1.2.- for unclassified TEs. Substrates of 15 of these 27 groupings contain coenzyme A (CoA), two contain acyl carrier proteins (ACPs), four have glutathione or its derivatives, one has ubiquitin, and two contain other moieties. In addition, three groupings have been deleted (Cantu et al. (2010) Protein Science, 19:1281-1295).

The term “triacylglycerol” or “triglycerides,” as used herein, refers to a class of compounds that consist of a glycerol backbone with a fatty acid linked to each of the three OH groups by an ester bond.

The term “transit peptide,” as used herein, refers to a peptide sequence, often at the N-terminus of a precursor protein, which directs a gene product to its specific cellular destination, such as plastid.

The term “underexpressed,” as used herein, refers to decreased quantity of a gene or gene product relative to the quantity of a gene or gene product under normal conditions.

The term “vector” is used herein to refer to any agent that acts as a carrier or transporter, such as a phage, plasmid, cosmid, bacmid, phage or virus, to which another genetic sequence or element (either DNA or RNA) may be attached so that sequence or element can be conveyed into a host cell.

The term “expression vector,” as used herein, generally refers to a nucleic acid molecule that has been constructed in such a way that, after insertion of a DNA molecule, its coding sequence is properly transcribed into an RNA molecule and the RNA molecule can be optionally translated into a protein. The nucleic acid construct, which can be a vector,
frequently is engineered to contain regulatory sequences that act as enhancer and promoter regions, which lead to efficient transcription of the open reading frame carried on the expression vector.

A “fatty acid ester” is an ester of a fatty acid and an alcohol. The carbon chain originating from an alcohol is referred to as the A chain and the carbon chain originating from a fatty acid (the fatty acid moiety can be provided by an acyl thioester) is referred to as the B chain. A fatty acid ester can have an A side of any length, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24, or more than 24 carbons in length. A fatty acid ester can have a B side of any length, for example, 4, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24, or more than 24 carbons in length. The lengths of the A and B chains of a fatty acid ester can vary independently. For example, condensation of methanol (C1) and an acyl chain (fatty acid or acyl-thioester) of C4 or greater can result in a fatty acid methyl ester (“FAME”) and condensation of ethanol and an acyl chain can result in a fatty acid ethyl ester (“FAEE”). Condensation of a fatty alcohol (C8 or above) with an acyl thioester (C8 or greater) produces a wax ester.

The term “wax” or “wax esters,” as used herein, refers to esters of long chain fatty acids and monohydric straight chain aliphatic alcohols, which form solids or pliable substances under an identified set of physical conditions.

The term “wild type,” as used herein, refers to an organism or phenotype as found in nature.

I. Genetically Engineered Microorganism for Producing Free Fatty Acids and/or Derivatives

According to one aspect, the described invention provides a microorganism that includes a recombinant nucleic acid molecule that encodes a prokaryotic acyl-ACP thioesterase.

As used herein, the term “mutation” refers to a change of the DNA sequence within a gene or chromosome of an organism resulting in the creation of a new character or trait not found in the parental type, or the process by which such a change occurs in a chromosome, either through an alteration in the nucleotide sequence of the DNA coding for a gene or through a change in the physical arrangement of a chromosome. Three mechanisms of mutation include substitution (exchange of one base pair for another), addition (the insertion of one or more bases into a sequence), and deletion (loss of one or more base pairs).

The term “specifically hybridizes,” as used herein, refers to the process whereby a nucleic acid distinctively or definitively forms base pairs with complementary regions of at least one strand of the nucleic acid target sequence that was not originally paired to the nucleic acid. A nucleic acid that selectively hybridizes undergoes hybridization, under stringent hybridization conditions, of the nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences and to the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have about at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, at least 96% sequence identity, at least 97% sequence identity, at least 98% sequence identity, at least 99% sequence identity, or about 100% sequence identity (i.e., complementary) with each other.

The term “stably integrated,” as used herein, means that an exogenous or heterologous genetic material is integrated into a host genome and is inherited by the descendants of the cell.
The term “thioesterase (TE)” or “thioester hydrolase,” as used herein, refers to a large enzyme group whose members hydrolyze the thioester bond between a carbonyl group and a sulfur atom. They are classified by the Nomenclature Committee of the International Union of Biochemistry and

The genetically engineered microorganism can produce at least one free fatty acid and/or fatty acid derivative.

Additionally or alternately, the amount of at least one free fatty acid and/or derivative produced by the genetically engineered microorganism can be at least twice the amount of the free fatty acid and/or derivative produced by the same microorganism that does not include a recombinant prokaryotic acyl-ACP thioesterase gene. For example, the photosynthetic microorganism that includes the recombinant nucleic acid molecule that encodes the prokaryotic acyl-ACP thioesterase can produce at least 30 mg per liter, for example at least 40 mg per liter or at least 50 mg per liter, of free fatty acids and/or derivatives. For example, the host microorganism can express prokaryotic thioesterase such that one or more fatty acids and/or derivatives can be produced.

The genetically engineered microorganism can be any microorganism, including, but not limited to, a heterokont, fungus, bacterium, microalga, or cyanobacterium.

acids and/or derivatives by the microorganism. Prokaryotic acyl-ACP thioesterases considered useful herein can include members of the acyl-ACP thioesterase family (e.g., PF01643; see pfam.cgb.ki.se/ or pfam.janelia.org/ or pfam.sanger.ac.uk/) that, when queried against the Pfam bioinformatics annotated database of protein families, can demonstrate a match with the Pfam acyl-ACP thioesterase family (PF01643) with a bit score higher than the threshold gathering score (for example, a bit score higher than 20.3), and/or can demonstrate a Pfam-A match with the Pfam acyl-ACP thioesterase family with an expectation value (evalue) of less
than 0.01 (Bateman et al. (2000) Nucleic Acids Research 28:263-266; Bateman et al. (2004) Nucleic Acids Research 32:D138-D141; Finn et al. (2010) Nucleic Acids Research 38:D211-222). Prokaryotic thioesterases expressed in a photosynthetic microorganism as provided herein in some embodiments have the EC designation EC 3.1.2.14.

Non-limiting examples of prokaryotic acyl-ACP thioesterases that can be used to transform a microorganism for producing fatty acids and fatty acid derivatives include, without limitation, the Desulfovibrio desulfuricans acyl-ACP thioesterase (SEQ ID NO:16) having Genbank Accession Number Q312L1 and GenInfo Identifier GI:123552742; the Elusimicrobium minutum acyl-ACP thioesterase (SEQ ID NO:17) having Genbank Accession Number ACC98705 and GenInfo Identifier GI:186971720; the Carboxydothermus hydrogenoformans acyl-ACP thioesterase (SEQ ID NO:18) having Genbank Accession Number YP_359670 and GenInfo Identifier GI:78042959; the Clostridium thermocellum

The genetically engineered host organism can additionally or alternately be a photosynthetic microorganism, such as a microalga. Representative algae include green algae (chlorophytes), red algae, diatoms, prasinophytes, glaucophytes, chlorarachniophytes, euglenophytes, chromophytes, and dinoflagellates. Non-limiting examples of a microalgal genus that can contain an exogenous nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase include, but are not limited to, Achnanthes, Amphiprora, Amphora, Ankistrodesmus, Asteromonas, Boekelovia, Borodinella, Botryococcus, Bracteococcus, Chaetoceros, Carteria, Chlamydomonas, Chlorococcum, Chlorogonium, Chlorella, Chroomonas, Chrysosphaera, Cricosphaera, Crypthecodinium, Cryptomonas, Cyclotella, Dunaliella, Ellipsoidon, Emiliania, Eremosphaera, Ernodesmius, Euglena, Franceia, Fragilaria, Gloeothamnion, Haematococcus, Halocafeteria, Hymenomonas, Isochrysis, Lepocinclis, Micractinium, Monoraphidium, Nannochloris, Nannochloropsis,
Navicula, Neochloris, Nephrochloris, Nephroselmis, Nitzschia, Ochromonas, Oedogonium, Oocystis, Ostreococcus, Pavlova, Parachlorella, Pascheria, Phaeodactylum, Phagus, Platymonas, Pleurochrysis, Pleurococcus, Prototheca, Pseudochlorella, Pyramimonas, Pyrobotrys, Scenedesmus, Skeletonema, Spyrogyra, Stichococcus, Tetraselmis, Thalassiosira, Viridiella, and Volvox.

Alternately, the photosynthetic microorganism can be a cyanobacterial species. Non-limiting examples of a cyanobacterial genus that can include an exogenous nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase include, but are not limited to, Agmenellum, Anabaena, Anabaenopsis, Anacystis, Aphanizomenon, Arthrospira, Asterocapsa, Borzia, Calothrix, Chamaesiphon, Chlorogloeopsis, Chroococcidiopsis, Chroococcus, Crinalium, Cyanobacterium, Cyanobium, Cyanocystis, Cyanospira, Cyanothece, Cylindrospermopsis, Cylindrospermum, Dactylococcopsis, Dermocarpella, Fischerella, Fremyella, Geitleria, Geitlerinema, Gloeobacter, Gloeocapsa, Gloeothece, Halospirulina, Iyengariella, Leptolyngbya, Limnothrix, Lyngbya, Microcoleus, Microcystis, Myxosarcina, Nodularia, Nostoc, Nostochopsis, Oscillatoria, Phormidium, Planktothrix, Pleurocapsa, Prochlorococcus, Prochloron, Prochlorothrix, Pseudanabaena, Rivularia, Schizothrix, Scytonema, Spirulina, Stanieria, Starria, Stigonema, Symploca, Synechococcus, Synechocystis, Thermosynechococcus, Tolypothrix, Trichodesmium, Tychonema, and Xenococcus. For example, the photosynthetic microorganism can be a Synechococcus, Synechocystis, or Thermosynechococcus species. Alternatively, the photosynthetic microorganism can be a Cyanobium, Cyanothece, or Cyanobacterium species, or further alternatively, the photosynthetic microorganism can be a Gloeobacter, Lyngbya or Leptolyngbya species.

acyl-ACP thioesterase (SEQ ID NO:2) having Genbank Accession Number YP_001039461 and GenInfo Identifier GI:125975551; the Moorella thermoacetica acyl-ACP thioesterase (SEQ ID NO:19) having Genbank Accession Number YP_431036 and GenInfo Identifier GI:83591027; the Geobacter metallireducens acyl-ACP thioesterase (SEQ ID NO:20) having Genbank Accession Number YP_384688 and GenInfo Identifier GI:78222941; the Salinibacter ruber acyl-ACP thioesterase (SEQ ID NO:21) having Genbank Accession Number YP_444210 and GenInfo Identifier GI:83814393; the Microscilla marina acyl-ACP thioesterase (SEQ ID NO:22) having Genbank Accession Number EAY28464 and GenInfo Identifier GI:123988858; the Parabacteroides distasonis acyl-ACP thioesterase (SEQ ID NO:1) having Genbank Accession Number YP_001303423 and GenInfo Identifier GI:150008680; the Enterococcus faecalis acyl-ACP thioesterase (SEQ ID NO:23) having Genbank Accession Number ZP_03949391 and GenInfo Identifier GI:227519342; the Lactobacillus plantarum oleoyl-(acyl-ACP) thioesterase (SEQ ID NO:24) having Genbank Accession Number YP_003062170 and GenInfo Identifier GI:254555753; the Leuconostoc mesenteroides subsp. mesenteroides acyl-ACP thioesterase (SEQ ID NO:25) having Genbank Accession Number YP_817783 and GenInfo Identifier GI:116617412; the Oenococcus oeni acyl-ACP thioesterase (SEQ ID NO:26) having Genbank Accession Number ZP_01544069 and GenInfo Identifier GI:118586629; the Mycobacterium smegmatis str. MC2 155 acyl-ACP thioesterase (SEQ ID NO:27) having Genbank Accession Number ABK74560 and GenInfo Identifier GI:118173664; the Mycobacterium vanbaalenii PYR-1 acyl-ACP thioesterase (SEQ ID NO:28) having Genbank Accession Number ABM11638 and GenInfo Identifier
GI:119954633; the Rhodococcus erythropolis SK121 acyl-ACP thioesterase (SEQ ID NO:29) having Genbank Accession Number ZP_04385507 and GenInfo Identifier GI:229491686; and the Rhodococcus opacus B4 ROP_16330 (SEQ ID NO:30) having Genbank Accession Number YP_002778825 and GenInfo Identifier GI:226361047.

Also considered herein are microorganisms that include nucleic acid molecules encoding variants of the above-listed acyl-ACP thioesterases, in which the variants have at least 70% identity, for example at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or about 100% identity, to the amino acid sequences accessed by the provided Genbank Accession Numbers, in which the variants have acyl-ACP thioesterase

The prokaryotic acyl-ACP thioesterase gene can be any prokaryotic acyl-ACP thioesterase gene that, when expressed in the microorganism, can result in the production of free fatty

28, 30, and/or 32. One or more free fatty acids produced by the genetically engineered microorganism may be saturated or may have one or more double bonds.

In some embodiments, the genetically engineered microorganism expressing a prokaryotic acyl-ACP thioesterase can produce free fatty acids and/or derivatives of more than one acyl chain length, for example, any combination of two or more of fatty acids having chain lengths of 8, 10, 12, 14, 16, 18, 20, 22, and/or 24 carbons (for example, predominantly fatty acids having acyl chain lengths of 12, 14, and/or 16 carbons).
In one such embodiment, at least 50 wt %, for example at least 60 wt %, at least 70 wt %, at least 80 wt %, at least 90 wt %, or at least 95 wt %, of the free fatty acids and/or derivatives produced by a genetically engineered microorganism as disclosed herein can have acyl chain lengths of 12, 14, and 16 carbons, of 12 and 14 carbons, of 12 and 16 carbons, or of 14 and 16 carbons.

Alternatively or in addition, the genetically engineered microorganism can include a recombinant nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase having an amino acid sequence that has at least 70% identity, for example at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% identity, with SEQ ID NO:1 or SEQ ID NO:2,

activity, and expression of the variant in a microorganism can result in production of a free fatty acid and/or derivative in an amount greater than (for example at least twice as much as) that produced by a microorganism that does not express the variant. Sequence-structure-function relationships for thioesterases have been advanced significantly in recent years (see, for example, Dillon and Bateman, BMC Bioinformatics 2004, 5:109; Mayer and Shanklin, J Biological Chem., 2005, 280: 3621-3627; Mayer and Shanklin, BMC Plant Biology, 2007, 7:1). A variant of a wild-type prokaryotic acyl-ACP thioesterase can have at least 70% identity with the amino acid sequence of a prokaryotic acyl-ACP thioesterase as provided hereinabove.
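The weight-percent criterion in this passage (for example, at least 50 wt % of the free fatty acids having acyl chain lengths of 12, 14, and 16 carbons) can be checked against a measured profile with a few lines of arithmetic. The sketch below assumes a profile given as mass per chain length; the function name and the example numbers are hypothetical and are not data from this document.

```python
def weight_percent_in_chain_lengths(profile_mg: dict[int, float],
                                    chain_lengths: set[int]) -> float:
    """Weight percent of total fatty acid mass found in the given chain lengths.

    profile_mg maps acyl chain length (carbons) to measured mass, e.g. mg per liter.
    """
    total = sum(profile_mg.values())
    if total <= 0:
        raise ValueError("profile must contain a positive total mass")
    selected = sum(mass for n, mass in profile_mg.items() if n in chain_lengths)
    return 100.0 * selected / total


# Hypothetical profile, invented for illustration only.
example_profile = {12: 10.0, 14: 25.0, 16: 30.0, 18: 35.0}
c12_to_c16 = weight_percent_in_chain_lengths(example_profile, {12, 14, 16})
meets_50_wt_percent = c12_to_c16 >= 50.0
```

With this invented profile, 65 of 100 mass units fall in the C12, C14, and C16 fractions, so the 50 wt % threshold is met while the 70 wt % threshold is not.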
A variant of a wild-type prokaryotic acyl-ACP thioesterase can have at least 75% identity, for example at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%, identity, with a wild-type prokaryotic acyl-ACP thioesterase provided herein.

Additionally or alternately, the genetically engineered microorganism that includes a recombinant nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase can produce at least one free fatty acid having an acyl chain length of 8 carbons, of 10 carbons, of 12 carbons, of 14 carbons, of 16 carbons, of 18 carbons, of 20 carbons, of 22 carbons, and/or of 24 carbons. Further additionally or alternately, the genetically engineered microorganisms can produce at least one free fatty acid having an acyl chain length from 8 to 18 carbons, for example from 12 to 16 carbons.

Typically, acyl-ACP thioesterases are active to some degree on acyl-ACP substrates having a plurality of different acyl chain lengths, but can have higher activity on (e.g., have a substrate preference for) one or more acyl-ACP substrates having particular acyl chain lengths than on other chain length substrates. For example, an acyl-ACP thioesterase may have a substrate preference for one or more of acyl-ACP substrates having acyl chain lengths of 8, 10, 12, 14, 16, 18, 20, 22, and/or 24 carbons. Additionally or alternately, the acyl-ACP thioesterase can hydrolyze one or more acyl-ACP substrates having an acyl chain length from 8 to 18 carbons, for example from 12 to 16 carbons. Further additionally or alternately, an acyl-ACP thioesterase of the present invention can, in some embodiments, have its highest level of activity on an acyl-ACP substrate having an acyl chain length of 12, 14, and/or 16 carbons.

and the microorganism (e.g., including a prokaryotic thioesterase and/or a nucleic acid molecule that encodes an acyl-ACP thioesterase) can produce a fatty acid having an acyl chain length of 12, 14, and/or 16 carbons (optionally with at least 50 wt % of the fatty acids produced having an acyl chain length from 12 to 16 carbons) and/or a fatty acid derivative having a total number of carbons from 7 to 36 (for example from 7 to 32; from 11 to 36; from 11 to 30; and/or of 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 22, 24, 26, 28, 30, 32, 34, and/or 36 carbons).

In some embodiments, the genetically engineered photosynthetic microorganism that includes an acyl-ACP thioesterase can produce a fatty aldehyde, fatty alcohol, and/or a wax ester, and can optionally include one or more recombinant nucleic acid molecules encoding an acyl-CoA reductase, carboxylic acid reductase, acyl-ACP reductase, a fatty aldehyde reductase, a wax synthase, or a combination thereof.

Wax esters include an acyl chain (A chain) on the carbonyl side of the ester bond and an ester chain (B chain) connected to the oxygen of the ester bond, one or both of which can be derived from a fatty acid, e.g., generated by a thioesterase such as the prokaryotic acyl-ACP thioesterase. Wax esters can have a total number of carbons (an A+B “chain length”), for example, from 10 to 36 carbons, for example from 16 to 36 carbons, from 16 to 32 carbons, or from 24 to 32 carbons.

Additionally or alternately, the genetically engineered photosynthetic microorganism that includes an acyl-ACP thioesterase can produce an alkane and/or alkene and can optionally include at least one recombinant nucleic acid molecule encoding a fatty acid decarboxylase, a fatty aldehyde decarboxylase, an acyl-CoA reductase, carboxylic acid reductase, acyl-ACP reductase, or a combination thereof.
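The ester naming used in this document (methanol plus an acyl chain of C4 or greater giving a FAME, ethanol plus an acyl chain giving a FAEE, and a fatty alcohol of C8 or above plus an acyl chain of C8 or greater giving a wax ester) can be sketched as a small classifier that also reports the total carbon count. Because the A and B chain labels are assigned differently in different passages, the sketch names the two chains by origin instead; the function name and the fallback label are assumptions for illustration.

```python
def classify_fatty_acid_ester(alcohol_carbons: int,
                              acyl_carbons: int) -> tuple[str, int]:
    """Return (label, total carbons) for an ester of an alcohol and an acyl chain."""
    if alcohol_carbons < 1 or acyl_carbons < 1:
        raise ValueError("chain lengths must be positive")
    total = alcohol_carbons + acyl_carbons
    if alcohol_carbons == 1 and acyl_carbons >= 4:
        label = "FAME"       # methanol-derived ester
    elif alcohol_carbons == 2:
        label = "FAEE"       # ethanol-derived ester
    elif alcohol_carbons >= 8 and acyl_carbons >= 8:
        label = "wax ester"  # fatty alcohol plus long acyl chain
    else:
        label = "other fatty acid ester"  # hypothetical fallback label
    return label, total
```

For example, a C16 fatty alcohol condensed with a C16 acyl chain is labeled a wax ester with 32 total carbons, which sits inside the 24 to 32 carbon range mentioned for wax esters.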
In some embodiments, the microorganism expressing a prokaryotic acyl-ACP thioesterase can produce predominantly free fatty acids having acyl chain lengths of 12, 14, and/or 16 carbons and/or fatty acid derivatives having a total carbon number of 12, 14, 16, 24, 26, 28, 30, and/or 32. Additionally or alternately, at least 30 wt %, for example at least 40 wt %, at least 50 wt %, at least 60 wt %, at least 70 wt %, at least 80 wt %, at least 90 wt %, or at least 95 wt %, of the free fatty acids produced by a genetically engineered microorganism as disclosed herein can be fatty acids having acyl chain lengths of 12, 14, and/or 16 carbons and/or fatty acid derivatives having a total carbon number of 12, 14, 16, 24, 26,

Alkanes and/or alkenes produced by and/or derived from a photosynthetic microorganism that includes a recombinant nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase can, for example, have a chain length of 7, 9, 11, 13, 15, 17, 19, 21, and/or 23 carbons (e.g., one or more odd numbered chain lengths from 7 to 17 carbons, from 7 to 15 carbons, or from 11 to 15 carbons).

Further additionally or alternately, a genetically engineered photosynthetic microorganism that can produce a fatty alcohol, fatty aldehyde, wax ester, alkane, or alkene may optionally include a nucleic acid molecule encoding an acyl-CoA synthetase.

The nucleic acid molecule encoding the prokaryotic acyl-ACP thioesterase can advantageously be stably integrated into the chromosome of the host microorganism, in an autonomously replicating episome, in an expression construct, or a combination thereof. Additionally or alternately, the genetically engineered microorganisms can be transformed with exogenous genes from prokaryotes by the introduction of appropriate nucleic acid expression constructs that can include, in addition to the gene of interest, gene expression sequences and optionally sequences that can mediate recombination into the host chromosome.

Expression constructs can be introduced into prokaryotic and eukaryotic cells via conventional transformation or transfection techniques. The terms “transformation” and “transfection”, conjugation and transduction, as used in the present context, are intended to comprise a multiplicity of methods known in the art for the introduction of foreign nucleic acid (for example DNA) into a host cell, including, but not limited to, calcium phosphate or calcium chloride coprecipitation, DEAE-dextran-mediated transfection, lipofection, natural competence, chemically mediated transfer, electroporation, and/or particle bombardment.

005643, and WO 2007/133558 (each of which cited reference is incorporated by reference in its entirety).

For optimal expression of a recombinant protein, in many instances it can be beneficial to employ coding sequences that can produce mRNA with codons preferentially used by the host cell to be transformed. Thus, for an enhanced expression of transgenes, the codon usage of the transgene can be matched with the specific codon bias of the organism in which the transgene is being expressed. For example, methods of recoding genes for expression in microalgae are described in U.S. Pat. No. 7,135,290, the content of which is incorporated by reference. All or a subset of the codons of a gene can be changed to incorporate a preferred codon used by the host organism. Additional information for codon optimization is available, e.g., at the codon usage database of Genbank.

In some embodiments, the thioesterase-encoding nucleotide sequence in microorganisms transformed with an isolated nucleic acid molecule including a recombinant nucleic acid sequence encoding a prokaryotic acyl-ACP thioesterase can be operably linked to one or more expression control elements and can optionally be codon-optimized for expression in the microorganism.
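Matching codon usage to a host can be sketched as a synonymous recoding step: each codon is translated and then replaced by the host's preferred codon for that amino acid, so the encoded protein is unchanged. The preferred-codon table below is invented for illustration and covers only a few amino acids; a real table would be derived from codon usage data for the chosen host, and this is not the recoding method of the cited patent.

```python
# Hypothetical preferred codons for an unnamed host (illustration only).
PREFERRED_CODON = {"M": "ATG", "K": "AAG", "L": "CTG", "A": "GCC", "*": "TAA"}

# Subset of the standard genetic code covering the amino acids above.
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(cds: str) -> str:
    """Translate a coding sequence using the codon subset above."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str) -> str:
    """Replace every codon with the preferred synonymous codon for the host."""
    return "".join(PREFERRED_CODON[aa] for aa in translate(cds))
```

Recoding `ATGAAATTAGCTTAG` under this table gives `ATGAAGCTGGCCTAA`, and both sequences translate to the same peptide. Codons outside the illustrative subset raise a `KeyError`, which keeps the sketch from silently guessing.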
Suitable methods for the trans sion in the microorganism. formation or transfection of host cells can be found in Alternatively or in addition, the exogenous nucleic acid Molecular Cloning A Laboratory Manual (2010), Cold molecule as disclosed herein can be cloned into an expression Spring Harbor Laboratory Press, the contents of which are 25 vector for transformation into a microalga or a photosynthetic incorporated by reference herein. bacterium. The vector can include sequences that promote For example, algae and photosynthetic bacteria can be expression of the transgene of interest (e.g., an exogenous transformed by any suitable method, including, as non-lim prokaryotic acyl-ACP thioesterase gene) such as a heterolo iting examples, natural DNA uptake (Chung et al. (1998) gous promoter, and may optionally include, for expression in FEMS Microbiol. Lett. 164: 353-361; Frigaard et al. (2004) 30 eukaryotic cells, without limitation, an intron sequence, a Methods Mol. Biol. 274:325-40; Zanget al. (2007).J. Micro sequence having a polyadenylation signal, etc. Alternately, if biol. 45: 241-245), conjugation, transduction, glass bead the vector does not contain a promoter in operable linkage transformation (Kindle et al. (1989) J. Cell Biol. 109: 2589 with the gene of interest, the gene can be transformed into the 601: Feng et al. (2009) Mol. Biol. Rep. 36: 1433-9: U.S. Pat. cells such that it becomes operably linked to an endogenous No. 5,661,017), silicon carbide whisker transformation 35 promoter by homologous recombination or vector integra (Dunahayetal. (1997) Methods Mol. Biol. (1997) 62:503-9), tion. biolistics (Dawson et al. (1997) Curr. Microbiol. 35: 356-62: Vectors designed for expression of a gene in microalgae Hallmann et al. (1997) 94: 7469-7474; Jakobiak et al. (2004) can include a promoter active in microalgae operably linked Protist 155:381-93; Tan et al. 2005) J. Microbiol. 43: 361 to the exogenous gene being introduced. 
A variety of gene 365: Steinbrenner et al. (2006) Appl Environ. Microbiol. 72: 40 promoters and terminators that function in microalgae can be 7477-7484; Kroth (2007) Methods Mol. Biol. 390: 257-267; utilized in expression vectors, including, but not limited to, U.S. Pat. No. 5,661,017), electroporation (Kjaerulff et al. promoters and terminators from prokaryotes or eukaryotes, (1994) Photosynth. Res.41:277-283: Iwaietal. (2004) Plant Such as, but not limited to, Chlamydomonas and other algae Cell Physiol. 45: 171-5: Ravindran et al. (2006).J. Microbiol. (see, for example, Plant Cell Physiol 49: 625-632, 2008), Methods 66: 174-6; Sun et al. 2006. Gene 377: 1340-649; 45 promoters and terminators from viruses, and synthetic pro Wang et al. (2007) Appl. Microbiol. Biotechnol. 76: 651-657: moters and terminators. Chaurasia et al. (2008) J. Microbiol. Methods 73: 133-141: For transformation of diatoms, a variety of gene promoters Ludwig et al. (2008) Appl. Microbiol. Biotechnol. 78: 729 that function in diatoms can be utilized in these expression 35), laser-mediated transformation, or incubation with DNA vectors, including, but not limited to: promoters from Thalas in the presence of or after pre-treatment with any of poly 50 Siosira and other heterokont algae, promoters from viruses, (amidoamine) dendrimers (Pasupathy et al. 2008. Biotech and synthetic promoters. Promoters from Thalassiosira nol. J. 3: 1078-82), polyethylene glycol (Ohnuma et al. pseudonana that would be suitable for use in expression (2008) Plant Cell Physiol. 49: 117-120), cationic lipids (Mu vectors include, without limitation, an alpha-tubulin pro rakawa et al. (2008) J. Biosci. Bioeng. 105: 77-80), dextran, moter, a beta-tubulin promoter, and an actin promoter. Pro calcium phosphate, or calcium chloride (Mendez-Alvarez et 55 moters from Phaeodactylum tricornutum that would be suit al. (1994) J. Bacteriol. 
176: 7395-7397), optionally after able for use in expression vectors include, without limitation, treatment of the cells with cell wall-degrading enzymes (Per an alpha-tubulin promoter, a beta-tubulin promoter, and an rone et al. (1998) Mol. Biol. Cell 9: 3351-3365). Agrobacte actin promoter. The terminators associated with these genes, rium-mediated transformation also can be performed on algal other diatom genes, or particular heterologous genes can be cells, for example after removing or wounding the algal cell 60 used to stop transcription and provide the appropriate signal wall (e.g., International Publication No. WO 2000/62601; for polyadenylation. Kumaret al. (2004) Plant Sci. 166: 731-738). Biolistic meth If desired, in order to express the exogenous nucleic acid ods are useful for transformation of the chloroplasts of plant molecule. Such as, prokaryotic acyl-ACP thioesterase, in the and eukaryotic algal species (see, for example, Ramesh et al. plastid, where the fatty acid biosynthesis occurs in microal (2004) Methods Mol. Biol. 274: 355-307: Doestch et al. 65 gae, a nucleotide sequence encoding a chloroplast transit 2001. Curr. Genet. 39: 49-60; U.S. Pat. No. 7,294,506; and peptide can be added to the N-terminus of the exogenous International Publication Nos. WO 2003/091413, WO 2005/ nucleic acid molecule. Alternately, the exogenous nucleic US 8,530,207 B2 33 34 acid molecule encoding a prokaryotic acyl-ACP thioesterase or factor required for survival of the host (for example, an can be introduced directly into the plastid chromosome of auxotrophic marker), and the like, as well as combinations microalgae without disrupting photosynthetic capability of thereof. Transformed cells can optionally be selected based the plastid. 
Methods for plastid transformation are well upon the ability to grow in the presence of the selectable known for introducing a nucleic acid molecule into a plant 5 marker under conditions in which cells lacking the resistance cell chloroplast (see, for example, International Publication cassette or auxotrophic marker would not grow. Alternately, a Nos. WO 2010/019813 and WO95/16783: U.S. Pat. Nos. non-selectable marker may be present on a vector. Such as a 5,451,513, 5,545,817, and 5,545,818; and McBride et al., gene encoding a fluorescent protein or enzyme that generates Proc. Natl. Acad. Sci. USA 91:7301-7305 (1994), each of a detectable reaction product. which are incorporated by reference herein). 10 Expression vectors can be introduced into the microorgan In some instances, it can be advantageous to express an isms by standard methods, including, but not limited to, natu enzyme. Such as, but not limited to, a prokaryotic acyl-ACP ral DNA uptake, conjugation, electroporation, particle bom thioesterase, at a certain point during the growth of the geneti bardment and abrasion with glass beads, SiC fibers, or other cally engineered host organism to minimize any deleterious particles. The vectors can be, for example, (1) targeted for effects on the growth of that organism and/or to maximize 15 integration into the host chromosome by including flanking production of the fatty acid product of interest. In these sequences that enable homologous recombination into the instances, one or more exogenous nucleic acid molecules chromosome, (2) targeted for integration into endogenous encoding a prokaryotic acyl-ACP thioesterase introduced plasmids by including flanking sequences that enable into the genetically engineered organism can be operably homologous recombination into the endogenous plasmids, linked to an inducible promoter. 
The promoter can be, for and/or (3) designed Such that the expression vectors replicate example, without limitation, a lac promoter, a tet promoter within the chosen host. (e.g., U.S. Pat. No. 5,851,796), a hybrid promoter that The genetically engineered microorganism can further includes either or both of portions of a tet or lac promoter, a comprise one or more additional recombinant nucleic acid hormone-responsive promoter (e.g., an ecdysone-responsive molecules that may enhance production of fatty acids and/or promoter; see U.S. Pat. No. 6,379.945), a metallothionien 25 fatty acid derivatives, such as, for example, a gene encoding promoter (U.S. Pat. No. 6,410,828), and/or a pathogenesis an acetyl-CoA carboxylase or a subunit thereof and/or a gene related (PR) promoter that can be responsive to a chemical encoding a B-ketoacyl synthase (KAS). Such as a KAS III, Such as, for example, Salicylic acid, ethylene, thiamine, or KAS II, or KAS I enzyme. Additionally or alternately, the BTH(U.S. Pat. No. 5,689,044). An inducible promoter can be microorganism can have attenuated expression of a gene responsive to light or dark (U.S. Pat. Nos. 5,750,385 and 30 encoding acyl-ACP synthase, acyl-CoA synthase, acyl-CoA 5,639,952), temperature (U.S. Pat. No. 5,447,858: Abe et al., dehydrogenase, glycerol-3-phosphate dehydrogenase, Plant Cell Physiol. 49: 625-632 (2008); Shroda et al. Plant J. acetaldehyde-CoA dehydrogenase, pyruvate dehydrogenase, 21: 121-131 (2000)), or the like, or combinations thereof. The acetate kinase, or the like, or a combination thereof. foregoing list is meant to be exemplary and not limiting. The In some embodiments, the culture medium does not promoter sequences can be from any organism, provided that 35 include a reduced carbon compound for Supplying energy to they are functional in the host organism. 
Inducible promoters, the genetically engineered photosynthetic microorganism, as used in the constructs of the present invention, can use one and yet the culture comprising the microorganism can pro or more portions/domains of the aforementioned promoters duce at least twice the amount of a free fatty acid and/or fatty and/or other inducible promoters fused to at least a portion of acid derivative, compared to a culture of the same microor a different promoter that operates in the host organism to 40 ganism that does not include a recombinant nucleic acid confer inducibility on a promoter that operates in the host encoding the prokaryotic thioesterase, and/or the culture species. medium that includes the transformed microorganism that For example, for transformation of cyanobacteria, a variety includes an acyl-ACP thioesterase can include at least 5 mg of promoters that function in cyanobacteria can be utilized, per liter, for example at least 10 mg per liter, at least 20 mg per including, but not limited to, the lac, tac and trc promoters and 45 liter, at least 30 mg per liter, at least 40 mg per liter, or at least derivatives that are inducible by the addition of isopropyl 50 mg per liter, of free fatty acids and/or fatty acid derivatives B-D-1-thiogalactopyranoside (IPTG), promoters that are produced by the microorganism. 
naturally associated with transposon- or bacterial chromo The nucleic acid molecule encoding the prokaryotic acyl some-borne antibiotic resistance genes (neomycin phospho ACP thioesterase can be any as described hereinabove, for transferase, chloramphenicol acetyltransferase, spectinomy 50 example, a member of Pfam family PFO1643 and/or, when cin adenyltransferase, etc.), promoters associated with queried against the Pfam database, is a match with PFO1643 various heterologous bacterial and native cyanobacterial with a bit score greater than the gathering threshold value genes, promoters from viruses and phages, and synthetic and/or with an evalue of less than 0.01. As mentioned herein, promoters. One embodiment of Such promoter includes an the nucleic acid molecule can be operably linked to a pro IPTG-inducible trcY promoter (SEQ ID NO:9). Promoters 55 moteractive in the photosynthetic microorganism and option isolated from cyanobacteria that can be used can include, ally one or more additional nucleic acid regulatory sequences, without limitation, secA (secretion; controlled by the redox Such as, for example, a transcriptional terminator sequence. state of the cell), rbc (Rubisco operon), psaAB (PSI reaction Additionally or alternately, the nucleic acid molecule can be center proteins; light regulated), and psbA (Dl protein of present on a self-replicating plasmid introduced into the pho PSII; light-inducible). 60 tosynthetic microorganism, and/or can be integrated into the Likewise, a wide variety of transcriptional terminators can genome of the photosynthetic microorganism. be used for expression vector construction. Examples of pos In some embodiments, the fatty acids and/or fatty acid sible terminators include, but are not limited to, psbA, psa AB, derivatives can be present in the media, for example, as pre rbc., secA, and T7 coat protein. 
cipitates in or on, at or near the Surface of the media, associ Transformation vectors can optionally also include a 65 ated with the media vessel as droplets, including Suspended selectable marker, Such as, but not limited to, a drug resis droplets (e.g., an emulsion), as a relatively immiscible layer tance gene, an herbicide resistance gene, a metabolic enzyme floating on top of the aqueous culture medium, as a "scum', US 8,530,207 B2 35 36 film, gel, semi-solid, colloid, fine particulate, particulate, thioesterase can be expressed for at least a portion of the time Solid, or aggregate that may be dispersed, Suspended, or during which the photosynthetic microorganism is cultured entrained within the culture medium, associated with the cells and/or upon administering an inducer to the culture. Non of the photosynthetic microorganism, phase separated in limiting examples of the inducer include lactose or a lactose Some other fashion, or a combination thereof. analogue. Such as isopropyl B-D-1-thiogalactopyranoside, In preferred embodiments, at least one free fatty acid pro and light, which can be provided as Sunlight or artificial light, duced by a culture as disclosed herein can have an acyl chain Such as, for example, fluorescent light. length from 8 to 24 carbons, for example from 8 to 18 carbons, Additionally or alternately, the genetically engineered from 12 to 16 carbons, or of 8, 10, 12, 14, 16, 18, 20, 22, photosynthetic microorganism can be grown phototrophi and/or 24 carbons. In embodiments where at least one fatty 10 cally, in which case the growth media typically does not acid derivative (such as one or more fatty alcohols, fatty include a Substantial amount of (e.g., includes none of) a aldehydes, wax esters, alkanes, and alkenes) are produced by reduced carbon Source. 
When growing phototrophically, the a culture as disclosed herein, the at least one fatty acid deriva microorganism uses light as its energy source, and an inor tive can have a total number of carbons from 7 to 36, for ganic carbon Source. Such as CO2 orbicarbonate, is used for example from 11 to 34, from 12 to 32, or of 7, 8, 9, 10, 11, 12, 15 synthesis of biomolecules by the microorganism. Alternately, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 28, 30, 32, 34, an organic carbon molecule or compound can be provided in and/or 36. the culture medium of a microorganism grown phototrophi Advantageously, the culture medium can be any Suitable cally, but it either cannot be taken up or metabolized by the for growth of the photosynthetic microorganism. In one cell for energy or is not present in an amount effective to embodiment, the culture can include a source of reduced provide energy sufficient for the growth of the cell culture. carbon, such as, for example, one or more Sugars or organic In many embodiments, the culture can include an inorganic acids that can be used by the microorganism for growth, Such carbon Source, including, but not limited to, bicarbonate, that the microorganism can grow heterotrophically or mix calcium carbonate, and/or CO2, present in air, or provided in otrophically. Additionally or alternately, the culture medium enriched form with respect to ambient CO2, for example, as does not include a substantial amount of a reduced carbon 25 5 vol% CO in air. Additionally or alternately, the photosyn compound that can be used for the organism as an energy thetic microorganisms can be exposed to light for at least a Source and/or includes a source of inorganic carbon, such as portion of the culturing period. Artificial light Sources can be CO2 or bicarbonate. used as the sole light source or to enhance or extend natural II. Methods of Producing Free Fatty Acids and/or Derivatives light. 
An aspect of the present invention relates to a method for 30 “Culturing refers to the intentional fostering of growth producing a free fatty acid and/or derivative in a culture, the (increases in cell size, cellular contents, and/or cellular activ method comprising culturing photosynthetic microorgan ity) and/or propagation (increases in cell numbers, e.g., via isms that include at least one recombinant nucleic acid mitosis) of one or more cells by use of selected and/or con sequence encoding a prokaryotic acyl-ACP thioesterase in trolled conditions. The combination of both growth and growth media under conditions that allow expression of the 35 propagation may be termed “proliferation.” Examples of prokaryotic acyl-ACP thioesterase. Expression of the selected and/or controlled conditions can include the use of a prokaryotic acyl-ACP thioesterase in the photosynthetic defined medium (with known characteristics, such as pH, microorganism can result in production of at least one free ionic strength, and carbon source), specified temperature, fatty acid and/or fatty acid derivative. oxygen tension, carbon dioxide levels, and growth in a biore In one embodiment, the culture that includes the photosyn 40 actor, interalia. thetic microorganism that expresses a prokaryotic acyl-ACP The photosynthetic microorganisms, such as, microalgae thioesterase can produce at least twice the amount of the fatty or cyanobacteria, can be cultured phototrophically, in the acid and/or derivative, compared to a culture that is identical absence of a Substantial amount of a fixed carbon Source, or in all respects except that the photosynthetic microorganism mixotrophically, where the cultures are supplied with light for does not include a recombinant nucleic acid sequence encod 45 at least part of the day, and also supplied with a reduced ing a prokaryotic acyl-ACP thioesterase. For example, the carbon Source. 
Such as a Sugar (e.g., glucose, fructose, galac photosynthetic microorganism that includes the recombinant tose, mannose, rhamnose, arabinose, Xylose, lactose, Sucrose, nucleic acid molecule encoding the prokaryotic acyl-ACP maltose), an organic acid form (e.g., acetate, citrate. Succi thioesterase can produce (and optionally but preferably nate), and/or glycerol. The photosynthetic microorganism, release and/or secrete) at least 5 mg per liter, for example at 50 alternately, can be cultured mixotrophically, such that the least 10 mg per liter, at least 20 mg per liter, at least 30 mg per organism is grown in the presence of light for at least a part of liter, at least 40 mg per liter, or at least 50 mg per liter, of free the day, and also provided with one or more sources of fatty acids and/or fatty acid derivatives. reduced carbon. Cells can alternately be grown heterotrophi The method can further comprise isolating/removing the cally, where a reduced carbon Source is provided in the media free fatty acid and/or derivative from the culture, e.g., from 55 for energy and biochemical synthesis. A photosynthetic the cells, the growth media, or the whole culture. For organism can be grown mixotrophically for a period of time, example, the isolation can be by organic extraction of whole followed by a period of phototrophic growth, or vice versa. or lysed cells, removal of free fatty acids or fatty acid deriva A variety of media for phototrophic and/or mixotrophic tives as precipitates or from the upper layer of the culture growth of algae and cyanobacteria are known in the art, and media ("skimming'), through the use of particulate adsor 60 media can be optimized to enhance growth or production of bents, bubbles, or matrices that bind the fatty acids and/or fatty acid products for a particular species. derivatives, or the like, or any combination thereof. 
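As a rough illustration of the two production criteria recited for the method (at least twice the amount produced by an otherwise identical control culture, and a titer of at least 5, 10, 20, 30, 40, or 50 mg per liter), the following Python sketch checks a measured titer against both. The function names and the example titers are hypothetical and are not data from this disclosure.

```python
# Illustrative sketch only: helper names and example numbers are assumptions,
# not values reported in the specification.

# "At least N mg per liter" tiers recited in the text.
TITER_TIERS_MG_PER_L = (5, 10, 20, 30, 40, 50)


def fold_increase(engineered_mg_per_l: float, control_mg_per_l: float) -> float:
    """Ratio of the engineered-culture titer to the control-culture titer."""
    if control_mg_per_l <= 0:
        raise ValueError("control titer must be positive to compute a fold change")
    return engineered_mg_per_l / control_mg_per_l


def meets_twofold_criterion(engineered_mg_per_l: float, control_mg_per_l: float) -> bool:
    """True if the engineered culture produced at least twice the control amount."""
    return fold_increase(engineered_mg_per_l, control_mg_per_l) >= 2.0


def highest_tier_met(titer_mg_per_l: float):
    """Largest recited tier that the titer satisfies, or None if below 5 mg/L."""
    met = [tier for tier in TITER_TIERS_MG_PER_L if titer_mg_per_l >= tier]
    return max(met) if met else None


if __name__ == "__main__":
    engineered, control = 23.5, 4.0  # hypothetical titers in mg per liter
    print("twofold criterion met:", meets_twofold_criterion(engineered, control))
    print("highest tier met (mg/L):", highest_tier_met(engineered))
```

The tiers are kept in a single tuple so that the comparison follows the recited values directly; a culture below the lowest tier returns `None` rather than zero, to keep "no tier met" distinct from a numeric tier.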
The genetically engineered photosynthetic microorganism can be any as described herein that includes a recombinant nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase, whose expression can result in production of free fatty acids and/or fatty acid derivatives. The acyl-ACP

Microorganisms that may be useful in accordance with the methods of the present invention can be found in various locations and environments throughout the world. As a consequence of their isolation from other species and their resulting evolutionary divergence, the particular growth medium for optimal growth and generation of lipid and/or hydrocarbon constituents can vary. In some cases, certain strains of microorganisms may be unable to grow on a particular growth medium because of the presence of some inhibitory component or the absence of some essential nutritional requirement required by the particular strain of microorganism.

Solid and liquid growth media are generally available from a wide variety of sources, as are instructions for the preparation of particular media suitable for a wide variety of strains of microorganisms. For example, various freshwater and salt water media can include those described in Barsanti, L. and Gualtieri, P. (2005) Algae: Anatomy, Biochemistry, and Biotechnology, CRC Press, Taylor & Francis Group, Boca Raton, Fla., USA, which is incorporated herein by reference for media and methods for culturing algae. Algal media recipes can also be found at the websites of various algal culture collections, including, as nonlimiting examples, the UTEX Culture Collection of Algae (sbs.utexas.edu/utex/media.aspx); Culture Collection of Algae and Protozoa (ccap.ac.uk/media/pdfrecipes); and Katedra Botaniky (botany.natur.cuni.cz/algo/caup-media.html).

In some embodiments, media used for culturing an organism that produces fatty acids can include an increased concentration of a metal (typically provided as a salt and/or in an ionic form) such as, for example, sodium, potassium, magnesium, calcium, strontium, barium, beryllium, lead, iron, nickel, cobalt, tin, chromium, aluminum, zinc, copper, or the like, or combinations thereof (particularly multivalent metals, such as magnesium, calcium, and/or iron), with respect to a standard medium formulation, such as, for example, standard BG-11 medium (ATCC Medium 616, Table 2), or a modified medium such as ATCC Medium 854 (BG-11 modified to contain vitamin B12) or ATCC Medium 617 (BG-11 modified for marine cyanobacteria, containing additional NaCl and vitamin B12).

For example, a medium used for growing microorganisms that produce free fatty acids can include at least 2-fold, for example at least 3-fold, at least 4-fold, at least 5-fold, at least 6-fold, at least 7-fold, at least 8-fold, at least 9-fold, at least 10-fold, between 2-fold and 10-fold, and/or between 10-fold and 100-fold the amount of metal (e.g., calcium) as compared to a standard medium. The medium used for growing microorganisms that can produce free fatty acids can include, for example, at least about 0.5 mM, between about 0.5 mM and about 1 mM, between about 1 mM and about 2 mM, between about 2 mM and about 5 mM, between about 5 mM and about 10 mM, between about 10 mM and about 25 mM, and greater than 25 mM metal (e.g., calcium) in the formulation.

In further embodiments, by using the excess amount of metal (e.g., calcium) in the medium, at least a portion of the fatty acid(s) can be sequestered as soap precipitates, which may result in decreasing the toxic effects of free fatty acid(s). Addition of metal (e.g., calcium) in the medium can additionally or alternately increase the tolerance of microorganism in media with a relatively high concentration of free fatty acids. Additionally or alternately, fatty acid-producing strains can advantageously be more robust with excess metal (e.g., calcium) content. Although the excess component is described herein as a metal, it is contemplated that the component can more generally be described as a carboxylate counterion source, for example a soap-forming counterion source, a metal ion source (noted as "metal" herein), a multivalent (i.e., having a valence of +2 or higher) counterion source, a divalent counterion source, or some combination. Other details regarding this metal/carboxylate counterion source are described in the co-pending, commonly-assigned patent application, entitled "Culturing a Microorganism in a Medium with an Elevated Level of a Carboxylate Counterion Source" and filed on the same day herewith.

For production of fatty acids and/or fatty acid derivatives, photosynthetic microorganisms can be grown indoors (e.g., in photobioreactors, in shake flasks, test tubes, vials, microtiter dishes, petri dishes, or the like) or outdoors (e.g., in ponds, canals, trenches, raceways, channels, or the like). Additionally or alternately, a source of inorganic carbon (such as, but not limited to, CO2), including, but not limited to, air, CO2 enriched air, or flue gas, can be supplied to the photosynthetic microorganisms.

Additionally or alternately, the present invention can include one or more of the following embodiments.

Embodiment 1

A photosynthetic microorganism (e.g., a microalga or a cyanobacterium) comprising a recombinant nucleic acid molecule encoding a prokaryotic acyl-acyl carrier protein (acyl-ACP) thioesterase, wherein the photosynthetic microorganism (e.g., through expression of the prokaryotic acyl-ACP thioesterase) results in production of at least one free fatty acid and/or fatty acid derivative.

Embodiment 2

The photosynthetic microorganism according to embodiment 1, wherein the at least one fatty acid derivative comprises at least one fatty aldehyde, at least one fatty alcohol, at least one wax ester, at least one alkane, at least one alkene, or a combination thereof, and/or has a total number of carbons from 7 to 36, for example from 11 to 34 or from 11 to 32.

Embodiment 4

The photosynthetic microorganism according to any one of the previous embodiments, wherein the photosynthetic microorganism is capable of producing at least one fatty acid having an acyl chain length from 8 to 24 carbons or from 8 to 18 carbons.

Embodiment 5

The photosynthetic microorganism according to any one of the previous embodiments, wherein at least 30 wt % of the free fatty acids produced by the photosynthetic microorganism are free fatty acids having an acyl chain length of 12 carbons, 14 carbons, 16 carbons, or any mixture thereof.

Embodiment 6

The photosynthetic microorganism according to any one of the previous embodiments, wherein the prokaryotic acyl-ACP thioesterase has at least 70% amino acid sequence identity, for example at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or about 100% sequence identity, to SEQ ID NO:1 or SEQ ID NO:2.

Embodiment 7

The photosynthetic microorganism according to any one of the previous embodiments, wherein the nucleic acid molecule encoding the prokaryotic acyl-ACP thioesterase comprises nucleotide sequence SEQ ID NO:3 or SEQ ID NO:4.

Embodiment 8

The photosynthetic microorganism according to any one of the previous embodiments, wherein the nucleic acid molecule encoding the prokaryotic acyl-ACP thioesterase is stably integrated into a chromosome of the photosynthetic microorganism and/or is in an expression construct.

free fatty acid and/or derivative from the photosynthetic microorganism, the growth media, or the whole culture.
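The sequence identity levels recited in Embodiment 6 (at least 70%, 75%, 80%, and so on) can be illustrated with a minimal Python sketch that computes percent identity over a pair of pre-aligned sequences. This is an assumption-laden example: the choice of denominator (all aligned columns, including gapped ones) is one common convention among several, the alignment itself is taken as given, and the two short peptides are invented toy strings rather than SEQ ID NO:1 or SEQ ID NO:2.

```python
# Illustrative sketch only. The identity convention (matches divided by the
# number of aligned columns) and the toy sequences are assumptions.

IDENTITY_THRESHOLDS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99)  # percentages recited in the text


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns; columns gapped in both rows are ignored."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list:
    """All recited 'at least N%' thresholds satisfied by the given identity."""
    return [t for t in IDENTITY_THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Hypothetical pre-aligned peptides: one gap and one substitution in 9 columns.
    query = "MKT-AYIAK"
    reference = "MKTQAYLAK"
    identity = percent_identity(query, reference)
    print(f"identity: {identity:.1f}%")
    print("thresholds met:", thresholds_met(identity))
```

In practice the alignment would come from a dedicated tool, and the reported identity depends on that tool's scoring and on how gaps are counted, so values near a threshold should be interpreted with the convention stated.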
Embodiment 9

The photosynthetic microorganism according to embodiment 8, wherein the expression construct comprises a promoter operably linked to the nucleic acid molecule encoding the prokaryotic acyl-ACP thioesterase, and optionally wherein the promoter is functional in the photosynthetic microorganism.

Embodiment 10

The photosynthetic microorganism according to any one of the previous embodiments, wherein the photosynthetic microorganism further comprises at least one additional nucleic acid molecule encoding at least one additional polypeptide such as acetyl-CoA carboxylase or β-ketoacyl synthase (KAS), wherein expression of the additional nucleic acid molecule in the photosynthetic microorganism enhances production of a free fatty acid and/or fatty acid derivative.

Embodiment 11

The photosynthetic microorganism according to any one of the previous embodiments, wherein the photosynthetic microorganism has attenuated expression of at least one gene encoding a protein comprising acyl-acyl carrier protein (ACP) synthase, acyl-CoA dehydrogenase, glycerol-3-phosphate dehydrogenase, acetaldehyde-CoA dehydrogenase, pyruvate dehydrogenase, acetate kinase, and combinations thereof.

Embodiment 12

A method for producing a free fatty acid and/or fatty acid derivative in a culture, the method comprising culturing a photosynthetic microorganism in growth media, wherein the photosynthetic microorganism comprises at least one nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase according to any one of the previous embodiments; and wherein the photosynthetic microorganism is grown under a condition that allows expression of the prokaryotic acyl-ACP thioesterase in the photosynthetic microorganism during a culturing period.

Embodiment 13

The method according to embodiment 12, wherein at least a portion of the free fatty acid and/or fatty acid derivative is secreted into the growth media.

Embodiment 14

The method according to embodiment 12 or embodiment 13, wherein the growth media does not include a substantial amount of a reduced carbon source, wherein the culture is provided with at least one source of inorganic carbon, and/or wherein the culture is exposed to light for at least a portion of the culturing period.

Embodiment 15

The method according to any one of embodiments 12-14, wherein the method further comprises isolating at least one

Embodiment 16

The method according to embodiment 12, wherein the photosynthetic microorganism comprises at least one nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase having at least 70% amino acid sequence identity to SEQ ID NO:1, wherein the photosynthetic microorganism produces at least one free fatty acid or fatty acid derivative, wherein at least 50%, at least 60%, or at least 65% of the free fatty acids or fatty acid derivatives produced are C12, C14, and/or C16 free fatty acids or fatty acid derivatives.

Embodiment 17

The method according to embodiment 12, wherein the photosynthetic microorganism comprises at least one nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase having at least 70% amino acid sequence identity to SEQ ID NO:2, wherein the photosynthetic microorganism produces at least one free fatty acid or fatty acid derivative, wherein at least 50% of the free fatty acids or fatty acid derivatives produced are C12, C14, and/or C16 free fatty acids or fatty acid derivatives.

Embodiment 17

The method according to embodiment 12, wherein the photosynthetic microorganism comprises at least one nucleic acid molecule encoding a prokaryotic acyl-ACP thioesterase having at least 70% amino acid sequence identity to SEQ ID NO:2, wherein the photosynthetic microorganism produces at least one free fatty acid or fatty acid derivative, wherein at least 50% of the free fatty acids or fatty acid derivatives produced are C14, C16, and/or C18 free fatty acids or fatty acid derivatives.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the described invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference in their entireties.

Nucleic acid and amino acid sequences identified by Accession Numbers or GenInfo Identifiers are also incorporated by reference herein. Accession numbers are unique identifiers for a sequence record publicly available at the National Center for Biotechnology Information internet site maintained by the United States National Institutes of Health, which can be accessed at ncbi.nlm.nih.gov. The "GenInfo Identifier" (GI) sequence identification number is specific to a nucleotide or amino acid sequence. If a sequence changes in any way, a new GI number is assigned. A Sequence Revision History tool is available to track the various GI numbers, version numbers, and update dates for sequences that appeared in a specific Genbank record. Searching and obtaining nucleic acid or gene sequences or protein sequences based on Accession numbers and GI numbers is well known in the arts of cell biology, biochemistry, molecular biology, and molecular genetics.

It must also be noted that as used herein and in the appended claims, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise. All technical and scientific terms used herein have the same meaning. The use of "or" in a listing of two or more items indicates that any combination of the items is contemplated, for example, "A or B" indicates that A alone, B alone, or both A and B are intended.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the described invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the described invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.), but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.

Example 1

Identification of Prokaryotic Acyl-ACP Thioesterases

In order to identify prokaryotic acyl-ACP thioesterases that can produce fatty acids and be expressed in microorganisms, such as microalgae and cyanobacteria, prokaryotic sequence databases were searched using the BLASTP tool (National Center for Biotechnology Information (NCBI)) and using sequences of known bacterial acyl-CoA thioesterases. A phylogenetic tree then was constructed using Vector NTI® software (Invitrogen, Carlsbad, Calif.), based on the retrieved sequences of acyl-CoA thioesterases, non-ribosomal peptide synthetase (NRPS) thioesterase modules, polyketide thioesterase modules, and 4-hydroxybenzoyl-CoA thioesterases. Known plant acyl-ACP thioesterases also were added to the tree. Analysis of the phylogenetic tree suggested that the prokaryotic acyl-ACP thioesterases form a clade together with plant acyl-ACP thioesterase, distinct from the bacterial acyl-CoA thioesterases. Among the identified prokaryotic acyl-ACP thioesterases, two examples of the clade were selected for further characterization, i.e., the polypeptide EMRE031 (YP_001303423 GI:150008680; SEQ ID NO:1) from Parabacteroides distasonis and the polypeptide EMRE032 (YP_001039461 GI:125975551; SEQ ID NO:2) from Clostridium thermocellum.

Example 2

Molecular Cloning of Prokaryotic Acyl-ACP Thioesterases

The nucleotide sequences encoding the Parabacteroides distasonis EMRE031 polypeptide (nucleotides 2447794 to 2447069 of Genbank Accession CP000140) and the Clostridium thermocellum EMRE032 polypeptide (nucleotides 3613743 to 3612982 of Genbank Accession CP000568) were obtained from National Center for Biotechnology Information (NCBI). These sequences then were used to design nucleotide sequences consistent with the codon usage of Synechocystis sp. PCC 6803 (at kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=1148). Genes encoding the Parabacteroides distasonis thioesterase EMRE031, codon-optimized for expression in Synechocystis (SEQ ID NO:3), and the Clostridium thermocellum acyl-ACP thioesterase EMRE032, codon-optimized for expression in Synechocystis (SEQ ID NO:4), were synthesized by GENEWIZ (La Jolla, Calif.).

In order to express the Parabacteroides distasonis acyl-ACP thioesterase EMRE031 (SEQ ID NO:1) and the Clostridium thermocellum EMRE032 (SEQ ID NO:2) in E. coli and Synechocystis, the inserts (i.e., the Parabacteroides distasonis thioesterase gene (SEQ ID NO:3) and the Clostridium thermocellum thioesterase gene (SEQ ID NO:4)) were subcloned into a pYC expression vector. The pYC vector was derived from a pUC19 backbone, which includes a bacterial origin of replication for maintenance of the plasmid in E. coli. The pYC vector includes the RS2 "up" (5') and RS1 "down" (3') sequences from the Synechocystis genome for homologous recombination (Williams et al., 1988, Methods in Enzymology, 167: 766-778). In addition, the expression vector included an omega-Sp cassette providing spectinomycin resistance, and the isopropyl β-D-1-thiogalactopyranoside (IPTG)-inducible trcY promoter (SEQ ID NO:9).

Specifically, in order to create the pYC vector expressing prokaryotic thioesterases, the RS2 sequence (including both the up and down fragments shown in the vector map in FIG. 2) was amplified from Synechocystis sp. PCC 6803 genomic DNA using the primers: RS2 (5'-GGGCCCTATTTGCCCGTATTCTGCCCTATCC-3'; SEQ ID NO:5) and RS2-3 (5'-GGGCCCGACTGCCTTTGGTGGTATTACCGATG-3'; SEQ ID NO:6). Plasmid pUC19 was digested with HindIII and EcoRI to remove the multiple cloning site (MCS), and then treated with T4 DNA polymerase to blunt the ends. The RS2 sequence (comprising RS2 up and RS2 down, 1.8 kb) was ligated then into the pUC19 backbone. The resulting plasmid was named pYC34. The pYC34 plasmid was digested then with BglII, which cut within the RS2 sequence, opening up the integration site. A copy of the omega-Sp cassette (BamHI fragment) was ligated into the BglII site of pYC34 to make pYC36. The pYC36 plasmid was digested with FspI to remove the majority of the ampicillin resistance gene (Apr), making spectinomycin/streptomycin the only selectable marker. The constructed plasmid was named pYC37. An EcoRI fragment containing the lacIq gene was inserted into the EcoRI site of pYC37, between the RS2 "up" sequence and the omega-Sp cassette to allow for regulation of lac-inducible promoters. The vector further included a TrcY promoter. The TrcY promoter was amplified using the following primers: 4YC-trcY-5 (5'-ACTAGTCCTGAGGCTGAAATGAGCTGTTGACAATTAATCATCCGGCTCGTATAATGTGTGGAATTGTGAG-3'; SEQ ID NO:7) and 4YC-trcY-3 (5'-CCATGGTTTTTTTCCTCCTTAGTGTGAAATTGTTATCCGCTCACAATTCCACACATTATACGAGCCGGAT-3'; SEQ ID NO:8) and inserted into the vector digested with SpeI-XbaI. The plasmid was called pYC45. The sequence of the TrcY promoter is provided as (5'-CTGAAATGAGCTGTTGACAATTAATCATCCGGCTCGTATAATGTGTGGAATTGTGAGCGGATAACAATTTCACACTAAGGAGGAAAAAAA-3'; SEQ ID NO:9).

For cloning the prokaryotic thioesterase genes into a pYC expression vector for use in E. coli and Synechocystis, primers were designed to the 5' and 3' ends of each gene, in which the 5' primer had homology to the region of the pYC vector upstream of the NcoI cloning site, and the 3' primer had homology to the region of the pYC vector downstream of the XbaI cloning site, both downstream of the TrcY promoter (SEQ ID NO:9). PCR was performed to generate fragments having 5' and 3' ends homologous to the vector. The primers for cloning the EMRE031 gene were: 5'-AGGAAAAAAACCATGATGGAAAAAGTGGGTCTGTTC-3' (SEQ ID NO:10) and 5'-CCTGCAGATATCTAGATTACCGCCAGGTCACGGCTGCCCGAC-3' (SEQ ID NO:11).

concentration of each internal standard was ~50 µg/ml. The fatty acids for making the internal standard set were purchased from Fluka or Nu-Chek Prep. The cultures were then vortexed on a multi-tube vortexer at ~2,500 rpm for ~30 mins. The vials were finally centrifuged for ~3 mins at ~2500 rpm to provide good separation between organic and aqueous phases. The hexane layers were sampled by a Gerstel MPS2L Autosampler.

E. coli fatty acid samples were analyzed on an Agilent
The primers 10 model 7890A gas chromatograph equipped with an FID for cloning the EMRE032 gene were: 5'-AGGAAAAAAAC (flame ionization detector) that included a J&W Scientific CATGATGCAAAAAAAGCGGTTCAGCAAG-3' (SEQ ID DB-FFAP capillary column (~15 m length, ~0.25 mm inter NO:12) and 5'-CCTGCAGATATCTAGATTAGGACT nal diameter, ~0.25 um film thickness) coupled to an Agilent GAATTTTCTGCCAAATG-3 (SEQID NO:13). 5975C mass spectrophotometer. The GC oven was pro The pYC expression vector was digested with NcoI and 15 grammed as follows: ~140°C. for -0.5 min., then heated at Xba and co-transformed with the EMRE031 or EMRE032 -20°C./min. to ~230° C. (hold ~5 mins). The injector tem PCR fragment into One Shot R Top 10 competent cells (Invit perature was kept at ~250° C., and a ~40:1 split-1 ul injection rogen, Carlsbad, Calif.) according to the manufacturer's pro was used. Helium was used as a carrier gas at a flow rate of tocol. The competent cells were plated on anagarplate coated ~1.2 ml/min. The analytes were identified by comparison of with spectinomycin (-50 ug/ml) for antibiotic selection. The retention times to individually injected standards. The cali resulting colonies were screened further by PCR using the bration range for the analytes was ~2 ug/ml to ~200 ug/ml for same primers used for generating the EMRE031 and C8:0-C16:1 fatty acids and ~0.5 lug/ml to ~50 ug/ml for EMRE032 fragments for cloning. The PCR conditions were C18:0-C 18:2 fatty acids. Spiking and recovery experiments ~94°C. for ~5 minutes, followed by ~29 cycles of -30 secs at into whole cell culture showed that the extraction method -94° C., -30 secs at -55° C., and -90 secs at -72° C., 25 recovered consistently within a range of about 85%-115% of followed by a final run-off for ~5 min at ~72°C. each analyte. The resulting expression constructs had a pUC origin of Expression of both prokaryotic acyl-ACP thioesterases in replication, a prokaryotic thioesterase gene cloned down E. 
coli led to an increase infree fatty acids in the whole culture stream of the TrcY promoter and upstream of the T4 termi (cells plus media) compared to a control culture without the nator and flanked by the RS1 up and RS1 down sequences; the 30 genes. (FIGS. 3 and 4). Moreover, the prokaryotic acyl-ACP omega spectinomycin cassette, and the lacIq gene positioned thioesterases exhibited distinct substrate specificity when between the RS1 down and RS1 up sequences. expressed in E. coli. Specifically, as shown in FIG.3, expres The sequence of the pYC expression construct that sion of EMRE031 acyl-ACP thioesterase (SEQID NO: 1) in includes the EMRE-031 gene is provided as SEQID NO:14 E. coli increased the production of free fatty acids having a and the sequence of the pYC expression construct that 35 chain length of 12 and 14 carbons, whereas EMRE032 acyl includes the EMRE-032 gene is provided as SEQID NO:15 ACP thioesterase (SEQ ID NO:2: FIG. 4) showed highest specificity toward fatty acids having a chain length of 14 Example 3 carbons in E. coli, while also demonstrating activity on fatty acids having an acyl chain length of 8, 10, and 12 carbons, Expression of Prokaryotic Acyl-ACP Thioesterase in 40 indicating that the two prokaryotic acyl-ACP thioesterases Escherichia coli exhibit substrate specificities for fatty acids distinct from each other when expressed in E. coli. For the expression of the prokaryotic thioesterase genes (SEQIDNO:3 and SEQID NO:4) in E. coli, -1.2 ml of 2xYT Example 5 media (~1.6% Bacto-tryptone, ~1%. Bacto-yeast extract, 45 ~0.5% NaCl, pH ~7.2) containing ~50 ug/ml spectinomycin Transformation of Cyanobacteria and ~1 mM Isopropyl B-D-1-thiogalactopyranoside (IPTG) in a glass tube was inoculated with ~30 microliters of a The plasmids containing the EMRE031 (SEQ ID NO:3) saturated culture of each bacterial strain and cultured for ~24 and the EMRE032 (SEQ ID NO:4) prokaryotic acyl-ACP hours. 
About 0.6 mL of the culture was removed for bio 50 thioesterase genes described in Example 2 were introduced chemical analysis. into a cyanobacterial host. Synechocystis sp. PCC 6803 cells were transformed essentially according to (Zang et al. (2007) Example 4 J. Microbiology 45:241-245, the content of which is incorpo rated herein by reference in its entirety). Briefly, cells were Analysis of Fatty Acid Samples from Escherichia 55 grown under constant light to an optical density 730 (O.D. coli 730) of approximately 0.7 to 0.9 (an OD730 of -0.25 corre sponds to ~1 x 108 cells/ml) and harvested by centrifugation at Free fatty acids were analyzed by gas chromatography ~2,000 g for ~15 mins at room temperature (-20-25°C.). The (GC) with flame ionization detection (GC-FID). Specifically, cell pellet was resuspended in approximately 0.3 times the -0.6 mL of the E. coli cultures in Example 3 were added to ~2 60 growth volume of fresh BG-11 medium and used immedi ml glass gas chromatography vials with PTFE (polytetrafluo ately for transformation. About 1 microgram of plasmid DNA roethylene)-lined caps (National Scientific). About fifty (containing the EMRE031 acyl-ACP thioesterase gene (SEQ microliters of an internal standard set that included the free ID NO:3) or EMRE032 acyl-ACP thioesterase gene (SEQID fatty acids C9:0, C13:0, and C17:0, each at a concentration of NO:4)) was added to -0.3 ml of cells, gently mixed, and ~600 g/ml, in hexane, were added to the culture sample, 65 incubated approximately 5 hours with illumination at -30°C. followed by ~50 microliters of -50% H2SO4, -100 microli without agitation. Cells were spread on a filter (Whatmann ters of ~5M NaCl, and ~850 microliters of hexane. 
The final Nuclepore Polycarbonate Track-Etched membrane, PC ~47 US 8,530,207 B2 45 46 mm, -0.2 micron) positioned on a ~50 ml BG-11 agar plates (flame ionization detector) that included a J&W Scientific and allowed to recover for about 16 to 24 hours under light, DB-FFAP capillary column (~15 m length, ~0.25 mm inter after which the filter was lifted and placed on a fresh BG-11 nal diameter, ~0.25 um film thickness) coupled to an Agilent plate containing spectinomycin (20 ug/ml) to select for trans 5975C mass spectrophotometer. The gas chromatography formants. Resulting colonies were screened further for the presence of the thioesterase genes by PCR using the primers oven was programmed as follows: ~140° C. for -0.5 min, used to generate the gene fragments. then heated at -20°C/min. to -230° C. (hold -5 min). The injector temperature is kept at ~250°C., and a ~40:1 split ~1.0 Example 6 ul injection was used. Helium was used as a carrier gas at a flow rate of ~1.2 mL/min. The analytes were identified by Culturing Cyanobacteria 10 comparison of retention times to individually injected Stan Synechocystis cells transformed with the EMRE031 and dards. The calibration range for the analytes was ~2 g/ml to EMRE032 expression constructs were cultured phototrophi ~200 ug/ml for C8:0-C 16:1 fatty acids and -0.5 g/ml to ~50 cally, using light as an energy source. Ten ml of BG-11 ug/ml for C18:0-C 18:2 fatty acids. medium containing 1 mM IPTG in 20 mL. glass vials were 15 As shown in FIG. 5, expression of prokaryotic acyl-ACP inoculated at an OD730 nm of 0.6 and grown for 6.5 days (150 thioesterases EMRE031 (SEQID NO:1; labeled as 31YC63) rpm) at 30°C. with constant illumination (40 uEinsteins m-2 and EMRE032 (SEQID NO:2; labeled as 32YC63)) led to an sec-1). 0.6 ml of culture was removed for biochemical analy increase in free fatty acids with a chain length of 8, 10, 12, 14, sis. 
The ingredients of the BG-11 medium (ATCC medium: 16, and/or 18 carbons in the cyanobacterial cultures (Syn 616 Medium BG-11 for blue-green algae) were as follows: echocystis sp. PCC 6803). The Y axis indicates the amount of free fatty acids in the sample and the X axis indicates the amount of free fatty acids with different carbon lengths in the NaNO3 1.5 g. sample. The data presented are averaged results of three cul K2HPO4 40 mg MgSO4·7H2O 75 mg tures of each Strain. These data indicate that, when expressed CaCl2O2H2O 36 mg 25 in cyanobacteria, both prokaryotic acyl-ACP thioesterases Citric acid 6 mg exhibit a distinct substrate specificity (e.g., acyl-ACPs with a Ferric ammonium citrate 6 mg chain length of 8, 10, 12, 14, 16 and/or 18 carbons). Approxi EDTA 1 mg Na2CO3 20 mg mately 60% of the fatty acids produced by the cyanobacterial Trace Metal Mix A5 (see below) 1 ml strain expressing the EMRE031 acyl-ACP thioesterase were Agar (if needed) 10 g 30 C16 fatty acids, and over 65% (approximately 67%) of the Distilled water 1 L fatty acids produced by the cyanobacterial strain expressing Adjust final pH to -7.1 EMRE031 were C12, C14, and C16 fatty acids. The cyano Autoclave at -121°C. for -15 minutes. bacterial strain expressing the EMRE032 acyl-ACP thioesterase produced proportionately more C18 fatty acids Trace Metal Mix A5 Composition: 35 as compared with the EMRE031 acyl-ACP thioesterase-ex pressing strain. The EMRE032-expressing strain produced, on average, greater than 35% C16 fatty acids (as a percentage H3BO3 2.86 g of the total fatty acids produced), with an average of more MCI204H2O 1.81 g ZnSO407E2O 0.22 g 40 than 50% (approximately 55%) of the total fatty acids pro Na2MOO402H2O 0.39 g duced by the EMRE032-expressing strain being C12, C14, or CuSO40SH2O 79.0 mg C16 fatty acids. 
Approximately 65% (average of approxi Co(NO3)2·6H2O 49.4 mg mately 65.9%) of the fatty acids produced by the EMRE032 Distilled water 1 L expressing strain were C14, C16, or C18 fatty acids. 45 It should be understood by those skilled in the art that Example 7 various changes may be made and equivalents may be Sub stituted without departing from the true spirit and scope of the Analysis of Fatty Acid Samples from Cyanobacteria Invention. In addition, many modifications may be made to (Synechocystis) adapt a particular situation, material, composition of matter, 50 process, process step or steps, to the objective, spirit and Synechocystis fatty acid samples were analyzed on an Agi scope of the described invention. All such modifications are lent model 7890A gas chromatograph equipped with an FID intended to be within the scope of the claims appended hereto.
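
The oven program quoted in Examples 4 and 7 (start at ~140° C., hold ~0.5 min, ramp at ~20° C./min to ~230° C., hold ~5 min) implies a fixed oven time per injection. The Python sketch below works through that arithmetic. The class and field names are illustrative and not part of the disclosure, and the approximate (~) values are treated as exact for the calculation.

```python
from dataclasses import dataclass


@dataclass
class OvenProgram:
    """Single-ramp GC oven program (hypothetical helper, not from the patent)."""
    start_temp_c: float    # initial oven temperature
    start_hold_min: float  # hold time at the initial temperature
    ramp_c_per_min: float  # heating rate
    final_temp_c: float    # final oven temperature
    final_hold_min: float  # hold time at the final temperature

    def ramp_minutes(self) -> float:
        # Time spent heating from the start to the final temperature.
        return (self.final_temp_c - self.start_temp_c) / self.ramp_c_per_min

    def total_minutes(self) -> float:
        # Initial hold + ramp + final hold.
        return self.start_hold_min + self.ramp_minutes() + self.final_hold_min


# Values taken from Examples 4 and 7, with the "~" qualifiers dropped.
program = OvenProgram(140.0, 0.5, 20.0, 230.0, 5.0)
print(program.ramp_minutes())   # 4.5
print(program.total_minutes())  # 10.0
```

Under these assumptions the ramp takes 4.5 minutes and each injection occupies the oven for 10 minutes, not counting cool-down or re-equilibration, which the examples do not specify.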

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 30

<210> SEQ ID NO 1
<211> LENGTH: 241
<212> TYPE: PRT
<213> ORGANISM: Parabacteroides distasonis

<400> SEQUENCE: 1

Met Glu Lys Val Gly Leu Phe His Phe Val Ala Glu Pro Tyr Leu Met
1               5                   10                  15

Asp Phe Arg Gly Arg Val Thr Leu Pro Met Ile Gly Asn Tyr Leu Ile
            20                  25                  30

His Ala Ala Ser Ser His Ala Gly Glu Arg Gly Phe Gly Phe Asn Asp
        35                  40                  45

Met Ser Glu Arg His Thr Ala Trp Val Leu Ser Arg Leu Ala Ile Glu
    50                  55                  60

Met Lys Glu Tyr Pro Thr Ala Phe Asp Lys Ile Asn Leu Tyr Thr Trp
65                  70                  75                  80

Ile Asp Glu Val Gly Arg Leu Phe Thr Ser Arg Cys Phe Glu Leu Ala
                85                  90                  95

Asp Glu Asn Gly Lys Thr Phe Gly Phe Ala Arg Ser Ile Trp Ala Ala
            100                 105                 110

Ile Asp Val Glu Thr Arg Arg Pro Thr Leu Leu Asp Ile Glu Ala Leu
        115                 120                 125

Gly Lys Tyr Ile Asp Glu Arg Pro Cys Pro Ile Glu Lys Pro Gly Lys
    130                 135                 140

Ile Met Pro Ala Glu Asn Lys Ala Glu Gly Ile Pro Tyr Ser Ile Lys
145                 150                 155                 160

Tyr Ser Asp Leu Asp Ile Asn Gly His Phe Asn Ser Val Lys Tyr Ile
                165                 170                 175

Glu His Leu Leu Asp Leu Phe Asp Ile Asp Gln Phe Lys Thr Arg Glu
            180                 185                 190

Ile Gly Arg Leu Glu Ile Ala Tyr Gln Ser Glu Gly Lys Gln Gly Met
        195                 200                 205

Pro Leu Thr Leu His Lys Ala Glu Ser Asp Pro Asp Lys Gln Asp Met
    210                 215                 220

Ala Ile Cys His Glu Gly Lys Ala Ile Cys Arg Ala Ala Val Thr Trp
225                 230                 235                 240

Arg

<210> SEQ ID NO 2
<211> LENGTH: 253
<212> TYPE: PRT
<213> ORGANISM: Clostridium thermocellum

<400> SEQUENCE: 2

Met Gln Lys Lys Arg Phe Ser Lys Lys Tyr Glu Val His Tyr Tyr Glu
1               5                   10                  15

Ile Asn Ser Met Gln Glu Ala Thr Leu Leu Ser Leu Leu Asn Tyr Met
            20                  25                  30

Glu Asp Cys Ala Ile Ser His Ser Thr Ser Ala Gly Tyr Gly Val Asn
        35                  40                  45

Glu Leu Leu Ala Ala Asp Ala Gly Trp Val Leu Tyr Arg Trp Leu Ile
    50                  55                  60

Lys Ile Asp Arg Leu Pro Lys Leu Gly Glu Thr Ile Thr Val Gln Thr
65                  70                  75                  80

Trp Ala Ser Ser Phe Glu Arg Phe Tyr Gly Asn Arg Glu Phe Ile Val
                85                  90                  95

Leu Asp Gly Arg Asp Asn Pro Ile Val Lys Ala Ser Ser Val Trp Ile
            100                 105                 110

Tyr Phe Asn Ile Lys Lys Arg Lys Pro Met Arg Ile Pro Leu Glu Met
        115                 120                 125

Gly Asp Ala Tyr Gly Ile Asp Glu Thr Arg Ala Leu Glu Glu Pro Phe
    130                 135                 140

US 8,530,207 B2 51 52 - Continued cc catgcgga t c ccc.ctaga gatgggcgac gcc tacggca ttgatgaga C cc.gagcc ctg 42O gaggagcc.gt t cacggattt tact tcgac titcgagcc.ca aggtgat cqa ggagttcacc 48O gtcaag.cgca gtgacattga Cacca actico Cacgtgaaca acaagaaata C9tcgactgg 54 O at catggaga cqgtaccc.ca acaaatctat gacaattaca agg to acgt.c cctdcagat c 6OO atct acaaga aggagagtag cct cqgcagc gggattaaag ccgggtgcgt catcgacgaa 660

Cagaatacgg acaatcc.gcg cctgcticcac aagatctggg ataaaaacac gggcctic gag 72 O c tagtgtc.cg cc.gagaccat ttggcagaaa attcagt cct aa 762
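
The numbers quoted for the two thioesterase genes can be cross-checked against each other: Example 2 gives the native coordinates on the Genbank records, the listing headers give the polypeptide lengths (241 and 253 residues), and the nucleotide sequence above ends at position 762. The Python sketch below checks that each coordinate span equals three nucleotides per residue plus one stop codon. The helper names are hypothetical, and the check assumes that each coordinate range is inclusive and covers exactly the open reading frame with its stop codon.

```python
def span_length(start: int, end: int) -> int:
    """Inclusive length of a coordinate range, regardless of strand."""
    return abs(start - end) + 1


def expected_cds_length(n_residues: int) -> int:
    """Nucleotides needed to encode n residues plus one stop codon."""
    return 3 * (n_residues + 1)


# Coordinates from Example 2; residue counts from the <211> LENGTH entries.
genes = {
    "EMRE031": {"coords": (2447794, 2447069), "residues": 241},
    "EMRE032": {"coords": (3613743, 3612982), "residues": 253},
}

for name, gene in genes.items():
    nt = span_length(*gene["coords"])
    print(name, nt, nt == expected_cds_length(gene["residues"]))
# EMRE031 726 True
# EMRE032 762 True
```

Both spans agree with the stated polypeptide lengths, and the 762-nucleotide figure for EMRE032 matches the final position count of the sequence above.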

<210> SEQ ID NO 5
<211> LENGTH: 31
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Forward primer for amplifying RS2 from Synechocystis

<400> SEQUENCE: 5

gggccctatt tgcccgtatt ctgccctatc c                                       31

<210> SEQ ID NO 6
<211> LENGTH: 32
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Reverse primer for amplifying RS2 from Synechocystis

<400> SEQUENCE: 6

gggcccgact gcctttggtg gtattaccga tg                                      32

<210> SEQ ID NO 7
<211> LENGTH: 70
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Forward primer for amplifying the trcY promoter

<400> SEQUENCE: 7

actagtcctg aggctgaaat gagctgttga caattaatca tccggctcgt ataatgtgtg        60

gaattgtgag                                                               70

<210> SEQ ID NO 8
<211> LENGTH: 70
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Reverse primer for amplifying the trcY promoter

<400> SEQUENCE: 8

ccatggtttt tttcctcctt agtgtgaaat tgttatccgc tcacaattcc acacattata        60

cgagccggat                                                               70

<210> SEQ ID NO 9
<211> LENGTH: 90
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: trcY promoter sequence

<400> SEQUENCE: 9

ctgaaatgag ctgttgacaa ttaatcatcc ggctcgtata atgtgtggaa ttgtgagcgg        60

ataacaattt cacactaagg aggaaaaaaa                                         90
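
Example 2 states that the TrcY promoter was amplified with primers 4YC-trcY-5 (SEQ ID NO:7) and 4YC-trcY-3 (SEQ ID NO:8). One way to read this, offered as an assumption since the text does not spell out the template, is that the two 70-nt primers anneal to each other through a shared stretch and are extended into one fragment. The Python sketch below merges the two primers at their overlap and checks that the result contains the promoter of SEQ ID NO:9. All function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(listing: str) -> str:
    """Strip spacing from a listing-style sequence and upper-case it."""
    return "".join(ch for ch in listing.upper() if ch in "ACGT")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_overlapping(forward: str, reverse: str, min_overlap: int = 15) -> str:
    """Join a forward primer to the reverse complement of a reverse primer
    at their longest suffix/prefix overlap."""
    bottom = reverse_complement(reverse)
    for k in range(min(len(forward), len(bottom)), min_overlap - 1, -1):
        if forward[-k:] == bottom[:k]:
            return forward + bottom[k:]
    raise ValueError("primers do not overlap")


FWD = clean("actagtcctg aggctgaaat gagctgttga caattaatca "
            "tccggctcgt ataatgtgtg gaattgtgag")             # SEQ ID NO:7
REV = clean("ccatggtttt tttcctcctt agtgtgaaat tgttatccgc "
            "tcacaattcc acacattata cgagccggat")             # SEQ ID NO:8
PROMOTER = clean("ctgaaatgag ctgttgacaa ttaatcatcc ggctcgtata "
                 "atgtgtggaa ttgtgagcgg ataacaattt cacactaagg "
                 "aggaaaaaaa")                              # SEQ ID NO:9

product = merge_overlapping(FWD, REV)
print(len(product))              # 109
print(product.index(PROMOTER))   # 13
print(product[:6], product[-6:]) # ACTAGT CCATGG
```

Under this reading the primers share 31 nucleotides, the merged fragment is 109 nucleotides long, and the 90-nucleotide promoter sits between a 13-nucleotide 5' extension beginning with an SpeI site (ACTAGT) and a 3' NcoI site (CCATGG).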

<210> SEQ ID NO 10
<211> LENGTH: 36
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Forward primer for cloning the codon-optimized Parabacteroides distasonis thioesterase gene

<400> SEQUENCE: 10

aggaaaaaaa ccatgatgga aaaagtgggt ctgttc                                  36

<210> SEQ ID NO 11
<211> LENGTH: 42
<212> TYPE: DNA
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: Reverse primer for cloning the codon-optimized Parabacteroides distasonis thioesterase gene

<400> SEQUENCE: 11

cctgcagata tctagattac cgccaggtca cggctgcccg ac                           42

<210> SEQ ID NO 12
<211> LENGTH: 39
<212> TYPE: DNA
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: Forward primer for cloning the codon-optimized Clostridium thermocellum acyl-ACP thioesterase gene

<400> SEQUENCE: 12

aggaaaaaaa ccatgatgca aaaaaagcgg ttcagcaag                               39

<210> SEQ ID NO 13
<211> LENGTH: 41
<212> TYPE: DNA
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: Reverse primer for cloning the codon-optimized Clostridium thermocellum acyl-ACP thioesterase gene

<400> SEQUENCE: 13

cctgcagata tctagattag gactgaattt tctgccaaat g                            41
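
Example 2 describes each cloning primer as a vector-homologous tail followed by the 5' or 3' end of the gene. The Python sketch below splits the four primers (SEQ ID NOS:10-13) on that basis and translates the gene-specific parts with the standard genetic code. The tail boundaries (15 nt on the forward primers, 16 nt on the reverse primers) are inferred from the sequence shared within each primer pair and are an assumption, as are all helper names.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional T, C, A, G codon order.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(listing: str) -> str:
    """Strip spacing from a listing-style sequence and upper-case it."""
    return "".join(ch for ch in listing.upper() if ch in "ACGT")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def c_terminal_peptide(reverse_primer: str, tail_len: int) -> str:
    """Translate the coding strand of a reverse primer's gene-specific part,
    trimming the 5' end so the frame finishes on the last base."""
    coding = reverse_complement(reverse_primer[tail_len:])
    return translate(coding[len(coding) % 3:])


FWD_TAIL = "AGGAAAAAAACCATG"    # assumed vector homology, forward primers
REV_TAIL = "CCTGCAGATATCTAGA"   # assumed vector homology, reverse primers

P10 = clean("aggaaaaaaa ccatgatgga aaaagtgggt ctgttc")            # SEQ ID NO:10
P11 = clean("cctgcagata tctagattac cgccaggtca cggctgcccg ac")     # SEQ ID NO:11
P12 = clean("aggaaaaaaa ccatgatgca aaaaaagcgg ttcagcaag")         # SEQ ID NO:12
P13 = clean("cctgcagata tctagattag gactgaattt tctgccaaat g")      # SEQ ID NO:13

print(translate(P10[len(FWD_TAIL):]))           # MEKVGLF
print(translate(P12[len(FWD_TAIL):]))           # MQKKRFSK
print(c_terminal_peptide(P11, len(REV_TAIL)))   # RAAVTWR*
print(c_terminal_peptide(P13, len(REV_TAIL)))   # IWQKIQS*
```

The forward primers encode the first residues of SEQ ID NO:1 (Met Glu Lys Val Gly Leu Phe) and SEQ ID NO:2 (Met Gln Lys Lys Arg Phe Ser Lys), and the SEQ ID NO:11 primer encodes the last seven residues of SEQ ID NO:1 followed by a stop codon, which is consistent with primers designed to the 5' and 3' ends of each gene.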

<210s, SEQ ID NO 14 &211s LENGTH: 7646 &212s. TYPE: DNA <213> ORGANISM: Artificial sequence 22 Os. FEATURE: <223> OTHER INFORMATION: pYC expression construct with the codon optimized Parabacteroides distasonis thioesterase gene <4 OOs, SEQUENCE: 14 cagttaccaa togcttaatca gtgaggcacc tat ct cagog atctgtctat titcgttcatc 6 O

Catagttgcc tact.ccc.cg tcgtgtagat alactacgata C9ggagggct tac Catctgg 12 O c cccagtgct gcaatgatac cqc.gagaccc acgct caccq gct coagatt tat cagdaat 18O aalaccagcca gcc.ggaaggg cc.gagcgcag aagtggit cot gcaacttitat cc.gc.ctic cat 24 O Ccagtictatt aattgttgcc gggaagctag agtaagtag t t cqc.cagtta at agtttgcg 3OO Caactgttgg gaagggcgat cqgtgcgggc Ctctt.cgcta ttacgc.ca.gc tiggcgaaagg 360 gggatgtgct gcaaggcgat taagttgggit aacgc.caggg tttitcc.cagt cacgacgttg 42O

US 8,530,207 B2 67 68 - Continued t caggggata acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc Cagga accgt. 69 OO aaaaaggc.cg gtttitt.ccat aggct Cogcc cc cct gacga gcatcacaaa 696 O aatcgacgct Caagttcagag gtggcgaaac ccgacaggac tataaagata ccaggcgttt cc.ccctggaa gct coct cqt gcqct citcct gttcc gaccc tgcc.gcttac C9gatacctg 708 O to cqccttitc tcc ctt.cggg aag.cgtgg.cg citt tot cata gct cacgctg. taggitat ct c 714. O agttcggtgt aggtogttcg ctic caagctg ggctgttgttgc acgaaccc.cc cqttcagcc c gaccgctg.cg c ctitat cogg taact atcgt. cittgagt cca acccggtaag acacgacitta 726 O tcgc.cactgg Cagcagccac tgg talacagg attagcagag cgagg tatgt aggcggtgct 732O acagagttct tgaagtggtg gcc talactac ggctacacta gaaggacagt atttggitatic 7380 tgcgct ctgc tgaagc.cagt tacct tcgga aaaagagttg gtagcticttg atc.cggcaaa 744. O caaaccaccg Ctggtagcgg tggitttittitt gtttgcaa.gc agcagattac gcgcagaaaa 75OO aaaggat.ct c aagaagat CC tttgatctitt tctacggggt Ctgacgctica gtggaacgaa 756 O aact cacgtt aagggattitt ggt catgaga ttatcaaaaa. ggat.ctt cac ctagatccitt ttaa attaala aatgaagttt taaat Caatc. taaagtatat atgagtaaac ttggit ct 7677

<210> SEQ ID NO 16
<211> LENGTH: 259
<212> TYPE: PRT
<213> ORGANISM: Desulfovibrio desulfuricans
<220> FEATURE:
<223> OTHER INFORMATION: subsp. desulfuricans str. G20

<400> SEQUENCE: 16

Met Arg Cys Ile Glin Arg Asp His Met Ile Pro Phe Ala Gly His Thr 1. 5 1O 15

Gly Thir Glu Thr Phe Pro Val Arg Thr Tyr Asp Ala Asp Ser Thr Gly 25 3O

Arg Ala Gly Ile Arg Ala Ile Ala Asp Tyr Phe Glin Glu Ala Ala Ser 35 4 O 45

Gly His Ala Arg Thr Lieu. Gly Phe Pro Ala Glu Arg Lell Arg Thr Glu SO 55 6 O

Glin Luell Ala Trp Val Lieu Ala Arg Lieu. Glin Ile Thir Wall Asn Arg Phe 65 70 7s 8O

Pro Pro Ala Gly Glu Thr Val Thr Ala Wall. Thir Trp Pro Ala Ala His 85 90 95

Glu Arg His Met Ala Tyr Arg Cys Tyr Glu Lieu. Tyr Thir Glin Asp Gly 105 11 O

Glu Luell Luell Ala Ala Gly. Thir Ser Ala Trp Val Thir Ile His Lieu Ala 115 12 O 125

Asp Arg Ser Met Wall Pro Leu Pro Asp Phe Ile Arg Asp Gly Tyr Pro 13 O 135 14 O

Glin Asp Asn Pro Pro Cys Arg Pro Phe Glu Thir Arg Thir Lieu Pro Arg 145 150 155 160

Lell Arg Glu Glu Ala Ala Gly Val Arg Ile Arg Thir Arg Arg Ala Asp 1.65 17O 17s

Lell Asp Ile Asin Gly. His Val Asn Asin Gly His Lell Gln Trp Lieu. 18O 185 19 O

Lell Glu Cys Met Pro Cys Asp Arg Glin Asn. Glu Lell Arg Ala Val Asp 195

Ile Ser Phe Arg Ala Glu. Cys Phe Ala Asp Thr Glu Ile Wal Ser Ala 21 O 215 22O US 8,530,207 B2 69 70 - Continued

Arg Gly Ala Lieu. Thr Gly Glu. His Thr Val Lieu. His Cys Ile Arg Thr 225 23 O 235 24 O

Ala Asp Asp Ala Lys Glu Lieu. Cys Arg Ala Arg Ser Arg Trp Ala Ala 245 250 255

Pro Luell Gly

<210> SEQ ID NO 17
<211> LENGTH: 243
<212> TYPE: PRT
<213> ORGANISM: Elusimicrobium minutum (strain Pei191)

<400> SEQUENCE: 17

Met Thr Glu Lieu. Glu Phe Llys Pro Arg Tyr Tyr Glu Ala Ala Lieu. Asp 1. 5 1O 15

Asp Ser Ile Pro Val His Met Lieu. Cys Asn Tyr Lieu. Glin Glu Gly Ala 25 3O

Gly Glin Asp Ala Asn Asn Lieu. Ser Phe Gly Arg Glu Glin Ile Gly Glu 35 4 O 45

His Gly Wall Ala Trp Val Lieu. Ser Arg Met Glin Ile Glu Lieu. Ile Asn SO 55 6 O

Lys Ala Wall Luell Gly Llys Llys Lieu Lys Wall Lys Thir Trp Pro Ser Phe 65 70 7s 8O

Ser Glu Ile Ile Ser Arg Arg Glu Tyr Ile Ile Thr Asp Glu Asp 85 90 95

Gly Ile Ile Lieu Lys Cys Ser Ser Trp Trp Lieu. Ile Lieu. Asn Lieu. 105 11 O

Asn Thir Arg Ile Thr Arg Ile Pro Gln His Met Lieu. Asp Lieu. Asn 115 12 O 125

Thir Glu Pro Asp Phe Met Val Glu Glu Gly Asn. Phe Llys Lieu Lys 13 O 135 14 O

Thir Pro Glin Asp Ala Llys Pro Val Phe Ser Lys Asp Phe Lieu Val Arg 145 150 155 160

Lell Glu Asp Ile Asp Cys Asn Gly His Val Asn. Asn. Thir His Tyr Ile 1.65 17O 17s

Ala Trp Ala Ile Glu Thir Lieu Pro Ala Glu Val Val Lys Asn Llys Thr 18O 185 19 O

Ile Ser Luell Arg Ile Asn. Phe Llys Ser Glu. Cys Ile Asp Gly Asn 195 2OO 2O5

Ile Ser Phe Val Tyr Asp Asn Gly Asn Ser Glu Tyr Ile His 21 O 215 22O

Glu Luell Wall Arg Glu Asp Asp Gly Lys Glu Val Phe Arg Lieu Val Ser 225 23 O 235 24 O

Asn Trp Glin

SEQ ID NO 18 LENGTH: TYPE : PRT ORGANISM: Carboxydothermus hydrogenoformans Z-2901

< 4 OOs SEQUENCE: 18 Met Asin Ser Asn Ile Phe Glu Lieu. Glu Tyr Arg Ile Pro Tyr Tyr Asp 1. 5 1O 15 Val Asp Tyr Gln Lys Arg Thr Lieu. Ile Thr Ser Lieu. Ile Asn Tyr Phe 25 3O

Asn Asp Ile Ala Phe Val Glin Ser Glu Asn Lieu. Gly Gly Ile Ala Tyr 35 4 O 45 US 8,530,207 B2 71 72 - Continued

Lell Thir Glin Asn Asn Lell Gly Trp Wall Luell Met Asn Trp Asp Ile Lys SO 55 6 O

Wall Asp Arg Pro Arg Phe Asn Glu Arg Wall Lell Wall Arg Thir Ala 65 70

Pro His Ser Phe Asn Phe Phe Ala Tyr Arg Trp Phe Glu Ile Tyr 85 90 95

Asp Asn Gly Ile Ile Ala Lys Ala ASn Ser Arg Trp Luell Luell 1OO 105 11 O

Ile Asn Thir Glu Arg Arg Pro Wall Ile Asn Asp Luell Tyr 115 12 O 125

Gly Ile Gly Wall Ser Tyr Glu Asn Asn ASn Ile Lell Pro Ile Glu 13 O 135 14 O

Glu Pro Glin Lell Lell Ser Ile Asp Ile Glu Glin Phe Glu Wall 145 150 155 160

Arg Ser Asp Lell Asp Ser Asn Gly His Wall Asn Asn Wall Lys Tyr 1.65 17s

Wall Wall Trp Ala Lell Asp Thir Wall Pro Luell Glu Ile Ile Ser Asn Tyr 18O 185 19 O

Ser Luell Glin Arg Lell Wall Lys Glu Lys Glu Wall Thir Gly 195 2O5

Thir Wall Arg Wall Lell Thir Gly Ile Luell Ser Glu 21 O 215 22O

<210> SEQ ID NO 19
<211> LENGTH: 252
<212> TYPE: PRT
<213> ORGANISM: Moorella thermoacetica ATCC 39073

<400> SEQUENCE: 19

Met Pro Thir Ser Thir Glin Arg Asp Tyr Glu Wall Arg Tyr Glu 1. 5 15

Thir Asn Phe Luell Lell Glu Ala Ser Pro Wall Thir Ile Lell Gly Tyr Luell 25 3O

Glu Glu Thir Ala Thir Lell His Ser Glu Thir Ala Gly Ile Gly Ile Asn 35 4 O 45

Luell Ala Ala Gly Arg Gly Trp Wall Wall Tyr Arg His Luell SO 55 6 O

Glin Met Glu Arg Tyr Pro Arg Trp Arg Glu His Ile Thir Ile Thir Thir 65 70

Trp Wall Glu Asn Phe Glin Arg Phe Ala His Arg Asp Phe Tyr Ile 85 90 95

His Asp Ala Gly Gly Asn Lell Ile Gly Arg Ala Ala Ser Wall Trp Wall 1OO 105 11 O

Phe Luell Asp Ile His Lys Pro Luell Arg Ile Pro Pro Glin Wall 115 12 O 125

Thir Gly Ala Gly Lell Tyr Pro Glu Wall Ala Wall Pro Gly Ala Phe 13 O 135 14 O

Thir Asp Luell Pro Ser Lell Glu Pro Ala Thir Ala Gly Glu Phe Thir 145 150 155 160

Wall Arg Met Ala Asp Lell Asp Thir Asn His His Ala Asn Asn Lys Arg 1.65 17s

Ile Gly Trp Ile Lell Glu Gly Wall Pro Luell Glu Wall His Arg Thir 18O 185 19 O

Ala Phe Pro Ala Thir Ile Glu Wall Luell Tyr Asp Ala Arg Tyr 195 2OO 2O5 US 8,530,207 B2 73 74 - Continued

Gly Glu. His Ile His Cys Glu Cys Glin Glu Ile Pro Ala Val Glu Gly 21 O 215 22O

Asp Arg Cys Tyr Lieu. His Arg Leu Ser Cys Pro Glu Arg Asp Lieu. Glu 225 23 O 235 24 O

Lell Thir Lieu Ala Arg Thir Thir Trp Arg Lys Arg Arg 245 250

<210> SEQ ID NO 20
<211> LENGTH: 247
<212> TYPE: PRT
<213> ORGANISM: Geobacter metallireducens GS-15

<400> SEQUENCE: 20

Met Lys Thr Glu Asn Ser Ile Phe Glu Thir Ser Phe Thir Ile Arg Ser 1. 5 15

Phe Asp Wall Asp Pro His Gly Phe Wall Ser Pro Wall Thir Luell Luell Gly 25

Luell Glin Glu Ala Ala Ser Glu His Met Thir Lell Lell Gly Gly Thir 35 4 O 45

Wall Arg Ser Luell Met Ala Glu Gly Luell Thir Trp Wall Lell Ser Arg Wall SO 55 6 O

His Luell Ser Ile Glu Arg Pro Arg Wall Arg Asn Glu Met Thir Wall 65 70

Arg Thir Trp Pro Ser Lell Arg Glu Gly Arg Phe Thir Arg Glu Phe 85 90 95

Glu Luell Luell Asp Arg Thir Gly Ala Wall Met Ala Arg Ala Thir Thir Ser 105 11 O

Trp Ala Wall Ile Asp Phe Thir Arg Arg Ala Wall Arg Wall Asp Arg 115 12 O 125

His Pro Pro Pro Lel Thir Pro Arg Arg Ala Ile Asp Asp Asp Phe 13 O 135 14 O

Ala Luell Luell Pro Ala Lel Gly Ser Glin Ala Glu Glu Arg Phe Arg 145 150 155 160

Wall Arg Arg Ser Asp Lel Asp Luell Asn His His Wall Asn His Met Wall 1.65

Ala Gly Trp Ala Lel Asp Ala Wall Pro Asp Glu Wall Ala Glu Arg 18O 185 19 O

His Luell Wall Ser Lel Glu Ile Gly Tyr Arg Ala Glu Ala Luell Ala 195 2O5

Gly Glu Glu Wall Thir Wall Cys Ala Ala Glu Asp Asp Asn Gly 21 O 215

Ile Luell Wall Ile His Arg Ile Ala Ser Ala Asp Cys Arg Glu Luell Thir 225 23 O 235 24 O

Arg Luell Arg Thir Arg Trp Arg 245

<210> SEQ ID NO 21
<211> LENGTH: 248
<212> TYPE: PRT
<213> ORGANISM: Salinibacter ruber DSM 13855

< 4 OOs SEQUENCE: 21 Met Thr Ala Gly Ile Trp Thr Asp Glu Val Arg Val Arg Ser Tyr Asp 1. 5 15 Val Thr Pro Glin Gly Thr Ala Ser Val Lieu. Thir Lieu Ala Asp Tyr Phe 25 US 8,530,207 B2 76 - Continued Glin Glu Ala Ala Gly Arg His Ala Ala Glu Luell Gly Val Ser Met Thir 35 4 O 45

Asp Luell Arg Ala Asp Gly Glin Ala Trp Wall Luell Ala Phe Met His Met SO 55 6 O

Glin Wall Glu Arg Lell Pro His Glin Asn Glu Ser Lell Arg Ile Glu Thir 65 70

Trp Pro Ser Gly Lell Glu Ser Ala Ser Ala His Arg Glu Phe Wall Phe 85 90 95

His Asp Glu Glu Gly Thir Wall Luell Ala Gly Gly Thir Ser Arg Trp Phe 105 11 O

Wall Phe Asp Wall Asp Arg Arg Arg Pro Wall Arg Pro Pro Arg Wall Luell 115 12 O 125

Ala Asp Ile Glu Gly Pro Asp Arg Pro Gly Pro Ile Asp Glin Asp Luell 13 O 135 14 O

Asp Ala Luell Ser Ala Pro Ala His Thir Asp Arg Glu Glin Thir Phe Thir 145 150 155 160

Ala Arg His Asp Lell Asp Luell Asn Arg His Wall Asn Asn Wall Arg 1.65 17s

Luell Glu Trp Ala Lell Glu Thir Luell Pro Ala Ala Wall Luell Asp Glu 18O 185 19 O

Arg Arg Cys Luell Gly Phe Ala Wall Glin Phe Glu Ala Glu Thir Arg Luell 195

Gly Asp Pro Wall Arg Ala Ser Ala Glu Glin Ile Glu Asp Gly Gly Thir 21 O 215

Lell Arg Wall Arg His Arg Lell Ala His Ala Glu Ser Asp Glin Thir Luell 225 23 O 235 24 O

Ala Luell Wall Arg Thir Thir Trp Luell 245

<210> SEQ ID NO 22
<211> LENGTH: 269
<212> TYPE: PRT
<213> ORGANISM: Microscilla marina ATCC 23134

<400> SEQUENCE: 22

Met Cys Asp Lieu Ile Ile Glu Asp Ala Lys ASn Ile Luell Trp Tyr 1. 5 1O 15

Met Glu Lys Asn Glin Thir Thir Glin Luell Pro Lys Ile Trp Luell Asp Phe 25

Glu Wall Arg Ala Tyr Glu Wall Asp Ile ASn Arg Wall Ser Pro Wall 35 4 O 45

Thir Ile Ala Asn Tyr Lell Glin Glu Ala Ala Gly Glin His Ala Asp His SO 55 6 O

Lell Gly Wall Gly Wall Thir Asp Luell Luell His Arg Lell Thir Trp Wall 65 70

Lell Thir Arg Ile Lys Ile Asp Met Glin Glin Tyr Pro Ser Arg Tyr Glu 85 90 95

Pro Wall Arg Wall Lell Thir Pro Ile Gly Tyr Asp Tyr Phe Wall 105 11 O

Arg Asn Phe Glin Lell Tyr Asn Ala Glin Gly Glin Ile Gly Glin 115 12 O 125

Ala Thir Ser Thir Trp Ala Wall Met Asp Ile Glin Ala Arg Met Wall 13 O 135 14 O

Gly Wall Pro Glin Lell Ile Thir Ser Luell Pro Ile Pro Asp Asp Glu Asp 145 150 155 160 US 8,530,207 B2 77 78 - Continued

Phe Ile Thir Arg Thir Gly Ile Ala Lys Wall Asn Ala Pro Luell 1.65 17O 17s

Ser Glu Thir Luell Phe Wall Arg Trp Asn Asp Lell Asp Thir Asn Glin 18O 185 19 O

His Thir Asn Asn Ala Luell Glin Trp Ala Ile Glu Ser Luell Pro 195

Glu Glu Wall Luell Ser Arg Glin Luell Ala Ser Ile Asp Luell Luell Tyr 21 O 215

Arg Luell Glu Thir Thir Trp Glu Gly Wall Wall Ala Arg Thir Glu Glin 225 23 O 235 24 O

Thir Ser Thir Glin Pro Lell Ser Phe Ile His Glin Lell Ile Arg Glu Ser 245 250 255

Asp Glin Glu Lell Ala Glin Ala Thir Thir Wall Trp Wall 26 O 265

<210> SEQ ID NO 23
<211> LENGTH: 251
<212> TYPE: PRT
<213> ORGANISM: Enterococcus faecalis TX0104

<400> SEQUENCE: 23

Met Glu Lys Gly Asp Phe Wall Gly Lys His Thir Ser Ser Tyr 1. 5 15

Glu Wall Ala Tyr Tyr Asp Gly Asp Phe Thir Gly Ala Met Lys Ile Pro 25

Ala Luell Luell Ala Wall Wall Ile Lys Wall Ser Glu Glu Glin Thir Glu Luell 35 4 O 45

Lell Gly Arg Asn Ala Ala Tyr Wall Ala Glin Phe Gly Lell Gly Trp Wall SO 55 6 O

Ile Thir Asn Tyr Glu Ile Glu Ile His Arg Luell Pro Wall Gly Glu 65 70

Wall Ala Ile Thir Thir Glin Ala Met Ser Tyr Asn Phe 85 90 95

Arg Asn Phe Trp Wall His Asp Glu Glu Gly Asn Glu Cys Wall Phe 105 11 O

Wall Ser Thir Phe Wall Lell Met Asp Glin Lys Asn Arg Ile Ser 115 12 O 125

Ser Wall Luell Pro Glu Ile Ile Ala Pro Phe Asp Ser Glu Ile Thir 13 O 135 14 O

Lys Ile Arg His Glu Ile Glu Wall Thir Glu Gly Asn Phe 145 150 155 160

Lell Pro Arg Wall Arg Phe Phe Asp Ile Asp Gly Asn Glin His Wall 1.65 17O 17s

Asn Asn Ala Ile Tyr Phe Asn Trp Luell Luell Asp Wall Lell Gly Tyr Asp 18O 185 19 O

Phe Luell Thir Thir His Glin Pro Lys Ile Luell Wall Lys Phe Asp 195

Glu Wall Glu Tyr Gly Glin Glu Wall Glu Ser His Tyr Glu Ile Wall Glu 21 O 215 22O

Glin Glu Asn His Lell Lys Thir Arg His Glu Ile Arg Ile Asp Gly Glin 225 23 O 235 24 O

Thir Glu Ala Asn Ile Asp Trp Thir ASn 245 250

<210s, SEQ ID NO 24 &211s LENGTH: 261 US 8,530,207 B2 79 - Continued

212. TYPE: PRT <213> ORGANISM: Lactobacillus plantarum JDM1 <4 OOs, SEQUENCE: 24 Met Ala Thir Lieu. Gly Ala Asn Ala Ser Lieu. Tyr Ser Glu Gln His Arg 1. 5 1O 15 Ile Thr Tyr Tyr Glu. Cys Asp Arg Thr Gly Arg Ala Thr Lieu. Thir Thr 2O 25 3O Lieu. Ile Asp Ile Ala Val Lieu Ala Ser Glu Asp Glin Ser Asp Ala Lieu. 35 4 O 45 Gly Lieu. Thir Thr Glu Met Val Glin Ser His Gly Val Gly Trp Val Val SO 55 6 O Thr Glin Tyr Ala Ile Asp Ile Thr Arg Met Pro Arg Glin Asp Glu Val 65 70 7s 8O Val Thr Val Ala Val Arg Gly Ser Ala Tyr Asn Pro Tyr Phe Ala Tyr 85 90 95 Arg Glu Phe Trp Ile Arg Asp Ala Asp Gly Glin Gln Lieu Ala Tyr Ile 1OO 105 11 O Thir Ser Ile Trp Val Met Met Ser Glin Thr Thr Arg Arg Ile Val Lys 115 12 O 125 Ile Lieu Pro Glu Lieu Val Ala Pro Tyr Glin Ser Glu Val Val Lys Arg 13 O 135 14 O Ile Pro Arg Lieu Pro Arg Pro Ile Ser Phe Glu Ala Thr Asp Thir Thr 145 150 155 160 Ile Thr Llys Pro Tyr His Val Arg Phe Phe Asp Ile Asp Pro Asn Arg 1.65 17O 17s His Val Asn. Asn Ala His Tyr Phe Asp Trp Lieu Val Asp Thir Lieu Pro 18O 185 19 O Ala Thr Phe Lieu. Lieu Gln His Asp Lieu Val His Val Asp Val Arg Tyr 195 2OO 2O5 Glu Asn. Glu Val Llys Tyr Gly Glin Thr Val Thr Ala His Ala Asn. Ile 21 O 215 22O Lieu Pro Ser Glu Val Ala Asp Glin Val Thir Thr Ser His Lieu. Ile Glu 225 23 O 235 24 O Val Asp Asp Glu Lys Cys Cys Glu Val Thir Ile Glin Trp Arg Thr Lieu 245 250 255

Pro Glu Pro Ile Glin 26 O

<210> SEQ ID NO 25
<211> LENGTH: 243
<212> TYPE: PRT
<213> ORGANISM: Leuconostoc mesenteroides
<220> FEATURE:
<223> OTHER INFORMATION: subsp. mesenteroides ATCC 8293

<400> SEQUENCE: 25

Met Lys Lys Tyr Glu Ile Lys Arg Arg Val Glu Tyr Tyr Glu Ala Asp 1 5 10 15
Thr Thr Gln Lys Leu Ser Leu Pro Met Ile Leu Asn Tyr Ala Val Leu 20 25 30
Ala Ser Lys Cys Gln Ser Asp Glu Leu Gly Val Gly Gln Asp Phe His 35 40 45
Leu Gly Arg Gly Leu Gly Trp Ile Ile Leu Gln Tyr Glu Val Ile Ile 50 55 60
Lys Arg Arg Pro Lys Ile Gly Glu Ile Ile Arg Ile Gln Thr Phe Ala 65 70 75 80
Thr Gln Tyr Asn Pro Phe Phe Val Arg Arg Pro Phe Val Phe Leu Asp 85 90 95
Glu Asn Asp Asn Glu Ile Ile Arg Val Asp Ser Ile Trp Thr Met Ile 100 105 110
Asp Met Thr Asn Arg Arg Met Ala Arg Leu Pro Gln Asp Ile Ile Asp 115 120 125
Gln Tyr Asp Ala Arg Val Gln Ile Ser Arg Ile Pro Asn Pro 130 135 140
Glu Phe Ser Asp Asp Asp Gln Tyr Thr Glu Arg Asp His Val 145 150 155 160
Arg Leu Asp Ile Asp Gly Asn His Val Asn Asn Ser Lys Tyr 165 170 175
Phe Glu Trp Met Gln Asp Val Ile Ala Pro Glu Tyr Leu Leu Thr His 180 185 190
Glu Ile Thr Ile Asn Leu Lys Phe Glu Asn Glu Ile Arg Leu Gly 195 200 205
His Thr Ile Leu Ser Gln Val Val Gln Asn Asp Asn Ser His 210 215 220
Arg Ile Met Met Glu Asn Val Ile Ser Ala Glu Ala Glu Phe Trp 225 230 235 240
Arg Lys Ile

<210> SEQ ID NO 26
<211> LENGTH: 260
<212> TYPE: PRT
<213> ORGANISM: Oenococcus oeni ATCC BAA-1163

<400> SEQUENCE: 26

Met Phe Ile Lys Phe Thr Ile Arg Arg Lys Thr Met Ser Glu Ile Tyr 1 5 10 15
Ser Glu Glu Leu Tyr Ile Glu Asp Phe Tyr Asp Arg Thr Ala 20 25 30
Leu Ser Leu Pro Met Ile Thr Glu Leu Ala Ile Ser Val Ser Ser 35 40 45
Gln Thr Val Glu Met Gly Ile Gly Met Gln Arg Leu Val Glu Ala His 50 55 60
His Gly Trp Ile Leu Leu Gln Tyr Asp Ile Lys Ile Asn Arg Arg Pro 65 70 75 80
Asn Leu Gly Glu Lys Ile Ile Arg Thr Asp Pro Arg His Asn 85 90 95
Arg Phe Phe Ala Phe Arg Asp Phe Asp Phe Phe Asp Gln Asp Gly Asn 100 105 110
Thr Leu Ile His Ile Asp Ser Leu Trp Ala Met Ile Asp Leu 115 120 125
Arg Arg Leu Val Ser Ile Asp Ser Glu Phe Val Asp Pro Leu Gly 130 135 140
Asn Leu Val Asp Arg Leu Glu Arg Leu Glu Lys Pro Ala Asp Leu Asp 145 150 155 160
Arg Ser Phe Ser Gly Ala Ser Ile Ser Val Glu Glu Ile Ala Asn 165 170 175
Phe Asp Ile Asp Thr Asn Gln His Val Asn Asn Ser Asn Tyr Leu 180 185 190
Phe Phe Leu Val Pro Val Ala Glu Asn Phe Leu Leu Arg His Glu 195 200 205
Pro Lys Arg Ile Leu Ile Lys Tyr Val Lys Glu Ile Arg Leu Asn Gln 210 215 220
Ser Val Val Ser Met Ala Gln Phe Ile Gly Pro Leu His Ser Val His 225 230 235 240
Glu Ile Ser Asp Asn Ser Leu Ile Asn Ala Gln Ala Glu Ile Glu Trp 245 250 255
Ser Glu Val Glu 260

<210> SEQ ID NO 27
<211> LENGTH: 282
<212> TYPE: PRT
<213> ORGANISM: Mycobacterium smegmatis str. MC2 155

<400> SEQUENCE: 27

Met Thr Gly Thr Glu Lys Gly Ser Lys Asp Ser Ser Glu Asn Ser Leu 1 5 10 15
Val Ser Thr Gly Leu Ala Lys Thr Met Met Pro Val Pro Asp Pro His 20 25 30
Pro Asp Val Phe Asp Thr Gly Trp Pro Leu Arg Val Ala Asp Ile Asp 35 40 45
Arg Asn Gly Arg Leu Arg Phe Asp Ala Ser Thr Arg His Ile Gln Asp 50 55 60
Ile Gly Gln Asp His Leu Arg Gln Leu Gly Phe Glu Asp Thr His Pro 65 70 75 80
Ala Trp Ile Val Arg Arg Thr Met Ile Asp Met Ile Glu Pro Ile Glu 85 90 95
Phe Pro Glu Leu Leu Arg Leu Arg Arg Trp Cys Ser Gly Thr Ser Asn 100 105 110
Arg Trp Cys Glu Met Arg Val Arg Ile Asp Gly Arg Lys Gly Gly Leu 115 120 125
Val Glu Ser Glu Ala Phe Trp Ile Asn Ile Asn Arg Glu Thr Gln Gly 130 135 140
Pro Ser Arg Ile Ala Asp Asp Phe Leu Ala Gly Leu Arg Arg Thr Thr 145 150 155 160
Asp Ile Asp Arg Leu Arg Trp Lys Pro Tyr Leu Lys Ala Gly Ser Arg 165 170 175
Glu Asp Ala Leu Glu Ile Arg Glu Tyr Pro Val Arg Val Ala Asp Ile 180 185 190
Asp Leu Phe Asp His Met Asn Asn Ala Val Tyr Trp Thr Val Val Glu 195 200 205
Asp Tyr Leu Tyr Thr His Pro Glu Leu Leu Ala Gln Pro Leu Arg Val 210 215 220
Thr Ile Glu His Asp Ala Pro Val Ala Leu Gly Asp Lys Leu Glu Ile 225 230 235 240
Ile Ser His Thr His Pro Pro Gly Thr Thr Asp Lys Phe Gly Pro Glu 245 250 255
Leu Thr Asp Arg Thr Val Thr Thr Leu Thr Tyr Ala Val Gly Glu Glu 260 265 270
Thr Lys Ala Val Ala Cys Leu Phe Ser Leu 275 280

<210> SEQ ID NO 28
<211> LENGTH: 271
<212> TYPE: PRT
<213> ORGANISM: Mycobacterium vanbaalenii PYR-1

<400> SEQUENCE: 28

Met Ser Thr Thr Pro Gly Ala Thr Gly Leu Ala Ala Met Met Pro 1 5 10 15
Val Pro Asp Pro His Pro Asp Val Phe Asp Ile Gln Trp Pro Leu Arg 20 25 30
Val Ala Asp Val Asp Arg Glu Gly Arg Leu Lys Phe Asp Ala Ala Thr 35 40 45
Arg His Ile Gln Asp Ile Gly Thr Asp Gln Leu Arg Glu Met Gly 50 55 60
Glu Asp Thr His Pro Leu Trp Ile Val Arg Arg Thr Met Ile Asp 65 70 75 80
Ile Glu Pro Val Val Phe Asp Met Leu Arg Leu Arg Arg Trp 85 90 95
Ser Gly Thr Ser Asn Arg Trp Glu Met Arg Val Arg Ile Glu Gly 100 105 110
Arg Gly Gly Leu Ile Glu Ser Glu Ala Phe Trp Ile Asn Ile Asn 115 120 125
Arg Glu Thr Gln Gly Pro Ala Arg Ile Ser Asp Asp Phe Ile Glu Gly 130 135 140
Leu Arg Arg Thr Thr Asp Glu Asn Arg Leu Arg Trp Pro Leu 145 150 155 160
Arg Ala Gly Ser Arg Glu Asp Ala Glu His Ile Arg Asp Pro Val 165 170 175
Arg Val Ser Asp Ile Asp Ile Phe Asp His Met Asn Asn Ser Val 180 185 190
Trp Ser Val Val Glu Asp Leu Ser Gln Pro Glu Leu Met Ser 195 200 205
Ala Pro Val Arg Val Thr Ile Glu His Asp Leu Pro Val Ala Leu Gly 210 215 220
Asp Leu Glu Ile Ile Arg His Val His Pro Ala Gly Ser Thr Asp 225 230 235 240
Phe Gly Glu Glu Leu Ser Asp Arg Thr Val Thr Thr Leu Thr 245 250 255
Ala Val Gly Asn Glu Thr Ala Val Ala Ala Ile Phe Pro Leu 260 265 270

<210> SEQ ID NO 29
<211> LENGTH: 256
<212> TYPE: PRT
<213> ORGANISM: Rhodococcus erythropolis SK121

<400> SEQUENCE: 29

Met Asn Arg Thr Lys Val Asp Asp His Val Ala Met Ala Glu Leu Pro 1 5 10 15
Thr Thr Gly Ser Ile Phe Gln Ala Ser Trp Pro Val Arg Thr Gly Asp 20 25 30
Ile His Thr Asp Lys Gln Leu Arg Leu Asp Ala Ile Ala Arg Leu 35 40 45
Gln Asp Ala Gly Phe Asp Asn Leu Val Asp Arg Gly Ala Met Asp Thr 50 55 60
His Pro Leu Trp Met Val Arg Arg Thr Val Ile Asp Val Leu Glu Pro 65 70 75 80
Val Val Trp Pro Asp Arg Val His Leu Gln Arg Trp Ser Gly Leu 85 90 95
Ser Asn Trp Cys Ser Met Arg Val Arg Ile Arg Ser Asp Gly Gly 100 105 110
Gly Leu Ile Glu Thr Glu Ala Phe Trp Ile Asn Ile Asp Pro Thr 115 120 125
Gly Met Pro Ala Ser Ile Ser Glu Phe Thr Ala Ala Leu Ala Ser 130 135 140
Thr Ala Val Asp Gln His Leu His Trp Arg Arg Trp Ile Asp Pro Thr 145 150 155 160
Pro Asp Thr Thr Pro Ser Ala Asp Thr Pro Phe Pro Leu Arg Ser Ser 165 170 175
Asp Phe Asp Pro Phe Asp His Val Asn Asn Ala Ile Trp Gln Pro 180 185 190
Val Glu Asp Ala Leu Pro Asp Arg Leu Arg Ser Gly Pro Phe Arg Ala 195 200 205
Ile Leu Glu Tyr Thr Gln Pro Ile Arg Gly Glu Gln Val Ser Val 210 215 220
Arg Thr Gly Thr Asn Thr Leu His Ile Asp Ala Gly Asp Glu Ala Arg 225 230 235 240
Ala Ser Ala Arg Trp Phe Ala Leu Ser Pro Lys Ser Asn Asp Gln Gly 245 250 255

<210> SEQ ID NO 30
<211> LENGTH: 259
<212> TYPE: PRT
<213> ORGANISM: Rhodococcus opacus B4

<400> SEQUENCE: 30

Met Ala Leu Asp Arg Pro Leu Ala Ala Pro Pro Ser Arg Gly Arg Phe 1 5 10 15
Phe Glu Thr Ser Trp Pro Val Arg Thr Gly Asp Ile Asp Ala Ala 20 25 30
Arg Leu Arg Leu Asp Gly Ile Ala Arg Leu Gln Asp Ala Gly Leu 35 40 45
Asp Asn Leu Asp Ala Val Asp Ala Ala Asp Ser His Pro Leu Trp Ile 50 55 60
Val Arg Arg Thr Val Ile Asp Val Leu Arg Pro Ala Val Trp Pro Glu 65 70 75 80
Arg Val His Leu Arg Arg Trp Ser Ala Leu Ser Thr Arg Trp Thr 85 90 95
Asn Met Arg Val Gln Ile Arg Gly Glu Ala Gly Ala Leu Ile Glu Thr 100 105 110
Glu Gly Phe Trp Ile His Ile Ser Gly Glu Thr Gly Met Pro Thr Arg 115 120 125
Ile Asp Asp Gly Phe Ile Glu Arg Leu Gly Glu Ser Ala Glu Glu His 130 135 140
Arg Leu Trp Arg Trp Leu Val Glu Asn Ala Pro Ala Ala Asp 145 150 155 160
Glu Asp Gly Ile Glu Asp Ser Glu Phe Val Leu Arg Arg Thr Asp Ile 165 170 175
Asp Pro Phe Asp His Val Asn Asn Ala Val Tyr Trp Gln Ala Val Glu 180 185 190
Glu Leu Leu Ala Asp His Glu His Pro Gly Gly Gly Arg Leu Val Asp 195 200 205
Asn Pro His Arg Ala Val Leu Glu Leu Ala Pro Ile Val Ser Thr 210 215 220
Asp Lys Ile Val Leu Arg Ser Arg Arg Asp Asp Thr Ser Leu Thr Val 225 230 235 240
Trp Phe Leu Val Asp Asp Ala Leu Arg Ala Val Ala His Val Gly Pro 245 250 255
Leu Thr Ala
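For readers who wish to compare the listed sequences with database entries, the three-letter residue codes used in the listing can be converted to one-letter codes. The following is a minimal sketch, not part of the original disclosure; the function name `to_one_letter` and the sample input are hypothetical, and the sketch assumes the listing text has already been cleaned so that every residue token is a standard three-letter code.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing: str) -> str:
    """Convert listing text to a one-letter sequence string.

    Hypothetical helper: purely numeric tokens (position markers such
    as 1, 5, 10) are skipped; an unrecognized residue token raises
    KeyError so that leftover OCR damage is not silently accepted.
    """
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue  # position marker, not a residue
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


# The final three residues of SEQ ID NO: 30 (Leu Thr Ala).
print(to_one_letter("Leu Thr Ala"))  # LTA
```

Raising on an unknown token is a deliberate choice in this sketch: a converted sequence whose length differs from the declared LENGTH field indicates residues lost in extraction rather than a valid shorter protein.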

What is claimed is:

1. A photosynthetic microorganism comprising a nucleic acid molecule encoding an exogenous acyl-acyl carrier protein (acyl-ACP) thioesterase selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, and variants thereof having at least 80% amino acid sequence identity to the amino acid sequence of SEQ ID NO:1 or 2, wherein the acyl-ACP thioesterase, when queried against Pfam database hidden Markov models, demonstrates inclusion in Pfam PF01643 with a bit score higher than 20.3 and an E-value of less than or equal to 0.01, and wherein the photosynthetic microorganism produces at least one free fatty acid, and further wherein greater than 35% of the at least one free fatty acid or fatty acid derivative produced by the photosynthetic microorganism has a chain length of 16 carbons.

2. The photosynthetic microorganism according to claim 1, wherein the photosynthetic microorganism further produces at least one fatty acid derivative comprising at least one fatty aldehyde, at least one fatty alcohol, at least one fatty acid ester, at least one wax ester, at least one alkane, at least one alkene, or a combination thereof.

3. The photosynthetic microorganism according to claim 2, wherein the photosynthetic microorganism produces at least one wax ester having a total number of carbons from 16 to 36.

4. The photosynthetic microorganism according to claim 2, wherein the photosynthetic microorganism produces at least one free fatty acid, at least one fatty aldehyde, at least one fatty alcohol, at least one alkane, or at least one alkene having a carbon chain length of from 8 to 24 carbons.

5. The photosynthetic microorganism according to claim 4, wherein at least one free fatty acid, at least one fatty aldehyde, at least one fatty alcohol, at least one alkane, or at least one alkene produced by the photosynthetic microorganism has a chain length of from 12 to 16 carbons.

6. The photosynthetic microorganism according to claim 4, wherein at least 50% of the at least one free fatty acid, at least one fatty aldehyde, at least one fatty alcohol, at least one alkane, or at least one alkene produced by the photosynthetic microorganism has a chain length of from 12 to 16 carbons.

7. The photosynthetic microorganism according to claim 1, wherein the photosynthetic microorganism is a microalga.

8. The photosynthetic microorganism according to claim 7, wherein the photosynthetic microorganism is of a genus selected from the group consisting of Achnanthes, Amphiprora, Amphora, Ankistrodesmus, Asteromonas, Boekelovia, Borodinella, Botryococcus, Bracteococcus, Chaetoceros, Carteria, Chlamydomonas, Chlorococcum, Chlorogonium, Chlorella, Chroomonas, Chrysosphaera, Cricosphaera, Crypthecodinium, Cryptomonas, Cyclotella, Dunaliella, Ellipsoidon, Emiliania, Eremosphaera, Ernodesmius, Euglena, Franceia, Fragilaria, Gloeothamnion, Haematococcus, Halocafeteria, Hymenomonas, Isochrysis, Lepocinclis, Micractinium, Monoraphidium, Nannochloris, Nannochloropsis, Navicula, Neochloris, Nephrochloris, Nephroselmis, Nitzschia, Ochromonas, Oedogonium, Oocystis, Ostreococcus, Pavlova, Parachlorella, Pascheria, Phaeodactylum, Phagus, Picochlorum, Platymonas, Pleurochrysis, Pleurococcus, Prototheca, Pseudochlorella, Pseudoneochloris, Pyramimonas, Pyrobotrys, Scenedesmus, Skeletonema, Spyrogyra, Stichococcus, Tetraselmis, Thalassiosira, Viridiella, and Volvox.

9. The photosynthetic microorganism according to claim 1, wherein the photosynthetic microorganism is a cyanobacterium.

10. The photosynthetic microorganism according to claim 9, wherein the photosynthetic microorganism is of a genus selected from the group consisting of Agmenellum, Anabaena, Anabaenopsis, Anacystis, Aphanizomenon, Arthrospira, Asterocapsa, Borzia, Calothrix, Chamaesiphon, Chlorogloeopsis, Chroococcidiopsis, Chroococcus, Crinalium, Cyanobacterium, Cyanobium, Cyanocystis, Cyanospira, Cyanothece, Cylindrospermopsis, Cylindrospermum, Dactylococcopsis, Dermocarpella, Fischerella, Fremyella, Geitleria, Geitlerinema, Gloeobacter, Gloeocapsa, Gloeothece, Halospirulina, Iyengariella, Leptolyngbya, Limnothrix, Lyngbya, Microcoleus, Microcystis, Myxosarcina, Nodularia, Nostoc, Nostochopsis, Oscillatoria, Phormidium, Planktothrix, Pleurocapsa, Prochlorococcus, Prochloron, Prochlorothrix, Pseudanabaena, Rivularia, Schizothrix, Scytonema, Spirulina, Stanieria, Starria, Stigonema, Symploca, Synechococcus, Synechocystis, Thermosynechococcus, Tolypothrix, Trichodesmium, Tychonema, and Xenococcus.

11. A method for producing a free fatty acid or fatty acid derivative in a culture, the method comprising culturing the photosynthetic microorganism of claim 1 in a growth medium,

wherein the photosynthetic microorganism is cultured under conditions that allow for expression of the exogenous acyl-ACP thioesterase in the photosynthetic microorganism; and

wherein the expression of the exogenous acyl-ACP thioesterase in the photosynthetic microorganism results in production of at least one free fatty acid or at least one fatty acid derivative.

12. The method according to claim 11, wherein the at least one fatty acid derivative comprises at least one fatty aldehyde, at least one fatty alcohol, at least one fatty acid ester, at least one wax ester, at least one alkane, at least one alkene, or a combination thereof.

13. The method according to claim 11, wherein the growth medium does not include a reduced carbon source.

14. The method according to claim 11, wherein the method further comprises isolating the at least one free fatty acid or fatty acid derivative from the culture.

15. The method according to claim 11, wherein the photosynthetic microorganism is a microalga.

16. The method according to claim 15, wherein the photosynthetic microorganism is of a genus selected from the group consisting of Achnanthes, Amphiprora, Amphora, Ankistrodesmus, Asteromonas, Boekelovia, Borodinella, Botryococcus, Bracteococcus, Chaetoceros, Carteria, Chlamydomonas, Chlorococcum, Chlorogonium, Chlorella, Chroomonas, Chrysosphaera, Cricosphaera, Crypthecodinium, Cryptomonas, Cyclotella, Dunaliella, Ellipsoidon, Emiliania, Eremosphaera, Ernodesmius, Euglena, Franceia, Fragilaria, Gloeothamnion, Haematococcus, Halocafeteria, Hymenomonas, Isochrysis, Lepocinclis, Micractinium, Monoraphidium, Nannochloris, Nannochloropsis, Navicula, Neochloris, Nephrochloris, Nephroselmis, Nitzschia, Ochromonas, Oedogonium, Oocystis, Ostreococcus, Pavlova, Parachlorella, Pascheria, Phaeodactylum, Phagus, Picochlorum, Platymonas, Pleurochrysis, Pleurococcus, Prototheca, Pseudochlorella, Pseudoneochloris, Pyramimonas, Pyrobotrys, Scenedesmus, Skeletonema, Spyrogyra, Stichococcus, Tetraselmis, Thalassiosira, Viridiella, and Volvox.

17. The method according to claim 11, wherein the photosynthetic microorganism is a cyanobacterium.

18. The method according to claim 17, wherein the photosynthetic microorganism is of a genus selected from the group consisting of Agmenellum, Anabaena, Anabaenopsis, Anacystis, Aphanizomenon, Arthrospira, Asterocapsa, Borzia, Calothrix, Chamaesiphon, Chlorogloeopsis, Chroococcidiopsis, Chroococcus, Crinalium, Cyanobacterium, Cyanobium, Cyanocystis, Cyanospira, Cyanothece, Cylindrospermopsis, Cylindrospermum, Dactylococcopsis, Dermocarpella, Fischerella, Fremyella, Geitleria, Geitlerinema, Gloeobacter, Gloeocapsa, Gloeothece, Halospirulina, Iyengariella, Leptolyngbya, Limnothrix, Lyngbya, Microcoleus, Microcystis, Myxosarcina, Nodularia, Nostoc, Nostochopsis, Oscillatoria, Phormidium, Planktothrix, Pleurocapsa, Prochlorococcus, Prochloron, Prochlorothrix, Pseudanabaena, Rivularia, Schizothrix, Scytonema, Spirulina, Stanieria, Starria, Stigonema, Symploca, Synechococcus, Synechocystis, Thermosynechococcus, Tolypothrix, Trichodesmium, Tychonema, and Xenococcus.

19. The photosynthetic microorganism according to claim 6, wherein at least 30 wt % of the at least one free fatty acid, at least one fatty aldehyde, at least one fatty alcohol, at least one alkane, or at least one alkene produced by the photosynthetic microorganism has a chain length of 16 carbons.

20. The photosynthetic microorganism according to claim 1, wherein the exogenous acyl-ACP thioesterase is selected from the group consisting of: SEQ ID NO:1, SEQ ID NO:2, and variants thereof having at least 85% amino acid sequence identity to the amino acid sequence of SEQ ID NO:1 or 2.

21. A photosynthetic microorganism comprising a nucleic acid molecule encoding an exogenous acyl-acyl carrier protein (acyl-ACP) thioesterase selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, and variants thereof having at least 80% amino acid sequence identity to the amino acid sequence of SEQ ID NO:1 or 2, wherein the acyl-ACP thioesterase, when queried against Pfam database hidden Markov models, demonstrates inclusion in Pfam PF01643 with a bit score higher than 20.3 and an E-value of less than or equal to 0.01, and wherein the photosynthetic microorganism produces at least 30 wt % of at least one fatty acid derivative having a chain length of 16 carbons.
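The numeric limits recited in claim 1 (at least 80% sequence identity, a Pfam PF01643 bit score higher than 20.3, an E-value of at most 0.01, and more than 35% C16 product) can be combined into a single screening check. The sketch below is illustrative only and is not a statement of claim scope; the `Candidate` record, its field names, and the sample values are hypothetical, and the input numbers are assumed to come from a separate alignment, a separate Pfam HMM search, and a separate product analysis.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record of measurements for one thioesterase candidate."""
    identity_pct: float      # % amino acid identity to SEQ ID NO:1 or 2
    pfam_bit_score: float    # bit score against the Pfam PF01643 model
    pfam_e_value: float      # E-value of the same Pfam match
    c16_fraction_pct: float  # % of free fatty acid or derivative that is C16


def meets_claim_1_thresholds(c: Candidate) -> bool:
    """Return True when all four numeric limits recited in claim 1 hold."""
    return (
        c.identity_pct >= 80.0          # "at least 80%"
        and c.pfam_bit_score > 20.3     # "higher than 20.3"
        and c.pfam_e_value <= 0.01      # "less than or equal to 0.01"
        and c.c16_fraction_pct > 35.0   # "greater than 35%"
    )


# Made-up example values, for illustration only.
example = Candidate(identity_pct=85.0, pfam_bit_score=150.0,
                    pfam_e_value=1e-40, c16_fraction_pct=40.0)
print(meets_claim_1_thresholds(example))  # True
```

Note the mix of strict and non-strict comparisons: a bit score of exactly 20.3 or a C16 share of exactly 35% fails the check, whereas an identity of exactly 80% or an E-value of exactly 0.01 passes.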