USOO9096834B2

(12) United States Patent (10) Patent No.: US 9,096,834 B2 Seshadri et al. (45) Date of Patent: Aug. 4, 2015

(54) RECOMBINANT MICROORGANISMS 75. E: 858 Ritaanell et al...... 4 ... COMPRISING THOESTERASE AND 7,524,658 B2 4/2009 Damude et al. ... 435,134 LYSOPHOSPHATDIC ACID 7,537.920 B2 5/2009 Renz et al...... 435,194 ACYLTRANSFERASE FOR FATTY 7,608.443 B2 10/2009 Kinney et al. . 435/193 ACID PRODUCTION 7,871,804 B2 1/2011 Cirpus et al. .. 435/193 8,048,654 B2 11/2011 Berry et al...... 435,134 2006/0094.088 A1 5/2006 Picataggio et al. ... 435,134 (75) Inventors: Rekha Seshadri, San Diego, CA (US); 2006/0174376 A1* 8/2006 Renz et al...... 800/281 Jennie Lau Kit, Encinitas, CA (US); 2007. O184538 A1 8, 2007 Damude et al. ... 435,134 Nicholas Bauman, San Diego, CA (US); 2009/0209774 A1 8, 2009 Renz et al...... 554/20 2009,0271892 A1 10/2009 Thomasset et al. ... 800/281 Robert Christopher Brown, San Diego, 2009,0298143 A1 12/2009 Roessler et al...... 435,134 CA (US) 2010, O105963 A1 4/2010 Hu ...... 568,840 2010. 0317882 A1 12/2010 Yadav et al. .. 554,224 (73) Assignee: EXXONMOBIL RESEARCH AND 2011/0020883 A1 1/2011 Roessler et al...... 435,134 2011, 0023185 A1 1/2011 Renz et al...... 800/281 ENGINEERING COMPANY., 2011/O250659 A1 10/2011 Roberts et al. ... 435,134 Annandale, NJ (US) 2011/0321195 A1 12/2011 Taylor et al...... 800/281 2012,0060242 A1 3/2012 Senger et al...... 800,298 (*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 FOREIGN PATENT DOCUMENTS U.S.C. 154(b) by 0 days. WO WOOOf 626O1 10, 2000 (21) Appl. No.: 13/404,717 WO WOO3,O91413 11, 2003 WO WO 2005.005643 1, 2005 WO WO 2006/052807 5, 2006 (22) Filed: Feb. 24, 2012 WO WO 2007/133558 11, 2007 WO WO 2007 141257 12/2007 (65) Prior Publication Data WO WO 2008, 157827 12/2008 WO WO 2009,076.559 6, 2009 US 2013/0224810 A1 Aug. 29, 2013 WO WO 2009/14O696 11, 2009 WO WO 2010/042664 4/2010 (51) Int. Cl. WO WO 2010.075483 T 2010 CI2P 7/64 (2006.01) WO WO 2010/088426 8, 2010 CI2N L/20 (2006.01) WO WO 2010/13.5624 11, 2010 CI2N I/00 (2006.01) WO WO 2011/066137 6, 2011 CI2N L/12 (2006.01) OTHER PUBLICATIONS CI2N 9/16 (2006.01) CI2N 9/10 (2006.01) UniProtKB Accession P73054, Jan. 2011, 1 page.* (52) U.S. Cl. Lu et al., “A perspective: Photosynthetic production of fatty acid CPC ...... CI2N 9/1029 (2013.01); C12N 9/16 based biofuels in genetically engineered cyanobacteria'. Biotechnol. (2013.01); CI2P 7/6409 (2013.01); C12Y Advances 28:742-746, 2010.* Liu et al., PNAS 108:6905-6908, 2011.* 203/01051 (2013.01) Lu, X., Biotechnol. Adv. 28:742-746, 2010.* (58) Field of Classification Search Han, M., et al. (2008), "Proteome-level responses of Escherichia coli None to long-chain fatty acids and use of fatty acid inducible promoter in See application file for complete search history. production'. Journal of Biomedicine and Biotechnology, 2008:735101, doi:10.1155/2008/735101. (56) References Cited (Continued) U.S. PATENT DOCUMENTS 5.447,858 A 9/1995 Key et al...... 435,172.3 Primary Examiner — David J Steadman 5,455,167 A 10, 1995 Voelker et al. .. 435/1723 (74) Attorney, Agent, or Firm — Harness, Dickey & Pierce, 5,464,758 A 11/1995 Gossen et al...... 435/69.1 P.L.C. 5,563,058 A 10, 1996 Davies et al...... 435/193 5,639,952 A 6/1997 Quail et al...... 800,205 5,654.495 A 8, 1997 Voelker et al...... 800/250 (57) ABSTRACT 5,661,017 A 8/1997 Dunahay et al. 435/1723 The present invention provides for the improved production 5,689,044 A 1 1/1997 Ryals et al...... 800,205 5,750,385 A 5, 1998 Shewmaker et al. 435/1723 offatty acid products in a recombinant microorganism or host 5,814,618 A 9/1998 Bujard et al...... 514,44 cell engineered to express a non-native encoding a 5,824,858 A 10, 1998 Davies et al...... 800,205 thioesterase and a non-native gene encoding a lysophospha 5,851,796 A 12/1998 Schatz ...... 435/69.1 5,910,630 A 6, 1999 Davies et al. ... 800,295 tidic acid acyltransferase (LPAAT), in which the LPAAT is 5,968,791 A 10, 1999 Davies et al. ... 435/134 overexpressed in the recombinant microorganism compared 6,051,755 A 4/2000 Zou et al...... 800/281 to anotherwise identical microorganism that does not include 6,093,568 A 7/2000 Davies et al...... 435/419 the non-native LPAAT gene. The invention also provides 6,143,538 A 11/2000 Somerville et al...... 435,189 6,379,945 B1 4/2002 Jepson et al...... 435.243 methods of producing fatty acid products using such recom 6.410,828 B1 6/2002 Armstrong et al. . 800,287 binant microorganisms. 7,118,896 B2 10/2006 Kalscheuer et al. 435.134 7,135,290 B2 11/2006 Dillon ...... 435/6 32 Claims, 6 Drawing Sheets US 9,096,834 B2 Page 2

(56) References Cited Liu, X. etal. (2009), "Nickel-inducible lysis system in Synechocystis sp. PCC 6803”. Proc. Natl. Acad. Sciences USA 106: 21550-21554. OTHER PUBLICATIONS Maisonneuve, S., et al. (2010), “Expression of rapeseed microsomal lysophosphatidic acid acyltransferase isozymes enhances seed oil Quintana, N., et al. (2011), "Renewable energy from cyanobacteria: content in arabidopsis'', Plant Physiology, 152: 670-684. energy production optimization by metabolic pathway engineering'. Méndez-Alvarez, S., et al. (1994), “Transformation of chlorobium Appl Microbiol Biotechnol, 91: 471-490. limicola by a plasmid that confers the ability to utilize thiosulfate' Abe, J., et al. (2008), "Expression of exogenous genes under the Journal of Bacteriology, 176(23):7395-7397. control of endogenous HSP70 and CAB promoters in the closterium No, D., et al. (1996), “Ecdysone-inducible gene expression in mam peracerosum-strigosum-littorale complex”, Plant Cell Physiol, malian cells and transgenic mice'. Proc. Natl. Acad. Sci USA, 93: 49(4): 625-632. Altschul, S., et al. (1997), “Gapped BLAST and PSI-BLAST: a new 3346-3351. generation of protein database search programs'. Nucleic Acids Ohnuma M., et al. (2008), "Polyethylene glycol (PEG)-mediated Research, 25(17): 3389-3402. transient gene expression in a red alga, cyanidioschyzon merolae Athenstaedt, K., et al. (1999), "Phosphatidic acid, a key intermediate 10D, Plant Cell Physiol. 49(1): 117-120. in lipid metabolism”, Eur: J. Biochem, 266: 1-16. Okazaki, K., et al. (2006), The significance of C16 fatty acids in the Bateman, A., et al. (2000), “The pfam protein families database'. sn-2 Positions of glycerolipids in the photosynthetic growth of Nucleic Acids Research, 28(1):263-266. Synechocystis sp. PCC6803, Plant Physiology, 141: 546-556. Bateman, A., et al. (2004), “The pfam protein families database'. Perrone, C., et al. (1998), “The chlamydomonas IDA7 locus encodes Nucleic Acids Research, 32: Database Issue: D138-D141. a 140 kDa dynein intermediate chain required to assemble the Il Bourgis, F., et al. (1999), "A plastidial lysophosphatidic acid inner arm complex”. Molecular biology of the Cell, 9:3351-3365. acyltransferase from oilseed rape", Plant Physiology, 120.913-921. Quinn, J., et al. (2003), “Copper response element and crrl-dependent Brown, A., et al. (1995), “Identification of a cDNA that encodes a Ni responsive promoter for induced, reversible gene expression in 1-acyl-sn-glycerol-3-phosphate acyltransferase from Limnanthes Chlamydomonas reinhardtii', Eukaryotic Cell. 2(5): 995-1002. douglasii', Plant Molecular Biology, 29: 267-278. Rajasekharan, R., et al., “Fatty acid biosynthesis and regulation in Buikema, W., et al. (2000), "Expression of the anabaena hetR gene plants'. Plant Developmental Biology, 6: 105-115, 2010. from a copper-regulated promoter leads to heterocyst differentiation Ramesh, V., et al. (2004), "A simple method for chloroplast transfor under repressing conditions'. Proc. Natl. Acad. Sciences USA 98(5): mation in Chlamydomonas reinhardtii' Methods in Molecular Biol 2729-2734. ogy, 274:301-307. Finn, R., et al. (2006), “Pfam: clans, web tools and services'. Nucleic Acids Research, 34: Database Issue 34:D247-D251. Ravindran, C. etal. (2006), "Electroporation as a tool to transfer the Finn, R., et al. (2010), “The pham protein families database'. Nucleic plasmidpRL489 in Oscillatoria MKU277” Journal of Microbiologi Acids Research, 38: Database Issue 38:D211-D222. cal Methods, 66:174-176. Ghosh, A., et al. (2009), "Atag24160, a soluble acyl-coenzyme a-de Schroda, M., et al. (2000), “The HSP70A promoter as a tool for pendent lysophosphatidic acid acyltransferase'. Plant Physiology, improved expression of transgenes in Chlamydomonas', The plant 151: 869-881. journal 21(2): 121-131. Hallmann, A., et al. (1997), “Gene replacement by homologous Shindou, H., et al. (2009), "Acyl-coaclysophospholipid recombination in the multicellular green alga Volvox carteri”. Proc. acyltransferases”,Journal of Biological Chemistry, 284(1): 1-5. Natl. Acad. Sci USA, 94:7469-7474. Shui, G., et al. (2010), “Characterization of substrate preference for Handke, P. et al. (2011), "Application and engineering of fatty acid Slclp and Cst26p in Saccharomyces cerevisiae using lipidomic biosynthesis in Escherichia coli for advanced fuels and chemicals'. approaches and an LPAAT activity assay”. PLoS One, 5(8): e11956. Metabolic Engineering: 13:28-37. Sonnhammer, E., et al. (1998), “Pfam: multiple sequence alignments Henikoff, S., et al. (1992), “Amino acid substitution matrices from and HMM-profiles of protein domains' Nucleic Acids Research protein blocks”. Proc. Natl. Acad. Sci USA, 89:10915-10919. 26(1):320-322. International Search Report for PCT/US 12/26566 dated Jun. 18, Steinbrenner, J., et al. (2006), “Transformation of the Green Alga 2012. Haematococcus pluvialis with a phytoene Desaturase for accelerated Iwai. M., et al. (2004), "Improved genetic transformation of the astaxanthin biosynthesis' Applied and Environmental Microbiology thermophilic cyanobacterium, thermosynechoccus elongates BP-1”, 72(12):7477-7484. Plant Cell Physiol. 45(2): 171-175. Stemmer, W. (1994), “DNA shuffling by random fragmentation and Kaczmarzyk, D., et al. (2010), "Fatty acid activation in cyanobacteria reassembly: In vitro recombination for molecular evolution'. Proc. mediated by acyl-acyl carrier protein synthetase enables fatty acid Natl. Acad. Sciences USA, 91: 10747-10751. recycling. Plant Physiology, 152: 1598-1610. Tan, C., et al. (2005), “Establishment of a micro-particle bombard Karlin, S., et al., (1990), "Methods for assessing the statistical sig ment transformation system for Dunaliella saline', The Journal of nificance of molecular sequence features by using general scoring Microbiology 43:361-365. schemes', Proc. Natl. Acad. Sci. USA, 87: 2264-2268. Wiberg, E., et al. (2000), “The distribution of caprylate, caprate and Kim, H., et al. (2004), “Plastid lysophosphatidyl acyltransferase is laurate in lipids from developing and mature seeds of transgenic essential for embryo development in arabidopsis' Plant Physiology, Brassica napus L.”. Planta 212: 33-40. 134: 1206-1216. Weier, D., et al. (2005), “Characterisation of acyltransferases from Kim, H., et al. (2005), "Ubiquitous and endoplasmic reticulum Synechocystis sp. PCC6803”. Biochem. Biophys. Res. Commun, located lysophosphatidyl acyltransferase, LPAT2, is essential for 334(4): 1127-1134. female but not male gametophyte development inarabidopsis' The Yu, B., et al. (2004), “Loss of plastidic lysophosphatidic acid Plant Cell. 17: 1073-1089. acyltransferase causes Embryo-Lethality in Arabidopsis' Plant Cell Kindle, K., et al. (1989), "Stable nuclear transformation of Physiol. 45(5): 503-5 10. chlamydomonas using the Chlamydomonas gene for nitrate Zhang, Y, et al. (2008), "Acyltransferases in bacterial reductase'. The Journal of Cell Biology, 109 (6, Part 1), 2589-2601. glycerophospholipid synthesis”, Journal of Lipid Research, 49: Knutzon, D., et al. (1995), “Cloning of a coconut endosperm cDNA 1867. encoding a 1-Acyl-sn-glycerol-3-phosphate acyltransferase that Zheng, Z. et al. (2003), “Arabidopsis AtGPAT1, a Member of the accepts medium-chain-length substrates' Plant Physiol. 109: 999 membrane-bound glycerol-3-phosphate acyltransferase gene family, 1006. is essential for tapetum differentiation and male fertility”. 15: 1872 Lassner, M., et al. (1995), “Lysophosphatidic acid acyltransferase 1887. from meadowfoam mediates insertion of erucic acid at the sn-2 Zou, J., et al. (1997), “Modification of seed oil content and acyl position of triacylglycerol in transgenic rapeseed oil'. Plant Physiol. composition in the Brassicaceae by expression of a yeast sn-2', The 1.09: 1389-1394. Plant Cell, 9:909-923. US 9,096,834 B2 Page 3

(56) References Cited Position of Triacylglycerols in Lauric Rapeseed Oil and Can Increase Total Laurate Levels, Plant Physiology, 1999, vol. 120, pp. 739-746. OTHER PUBLICATIONS Thelen, J.J. & J.B. Ohlrogge Metabolic Engineering of Fatty Acid Biosynthesis in Plants, Metabolic Engineering, 2002, vol. 4, pp. Sun, Y., et al. (2006), "Functional complementation of a nitrate 12-21. reductase defective mutant of a green alga dunaliella viridis by intro International Preliminary Report on Patentability dated Aug. 26. ducing the nitrate reductase gene'. Gene 377: 140-149. 2014 issued in PCT Patent Application No. PCT/US2012/026566. Knutzon, D.S., et al. Lysophosphatidic Acid Acyltransferase from Coconut Endosperm Mediates the Insertion of Laurate at the sn-2 * cited by examiner U.S. Patent Aug. 4, 2015 Sheet 1 of 6 US 9,096,834 B2

88. synthesis

8:::::::::38, 8i xxyias: 8

Nebrane

X xx c14:0-AcPC12:0-ACP x cis-Ace acaca as 8:88-8:8 cci Fat 3 FFA

Figure 1 U.S. Patent Aug. 4, 2015 Sheet 2 of 6 US 9,096,834 B2

pUC Origin N ... --- la -- RS2-down

// trCY

RS2-up st: 1752

Figure 2 U.S. Patent Aug. 4, 2015 Sheet 3 of 6 US 9,096,834 B2

TrcY

CC1 FatB

P15A OR - pSG-NB158 6414 bp

TlacQ

Figure 3 U.S. Patent Aug. 4, 2015 Sheet 4 of 6 US 9,096,834 B2

ac promoter N siri B09KO up

puC origin

---. pSG-NB19 6663 bp

str1609KO down

Kan promoter Origin Figure 4 U.S. Patent Aug. 4, 2015 Sheet 5 of 6 US 9,096,834 B2

PAATAASKO Experiment - Reduce Feedback inhibition Non-Normalized Totai FFA 3 day induction} 400.0 gr. 350.0 ico

s g & s : : s ax aw t t

Figure 5 U.S. Patent Aug. 4, 2015 Sheet 6 of 6 US 9,096,834 B2 US 9,096,834 B2 1. 2 RECOMBINANT MICROORGANISMS lysophosphatidic acid to form phosphatidic acid, a precursor COMPRISING THOESTERASE AND of membrane lipids. The LPAAT preferably has a different LYSOPHOSPHATDIC ACID acyl chain length Substrate preference than the acyl chain ACYLTRANSFERASE GENES FOR FATTY length substrate preference of the thioesterase produced by ACID PRODUCTION expression of a non-native gene by the recombinant microor ganism. Additional modifications of the host microorganism can optionally include attenuation of an acyl-ACP synthetase REFERENCE TO ASEQUENCE LISTING gene or an acyl-CoA synthetase gene, which participate in This application contains references to amino acid recycling and degradation, respectively, offatty acids, and/or sequences and/or nucleic acid sequences which have been 10 overexpression of one or more polypeptides having lipolytic Submitted concurrently herewith as the sequence listing text activity (e.g., lipases) that can release fatty acids from lipids file “61099863 1.txt, file size 127 KiloBytes (KB), created (see e.g., FIG. 1). on May 10, 2013. The aforementioned sequence listing is In one aspect the invention provides a recombinant micro hereby incorporated by reference in its entirety pursuant to 37 organism that includes a non-native thioesterase gene, in C.F.R. S.1.52(e)(5). 15 which the recombinant microorganism further includes a non-native nucleic acid molecule that includes a sequence FIELD OF THE INVENTION encoding a lysophosphatidic acid acyltransferase (LPAAT), in which the microorganism produces a fatty acid product. A The present invention relates to the fields of bioengineer fatty acid product can be, for example, a free fatty acid, a fatty ing, metabolic biochemistry, and molecular biology. In par aldehyde, a fatty alcohol, a fatty acid ester, a wax ester, an ticular, the invention relates to the production of fatty acid alkane, or an alkene. The thioesterase gene can be indepen products in a recombinant microorganism or host cell engi dently synthesized using synthetic genomic techniques, or neered to reduce or minimize feedback inhibition offatty acid alternatively derived from the same or a different species with synthesis enzymes and to methods of producing fatty acid respect to the host microorganism, and can encode an acyl products using such recombinant microorganisms or host 25 ACP thioesterase, an acyl-CoA thioesterase, or a 4-hydroxy cells. benzoyl thioesterase. The thioesterase can be, for example, a thioesterase derived from a prokaryotic species or a plant BACKGROUND OF THE INVENTION species. Alternatively or in addition, a non-native thioesterase gene expressed by the host microorganism can be an endog Producing renewable sources for a variety of fuels and 30 enous thioesterase gene that is operably linked to a heterolo chemicals is of great importance to a world with increasing gous promoter. For example, the thioesterase gene may be an demand for such products. While petroleum is a product of endogenous thioesterase gene, and a heterologous promoter decayed plant and other matter that has been incubated can be introduced into the microorganism to regulate expres beneath the earth's surface for millions of years, some efforts sion of the endogenous thioesterase gene. today focus on the direct use of plants and other organisms to 35 The recombinant microorganism that includes a non-native generate, e.g., lipids, which can include fatty acids and thioesterase gene further includes a non-native nucleic acid derivatives thereof, foruse in the fuel and chemical industries. molecule that includes a sequence encoding an LPAAT, Fatty acids are composed of long alkyl chains, similar to where the LPAAT-encoding sequence can be homologous petroleum, and are a primary metabolite used by cells for both (originating from the same species) or heterologous (origi chemical and energy storage functions. Improving the Scal 40 nating from a different species) with respect to the host micro ability, controllability, and cost-effectiveness of producing organism. For example, a non-native LPAAT gene from the fatty acids and fatty acid derivatives (collectively “fatty acid same or a different species, preferably operably linked to a products”) would be beneficial to the development of renew heterologous promoter, can be introduced into the host micro able energy and chemical sources. organism, or alternatively, a homologous or heterologous Fatty acid synthesis in a cell is a repeating cycle where 45 LPAAT gene can be introduced into the host microorganism malonyl-acyl carrier protein (malonyl-ACP) is condensed such that the LPAAT gene recombines into the host genome with a carbon substrate to form a growing chain of acyl-ACP. and becomes operably linked to a promoter that is endog Fatty acid production by engineered cyanobacteria has been enous to the host. Further alternatively, a recombinant micro described in U.S. Appl. Pub. No. 2009/0298143, which dis organism that includes a non-native LPAAT gene can com closes introducing a non-native gene encoding a fatty-acyl 50 prise an endogenous LPAAT gene residing in the genome of ACP thioesterase to release fatty acids from ACP. However, as the host microorganism operably linked to a heterologous with many cellular processes, feedback inhibition by inter promoter introduced into the microorganism and integrated mediates and/or various end-products may limit production into the genome, for example, by homologous recombination. of fatty acids. For example, feedback inhibition of key fatty In further examples, a non-native nucleic acid molecule that acid synthesis enzymes, such as ACCase, FabH, and Fab, is 55 includes an LPAAT gene in a microorganism provided herein mediated by medium- to long-chain acyl-ACPs. See Handke can encode an LPAAT derived from a prokaryotic or eukary et al., Metabolic Eng. 13:28-37 (2011); FIG. 1 (dashed line). otic LPAAT (e.g., an LPAAT derived from a higher plant, bryophyte, algal, cyanobacterial, bacterial, fungal, or heter SUMMARY OF THE INVENTION okont (stramenopile) LPAAT, or derived from an animal 60 LPAAT, e.g., a mammalian or insect LPAAT), where a The present invention describes additional genetic modifi eukaryotic LPAAT gene can encode a plastidal, mitochon cations for improving fatty acid product yields in a microor drial, or microsomal LPAAT or an LPAAT derived from the ganism or host cell engineered to express a non-native gene sequence of a plastidal, mitochondrial, or microsomal encoding a thioesterase. Such modifications include expres LPAAT. sion of an additional non-native gene encoding lysophospha 65 The recombinant microorganism can comprise at least one tidic acid acyltransferase (LPAAT), which uses acyl-ACP or non-native LPAAT gene and can further comprise at least one acyl-CoA substrates to acylate the Sn-2 hydroxyl group of non-native thioesterase gene that can cleave an acylthioester US 9,096,834 B2 3 4 substrate, in which the LPAAT and the thioesterase preferably at least 60%, at least 65%, at least 70%, at least 75%, at least have different acyl chain length substrate specificities. For 80%, at least 85%, at least 90%, at least 95%, at least 96%, at example, the recombinant microorganism can include a non least 97%, at least 98%, at least 99%, or about 100% identity native nucleic acid molecule encoding an LPAAT that has a to SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID Substrate preference for one or more long chain acyl Sub NO:5, SEQID NO:6, SEQID NO:7, SEQID NO:8, SEQID strates (e.g., C16 and/or C18 acyl-ACP or acyl-CoA) and a NO:9, SEQID NO:10, SEQID NO:11, SEQID NO:12, SEQ non-native gene encoding a thioesterase that has a Substrate ID NO:13, SEQID NO:14, SEQID NO:15, SEQID NO:16, preference for one or more medium chain acyl Substrates SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID (e.g., C8, C10, C12, and/or C14 acyl-ACP substrates). Alter NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, natively, the recombinant microorganism can include a non 10 SEQ ID NO:24, SEQ ID NO:25, or SEQ ID NO:26. The native nucleic acid molecule encoding an LPAAT that has a LPAAT can be homologous with respect to the host microor Substrate preference for one or more medium chain acyl Sub ganism, for example, the host microorganism can be a cyano strates (e.g., C8, C10, C12, and/or C14 acyl-ACP substrates) bacterial species and the LPAAT produced by expression of and a non-native gene encoding a thioesterase that has a the non-native nucleic acid sequence can be an LPAAT Substrate preference for one or more long chain acyl Sub 15 derived from the same cyanobacterial species. In an alterna strates (e.g., C16 and/or C18 acyl-ACP or acyl-CoA). In tive, a host microorganism can be a cyanobacterial species another example, the recombinant microorganism can and can include a non-native gene encoding an LPAAT include a non-native gene encoding an LPAAT that has a derived from a different cyanobacterial species or from a substrate preference for a C16 acyl substrate (e.g., C16 acyl non-cyanobacterial species. ACP and/or C16 acyl-CoA) and a non-native gene encoding Additionally to any of the foregoing, a recombinant micro a thioesterase that has a Substrate preference for one or more organism that produces a fatty acid product as disclosed of a C8, C10, C12, or C14 acyl substrate (e.g., C8, C10, C12, herein can include a recombinant nucleic acid molecule that and/or C14 acyl-ACP). Alternatively, the recombinant micro includes a sequence that encodes an LPAAT, in which the organism can include a non-native gene encoding an LPAAT LPAAT-encoding sequence is operably linked to a heterolo that has a substrate preference for C18 acyl substrates (e.g., 25 gous promoter. The heterologous promoter can be a regulat C18 acyl-ACP and/or C18 acyl-CoA) and a non-native gene able promoter, and can optionally be an inducible promoter. encoding a thioesterase that has a Substrate preference for one For example, the promoter can be induced by light, tempera or more of a C8, C10, C12, C14, or C16 acyl substrate (e.g., ture, or the addition or removal of a compound to the growth C8, C10, C12, C14, and/or C16 acyl-ACP). In further non media, Such as for example, a Sugar or Sugar analogue, a limiting examples, a recombinant microorganism can include 30 hormone, a salt, a metal, etc. In particular examples a nucleic a non-native gene encoding an LPAAT that has a substrate acid sequence encoding a thioesterase gene expressed by preference for a C18 acyl substrate and a non-native gene recombinant microorganism is operably linked to a heterolo encoding a thioesterase that has a Substrate preference for a gous promoter that can be, for example, regulatable, and can C16 acyl substrate; a recombinant microorganism can optionally be inducible. Additionally, an LPAAT-encoding include a non-native gene encoding an LPAAT that has a 35 nucleic acid sequence and a thioesterase-encoding nucleic substrate preference for a C18 acyl substrate and a non-native acid sequence can be operably linked to different copies of the gene encoding a thioesterase gene that has a Substrate pref same promoter, or to different promoters regulated by the erence for C14 and C16 acyl substrates; or a recombinant same or different conditions or compound(s). Alternatively, microorganism can include a non-native gene encoding an an LPAAT-encoding nucleic acid sequence and a LPAAT that has a substrate preference for a C18 acyl sub 40 thioesterase-encoding nucleic acid sequence can be operably strate and a non-native gene encoding a thioesterase that has linked to the same promoter, for example, the LPAAT-encod a substrate preference for C12, C14, and C16 acyl substrates. ing nucleic acid sequence and the thioesterase-encoding In further examples, a microorganism can include a non sequence can be organized as an operon. native gene encoding an LPAAT that has a Substrate prefer Additionally, a recombinant microorganism as provided ence for a C16 acyl Substrate and a non-native gene encoding 45 herein that includes a non-native thioesterase gene and a a thioesterase that has a substrate preference for a C14 acyl non-native LPAAT gene can optionally further comprise an Substrate; a microorganism can include a non-native gene attenuated or disrupted acyl-ACP synthetase gene or acyl encoding an LPAAT that has a substrate preference for a C16 CoA synthetase gene. For example, the endogenous acyl acyl Substrate and a non-native gene encoding a thioesterase ACP synthetase gene or acyl-CoA synthetase gene can be that has a substrate preference for a C12 acyl substrate, and a 50 targeted by antisense RNA, one or more micro RNAs, an microorganism can include a non-native gene encoding an siRNA (e.g., generated by a shRNA construct), or one or more LPAAT that has a substrate preference for a C16 acyl sub ribozymes, to reduce expression of the acyl-ACP synthetase strate and a non-native gene encoding a thioesterase that has gene or acyl-CoA synthetase gene. Alternatively, the endog a substrate preference for C12 and C14 acyl substrates. enous acyl-ACP synthetase gene or acyl-CoA synthetase A recombinant microorganism can include a non-native 55 gene can be interrupted, truncated, or deleted, for example, by nucleic acid molecule that includes a nucleic acid sequence insertional mutagenesis, meganuclease genome modifica that encodes an LPAAT that is independently synthesized tion, and/or homologous recombination. using synthetic genomic techniques, or alternatively derived The recombinant microorganism that includes a non-native from any of a variety of organisms including, for example, LPAAT gene and a non-native thioesterase gene can further animals, fungi, heterokonts, plants, algae, and bacteria, 60 optionally include a non-native gene encoding a polypeptide including cyanobacteria. For example, a recombinant micro having lipolytic activity, which can be, for example, a lipase, organism can include a non-native nucleic acid molecule that an esterase, a cutinase, or an amidase. Additionally or alter includes a nucleic acid sequence that encodes an LPAAT natively, the recombinant microorganism can have an attenu derived from a cyanobacterial, bacterial, or plant species, for ated gene that encodes an acyl-ACP synthetase, an acyl-CoA example, a recombinant microorganism can include a nucleic 65 synthetase, or an acyl-CoA oxidase. acid molecule encoding a sequence that encodes an LPAAT Further additionally or alternatively, a recombinant micro that has at least 40%, at least 45%, at least 50%, at least 55%, organism that includes a non-native LPAAT gene and a non US 9,096,834 B2 5 6 native thioesterase gene can further optionally include one or Also provided herein are methods for producing a fatty more non-native genes that encode enzymes for producing acid product, where the method includes culturing any of the fatty acid derivatives such as fatty aldehydes, fatty alcohols, microorganisms disclosed herein under conditions in which fatty acid esters, wax esters, alkanes, and alkenes. Non-lim at least one fatty acid product is produced. For example, a iting examples of Such enzymes are aldehyde-forming acyl recombinant microorganism that includes at least one non ACP reductases, aldehyde-forming acyl-CoA reductases, native gene encoding a thioesterase and at least one non carboxylic acid reductases, aldehyde-forming fatty acid native gene encoding an LPAAT can be cultured in a Suitable reductases, alcohol-forming acyl-ACP reductases, alcohol culture medium Such that the non-native genes are expressed forming acyl-CoA reductases, acyltransferases, wax syn to produce at least one free fatty acid or at least one fatty acid thases, fatty aldehyde decarbonylases, and fatty acid decar 10 derivative. The methods can be used to produce a free fatty boxylases. acid or a fatty acid derivative Such as a fatty aldehyde, a fatty alcohol, a fatty acid ester, a wax ester, or a hydrocarbon. In The recombinant microorganism that includes a non-native Some examples, the recombinant microorganism produces at LPAAT gene and a non-native thioesterase gene can produce least one Cs to C free fatty acid, such as at least one C to at least 10% more of a fatty acid product that a microorganism 15 Cs free fatty acid, Such as, for example, a C to C free fatty identical in all respects except that it lacks a non-native acid. Alternatively or in addition, the fatty acid or fatty acid nucleic acid molecule that comprises a nucleic acid sequence derivative produced using the methods provided herein com encoding an LPAAT. For example, a recombinant microor prises at least one Cs to C fatty alcohol or fatty aldehyde, ganism can produce at least 20%, at least 30%, at least 40%, such as at least one C to Cs fatty alcohol or fatty aldehyde, at least 50%, at least 60%, at least 70%, at least 80%, at least Such as a C to C fatty alcohol or fatty aldehyde; or at least 90%, at least 100%, at least 150%, at least 200%, at least one wax ester having an A chain of Cs to C or C to Cs, or 400%, at least 600%, at least 800%, or at least 1000% more for example, an A chain of C2 to C, and a B chain of Cs to than the amount of the fatty acid product produced by an C24 or C2 to C1s, or for example, a B chain of C2 to Ce; or otherwise identical microorganism not including the non at least one C to C alkane or alkene, such as a C to Cs native nucleic acid molecule that comprises a nucleic acid 25 alkane or alkene. The method can further include isolating a sequence encoding an LPAAT cultured under identical con fatty acid product from the microorganism, the culture ditions. medium, or a combination thereof. Additionally, the recom The recombinant microorganism can be any microorgan binant microorganism may secrete at least a portion of the ism, for example, a eubacterium, archaebacterium, cyano fatty acid product into the culture medium, and optionally all bacterium, microalga, fungus, yeast, or heterokont. The 30 or a portion of the secreted product can be isolated from the recombinant microorganism can be a photosynthetic micro culture medium. organism, for example, a microalga or cyanobacterium. In In various methods, the amount of fatty acid product pro Some examples a microalga can be a member of the genus duced by the recombinant microorganism can be at least 10% Achnanthes, Amphiprora, Amphora, Ankistrodesmus, Astero greater, at least 20% greater, at least 30% greater, at least 40% monas, Boekelovia, Borodinella, Botryococcus, Bracteococ 35 greater, at least 50% greater, at least 60% greater, at least 70% cus, Chaetoceros, Carteria, Chlamydomonas, Chlorococ greater, at least 80% greater, at least 90% greater, or at least Citin, Chlorogonium, Chlorella, Chroomonas, 100% greater than the amount produced by an otherwise Chrysosphaera, Cricosphaera, Crypthecodinium, Crypto identical microorganism lacking the non-native nucleic monas, Cyclotella, Dunaliella, Ellipsoidon, Emiliania, sequence encoding an LPAAT that is cultured under the same Eremosphaera, Ernodesmius, Euglena, Franceia, Fragilaria, 40 conditions. Gloeothamnion, Haematococcus, Halocafeteria, Hymeno Further provided is a fatty acid product made using the in OraS, Isochrysis, Lepocinclis, Micractinium, methods of the invention. The fatty acid product can be, for Monoraphidium, Nannochloris, Nannochloropsis, Navicula, example, a fatty acid, a fatty alcohol, a fatty aldehyde, an Neochloris, Nephrochloris, Nephroselmis, Nitzschia, alkane, an alkene, a fatty acid ester, or a wax ester. The fatty Ochromonas, Oedogonium, Oocystis, Ostreococcus, Pav 45 acid product can have at least one carbon chain having a lova, Parachlorella, Pascheria, Phaeodactylum, Phagus, length of C7 to C, and in Some preferred embodiments, the Picochlorum, Platymonas, Pleurochrysis, Pleurococcus, fatty acid product includes one or more free fatty acids, fatty Prototheca, Pseudochlorella, Pseudoneochloris, Pyramino aldehydes, fatty alcohols, alkanes, or alkenes having C to nas, Pyrobotrys, Scenedesmus, Skeletonema, Spyrogyra, Sti Cs chain lengths, for example, chain lengths of C to C, or chococcus, Tetraselmis, Thalassiosira, Viridiella, or Volvox. 50 one or more fatty acid esters having one or both of an A chain Alternatively, the recombinant microorganism can be a and a B chain of Cs to C2a, or of C2 to C1s, for example, C2 cyanobacterium, for example, of the genus Agmenellum, to C. For example, a wax ester produced using the methods Anabaena, Anabaenopsis, Anacystis, Aphanizomenon, provided herein can have an A and a B chain of Cs to C, or Arthrospira, Asterocapsa, Borzia, Calothrix, Chamaesiphon, of C2 to Cls, or in Some examples of C to C. Chlorogloeopsis, Chroococcidiopsis, Chroococcus, Crina 55 lium, Cyanobacterium, Cyanobium, Cyanocystis, Cyano BRIEF DESCRIPTION OF THE DRAWINGS spira, Cyanothece, Cylindrospermopsis, Cylindrospermum, Dactylococcopsis, Dermocarpella, Fischerella, Fremyella, FIG. 1 is a schematic depicting genetic modifications pro Geitleria, Geitlerinema, Gloeobacter; Gloeocapsa, Gloeoth posed to relieve feedback inhibition offatty acid biosynthesis ece, Halospirulina, Iyengariella, Leptolyngbya, Limnothrix, 60 by reducing the pool of C18-ACP. Abbreviations are FFA: Lyngbya, Microcoleus, Microcystis, Myxosarcina, Nodu free fatty acid, AAS: acyl-ACP synthase, Cc 1 FatB1: Cuphea laria, Nostoc, Nostochopsis, Oscillatoria, Phormidium, carthagenensis acyl-ACP thioesterase with a C12, C14, and Planktothrix, Pleurocapsa, Prochlorococcus, Prochloron, C16 substrate preference, and Sll 1752C18:0: LPAAT of Syn Prochlorothrix, Pseudanabaena, Rivularia, Schizothrix, Scy echocystis sp. 6803 with a C18 acyl substrate preference. tonema, Spirulina, Stanieria, Starria, Stigonema, Symploca, 65 FIG. 2 is a schematic representation of an example of an Synechococcus, Synechocystis, Thermosynechococcus, Toly integration vector that includes a nucleic acid sequence pothrix, Trichodesmium, Tvchonema or Xenococcus. encoding an LPAAT (sl11752). US 9,096,834 B2 7 8 FIG. 3 is a schematic representation of an example of an polypeptide coding sequences). Genes may further comprise integration vector that includes a nucleic acid sequence the regulatory sequences required for their expression. Genes encoding an acyl-ACP thioesterase (CC1FatB1). can be obtained from a variety of sources, including cloning FIG. 4 is a schematic representation of an example of a from a source of interest or synthesizing from known or vector for disrupting a gene encoding an acyl-ACP synthetase predicted sequence information, and may include sequences (slr1609). designed to have desired parameters. FIG. 5 is a graph depicting non-normalized total free fatty The term “nucleic acid' or “nucleic acid molecule' refers acid production (ug/mL) of Synechocystis sp. 6803 strains to, e.g., DNA or RNA (e.g., mRNA). The nucleic acid mol comprising (left to right) the CC1FatB1 acyl-ACP ecules can be double-stranded or single-stranded; single thioesterase gene (Control 1A), the recombinant sll 1752 gene 10 stranded RNA or DNA can be the coding (sense) strand or the encoding a C18:0 LPAAT (sl11752), an attenuated acyl-ACP non-coding (antisense) strand. synthetase gene (AASKO), followed by three separate clones A nucleic acid molecule may be "derived from an indi (Patch 3, Patch 5, Patch 6) containing the CC1FatB1 acyl cated Source, which includes the isolation (in whole or in part) ACP thioesterase gene in combination with the inducible of a nucleic acid segment from an indicated Source or the sl1752 gene (1A/sl11752), and finally the sl1752 LPAAT 15 purification of a polypeptide from an indicated Source. A gene in combination with the attenuated acyl-ACP synthetase nucleic acid molecule may also be derived from an indicated gene (AASKO). Source by, for example, direct cloning, PCR amplification, or FIG. 6 is a table providing the amounts (ug/mL) of free artificial synthesis from the indicated polynucleotide source fatty acids of particular chain lengths and degree of saturation or based on a sequence associated with the indicated poly produced by Synechocystis sp. PCC 6803 strains comprising nucleotide source. Genes or nucleic acid molecules derived the CC1FatB1 acyl-ACP thioesterase gene (Control 1A), the from a particular source or species also include genes or recombinant sl11752 gene encoding a C18:0 LPAAT nucleic acid molecules having sequence modifications with (sl11752), an attenuated acyl-ACP synthetase gene respect to the source nucleic acid molecules. For example, a (AASKO), three separate clones (Clone 3, Clone 5, Clone 6) gene or nucleic acid molecule derived from a source (e.g., a containing the CC1FatB1 acyl-ACP thioesterase gene as well 25 particular referenced gene) can incur one or more mutations as the recombinant sl11752 gene (1A/s111752), and the recom with respect to the source gene or nucleic acid molecule that binant sl1752 gene in combination with the attenuated acyl are unintended or that are deliberately introduced, and if one ACP synthetase gene (sl1752/AASKO). or more mutations, including Substitutions, deletions, or insertions, are deliberately introduced the sequence alter DETAILED DESCRIPTION OF THE INVENTION 30 ations can be introduced by random or targeted mutation of cells or nucleic acids, by amplification or other molecular Definitions biology techniques, or by chemical synthesis. A gene or nucleic acid molecule that is derived from a referenced gene Unless defined otherwise, all technical and scientific terms or nucleic acid molecule that encodes a functional RNA or used herein have the same meaning as commonly understood 35 polypeptide can encode a functional RNA or polypeptide by one of ordinary skill in the art to which this invention having at least 70%, at least 75%, at least 80%, at least 85%, belongs. In case of conflict, the present application including at least 90%, at least 95%, at least 96%, at least 97%, at least the definitions will control. Unless otherwise required by 98%, or at least 99% sequence identity with the referenced or context, singular terms shall include pluralities and plural source functional RNA or polypeptide, or to a functional terms shall include the singular. All publications, patents and 40 fragment thereof. For example, a gene or nucleic acid mol other references mentioned herein are incorporated by refer ecule that is derived from a referenced gene or nucleic acid ence in their entireties for all purposes as if each individual molecule that encodes a functional RNA or polypeptide can publication or patent application were specifically and indi encode a functional RNA or polypeptide having at least 85%, vidually indicated to be incorporated by reference. at least 90%, at least 95%, at least 96%, at least 97%, at least Although methods and materials similar or equivalent to 45 98%, or at least 99% sequence identity with the referenced or those described herein can be used in practice or testing of the source functional RNA or polypeptide, or to a functional present invention, Suitable methods and materials are fragment thereof. described below. The materials, methods and examples are As used herein, an "isolated nucleic acid or protein is illustrative only and are not intended to be limiting. Other removed from its natural milieu or the context in which the features and advantages of the invention will be apparent 50 nucleic acid or protein exists in nature. For example, an iso from the detailed description and from the claims. lated protein or nucleic acid molecule is removed from the To facilitate an understanding of the present invention, a cell or organism with which it is associated in its native or number of terms and phrases are defined below. natural environment. An isolated nucleic acid or protein can As used in the present disclosure and claims, the singular be, in some instances, partially or Substantially purified, but forms “a,” “an and “the' include plural forms unless the 55 no particular level of purification is required for isolation. context clearly dictates otherwise. Thus, for example, an isolated nucleic acid molecule can be a Wherever embodiments are described herein with the lan nucleic acid sequence that has been excised from the chro guage "comprising otherwise analogous embodiments mosome, genome, or episome that it is integrated into in described in terms of “consisting of and/or “consisting nature. essentially of are also provided. 60 A "purified nucleic acid molecule or nucleotide sequence, The term “and/or as used in a phrase such as “A and/or B' or protein or polypeptide sequence, is Substantially free of herein is intended to include A and B, “A or B', 'A', and cellular material and cellular components. The purified “B”. nucleic acid molecule or protein may be free of chemicals The term “gene' is used broadly to refer to any segment of beyond buffer or solvent, for example. “Substantially free” is nucleic acid molecule (typically DNA, but optionally RNA) 65 not intended to mean that other components beyond the novel encoding a protein or expressed RNA. Thus, genes include nucleic acid molecules are undetectable. In some circum sequences encoding expressed RNA (which can include stances “substantially free” may mean that the nucleic acid US 9,096,834 B2 10 molecule or nucleotide sequence is free of at least 95% (w/w) The term “recombinant protein’ as used herein refers to a of cellular material and components. protein produced by genetic engineering. The terms “naturally-occurring and “wild-type” refer to a When applied to organisms, the term recombinant, engi form found in nature. For example, a naturally occurring or neered, or genetically engineered refers to organisms that wild-type nucleic acid molecule, nucleotide sequence or pro 5 have been manipulated by introduction of a heterologous or tein may be present in and isolated from a natural source, and recombinant nucleic acid sequence into the organism, and is not intentionally modified by human manipulation. includes gene knockouts, targeted mutations and gene As used herein “attenuated” means reduced in amount, replacement, promoter replacement, deletion, or insertion, as degree, intensity, or strength. Attenuated gene expression well as introduction of transgenes into the organism. Recom 10 binant or genetically engineered organisms can also be organ may refer to a significantly reduced amount and/or rate of isms into which constructs for gene “knock down” have been transcription of the gene in question, or of translation, fold introduced. Such constructs include, but are not limited to, ing, or assembly of the encoded protein. As nonlimiting RNAi, microRNA, shRNA, antisense, and ribozyme con examples, an attenuated gene may be a mutated or disrupted structs. Also included are organisms whose genomes have gene (e.g., a gene disrupted by partial or total deletion, or 15 been altered by the activity of meganucleases or Zinc finger insertional mutation) or having decreased expression due to nucleases. The heterologous or recombinant nucleic acid alteration of gene regulatory sequences. molecule can be integrated into the recombinant/genetically "Exogenous nucleic acid molecule' or “exogenous gene’ engineered organism's genome or in other instances are not refers to a nucleic acid molecule or gene that has been intro integrated into the recombinant/genetically engineered duced (“transformed') into a cell. A transformed cell may be organism's genome. As used herein, “recombinant microor referred to as a recombinant cell, into which additional exog ganism’ or “recombinant host cell includes progeny or enous gene(s) may be introduced. A descendent of a cell derivatives of the recombinant microorganisms of the inven transformed with a nucleic acid molecule is also referred to as tion. Because certain modifications may occur in Succeeding “transformed if it has inherited the exogenous nucleic acid generations due to either mutation or environmental influ molecule. The exogenous gene may be from a different spe 25 ences. Such progeny or derivatives may not, in fact, be iden cies (and so "heterologous'), or from the same species (and so tical to the parent cell, but are still included within the scope “homologous'), relative to the cell being transformed. An of the term as used herein. "endogenous nucleic acid molecule, gene or protein is a The term “promoter” refers to a nucleic acid sequence native nucleic acid molecule, gene or protein as it occurs in, or capable of binding RNA polymerase in a cell and initiating is naturally produced by, the host. 30 transcription of a downstream (3' direction) coding sequence. The term “native' is used herein to refer to nucleic acid A promoter includes the minimum number of bases or ele sequences or amino acid sequences as they naturally occur in ments necessary to initiate transcription at levels detectable the host. The term “non-native' is used herein to refer to above background. A promoter can include a transcription nucleic acid sequences or amino acid sequences that do not initiation site as well as protein binding domains (consensus occur naturally in the host. A nucleic acid sequence or amino 35 sequences) responsible for the binding of RNA polymerase. acid sequence that has been removed from a host cell. Sub Eukaryotic promoters often, but not always, contain “TATA’ jected to laboratory manipulation, and introduced or reintro boxes and "CAT" boxes. Prokaryotic promoters may contain duced into a host cell is considered “non-native.” Synthetic or -10 and -35 prokaryotic promoter consensus sequences. A partially synthetic genes introduced into a host cell are “non large number of promoters, including constitutive, inducible native. Non-native genes further include genes endogenous 40 and repressible promoters, from a variety of different sources to the host microorganism operably linked to one or more are well known in the art. Representative sources include for heterologous regulatory sequences that have been recom example, viral, mammalian, insect, plant, yeast, and bacterial bined into the host genome. cell types, and Suitable promoters from these sources are A “recombinant’ or “engineered nucleic acid molecule is readily available, or can be made synthetically, based on a nucleic acid molecule that has been altered through human 45 sequences publicly available on line or, for example, from manipulation. As non-limiting examples, a recombinant depositories such as the ATCC as well as other commercial or nucleic acid molecule that: 1) has been synthesized or modi individual sources. Promoters can be unidirectional (i.e., ini fied in vitro, for example, using chemical or enzymatic tech tiate transcription in one direction) or bi-directional (i.e., niques (for example, by use of chemical nucleic acid synthe initiate transcription in both directions off of opposite sis, or by use of enzymes for the replication, polymerization, 50 Strands). A promoter may be a constitutive promoter, a digestion (exonucleolytic or endonucleolytic), ligation, repressible promoter, or an inducible promoter. Non-limiting reverse transcription, transcription, base modification (in examples of promoters include, for example, the T7 pro cluding, e.g., methylation), integration or recombination (in moter, the cytomegalovirus (CMV) promoter, the SV40 pro cluding homologous and site-specific recombination)) of moter, and the RSV promoter. Examples of inducible promot nucleic acid molecules; 2) includes conjoined nucleotide 55 ers include the lac promoter, the pBAD (araA) promoter, the sequences that are not conjoined in nature, 3) has been engi Tet promoter (U.S. Pat. Nos. 5.464,758 and 5,814,618), and neered using molecular cloning techniques such that it lacks the Ecdysone promoter (No et al., Proc. Natl. Acad. Sci. one or more nucleotides with respect to the naturally occur (1996) 93 (8): 3346-3351). ring nucleic acid molecule sequence, and/or 4) has been The term "heterologous' when used in reference to a poly manipulated using molecular cloning techniques such that it 60 nucleotide, a gene, a nucleic acid, a polypeptide, or an has one or more sequence changes or rearrangements with enzyme refers to a polynucleotide, gene, a nucleic acid, respect to the naturally occurring nucleic acid sequence. As polypeptide, or an enzyme not naturally found in the host non-limiting examples, a cDNA is a recombinant DNA mol organism. When referring to a gene regulatory sequence or to ecule, as is any nucleic acid molecule that has been generated an auxiliary nucleic acid sequence used for maintaining or by in vitro polymerase reaction(s), or to which linkers have 65 manipulating agene sequence (e.g. a 5' untranslated region, 3' been attached, or that has been integrated into a vector. Such untranslated region, poly A addition sequence, intron as a cloning vector or expression vector. sequence, splice site, ribosome binding site, internal ribo US 9,096,834 B2 11 12 Some entry sequence, genome homology region, recombina ments are generated). The equivalent Blastp parameter set tion site, etc.), "heterologous' means that the regulatory tings for comparison of amino acid sequences can be: Q=9; sequence or auxiliary sequence is from a different source than R=2; wink=1; and gapw-32. A Bestfit comparison between the gene with which the regulatory or auxiliary nucleic acid sequences, available in the GCG package version 10.0, can sequence is juxtaposed in a construct, genome, use DNA parameters GAP=50 (gap creation penalty) and or episome. Thus, a promoter operably linked to a gene to LEN=3 (gap extension penalty), and the equivalent settings in which it is not operably linked to in its natural state (i.e. in the protein comparisons can be GAP-8 and LEN=2. genome of a non-genetically engineered organism) is referred Thus, when referring to the polypeptide or nucleic acid to herein as a "heterologous promoter, even though the pro sequences of the present invention, included are sequence moter may be derived from the same species (or, in some 10 identities of at least 65%, 70%, 75%, 80%, or 85%, for cases, the same organism) as the gene to which it is linked. example at least 86%, at least 87%, at least 88%, at least 89%, As used herein, the term “protein’ or “polypeptide' is at least 90%, at least 91%, at least 92%, at least 93%, at least intended to encompass a singular “polypeptide' as well as 94%, at least 95%, at least 96%, at least 97%, at least 98%, at plural “polypeptides, and refers to a molecule composed of least 99%, or about 100% sequence identity with the full monomers (amino acids) linearly linked by amide bonds (also 15 length polypeptide or nucleic acid sequence, or to fragments known as peptide bonds). The term “polypeptide' refers to thereof comprising a consecutive sequence of at least 50, at any chain or chains of two or more amino acids, and does not least 75, at least 100, at least 125, at least 150 or more amino refer to a specific length of the product. Thus, peptides, dipep acid residues of the entire protein; variants of such sequences, tides, tripeptides, oligopeptides, “protein.” “amino acid e.g., wherein at least one amino acid residue has been inserted chain.” or any other term used to refer to a chain or chains of N- and/or C-terminal to, and/or within, the disclosed two or more amino acids, are included within the definition of sequence(s) which contain(s) the insertion and Substitution. "polypeptide.” and the term “polypeptide' can be used Contemplated variants can additionally or alternately include instead of, or interchangeably with any of these terms. those containing predetermined mutations by, e.g., homolo As used herein, the terms “percent identity” or “homology” gous recombination or site-directed or PCR mutagenesis, and with respect to nucleic acid or polypeptide sequences are 25 the corresponding polypeptides or nucleic acids of other spe defined as the percentage of nucleotide oramino acid residues cies, including, but not limited to, those described herein, the in the candidate sequence that are identical with the known alleles or other naturally occurring variants of the family of polypeptides, after aligning the sequences for maximum per polypeptides or nucleic acids which contain an insertion and cent identity and introducing gaps, if necessary, to achieve the substitution; and/or derivatives wherein the polypeptide has maximum percent homology. N-terminal or C-terminal inser 30 been covalently modified by substitution, chemical, enzy tion ordeletions shall not be construed as affecting homology, matic, or other appropriate means with a moiety other than a and internal deletions and/or insertions into the polypeptide naturally occurring amino acid which contains the insertion sequence of less than about 30, less than about 20, or less than and Substitution (for example, a detectable moiety Such as an about 10amino acid residues shall not be construed as affect enzyme). ing homology. 35 As used herein, the phrase “conservative amino acid Sub Homology or identity at the nucleotide or amino acid stitution' or “conservative mutation” refers to the replace sequence level can be determined by BLAST (Basic Local ment of one amino acid by anotheramino acid with a common Alignment Search Tool) analysis using the algorithm property. A functional way to define common properties employed by the programs blastp, blastin, blastX, thlastin, and between individual amino acids is to analyze the normalized thlastx (Altschul (1997), Nucleic Acids Res. 25, 3389-3402, 40 frequencies of amino acid changes between corresponding and Karlin (1990), Proc. Natl. Acad. Sci. USA 87, 2264 of homologous organisms (Schulz (1979) Principles 2268), which are tailored for sequence similarity searching. of Protein Structure, Springer-Verlag). According to such The approach used by the BLAST program is to first consider analyses, groups of amino acids can be defined where amino similar segments, with and without gaps, between a query acids within a group exchange preferentially with each other, sequence and a database sequence, then to evaluate the sta 45 and therefore resemble each other most in their impact on the tistical significance of all matches that are identified, and overall protein structure (Schulz (1979) Principles of Protein finally to Summarize only those matches which satisfy a pre Structure, Springer-Verlag). Examples of amino acid groups selected threshold of significance. For a discussion of basic defined in this manner can include: a “charged/polar group' issues in similarity searching of sequence databases, see Alts including Glu, Asp, ASn, Gln, Lys, Arg, and His; an 'aromatic chul (1994), Nature Genetics 6, 119-129. The search param 50 or cyclic group' including Pro, Phe, Tyr, and Trp; and an eters for histogram, descriptions, alignments, expect (i.e., the “aliphatic group' including Gly, Ala, Val, Leu, Ile, Met, Ser, statistical significance threshold for reporting matches Thr, and Cys. Within each group, Subgroups can also be against database sequences), cutoff, matrix, and filter (low identified. For example, the group of charged/polar amino complexity) can be at the default settings. The default scoring acids can be sub-divided into Sub-groups including: the “posi matrix used by blastp, blastX, thlastin, and thlastx is the BLO 55 tively-charged sub-group' comprising Lys, Arg and His; the SUM62 matrix (Henikoff (1992), Proc. Natl. Acad. Sci. USA “negatively-charged Sub-group' comprising Glu and Asp; 89, 10915-10919), recommended for query sequences over and the “polar sub-group' comprising ASn and Gln. In 85 in length (nucleotide bases or amino acids). another example, the aromatic or cyclic group can be Sub For blastin, designed for comparing nucleotide sequences, divided into Sub-groups including: the "nitrogen ring Sub the scoring matrix is set by the ratios of M (i.e., the reward 60 group' comprising Pro. His, and Trp; and the “phenyl Sub score for a pair of matching residues) to N (i.e., the penalty group' comprising Phe and Tyr. In another further example, score for mismatching residues), wherein the default values the aliphatic group can be sub-divided into Sub-groups for M and N can be +5 and -4, respectively. Four blastin including: the “large aliphatic non-polar sub-group' compris parameters can be adjusted as follows: Q=10 (gap creation ing Val, Leu, and Ile; the “aliphatic slightly-polar Sub-group' penalty); R=10 (gap extension penalty); wink-1 (generates 65 comprising Met, Ser, Thr, and Cys; and the “small-residue word hits at every winkth position along the query); and Sub-group' comprising Gly and Ala. Examples of conserva gapw-16 (sets the window width within which gapped align tive mutations include amino acid substitutions of amino US 9,096,834 B2 13 14 acids within the Sub-groups above. Such as, but not limited to: As used herein, "up-regulated” or "up-regulation' includes Lys for Arg or vice versa, Such that a positive charge can be an increase in expression of a gene or nucleic acid molecule of maintained; Glu for Asp or vice versa, Such that a negative interest or the activity of an enzyme, e.g., an increase in gene charge can be maintained; Serfor Thr or vice versa, such that expression or enzymatic activity as compared to the expres a free—OH can be maintained; and Gln for ASnor vice versa, sion or activity in an otherwise identical gene or enzyme that such that a free —NH2 can be maintained. has not been up-regulated. As used herein, "expression' includes the expression of a As used herein, “down-regulated' or “down-regulation' gene at least at the level of RNA production, and an “expres includes a decrease in expression of a gene or nucleic acid sion product includes the resultant product, e.g., a polypep molecule of interest or the activity of an enzyme, e.g., a 10 decrease in gene expression or enzymatic activity as com tide or functional RNA (e.g., a ribosomal RNA, a tRNA, an pared to the expression or activity in an otherwise identical antisense RNA, a microRNA, an shRNA, a ribozyme, etc.), gene or enzyme that has not been down-regulated. of an expressed gene. The term “increased expression” The term “Pfam” refers to a large collection of protein includes an alteration in gene expression to facilitate domains and protein families maintained by the Pfam Con increased mRNA production and/or increased polypeptide 15 sortium and available at several sponsored world wide web expression. “Increased production' includes an increase in sites, including: pfam.sanger.ac.uk/ (Welcome Trust, Sanger the amount of polypeptide expression, in the level of the Institute); pfam.sbc.su.se/ (Stockholm Bioinformatics Cen enzymatic activity of a polypeptide, or a combination of both, ter); pfamjanelia.org/ (Janelia Farm, Howard Hughes Medi as compared to the native production or enzymatic activity of cal Institute); pfamjouy.inra.fr/ (Institut national de la the polypeptide. Recherche Agronomique); and pfam.ccbb.re.kr. The latest The term “secreted includes movement of polypeptides or release of Pfam is Pfam 26.0 (November 2011) based on the fatty acid products produced by the recombinant microorgan UniProt protein database release 15.6, a composite of Swiss isms or methods of the invention to the periplasmic space or Prot release 57.6 and TrEMBL release 40.6. Pfam domains extracellular milieu. “Increased secretion' includes secretion and families are identified using multiple sequence align in excess of the naturally-occurring amount of secretion, e.g., 25 ments and hidden Markov models (HMMs). Pfam-A family that is at least 1%, 2%,3%, 4%, 5%, 6%, 7%, 8%, 9%, or 10%, or domain assignments, are high quality assignments gener or at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, ated by a curated seed alignment using representative mem 100%, 200%, 300%, 400%, 500%. 600%, 700%, 800%, bers of a protein family and profile hidden Markov models 900%, 1000%, or more, as compared to the naturally-occur based on the seed alignment. (Unless otherwise specified, ring level of Secretion. 30 matches of a queried protein to a Pfam domain or family are Embodiments of the present invention provide for the Pfam-A matches.) All identified sequences belonging to the "insertion, e.g., the addition, integration, incorporation, or family are then used to automatically generate a full align introduction, the activation, or up-regulation of certain ment for the family (Sonnhammer (1998) Nucleic Acids nucleic acid molecules or particular polynucleotide Research 26, 320-322: Bateman (2000) Nucleic Acids sequences, with or without additional regulatory sequences, 35 Research 26, 263-266: Bateman (2004) Nucleic Acids within microorganisms or host cells in order to affect the Research 32, Database Issue, D138-D141; Finn (2006) activity, such as the expression of an enzyme, of certain Nucleic Acids Research Database Issue 34, D247-251; Finn nucleic acid molecules or particular polynucleotide (2010) Nucleic Acids Research Database Issue 38, D211 sequences. In certain embodiments, a microorganism of inter 222). By accessing the Pfam database, for example, using any est may be engineered by site directed homologous recombi 40 of the above-reference websites, protein sequences can be nation to insert a particular gene of interest or a promoter that queried against the HMMs using HMMER homology search affects the expression of a particular gene or set of genes. software (e.g., HMMER2, HMMER3, or a higher version, Embodiments of the present invention provide recombi himmer.janelia.org/). Significant matches that identify a que nant microorganisms in which the nucleic acid molecules or ried protein as being in a Pfam family (or as having a particu particular polynucleotide sequences are partially, Substan 45 lar Pfam domain) are those in which the bit score is greater tially, or completely deleted, silenced, inactivated, or down than or equal to the gathering threshold for the Pfam domain. regulated in order to affect the activity for which they encode, Expectation values (e values) can also be used as a criterion Such as the expression of an enzyme. Genes can be partially, for inclusion of a queried protein in a Pfam or for determining Substantially, or completely deleted, silenced, inactivated, or whether a queried protein has a particular Pfam domain, down-regulated by insertion of nucleic acid sequences that 50 where low evalues (much less than 1.0, for example less than disrupt the function and/or expression of the gene (e.g., P1 0.1, or less than or equal to 0.01) represent low probabilities transduction or other methods known in the art). The terms that a match is due to chance. "eliminate.'"elimination, and “knockout' can be used inter An enzyme such as an LPAAT or a thioesterase that acts on changeably with the terms “deletion.” “partial deletion.” acyl thioester Substrates (e.g., uses acyl-ACP and/or acyl “substantial deletion, or “complete deletion.” In certain 55 CoA substrates) often has activity on substrates of different embodiments, a microorganism of interest may be engineered chain lengths, but can have greater activity on one or more by site directed homologous recombination to knockout a substrates than on others. For example, “substrate prefer particular gene of interest. In still other embodiments, RNAi ence” refers to the substrate or substrates an enzyme is most orantisense DNA (asDNA) may be used to partially, substan active on. Different acyl-ACP thioesterases may have differ tially, or completely silence, inactivate, or down-regulate a 60 ent degrees of chain length specificity, sometimes referred to particular gene of interest. as the enzyme’s “preference” for cleaving a particular length These insertions, deletions, or other modifications of cer of fatty acid from ACP, and thioesterases are typically most tain nucleic acid molecules or particular polynucleotide active in cleaving a particular chain length fatty acid while sequences may be understood to encompass "genetic modi having lesser activity in cleaving one or more other chain fication(s) or “transformation(s) such that the resulting 65 length fatty acids. Similarly, an LPAAT can have a substrate strains of the microorganisms or host cells may be understood preference for one or more acyl substrates, such as acyl-ACP to be “genetically modified’ or “transformed.” or acyl-CoA Substrates of particular carbon chain lengths. US 9,096,834 B2 15 16 As used herein, a “medium chain length' fatty acid or hols, alkenes and/or wax esters are the desired end product, a acyl-ACP is a fatty acid or acyl-ACP having a chain length of gene encoding an alcohol-forming fatty acyl reductase (e.g., from 8-14 carbons. alcohol-forming acyl-CoA reductase, 1.2.1.50) may be intro As used herein, a “long chain length' fatty acid or acyl duced into the host cell. Further, a fatty aldehyde reductase ACP is a fatty acid or acyl-ACP having a chain length of gene may be introduced to reduce fatty aldehydes to fatty greater than 14 carbons. alcohols. Fatty alcohols may be processed further to alkenes As used herein, the term “fatty acid product includes free with the introduction of one or more genes encoding a fatty fatty acids; mono-, di- or triglycerides; fatty aldehydes; a fatty alcohol dehydratase. Fatty acid esters, including wax esters, alcohols; fatty acid esters (including, but not limited to, wax may be formed by introducing genes encoding polypeptides esters); and hydrocarbons (including, but not limited to, 10 that catalyze condensation of an alcohol with a fatty acyl alkanes and alkenes). thioester, such as acyltransferases and wax synthases. Metabolic Pathways for Producing Fatty Acids In some embodiments of the invention described above, the The fatty acid biosynthesis pathway is highly conserved in conversion of acyl-ACP to fatty alcohol may occur via syn prokaryotes and in the chloroplasts of eukaryotic algae and thesis of a fatty aldehyde, wherein an acyl reductase (e.g., an higher plants. The fatty acid biosynthesis pathway starts from 15 aldehyde-forming acyl-CoA reductase or aldehyde-forming the central metabolite acetyl-CoA. Fatty acid biosynthesis is acyl-ACP reductase) expressed in the host cell first reduces initiated by the conversion of acetyl-CoA to malonyl-CoA, acyl-ACP to a fatty aldehyde. For example, in certain embodi catalyzed by acetyl-CoA carboxylase (ACCase). Malonyl ments, the host cell can be engineered to overexpress an CoA is then converted to malonyl-ACP, catalyzed by malo endogenous fatty aldehyde-forming reductase (e.g., by nyl-CoA-ACP transacylase (Fab) in E. coli). Finally, malo inserting promoter and/or enhancer transcriptional control nyl-ACP is converted to acyl-ACP, catalyzed by the enzyme elements near the fatty aldehyde-forming reductase gene). In complex fatty acid synthase (FAS). The fatty acid synthase other embodiments, the host cell may be engineered to complex initiates the elongation cycle by first condensing express an exogenous fatty aldehyde-forming reductase. malonyl-ACP with acetyl-ACP, catalyzed by a beta-ketoacyl Lysophosphatidic Acid Acyltransferases ACP synthase III (e.g., FabH of E. coli). The B-ketoacyl-ACP 25 As used herein, the term “lysophosphatidic acid acyltrans (3-ketoacyl-ACP) formed by the FabH reaction is reduced to ferase' or “LPAAT', or “1-acyl-sn-glycerol-3-phosphate a f-hydroxyacyl-ACP (3-hydroxyacyl-ACP) by 3-ketoacyl acyltransferase' or “AGPAT” (encoded by plsC in E. coli and ACP reductase (e.g. FabG). The B-hydroxyacyl-ACP is then other prokaryotes), includes those enzymes capable of con acted on by a B-hydroxyacyl-ACP dehydratase (e.g. FabA, Verting a lysophosphatidic acid to a phosphatidic acid by FabZ) to form trans-2-enoyl-ACP, which in turn is reduced by 30 transferring an acyl chain from an acyl Substrate Such as enoyl-ACP reductase (e.g. Fab I, Fab K. Fab|L) to form the 2 acyl-CoA or acyl-ACP to the sn2 position of lysophospha carbon-elongated acyl-ACP product. Subsequent cycles are tidic acid. LPAAT includes those enzymes that correspond to initiated by a beta-ketoacyl-ACP synthase I or II (e.g., FabB Enzyme Commission Number 2.3.1.51. or FabF) catalyzed condensation of malonyl-ACP with acyl LPAAT and the enzyme glycerol-3-phosphate acyltrans ACP. The cycles of condensation, reduction, dehydration, and 35 ferase (glycerolphosphate acyltransferase: GPAT: encoded reduction are repeated, with each cycle adding two carbons by plsB in E. coli) which catalyzes the conversion of glycerol from malonyl-ACP. until the acyl chain is transferred to 3-phosphate to lysophosphatidic acid, share canonical acyl another molecule (e.g. glycerol 3-phosphate) by a transacy transferase signatures such as HXXXXD or NHXXXXD and lase or cleaved from ACP by a thioesterase, such as FatA or XXXXXXG, where X is an arbitrary amino acid, separated FatB in chloroplasts, to form free fatty acids. 40 by 60 to 83 amino acid residues (Okazaki, K., et al. (2006) Unlike plant chloroplasts, cyanobacteria do not produce Plant Physiol. 141:546-56). Nucleic acids that encode GPATs free fatty acids, and unlike E. coli and other heterotrophic and LPAATs having these signature sequences have been bacteria, cyanobacteria do not produce acyl-CoA. After fatty identified in many organisms, including Arabidopsis thaliana acid elongation with the acyl chain covalently bound to acyl (Zhengetal. (2003) Plant Cell 15:1872-87; Kim and Huwang carrier protein, acyl transferases of cyanobacteria transfer the 45 (2004) Plant Physiol. 134:1206-16; Yu et al. (2003) Plant acyl chain to a glycerol backbone to produce membrane lip Cell Physiol. 15:1872-87) and Synechocystis sp. PCC6803 ids. (Weier et al. (2005) Biochem. Biophys. Res. Commun. 334: To produce fatty acid derivatives such as fatty alcohols, 1127-34: Okazaki, K., et al. (2006) Plant Physiol. 141:546 fatty aldehydes, wax esters, alkanes, or alkenes in microor 56). For example, an endoplasmic reticulum-located (mi ganisms, it is typically necessary to introduce one or more 50 crosomal) LPAAT, which is encoded by the LAT2 gene from genes encoding one or more enzymes to convert acyl Limnanthes douglasii or by the LPAT2 gene from A. thaliana, thioester intermediates (e.g., acyl-CoA or acyl-ACP) to the have 68 or 69 amino acid residues between the canonical desired end product (e.g., an alcohol, aldehyde, alkane, alk acyltransferase signatures and can use C18 acyl substrates, ene, or wax ester). For example, if fatty aldehydes and/or such as C18:1 CoA, for membrane lipid synthesis (Brown et alkanes are the desired end product, a gene encoding an 55 al. (1995) Plant Mol. Biol. 29:267-78; Kimetal. (2005) Plant aldehyde-forming fatty aldehyde reductase (e.g., aldehyde Cell 17:1073-89). Other seed-specific endoplasmic reticu forming acyl-CoA reductase, 1.2.1.42 or 1.2.1.50; see also lum-located LPAATs may have 63 amino acid residues or U.S. Pat. No. 6,143.538) may be introduced to reduce acyl approximately 63 amino acid residues between the canonical CoA to fatty aldehydes; additionally or alternatively, a car acyltransferase signatures and may use unusual acyl Sub boxylic acid reductase gene (see, e.g., WO 2010/135624 and 60 strates, e.g., C12:0 or C22:1 acyl substrates which in various WO 2010/042664) may be introduced to reduce free fatty plant species are incorporated into seed storage lipids (Brown acids to fatty aldehydes. Further, a gene encoding a fatty etal. (1995) Plant Mol. Biol. 29:267-78: Knutzonetal. (1995) alcohol oxidase (e.g., 1.1.3.20) or a fatty alcohol dehydroge Plant Physiol. 109:999-1006; Lassner et al. (1995) Plant nase (e.g., 1.1.1.164) may be introduced to convert fatty alco Physiol. 109:1389-94). hols to fatty aldehydes. Fatty aldehydes may be processed 65 LPAATs useful in the microorganisms and methods pro further to alkanes with the introduction of a gene encoding a vided herein may have one or more conserved domains found fatty aldehyde decarbonylase (e.g., 4.1.99.5). If fatty alco in cyanobacterial LPAAT sl1752 (SEQID NO:2), as charac US 9,096,834 B2 17 18 terized in the Conserved Domain Database maintained by the ence and an acyl-ACP thioesterase is selected that has a long National Center for Biotechnology Information (available at chain-length acyl-ACP or acyl-CoA substrate preference. In incbi.nlm.nih.gov/Structure/cdd/cdd.shtml), such as but not certain aspects, an LPAAT is selected that has a long chain limited to: caO7992 (Lysophospholipid Acyltransferases length acyl substrate preference and an acyl-ACP thioesterase (LPLATs) of Glycerophospholipid Biosynthesis: Unknown 5 is selected that has a medium chain-length acyl-ACP sub AAK1481.6-like), cdO7989 (Lysophospholipid Acyltrans strate preference. The preferred acyl-ACP substrate can have ferases (LPLATs) of Glycerophospholipid Biosynthesis: a saturated acyl chain or can have an unsaturated acyl chain. AGPAT-like), cdO6551 (Lysophospholipid acyltransferases In some preferred embodiments of the foregoing, the (LPLATs) of glycerophospholipid biosynthesis), cdO7993 LPAAT encoded by a non-native nucleic acid sequence in the (Lysophospholipid Acyltransferases (LPLATs) of Glycero- 10 recombinant microorganism has a Substrate preference for phospholipid Biosynthesis: GPAT-like), cd O7988 (Lysophos acyl-ACP or acyl-CoA molecules having longer acyl chain pholipid Acyltransferases (LPLATs) of Glycerophospholipid lengths than the acyl substrates preferred by the thioesterase Biosynthesis: Unknown ABO13168), cdO7987 (Lysophos produced by expression of a non-native thioesterase gene in pholipid Acyltransferases (LPLATs) of Glycerophospholipid the recombinant microorganism. Without limiting the inven Biosynthesis: MGAT-like), COG0204: P1sC (1-acyl-sn- 15 tion to any particular mechanism, the LPAAT can have a glycerol-3-phosphate acyltransferase), TIGR00530: AGP Substrate preference for an acyl Substrate that has a longer acyltrn (1-acyl-sn-glycerol-3-phosphate acyltransferases), acyl chain than the acyl substrate preferred by the thioesterase cdO7991: LPLAT LPCAT1-like (Lysophospholipid Acyl produced by the recombinant microorganism, Such that dur transferases (LPLATs) of Glycerophospholipid Biosynthe ing fatty acid biosynthesis, acyl-ACP intermediates are either sis: LPCAT1-like), cd O7990 (Lysophospholipid Acyltrans- 20 cleaved by the thioesterase or, if they escape cleavage at the ferases (LPLATs) of Glycerophospholipid Biosynthesis: preferred Substrate chain length, their acyl chains become LCLAT1-like), and cdO7986 (Lysophospholipid Acyltrans further elongated in the fatty acid biosynthesis cycle such that ferases (LPLATs) of Glycerophospholipid Biosynthesis: they attain a chain length preferred by the LPAAT. The Unknown ACT14924). LPAAT can then act on the acyl-ACP substrate, transferring Because LPAATs are essential to the production of phos- 25 the acyl chain to a glycerolipid and thereby removing acyl phatidic acid in all cellular organisms except archea, LPAATs ACP intermediates that have been elongated beyond the sub can be identified from a large variety of species, using, e.g., strate preference of the thioesterase. Such intermediates the canonical acyltransferase sequence signatures described could otherwise buildup in the cell and thereby downregulate above and/or by bioinformatics to identify conserved fatty acid synthesis. For example, the recombinant microor domains characteristic of LPAATs, such as, but not limited to, 30 ganism can express an acyl-ACP thioesterase, acyl-CoA those provided above, as well as by inclusion in Pfam thioesterase, or 4-hydroxybenzoyl thioesterase having a PF01553 (“Acyltransferase”), which has a gathering cut-off medium chain acyl substrate preference (for example, for C8, of 21.1. LPAAT activity and substrate specificity can be deter C10, C12, and/or C14 acyl substrates such as acyl-ACP and/ mined using, e.g., an acyltransferase assay as described in or acyl-CoA) and an LPAAT having a long chain acyl Sub Okazaki, K. et al. (2006) Plant Physiol. 141:546-56, 35 strate preference (for example, for C16, C18, C20, C22, or WO2007 141257, or U.S. Pat. No. 5,563,058, each of which is C24 acyl substrates such as acyl-ACP and/or acyl-CoA). In incorporated by reference, in which any of the methods can be Some examples, the recombinant microorganism expresses a adapted to use either acyl-ACP or acyl-CoA as a substrate, or non-native gene encoding a thioesterase that has a preference by using any of the methods disclosed herein. for a C12 and/or C14 acyl Substrate and expresses a non A gene encoding an LPAAT with a particular acyl chain 40 native gene encoding an LPAAT that has a preference for one length substrate preference can be expressed in a recombinant or more C16 and/or C18 acyl substrates. In another example, microorganism that expresses a non-native thioesterase gene both the thioesterase and the LPAAT encoded by non-native as provided herein to increase the levels of free fatty acids genes in the recombinant cell can have long chain acyl Sub produced by the recombinant host cells or microorganisms. strate preferences, where the LPAAT prefers one or more An LPAAT gene can be selected based on the substrate pref 45 Substrates having a longer acyl chain length than the acyl erence of the encoded enzyme in relation to the substrate chain length of the substrate(s) preferred by the thioesterase. preference of the thioesterase produced by the recombinant For example, a recombinant microorganism can express a microorganism or host cell. In some aspects of the invention, non-native gene encoding a thioesterase having a C16 acyl an LPAAT can be selected that has a different acyl chain Substrate preference and a non-native gene encoding an length substrate preference than the thioesterase produced by 50 LPAAT having a C18 acyl substrate preference. In further expression of a non-native gene, which can be for example, an examples, the recombinant microorganism expresses a non acyl-ACP thioesterase, an acyl-CoA thioesterase, or a 4-hy native gene encoding a thioesterase that has a preference for droxybenzoyl thioesterase. In some aspects of the invention, a C12, C14, and/or C16 acyl Substrate and expresses a non an LPAAT is selected that has a complementary acyl substrate native gene encoding an LPAAT that has a preference for a chain length preference with respect to the substrate prefer- 55 C18 acyl substrate. For example, the recombinant microor ence of the thioesterase produced by the recombinant micro ganism can express a non-native gene encoding a thioesterase organism, where acyl substrates preferred by the LPAAT are that has a C12 acyl substrate preference or a C14 acyl sub not preferred substrates of the thioesterase, and acyl sub strate preference. For example, the recombinant microorgan strates preferred by the thioesterase are not preferred sub ism can express an LPAAT having a C18 acyl substrate pref strates of the LPAAT. For example, in certain aspects an 60 erence and can express a non-native gene encoding a LPAAT is selected that has a medium chain-length acyl sub thioesterase that has a C16 acyl substrate preference, or the strate preference, e.g., a preference for using a C12 and/or a recombinant microorganism can express a non-native gene C14 acyl-ACP or acyl-CoA, and in other aspects, an LPAAT encoding a thioesterase that has a C14 and C16 acyl substrate is selected that has a long chain-length acyl-ACP Substrate preference or a preference for a C12, C14, and C16 acyl preference, e.g., a preference for using a C16 and/or a C18 65 Substrates. acyl-ACP substrate. In certain aspects, an LPAAT is selected An LPAAT gene that can be useful in the host cells and that has a medium chain-length acyl-ACP substrate prefer methods provided herein can be a prokaryotic or eukaryotic US 9,096,834 B2 19 20 LPAAT gene. A prokaryotic LPAAT gene can be derived from 334121556: SEQ ID NO:15). These and other orthologs of any prokaryotic microorganism, for example, a bacterium or sl1752 (SEQ ID NO:2) of Synechocystis are considered for cyanobacterium. Prokaryotic LPAATs commonly have a C16 use in the microorganisms and methods provided herein. acyl Substrate preference, although the invention considers Recombinant microorganisms that include a non-native prokaryotic LPAATs having substrate preferences for any nucleic acid sequence that encodes an LPAAT and produce at acyl chain length, particularly but not exclusively C16 and least one free fatty acid or at least one fatty acid derivative can C18 chain lengths. For example, an LPAAT encoded by a include non-native sequences encoding LPAATs having at non-native nucleic acid molecule can from a bacterial species least 40%, at least 45%, at least 50%, at least 55%, at least such as, for example, E. coli, e.g., plsC (SEQID NO:16) or 60%, at least 65%, at least 70%, at least 75%, at least 80%, at the LPAAT of Marinobacter adhaerens HP15 (Genbank 10 least 85%, at least 90%, at least 95%, at least 96%, at least a accession ADP99136, gene identifier 311696263; SEQ ID 97%, at least 98%, at least 99%, or 100% identity to any of NO:17) and orthologs in other bacterial species having at SEQID NO:2, SEQID NO:3, SEQID NO:4, SEQID NO:5, least 40%, at least 45%, at least 50%, at least 55%, at least SEQID NO:6, SEQID NO:7, SEQID NO:8, SEQID NO:9, 60%, at least 65%, at least 70%, at least 75%, at least 80%, at SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID least 85%, at least 90%, at least 95%, at least 96%, at least 15 NO:13, SEQ ID NO:14, or SEQ ID NO:15, or to LPAATs 97%, at least 98%, at least 99%, or 100% identity to SEQID encoded by other cyanobacterial orthologs of the Syn NO:16 or SEQID NO:17, or the LPAAT can be derived from echocystis sll 1752 gene. a cyanobacterial species, such as the sl11752 LPAAT of Syn Alternatively or in addition, a recombinant microorganism echocystis sp. PCC 6803 and orthologs in other species. can include a non-native nucleic acid molecule encoding a The LPAAT encoded by the sl1752 gene from Syn eukaryotic LPAAT. A eukaryotic LPAAT can be derived from echocystis sp. PCC 6803 (GenBank/EMBL accession num an animal, fungal, bacterial, plant or algal species. For ber NP 440396 or BAA17076; SEQID NO:2), orthologs of example, a recombinant microorganism can include a non which are widely conserved in published genomes of cyano native nucleic acid molecule encoding a plant or eukaryotic bacteria, has been shown to have a preference for C18:0 and algal LPAAT and can be, in some examples, an LPAAT that is C18:1 acyl substrates (Okazaki, K., et al. (2006) Plant 25 naturally either associated with a plastid or hypothesized to Physiol. 141:546-56). The LPAAT encoded by the sl1752 be associated with a plastid (for example, based on sequence gene from Synechocystis sp. PCC 6803 has a distance of 64 similarities to other plastidal LPAATs). Plastidal LPAATs amino acid residues between the canonical acyltransferase typically have a C16 acyl substrate preference. Nonlimiting signatures, which is similar to seed-specific LPAATs of examples of plastidal LPAATs include the polypeptides hav higher plants. 30 ing GenBank accession numbers ABB47826 (gene identifier Cyanobacterial LPAATs considered to be orthologs of the 78708851) from Oryza sativa (SEQ ID NO:18), XP polypeptide encoded by sl1752 (Genbank Accession 003525921; XP 003525921 (gene identifier 356514455) of NP 440396; gene identifier 16329668; SEQ ID NO:2) of Glycine max (SEQ ID NO:19); AAF73736 (gene identifier Synechocystis include, but are not limited to, the phospho 8163563) of Brassica napus (SEQ ID NO:20), and lipid/glycerol acyltransferase of Cyanothece sp. PCC 8801 35 NP 194787 (gene identifier 8163563) of Arabidopsis (Genbank Accession YP 002372580; gene identifier thaliana (SEQID NO:21). As nonlimiting examples, a non 218247209; SEQID NO:3), the phospholipid/glycerol acyl native gene that can be present in a recombinant microorgan transferase of Crocosphaera watsonii WH 8501 (Genbank ism as disclosed herein can encode an LPAAT having at least Accession ZP 00513647; geneidentifier 67920127: SEQID 40%, at least 45%, at least 50%, at least 55%, at least 60%, at NO:4), hypothetical protein CYO110 27159 of Cyanothece 40 least 65%, at least 70%, at least 75%, at least 80%, at least sp. CCY0110 (Genbank Accession ZP 01731155; gene 85%, at least 90%, at least 95%, at least 96%, at least 97%, at identifier 126660033; SEQ ID NO:5), the unnamed protein least 98%, at least 99%, or about 100% identity to any of for product of Microcystis aeruginosa PCC 7806 (Genbank example, SEQID NO:18, SEQID NO:19, SEQID NO:20, or Accession CAO89042: gene identifier 159026728; SEQ ID SEQID NO:21. NO:6), the hypothetical protein a1 14871 of Nostoc sp. PCC 45 A nucleic acid sequence can also encode an LPAAT 7120 (Genbank Accession NP 488911; gene identifier derived from a eukaryotic species that is either associated 17232363; SEQ ID NO:7), the unnamed protein product of with the endoplasmic reticulum (ER) or cytosol or hypoth Anabaena variabilis ATCC 29413 (Genbank Accession esized to be associated with the ER or cytosol, which may be YP 322656; gene identifier 75908360; SEQ ID NO:8), the referred to as a cytoplasmic, cytosolic, or microsomal phospholipid/glycerol acyltransferase of Nostoc azolae 50 LPAAT. Many microsomal LPAATs have a C18 acyl sub 0708 (Genbank Accession YP 003721758; gene identifier strate preference. For example, a non-native nucleic acid 298491581; SEQID NO:9), the phospholipid/glycerol acyl sequence can encode a microsomal LPAAT from a fungus, transferase of Raphidiopsis brookii D9 (Genbank Accession heterokont, mammal, insect, bryophyte, or higher plant. Illus ZP 06305519; geneidentifier 282897518; SEQID NO:10), trative examples of microsomal LPAATs include, without the acyltransferase domain protein of Microcoleus chthono 55 limitation, LPATs having the following GenBank accession plastes PCC 7420 (Genbank Accession ZP 05026385; gene numbers: CAA82638 (gene identifier 575960, SEQ ID identifier 254412612: SEQ ID NO:11), the phospholipid/ NO:22) of Zea mays, AEE79683 (geneidentifier 332646162: glycerol acyltransferase of Cylindrospermopsis raciborskii SEQ ID NO:23) of Arabidopsis thaliana, AEE32642 (gene CS-505 (Genbank Accession ZP 06306989; gene identifier identifier 332194521; SEQ ID NO:24) of Arabidopsis 282899007: SEQID NO:12), the conserved hypothetical pro 60 thaliana, CAB09138 (gene identifier 4583544: SEQ ID tein of Oscillatoria sp. PCC 6506 (Genbank Accession NO:25) of Brassica napus, and AAF20003 (gene identifier ZP 07110610; geneidentifier 300865867; SEQID NO:13), 6635.840; SEQID NO:26) of Prunus dulcis. For example, a the phospholipid/glycerol acyltransferase of Nodularia spu microorganism as provided herein can include a non-native migena CCY9414 (Genbank Accession ZP 01628592: gene gene encoding an LPAAT having at least at least 40%, at least identifier 119509444; SEQID NO:14), and the phospholipid/ 65 45%, at least 50%, at least 55%, at least 60%, at least 65%, at glycerol acyltransferase of Microcoleus vaginatus FGP-2 least 70%, at least 75%, at least 80%, at least 85%, at least (Genbank Accession ZP 08495621; gene identifier 90%, at least 95%, at least 96%, at least 97%, at least 98%, at US 9,096,834 B2 21 22 least 99%, or about 100% identity to any of SEQID NO:22, Genes encoding acyl-ACP thioesterases of plants and SEQID NO:23, SEQID NO:24, SEQID NO:25, or SEQID bryophytes may include sequences encoding transit peptides NO:26. for transport into plastids that can optionally be deleted for Additional microsomal LPAATs considered for use in the expression of N-terminally truncated thioesterases in host methods and microorganisms include LPAATs with medium microorganisms including but not limited to prokaryotic host chain length preference (e.g., U29657; GI: 1098604), or acyl microorganisms, such as bacteria and cyanobacteria. Genes chain length preference of greater than C18 (e.g., encoding thioesterases may further optionally include addi AAC49185.1; GI: 1209507). tional truncations, such as but not limited to N-terminal trun Thioesterases cations that include a portion, Such as up to ten, up to twenty, As used herein, the term “thioesterase' is intended to 10 or up to thirty amino acids of a mature acyl-ACP thioesterase include hydrolases capable of acting on a thioester bond. Sequence. Such enzymes can correspond to, e.g., Enzyme Commission Further included are acyl-ACP thioesterase genes from Number 3.1.2.2, 3.1.2.14, 3.1.2.18, 3.1.2.19, 3.2.1.20, prokaryotic organisms. Illustrative examples of prokaryotic 3.1.2.22, 3.1.2.23, or 3.1.2.27. A thioesterase expressed in a acyl-ACP thioesterases that may be expressed by a microor recombinant microorganism or host cell of the invention can 15 ganism useful in the methods and cultures provided herein be, for example, an acyl-ACP thioesterase, an acyl-CoA include, but are not limited to acyl-ACP thioesterases from thioesterase, or a hydroxylbenzoylthioesterase. For example, Desulfovibrio desulfuricans (e.g., Q312L1 GI: 123552742): a recombinant microorganism or host cell can be transformed Elusimicrobium minutum (e.g., ACC98705 GI:186971720): with a gene encoding an exogenous acyl-ACP thioesterase, Carboxydothermus hydrogenoformans (e.g., YP 359670 Such as a gene encoding a polypeptide that when queried GI:78042.959); Clostridium thermocellum (e.g., against the Pfam database, provides a match with Pfam YP 001039461 GI: 12597.5551); Moorella thermoacetica PFO1643 having a bit score of less than or equal to 20.3 (the (e.g., YP 431036 GI:835.91027); Geobacter metallire gathering cut-off for PFO1643). The acyl-ACP thioesterase ducens (e.g., YP 384688 GI:78222941); Salinibacter ruber gene can encode an acyl-ACP thioesterase from a higher plant (e.g., YP 444210 GI:83814393); Microscilla marina (e.g., species. Genes encoding acyl-ACP thioesterases derived 25 EAY28464 123988858); Parabacteroides distasonis (e.g., from higher plants can include, without limitation, genes YP 001303423 GI:150008680); Enterococcus faecalis encoding acyl-ACP thioesterases from Cuphea species (e.g. (e.g., ZP 03949391 GI:227519342); Lactobacillus plan Cuphea carthagenensis, Cuphea Wrightii (e.g., AAC49784.1 tarum (e.g., YP 003062170 GI:254555753); Leuconostoc GI: 1336008), Cuphea lanceolata (e.g., CAA54060, mesenteroides (e.g., YP 817783 GI: 116617412): Oenococ GI495227), Cuphea palustris, (e.g., AAC497.83.1 30 cus oeni (e.g., ZP 01544.069 GI: 118586629); Mycobacte GI: 1336006: AAC49179.1 GI:1215718); Cuphea hookeri rium smegmatis (e.g., ABK74560 GI:1 18173664); Mycobac ana (e.g., AAC72882.1 GI:3859830; AAC49269.1 terium vanbaalenii (e.g., ABM11638 GI:11995.4633): GI: 1292906; AAC72881.1 GI:3859828: AAC72883.1 Rhodococcus erythropolis (e.g., ZP 04385507 GI:3859832), Cuphea calophylla (e.g., ABB71580.1 GI:229491686; Rhodococcus opacus (e.g., YP 002778825 GI:81361963) or genes of various Cuphea species disclosed 35 GI:226361047), or any of those disclosed in the co-pending, in United States patent application publication US 2011/ commonly-assigned U.S. patent application Ser. No. 13/324, 0020883, incorporated by reference herein) or genes from 623 entitled “Prokaryotic Acyl-ACP Thioesterases for Pro other higher plant species. For example, a microorganism ducing Fatty Acids in Genetically Engineered Microorgan used in the methods and cultures disclosed herein can include isms, filed on Dec. 13, 2011, and which is incorporated a gene encoding an acyl-ACP thioesterase from species Such 40 herein by reference in its entirety. as but not limited to, Arabidopsis (XP 002885681.1 In additional embodiments, a gene encoding an acyl-CoA GI:297835598: NP 172327.1 GI:15223236); Arachis thioesterase can be introduced into a microorganism or host hypogaea (e.g., AB038556.1 GI: 133754634); Brassica spe cell and can be from a plant, animal, or microbial source. For cies (e.g., CAA52069.1 GI:435.011), Camellia oleifera (e.g., example, a gene encoding the TesA or TesB thioesterase of E. ACQ57189.1 GI:22.9358082); Cinnamomum camphorum 45 coli, or a variant thereof, for example, an acyl-CoA (e.g., AAC49151.1 GI: 1143156); Cocos nucifera, Glycine thioesterase Such as, but not limited to, a variant as disclosed max (e.g., ABD91726.1 GI:90192131); Garcinia man in WO 2010/075483, incorporated by reference herein in its gostana (e.g., AAB51525.1 GI: 1930081); Gossypium hirsu entirety, can be introduced into a microorganism or host cell. tum (e.g., AAD01982.1 GI:4104242); Helianthus annuus Also included are genes encoding proteins that when queried (e.g., AAQ08226 GI:33325244); Jatropha curcas (e.g., 50 against the Pfam database of protein families are identified as ABU96744.1 GI:156900676); Macadamia tetraphylla (e.g., members of Pfam PF02551 (acyl-CoA thioesterase), where ADA79524.1 GI:282160399); Elaeis oleifera (e.g., the bit score is equal to or greater than the gathering cut off AAM09524.1 GI:20067070): Elaeis guineensis (e.g., (20.7). AAD42220 GI:30962820); Oryza sativa (e.g., BAA83582.1 Alternately or in addition, the microorganism or host cell GI:5803272): Populus tomentosa (e.g., ABC47311 55 of the invention can include one or more genes encoding a GI:83778888); Umbellularia Californica (e.g., AAC49001.1 hydroxybenzoyl thioesterase, for example an exogenous GI:595955); Ulmus Americana (e.g., AAB71731.1 4-hydroxybenzoate thioesterase or 4-chlorobenzoate GI:2459533); and Zea mays (ACG41291.1 GI: 195643646), thioesterase. Genes encoding hydroxybenzoyl thioesterases or any of those disclosed in U.S. Pat. No. 5,455,167; U.S. Pat. that may be useful in a recombinant microorganism for pro No. 5,654.495; and U.S. Pat. No. 5,455,167; and in U.S. 60 ducing free fatty acids can include, for example, those dis Patent Appl. Pub. Nos. 2009/0298143 and 2011/0020883; all closed in the commonly-assigned U.S. patent application Ser. incorporated by reference herein in their entireties. Further No. 13/324,607 entitled “Genetically Engineered Microor included are acyl-ACP thioesterases from mosses (Bryo ganisms Comprising 4-Hydroxybenzoyl-CoA Thioesterases phyta). Such as, for example, Physcomitrella patens, (e.g., XP and Methods of Using Same for Producing Free Fatty Acids 001770108 GI:168035.219). These examples are not limiting 65 and Fatty Acid Derivatives', filed on Dec. 13, 2011, and with regard to the types or specific examples of acyl-ACP which is incorporated herein by reference in its entirety: thioesterase genes that can be used. 4-hydroxybenzoyl thioesterases from Bacillus species and US 9,096,834 B2 23 24 Geobacillus species; as well as 4-hydroxybenzoyl ris, Pyramimonas, Pyrobotrys, Scenedesmus, thioesterases of Acidiphilium, Bartonella, Rhodopseudomo Schizochytrium, Skeletonema, Spyrogyra, Stichococcus, Tet nas, Magnetospirillum, Burkholderia, Granulibacter, Rhizo raselmis, Viridiella, or Volvox species. bium, and Labrenzia species, or the like, or combinations In some embodiments, photosynthetic bacteria, including thereof. for example, green Sulfur bacteria, purple Sulfur bacteria, Acyl-ACP thioesterases typically can be active to some green nonsulfur bacteria, purple nonsulfur bacteria, or cyano degree on acyl-ACP substrates having a plurality of different bacteria are used. Cyanobacterial species that can be used for acyl chain lengths, but can have higher activity on (e.g., have production offatty acid products include, without limitation, a substrate preference for) one or more acyl-ACP substrates Agmenellum, Anabaena, Anabaenopsis, Anacystis, Aphani having particular acyl chain lengths than on other chain 10 zomenon, Arthrospira, Asterocapsa, Borzia, Calothrix, length substrates. In some examples, an enzyme referred to as Chamaesiphon, Chroococcus, Chlorogloeopsis, Chroococ having a substrate preference for particular acyl chain length cidiopsis, Chroococcus, Crinalium, Cyanobacterium, Substrates produces at least twice as much product of the Cyanobium, Cyanocystis, Cyanospira, Cyanothece, Cylin preferred Substrate or Substrates as it does from a non-pre drospermopsis, Cylindrospermum, Dactylococcopsis, Der ferred Substrate, and for example can produce at least three 15 mocarpella, Fischerella, Fremvella, Geitleria, Geitlerinema, times, at least four times, or at least five times as much product Gloeobacter, Gloeocapsa, Gloeothece, Halospirulina, Iyen from a preferred substrate as from a non-preferred substrate. gariella, Leptolyngbya, Limnothrix, Lyngbya, Microcoleus, For example, an acyl-ACP thioesterase may have a substrate Microcystis, Myxosarcina, Nodularia, Nostoc, Nostochopsis, preference for one or more of acyl-ACP substrates having Oscillatoria, Phormidium, Planktothrix, Pleurocapsa, acyl chain lengths of 8, 10, 12, 14, 16, 18, 20, 22, and/or 24 Prochlorococcus, Prochloron, Prochlorothrix, Pseudana carbons. Additionally or alternately, the acyl-ACP baena, Rivularia, Schizothrix, Scytoinema, Spirulina, thioesterase can hydrolyze one or more acyl-ACP substrates Stanieria, Starria, Stigonema, Symploca, Synechococcus, having an acyl chain length from 8 to 18 carbons, for example Synechocystis, Thermosynechococcus, Tolypothrix, Tri from 12 to 18 carbons. For example, the acyl-ACP chodesmium, Tichonema and Xenococcus. For example, the thioesterase can hydrolyze one or more acyl-ACP substrates 25 recombinant photosynthetic microorganism can be a Cyano having an acyl chain length from 12 to 16 carbons. Further bium, Cyanothece, or Cyanobacterium species, or further additionally or alternately, an acyl-ACP thioesterases of the alternatively, the recombinant photosynthetic microorganism present invention can, in Some embodiments, have its highest can be a Gloeobacter; Lyngbya or Leptolyngba species. Alter level of activity on an acyl-ACP substrate having an acyl natively, the recombinant photosynthetic microorganism can chain length of 12, 14, and/or 16 carbons. 30 be a Synechococcus, Synechocystis, or Thermosynechococ Microorganisms and Host Cells cus species. A number of cyanobacterial species are known A recombinant microorganism or host cell as provided and have been manipulated using molecular biological tech herein comprises a non-native nucleic acid molecule encod niques, including the unicellular cyanobacteria Synechocystis ing an LPAAT. The microorganism or host cell may be of sp. PCC6803 and Synechococcus elongates PCC7942, whose prokaryotic or eukaryotic origin, and include, but are not 35 genomes have been completely sequenced. limited to, photosynthetic organisms. “Photosynthetic organ In addition to photosynthetic microorganisms and host isms' are any prokaryotic or eukaryotic organisms that can cells, non-photosynthetic microorganisms and host cells Such perform photosynthesis. Photosynthetic organisms include as fungi and non-algal stamenophiles can be used. Oleagi higher plants (i.e., vascular plants), bryophytes, algae, and nous yeasts, including but not limited to Aspergillus niger, photosynthetic bacteria. The term "algae includes cyanobac 40 Yarrowia lypolytica, Cryptococcus curvatus, Cryptococcus teria (Cyanophyceae), green algae (Chlorophyceae), yellow terricolus, Candida species, Lipomyces Starkeyi, Lipomyces green algae (Xanthophyceae), golden algae (Chryso lipofer, Endomycopsis vernalis, Rhodotorula glutinis, and phyceae), brown algae (Phaeophyceae), red algae Rhodotorula gracilis can also be microorganisms and host (Rhodophyceae), diatoms (Bacillariophyceae), and “pico cells for use in the invention. Other fungi, including but not plankton’ (Prasinophyceae and Eustigmatophyceae). Also 45 limited to species of Aspergillus, Trichoderma, Neurospora, included in the term algae are members of the taxonomic Fusarium, Humicola, Rhizomucor, Kluyveromyces, Pichia, classes Dinophyceae, Cryptophyceae, Euglenophyceae, Mucor, Myceliophtora, Penicillium, Phanerochaete, Chry Glaucophyceae, and Prymnesiophyceae. Microalgae are uni sosporium, Saccharomyces, and Schizosaccharomyces, are cellular or colonial algae that can be seen as single organisms also encompassed as microorganisms and host cells for use only with the aid of a microscope. Microalgae include both 50 with the invention. Labyrinthulomycete species (e.g., Thraus eukaryotic and prokaryotic algae (e.g., cyanobacteria). tichytrium, Ulkenia, and Schizochytrium species) can also be Algae for use in the invention, include without limitation, microorganisms and host cells for use in the invention. microalgae, such as but not limited to, an Achnanthes, In some embodiments, the microorganism or host cell is a Amphiprora, Amphora, Ankistrodesmus, Asteromonas, bacterium, Such as, but not limited to, an Acetobacter, Acine Boekelovia, Borodinella, Botryococcus, Bracteococcus, 55 tobacter, Arthrobacter, Bacillus, Brevibacterium, Chroma Chaetoceros, Carteria, Chlamydomonas, Chlorococcum, tium, Chlorobium, Clostridium, Corynebacterium, Deino Chlorogonium, Chlorella, Chroomonas, Chrysosphaera, coccus, Delftia, Desulfo vibrio, Enterococcus, Escherichia, Cricosphaera, Crypthecodinium, Cryptomonas, Cyclotella, Kineococcus, Klebsiella, Lactobacillus, Lactococcus, Micro Dunaliella, Ellipsoidon, Emiliania, Eremosphaera, Ernodes coccus, Mycobacterium, Jeotgalicoccus, Paenibacillus, Pro mius, Euglena, Franceia, Fragilaria, Gloeothamnion, Hae 60 pionibacter, Pseudomonas, Rhodopseudomonas, Rhodo matococcus, Halocafeteria, Hymenomonas, Isochrysis, Lep bacter, Rhodococcus, Rhodospirillium, Rhodomicrobium, ocinclis, Micractinium, Monoraphidium, Nannochloris, Salmonella, Serratia, Shewanella, Stenotrophomonas, Strep Nannochloropsis, Navicula, Neochloris, Nephrochloris, tomyces, Streptococcus, Vibrio, or Zymomonas species. Nephroselmis, Nitzschia, Ochromonas, Oedogonium, Oocys The invention provides a recombinant microorganism tis, Ostreococcus, Pavlova, Parachlorella, Pascheria, Phaeo 65 comprising a non-native nucleic acid molecule that includes a dactylum, Phagus, Picochlorum, Platymonas, Pleurochrysis, nucleic acid sequence encoding an LPAAT and a non-native Pleurococcus, Prototheca, Pseudochlorella, Pseudoneochlo nucleic acid sequence that encodes a thioesterase. The nucleic US 9,096,834 B2 25 26 acid sequence encoding the LPAAT and the nucleic acid 98%, or 99% sequence identity to a polypeptide comprising sequence encoding the thioesterase can be present on the the amino acid sequence of SEQID NO: 18, 19, 20, or 21. same or different nucleic acid molecules. The recombinant Illustrative examples of recombinant microorganisms that microorganism can produce a fatty acid product, Such as a express a non-native LPAAT gene and a non-native fatty acid, fatty alcohol, fatty aldehyde, fatty acid ester, wax thioesterase gene include recombinant microorganisms such ester, or hydrocarbon (such as an alkane or alkene). The as but not limited to cyanobacteria that express a non-native recombinant host cell may comprise, e.g., any of the nucleic gene encoding an LPAAT having a C18. Substrate preference, acid sequences encoding an LPAAT described herein and such as an LPAAT having at least 85% identity to SEQ ID may comprise any of the nucleic acid sequences encoding a NO:2 (for example, at least 90% or at least 95% identity to thioesterase described herein (e.g., an acyl-ACP thioesterase, 10 SEQ ID NO:2) and a higher plant acyl-ACP thioesterase an acyl-CoA thioesterase, or a 4-hydroxybenzoyl having a substrate preference for one or more of a C12, C14, thioesterase). The recombinant host cell may comprise, e.g., or C16 acyl substrate, for example, the C12-preferring FatB1 any of the vectors described herein. In some embodiments, thioesterase of Umbellularia Californica (AAC49001.1 the nucleic acid sequence encoding the LPAAT gene is het GI:595955), the C14-preferring thioesterase of Cinnamo erologous with respect to the recombinant host cell, and can 15 mum camphorum (Q39473.1 GI:8469216), the N-terminally be an LPAAT gene derived from any species, including plant, truncated C16-preferring acyl-ACP thioesterase of Elaeis animal, or microbial species. Alternatively, the LPAAT gene oleifera (SEQ ID NO:56), the C16-preferring acyl-ACP may be homologous with respect to the host organism. For thioesterase of Elaeisguineensis (SEQID NO:57) or a trun example, the non-native LPAAT gene may bean LPAAT gene cated version thereof, the C14 and C16-preferring FatB2 that is native to the host microorganism and is introduced into thioesterase of Cuphea palustris (AAC49180.1 GI:1215720), the recombinant microorganism in an expression cassette that Cc 1 FatB1 thioesterase of Cuphea carthagenensis, or a vari allows regulated expression or overexpression of the endog ant thereof (e.g., the truncated polypeptide of SEQ ID enous LPAAT gene. Alternatively, the LPAAT gene may be NO:44), or acyl-ACP thioesterases of other Cuphea species endogenous to the microorganism and a heterologous pro (see, for example, United States patent application publica moter may be introduced into the host microorganism Such 25 tion US 2011/0020883), including, for example, the N-termi that it becomes juxtaposed with and operably linked to the nally truncated C14, C16-preferring Cd1 FatB1 acyl-ACP endogenous LPAAT gene. The nucleic acid sequence encod thioesterase of SEQID NO:54 or a variant thereof, the N-ter ing the thioesterase is can be derived from a gene that is minally truncated C16-preferring Cpl FatB1 thioesterase of homologous or heterologous with respect to the recombinant SEQ ID NO:55 or a variant thereof, or the C14, the C16 host cell. A thioesterase gene can be derived from an animal, 30 preferring C. hookeriana FatB1 thioesterase (Q39513.1 plant, or microbial species. GI:8469217). Also specifically considered are thioesterases The recombinant host cell can comprise a non-native gene having at least 85% identity, for example at least 90% or at encoding an LPAAT with at least 40%, 45%, 50%, 55%, 60%, least 95% identity to the aforementioned Umbellularia, Cin 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, namomum, Cuphea and Elaeis acyl-ACP thioesterases, in 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% 35 which the acyl-ACP thioesterases have C12, C14, and/or C16 sequence identity to SEQID NO: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, Substrate specificities. 12, 13, 14, 15, 16, or 17, or a functional fragment of the Further illustrative and non-limiting examples include LPAAT of the listed sequences. For example, the nucleic acid recombinant microorganisms that express a non-native sequence can encodes the LPAAT of SEQID NO: 2, 3, 4, 5, LPAAT gene encoding an LPAAT having a C18 substrate 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, or 17. For example, the 40 preference in combination with any of the acyl-ACP recombinant host cell can comprise a non-native gene encod thioesterase provided above (e.g., the Umbellularia, Cinna ing an LPAAT with at least 85%, 86%, 87%, 88%. 89%, 90%, momum, Cuphea and Elaeis acyl-ACP thioesterases, includ 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% ing acyl-ACP thioesterases of SEQID NO:54, 55, 56, and 57 sequence identity to SEQ ID NO: 2, and can encode an and variants thereof). LPAATs having a C18 substrate pref LPAAT with at least 90%, 95%, 96%, 97%, 98%, or 99% 45 erence include, for example, the LPAATs disclosed herein sequence identity to SEQID NO: 2. In further examples, the above having at least 85%, for example at least 90% or at least recombinant host cell can comprise a non-native gene encod 95% to microsomal LPAATs such as SEQID NO 22, 23, 24, ing an LPAAT with at least 85%, 86%, 87%, 88%. 89%, 90%, 25, and 26. Other LPAATs specifically considered for expres 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sion in a host microorganism in combination with C12, C14. sequence identity to SEQID NO:3,4,5,6,7,8,9, 10, 11, 12, 50 and/or C16 preferring acyl-ACP thioesterases including 13, 14, or 15. In yet other examples, the recombinant host cell those of SEQ ID NO:54, 55, 56, or 57 and thioesterases can comprise a non-native gene encoding an LPAAT with at having at least 85%, 90%, or 95% identity to SEQID NO:54, least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 55, 56, or 57 or truncations thereof, include cyanobacterial 94%.95%,96%.97%.98%, or 99% sequence identity to SEQ orthologs of the Synechocystis C18-preferring LPAAT of ID NO: 16 or 17. In certain embodiments, the LPAAT 55 SEQ ID NO:2, including but not limited to LPAATs of SEQ encoded by the non-native nucleic acid molecule is derived ID NOs: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15, and from, e.g., a plant, fungal, or algal species, and may be LPAATs having at least 85%, at least 90%, or at least 95% to derived from an LPAAT that is native to the host microorgan these or other cyanobacterial orthologs of the LPAAT ism. In some embodiments, the recombinant host cell encoded by the Synechocystis sl11752 gene (SEQID NO:1). expresses an LPAAT with at least 40%, 45%, 50%, 55%, 60 In some examples, the recombinant host cell can comprises 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, a nucleic acid sequence encoding an LPAAT with at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% 30%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, sequence identity to a polypeptide comprising the amino acid 85%. 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, sequence of SEQID NO: 18, 19, 20, or 21, or to a functional 95%,96%.97%, 98%, or 99% sequence identity to the nucle fragment of the polypeptide. For example, the recombinant 65 otide sequence of SEQ ID NO: 1, or to a fragment of the host cell can express an LPAAT with at least 85%. 86%, 87%, nucleotide sequence that encodes a functional fragment of the 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, LPAAT. In certain embodiments, the nucleic acid sequence US 9,096,834 B2 27 28 encoding the LPAAT comprises the nucleotide sequence of Without limiting the invention to any particular mecha SEQID NO: 1. In some embodiments, the recombinant host nisms, it is considered that overexpression of an LPAAT gene cell is a photosynthetic host cell, and the non-native nucleic may result in significantly increased production of membrane acid sequences encoding the LPAAT and/or the thioesterase or storage lipids. Excess membrane or storage lipids can be are codon optimized for expression in the host cell. 5 acted on by polypeptides having lipolytic activity (such as, The recombinant host cell expressing an LPAAT gene in e.g., one or more lipases or amidases) to release additional combination with a thioesterase gene produces a greater fatty acids (that can optionally be further converted to fatty amount of a fatty acid product than a control host cell that acid derivatives), thereby further increasing the amount of does not express the LPAAT gene. In some embodiments, the fatty acid products made by the recombinant microorganism amount of fatty acid product produced by a culture of the 10 recombinant host cell expressing a non-native gene encoding (see, for example, the model provided in FIG. 2). an LPAAT and a non-native gene encoding a thioesterase is at The use of genes encoding polypeptides having lipolytic least 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 65%, activity in microorganisms for the production of free fatty 70%, 75%, 80%, 90%, 95%, 100%, 125%, 150%, 175%, acids is disclosed in commonly-assigned U.S. patent appli 200%, 225%, 250%, 275%, 300%, 325%, 350%, 375%, 15 cation Ser. No. 13/324,653 entitled “Production of Free Fatty 400%, 425%, 450%, 475%, 500%, 525%, 550%, 575%, Acids and Fatty Acid Derivatives by Recombinant Microor 600%. 625%, 650%. 675%, 700%, 725%, 750%, 775%, ganisms Expressing Polypeptides Having Lipolytic Activity.” 800%, 825%, 850%, 875%, 900%, 925%, 950%, 975%, or filed on Dec. 13, 2011, and which is incorporated herein by 1000% greater than the amount of wax ester produced by a reference in its entirety. The polypeptide having lipolytic control host cell that does not express the non-native LPAAT activity can be for example a lipase, e.g., that liberates a fatty gene. acid from a glycerolipid (including a monoglyceride, a dig In some embodiments, the recombinant host cell express lyceride, a triglyceride, a phospholipid, a galactolipid, etc.) or ing an LPAAT and a thioesterase produces a greater amount can be an amidase. For example, the recombinant microor of a fatty acid product than a control host cell that does not ganism can include a non-native gene encoding a lipase, Such express the LPAAT and the thioesterase. In some embodi 25 as but not limited to a lipase that is a member of a Pfam ments, the amount of a fatty acid product produced by a belonging to the AB Hydrolase Pfam clan (CL0028). For culture of the recombinant host cell expressing an LPAAT and example, a non-native gene encoding a polypeptide having a thioesterase is at least 10%, 15%, 20%, 25%, 30%, 40%, lipolytic activity can encode a lipase that includes a LipA 50%, 60%, 65%, 70%, 75%, 80%, 90%, 95%, 100%, 125%, domain identified as conserved protein domain COG 1075, or 150%, 175%, 200%, 225%, 250%, 275%, 300%, 325%, 30 is included in the protein family Pfam PFO1674 (Lipase 2); a 350%, 375%, 400%, 425%, 450%, 475%, 500%, 525%, non-native nucleic acid molecule that encodes a lipase that 550%, 575%, 600%. 625%, 650%. 675%, 700%, 725%, includes a Lipase 3 domain identified as conserved protein 750%, 775%, 800%, 825%, 850%, 875%, 900%, 925%, domain COG3675, or is included in the protein family Pfam 950%,975%, or 1000% greater than the amount of fatty acid PFO1764 (Lipase 3); a non-native nucleic acid molecule that product produced by a control host cell that does not express 35 encodes a lipase that is included in the protein family Pfam the LPAAT and the thioesterase. PF07819 (PGAP1); or a non-native nucleic acid molecule Additional Host Cell Modifications for Free Fatty Acid (FFA) that encodes a polypeptide that is included in any of the Production protein families Pfam PF03583, Pfam PF00151 (Lipase), Further modifications to the recombinant microorganisms Pfam PF00561 (Ab hydrolase 1), Pfam PF02230 (Ab hydro or host cells of the invention can be made to affect free fatty 40 lase 2), Pfam PF07859 (Ab hydrolase 3), Pfam PF08386 (Ab acid production. For example, the recombinant microorgan hydrolase 4), Pfam PF12695 (Ab hydrolase 5), Pfam ism can include one or more additional non-native genes PF 12697 (Ab hydrolase 6), Pfam PF12715 (Ab hydrolase 7), whose expression increases the production of a fatty acid Pfam PF04083 (Ab hydro lipase). In some embodiments, an product. The additional gene may be encoded by a nucleic exogenous lipase gene introduced into a microorganism can acid molecule that is the same as the nucleic acid molecule 45 encode a protein with an amino acid sequence having an that encodes the LPAAT and/or the nucleic acid molecule that E-value parameter of 0.01 or less when queried using the encodes the thioesterase, or the additional gene may be Pfam Profile HMM for any of Pfam PF01674, Pfam PF encoded by a separate nucleic acid molecule. Where two or O1764, Pfam PF07819, Pfam PF03583, and/or Pfam more genes are encoded by the same nucleic acid molecule PF00151. Further, the recombinant microorganism can (e.g., on the same expression vector), the expression of each 50 include a non-native gene encoding an amidase having lipoly gene may optionally be independently regulated by a same or tic activity, such as but not limited to an amidase that recruits a different promoter and/or enhancer. In certain embodi to Pfam PF01425 (Amidase) with a bit score greater than the ments, the additional gene may increase the rate and/or level gathering cutoff of 20.1 and can catalyze the release of fatty of free fatty acid or fatty acid derivative production. acids from lipids. For example, the recombinant host cell expressing an 55 Additionally or alternately contemplated are recombinant LPAAT, and, preferably, a thioesterase, can further express at microorganisms that are engineered to include gene regula least one additional recombinant or exogenous gene that tory sequences that induce or increase expression of an encodes a polypeptide having lipolytic activity. For example, endogenous lipase gene. For example, a microorganism can a recombinant microorganism or host cell of the invention can be engineered such that a heterologous promoter is inserted include one or more genes encoding one or more polypep 60 upstream of a coding region of an endogenous lipase gene. tides having lipolytic activity, where the polypeptide(s) hav The heterologous promoter can replace an endogenous pro ing lipolytic activity are capable of producing free fatty acids moter and/or can be inserted upstream or downstream of the from membrane lipids or storage lipids, e.g., phospholipids, endogenous promoter that regulates expression of the endog triacylglycerols, diacylglycerols, monoacylglycerols, or the enous lipase gene, for example using homologous recombi like, or combinations thereof. The polypeptides having 65 nation or site-specific recombination. The heterologous pro lipolytic activity can be, for example, lipases, esterases, culti moter can be a constitutive promoter oran inducible promoter nases, or amidases. that increases expression of the endogenous lipase gene. US 9,096,834 B2 29 30 Further modifications to the recombinant microorganisms enzymes are inactivated or down-regulated, and/or such that or host cells of the invention can be made to affect free fatty the enzymes themselves that are operative on Such pathways acid production. For example, a recombinant microorganism are inhibited. Examples include, but are not limited to, or host cell of the invention can include one or more exog enzymes involved in glycogen, starch, or chrysolaminarin enous nucleic acid molecules that encode a polypeptide that synthesis, including glucan synthases and/or branching participates in the synthesis of a fatty acid, including, but not enzymes. Other examples include enzymes involved in PHA limited to, an acetyl-CoA carboxylase, a malonyl CoA: ACP biosynthesis Such as acetoacetyl-CoA synthase and PHA Syn transacylase, or a beta-ketoacyl-ACP synthase. thase. Additionally or alternatively, the recombinant microorgan Modifications for Fatty Acid Derivative Production ism or host cell can comprise a modification of an endogenous 10 The recombinant microorganisms or host cells of the nucleic acid molecule that encodes, e.g., an acyl-CoA syn invention can include additional modifications for the pro thetase, acyl-ACP synthetase, acyl CoA dehydrogenase, duction offatty acid derivatives such as, e.g., fatty aldehydes, glycerol-3-phosphate dehydrogenase, acetaldehyde CoA fatty alcohols, fatty acid esters, wax esters, and hydrocarbons, dehydrogenase, pyruvate dehydrogenase, acetate kinase, and including alkanes and alkenes. In some embodiments, the the like, and combinations thereof. In certain embodiments, 15 recombinant microorganism or host cell comprises one or the modification down-regulates the endogenous nucleic acid more nucleic acid molecules encoding an exogenous acyl and includes partial, Substantial, or complete deletion, silenc CoA reductase, carboxylic acid reductase, and/or acyl-ACP ing, or inactivation of the nucleic acid or its regulatory ele reductase and can produce a fatty alcohol. mentS. For example, for the production of fatty aldehydes, which In some examples, the host cell can have attenuated expres can optionally be further converted to products such as fatty sion of an endogenous gene encoding an acyl-ACP synthetase alcohols, wax esters, or alkanes, a transgenic microorganism which participates in the recycling of fatty acids into lipids. as provided herein can include an exogenous gene(s) that The endogenous acyl-ACP synthetase gene can be, for encodes an aldehyde-forming reductase. Such as, for example, downregulated by deletion or mutation of the pro example, an aldehyde-forming acyl-CoA reductase, an alde moter, or the protein-encoding of the gene can be internally 25 hyde-forming acyl-ACP reductase, or a carboxylic acid deleted or disrupted, for example, by insertional mutagenesis. reductase. Genes or portions of genes that are listed in Gen Alternatively, the entire acyl-CoA synthetase gene can be Bank and other genetic databases and that are predicted to deleted, for example, by homologous recombination or other encode proteins that are homologous to known acyl-CoA genome modification techniques. In yet further alternatives, reductases that produce fatty aldehydes, referred to herein as gene knockdown constructs Such as but not limited to 30 'aldehyde-generating fatty acyl-CoA reductases, can be ribozyme, antisense, or RNAi constructs can be introduced introduced into various microorganisms in order to test for the into the host cell to attenuate expression of the endogenous production of specific fatty aldehydes or fatty alcohols pro acyl-ACP synthetase gene. duced therefrom. Nonlimiting examples of fatty aldehyde Additionally or alternatively, the recombinant microorgan generating acyl-CoA reductases include the Acrl gene of ism or host cell can be modified Such that one or more genes 35 Acinetobacter baylvi (Accession U77680, GI:1684885), the that encode beta-Oxidation pathway enzymes have been inac AcrM-1 gene of Acinetobacter sp. M-1 (Accession YP tivated and/or down-regulated, and/or such that the enzymes 001086217, GI:18857900), and the luxC and luxE genes of themselves that are operative on Such beta-Oxidation path various photoluminescent bacteria, e.g., an Altermonas, Pho ways may be inhibited. This could prevent the degradation of tobacterium, Shewanella, Vibrio, or Xenorhabdus species. fatty acids released from acyl-ACPs, thus enhancing the yield 40 The enzymes encoded by these and other genes identified, for offatty acid products. In cases where the desired products are example, by sequence homology or protein domain can be medium-chain fatty acids, the inactivation and/or down-regu tested to determine their Substrates and products using assays lation of genes that encode acyl-CoA synthetase and/or acyl know in the art. CoA oxidase enzymes that preferentially use these chain Nonlimiting examples of carboxylic acid reductases that lengths as Substrates can be made. Mutations in the genes 45 can be used in the invention include the Nocardia CAR gene encoding medium-chain-specific acyl-CoA synthetase and/ (Accession AY495697: GI:40796034) and homologs thereof, or medium-chain-specific acyl-CoA oxidase enzymes, such some of which are disclosed in US2010/0105.963, incorpo that the activity of the enzymes are diminished, may addition rated by reference herein. ally or alternately be effective in increasing the yield of pro In some examples, the host cell can include a non-native duced and/or released fatty acid products. Mutations in the 50 gene encoding an aldehyde-forming acyl-ACP reductase genes can be introduced either by recombinant or non-recom such as but not limited to any of those disclosed in WO binant methods. 2009/140696 and WO 2011/066137. For example, the recom Genes may be targeted specifically by disruption, deletion, binant host cell may comprise an aldehyde-forming acyl generation of antisense sequences, generation of ribozymes, ACP reductase that has at least 50%, 60%, 70%, 80%, 90% or RNAi, meganuclease genome modification, and/or other 55 95% sequence identity to an aldehyde-forming reductase, recombinant approaches. Inactivation of the genes can addi e.g., as disclosed in WO 2009/140696 or WO 2011/066137. tionally or alternately be accomplished by random mutation Such as, for example, any of the reductases having the acces techniques such as exposure to UV and/or chemical sion numbers AAM82647: AAM82647; BAD78241: mutagens, and the resulting genes and/or enzymes can be ABA22149; BAB76983; ZP 03763674; ACL42791; screened for mutants with the desired activity. The proteins 60 ZP 01628095; ZP 01619574; YP 001865324; themselves can be inhibited by intracellular generation of YP 721978; NP 682102; YP 00151.8341; appropriate antibodies, intracellular generation of peptide YP 002371 106: ZP 05027136; ZP 03273554; inhibitors, or the like, or some combination thereof. NP 442146; ZP 01728620; ZP 0503.9135; In additional embodiments, a recombinant microorganism YP 001802846; NP 926091; YP 001660322; or host cell of the invention comprises can be modified such 65 ZP 00516920; CAO90781; ZP 01085337; that one or more genes that encode storage carbohydrate YP 001227841; ABD96327; NP 897828: and/or polyhydroxyalkanoate (PHA) biosynthesis pathway YP 001224378; ABD96480; ZP 01123215; ABB92249; US 9,096,834 B2 31 32 ZP 01079773; YP 377636; NP 874926; NP 895058; DGATs. Some wax synthase peptides can catalyze other reac ABD96274; ABD964.42; ZP 01469469; ZP 05045052; tions as well, for example Some wax synthase peptides will YP 001014416: YP 001010913; YP 38.1056; accept short chain acyl-CoAS and short chain alcohols to YP 001550421; NP 892651: YP 001090783; produce fatty esters. Methods to identify wax synthase activ ZP 01472595; YP 293,055; ZP 05138243; YP 731192; ity are provided in U.S. Pat. No. 7,118.896, which is herein YP 001483815; YP 001008982; YP 473896; incorporated by reference. Nonlimiting examples of wax syn YP 478.638; or YP 397030. In some embodiments the thases that can be encoded by an exogenous nucleic acid recombinant host cell includes an exogenous gene encoding molecule introduced into a recombinant microorganism as an aldehyde-forming acyl-ACP reductase, where the alde disclosed herein include the bifunctional wax ester synthase/ hyde-forming acyl-ACP reductase can be from a cyanobac 10 acyl-CoA:diacylglycerol acyltransferase of Simmondsia terial species, and may be from the same species as the host microorganism, or may be from a different species. Alterna chinensis (AAD38041), the wax synthase of Acinetobacter tively, a cyanobacterial host can be engineered to overexpress sp. strain ADP 1 (CAG67733), Pseudomonas aeruginosa an endogenous acyl-ACP reductase gene. (AAG06717), Arabidopsis thaliana (Q93ZR6), Alcanivorax For the production offatty alcohols, a recombinant micro 15 (EDX90960), Rhodococcus opacus (YP 002782647), organism as provided herein can include an exogenous gene Homo sapiens (Q6E213), Mus musculus (Q6E1M8), or Petu encoding an alcohol-forming acyl reductase such as bfar from niaxhybrida (AAZ08051). Bombyx minori, far from Simmondsia chinensis, an acyl In some embodiments, the recombinant microorganisms or CoA reductase from Titicum aestivum, mfarl of Mus muscu host cells of the invention comprise at least one nucleic acid lus, mfar2 from Mus musculus, hfar from H. sapiens, molecule encoding an exogenous fatty acid decarboxylase or FARXIII of Ostrinia scapulalis, MS2 of Z. mays, the putative an exogenous fatty aldehyde decarbonylase, or additionally at fatty acyl-coA reductase of Oryza sativa (GenBank accession least one exogenous nucleic acid molecule encoding an exog BAC84377) or MS2, FAR4, FARE, or CER4 of Arabidopsis enous acyl-CoA reductase, carboxylic acid reductase, or thaliana. An alcohol-forming fatty acyl-CoA reductase can acyl-ACP reductase, and can produce an alkane and/or alk also be a prokaryotic enzyme. Such as for example, those 25 ene. Alkanes and alkenes produced by the recombinant having Genbank accession numbers AAC45217 (Acineto microorganisms or host cells of the invention can, for bacter baylvifatty acyl-CoA reductase), YP 047869 (Acine example, have chain lengths of 7, 9, 11, 13, 15, 17, 19, 21, tobacter sp. ADP1 fatty acyl-CoA reductase), BAB85476 and/or 23 carbons, including, for example, chain lengths of 7. (Acinetobacter sp. M-1 acyl coenzyme A reductase), 9, 11, 13, 15, and/or 17 carbons, or chain lengths of 7, 9, 11, YP 001086217 (Acinetobacter baumanni ATCC 17978 30 acyl coenzyme A reductase), YP 580344 short-chain dehy 13, and/or 15 carbons, or chain lengths of 11, 13, and/or 15 drogenase/reductase SDR Psychrobacter cryohalolentis carbons. K5), YP 001280274 (Psychrobacter sp. PRwf-1 short-chain Additionally, the recombinant microorganisms or host dehydrogenase/reductase SDR), the acyl reductase of cells of the invention that produce a fatty alcohol, fatty alde Marinobacter algicola DG893 (Accession ZP 01892457), 35 hyde, fatty acid ester, wax ester, or hydrocarbons, including the short chain acyl dehydrogenase of Marinobacter aquae an alkane or an alkene, may optionally include a nucleic acid olei Maqu 2507 (YP 95.9769) Marinobacter aquaeolei molecule encoding an exogenous acyl-CoA synthetase, or VT8 Maqu 2220 (YP 959486), Hahella cheiuensis Heh may be engineered to have upregulated expression of an 05075 (YP 436.183), Marinobacter adhaerens HP15 810 endogenous acyl-CoA synthetase gene. (ADP96574), or an acyl reductase of an Oceanobacter spe 40 Additionally, the recombinant host cell may optionally be cies (e.g., RED65 09894, Accession EAT 13695). Alcohol engineered to express an exogenous transmembrane trans forming reductases may include those that are able to use porter to facilitate secretion of one or more fatty acid prod acyl-ACP as a Substrate, as disclosed in the co-pending, com ucts. For example, the recombinant host cell can include a monly-assigned U.S. patent application No. 61/539,640 non-native gene encoding an ATP-binding cassette (ABC) entitled “Fatty Alcohol-Forming Acyl-ACP Reductases”. 45 transporter or an RND pump. In some embodiments, the filed on Sep. 27, 2011, and which is incorporated herein by transporter is at least 80% identical in sequence to a trans reference in its entirety. porter protein encoded by an Arabidopsis genes CER5, Alternatively or in addition, the recombinant microorgan WBC11, AtMRPS, AmiS2 and AtPGP1, or fatty acid trans ism or host cell comprises one or more nucleic acid molecules porter (FATP) genes from Saccharomyces, Drosophila, encoding an exogenous acyl-CoA reductase, carboxylic acid 50 mycobacterial species, or mammalian species. reductase, and/or acyl-ACP reductase, and an exogenous wax Also included are genes encoding variants of these and synthase and can produce a wax ester. Wax esters include an other naturally-occurring enzymes that participate in the Syn A chain and a B chain linked through an ester bond, one or thesis offatty acid products having at least 65% identity to the both of which can be derived from a fatty acid generated by referenced or naturally-occurring proteins, in which the activ the recombinant microorganisms or host cells of the inven 55 ity of the enzyme is not substantially reduced with respect to tion. Wax esters produced by the recombinant microorgan the wild-type or above-referenced enzyme. isms or host cells of the invention include, e.g., A chain The above-described recombinant host cells may be used lengths of from 8 to 24 carbons, for example, 12 to 18 car in any of the methods of producing a fatty acid product as bons, or 12 to 16 carbons, and/or B chain lengths of from 8 to described herein. 24 carbons, for example, 12 to 18 carbons, or 12 to 16 car 60 Nucleic Acid Molecules bons. For example, the wax esters can have A+B chain lengths The nucleic acid molecules and polypeptides described including, but not limited to, of 16 to 48 carbons, 24 to 36 herein can be used in any of the methods of the invention, and carbons, or 24 to 32 carbons. may be included in any of the vectors or recombinant micro Wax synthases include polypeptides having enzyme clas organisms of the invention. Nucleic acid molecules compris sification number EC 2.3.1.75, as well as any other peptide 65 ing sequences that encode LPAATs are provided for use in capable of catalyzing the conversion of an acyl-thioester to host microorganisms and methods for producing fatty acid fatty esters, e.g., Some acyltransferases, including some products, including free fatty acids, fatty aldehydes, fatty US 9,096,834 B2 33 34 alcohols, fatty acid esters, wax esters, alkanes, and/or alk sequence identity to the amino acid sequence of SEQ ID enes. A nucleic acid molecule as disclosed herein can be NO:2, can have at least 90% sequence identity to the amino isolated and/or purified. acid sequence of SEQ ID NO:2, or can have at least 95% In some embodiments, expression in a host microorganism sequence identity to the amino acid sequence of SEQ ID (such as a recombinant microorganism that expresses a non NO:2. native gene encoding a thioesterase) of a recombinant nucleic Additionally or alternately, an isolated nucleic acid mol acid molecule or sequence encoding an LPAAT as described ecule useful in the methods and microorganisms of the inven herein results in a higher production level of a fatty acid or a tion can comprise a nucleic acid sequence that encodes an fatty acid derivative by the host microorganism than the pro LPAAT, where the nucleic acid sequence has at least 30%, duction level in a control microorganism, where the control 10 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, microorganism is cultured under the same conditions and is 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, Substantially identical to the microorganism expressing the 96%, 97%, 98%, or 99% sequence identity to the nucleotide non-native nucleic acid molecule or sequence encoding the sequence of SEQID NO:1, or to a fragment of the nucleotide LPAAT in all respects, with the exception that the control sequence of SEQID NO:1 that encodes a functional fragment microorganism does not express a non-native nucleic acid 15 of the LPAAT of SEQID NO:2. In some embodiments, the molecule that encodes an LPAAT. In particular embodiments invention provides an isolated or recombinant nucleic acid the recombinant microorganism can be a photosynthetic molecule that encodes an LPAAT, where the nucleic acid microorganism. sequence has at least 85%, at least 90%, or at least 95% In further embodiments, expression in a host microorgan sequence identity to the nucleotide sequence of SEQ ID ism of a non-native nucleic acid sequence encoding an NO:1, or to a fragment of the nucleotide sequence of SEQID LPAAT and a non-native nucleic acid sequence encoding a NO:1 that encodes a functional fragment of the acyl-ACP wax thioesterase as described herein results in a higher production ester synthase of SEQID NO:2. In some embodiments, the level of a fatty acid product by the host microorganism than invention provides an isolated nucleic acid molecule that the production level in a control microorganism, where the comprises the nucleic acid sequence of SEQID NO:1 control microorganism is cultured under the same conditions 25 An isolated or recombinant nucleic acid molecule encod and is Substantially identical to the microorganism expressing ing an LPAAT can comprise a nucleic acid sequence that the recombinant nucleic acid molecule(s) or sequences in all encodes a polypeptide having LPAAT activity that has at least respects, with the exception that the control microorganism 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, does not express a thioesterase and does not express a non 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, native nucleic acid sequences encoding an LPAAT. In par 30 96%, 97%, 98%, or 99% sequence identity to the amino acid ticular embodiments, the recombinant microorganism can be sequence of a prokaryotic LPAAT, such as but not limited to a photosynthetic microorganism. plsCofE. coli (SEQID NO:16) and orthologs thereofinother The recombinant host cells or microorganisms can include, bacterial species, or the LPAAT of Marinobacter adhaerens for example, an isolated nucleic acid molecule encoding a HP15 (Genbank accession ADP99136, gene identifier polypeptide having LPAAT activity, in which the polypeptide 35 311696263; SEQID NO:17) and orthologs in other bacterial comprises an amino acid sequence having at least 40%, 45%, species. For example, in some instances an isolated or recom 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, binant nucleic acid molecule encoding an LPAAT can com 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, prise a nucleic acid sequence that encodes a polypeptide 98%, or 99% sequence identity to the amino acid sequence of having LPAAT activity that has at least 85%, 90%, 95%, or SEQID NO: 2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 15, 16, 17, 40 99% sequence identity to the amino acid sequence of a 18, 19, 20, 21, 22, 23, 24, 25, or 26, or to a functional fragment prokaryotic LPAAT, such as but not limited to plsC of E. coli of any of the provided amino acid sequences. (SEQ ID NO:16) and orthologs thereof in other bacterial An isolated or recombinant nucleic acid molecule encod species, or the LPAAT of Marinobacter adhaerens HP15 ing an LPAAT can comprise a nucleic acid sequence that (Genbank accession ADP99136, gene identifier 311696263; encodes a polypeptide having LPAAT activity that has at least 45 SEQID NO:17) and orthologs in other bacterial species. 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, Further, isolated or recombinant nucleic acid molecule 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, encoding an LPAAT can comprise a nucleic acid sequence 96%, 97%, 98%, or 99% sequence identity to the amino acid that encodes a polypeptide having LPAAT activity that has at sequence of SEQID NO: 2,3,4,5,6,7,8,9, 10, 11, 12, 13, least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 14, 15, 16, or 17, or to a portion thereof, for example, to a 50 85%. 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, functional fragment of the polypeptide. For example, a 95%,96%.97%, 98%, or 99% sequence identity to the amino nucleic acid sequence can encode a polypeptide having acid sequence of a eukaryotic LPAAT. Such as a plastidal LPAAT activity that can include an amino acid sequence LPAAT that can be derived from a plant or algal species, such having at least 85% sequence identity to the amino acid as, for example the LPAAT of Oryza sativa (SEQID NO:18), sequence of SEQIDNO:2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 55 the LPAAT of Glycine max (SEQID NO:19); the LPAAT of 15, 16, or 17 or a functional fragment thereof, or having at Brassica napus (SEQ ID NO:20), and the LPAAT of Arabi least 90% sequence identity to the amino acid sequence of dopsis thaliana (SEQID NO:21), or to a functional fragment SEQID NO: 2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 15, 16, or thereof, or to other plastidal LPAATs, including functional 17, or a functional fragment thereof, or, for example, having fragments thereof. at least 95% sequence identity to the amino acid sequence of 60 Additionally, an isolated or recombinant nucleic acid mol SEQID NO:2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 15, 16, or ecule encoding an LPAAT can comprise a nucleic acid 17, or a functional fragment thereof, and can comprise a sequence that encodes a polypeptide having LPAAT activity nucleotide sequence that encodes a polypeptide comprising that has at least 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, all or an active fragment of the amino acid sequence of SEQ 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, ID NO: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, or 17. 65 94%. 95%, 96%.97%, 98%, or 99% sequence identity to the In some examples, a nucleic acid sequence that encodes a amino acid sequence of a eukaryotic microsomal LPAAT, polypeptide having LPAAT activity can have at least 85% including but not limited to SEQID NO:22 of Zea mays, SEQ US 9,096,834 B2 35 36 ID NO:23 of Arabidopsis thaliana, SEQID NO:24 of Arabi least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, dopsis thaliana, SEQID NO:25 of Brassica napus, and SEQ 100%, 200%, 300%, 400%, 500%. 600%, 700%, 800%, ID NO:26 of Prunus dulcis, or to a fragment thereof demon 900%, or 1000% in comparison to the activity of the LPAAT strating LPAAT activity. For example, a microorganism as from which the variant is derived. In certain embodiments, the provided herein can include a non-native gene encoding an 5 amount of fatty acid or fatty acid derivative produced by a LPAAT having at least 85%, at least 90%, at least 95%, at least host cell expressing the fragment or variant is at least 5%, 96%, at least 97%, at least 98%, at least 99%, or 100% 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, identity to any of SEQID NO:22, SEQID NO:23, SEQ ID 200%, 300%, 400%, 500%. 600%, 700%, 800%, 900% or NO:24, SEQID NO:25, or SEQID NO:26. 1000% of the amount offatty acid product made by a host cell In some embodiments, the invention encompasses nucleic 10 expressing the LPAAT from which the fragment or variant is acid molecules encoding deletion mutants of an LPAAT derived. where one or more amino acids have been deleted from the protein. For example, the encoded polypeptide can lack at Any of the provided nucleic acid molecules can optionally least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, or 80 further comprise an additional nucleic acid sequence of at amino acids from the N- and/or C-terminus and can have an 15 least 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, amino acid sequence at least 40%, 45%, 50%, 55%, 60%, 450, 500, 550, 600, 700, 800, 900, 1000, or 1500 nucleotides 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, from a photosynthetic organism. 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% Other Modifications identical to the corresponding amino acid sequence of SEQ The invention also provides further variants of the nucle ID NO: 2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20 otide sequences of the invention. In some embodiments, the 19, 20, 21, 22, 23, 24, 25, or 26. In some examples, the deleted nucleotide sequence variants encode fragments or variants of sequences may include targeting sequences, for example, at the polypeptides as described herein. In some embodiments, least a portion of a chloroplast transit peptide, at least a the nucleotide sequence variants are naturally-occurring. In portion of a mitochondrial targeting sequence, at least a por other embodiments, the nucleotide sequence variants are non tion of an endoplasmic reticulum targeting sequence, etc. 25 naturally-occurring, such as those induced by various Further, the invention provides nucleic acid molecules mutagens and mutagenic processes. In certain embodiments, encoding variants of naturally-occurring LPAAT amino acid the nucleotide sequence variants are a combination of natu sequences, such as but not limited to variants of LPAAT rally- and non-naturally-occurring. A given nucleic acid sequences of SEQID NO: 2,3,4,5,6,7,8,9, 10, 11, 12, 13, sequence may be modified, for example, according to stan 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 26 or fragments 30 dard mutagenesis or artificial evolution or domain Swapping thereof. Variants may be naturally occurring, or non-natu methods to produce modified sequences. Accelerated evolu rally-occurring, such as those induced by various mutagens tion methods are described, e.g. by Stemmer (1994) Nature and mutagenic processes. In some embodiments, a nucleic 370, 389-391, and Stemmer (1994) Proc. Natl. Acad. Sci. acid molecule encodes a variantofan LPAAT in which at least USA 91, 10747-10751. Chemical or enzymatic alteration of one amino acid residue has been inserted N- and/or C-termi- 35 expressed nucleic acids and polypeptides can be performed nal to, and/or within, the reference sequence. In some by Standard methods. For example, a sequence can be modi embodiments, at least one amino acid residue has been fied by addition of phosphate groups, methyl groups, lipids, deleted N- and/or C-terminal to, and/or within, the reference Sugars, peptides or organic or inorganic compounds, by the sequence. In some embodiments, the nucleic acid molecules inclusion of modified nucleotides or amino acids, or the like. may encode variants that may be sequences containing pre- 40 For optimal expression of a recombinant protein, in certain determined mutations by, e.g., homologous recombination or instances it may be beneficial to employ coding sequences site-directed or PCR mutagenesis; corresponding proteins of that produce mRNA with codons preferentially used by the other species; alleles or other naturally occurring variants; host cell to be transformed (“codon optimization'). Thus, for and/or derivatives wherein the protein has been covalently enhanced expression of transgenes, the codon usage of the modified by chemical, enzymatic or other appropriate means 45 transgene can be matched with the specific codon bias of the with a moiety other than a naturally occurring amino acid. organism in which the transgene is desired to be expressed. A substitution, insertion or deletion can adversely affect Methods of recoding genes for expression in microalgae are the protein when the altered sequence substantially inhibits a described in, e.g., U.S. Pat. No. 7,135,290. The precise biological function associated with the protein. In certain mechanisms underlying this effect are believed to be many, embodiments, a variant of an LPAAT may have activity that is 50 but can include the proper balancing of available aminoacy reduced by not more than about 1%, 2%. 3%, 4%. 5%, 6%, lated tRNA pools with proteins being synthesized in the cell, 7%, 8%, 9%, 10%, 15%, 20%, 30%, 40%, or 50%, in com coupled with more efficient translation of the transgenic mes parison to the activity of the LPAAT from which the variant is senger RNA (mRNA) when this need is met. In some embodi derived (e.g., any of SEQID NO: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ments, only a portion of the codons is changed to reflect a 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or 26, or 55 preferred codon usage of a host microorganism. In certain other LPAATs). In some embodiments, the amount of a fatty embodiments, one or more codons are changed to codons that acid product produced by a host cell expressing the LPAAT are not necessarily the most preferred codon of the host variant is not less than about 99%, 98%, 97%, 96%. 95%, microorganism encoding a particular amino acid. Additional 94%, 93%, 92%, 91%, 90%, 85%, 80% or 75% of the amount information for codon optimization is available, e.g. at the or the fatty acid product produced by a host cell expressing 60 codon usage database of GenBank. The coding sequences the LPAAT from which the variant is derived (e.g., e.g., any of may be codon optimized for optimal production of a desired SEQID NO: 2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 15, 16, 17, product in the host organism selected for expression. In cer 18, 19, 20, 21, 22, 23, 24, 25, or 26, or other LPAATs). tain embodiments, the non-native nucleic acid sequence The invention also provides fragments and variants of an encoding an LPAAT and/or the non-native nucleic acid LPAAT that have increased activity in comparison to the 65 sequence encoding a thioesterase is codon optimized for reference polypeptides. In certain embodiments, the LPAAT expression in a photosynthetic microorganism, e.g., a cyano fragment or variant may have activity that is increased by at bacterium or a eukaryotic microalga. US 9,096,834 B2 37 38 In some embodiments, the nucleic acid molecules of the 5,750,385, U.S. Pat. No. 5,639,952), metals (Eukaryotic Cell invention encode fusion proteins that comprise an LPAAT. 2:995-1002 (2003)) or temperature (U.S. Pat. No. 5,447.858: For example, the nucleic acids of the invention may comprise Abe et al. Plant Cell Physiol. 49: 625-632 (2008); Shroda et polynucleotide sequences that encode glutathione-S-trans al. Plant J. 21: 121-131 (2000)). The foregoing list is exem ferase (GST) or a portion thereof, thioredoxin or a portion 5 plary and not limiting. The promoter sequence can be from thereof, maltose binding protein or a portion thereof, poly any organism, provided that it is functional in the host organ histidine (e.g. His), poly-HN, poly-lysine, a hemagglutinin ism. In certain embodiments, inducible promoters are formed tag sequence, HSV-Tag and/or at least a portion of HIV-Tat by fusing one or more portions or domains from a known fused to the LPAAT-encoding sequence. inducible promoter to at least a portion of a different promoter In some embodiments, the nucleic acid molecules of the 10 that can operate in the host cell, e.g. to confer inducibility on invention comprise additional non-coding sequences such as a promoter that operates in the host species. non-coding 3' and 5' sequences (including, e.g., regulatory For transformation of cyanobacteria, a variety of promot sequences) that may be homologous or heterologous to the ers that function in cyanobacteria can be utilized, including, LPAAT gene. but not limited to, the ara, lac, tac, and trc promoters, as well Nucleic Acid Constructs 15 as derivatives that are also inducible by the addition of iso The invention also provides constructs comprising a propyl B-D-1-thiogalactopyranoside (IPTG) such as the trcY nucleic acid sequence encoding an LPAAT and/or a or trcE promoter. Other promoters that may find use in the thioesterase that can further include one or more sequences invention include promoters that are naturally associated with that regulate or mediate transcription, translation, or integra transposon- or bacterial chromosome-borne antibiotic resis tion of nucleotide sequences into a host genome. For tance genes (e.g., neomycin phosphotransferase, chloram example, the invention provides expression constructs that phenicol acetyltransferase, spectinomycinadenyltransferase, comprise one or more "expression control elements’ or or the like, or combinations thereof), promoters associated sequences that regulate expression transcription of an oper with various heterologous bacterial and native cyanobacterial ably linked gene, or translation of the transcribed RNA. For genes, promoters from viruses and phages, synthetic promot example, an expression control element can be a promoter 25 ers, or the like, or combinations thereof. For example, the that may be operably linked to the gene of interest (e.g., an promoter(s) can be selected from prokaryotic promoters from LPAAT gene) in an expression construct or “expression cas a range of species, including eubacterial and cyanobacterial sette. In some embodiments of the foregoing, the promoteris species, such as, for example, an araC or pBAD promoter, a regulatable, e.g., inducible. rha promoter, a Pm promoter, a XylS promoter, a nir promoter, In embodiments where the nucleic acid construct does not 30 a nar promoter, a pho promoter, a tet promoter, a cys pro contain a promoter in operable linkage with the nucleic acid moter, a metallothionien promoter, an fif promoter, a gln sequence encoding the gene of interest (e.g., an LPAAT gene) promoter, a heat shock promoter, a cold-inducible promoter the nucleic acid sequence can be transformed into the cells or a viral promoter. The foregoing promoters are exemplary Such that it becomes operably linked to an endogenous pro and are not limiting. Promoters isolated from cyanobacteria moter by, e.g., homologous recombination, site specific inte 35 that can be used can include but are not limited to the follow gration, and/or vector integration. In some embodiments, ing: nirs (nickel-inducible), secA (secretion; controlled by the genomic host sequences included in a nucleic acid construct redox state of the cell), rbc (Rubisco Operon), psaAB (PS I for mediating homologous recombination into the host reaction centerproteins; light regulated), psbA (Dl protein of genome can include gene regulatory sequences, for example, PSII; light-inducible), and the like, and combinations thereof. a promoter sequence, that can regulate expression of an 40 In some embodiments, the promoters are regulated by nitro LPAAT gene of the nucleic acid construct. In such embodi gen compounds, such as, for example, nar, ntc., nir or nrt ments, the transgene(s) of the construct can become operably promoters. In some embodiments, the promoters are regu linked to a promoter that is endogenous to the host microor lated by phosphate (e.g., pho orpst promoters) or metals, e.g., ganism. In some embodiments, the endogenous promoter(s) the nrs promoter (Liu and Curtis (2009) Proc Natl Acad are regulatable, e.g., inducible. 45 Sciences USA 106: 21550-21554), or the petE promoter A promoter operably linked to a nucleic acid sequence (Buikema and Haselkorn (2001) Proc Natl Acad Sciences encoding an LPAAT may be a promoter that is heterologous USA 98: 2729-2734)). Inducible promoters, as used in the with respect to the LPAAT gene. In some embodiments of the constructs of the present invention, can use one or more foregoing invention, the promoter may be an inducible pro portions or domains of the aforementioned promoters and/or moter, i.e., a promoter that mediates transcription of an oper 50 other inducible promoters fused to at least a portion of a ably linked gene in response to a particular stimulus. Such different promoter that can operate in the host organism, e.g., promoters may be advantageous, e.g., to minimize any del to confer inducibility on a promoter that operates in the host eterious effects on the growth of the host cell and/or to maxi species. mize production of the fatty acid product. An inducible pro Likewise, a wide variety of transcriptional terminators can moter can be responsive to, e.g., light or dark or high or low 55 be used for expression vector construction. Examples of pos temperature, and/or can be responsive to specific compounds. sible terminators can include, but are not limited to, psbA, The inducible promoter may be, an ara promoter, a lac pro psaAB, rbc, SecA, T7 coat protein, and the like, and combi moter, a trp promoter, a tet promoter (e.g., U.S. Pat. No. nations thereof. 5,851,796), a hybrid promoter that includes a portion of a trp, In some embodiments, an isolated or recombinant nucleic lac, or tet promoter, a hormone-responsive promoter (e.g., an 60 acid molecule of the invention can comprise both a nucleic ecdysone-responsive promoter, such as described in U.S. Pat. acid sequence that encodes an LPAAT and a nucleic acid No. 6,379,945), a metallothionien promoter (e.g., U.S. Pat. sequence that encodes a thioesterase. The nucleic acid No. 6,410,828), a pathogenesis-related (PR) promoter that sequences encoding the LPAAT and the thioesterase may be, can be responsive to a chemical Such as, for example, salicylic for example, any of the nucleic acid sequences described acid, ethylene, thiamine, and/or BTH (U.S. Pat. No. 5,689, 65 herein. 044), or the like, or some combination thereof. An inducible In certain embodiments, the nucleic acid sequence that promoter can also be responsive to light or dark (U.S. Pat. No. encodes an LPAAT and the nucleic acid sequence that US 9,096,834 B2 39 40 encodes a thioesterase can be operably linked to the same genome, for example, one or more sequences having homol promoter and/or enhancer. For example, in particular ogy with one or more nucleotide sequences of the host micro embodiments the two genes (encoding an LPAAT and a organism. A vector can be an expression vector that includes thioesterase) may be organized as an operon, in which, for one or more specified nucleic acid “expression control ele example, a promoter sequence is followed, in the 5' to 3' ments' that permit transcription and/or translation of a par direction, by a thioesterase-encoding sequence and then an ticular nucleic acid in a host cell. The vector can be a plasmid, LPAAT-encoding sequence. In an alternative configuration of a part of a plasmid, a viral construct, a nucleic acid fragment, the operon, a promoter sequence is followed, in the 5' to 3' or the like, or a combination thereof. direction, by an LPAAT-encoding sequence and then a The vector can be a high copy number vector, a shuttle thioesterase-encoding sequence. In some embodiments, an 10 vector that can replicate in more than one species of cell, an isolated nucleic acid molecule can include two or more genes expression vector, an integration vector, or a combination arranged in tandem, where the isolated nucleic acid molecule thereof. Typically, the expression vector can include a nucleic does not include a promoter sequence that operates in the acid comprising a gene of interest operably linked to a pro intended host microorganism upstream of the two or more moter in an "expression cassette, which can also include, but genes. In these embodiments, the promoterless operon can be 15 is not limited to, a transcriptional terminator, a ribosome designed for integration (e.g., homologous recombination) binding site, a splice site or splicing recognition sequence, an into a site of the host genome that may include a promoter intron, an enhancer, a polyadenylation signal, an internal sequence, such that the synthetic operon can be transcription ribosome entry site, and similar elements. According to some ally regulated by a promoter in the genome of the host micro embodiments, the present invention can involve recombinant organism. Further, the operon may be designed for integra microorganisms transformed with an isolated nucleic acid tion (e.g., homologous recombination) into a site of the host comprising a gene of interest under control of a heterologous genome that may include an enhancer sequence, such that the promoter. Alternatively, if the vector does not contain a pro introduced operon can be transcriptionally regulated by an moter operably linked with an isolated nucleic acid compris enhancer in the genome of the host microorganism. In any of ing a gene of interest, the isolated nucleic acid can be trans the above embodiments of operons that include LPAAT and 25 formed into the microorganisms or host cells Such that it thioesterase genes, one or more additional regulatory becomes operably linked to an endogenous promoter by sequences can be included in the isolated nucleic acid mol homologous recombination, site specific integration, and/or ecule, for example, a sequence for enhancing translation can vector integration. be included upstream of any of the gene-encoding sequences, In some embodiments, the present invention additionally and/or a transcriptional terminator can optionally be included 30 provides recombinant microorganisms or host cells trans at or near the 3' end of the synthetic operon. formed with an isolated nucleic acid comprising a gene of In addition to an LPAAT gene and a thioesterase gene, one interest that is operably linked to one or more expression or more additional genes can optionally be included in a control elements. In some instances, it can be advantageous to synthetic operon as provided herein, where the one or more express the protein at a certain point during the growth of the additional genes may include, for example, one or more genes 35 recombinant microorganism, e.g., to minimize any deleteri encoding enzymes or proteins of the fatty acid synthesis ous effects on the growth of the recombinant microorganism pathway and/or one or more genes encoding enzymes or and/or to maximize production of the fatty acid product of proteins that may enhance fatty acid product synthesis, one or interest. In Such instances, one or more exogenous genes more genes that may enhance photosynthesis or carbon-fixa introduced into the recombinant microorganism or host cell tion, and/or one or more reporter genes or selectable markers. 40 can be operably linked to an inducible promoter, which medi In some embodiments, the nucleic acid sequence that ates transcription of an operably linked gene in response to a encodes an LPAAT and the nucleic acid sequence that particular stimulus. encodes a thioesterase can be provided on the same nucleic Transformation vectors can additionally or alternately acid construct where they are operably linked to different include a selectable marker, Such as, but not limited to, a drug promoters and/or transcriptional enhancers. The promoters 45 resistance gene, an herbicide resistance gene, a metabolic and enhancers may be, e.g., any of the promoters and tran enzyme and/or factor required for survival of the host (for Scriptional enhancers described herein. example, an auxotrophic marker), or the like, or a combina In certain embodiments, the vector comprising a nucleic tion thereof. Transformed cells can be selected based upon the acid sequence encoding an LPAAT is designed for transfor ability to grow in the presence of the antibiotic and/or other mation into cyanobacteria. In a particular embodiment, the 50 selectable marker under conditions in which cells lacking the vector permits homologous recombination of the LPAAT resistance cassette or auxotrophic marker could not grow. encoding sequence with the cyanobacterial genome. Further, a non-selectable marker may be present on a vector, An isolated nucleic acid molecule of the present invention Such as a gene encoding a fluorescent protein or enzyme that can include the sequences disclosed herein that encode an generates a detectable reaction product. LPAAT or other polypeptide in a vector, such as, but not 55 A vector comprising an isolated nucleic acid comprising a limited to, an expression vector. A vector can be a nucleic acid gene of interest can also be an integration vector that includes that has been generated via human intervention, including by one or more sequences that promote integration of the gene of recombinant means and/or direct chemical synthesis, and can interest or a gene expression cassette into the genome of the include, for example, one or more of: 1) an origin of replica host microorganism or host cell. For example, an integration tion for propagation of the nucleic acid sequences in one or 60 vector used to transform cyanobacteria can include at least more hosts (which may or may not include the production one sequence of at least 50, at least 100, at least 200, at least host); 2) one or more selectable markers; 3) one or more 300, at least 400, at least 500, or at least 600 nucleotides with reporter genes; 4) one or more expression control sequences, homology to a sequence in the genome of the host organism Such as, but not limited to, promoter sequences, enhancer to allow integration of the gene of interest or gene expression sequences, terminator sequences, sequence for enhancing 65 cassette into the genome of the host microorganism or host translation, etc.; and/or 5) one or more sequences for promot cell to occur via homologous recombination. In some ing integration of the nucleic acid sequences into a host examples, the gene or gene expression cassette is flanked by US 9,096,834 B2 41 42 sequences homologous to a region of the host chromosome to algal species (see, for example, Ramesh et al. (2004) Methods promote integration of the gene of interest or gene expression Mol. Biol. 274: 355-307: Doestchet al. (2001) Curr. Genet. cassette into the host chromosome. Alternatively or in addi 39: 49-60; U.S. Pat. No. 7,294,506; WO 2003/091413: WO tion, an integration vector can include one or more sequences 2005/005643; and WO 2007/133558, all incorporated herein that promote site-specific recombination or random integra by reference in their entireties). tion Such as, but not limited to, sequences recognized by Methods of Producing Fatty Acid Products recombinases, integrases, or transposases. In some embodi The invention encompasses methods of producing a fatty ments, the integration vector can further include a gene acid product by culturing the recombinant microorganisms encoding a recombinase, integrase, or transposase. and host cells described herein, under conditions in which at Transformation of Microorganisms and Host Cells 10 least one fatty acid product is produced. The methods can A vector comprising an isolated nucleic acid comprising a further comprise isolating at least one fatty acid product. In gene of interest can be introduced into cyanobacteria via Some embodiments, at least a portion of the fatty acid and/or conventional transformation and/or transfection techniques. fatty acid derivative produced by the recombinant microor The terms “transformation.” “transfection.” “conjugation.” ganisms is released or secreted into the growth media by the and “transduction, as used in the present context, are 15 microorganism. In some embodiments, the expression of a intended to comprise a multiplicity of methods known to polypeptide encoded by the nucleic acid molecules described those skilled in the art for the introduction of foreign nucleic herein can be induced in the recombinant microorganism to acid (for example, exogenous DNA) into a host cell, including produce the fatty acid product. calcium phosphate and/or calcium chloride coprecipitation, Releasing and secreting, as used herein in the context of DEAE-dextran-mediated transfection, lipofection, natural products of the invention, are used interchangeably to refer to competence, chemically mediated transfer, electroporation, a process by which active and/or passive transport mecha particle bombardment, or the like, or combinations thereof. nisms allow products of the invention to cross the cell mem Examples of suitable methods for the transformation and/or brane to exit the cell. Examples of such transport mechanisms transfection of host cells, e.g., can be found in Molecular can include, but are not limited to, gradient diffusion, facili Cloning A Laboratory Manual (2010), Cold Spring Harbor 25 tated diffusion, active transport, and combinations thereof. Laboratory Press. Culturing refers to the intentional fostering of growth (e.g., Host cells Such as plants for use in the invention can be increases in cell size, cellular contents, and/or cellular activ transformed by any feasible means, including, without limi ity) and/or propagation (e.g., increases in cell numbers via tation, the use of Agrobacterium, particle gun-mediated trans mitosis) of one or more cells by use of selected and/or con formation, laser-mediated transformation, or electroporation. 30 trolled conditions. The combination of both growth and Algae and photosynthetic bacteria can be transformed by any propagation may be termed proliferation. Non-limiting suitable methods, including, as nonlimiting examples, natural examples of selected and/or controlled conditions can include DNA uptake (Chunget al. (1998) FEMS Microbiol. Lett. 164: the use of a defined medium (with known characteristics such 353-361; Frigaard et al. (2004) Methods Mol. Biol. 274:325 as pH, ionic strength, and/or carbon Source), specified tem 40; Zang et al. (2007) J. Microbiol. 45: 241-245), conjuga 35 perature, oxygen tension, carbon dioxide levels, growth in a tion, transduction, glass bead transformation (Kindle et al. bioreactor, or the like, or combinations thereof. In some (1989).J. Cell Biol. 109: 2589-601; Feng et al. (2009) Mol. embodiments, the microorganism or host cell can be grown Biol. Rep. 36: 1433-9; U.S. Pat. No. 5,661,017), silicon car heterotrophically, using a reduced carbon Source, or mix bide whisker transformation (Dunahay et al. (1997) Methods otrophically, using both light and a reduced carbon Source. Mol. Biol. (1997) 62.503-9), biolistics (Dawson et al. (1997) 40 Additionally or alternately, the microorganism or host cell Curr. Microbiol. 35: 356-62; Hallmann et al. (1997) Proc. can be cultured phototrophically. When growing phototrophi Natl. Acad. USA 94: 7469-7474; Jakobiaketal. (2004) Protist cally, the microorganism can advantageously use light as an 155:381-93; Tan et al. (2005) J. Microbiol. 43: 361-365; energy source. An inorganic carbon Source, such as CO or Steinbrenneretal. (2006) Appl Environ. Microbiol. 72: 7477 bicarbonate, can be used for synthesis of biomolecules by the 7484: Kroth (2007) Methods Mol. Biol. 390: 257-267; U.S. 45 microorganism. “Inorganic carbon, as used herein, includes Pat. No. 5,661,017) electroporation (Kjaerulff et al. (1994) carbon-containing compounds or molecules that cannot be Photosynth. Res. 41: 277-283: Iwai et al. (2004) Plant Cell used as a Sustainable energy source by an organism. Typically Physiol. 45: 171-5: Ravindran et al. (2006) J. Microbiol. “inorganic carbon can be in the form of CO (carbon diox Methods 66: 174-6; Sun et al. (2006) Gene 377: 140-149: ide), carbonic acid, bicarbonate salts, carbonate salts, hydro Wang et al. (2007) Appl. Microbiol. Biotechnol. 76: 651-657: 50 gen carbonate salts, or the like, or combinations thereof, Chaurasia et al. (2008).J. Microbiol. Methods 73: 133-141: which cannot be further oxidized for sustainable energy nor Ludwig et al. (2008) Appl. Microbiol. Biotechnol. 78: 729 used as a source of reducing power by organisms. If an 35), laser-mediated transformation, or incubation with DNA organic carbon molecule or compound is provided in the in the presence of or after pre-treatment with any of poly culture medium of a microorganism grown phototrophically, (amidoamine) dendrimers (Pasupathy et al. (2008) Biotech 55 it generally cannot be taken up and/or metabolized by the cell mol. J. 3: 1078-82), polyethylene glycol (Ohnuma et al. for energy and/or typically is not present in an amount Suffi (2008) Plant Cell Physiol. 49: 117-120), cationic lipids (Mu cient to provide sustainable energy for the growth of the cell radawa et al. (2008) J. Biosci. Bioeng. 105: 77-80), dextran, culture. However, organic carbon molecules can be used by calcium phosphate, or calcium chloride (Mendez-Alvarez et microorganisms growing heterotrophically. Thus, the present al. (1994) J. Bacteriol. 176: 7395-7397), optionally after 60 invention includes a process for converting a carbon Source to treatment of the cells with cell wall-degrading enzymes (Per a fatty acid product comprising contacting the carbon source rone et al. (1998) Mol. Biol. Cell 9: 3351-3365). Agrobacte with a recombinant microorganism or host cell of the inven rium-mediated transformation can also be performed on algal tion. In some aspects the carbon source is an inorganic carbon cells, for example after removing or wounding the algal cell Source and in other aspects the carbon source is an organic wall (e.g., WO 2000/62601; Kumar et al. (2004) Plant Sci. 65 carbon Source. 1.66: 731-738). Biolistic methods are particularly successful Microorganisms and host cells that can be useful in accor for transformation of the chloroplasts of plant and eukaryotic dance with the methods of the present invention can be found US 9,096,834 B2 43 44 in various locations and environments throughout the world. metalion Source (noted as "metal herein), a multivalent (i.e., Without wishing to be bound by theory, it is observed that, having a valence of +2 or higher) counterion source, a diva perhaps as a consequence of their isolation from other species lent counterion source, or some combination thereof. Other and their evolutionary divergence, the particular growth details regarding this metal/carboxylate counterion Source medium for optimal growth and generation of lipid and/or are described in the co-pending, commonly-assigned U.S. other hydrocarbon constituents can vary. In some cases, cer patent application Ser. No. 13/324,636, entitled “Culturing a tain strains of microorganisms may be unable to grow in a Microorganism in a Medium with an Elevated Level of a particular growth medium because of the presence of some Carboxylate Counterion Source', filed on Dec. 13, 2011. inhibitory component or the absence of some essential nutri The culture methods can include inducing expression of a tional requirement of the particular strain of microorganism 10 or host cell. particular gene described herein for the production of fatty Solid and liquid growth media are generally available from acid products, and/or regulating metabolic pathway in the a wide variety of Sources, as are instructions for the prepara microorganism. Inducing expression can include adding a tion of particular media suitable for a wide variety of strains nutrient or compound to the culture, removing one or more of microorganisms. For example, various freshwater and salt 15 components from the culture medium, increasing or decreas water media can include those described in Barsanti (2005) ing light and/or temperature, and/or other manipulations that Algae: Anatomy, Biochemistry & Biotechnology, CRC Press promote expression of the gene of interest. Such manipula for media and methods for culturing algae. Algal media reci tions can largely depend on the nature of the (heterologous) pes can also be found at the websites of various algal culture promoter operably linked to the gene of interest. collections, including, as nonlimiting examples, the UTEX In some embodiments of the present invention, the recom Culture Collection of Algae (available on the worldwide web binant microorganisms or host cells can be cultured in a at www.sbs.utexas.edu/uteX/media.aspx) (visited Sep. 20. bioreactor. “Bioreactor” refers to an enclosure or partial 2011); Culture Collection of Algae and Protozoa (available enclosure in which cells are cultured, optionally in Suspen on the world wide web at www.ccap.ac.uk) (visited Sep. 20. sion and, when suspended, preferably in an aqueous liquid. 2011); and Katedra Botaniky (available on the world wide 25 The bioreactor can be used to culture microalgal cells through web at botany.natur.cuni.cz/algo?caup-media.html) (visited the various phases of their physiological cycle. Bioreactors Sep. 20, 2011). can offer many advantages for use in heterotrophic growth In some embodiments, media used for culturing an organ and propagation methods. To produce biomass for use as ism that produces fatty acids can include an increased con food, microorganisms or host cells are preferably fermented centration of a metal (typically provided as a salt and/or in an 30 in large quantities in liquid, such as in Suspension cultures as ionic form) Such as, for example, Sodium, potassium, mag nesium, calcium, strontium, barium, beryllium, lead, iron, an example. Bioreactors such as steel fermentors can accom nickel, cobalt, tin, chromium, aluminum, Zinc, copper, or the modate very large culture volumes (40,000 liter and greater like, or combinations thereof (particularly multivalent metals, capacity bioreactors can be used in various embodiments of Such as magnesium, calcium, and/or iron), with respect to a 35 the invention). Bioreactors can also typically allow for the standard medium formulation, such as, for example, standard control of one or more culture conditions such as temperature, BG-11 medium (ATCC Medium 616, Table 5), or a modified pH, oxygen tension, carbon dioxide levels, and the like, as medium such as ATCC Medium 854 (BG-11 modified to well as combinations thereof. Bioreactors can typically be contain vitamin B12) or ATCC Medium 617 (BG-11 modi configurable, for example, using ports attached to tubing, to fied for marine cyanobacteria, containing additional NaCl 40 allow gaseous components, such as CO, CO-enriched air, and vitamin B12). oxygen, and/or nitrogen, to be contacted with (e.g., bubbled For example, a medium used for growing microorganisms through) a liquid culture. Other culture parameters, such as that produce free fatty acids can include at least 2-fold, for the pH of the culture media, the identity and/or concentration example at least 3-fold, at least 4-fold, at least 5-fold, at least of trace elements and/or nutrients, the identity and/or concen 6-fold, at least 7-fold, at least 8-fold, at least 9-fold, at least 45 tration of other media constituents, or the like, or combina 10-fold, between 2-fold and 10-fold, and/or between 10-fold tions thereof, can typically be more readily manipulated using and 100-fold the amount of metal (e.g., calcium) as compared a bioreactor. to a standard medium. The medium used for growing micro Microorganisms and host cells can additionally or alter organisms that can produce free fatty acids can include, for nately be cultured in a bioreactor equipped with an artificial example, at least 0.5 mM, between about 0.5 mMandabout 1 50 light source, a “photobioreactor, and/or can have one or mM, between about 1 mMandabout 2 mM, between about 2 more walls that is transparent enough to light, including Sun mMandabout 5 mM, between about 5 mMandabout 10 mM, light, to enable, facilitate, and/or maintain acceptable micro between about 10 mMandabout 25 mM, and greater than 25 organism growth. For production offatty acid products, pho mM metal (e.g., calcium) in the formulation. tosynthetic microorganisms or host cells can additionally or In further embodiments, by using the excess amount of 55 alternately be cultured in shake flasks, test tubes, vials, micro metal (e.g., calcium) in the medium, at least a portion of the titer dishes, petridishes, or the like, or combinations thereof. fatty acid(s) can be sequestered as Soap precipitates, which Additionally or alternately, recombinant photosynthetic may result in decreasing the toxic effects of free fatty acid(s). microorganisms or host cells may be grown in ponds, canals, Addition of metal (e.g., calcium) in the medium can addition sea-based growth containers, trenches, raceways, channels, ally or alternately increase the tolerance of microorganism in 60 or the like, or combinations thereof. As with standard biore media with a relatively high concentration of free fatty acids. actors, a source of inorganic carbon (such as, but not limited Additionally or alternately, fatty acid-producing strains can to, CO, bicarbonate, carbonate salts, and the like), including, advantageously be more robust with excess metal (e.g., cal but not limited to, air, CO-enriched air, flue gas, or the like, cium) content. Although the excess component is described or combinations thereof, can be supplied to the culture. When herein as a metal, it is contemplated that the component can 65 Supplying flue gas and/or other sources of inorganic that may more generally be described as a carboxylate counterion contain CO in addition to CO., it may be necessary to pre Source, for example a Soap-forming counterion source, a treat such sources such that the CO level introduced into the US 9,096,834 B2 45 46 (photo)bioreactor do not constitute a dangerous and/or lethal example, at least one of a C12, C14, C16, and/or a C18 fatty dose with respect to the growth and/or survival of the micro acid, can be increased in the culture with respect to a culture organisms. of an otherwise identical microorganism or host cell, but The methods include culturing a recombinant microorgan lacking the genetic modifications of the invention. ism, such as a photosynthetic microorganism, such as, for It is to be understood that the disclosure of the present example, a cyanobacterium, that includes a protein(s) as invention extends to methods, products and systems accord described herein to produce at least one fatty acid product, in ing to the various aspects of the invention which comprise combinations of one or more features discussed herein by which the method results in production of at least 0.1%, 0.5%, reference to certain embodiments of the invention with one or 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, more further features discussed herein by reference to certain 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 10 other embodiments of the invention. 75%, 80%, 85%, 90%, 95%, 100%, 110%, 120%, 130%, 14.0%, 150%, 160%, 170%, 180%, 190%, 200%, 300%, Additionally or alternatively, the present invention can 400%, 500%. 600%, 700%, 800%, 900%, or 1000% more include one or more of the following embodiments. than the amount of the fatty acid product produced by an Embodiment 1 otherwise identical microorganism not including the pro 15 tein(s), cultured under identical conditions. Additionally or A recombinant microorganism comprising a non-native alternately, the methods include producing at least 100 mg, at nucleic acid sequence encoding a thioesterase and a non least 110 mg, at least 120 mg, at least 130 mg, at least 140 mg. native nucleic acid sequence encoding a lysophosphatidic at least 150 mg, at least 160 mg, at least 170 mg, at least 180 acid acyltransferase (LPAAT), wherein the recombinant mg, at least 190 mg, at least 200 mg, at least 210 mg, at least microorganism produces at least one free fatty acid or at least 220 mg, at least 230 mg, at least 240 mg, at least 250 mg, at least 260 mg, at least 270 mg, at least 280 mg, at least 290 mg. one fatty acid derivative, wherein the LPAAT and the at least 300 mg, at least 310 mg, at least 320 mg, at least 330 thioesterase have different acyl substrate preferences. mg, at least 340 mg, at least 350 mg, at least 360 mg, at least Embodiment 2 370 mg, at least 380 mg, at least 390 mg, at least 400 mg, at 25 least 450 mg, at least 500 mg, at least 550 mg, at least 600 mg. The recombinant microorganism of embodiment 1, at least 650 mg, at least 700 mg, at least 750 mg, at least 800 wherein the thioesterase and the LPAAT have complementary mg, at least 850 mg, at least 900 mg, at least 950 mg, per liter acyl-ACP substrate preferences, preferably wherein the of culture of a fatty acid product by culturing the recombinant LPAAT has a substrate preference for an acyl-ACP substrate microorganisms described herein. Although many times the 30 having a longer acyl chain length than an acyl chain length of goal can be to produce and/or recover as much fatty acid a preferred substrate of the thioesterase. product as possible, in some instances the amount of the fatty acid product produced and/or recovered by the method Embodiment 3 described herein can be limited to 600 mg or less per liter of culture, for example to 550 mg or less per liter of culture, or 35 The recombinant microorganism of embodiment 1, to 500 mg or less per liter of culture. wherein any of the following are satisfied: the thioesterase has Fatty acid products can be recovered from culture by recov a Substrate preference for one or more acyl Substrates having ery means known to those of ordinary skill in the art, such as acyl chain lengths of C12, C14, and/or C16, and the LPAAT by whole culture extraction, for example, using organic Sol has a substrate preference for substrates having a C18 acyl vents. In some cases, recovery of fatty acid products can be 40 chain length; the thioesterase has a Substrate preference for enhanced by homogenization of the cells. When fatty acid one or more acyl Substrates having acyl chain lengths of C12 products are sufficiently released or secreted from the micro and C14 and the LPAAT has a substrate preference for sub organisms into the culture medium, the recovery method can strates having a C16 or C18 acyl chain length; the thioesterase be adapted to efficiently recover only the released fatty acid has a Substrate preference for one or more acyl Substrates products, only the fatty acid products produced and stored 45 having acyl chain lengths of C12 and the LPAAT has a sub within the microorganisms, or both the produced and released strate preference for one or more substrates having a C14. fatty acid products. C16, and/or C18 acyl chain length; and the thioesterase has a Fatty acid products secreted/released into the culture Substrate preference for one or more acyl Substrates having medium by the recombinant microorganisms described acyl chain lengths of C12, C14, C16, and/or C18, and the above can be recovered in a variety of ways. A straightfor 50 LPAAT has a substrate preference for substrates having a ward isolation method, e.g., by partition using immiscible C20, C22, or C24 acyl chain length. Solvents, may be employed. Additionally or alternately, par ticulate adsorbents can be employed. These can include lipo Embodiment 4 philic particulates and/or ion exchange resins, depending on the design of the recovery method. They may be circulating in 55 The recombinant microorganism of embodiment 1, the separated medium and then collected, and/or the medium wherein the thioesterase and the LPAAT have different acyl may be passed over a fixed bed column, for example a chro ACP substrate preferences, wherein one or more of the fol matographic column, containing these particulates. The fatty lowing are satisfied: the LPAAT has a C16 acyl substrate acid products can then be eluted from the particulate adsor preference, the LPAAT has a C18 substrate preference, the bents, e.g., by the use of an appropriate solvent. In Such 60 LPAAT is a prokaryotic LPAAT, the LPAAT is a eukaryotic circumstances, one isolation method can include carrying out LPAAT, the LPAAT is a microsomal LPAAT, the LPAAT is a evaporation of the solvent, followed by further processing of plastidal LPAAT, the LPAAT is a cyanobacterial LPAAT. the isolated fatty acid products, to yield chemicals and/or fuels that can be used for a variety of commercial purposes. Embodiment 5 In some embodiments of the methods described herein, the 65 level of a fatty acid product, for example a C8-C24 fatty acid, The recombinant microorganism of any of embodiments a C12-C24 fatty acid, or a C12-C 18 fatty acid, such as, for 1-4, wherein the LPAAT has at least 40%, at least 45%, at US 9,096,834 B2 47 48 least 50%, at least 55%, at least 60%, at least 65%, at least comprises a construct for downregulating the expression of 70%, at least 75%, at least 80%, at least 85%, at least 90%, at an endogenous acyl-ACP synthetase gene or an acyl-CoA least 95%, at least 96%, at least 97%, at least 98%, at least synthetase gene, wherein the construct is an antisense con 99%, or about 100% identity to SEQID NO:2, 3, 4, 5, 6, 7, 8, struct, a micro RNA construct, a ribozyme construct, or a 9, 10, 11, 12, 13, 14, or 15; or gene disruption construct. the LPAAT has at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at Embodiment 10 least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or about 100% The recombinant microorganism of any one of embodi identity to SEQID NO:16 or 17; or ments 1-9, wherein the recombinant microorganism further the LPAAT has at least 40%, at least 45%, at least 50%, at least 10 comprises a construct for downregulating the expression of 55%, at least 60%, at least 65%, at least 70%, at least 75%, at an endogenous acyl-ACP synthetase gene or an acyl-CoA least 80%, at least 85%, at least 90%, at least 95%, at least synthetase gene, wherein the construct is an antisense RNA 96%, at least 97%, at least 98%, at least 99%, or about 100% construct, a microRNA construct, or a ribozyme construct. identity to SEQID NO:18, 19, 20, or 21; or the LPAAT has at least 40%, at least 45%, at least 50%, at least 55%, at least 15 Embodiment 11 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least A recombinant microorganism according to any of the 97%, at least 98%, at least 99%, or about 100% identity to previous embodiments, wherein the recombinant microor SEQID NO:22, 23, 24, 25, or 26. ganism further includes one or more additional exogenous or non-native genes encoding one or more of an acyl-CoA syn Embodiment 6 thetase, an acyl-CoA reductase, an acyl-ACP reductase, a The recombinant microorganism of any previous embodi carboxylic acid reductase, a fatty aldehyde reductase, a fatty ment, wherein the thioesterase is an acyl-CoA thioesterase, a aldehyde decarbonylase, a fatty acid decarboxylase, a wax 4-hydroxybenzoyl thioesterase, or an acyl-ACP thioesterase, synthase, and an acyltransferase, wherein the microorganism optionally a plant acyl-ACP thioesterase or a prokaryotic produces at least one fatty acid derivative, preferably wherein acyl-ACP thioesterase. 25 the fatty acid derivative is a fatty alcohol, a wax ester, an alkane, or an alkene. Embodiment 7 Embodiment 12 The recombinant microorganism of any previous embodi ment, wherein any of the following are satisfied: the non 30 A recombinant microorganism according to any of the native nucleic acid sequence encodes an LPAAT that is previous embodiments, wherein the recombinant microor homologous with respect to the recombinant microorganism, ganism further includes one or more additional exogenous or the non-native nucleic acid sequence encodes an LPAAT that non-native genes encoding one or more of an a beta-ketoacyl is heterologous with respect to the recombinant microorgan synthetase, an acetyl-CoA carboxylase, a malonyl CoA:ACP ism, the non-native nucleic acid sequence encoding the 35 transacylase, or an acyl carrier protein. LPAAT is operably linked to a heterologous promoter, the non-native nucleic acid sequence encoding the acyl-ACP Embodiment 13 thioesterase is operably linked to a heterologous promoter, one or both of the non-native nucleic acid sequence encoding The recombinant microorganism of any one of embodi the acyl-ACP thioesterase and the non-native nucleic acid ments 1-12, wherein the recombinant microorganism is a sequence encoding the LPAAT are operably linked to an 40 photosynthetic microorganism, optionally wherein the pho inducible promoter, the non-native nucleic acid sequence tosynthetic microorganism is a cyanobacterium or a encoding the acyl-ACP thioesterase and the non-native microalga. nucleic acid sequence encoding the LPAAT are operably linked to the same promoter or to copies of the same promoter. Embodiment 14 45 Embodiment 8 The recombinant microorganism of embodiment 13, wherein the cyanobacterium is Agmenellum, Anabaena, Ana A recombinant microorganism, according to any of the baenopsis, Anacystis, Aphanizomenon, Arthrospira, Astero following: the microorganism comprises a non-native gene capsa, Borzia, Calothrix, Chamaesiphon, Chlorogloeopsis, encoding an LPAAT having at least 85% identity to SEQID 50 Chroococcidiopsis, Chroococcus, Crinalium, Cyanobacte NO:2, or to a cyanobacterial ortholog thereof and a non rium, Cyanobium, Cyanocystis, Cyanospira, Cyanothece, native gene encoding an acyl-ACP thioesterase having at least Cylindrospermopsis, Cylindrospermum, Dactylococcopsis, 85% identity to SEQ ID NO:44, SEQ ID NO:54, SEQ ID Dermocarpella, Fischerella, Fremvella, Geitleria, Gei NO:55, SEQID NO:56, or SEQ ID NO:57; or tlerinema, Gloeobacter, Gloeocapsa, Gloeothece, Halospir the microorganism comprises a non-native gene encoding an 55 ulina, Iyengariella, Leptolyngbya, Limnothrix, Lyngbya, LPAAT having at least 85% identity to SEQID NO:22, 23, 24, 24, or 26, and a non-native gene encoding an acyl-ACP Microcoleus, Microcystis, Myxosarcina, Nodularia, Nostoc, thioesterase having at least 85% identity to SEQID NO:26, Nostochopsis, Oscillatoria, Phormidium, Planktothrix, Pleu SEQID NO:54, SEQID NO:55, SEQID NO:56,or SEQ ID rocapsa, Prochlorococcus, Prochloron, Prochlorothrix, NO:57; Pseudanabaena, Rivularia, Schizothrix, Scytoinema, Spir wherein the microorganism produces at least one free fatty 60 ulina, Stanieria, Starria, Stigonema, Symploca, Synechococ acid or at least one fatty acid derivative; cus, Synechocystis, Tolypothrix, Trichodesmium, Tichonema preferably wherein the microorganism is a cyanobacterium. or Xenococcus. Embodiment 9 Embodiment 15 65 The recombinant microorganism of any one of embodi The recombinant microorganism of embodiment 13, ments 1-8, wherein the recombinant microorganism further wherein the microalga is an Achnanthes, Amphiprora, US 9,096,834 B2 49 50 Amphora, Ankistrodesmus, Asteromonas, Boekelovia, Borod embodiments 1-19 under conditions in which at least one inella, Botryococcus, Bracteococcus, Chaetoceros, Carteria, fatty acid product is produced, wherein the fatty acid product Chlamydomonas, Chlorococcum, Chlorogonium, Chlorella, is one or more of a fatty acid, a fatty aldehyde, a fatty alcohol, Chroomonas, Chrysosphaera, Cricosphaera, Crypthecod a fatty acid ester, a wax ester, or a hydrocarbon, wherein the inium, Cryptomonas, Cyclotella, Dunaliella, Ellipsoidon, fatty acid product is preferably a C12-C24 fatty acid product, Emiliania, Eremosphaera, Ernodesmius, Euglena, Franceia, more preferably a C12-C 18 fatty acid product, or more pref Fragilaria, Gloeothamnion, Haematococcus, Halocafeteria, erably a C12-C18 fatty acid product, wherein the fatty acid Hymenomonas, Isochrysis, Lepocinclis, Micractinium, product is Saturated or unsaturated. Monoraphidium, Nannochloris, Nannochloropsis, Navicula, Neochloris, Nephrochloris, Nephroselmis, Nitzschia, 10 Embodiment 21 Ochromonas, Oedogonium, Oocystis, Ostreococcus, Pav lova, Parachlorella, Pascheria, Phaeodactylum, Phagus, The method of embodiments 20, further wherein at least a Platymonas, Pleurochrysis, Pleurococcus, Prototheca, portion of the fatty acid product is secreted into the medium, Pseudochlorella, Pyramimonas, Pyrobotrys, Scenedesmus, preferably wherein the fatty acid product is isolated from the Schizochytrium, Skeletonema, Spyrogyra, Stichococcus, Tet 15 culture medium. raselmis, Thraustochytrium, Viridiella, or Volvox species. Embodiment 22 Embodiment 16 The method of claim 20 or 21, wherein the recombinant The recombinant microorganism of any one of embodi microorganism is a photosynthetic microorganism, wherein ments 1-12, wherein the recombinant microorganism is a the photosynthetic microorganism is cultured photoau bacterium, fungus, or heterokont, optionally wherein the totrophically, preferably wherein the culture medium does recombinant microorganism is a species of Acetobacter; not include a substantial amount of reduced carbon, option Acinetobacter, Arthrobacter; Bacillus, Brevibacterium, Chro ally wherein the culture medium includes inorganic carbon, matium, Chlorobium, Clostridium, Corynebacterium, Deino 25 optionally wherein the inorganic carbon is CO or carbonate. coccus, Delftia, Desulfovibrio, Enterococcus, Escherichia, Kineococcus, Klebsiella, Lactobacillus, Lactococcus, Micro Embodiment 23 coccus, Mycobacterium, Jeotgalicoccus, Paenibacillus, Pro pionibacter, Pseudomonas, Rhodopseudomonas, Rhodo The method of any of embodiments 20-22, wherein expres bacter, Rhodococcus, Rhodospirillium, Rhodomicrobium, 30 sion of one or both of the non-native nucleic acid sequence Salmonella, Serratia, Shewanella, Stenotrophomonas, Strep that encodes an LPAAT and the non-native nucleic acid tomyces, Streptococcus, Vibrio, Zymomonas, Aspergillus, sequence encodes a thioesterase is induced. Candida, Cryptococcus, Trichoderma, Neurospora, Endo mycopsis, Fusarium, Humicola, Rhizomucor, Rhodotorula, Embodiment 24 Kluyveromyces, Pichia, Lipomyces, Mucor, Myceliophtora, 35 Penicillium, Phanerochaete, Chrysosporium, Saccharomy The method of any one of embodiments 20-23, wherein at ces, Schizosaccharomyces, Yarrowia, Schizochytrium, or least one fatty acid product produced by the recombinant Thraustochytriales. microorganism is a C8-C24 fatty acid, fatty alcohol, or fatty aldehyde, preferably a C12-C 18 fatty acid, fatty alcohol, or Embodiment 17 40 fatty aldehyde, or more preferably a C12-C16 fatty acid, fatty alcohol, or fatty aldehyde; or wherein at least one fatty acid The recombinant microorganism of any one of embodi product produced by the recombinant microorganism is a ments 1-16, wherein the recombinant microorganism pro C16-C48 wax ester, preferably a C24-C36 wax ester, or more duces: at least 10%, at least 20%, at least 25%, at least 30%, preferably a C24-C32 wax ester; or wherein the fatty acid at least 40%, at least 50%, or at least 60% more fatty acid 45 product is a C11-C23 alkane or alkene, preferably a C11-C19 product compared to an otherwise identical microorganism alkane or alkene, and more preferably a C11-C17 alkane or lacking the second nucleic acid. alkene. Embodiment 18 Embodiment 25 50 The recombinant microorganism of embodiment 16, A fatty acid product produced by the recombinant micro wherein the fatty acid product is one or more of a fatty acid, a organism of any one of embodiments 1-19 or any of the fatty aldehyde, a fatty alcohol, a fatty acid ester, a wax ester, methods of embodiments 20-24, wherein the fatty acid prod or a hydrocarbon. uct is one or more of a fatty acid, a fatty aldehyde, a fatty 55 alcohol, a fatty acid ester, a wax ester, or a hydrocarbon. Embodiment 19 It is to be understood that the disclosure of the present invention extends to methods, products and systems accord The recombinant microorganism of embodiment 18, ing to the various aspects of the invention which comprise wherein the fatty acid product is a C8-C24 fatty acid product, combinations of one or more features discussed herein by preferably a C12-C 18 fatty acid product or a C12-C16 fatty 60 reference to certain embodiments of the invention with one or acid product, optionally wherein the fatty acid product is more further features discussed herein by reference to certain saturated. other embodiments of the invention.

Embodiment 20 EXAMPLES 65 A method of producing a fatty acid product comprising The invention as described above can be readily under culturing the recombinant microorganism of any one of stood by reference to the following examples, which are US 9,096,834 B2 51 52 included for purposes of illustration of certain aspects and the vectorin E. Coli. The pSGI-YC63 vector (SEQID NO:31) embodiments of the present invention, and are not intended to was PCR amplified to produce a linearized nucleic acid mol limit the invention. ecule using primers NB393 (SEQ ID NO:34) and NB394 (SEQ ID NO:35). Purified PCR product for the vector back bone and the sll 1752 gene was combined in E. coli using the Example 1 Quick PCR Cloning Kit and propagated in Alpha-Select Gold E. Coli cells. Production of Free Fatty Acids in Reduced Feedback 1.2. Thioesterase Col FatB1 Vector Construction Inhibition Synechocystis sp. PCC6803 Strains A thioesterase gene from the higher plant Cuphea carthagenensis, Cc1 FatB1 (US 2009/0298143: WO 2009/ While the invention is not bound by any one theory, the 10 076559), was cloned into a vector designed for promoting inventors hypothesized that C18:0 acyl-ACP, which is not a gene integration into the RS1 site of the Synechocystis sp. preferred substrate of the non-native Ce1 FatB1 acyl-ACP PCC 6803 genome. The pSGI-NB158acyl-ACP thioesterase thioesterase expressed in the cell, might accumulate in fatty construct (SEQID NO:36: FIG. 3) contained a P15A origin acid-producing cells that express the C12-C 16-preferring of replication for propagation in E. coli, amplified from vector Cc 1 FatB1 acyl-ACP thioesterase, possibly limiting fatty acid 15 pACYC184 using primers NB470 (SEQ ID NO:37) and production by feedback inhibition, and that overexpressing a NB471 (SEQID NO:38); a kanamycin resistance marker for C18-preferring LPAAT gene might limit any inhibitory selection, amplified from cyanobacterial integration vector effects by reducing the C18:0 acyl-ACP pool through SGI-YC28 using primers NB466 (SEQ ID NO:39) and LPAAT-catalyzed condensation of C18:0 acyl-ACP with NB467 (SEQ ID NO:40), “RS1 up” (SEQ ID NO:41) and lysophosphatidic acid to generate phosphatidic acid. See, “RS1 down” (SEQ ID NO:42) fragments for homologous e.g., FIG. 1. In addition to limiting feedback inhibition of recombination in Synechocystis 6803, amplified using prim fatty acid synthesis, the inventors considered the modification ers NB466 and NB467, respectively; the trcY promoter (SEQ might help to stabilize the cell membrane by promoting mem ID NO:28), and a lacIQ repressor for the trcY promoter brane lipid biosynthesis. driven Cuphea Cc 1 FatB1 thioesterase gene (SEQID NO:43) 1.1. LPAATVector Construction encoding an N-terminally truncated acyl-ACP thioesterase An expression construct was designed that included a gene 25 (SEQID NO:44) having a substrate preference for C12-C16 encoding an LPAAT with a preference for C18:0 acyl-ACPs. acyl-ACPs. Primers NB 462 (SEQ ID NO:45) and NB 463 The pSGI-NB55 vector (SEQ ID NO:27: FIG. 2) was con (SEQ ID NO:46) were used to amplify the trcY-Cc1 FatB1 structed to overexpress the Lysophophatidic Acid Acyl Trans thioesterase cassette from a prior expression vector. The lac ferase (LPAAT) gene sll 1752 (Genbank Accession Number, IQ fragment was amplified using primers NB464 (SEQ ID Gene ID: SEQID NO:1) from Synechocystis sp. PCC 6803 30 NO:47) and NB 465 (SEQ ID NO:48) from another vector, using a trcY promoter (SEQ ID NO:28). The sl11752 gene and the kanamycin resistance gene was amplified using prim was amplified from genomic Synechocystis DNA using prim ers NB466 (SEQ ID NO:49) and NB467 (SEQ ID NO:50) ers NB395 (SEQ ID NO:29) and NB396 (SEQ ID NO:30) from vectorpSGI-YC28 (see Table 1). Primers were designed which contained approximately 15bp sequence overlap with to include sequences that would be incorporated into the cyanobacterial integration vector pSGI-YC63 (SEQ ID 35 termini of the final PCR products such that the amplified NO:31). pSGI-YC63 contained a spectinomycin marker for fragments would have 10-20 bps terminal homology to selection, homologous “RS2 up” (SEQID NO:32) and “RS2 nucleic acid fragments to be cloned adjacent to the fragments. down” (SEQID NO:33) arms for integration in the RS2 site Cloning was performed using the Quick PCR Cloning Kit of the Synechocystis sp. PCC 6803 genome, the lacIQ repres (BPS Biosciences, San Diego). The clones were transformed sor for the trcY promoter (SEQ ID NO:28) to drive sl11752 and propagated in Alpha-Select Gold competent E. coli cells expression, and a pUC origin of replication for propagation of (Bioline, Tauton). TABLE 1.

Primers Used in Wector Construction

PRIMER SEQUENCE SEO ID NO :

NB460 GTTTATCACAGTTAAATTATTGCTGAAGCGGAATCCCTGG 58 NB461 CAGCCTCAGGCCATATAACCATCAAAGCCATAGTTGGC 59 NB462 CTTTGATGGTTATATGGCCTGAGGCTGAAATGAGCTGTTGAC 45 NB4 63 GATCTGAAGCTTGAAAGGTTATTAACTGGCGGGATTACCATTACTGG 46 NB4 64 GTAATCCCGCCAGTTAATAACCTTTCAAGCTTCAGATCAATTCGCGC 47 NB46 ATATTTTTATCTTGTGGTCCGGAATTCGCCCTTCTAAGC 48 NB466 GGGCGAATTCCGGACCACAA- 49 GATAAAAATATATCATCATGAACAATAAAACTGTCTG NB467 GGTGCCATCCATACCGGGCCGCCGTCCCGTCAAG SO NB4 68 GGGACGGCGGCCCGGTATGGATGGCACCGATG 6 O NB469 CCGTCAGTAGCTGAACATGGGGGACCATTCTCTGGATCATTG 67 NB47O AGAGAATGGTCCCCCATGTTCAGCTACTGACGGGGTG 37 NB471 GGATTCCGCTTCAGCAATAATTTAACTGTGATAAACTACCGCATTAAAGCTTATCG 38 NB1 CTCGAGCCCCCGTGCTATGACTAGC 52 NB9 CTCGAGCCCGGAACGTTTTTTGTACCCC 53 NB87 CAATTGGCATGCACCCCTATTTGTTTATTTTTCTAAATAC 61 NB88 CAATTGGTTTAAACGGCCTAAGCTTTATGCTTGTAAACCGTTTTGTG 62 NB5 GATCGACGTATAAACTGTCAACATATTTCTGCAAG 63 NB6 CTCGAGTGTGGTGCCGGAGGTGTAG 7s NB393 TTCTAGATATCTGCAGGCCTAAGCTTTATGC 34 NB394 TTTTTTTCCTCCTTAGTGTGAAATTGTTATCCGC 35 NB395 CACTAAGGAGGAAAAAAAATGACCAATTCTCCCCTGGCG 29 NB396 CCTGCAGATATCTAGAATCACGAAGCGGCGATCG 3 O US 9,096,834 B2 53 1.3. Acyl-ACP Synthetase Knockout Vector Construction TABLE 2-continued The pSGI-NB19 vector (SEQ ID NO:51) was used to knock out the slr1609 Acyl-ACP synthetase gene of Syn BG-11 Medium echocystis sp. PCC 6803. Primers NB1 (SEQID NO:52) and Trace Metal Mix A5" 1.0 ml NB9 (SEQID NO:53) were used to amplify the slr1609 gene Agar (if needed) (up to) 10.0 g fragment from Synechocystis sp. PCC 6803. This fragment Distilled water 1.0 L was ligated into the TOPO vector PCR2.1 (Life Technologies, "Trace Metal Mix A5 Carlsbad, Calif.) via the manufacturer's protocol, and the HBO, 2.86g plasmid construct was propagated in TOP10 E. Coli cells MnCl* 4H2O 1.81 g 10 ZnSO4*7H2O 0.22g (Life Technologies, Carlsbad, Calif.). This plasmid was cut Na2MoC* 2H2O 0.39 g with and Mfel restriction Enzyme (New England Biolabs, CuSO4* 5H2O 0.080 g Ipswich, Mass.). The GmR fragment was PCR amplified Co(NO3)2 * 6H2O 49.4 mg from a pAW4 plasmid with primers NB87 and NB88 contain Distilled water to 1.0L ing Mfel restrictions sites used to cut the ends of the PCR 15 For production of fatty acids, patches of Synechocystis fragment. This was then ligated into the TOPO vector con cells transformed with desired constructs (based on size taining the slr1609 fragment from the previous step to create selection using specific primers for PCR screening) were plasmid pSGI-NB19 (SEQ ID NO:51). scraped and grown in 25 mL of BG-11 medium (ATCC 616, 1.4. Production of Free Fatty Acids as shown in Table 2; component weights are approximate; Synechocystis sp. PCC 6803 was transformed with the final pH 7.1: autoclaved at about 121°C. for about 15 mins.) pSGI-NB55 LPAAT gene expression construct (SEQ ID with appropriate antibiotics. Liquid cultures were grown in NO:27: FIG. 2), the pSGI-NB158 acyl-ACP thioesterase shake flasks at 30° C., -65umol/m/s light, about 215 rpm expression construct (SEQID NO:36: FIG. 3), or the pSGI with the supply of ~5% CO, until turbid (approximately 3 NB19 acyl-ACP synthetase gene knock out (AAS KO) con days). The O.D. (730 nm) of each culture was measured in struct (SEQID NO:51: FIG. 4) or the combinations of pSGI order to determine the volume of BG-11 medium for resus NB55 and pSGI-NB158 or pSGI-NB55 and pSGI-NB19 25 pending the cells for production cultures. Resuspensions essentially according to Zang et al. (2007) J. Microbiology were performed to provide all cultures at or above O.D. 1.1 45:241-245, the content of which is incorporated herein by (730 nm). To ensure that the desired starting O.D. 1.1 (730 reference in its entirety. Briefly, cells were grown under con nm) for induction was achieved, cultures were spun down at stant light to an optical density 730 (O.D. zo) of approxi 2,000 g for 10 minutes (20-25°C.) and resuspended in 9-10 mately 0.7 to 0.9 (an OD, so of -0.25 corresponds to ~1x10 30 mL BG-11 medium supplemented with 10 mg/mL BSA. cells/ml) and harvested by centrifugation at ~2,000 g for ~15 O.D. readings were measured again after resuspension and mins at room temperature (-20-25°C.). The cell pellet was diluted to O.D 1.1 (730 nm), again in BG-11 medium supple resuspended in approximately 0.3 times the growth volume of mented with 10 mg/mL BSA. BSA supplementation reduced fresh BG-11 medium and used immediately for transforma fatty acid toxicity that could cause bleaching of the produc 35 tion cultures. Appropriate antibiotics were added to the cul tion. About 1 microgram of plasmid DNA (containing pSGI tures and the cells were induced with IPTG on day 1 in ~4ml NB55 construct (SEQ ID NO:27) that included an LPAAT screw thread glass vials with gas permeable tape for sealing, gene, or the pSGI-NB158 acyl-ACP thioesterase construct growing at about 30°C., -65 umol/m/s light, shaking at (SEQ ID NO:36), or the AAS knock out construct pSGI about 215 rpm with a supply of ~5% CO. The BG-11 NB19 (SEQID NO:51)) was added to -0.3 ml of cells, gently medium does not provide a reduced carbon Source that can be mixed, and incubated approximately 5 hours with illumina 40 used as an energy source for the growth of the cells. Rather, tion at -30°C. without agitation. Cells were spread on a filter the cells were grown phototrophically using CO as substan (Whatmann Nuclepore Polycarbonate Track-Etched mem tially the sole carbon Source, using light as the energy sources, brane, PC ~47 mm, ~0.2 micron) positioned on a ~50 ml and incorporating carbon from CO into biomolecules, BG-11 agar plates and allowed to recover for about 16 to 24 including fatty acids. The final concentration of -1 mMIPTG hours under light, after which the filter was lifted and placed 45 was added to all samples to induce the free fatty acid produc on a fresh BG-11 plate containing spectinomycin (10 g/ml) tion. The whole vials were submitted for GC-free fatty acid for selection of the LPAAT expression construct pSGI-NB55 analysis. (SEQ ID NO:27), kanamycin (10 ug/ml) for selection of the 1.5. Analysis of free fatty acids acyl-ACP thioesterase expression construct pSGI-NB158 Free fatty acids were analyzed by gas chromatography (SEQID NO:36), or gentamycin (5 g/ml) for selection of the 50 with flame ion detection (GC-FID). About 1.0 mL of the acyl-ACP synthetase knockout construct pSGI-NB 19 (SEQ Synechocystis cultures were added to ~4 mL glass GC vials with PTFE-lined caps (National Scientific). About eighty ID NO:51) to select for transformants. Resulting colonies four microliters of an internal standard (I.S.) set that included were screened further for the presence of the thioesterase the free fatty acids C9:0, C13:0, and C17:0, each at about 600 genes by PCR using the primers used to generate the gene ug/ml in hexane, were added to the culture sample followed fragments. 55 by about 83 microliters of ~50% HSO, about 167 microli ters of ~5M NaCl, and about 1.4 milliliters of hexane. The TABLE 2 final concentration of each I.S. was about 50 g/mL. The fatty acids for making the internal standard set were purchased BG-11 Medium either from Fluka or Nu Chek Prep. The cultures were then NaNO, 1.5 g. 60 vortexed on a Multi-tube vortexer at about 2,500 rpm for KHPO 0.04 g about 30 minutes. The vials were finally centrifuged for about MgSO * 7H2O 0.075 g 3 minutes at about 2,000 rpm, in order to provide good sepa CaCl-, * 2H2O 0.036 g Citric acid 6.0 mg ration between organic and aqueous phases. The hexane layer Ferric ammonium citrate 6.0 mg was analyzed by gas chromatography (GC). EDTA 1.0 mg 65 Synechocystis fatty acid samples were analyzed on an Agi Na2CO 0.02 g lent model 7890A GC/FID that included a J&W Scientific DB-FFAP capillary column (~10 m length, ~0.10 mm inter US 9,096,834 B2 55 56 nal diameter, ~0.10um film thickness). For analysis of cyano in total FFA as compared to Control 1 A. A similar drop in bacterial samples, the GC oven was programmed as follows: total FFA as compared to Control 1A was observed in strains about 120° C. for about 0.1 minute, then heated at about 40° with an attenuated acyl-ACP synthetase gene (AASKO) or a C./minute to about 240°C. (hold -3.5 minutes). The injector combination of sl1752 and AASKO. temperature was kept at about 260°C., and a ~30:1 split -0.9 5 FIG. 6 shows the chain lengths of the fatty acids produced ul injection was used. Helium was used as a carrier gas at a by the strains expressing the acyl-ACP thioesterase flow rate of about 0.599 mL/minute. The analytes were iden- Cc1 FatB1 with and without expression of the sl11752 LPAAT. tified by comparison of retention times to individually The substrate preference for C14 and C16, as well as C12 acyl injected Standards. The calibration range for the analytes was Substrates is clearly seen in the 1A control and is maintained about 2 ug/ml to about 200 ug/ml for C8:0-C 16:1 fatty acids 10 in the strain that co-expressed the C18-preferring LPAAT and about 0.625 ug/ml to about 50 g/ml for C18:0-C 18:3 gene; however the yield of free fatty acids is significantly fatty acids. increased. As shown in FIG. 5, a 60% increase in total FFA production Those skilled in the art will recognize, or be able to ascer over the CC1FatB1 thioesterase-containing strain (Control tain using no more than routine experimentation, many 1A) was detected for the combination Ce1 FatB1 thioesterase equivalents to the specific embodiments of the invention and LPAAT-containing strain (sl1752). Over-expression of 15 described herein. Such equivalents are intended to be encom sl1752 alone, however, resulted in a greater than 3-fold drop passed by the following claims.

SEQUENCE LISTING

<16 Os NUMBER OF SEO ID NOS: 63

<21 Os SEQ ID NO 1 &211s LENGTH: 723 &212s. TYPE: DNA <213> ORGANISM: Synechocystis PCC6803 <4 OOs SEQUENCE: 1 atgaccalatt ct cocctggc gatcgaccct gctgtcattg gocct tccag ggitttcc.ccc 60

tggctaatca aattaattta toccc taggc accaggttitt tacattggta ttittggc.ccg 12O atcgc catcc atggit cagga goatttaccc cqcagtggcc cqattattot ggcc cct acc 18O

CaccgttcCC gttgggatgc aattittgctt totttggc.cg ccggtcgggg ggtaactggit 24 O

cgagacct cc gttitt atggt gg.cggtgacg galagtgcagg gtttac aggg ttggitt tatt 3 OO cgc.catttgg ggggattitcc cqttgacgtgaaaaggc.cgg aaat cagtag titgagtt at 360

agtgtgcagt tactic Caagc aggggaaatgttggit cattt tt CCtgaggg gggcatttitt 42O cgggat Caac acactgtc.ca toccctcaaa ciggg tattg gg.cgcattgc catggaagtt 48O

tgtaa.gcaaa accocaiacac tdacattaaa gtgatcc cag tdacgatcgc ctacagtgac 54 O Ccct acc cag gCaaaggcac tt cagtggala attaattittg gC cagggcat cqc.cgctagg 6 OO

gacitacgacc ccagdac cat caaggctago toccaaaaac to accc.gttgtttggcc cat 660 agaatgcaga acctttacag titcc.gagaat gt ct citctgt gcgaggcgat cqc.cgctt.cg 72O

tga 723

<21 Os SEQ ID NO 2 &211s LENGTH: 24 O 212s. TYPE: PRT <213> ORGANISM: Synechocystis PCC6803

<4 OOs SEQUENCE: 2 Met Thr Asn Ser Pro Leu Ala Ile Asp Pro Ala Val Ile Gly Pro Ser 1. 5 1O 15 Arg Val Ser Pro Trp Lieu. Ile Llys Lieu. Ile Tyr Pro Lieu. Gly Thr Arg 2O 25 3 O Phe Lieu. His Trp Tyr Phe Gly Pro Ile Ala Ile His Gly Glin Glu. His 35 4 O 45 Lieu Pro Arg Ser Gly Pro Ile Ile Lieu Ala Pro Thr His Arg Ser Arg SO 55 60

Trp Asp Ala Ile Lieu. Lieu. Ser Lieu Ala Ala Gly Arg Gly Val Thr Gly 65 70 7s 8O US 9,096,834 B2 57 58 - Continued

Arg Asp Luell Arg Phe Met Wall Ala Wall Thir Glu Val Glin Gly Luell Glin 85 90 95

Gly Trp Phe Ile Arg His Lell Gly Gly Phe Pro Wall Asp Wall Lys 105 11 O

Pro Glu Ile Ser Ser Wall Ser Tyr Ser Wall Glin Lell Lell Glin Ala Gly 115 12 O 125

Glu Met Luell Wall Ile Phe Pro Glu Gly Gly Ile Phe Arg Asp Glin His 13 O 135 14 O

Thir Wall His Pro Lell Lys Arg Gly Ile Gly Arg Ile Ala Met Glu Wall 145 150 155 160

Glin Asn Pro Asn Thir Asp Ile Lys Wall Ile Pro Wall Thir Ile 1.65 17O 17s

Ala Ser Asp Pro Pro Gly Lys Gly Thir Ser Wall Glu Ile Asn 18O 185 19 O

Phe Gly Glin Gly Ile Ala Ala Arg Asp Tyr Asp Pro Ser Thir Ile 195

Ala Ser Ser Glin Lell Thir Arg Luell Ala His Arg Met Glin Asn 21 O 215 22O

Lell Tyr Ser Ser Glu Asn Wall Ser Luell Glu Ala Ile Ala Ala Ser 225 23 O 235 24 O

<210s, SEQ ID NO 3 &211s LENGTH: 236 212. TYPE : PRT <213> ORGANISM: Cyanothece PCC 8801

<4 OOs, SEQUENCE: 3

Met Asn. Glin Ser Thir Thir Ser Asn Ser Ile Glu Ser Ser Ile Asn Pro 1. 5 15

Trp Luell Ile Arg Lell Wall Pro Luell Gly Cys Ser Lell Ile Met Pro 25

Lell Phe Phe Gly Arg Ile Thir Ile Ser Gly Glin Glu Asn Ile Pro Thir 35 4 O 45

Thir Gly Pro Wall Ile Wall Ala Pro Thir His Arg Ser Arg Trp Asp Ala SO 55 6 O

Lell Ile Wall Pro Glin Ala Wall Gly Arg Luell Wall Ser Gly Arg Asp Luell 65 70

Arg Phe Met Wall Met Ser Thir Glu Met Thir Gly Lell Glin Gly Trp Luell 85 90 95

Ile Arg Arg Luell Gly Gly Phe Pro Ile Asp Wall Arg Pro Gly Luell 105 11 O

Asp Ser Luell Glu His Ser Wall Ser Ile Luell Lys Glin Gly Glu Met Luell 115 12 O 125

Wall Ile Phe Pro Glu Gly Gly Ile Phe Arg Asp Asn His Wall His Pro 13 O 135 14 O

Lell Arg Gly Wall Ala Arg Ile Ala Luell Glu Wall Wall Ser Glin Glin 145 150 155 160

Pro Asn Ser Gly Ile Ile Luell Pro Wall Ser Wall Glin Tyr Thir Glin 1.65 17O 17s

Pro Pro Ala Trp Thir Asp Wall Ile Wall Asn Ile Gly Glin Pro 18O 185 19 O

Ile Asp Wall Ala Asn Pro Glin Arg Met Ser Ser Ser Glu 195 2OO 2O5 US 9,096,834 B2 59 - Continued Ala Lieu. Thir Asn Asp Lieu. Glu Ala Lys Lieu Lys Asp Lieu. Tyr Glin Gly 21 O 215 22O Glin Lys Ala Lieu. Asn Lieu Ala Glu Phe Val Gly Ser 225 23 O 235

<210s, SEQ ID NO 4 &211s LENGTH: 251 212. TYPE: PRT <213> ORGANISM: Crocosphaera watsonii WH 85O1

<4 OOs, SEQUENCE: 4 Met Cys Asn Lieu. His Arg Glu Ile Val Phe Val Ala Phe Cys Lieu Lys 1. 5 1O 15 Llys Met Asn. Ile His Lys Asn Llys Ser Asp Arg Val Llys Ser Glin Ile 2O 25 3O Ser Pro Trp Lieu. Ile Asn Ile Ala Tyr Pro Leu Gly Ser Lys Ile Val 35 4 O 45 Lieu Pro Leu Tyr Phe Gly Ser Ile Thr Ile Asn Gly Glin Glu Asn Ile SO 55 6 O Pro Thir Ser Gly Pro Ile Lieu. Ile Ala Pro Thr His Arg Ser Arg Trp 65 70 7s 8O Asp Ala Lieu. Ile Ile Pro Tyr Ala Val Gly Arg Ile Val Ser Gly Arg 85 90 95 Asp Val Arg Phe Met Val Thir Ser Ser Glu Ile Thr Gly Ile Glin Gly 1OO 105 11 O Trp Phe Ile Arg Arg Met Gly Gly Phe Pro Val Asp Lieu Lys Arg Pro 115 120 125 Gly Ala Ser Ser Lieu. Glu. His Ser Val Glu Ile Lieu Lys Glin Gly Glu 13 O 135 14 O Met Leu Val Ile Phe Pro Glu Gly Gly Ile Phe Arg Asp Lys Glu Val 145 150 155 160 His Pro Lieu Lys Arg Gly Val Ala Arg Ile Ala Lieu. Glu Val Glu Ser 1.65 17O 17s Gln Glin Pro Gly Cys Gly Met Lys Ile Leu Pro Val Ser Ile Glu Tyr 18O 185 19 O

Thr Llys Pro Phe Pro Ser Trp Gly Thr His Ile Thr Val Asin Ile Gly 195 2OO 2O5

Cys Ser Lieu. Asp Wall Ala Ser Tyr Asp Thir Thir Thr Lieu Lys Arg Ser 21 O 215 22O

Ser Glin Llys Lieu. Thr Glin Asp Lieu. Glu Ser His Lieu Lys Trp Ile Phe 225 23 O 235 24 O

Glin Gly Lys Lieu. Glu Glin Ile Val Ile Glin Lys 245 250

<210s, SEQ ID NO 5 &211s LENGTH: 234 212. TYPE: PRT <213> ORGANISM: Cyanothece CCYO110

<4 OOs, SEQUENCE: 5 Met Asn. Ile Ser Thir Ile Llys Ser Asp Glin Val Llys Ser Arg Ile Ser 1. 5 1O 15

Pro Trp Lieu. Ile Arg Ile Thr Tyr Pro Leu Gly Ser Trp Leu Val Lieu. 2O 25 3O

Pro Leu Tyr Phe Gly Arg Ile Lys Ile Thr Gly Glin Glu Asin Ile Pro 35 4 O 45 US 9,096,834 B2 61 62 - Continued

Asp Thir Gly Pro Wall Ile Ile Ala Pro Thir His Arg Ser Arg Trp Asp SO 55 6 O

Ala Luell Ile Ile Pro Tyr Ala Wall Gly Arg Met Wall Ser Arg Asp 65 70

Wall Arg Phe Met Wall Thir Ser Ser Glu Met Glu Gly Ile Gly Trp 85 90 95

Phe Ile Arg Arg Lieu. Gly Gly Phe Pro Wall Asp Wall Pro Gly 105

Pro Ser Ser Lieu. Glu His Ser Ile Glu Wall Luell Glin Glu Met 115 12 O 125

Lell Wall Ile Phe Pro Glu Gly Gly Ile Phe Arg Asp Thir Wall His 13 O 135 14 O

Pro Luell Arg Gly Wall Ala Arg Ile Ala Luell Asp Wall Ser Glin 145 150 155 160

Glin Pro Gly Met Ile Luell Pro Ile Gly Ile Tyr Ser 1.65 17O 17s

Glin Pro Phe Pro Ser Trp Thir Asp Wall Thir Wall Asn Gly Ser 18O 185

Pro Luell Asp Wall Ala Thir Asp Thir Thir Met Lys Arg Ser Ser 195

Glu Lys Luell Thir Ile Asp Lell Glu Ser Asp Luell Lys Glin Wall Phe His 21 O 215 22O

His Glin Ser Glu Ala Ile Wall Phe Glin Ser 225 23 O

SEQ ID NO 6 LENGTH: 236 TYPE : PRT ORGANISM: Microcystis aeruginosa

< 4 OOs SEQUENCE: 6

Met Ser Glin Ser Thr Ile Wall Asp Glu Ser Luell Thir Ile Arg Lys Ser 1. 5 15

Pro Ala Wall Asn. Ser Arg Ile Asn Pro Cys Luell Ile Arg Phe 2O 25 3O

Phe Luell Gly Asn Tyr Lell Wall Luell Pro Ala Phe Phe Ser Ile Thir 35 4 O 45

Wall Thir Gly Glin Glu Asn Ile Pro Luell Thir Gly Gly Wall Ile Luell Ala SO 55 6 O

Pro Thir His Arg Ser Arg Trp Asp Ala Luell Ile Lell Pro Ala Wall 65 70

Gly Arg Luell Wall Ser Gly Arg Asp Luell Arg Phe Met Wall Ser Ala Asn 85 90 95

Glu Ile Gly Val Glin Gly Trp Phe Ile Arg Arg Lell Gly Gly Phe 1OO 105 11 O

Pro Wall Asn Thir Asp His Pro Gly Met Gly Ser Lell Wall His Ser Wall 115 12 O 125

Glu Luell Luell Ala Ala Gly Glu Met Wall Ala Ile Phe Pro Glu Gly Gly 13 O 135 14 O

Ile Arg Asp Arg Wall Wall His Pro Luell Lys Pro Gly Wall Ala Arg 145 150 155 160

Ile Ala Luell Glu Wall Thir Met Pro Asp Ala Asp Ile Lys Ile 1.65 17O 17s

Lell Pro Wall Ser Ile Ser Asn Glin Pro Tyr Pro Gly Trp Gly Ser 18O 185 19 O US 9,096,834 B2 63 64 - Continued

Glu Ala Thir Wall Asn. Ile Gly Glin Gly Ile ASn Val Lieu. Asp Tyr Glin 195

Glin Asn Ser Lieu Lys Arg Glu Thir Wall Arg Luell Thr Lys Thr Lieu Ser 21 O 215 22O

Asp Arg Luell Llys Lieu. Lieu. His Glu Asp Glin Ser Pro 225 23 O 235

<210s, SEQ ID NO 7 &211s LENGTH: 287 212. TYPE : PRT &213s ORGANISM: Nostoc PCC 712 O

<4 OOs, SEQUENCE: 7

Met Ile Glu Phe Phe Ser Wall Ser Asp Thir His Glin Thir Glin Pro Thir 1. 5 15

Asn Arg Luell Wall His Pro Glin Wall Ala Wall Thir Thir Ser Arg Wall Ser 25

Pro Trp Luell Ser Pro Wall Lell Tyr Phe Wall Gly Asn Tyr Phe Luell Luell 35 4 O 45

Pro Ser Phe Phe Gly Arg Ile Ser Ile Thir Gly Glin Glu Asn Ile Pro SO 55 6 O

Lys Thir Gly Pro Wall Ile Lell Ala Pro Thir His Arg Ala Arg Trp Asp 65 70

Ser Luell Luell Luell Pro Ala Thir Gly Arg Wall Thir Gly Arg Asp 85 90 95

Lieu Phe Met Wall Thir Thir Glu Arg Gly Lieu Glin Gly Trp 105 11 O

Phe Wall Arg Arg Met Gly Gly Phe Pro Ile Asp Thir Glin His Pro Ala 115 12 O 125

Wall Ser Thir Luell Arg His Ala Wall Glu Luell Luell Glin Glin Gly Glin Met 13 O 135 14 O

Lell Wall Ile Pro Glu Gly Asn Ile Phe Arg Asp Gly Luell His 145 150 155 160

Pro Luell Ser Gly Ile Ser Arg Luell Ala Luell Ser Ala Glu Ser Ser 1.65 17O 17s

His Pro Gly Luell Gly Wall Ile Luell Pro Ile Ser Ile Asn Tyr Ser 18O 185 19 O

Glin Pro Tyr Pro Trp Gly Thir Asp Wall Ser Ile His Ile Gly Ser 195

Pro Ile Asn Wall Glin Asp Tyr Thir Gly Lys Wall Lys Glin Asn Ala 21 O 215 22O

Lys Arg Luell Thir Glu Asp Lell Ala Arg Asp Luell Glin Ser Luell Ser Asn 225 23 O 235 24 O

Pro Glu Luell Ala Phe Ser His His Asp Ala Ala Glu Wall Ser Thir Ser 245 250 255

Pro Ile Wall Ser Wall Ser Asp Ser Thir Glu Arg Ser Pro Asn Wall Arg 26 O 265 27 O

Gly Gly Ala Thir Ser Arg Arg Glu Luell Luell Thir Ile Asp 27s 28O 285

<210s, SEQ ID NO 8 &211s LENGTH: 309 212. TYPE : PRT <213> ORGANISM: Anabaena variabilis US 9,096,834 B2 65 66 - Continued

<4 OOs, SEQUENCE: 8

Met Ile Glu Phe Phe Ser Wall Ser Asp Thr Arg Glin Thir Gln Pro Thir 1. 1O 15

Asn Thir Luell Phe Pro Pro Glin Wall Ala Wall Thir Thir Ser Llys Val Ser 25 3O

Pro Trp Luell Ser Pro Wall Lell Tyr Phe Wall Gly Asn Tyr Luell Luell Luell 35 4 O 45

Pro Ser Phe Phe Gly Arg Ile Ser Ile Thr Gly Glin Glu ASn Ile Pro SO 55 6 O

Asn Thir Gly Pro Wall Ile Lell Ala Pro Thir His Arg Ala Arg Trp Asp 65 70

Ser Luell Luell Luell Pro Ala Thir Gly Arg Tyr Wall Thir Gly Arg Asp 85 90 95

Lell Phe Met Wall Thir Thir Glu Cys Arg Gly Lell Gln Gly Trp 105 11 O

Phe Wall Arg Arg Met Gly Gly Phe Pro Ile Asp Thir Glin His Pro Ala 115 12 O 125

Ile Ser Thir Luell Arg His Ala Wall Glu Lieu. Luell Glin Glin Gly Glin Met 13 O 135 14 O

Lell Wall Ile Pro Glu Gly Asn Ile Phe Arg Asp Gly Llys Lieu. His 145 150 155 160

Pro Luell Ser Gly Ile Ser Arg Lieu Ala Luell Ser Ala Glu Ser Ser 1.65 17O 17s

His Pro Gly Luell Gly Wall Ile Leul Pro Wall Ser Ile Asn Tyr Ser 18O 185 19 O

Glin Pro Tyr Pro Trp Gly Thir Asp Wall Ser Ile His Ile Gly Thir 195

Ala Ile Asn Wall Glin Asp Tyr Thir Asin Gly Lys Wall Glin Asn Ala 21 O 215 22O

Lys Arg Luell Thir Glu Asp Lell Ala Arg Asp Luell Glin Ser Luell Ser Asn 225 23 O 235 24 O

Pro Glu Luell Ala Phe Ser His His Asp Wall Ala Glu Wall Ser Asn Pro 245 250 255

Pro Gly Lys Gly Asp Glu Gly Ala Glu Ala Wall Glu Glu Glin 26 O 265 27 O

Gly Luell Gly Luell Pro Ser Gly Ala Ile Ala Glu Glu Glin Ala Ser 28O 285

Ile Wall Ser Wall Ser Arg Thir Gly Ala Thr Ser Arg Arg Arg Glu Luell 29 O 295 3 OO

Lell Thir Wall Asp Tyr 3. OS

<210s, SEQ ID NO 9 &211s LENGTH: 256 212. TYPE : PRT <213s ORGANISM: Nostoc azollae

<4 OOs, SEQUENCE:

Met Met Glu Phe Tyr Ser Ala Ser Thr Llys Pro Gln Ser Glin Ile Thir 1. 5 15

Asn. Thir Lieu Val Asn Pro Llys Val Ala Cys Ser Thr Ser Arg Val Ser 25 3O

Pro Trp Lieu. Thr Pro Leu Ala Tyr Lieu. Lieu. Gly Arg ASn Ile Wall Lieu. 35 4 O 45 US 9,096,834 B2 67 - Continued Pro Leu Phe Phe Gly Asp Ile His Ile Thr Gly Glin Glu Asin Ile Pro SO 55 6 O Thir Ser Gly Pro Ile Ile Leu Ala Pro Thr His Arg Ser Arg Trp Asp 65 70 7s 8O Ser Lieu. Lieu. Lieu Pro Tyr Ala Thr Gly Arg Cys Val Thr Gly Arg Asp 85 90 95 Met Arg Phe Met Val Thr Ser Ser Glu. Cys Lys Gly Val Glin Gly Trp 1OO 105 11 O Phe Val Arg Arg Met Gly Gly Phe Pro Val Asp Thr Glin Arg Pro Ala 115 12 O 125 Ile Ala Thr Lieu. Arg His Thr Val Glu Lieu Met Val Glu Gly Glu Met 13 O 135 14 O Lieu Val Ile Tyr Pro Glu Gly Gly Ile Tyr Arg Asp Gly Llys Val His 145 150 155 160 Ala Lieu Lys Ser Gly Ile Ser Arg Lieu Ala Lieu. Ser Ala Ala Ser Val 1.65 17O 17s His Pro Gly Lieu. Glu Ile Lys Ile Lieu. Thr Val Gly Ile Asn Tyr Ser 18O 185 19 O Gln Pro Tyr Pro Asn Trp Gly Thr Asp Val Asn Ile His Ile Gly Lys 195 2OO 2O5 Pro Ile Llys Val Thr Asp Tyr Ile Ser Gly Cys Lieu Lys Lys Glu Ala 21 O 215 22O Lys Arg Lieu. Thir Ala Asp Lieu. Thir Asn Glin Lieu. Glin Gln Lieu. Ser His 225 23 O 235 24 O Gln Gly Lieu. Glu Val Llys Ser His Ala Phe Ala Glu Ile Pro ASn Ser 245 250 255

<210s, SEQ ID NO 10 &211s LENGTH: 246 212. TYPE: PRT <213> ORGANISM: Raphidiopsis brookii <4 OOs, SEQUENCE: 10 Met Lys Glu Pro His Ser Ala Ser Thr Ser Phe Ala Asp Glin Ser Pro 1. 5 1O 15 Asn Ile Cys Lieu. Asp Pro Pro Phe Ser His Thr Arg Ser Trp Ile Ser 2O 25 3O Pro Trp Lieu. Thr Pro Leu Ala Tyr Trp Leu Gly Tyr His Leu Val Ile 35 4 O 45 Pro Leu Phe Phe Gly Ser Ile His Val Glu Gly Glin Glu. His Ile Pro SO 55 6 O Ile Arg Gly Pro Val Ile Lieu Ala Pro Thr His Arg Ser Arg Trp Asp 65 70 7s 8O Pro Lieu. Lieu. Lieu Ala Tyr Ala Ala Gly Arg Tyr Ile Thr Gly Arg Asp 85 90 95

Lieu. Arg Phe Met Val Met Ser Ser Glu. Cys Arg Gly Ile Glin Gly Trp 1OO 105 11 O

Phe Ile Leu Ser Met Gly Gly Phe Ala Val Asn Lieu. Glin Arg Pro Gly 115 12 O 125 Ile Llys Ser Lieu. Arg His Val Ile Asp Lieu Met Lieu. Gly Gly Glu Met 13 O 135 14 O Lieu Val Ile Tyr Pro Glu Gly Gly Ile Phe Arg Asp Gly Lys Ile His 145 150 155 160

Pro Lieu Lys Ser Gly Ile Ala Arg Lieu. Ser Ile Asn Ala Glin Ser Lieu. 1.65 17O 17s US 9,096,834 B2 69 70 - Continued

Asp Pro Asn Luell Asp Ile Ile Ile Pro Wall Ser Ile Asn Tyr Asp 18O 185 19 O

Arg Pro Tyr Pro Asn Trp Gly Thir Llys Val Lys Ile Cys Ile Gly Glu 195 2OO 2O5

Pro Wall Wall Ala Asp Tyr Met His Gly Cys Lell Glin Asn Ser 21 O 215 22O

Glin Tyr Luell Thir Asp Lell Arg Ser Lys Lieu. Glin Glu Wall Ala Asp 225 23 O 235 24 O

Ser Glin Pro Wall Ser Lell 245

SEQ ID NO 11 LENGTH: 253 TYPE : PRT ORGANISM: Microcoleus chthonoplastes

< 4 OOs SEQUENCE: 11

Met Gln Leul Glin Arg Glu Glin Asn Glin Thir Pro Ser Thir Luell Asp 1. 15

Lys Pro Asp Ser Thir Wall Asn Ser Ala Lys Ile Wall Pro Wall Asn 25

Ser Asn Wall Ser Pro Trp Lell Thir Ser Ile Phe Pro Luell Gly Arg 35 4 O 45

Arg Ile Luell Met Pro Lell Tyr Phe Gly Arg Luell Glin Wall Thir Gly Glin SO 55 6 O

Asp Asn Ile Pro Thir Gly Pro Wall Ile Lieu. Ala Pro Thir His 65 70

Ser Arg Trp Asp Gly Lell Thir Met Ala Tyr Ala Ala Gly Pro Wall 85 90 95

Thir Gly Arg Asp Lell Arg Phe Met Wall Ser Glin Asp Glu Met Gly 105 11 O

Lell Glin Gly Trp Phe Ile Arg Arg Luell Gly Gly Phe Pro Wall Asn Thir 115 12 O 125

His Pro Gly Ile Ser Ser Ile Arg His Ser Wall Glu Luell Luell Arg 13 O 135 14 O

Arg Gly Glu Ala Lell Wall Met Phe Pro Glu Gly Asn Ile Phe Arg His 145 150 155 160

Gly Asp Wall Asn Pro Lell Pro Gly Met Ala Arg Ile Ala Luell Glin 1.65 17s

Ala Glu Ser Asn Glin Pro Gly Luell Gly Luell Lys Ile Wall Pro Wall Ser 18O 185 19 O

Ile Arg Tyr Ser Asp Pro Wall Pro Glin Trp Gly Ser Asp Ile Asp Ile 195

Glin Ile Gly Ser Ala Lell Asp Wall Ala Cys Asn Glin Ser Ala 21 O 215 22O

Lys Gly Ala Glin Glin Lell Thir Thir Asp Luell Glu Ile Ala Luell Lys 225 23 O 235 24 O

Glin Wall Asp Glin Lys Lell Glu Wall Ile Lys Glu Ala Wall 245 250

<210s, SEQ ID NO 12 &211s LENGTH: 255 212. TYPE : PRT <213> ORGANISM: Cylindrospermopsis raciborskii US 9,096,834 B2 71 72 - Continued

<4 OOs, SEQUENCE: 12

Met Lys Glu Ser His Ser Ala Ser Thir Ser Phe Ala Gly Glin Ser Pro 1. 1O 15

Asn Ile Cys Luell Asp Pro Pro Phe Ser His Thir Arg Ser Trp Ile Ser 25

Pro Trp Luell Thir Pro Lell Ala Tyr Trp Lieu. Gly Tyr Asn Luell Wall Ile 35 4 O 45

Pro Ser Phe Phe Gly Arg Ile Asp Wall Ala Gly Glin Glin His Ile Pro SO 55 6 O

Ile Gly Pro Wall Ile Lell Ala Pro Thir His Arg Ser Arg Trp Asp 65 70

Pro Luell Luell Luell Ala Ala Ala Gly Arg Tyr Ile Thir Gly Arg Asp 85 90 95

Lell Arg Phe Met Wall Met Ser Ser Glu Cys Arg Gly Glin Gly Trp 105 11 O

Phe Ile Luell Ser Met Gly Gly Phe Ala Wall ASn Lell Arg Pro Gly 115 12 O

Ile Arg Ser Luell Arg His Ala Wall Asp Lieu. Met Lell Gly Glu Met 13 O 135 14 O

Lell Wall Ile Pro Glu Gly Gly Ile Phe Arg Asp Ile His 145 150 155 160

Pro Luell Ser Gly Ile Ala Arg Luell Ser Ile Asn Glin Ser Luell 1.65 17O 17s

Asp Pro Asn Luell Asp Ile Ile Ile Pro Wall Ala Asn Tyr Asp 18O 185 19 O

Arg Pro Tyr Pro Ser Trp Gly Thir Llys Val Lys Ile Arg Ile Gly Glu 195 2O5

Pro Wall Wall Thir Asp Tyr Ile His Gly Cys Lell Glin Asn Ala 21 O 215

Lys Luell Thir Asp Lell Arg Ser Lys Luell Glin Glu Wall Ala Asp 225 23 O 235 24 O

Ser Glin Pro Asp Phe Ser Lell Arg Ala Asn Cys Lell Thir Asn Ser 245 250 255

<210s, SEQ ID NO 13 &211s LENGTH: 236 212. TYPE : PRT &213s ORGANISM: Oscillatoria PCC 6506

<4 OOs, SEQUENCE: 13

Met Ile Thir Phe Asp Ser Thir Asn Ser Ser Luell Glu Thir Glin Met Pro 1. 5 1O 15

Ala Lys Ala Thir Ser Ile Asn Ser His Wall Ser Pro Trp Luell Ala Pro 2O 25 3O

Lell Wall Tyr Pro Lell Gly Arg Tyr Lieu. Ile Luell Pro Phe Phe Arg 35 4 O 45

Ser Ile Glu Wall Lell Gly Glin Glu His Ile Pro Arg Glu Gly Pro Wall SO 55 6 O

Ile Luell Ala Pro Thir His Arg Ser Arg Trp Asp Ser Glin Met Luell Pro 65 70 8O

Ser Thir Gly Arg Wall Thir Gly Arg Asp Lell Arg Phe Met Wall 85 90 95

Ser Wall Asn Glu Wall Gly Luell Gln Gly Trp Phe Ile Arg Arg Luell 1OO 105 11 O US 9,096,834 B2 73 74 - Continued

Gly Gly Phe Pro Wall Asp Pro Lys His Pro Ala Ile Ala Thir Luell Arg 115 12 O 125

His Gly Wall Glu Lell Lell Lell Asp Gly Glin Met Phe Wall Ile Phe Pro 13 O 135 14 O

Glu Gly Gly Ile Phe Glin Asp Asn Glin Wall His Pro Lell Pro Gly 145 150 155 160

Lell Ala Arg Luell Ala Ile Glin Ala Glu Ser Ser His Pro Gly Luell Asp 1.65 17O 17s

Wall Ile Wall Pro Met Ser Ile Arg Tyr His Pro Ile Pro Arg 18O 185 19 O

Trp Gly Ser Arg Wall Ile Thir Ile Gly Ser Pro Lell Ser Ala Ala 195 2O5

Asp Tyr Thir Gly Ala Gly Lys Asp Ala Glin Lell Luell Thir Ala 21 O 215 22O

Asp Luell Glu Thir Ala Lell Arg Ile Asp Arg Glu 225 23 O 235

SEQ ID NO 14 LENGTH: 250 TYPE : PRT ORGANISM: Nodularia spumigena

< 4 OOs SEQUENCE: 14

Met Glu Ile Tyr Thir Ser His Glin Thir Ser Glin Glu Thir Pro Ala Ser 1. 5 15

Asn Lys Ala Asn Thir Wall Ala His Thir Thir Ser Arg Wall Ser Pro 25 30

Trp Luell Ser Pro Lell Lell Luell Luell Gly Glin His Lell Luell Luell Pro 35 4 O 45

Phe Phe Phe Arg Glin Ile Glu Ile Thir Gly Glin Glu Asn Ile Pro Luell SO 55 6 O

Thir Gly Pro Wall Ile Lell Ala Pro Thir His Arg Ser Arg Trp Asp Ser 65 70

Lell Luell Luell Pro Tyr Thir Gly Arg Cys Wall Thir Gly Arg Asp Luell 85 90 95

Arg Phe Met Wall Thir Asn Glu Cys Glin Gly Lell Glin Gly Trp Luell 105 11 O

Wall Arg Ser Met Gly Phe Pro Wall Asn Pro Glin Arg Pro Ala Ile 115 12 O 125

Thir Thir Luell Arg His Wall Glu Luell Luell Glin Glin Gly Glu Ile Luell 13 O 135 14 O

Wall Ile Pro Glu Asn Ile Asp Gly Luell His Pro 145 155 160

Lell Pro Gly Ile Arg Luell Ser Luell Thir Ala Glu Ser Ser His 1.65 17O 17s

Pro Luell Asp Wall Ile Luell Pro Wall Ser Ile Asn Tyr Ser Glin 18O 185 19 O

Pro Pro Glin Trp Thir Ser Wall Lys Ile Ser Ile Gly Thir Thir 195 2O5

Lell Asn Wall Ala Asp Thir Gly Gly Wall Lys Glin Asn Ala Glin 21 O 215 22O

Asn Luell Thir Ser Asp Lell Ser Luell Wall Luell Glin Glin Lell Ser Pro Glin 225 23 O 235 24 O

Glu Ser Wall Ile Pro Glu Asn Wall Asn Ser 245 250 US 9,096,834 B2 75 - Continued

<210s, SEQ ID NO 15 &211s LENGTH: 242 212. TYPE: PRT <213> ORGANISM: Microcoleus vaginatus <4 OOs, SEQUENCE: 15 Met Pro His Thr Ala Thr Ile Ala Val Glu Ser Ser Lys Ser Ser Lieu. 1. 5 1O 15 Val Llys Pro Met Pro Ala Lys Val Ala Ser Asn Thr Ser Arg Val Ser 2O 25 3O Pro Trp Lieu. Thir Ser Lieu Lleu Tyr Pro Leu Gly His Tyr Val Val Lieu. 35 4 O 45 Pro Phe His Phe Gly Lys Ile Glu Ile Thr Gly Glin Glu. His Leu Pro SO 55 6 O Lys Asp Gly Pro Val Ile Lieu Ala Pro Thr His Arg Ser Arg Trp Asp 65 70 7s 8O Ala Leu Met Met Pro Tyr Ser Thr Gly Glin Llys Val Thr Gly Arg Asp 85 90 95 Lieu. Arg Phe Met Val Thr Ala Asp Glu Val Lys Gly Lieu. Glin Gly Trp 1OO 105 11 O Phe Ile Arg Asn Lieu. Gly Gly Phe Pro Val Asp Lieu Lys His Pro Ala 115 12 O 125 Ile Gly. Thir Lieu. Arg Tyr Gly Val Glu Lieu. Lieu. Lieu. Asp Arg Glu Met 13 O 135 14 O Lieu Val Ile Phe Pro Glu Gly Gly Ile Phe Glin Asp Gly Glu Val His 145 150 155 160 Pro Ile Llys Pro Gly Lieu Ala Arg Lieu Ala Met Glin Ala Glu Lieu. Ser 1.65 17O 17s Llys Pro Gly Lieu. Gly Val Lys Ile Val Pro Met Ser Ile Arg Tyr Arg 18O 185 19 O Pro Arg Ile Pro Gln Trp Gly Cys Glin Val Lys Ile Ala Ile Gly Ser 195 2OO 2O5 Pro Ile Ser Val Ala Asp Tyr Ala Asn Gly Gly Gly Gly Lys Lys Glu 21 O 215 22O Ala Arg Lieu. Lieu. Ser Ala Glin Lieu. Glu Met Asp Lieu Lys Llys Lieu. Asp 225 23 O 235 24 O

Glu Ser

<210s, SEQ ID NO 16 &211s LENGTH: 245 212. TYPE: PRT <213> ORGANISM: Escherichia coli

<4 OOs, SEQUENCE: 16 Met Leu Tyr Ile Phe Arg Lieu. Ile Ile Thr Val Ile Tyr Ser Ile Leu 1. 5 1O 15

Val Cys Val Phe Gly Ser Ile Tyr Cys Lieu Phe Ser Pro Arg Asn Pro 2O 25 3O Lys His Val Ala Thr Phe Gly His Met Phe Gly Arg Lieu Ala Pro Leu 35 4 O 45

Phe Gly Lieu Lys Val Glu. Cys Arg Llys Pro Thr Asp Ala Glu Ser Tyr SO 55 6 O Gly Asn Ala Ile Tyr Ile Ala Asn His Glin Asn. Asn Tyr Asp Met Val 65 70 7s 8O US 9,096,834 B2 77 78 - Continued

Thir Ala Ser ASn Ile Wall Glin Pro Pro Thir Wall Thir Wall Gly Lys Lys 85 90 95

Ser Luell Luell Trp Ile Pro Phe Phe Gly Glin Luell Tyr Trp Luell Thir Gly 1OO 105 11 O

Asn Luell Luell Ile Asp Arg Asn Asn Arg Thir Lys Ala His Gly Thir Ile 115 12 O 125

Ala Glu Wall Wall Asn His Phe Arg Arg Ile Ser Ile Trp Met 13 O 135 14 O

Phe Pro Glu Gly Thr Arg Ser Arg Gly Arg Gly Lell Lell Pro Phe Lys 145 150 155 160

Thir Gly Ala Phe His Ala Ala Ile Ala Ala Gly Wall Pro Ile Ile Pro 1.65 17O 17s

Wall Wall Ser Thir Thir Ser Asn Lys Ile ASn Lell Asn Arg Luell His 18O 185 19 O

Asn Gly Luell Wall Ile Wall Glu Met Luell Pro Pro Ile Asp Wall Ser Glin 195

Gly Lys Asp Glin Wall Arg Glu Luell Ala Ala His Arg Ser Ile 21 O 215 22O

Met Glu Glin Lys Ile Ala Glu Luell Asp Glu Wall Ala Glu Arg Glu 225 23 O 235 24 O

Ala Ala Gly Llys Val 245

<210s, SEQ ID NO 17 &211s LENGTH: 231 212. TYPE PRT &213s ORGANISM: Marinobacter adhaerens

<4 OOs, SEQUENCE: 17

Met Lieu. Arg Lieu. Lieu. Ala Ala Trp Ile Luell Phe Trp Ala Ile Luell 1. 5 1O 15

Lell Thir Luell Ile Lieu. Phe Phe Pro Ile Wall Ile Ala Ala Luell Luell Gly 2O 25

Arg Gly Asp Ala Ala Phe His Gly Thir Glin Ile Tyr Ala Trp Ile 35 4 O 45

Ile Luell Val Cys Gly Ile Arg Luell Wall Arg Gly Arg Glu Asn SO 55 6 O

Ile Glu Pro Gly Glin Arg Wall Ile Luell Ser Asn His Ala Ser Tyr 65 70

Lell Asp Pro Pro Ala Lell Wall Luell Ala Luell Gly Lell Glin Tyr Arg Trp 85 90 95

Wall Ile Lys Glu Lell Arg Wall Pro Luell Phe Gly Luell Ala Luell 1OO 105 11 O

Glu Ala Ser Arg Asn Lell Phe Ile Asp Arg Ser Gly Ser Asp Ala 115 12 O 125

Lell Glu Ser Ile Llys Arg Gly Wall Gly Glin Luell Pro Asp Gly Thir Gly 13 O 135 14 O

Ile Luell Ile Phe Pro Glu Gly Thir Arg Ser Trp Asp Gly Luell Luell 145 150 155 160

Pro Phe Gly Phe Wall Ile Ala Glin Asp Gly Glu Luell Pro 1.65 17O 17s

Ile Luell Pro Wall. Thir Ile Gly Ser His Glin Arg Lell Pro Gly 18O 185 19 O

Ser Ala Ala Phe Ser Arg Gly Asp Ile Glu Ile Wall Ile His Pro Pro 195 2OO 2O5 US 9,096,834 B2 79 80 - Continued

Met Ala Ser Gly Ala Lieu Pro Lieu. Asp Asp Lieu Met Thr Asp Val Arg 21 O 215

Asn Ser Ile Ala Ser Ser Lieu. 225 23 O

<210s, SEQ ID NO 18 &211s LENGTH: 332 212. TYPE : PRT <213> ORGANISM: Oryza sativa

<4 OOs, SEQUENCE: 18

Met Gly Thr Lieu. Lell Arg Pro Arg Pro Luell Ala His Ala Ala Gly Ala 1. 5 15

Gly Asp Ala Thir Pro Ser Thir Ala His Ala Wall Wall Wall Ser Gly Gly 25

Arg Gly Arg Gly Wall Glu Glin Pro His Arg Wall Arg Arg Arg Pro 35 4 O 45

Gly Pro Glin Wall Ala Wall Ala Thir Ala Ser Trp Arg Arg Arg Arg Glu SO 55 6 O

Thir Wall Wall Arg Ser Asp Phe Ala Ala Gly Gly Ala Ala Thir Met Gly 65 70

Asp Ser Pro Glin Ala Lell Ser Asp Ile Asp Wall Wall Ser Arg Wall Arg 85 90 95

Gly Wall Phe Tyr Ala Wall Thir Ala Wall Ala Ala Ile Phe Luell Phe 105 11 O

Wall Ala Met Wall Wall Wall His Pro Lieu Wall Lieu. Lieu Phe Asp Arg Tyr 115 12 O 125

Arg Arg Arg Ala Glin His Tyr Ile Ala Ile Trp Ala Thir Luell Thir 13 O 135 14 O

Ile Ser Met Phe Tyr Lys Lell Asp Wall Glu Gly Met Glu Asn Luell Pro 145 150 155 160

Pro Asn Ser Ser Pro Ala Wall Wall Ala ASn His Glin Ser Phe Luell 1.65 17O 17s

Asp Ile Thir Lell Lell Thir Luell Gly Arg Phe Phe Ile Ser 18O 185 19 O

Thir Ser Ile Phe Met Phe Pro Ile Ile Gly Trp Ala Met Luell 195

Lell Gly Wall Ile Pro Lell Arg Arg Met Asp Ser Arg Ser Glin Luell Asp 21 O 215

Cys Luell Arg Cys Wall Asp Luell Wall Lys Lys Gly Ala Ser Wall Phe 225 23 O 235 24 O

Phe Phe Pro Glu Gly Thir Arg Ser Asp Gly Lell Gly Ala Phe 245 250 255

Arg Gly Ala Phe Ser Wall Ala Thir Lys Thir Gly Ala Pro Wall Ile 26 O 265 27 O

Pro Ile Thir Luell Lell Gly Thir Gly Luell Met Pro Ser Gly Met Glu 27s 285

Gly Ile Luell Asn Ser Gly Ser Wall Luell Ile Ile His His Pro Ile 29 O 295 3 OO

Glu Gly Asn Asp Ala Glu Luell Ser Glu Ala Arg Wall Ile 3. OS 310 315 32O

Ala Asp Thir Luell Ile Lell Asn Gly Gly Wall His 3.25 330 US 9,096,834 B2 81 82 - Continued

SEQ ID NO 19 LENGTH: 345 TYPE : PRT ORGANISM: Glycine ax

< 4 OOs SEQUENCE: 19

Met Glu Wall Thr Pro Lell Ser Ser Pro Ser Pro Ile His Arg Luell His 1. 5 15

Lell Arg His Lys Glu Ala Arg Phe Luell Ala Wall Pro Ser Thir Luell Luell 25 3O

Thir Arg Arg Gly Thir Thir Thir Wall His Pro Ile Luell Arg Thir 35 4 O 45

Ser His Asn Ser Pro Pro Cys Ser Luell Glin Ala Ile Ser His SO 55 6 O

Glu Asn Wall Ser Trp Lell Ser Wall Ser Pro Lys Lell His Wall Glin Asn 65 70

Phe Pro Arg Asp Wall Wall Wall Arg Ser Glu Lell Thir Ala Ala Gly 85 90 95

Ser Ala Gly Asp Gly Tyr Lell Luell Pro Glu Luell Lell Glu Ser 105 11 O

Wall Arg Gly Wall Cys Phe Wall Wall Thir Ala Phe Ser Ala Ile Phe 115 12 O 125

Lell Phe Met Luell Met Lell Wall Gly His Pro Ser Wall Lell Luell Phe Asp 13 O 135 14 O

Arg Arg Arg Met Phe His His Phe Wall Ala Wall Trp Ala Ala 145 150 155 160

Lell Thir Wall Ala Pro Phe Ile Glu Phe Glu Gly Luell Glu Asn 1.65 17O

Lell Pro Pro Pro Asp Thir Pro Ala Wall Wall Ser Asn His Glin Ser 18O 185 19 O

Phe Luell Asp Ile Tyr Thir Lell Luell Thir Luell Gly Arg Ser Phe Phe 195

Ile Ser Thir Gly Ile Phe Luell Phe Pro Ile Ile Gly Trp Ala Met 21 O 215

Phe Luell Luell Gly Wall Ile Pro Luell Arg Met Asp Ser Arg Ser Glin 225 23 O 235 24 O

Lell Asp Luell Lys Arg Met Asp Luell Ile Gly Ala Ser 245 250 255

Wall Phe Phe Phe Pro Glu Gly Thir Arg Ser Lys Gly Lys Luell Gly 26 O 265 27 O

Thir Phe Lys Gly Ala Phe Ser Wall Ala Ala Thir Asn Ala Pro 27s 28O 285

Wall Wall Pro Ile Ser Lell Ile Gly Thir Gly Glin Ile Met Pro Ala Gly 29 O 295 3 OO

Lys Glu Gly Ile Wall Asn Lell Gly Ser Wall Lys Wall Wall Ile His Lys 3. OS 310 315

Pro Ile Wall Gly Lys Asp Pro Asp Met Luell Cys Glu Ala Arg Lys 3.25 330 335

Thir Ile Ala Ser Wall Lell Thir Glin Ser 34 O 345

<210s, SEQ ID NO 2 O &211s LENGTH: 344 212. TYPE : PRT <213> ORGANISM: Brassica napus US 9,096,834 B2 83 - Continued

<4 OOs, SEQUENCE: 2O Met Asp Val Ala Ser Ala Arg Gly Val Ser Ser His Pro Pro Tyr Tyr 1. 5 1O 15 Ser Llys Pro Ile Cys Ser Ser Glin Ser Ser Lieu. Ile Arg Ile Pro Ile 2O 25 3O Ser Lys Gly Cys Cys Phe Ala Arg Ser Ser Asn Lieu. Ile Thir Ser Lieu. 35 4 O 45 His Ala Ala Ser Arg Gly Val Thr Arg Arg Thr Ser Gly Val Glin Trp SO 55 6 O Cys Tyr Arg Ser Ile Arg Phe Asp Pro Phe Llys Val Asn Asp Lys Asn 65 70 7s 8O Ser Arg Thr Val Thr Val Arg Ser Asp Leu Ser Gly Ala Ala Thr Pro 85 90 95 Glu Ser Thr Tyr Pro Glu Pro Glu Ile Llys Lieu Ser Ser Arg Lieu. Arg 1OO 105 11 O Gly Ile Cys Phe Cys Lieu Val Ala Gly Ile Ser Ala Ile Val Lieu. Ile 115 12 O 125 Val Lieu Met Ile Ile Gly His Pro Phe Val Lieu Lleu Phe Asp Arg Tyr 13 O 135 14 O Arg Arg Llys Phe His His Phe Ile Ala Lys Lieu. Trp Ala Ser Ile Ser 145 150 155 160 Ile Tyr Pro Phe Tyr Lys Thr Asp Ile Glin Gly Lieu. Glu Asn Lieu Pro 1.65 17O 17s Ser Ser Asp Thr Pro Cys Val Tyr Val Ser Asn His Glin Ser Phe Leu 18O 185 19 O Asp Ile Tyr Thr Lieu Lleu Ser Leu Gly Glin Ser Tyr Lys Phe Ile Ser 195 2OO 2O5 Lys Thr Gly Ile Phe Val Ile Pro Val Ile Gly Trp Ala Met Ser Met 21 O 215 22O Met Gly Val Val Pro Lieu Lys Arg Met Asp Pro Arg Ser Glin Val Asp 225 23 O 235 24 O Cys Lieu Lys Arg Cys Met Glu Lieu Val Lys Lys Gly Ala Ser Val Phe 245 250 255 Phe Phe Pro Glu Gly Thr Arg Ser Lys Asp Gly Arg Lieu. Gly Pro Phe 26 O 265 27 O Llys Lys Gly Ala Phe Thir Ile Ala Ala Lys Thr Gly Val Pro Val Val 27s 28O 285 Pro Ile Thr Lieu Met Gly Thr Gly Lys Ile Met Pro Thr Gly Ser Glu 29 O 295 3 OO Gly Ile Lieu. Asn His Gly Asp Val Arg Val Ile Ile His Llys Pro Ile 3. OS 310 315 32O Tyr Gly Ser Lys Ala Asp Val Lieu. Cys Glu Glu Ala Arg Asn Lys Ile 3.25 330 335

Ala Glu Ser Met Asn Lieu Lleu Ser 34 O

<210s, SEQ ID NO 21 &211s LENGTH: 356 212. TYPE: PRT <213> ORGANISM: Arabidopsis thaliana

<4 OOs, SEQUENCE: 21 Met Asp Val Ala Ser Ala Arg Ser Ile Ser Ser His Pro Ser Tyr Tyr 1. 5 1O 15 US 9,096,834 B2 85 - Continued Gly Llys Pro Ile Cys Ser Ser Glin Ser Ser Lieu. Ile Arg Ile Ser Arg 2O 25 3O Asp Llys Val Cys Cys Phe Gly Arg Ile Ser Asn Gly Met Thr Ser Phe 35 4 O 45 Thir Thr Ser Lieu. His Ala Val Pro Ser Glu Lys Phe Met Gly Glu Thr SO 55 6 O Arg Arg Thr Gly Ile Glin Trp Ser Asn Arg Ser Lieu. Arg His Asp Pro 65 70 7s 8O Tyr Arg Phe Lieu. Asp Llys Llys Ser Pro Arg Ser Ser Glin Lieu Ala Arg 85 90 95 Asp Ile Thr Val Arg Ala Asp Lieu. Ser Gly Ala Ala Thr Pro Asp Ser 1OO 105 11 O Ser Phe Pro Glu Pro Glu Ile Llys Lieu. Ser Ser Arg Lieu. Arg Gly Ile 115 12 O 125 Phe Phe Cys Val Val Ala Gly Ile Ser Ala Thr Phe Lieu. Ile Val Lieu. 13 O 135 14 O Met Ile Ile Gly His Pro Phe Val Lieu. Leu Phe Asp Pro Tyr Arg Arg 145 150 155 160 Llys Phe His His Phe Ile Ala Lys Lieu. Trp Ala Ser Ile Ser Ile Tyr 1.65 17O 17s Pro Phe Tyr Lys Ile Asn Ile Glu Gly Lieu. Glu Asn Lieu Pro Ser Ser 18O 185 19 O Asp Thr Pro Ala Val Tyr Val Ser Asn His Glin Ser Phe Lieu. Asp Ile 195 2OO 2O5 Tyr Thir Lieu Lieu Ser Lieu. Gly Llys Ser Phe Llys Phe Ile Ser Llys Thr 21 O 215 22O Gly Ile Phe Val Ile Pro Ile Ile Gly Trp Ala Met Ser Met Met Gly 225 23 O 235 24 O Val Val Pro Lieu Lys Arg Met Asp Pro Arg Ser Glin Val Asp Cys Lieu 245 250 255 Lys Arg Cys Met Glu Lieu Lleu Lys Lys Gly Ala Ser Val Phe Phe Phe 26 O 265 27 O Pro Glu Gly. Thir Arg Ser Lys Asp Gly Arg Lieu. Gly Ser Phe Llys Llys 27s 28O 285 Gly Ala Phe Thr Val Ala Ala Lys Thr Gly Val Ala Val Val Pro Ile 29 O 295 3 OO Thr Lieu Met Gly Thr Gly Lys Ile Met Pro Thr Gly Ser Glu Gly Ile 3. OS 310 315 32O Lieu. Asn His Gly Asn Val Arg Val Ile Ile His Llys Pro Ile His Gly 3.25 330 335 Ser Lys Ala Asp Val Lieu. Cys Asn. Glu Ala Arg Ser Lys Ile Ala Glu 34 O 345 35. O

Ser Met Asp Lieu. 355

<210s, SEQ ID NO 22 &211s LENGTH: 374 212. TYPE: PRT <213> ORGANISM: Zea mays

<4 OOs, SEQUENCE: 22 Met Ala Ile Pro Lieu Val Lieu Val Val Lieu Pro Lieu. Gly Lieu. Lieu. Phe 1. 5 1O 15

Lieu. Lieu. Ser Gly Lieu. Ile Val Asn Ala Ile Glin Ala Val Lieu. Phe Val 2O 25 3O US 9,096,834 B2 87 88 - Continued

Thir Ile Arg Pro Phe Ser Ser Phe Tyr Arg Arg Ile Asn Arg Phe 35 4 O 45

Lell Ala Glu Luell Lell Trp Lell Glin Luell Wall Trp Wall Wall Asp Trp Trp SO 55 6 O

Ala Gly Wall Wall Glin Lell His Ala Asp Glu Glu Thir Arg Ser 65 70

Met Gly Glu His Ala Lell Ile Ile Ser ASn His Arg Ser Asp Ile 85 90 95

Asp Trp Luell Ile Gly Trp Ile Luell Ala Glin Arg Ser Gly Cys Luell Gly 105 11 O

Ser Thir Luell Ala Wall Met Lys Ser Ser Phe Lell Pro Wall Ile 115 12 O 125

Gly Trp Ser Met Trp Phe Ala Glu Tyr Luell Phe Lell Glu Arg Ser Trp 13 O 135 14 O

Ala Asp Glu Thir Lell Trp Gly Luell Glin Arg Luell Asp 145 150 155 160

Phe Pro Arg Pro Phe Trp Lell Ala Luell Phe Wall Glu Gly Thir Arg Phe 1.65 17O

Thir Pro Ala Lys Lell Lell Ala Ala Glin Glu Tyr Ala Ala Ser Glin Gly 18O 185 19 O

Lell Pro Ala Pro Arg Asn Wall Luell Ile Pro Arg Thir Lys Gly Phe Wall 195

Ser Ala Wall Ser Ile Met Arg Asp Phe Wall Pro Ala Ile Asp Thir 21 O 215

Thir Wall Ile Wall Pro Lys Asp Ser Pro Glin Pro Thir Met Luell Arg Ile 225 23 O 235 24 O

Lell Gly Glin Ser Ser Wall Ile His Wall Arg Met Arg His Ala 245 250 255

Met Ser Glu Met Pro Ser Asp Glu Asp Wall Ser Trp 26 O 265 27 O

Asp Ile Phe Wall Ala Asp Ala Luell Luell Asp His Luell Ala Thir 27s 285

Gly Thir Phe Asp Glu Glu Ile Arg Pro Ile Gly Arg Pro Wall Ser 29 O 295 3 OO

Lell Luell Wall Thir Lell Phe Trp Ser Luell Luell Lell Phe Gly Ala Ile 3. OS 310 315

Glu Phe Phe Trp Thir Glin Luell Luell Ser Thir Trp Arg Gly Wall Ala 3.25 330 335

Phe Thir Ala Ala Gly Met Ala Luell Wall Thir Gly Wall Met His Wall Phe 34 O 345 35. O

Ile Met Phe Ser Glin Ala Glu Arg Ser Ser Ser Ala Arg Ala Ala Arg 355 360 365

Asn Arg Wall Lys Lys Glu 37 O

SEQ ID NO 23 LENGTH: 389 TYPE : PRT ORGANISM: Arabidopsis thaliana

< 4 OOs SEQUENCE: 23

Met Wall Ile Ala Ala Ala Wall Ile Val Pro Leu Gly Lieu. Leu Phe Phe 1. 5 15

Ile Ser Gly Lieu Ala Val Asn Lieu. Phe Glin Ala Val Cys Tyr Val Lieu. US 9,096,834 B2 89 - Continued

2O 25 3O Ile Arg Pro Lieu. Ser Lys Asn. Thir Tyr Arg Lys Ile Asn Arg Val Val 35 4 O 45 Ala Glu Thir Lieu. Trp Lieu. Glu Lieu Val Trp Ile Val Asp Trp Trp Ala SO 55 6 O Gly Val Lys Ile Glin Val Phe Ala Asp Asn. Glu Thir Phe Asin Arg Met 65 70 7s 8O Gly Lys Glu. His Ala Lieu Val Val Cys Asn His Arg Ser Asp Ile Asp 85 90 95 Trp Lieu Val Gly Trp Ile Lieu Ala Glin Arg Ser Gly Cys Lieu. Gly Ser 1OO 105 11 O Ala Lieu Ala Val Met Lys Llys Ser Ser Llys Phe Lieu Pro Val Ile Gly 115 12 O 125 Trp Ser Met Trp Phe Ser Glu Tyr Lieu Phe Leu Glu Arg Asn Trp Ala 13 O 135 14 O Lys Asp Glu Ser Thr Lieu Lys Ser Gly Lieu. Glin Arg Lieu. Ser Asp Phe 145 150 155 160 Pro Arg Pro Phe Trp Leu Ala Leu Phe Val Glu Gly Thr Arg Phe Thr 1.65 17O 17s Glu Ala Lys Lieu Lys Ala Ala Glin Glu Tyr Ala Ala Ser Ser Glu Lieu. 18O 185 19 O Pro Ile Pro Arg Asn Val Lieu. Ile Pro Arg Thr Lys Gly Phe Val Ser 195 2OO 2O5 Ala Val Ser Asn Met Arg Ser Phe Val Pro Ala Ile Tyr Asp Met Thr 210 215 220 Val Thir Ile Pro Llys Thr Ser Pro Pro Pro Thr Met Lieu. Arg Lieu. Phe 225 23 O 235 24 O Lys Gly Glin Pro Ser Val Val His Val His Ile Lys Cys His Ser Met 245 250 255 Lys Asp Lieu Pro Glu Ser Asp Asp Ala Ile Ala Glin Trp Cys Arg Asp 26 O 265 27 O Glin Phe Val Ala Lys Asp Ala Lieu. Lieu. Asp Llys His Ile Ala Ala Asp 27s 28O 285 Thr Phe Pro Gly Glin Glin Glu Glin Asn Ile Gly Arg Pro Ile Llys Ser 29 O 295 3 OO Lieu Ala Val Val Lieu. Ser Trp Ala Cys Val Lieu. Thir Lieu. Gly Ala Ile 3. OS 310 315 32O Llys Phe Lieu. His Trp Ala Gln Leu Phe Ser Ser Trp Lys Gly Ile Thr 3.25 330 335 Ile Ser Ala Lieu. Gly Lieu. Gly Ile Ile Thr Lieu. Cys Met Glin Ile Lieu. 34 O 345 35. O Ile Arg Ser Ser Glin Ser Glu Arg Ser Thr Pro Ala Lys Val Val Pro 355 360 365 Ala Lys Pro Lys Asp Asn His His Pro Glu Ser Ser Ser Glin Thr Glu 37 O 375 38O

Thr Glu Lys Glu Lys 385

<210s, SEQ ID NO 24 &211s LENGTH: 376 212. TYPE: PRT <213> ORGANISM: Arabidopsis thaliana

<4 OOs, SEQUENCE: 24 US 9,096,834 B2 91 92 - Continued

Met Ile Pro Ala Ala Lell Wall Phe Ile Pro Val Gly Val Luell Phe 1O 15

Lell Ile Ser Gly Lell Ile Wall Asn Ile Ile Glin Lell Wall Phe Phe Ile 2O 25

Ile Wall Arg Pro Phe Ser Arg Ser Luell Arg Arg Ile Asn Asn 35 4 O 45

Wall Ala Glu Luell Lell Trp Lell Glin Luell Ile Trp Lell Phe Asp Trp Trp SO 55 6 O

Ala Ile Ile Asn Lell Wall Asp Ala Glu Thir Luell Glu Luell 65 70

Ile Gly Glu His Ala Lell Wall Luell Ser ASn His Arg Ser Asp Ile 85 90 95

Asp Trp Luell Ile Gly Trp Wall Met Ala Glin Arg Wall Gly Cys Luell Gly 105 11 O

Ser Ser Luell Ala Ile Met Lys Glu Ala Lys Lell Pro Ile Ile 115 12 O 125

Gly Trp Ser Met Trp Phe Ser Asp Ile Phe Lell Glu Arg Ser Trp 13 O 135 14 O

Ala Asp Glu Asn Thir Lell Ala Gly Phe Arg Luell Glu Asp 145 150 155 160

Phe Pro Met Thir Phe Trp Lell Ala Luell Phe Wall Glu Gly Thir Arg Phe 1.65 17O

Thir Glin Glu Lys Lell Glu Ala Ala Glin Glu Tyr Ala Ser Ile Arg Ser 18O 185 19 O

Lieu Pro Ser Pro Arg Asn Wall Lieu Ile Pro Arg Thr Lys Gly Phe Wall 195

Ser Ala Wall Ser Glu Ile Arg Ser Phe Wall Pro Ala Ile Asp 21 O 215

Thir Luell Thir Wall His Asn Asn Glin Pro Thir Pro Thir Lell Luell Arg Met 225 23 O 235 24 O

Phe Ser Gly Glin Ser Ser Glu Ile Asn Luell Met Arg Arg His 245 250 255

Met Ser Glu Luell Pro Glu Thir Asp Asp Gly Ala Glin Trp Glin 26 O 265 27 O

Asp Luell Phe Ile Thir Asp Ala Glin Luell Tyr Phe Thir 27s 285

Asp Wall Phe Ser Asp Lell Glu Wall His Glin Asn Arg Pro Ile 29 O 295 3 OO

Pro Luell Ile Wall Wall Ile Ile Trp Luell Gly Lell Wall Phe Gly Gly 3. OS 310 32O

Phe Luell Luell Glin Trp Lell Ser Ile Wall Ser Trp Ile Ile 3.25 330 335

Lell Luell Phe Wall Phe Phe Lell Wall Ile Ala Thir Ile Thir Met Glin Ile 34 O 345 35. O

Lell Ile Glin Ser Ser Glu Ser Glin Arg Ser Thir Pro Ala Arg Pro 355 360 365

Lell Glin Glu Glin Lell Ile Ser Ala 37 O 375

SEO ID NO 25 LENGTH: 390 TYPE : PRT ORGANISM: Brassica napus

< 4 OOs SEQUENCE: 25 US 9,096,834 B2 93 94 - Continued

Met Ala Met Ala Ala Ala Wall Ile Wall Pro Luell Gly Ile Luell Phe Phe 1O 15

Ile Ser Gly Luell Wall Wall Asn Luell Luell Glin Ala Wall Cys Tyr Wall Luell 25

Wall Arg Pro Met Ser Asn Thir Lys Ile Asn Arg Wall Wall 35 4 O 45

Ala Glu Thir Luell Trp Lell Glu Luell Wall Trp Ile Wall Asp Trp Trp Ala SO 55 6 O

Gly Wall Ile Glin Wall Phe Ala Asp Asp Glu Thir Phe Asn Arg Met 65 70

Gly Glu His Ala Lell Wall Wall Asn His Arg Ser Asp Ile Asp 85 90 95

Trp Luell Wall Gly Trp Ile Lell Ala Glin Arg Ser Gly Luell Gly Ser 105 11 O

Ala Luell Ala Wall Met Ser Ser Phe Lell Pro Wall Ile Gly 115 12 O 125

Trp Ser Met Trp Phe Ser Glu Tyr Luell Phe Luell Glu Arg Asn Trp Ala 13 O 135 14 O

Lys Asp Glu Ser Thir Lell Glin Ser Gly Luell Glin Arg Lell Asn Asp Phe 145 150 155 160

Pro Arg Pro Phe Trp Lell Ala Luell Phe Wall Glu Gly Thir Arg Phe Thir 1.65 17O 17s

Glu Ala Luell Ala Ala Glin Glu Tyr Ala Ala Ser Ser Glu Luell 18O 185 19 O

Pro Wall Pro Arg Asn Wall Lell Ile Pro Arg Thir Gly Phe Wall Ser 195

Ala Wall Ser Asn Met Arg Ser Phe Wall Pro Ala Ile Asp Met Thir 21 O 215 22O

Wall Ala Ile Pro Thir Ser Pro Pro Pro Thir Met Lell Arg Luell Phe 225 23 O 235 24 O

Gly Glin Pro Ser Wall Wall His Wall His Ile His Ser Met 245 250 255

Asp Luell Pro Glu Pro Glu Asp Glu Ile Ala Glin Trp Cys Arg Asp 26 O 265 27 O

Glin Phe Wall Ala Asp Ala Luell Luell Asp His Ile Ala Ala Asp 285

Thir Phe Pro Gly Glin Glu Glin Asn Ile Gly Arg Pro Ile Ser 29 O 295 3 OO

Lell Ala Wall Wall Wall Ser Trp Ala Luell Luell Thir Lell Gly Ala Met 3. OS 310 315

Phe Luell His Trp Ser Asn Luell Phe Ser Ser Trp Gly Ile Ala 3.25 330 335

Lell Ser Ala Phe Gly Lell Gly Ile Ile Thir Luell Met Glin Ile Luell 34 O 345 35. O

Ile Arg Ser Ser Glin Ser Glu Arg Ser Thir Pro Ala Lys Wall Ala Pro 355 360 365

Ala Lys Pro Asp Asn His Glin Ser Gly Pro Ser Ser Glin Thir Glu 37 O 375 38O

Wall Glu Glu Glin Lys 385 390

<210s, SEQ ID NO 26 &211s LENGTH: 306 US 9,096,834 B2 95 - Continued

212. TYPE: PRT <213> ORGANISM: Prunus dulcis

<4 OOs, SEQUENCE: 26 Met Gly Lys Glu. His Ala Lieu Val Ile Ser Asn His Arg Ser Asp Ile 1. 5 1O 15 Asp Trp Lieu Val Gly Trp Val Lieu Ala Glin Arg Ser Gly Cys Lieu. Gly 2O 25 3O Ser Ser Lieu Ala Wal Met Lys Llys Ser Ser Llys Phe Lieu Pro Val Ile 35 4 O 45 Gly Trp Ser Met Trp Phe Ser Glu Tyr Lieu. Phe Lieu. Glu Arg Ser Trp SO 55 6 O Ala Lys Asp Glu Gly. Thir Lieu Lys Ser Gly Val Glin Arg Lieu Lys Asp 65 70 7s 8O Phe Pro Gln Pro Phe Trp Leu Ala Leu Phe Val Glu Gly Thr Arg Phe 85 90 95 Thr Glin Ala Lys Lieu. Lieu Ala Ala Glin Glu Tyr Ala Ala Ala Thr Gly 1OO 105 11 O Lieu Pro Val Pro Arg Asn Val Lieu. Ile Pro Arg Thr Lys Gly Phe Val 115 12 O 125 Thr Ala Val Ser Gln Met Arg Ser Phe Ala Pro Ala Ile Tyr Asp Val 13 O 135 14 O Thr Val Ala Ile Pro Llys Ser Ser Pro Ala Pro Thr Met Lieu. Arg Lieu. 145 150 155 160 Phe Glu Gly Arg Pro Ser Val Val His Val His Ile Lys Arg His Val 1.65 170 175 Met Arg Asp Lieu Pro Glu Thir Asp Glu Ala Val Ala Glin Trp Cys Llys 18O 185 19 O Asp Ile Phe Val Ala Lys Asp Ala Lieu. Lieu. Asp Llys His Thr Val Glu 195 2OO 2O5 Glin Thr Phe Gly Asp Glin Gln Lieu Lys Val Thr Gly Arg Pro Lieu Lys 21 O 215 22O Ser Lieu. Lieu Val Val Thr Ala Trp Ala Cys Lieu. Lieu. Ile Lieu. Gly Ala 225 23 O 235 24 O Lieu Lys Phe Lieu. Tyr Trp Ser Ser Lieu. Lieu. Ser Ser Trp Llys Gly Ile 245 250 255 Ala Phe Ser Ala Lieu. Gly Lieu. Gly Val Val Thr Val Lieu Met Glin Ile 26 O 265 27 O Lieu. Ile Arg Phe Ser Glin Ser Glu Arg Ser Thr Pro Ala Pro Val Ala 27s 28O 285 Pro Thr Asn. Asn Lys Asn Lys Gly Glu Ser Ser Gly Llys Pro Glu Lys 29 O 295 3 OO

Glin Glin 3. OS

<210s, SEQ ID NO 27 &211s LENGTH: 76.42 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: pSGI-NB55 vector that includes sll1752 LPAAT gene

<4 OOs, SEQUENCE: 27 cagttaccaa togcttaatca gtgaggcacc tat ct cagog atctgtctat titcgttcatc 6 O

Catagttgcc tact.ccc.cg tcgtgtagat alactacgata C9ggagggct tac Catctgg 12 O

US 9,096,834 B2 119 120 - Continued actggcagca gcc actggta attgatttag aggagittagt Cttgaagttca togc.cggitt aaggctaaac talaaggaca agttittggtg actg.cgcticc tccaa.gc.cag ttacct cqgt 516 O tcaaagagtt ggtagcticag agaac ctitcq aaaaaccocc Ctgcaaggcg gttttitt cqt 522 O tittcagagca agagattacg cgcagaccala aacgatctoa agaagat cat cittattaatc 528 O agataaaata tittctagatt t cagtgcaat titat citct to aaatgtagca Cctgaagtica 534 O gccc catacg atataagttg taatt Ctcat gtttgacagc titat catcga taagctittaa 54 OO tgcgg tagtt tat cacagtt aaattattgc tgaag.cggaa tcc ctggitta atgcc.gc.cgc 546 O cgatgc caat tdcattct co aagttgggg.ca cattgaacgc ttcaa.cccgg catttittaga 552O gctaac caaa attct caaaa cggaagagtt attggcgatc gaagcc.cat C gcatgagt cc 558 O

Ctatt cccag C9ggccaatg atgtc.t.ccgt. ggt attggat ttgatgatcc atgacattga 564 O

Cctgttgctg gaattggtgg gttcggaagt ggittaaactg tcc.gc.ca.gtg gcagt cqggc st OO ttctgggt ca ggatatttgg attatgtcac cgctacgitta ggcttct cot coggcattgt 576. O ggccaccctic accoccagta agg to accca tcqtaaaatt cgttc catcg cc.gcc cactg 582O caaaaatticc ct caccgaag cggattitt ct caataacgaa attittgatcc atcgc.caaac 588 O

Caccgctgat tagcgcgg actatggc.ca ggt attgtat cgc.caggatg gtctaatcga 594 O aaaggtttac accagtaata ttgaacct ct ccacgctgaa ttagaacatt ttatt cattg 6 OOO tgttagggga ggtgat Caac Cct cagtggg gggagaa.ca.g gcc ct calagg C cctgaagtt agc.cagttta attgaagaaa tggcc Ctgga Cagticaggaa tgg catgggggggaagttgt 612 O gacagaat at CaagatgcCa CCCtggcc ct Cagtgcgagt gtttaaatca act taattaa. 618O tgcaattatt gcgagttcaa act cqataac tttgttgaaat attactgttgaattaatcta 624 O tgactatt.ca atacac cc cc ctago cqatc gcc tittggc ctacct cqcc gcc.gatcgc.c 63 OO taaatctgag cqccaagagt agttc cctica acaccagtat tctgcticago agtgaccitat 636 O t caatcagga agggggaatt gta acagc.ca actatggctt tgatggittat atgg 6 414

SEO ID NO 37 LENGTH: 37 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Primer NB47 O

SEQUENCE: 37 agagaatggt CCCC catgtt cagct actga C9gggtg 37

SEQ ID NO 38 LENGTH: 56 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Primer NB471

SEQUENCE: 38 ggatt.ccgct tcagdaataa tittaactgtgataaact acc gcattaaagc titat cq 56

SEO ID NO 39 LENGTH: 57 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Primer NB4 66

SEQUENCE: 39

US 9,096,834 B2 125 126 - Continued

Lell Pro Asp Trp Ser Met Lell Luell Thir Ala Ile Thir Thir Wall Phe Wall 35 4 O 45

Ala Ala Glu Glin Trp Thir Met Luell Asp Arg Lys Ser Arg Ser SO 55 6 O

Asp Met Luell Wall Asp Ser Phe Gly Met Glu Arg Ile Wall Glin Asp Gly 65 70

Lell Wall Phe Arg Glin Ser Phe Ser Ile Arg Ser Glu Ile Gly Ala 85 90 95

Asp Arg Arg Ala Ser Ile Glu Thir Luell Met ASn His Lell Glin Glu Thir 105 11 O

Ser Luell Asn His Cys Ser Ile Arg Luell Luell Asn Glu Gly Phe Gly 115 12 O 125

Arg Thir Pro Glu Met Lys Arg Asp Luell Ile Trp Wall Wall Thir Arg 13 O 135 14 O

Met His Ile Met Wall Asn Arg Pro Thir Trp Gly Asp Thir Wall Glu 145 150 155 160

Ile Asn Thir Trp Wall Ser Glin Ser Gly Lys ASn Gly Met Gly Arg Asp 1.65 17O

Trp Luell Ile Ser Asp Asn Thir Gly Glu Ile Lell Ile Arg Ala Thir 18O 185 19 O

Ser Ala Trp Ala Met Met Asn Glin Thir Arg Arg Lell Ser Lel 195

Pro Tyr Glu Wall Ser Glin Glu Ile Ala Pro His Phe Wall Asp Ser 21 O 215 22O

Pro Wall Ile Glu Asp Gly Asp Arg Luell His Phe Asp Wall Lys 225 23 O 235 24 O

Thir Gly Asp Ser Ile Arg Gly Luell Thir Pro Arg Trp Asn Asp Lel 245 250 255

Asp Wall Asn Glin His Wall Asn Asn Wall Ile Gly Trp Ile Lel 26 O 265 27 O

Glu Ser Met Pro Thir Glu Wall Luell Glu Thir His Glu Lell Phe Lel 285

Thir Luell Glu Arg Glu Gly Arg Asp Ser Wall Luell Glu Ser 29 O 295 3 OO

Wall Thir Ala Met Asp Pro Ser Asn Glu Gly Gly Arg Ser His Glin 3. OS 310 315

His Luell Luell Arg Lell Glu Asp Gly Thir Asp Ile Wall Gly Arg Thir 3.25 330 335

Glu Trp Arg Pro Lys Asn Ala Arg Asn Ile Gly Ala Ile Ser Thir Gly 34 O 345 35. O

Thir Ser Asn Gly Asn Pro Ala Ser 355 360

SEO ID NO 45 LENGTH: 42 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Primer NB462

<4 OOs, SEQUENCE: 45 Ctttgatggit tatatggcct gaggctgaaa tagctgttg ac

<210s, SEQ ID NO 46 &211s LENGTH: 47 US 9,096,834 B2 127 128 - Continued

&212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: 223s OTHER INFORMATION: Primer NB4 63

<4 OOs, SEQUENCE: 46 gatctgaagc titgaaaggitt attaactggc gggattacca ttactgg 47

<210s, SEQ ID NO 47 &211s LENGTH: 47 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: 223s OTHER INFORMATION: Primer NB4 64

<4 OOs, SEQUENCE: 47 gtaatc.ccgc cagittaataa cct ttcaa.gc titcagat caa titcgc.gc 47

<210s, SEQ ID NO 48 &211s LENGTH: 39 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: 223s OTHER INFORMATION: Primer NB4 65

<4 OOs, SEQUENCE: 48 at atttittat cittgttggtco ggaatticgcc cittctaagc 39

<210s, SEQ ID NO 49 &211s LENGTH: 57 & 212 TYPE DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: 223s OTHER INFORMATION: Primer NB4 66

<4 OOs, SEQUENCE: 49 ggg.cgaattic cqgaccacaa gataaaaata tat catcatgaacaataaaa citgtctg

<210s, SEQ ID NO 50 &211s LENGTH: 34 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: 223s OTHER INFORMATION: Primer NB4 67

<4 OOs, SEQUENCE: 50 ggtgc.catcc at accgggcc gcc.gtc.ccgt Caag 34

<210s, SEQ ID NO 51 &211s LENGTH: 6663 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: pSGI-NB19 vector

<4 OOs, SEQUENCE: 51

CCCC cqtgct atgact agcg gcgat.cgc.ca taccggccac gac catttgc attggat.ccc 6 O

Caacggcggc cacaact tcc atggcattga gatgcgggga atgatgttct agacitctgac 12 O gcaccaaag.c caattitttgt tgatggttgc aatggggatg act actgttc actittgc.ccc 18O cagogt caat gcc tag acct agcagtaccC cCagggctgt ggtag tecCC cccaccacgc 24 O attcqct tag cactaagtaa ctitt cqgcat gttcCtgggc taactgtgcg c cc cactgca 3OO alacc Ctgctgaaaaagatgc tccaccaggg cCaacggtaa cgcttgc cct gtggaaagac 360

US 9,096,834 B2 135 136 - Continued

<210s, SEQ ID NO 54 &211s LENGTH: 362 212. TYPE: PRT <213> ORGANISM: Cuphea decandra <4 OOs, SEQUENCE: 54 Met Gly Ile Asin Gly Ser Ser Val Gly Lieu Lys Ser Gly Ser Lieu Lys 1. 5 1O 15 Thr Glin Glu Asp Thr Pro Ser Ser Pro Pro Pro Arg Thr Phe Ile Asn 2O 25 3O Gln Leu Pro Asp Trp Ser Met Lieu. Leu Ala Ala Ile Thr Thr Val Phe 35 4 O 45 Lieu Ala Ala Glu Lys Glin Trp Met Met Lieu. Asp Trp Llys Pro Lys Arg SO 55 6 O Pro Asp Met Lieu Val Asp Pro Phe Gly Lieu. Gly Arg Ile Val Glin Asp 65 70 7s 8O Gly Lieu Val Phe Arg Glin Asn Phe Ser Ile Arg Ser Tyr Glu Ile Gly 85 90 95 Ala Asp Arg Thr Ala Ser Ile Glu Thir Lieu Met Asn His Lieu. Glin Glu 1OO 105 11 O Thir Ala Lieu. Asn His Val Llys Ser Ala Gly Lieu. Lieu. Asn Asp Gly Phe 115 12 O 125 Gly Arg Thr Pro Glu Met Tyr Lys Arg Asp Lieu. Ile Trp Val Val Ala 13 O 135 14 O Lys Met Glin Val Met Val Asn Arg Tyr Pro Thr Trp Gly Asp Thr Val 145 150 155 160 Glu Val Asn. Thir Trp Val Ala Lys Ser Gly Lys Asn Gly Met Arg Arg 1.65 17O 17s Asp Trp Lieu. Ile Ser Asp Cys Asn Thr Gly Glu Ile Lieu. Thir Arg Ala 18O 185 19 O Ser Ser Val Trp Val Met Met Asin Glin Llys Thr Arg Arg Lieu. Ser Lys 195 2OO 2O5 Ile Pro Asp Glu Val Arg His Glu Ile Glu Pro His Phe Val Asp Ser 21 O 215 22O Pro Pro Val Ile Glu Asp Asp Asp Arg Llys Lieu Pro Llys Lieu. Asp Glu 225 23 O 235 24 O Llys Thr Ala Asp Ser Ile Arg Lys Gly Lieu. Thr Pro Arg Trp Asn Asp 245 250 255 Lieu. Asp Wall Asin Glin His Val Asn. Asn. Wall Lys Tyr Ile Gly Trp Ile 26 O 265 27 O Lieu. Glu Ser Thr Pro Glin Glu Val Lieu. Glu Thr Glin Glu Lieu. Cys Ser 27s 28O 285 Lieu. Thir Lieu. Glu Tyr Arg Arg Glu. Cys Gly Arg Asp Ser Val Lieu. Glu 29 O 295 3 OO Ser Lieu. Thir Ala Val Asp His Ser Gly Lys Gly Ser Gly Ser Asn. Phe 3. OS 310 315 32O Glin His Lieu. Lieu. Arg Lieu. Glu Asp Gly Gly Glu Ile Val Lys Gly Arg 3.25 330 335

Thr Glu Trp Arg Pro Lys Asn Ala Val Ile Asin Gly Ala Val Ala Pro 34 O 345 35. O

Gly Glu Thir Ser Pro Gly Asn Ser Val Ser 355 360

<210s, SEQ ID NO 55 &211s LENGTH: 356 US 9,096,834 B2 137 138 - Continued

212. TYPE: PRT <213> ORGANISM: Cuphea paucipetela

<4 OO > SEQUENCE: 55 Met Ala Asn Gly Ser Ala Val Ser Lieu Lys Asp Gly Ser Lieu. Glu Thir 1. 5 1O 15 Gln Glu Gly Thr Ser Ser Ser Ser His Pro Pro Arg Thr Phe Ile Asn 2O 25 3O Gln Leu Pro Asp Trp Ser Met Leu Lleu Ser Ala Ile Thr Thr Val Phe 35 4 O 45 Val Ala Ala Glu Lys Gln Trp Thr Met Lieu. Asp Arg Llys Ser Lys Arg SO 55 6 O Pro Asp Met Leu Val Glu Pro Phe Val Glin Asp Gly Val Ser Phe Arg 65 70 7s 8O Glin Ser Phe Ser Ile Arg Ser Tyr Glu Ile Gly Ala Asp Arg Thr Ala 85 90 95

Ser Ile Glu. Thir Lieu Met ASn Ile Phe Glin Glu. Thir Ser Lieu. Asn His 1OO 105 11 O Cys Llys Ser Lieu. Gly Lieu. Lieu. Asn Asp Gly Phe Gly Arg Thr Pro Glu 115 12 O 125 Met Cys Lys Arg Asp Lieu. Ile Trp Val Val Thir Lys Met Glin Ile Glu 13 O 135 14 O Val Asn Arg Tyr Pro Thr Trp Gly Asp Thir Ile Glu Val Thir Thr Trp 145 150 155 160 Val Ser Glu Ser Gly Lys Asn Gly Met Ser Arg Asp Trp Lieu. Ile Ser 1.65 170 175 Asp Cys His Thr Gly Glu Ile Lieu. Ile Arg Ala Thir Ser Val Trp Ala 18O 185 19 O Met Met Asn Glin Llys Thr Arg Arg Lieu. Ser Lys Ile Pro Asp Glu Val 195 2OO 2O5 Arg Glin Glu Ile Val Pro Tyr Phe Val Asp Ser Ala Pro Val Ile Glu 21 O 215 22O Asp Asp Arg Llys Lieu. His Llys Lieu. Asp Wall Lys Thr Gly Asp Ser Ile 225 23 O 235 24 O Arg Asn Gly Lieu. Thr Pro Arg Trp Asn Asp Lieu. Asp Wall Asn Gln His 245 250 255 Val Asn Asn Val Lys Tyr Ile Gly Trp Ile Leu Lys Ser Val Pro Thr 26 O 265 27 O Glu Val Phe Val Thr Glin Glu Lieu. Cys Gly Lieu. Thir Lieu. Glu Tyr Arg 27s 28O 285 Arg Glu. Cys Arg Arg Asp Ser Val Lieu. Glu Ser Val Thr Ala Met Asp 29 O 295 3 OO Pro Ser Lys Glu Gly Asp Arg Ser Lieu. Tyr Gln His Lieu. Lieu. Arg Lieu. 3. OS 310 315 32O Glu Asn Gly Ala Asp Ile Ala Lieu. Gly Arg Thr Glu Trp Arg Pro Llys 3.25 330 335

Asn Ala Gly Thr Asn Gly Ala Ile Ser Thr Thr Lys Thr Ser Pro Gly 34 O 345 35. O

Asn. Ser Wal Ser 355

<210s, SEQ ID NO 56 &211s LENGTH: 267 212. TYPE: PRT <213> ORGANISM; Elaeis oleifera US 9,096,834 B2 139 140 - Continued

<4 OOs, SEQUENCE: 56

Asn. Ser Ile Pro Asp Trp Ser Met Luell Luell Ala Ala Wall Thir Thir Ile 1. 5 15

Phe Luell Ala Ala Glu Glin Trp Thir Luell Luell Asp Trp Lys Pro Arg 25 3O

Arg Pro Asp Met Lell Thir Asp Ala Phe Ser Luell Gly Glu Ile Wall Glin 35 4 O 45

Asp Gly Luell Wall Phe Arg Glin Asn Phe Ser Ile Arg Ser Glu Ile SO 55 6 O

Gly Ala Asp Arg Thir Ala Ser Ile Glu Thir Luell Met Asn His Lieu. Glin 65 70 8O

Glu Thir Wall Luell Asn His Wall Arg Asn Ala Gly Lell Lell Gly Asp Gly 85 90 95

Phe Gly Ala Thir Pro Glu Met Ser Lys Arg ASn Lell Ile Trp Wall Wall 105 11 O

Thir Met Glin Wall Lell Ile Glu His Pro Ser Trp Gly Asp Wall 115 12 O 125

Wall Glu Wall Asp Thir Trp Wall Gly Ala Ser Gly Lys Asn Gly Met Arg 13 O 135 14 O

Arg Asp Trp His Wall Arg Asp Arg Thir Gly Glin Thir Ile Pro Arg 145 150 155 160

Ala Thir Ser Ile Trp Wall Met Met Asn Lys His Thir Arg Luell Ser 1.65 17O 17s

Met Pro Glu Glu Wall Arg Ala Glu Ile Gly Pro Phe Met Glu 18O 185 19 O

His Ala Thir Ile Wall Asp Glu Asp Ser Gly Lys Lell Pro Lieu. Asp 195 2OO 2O5

Asp Asp Thir Ala Asp Ile Trp Gly Luell Thir Pro Arg Trp Ser 21 O 215 22O

Asp Luell Asp Wall Asn Glin His Wall Asn His Wall Ile Gly Trp 225 23 O 235 24 O

Ile Luell Glu Ser Ala Pro Ile Ser Ile Luell Glu Asn His Asn Trp Ala 245 250 255

Ser Luell Ile Luell Asp Thir Gly Arg Asn Trp ASn 26 O 265

<210s, SEQ ID NO 57 &211s LENGTH: 417 212. TYPE : PRT &213s ORGANISM: Elaeis guineens is

<4 OOs, SEQUENCE: f

Met Wall Ala Ser Ile Wall Ala Trp Ala Phe Phe Pro Thir Pro Ser Phe 1. 5 1O 15

Ser Pro Thir Ala Ser Ala Ala Ser Thir Ile Gly Glu Gly Ser 25

Glu Asn Luell Asn Wall Arg Gly Ile Ile Ala Pro Thir Ser Ser Ser 35 4 O 45

Ala Ala Glin Gly Wall Met Ala Glin Ala Wall Pro Ile Asn SO 55 6 O

Gly Ala Wall Gly Lell Ala Glu Ser Glin Ala Glu Glu Asp 65 70 7s 8O

Ala Ala Pro Ser Ser Ala Pro Arg Thir Phe Asn Glin Luell Pro Asp 85 90 95 US 9,096,834 B2 141 142 - Continued

Trp Ser Wall Luell Lell Ala Ala Wall Thir Thir Ile Phe Lell Ala Ala Glu 105 11 O

Glin Trp Thir Lell Lell Asp Trp Pro Arg Arg Pro Asp Met Luell 115 12 O 125

Thir Gly Ala Phe Ser Lell Gly Lys Ile Wall Glin Asp Gly Luell Wall Phe 13 O 135 14 O

Arg Glin Asn Phe Ser Ile Arg Ser Glu Ile Gly Ala Asp Arg Thir 145 150 155 160

Ala Ser Ile Glu Thir Lell Met Asn His Lieu. Glin Glu Thir Ala Luell Asn 1.65 17O 17s

His Wall Arg Asn Ala Gly Lell Luell Gly Asp Gly Phe Gly Ala Thir Pro 18O 185 19 O

Glu Met Ser Asn Lell Ile Trp Wall Wall Thir Lys Met Glin Wall 195

Lell Ile Glu His Tyr Pro Ser Trp Gly Asp Wall Wall Glu Wall Asp Thir 21 O 215 22O

Trp Wall Gly Ala Ser Gly Lys Asn Gly Met Arg Arg Asp Trp His Wall 225 23 O 235 24 O

Arg Asp Arg Thir Gly Glin Thir Ile Lieu. Arg Ala Thir Ser Ile Trp 245 250 255

Wall Met Met Asp Lys His Thir Arg Lys Luell Ser Met Pro Glu Glu 26 O 265 27 O

Wall Arg Ala Glu Ile Gly Pro Tyr Phe Met Glu His Ala Ala Ile Wall 27s 285

Asp Glu Asp Ser Arg Lieu Pro Lieu. Asp Asp Asp Thr Ala Asp 29 O 295 3 OO

Tyr Ile Trp Gly Lell Thir Pro Arg Trp Ser Asp Lell Asp Wall Asn 3. OS 310 315

Glin His Wall Asn Asn Wall Ile Gly Trp Ile Lell Glu Ser Ala 3.25 330 335

Pro Ile Ser Ile Lell Glu Asn His Glu Lieu Ala Ser Met Thir Luell Glu 34 O 345 35. O

Arg Arg Glu Gly Arg Asp Ser Wall Lieu. Glin Ser Luell Thir Ala 355 360 365

Wall Ala Asn Asp Thir Gly Gly Luell Pro Glu Ala Ser Ile Glu 37 O 375

Glin His Luell Luell Glin Lell Glu Gly Ala Glu Ile Wall Arg Gly Arg 385 390 395 4 OO

Thir Glin Trp Arg Pro Arg Arg Ala Ser Gly Pro Thir Ser Ala Gly Ser 4 OS 41O 415

Ala

SEO ID NO 58 LENGTH: 4 O TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Primer NB4 6 O

SEQUENCE: 58 gttitat caca gttaaattat togctgaag.cg gaatc cctogg

<210s, SEQ ID NO 59 &211s LENGTH: 38 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence US 9,096,834 B2 143 144 - Continued

22 Os. FEATURE: 223s OTHER INFORMATION: Primer NB4 61

<4 OO > SEQUENCE: 59 cago ct cagg ccatataacc atcaaag.cca tagttggc 38

SEQ ID NO 60 LENGTH: 32 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Primer NB468

< 4 OOs SEQUENCE: 6 O gggacggcgg CCC99tatgg atggCaccga tig 32

SEQ ID NO 61 LENGTH: 4 O TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Primer NB87

< 4 OOs SEQUENCE: 61 caattggcat gcaccc ct at ttgtttattt ttctaaatac 4 O

SEQ ID NO 62 LENGTH: 47 TYPE: DNA ORGANISM: Artificial Sequence FEATURE; OTHER INFORMATION: Primer NB88

< 4 OOs SEQUENCE: 62 caattggttt aaacggccta agctt tatgc titgitaaacco ttttgttg 47

SEQ ID NO 63 LENGTH: 35 TYPE: DNA ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: Primer NB5

<4 OOs, SEQUENCE: 63 gatcgacgta taaactgtca acatatttct gcaag 35

What is claimed is: for one or more acyl Substrates having an acyl chain length of 1. A recombinant microorganism comprising a first non 50 C12 and the LPAAT has a substrate preference for one or native nucleic acid encoding an acyl-ACP thioesterase from a more Substrates having an acyl chain length selected from the higher plant species or from a prokaryotic organism and a group consisting of C14, C16, and C18 acyl chain length; and second non-native nucleic acid encoding a lysophosphatidic (4) the thioesterase has a substrate preference for one or more acid acyltransferase (LPAAT) from a prokaryotic organism, acyl Substrates having an acyl chain length selected from the wherein the microorganism produces a fatty acid product, 55 group consisting of C12, C14, C16, and C18, and the LPAAT wherein the thioesterase and the LPAAT have complementary has a substrate preference for substrates having a C20, C22, or acyl-ACP substrate preferences, and wherein the comple C24 acyl chain length. mentary acyl-ACP substrate preferences are selected from the 2. The recombinant microorganism of claim 1, wherein the group consisting of: (1) the thioesterase has a Substrate pref LPAAT has a substrate preference for C16 acyl substrates. erence for one or more acyl Substrates having an acyl chain 60 3. The recombinant microorganism of claim 2, wherein the length selected from the group consisting of C12, C14, and thioesterase has a Substrate preference for one or more of a C16, and the LPAAT has a substrate preference for substrates C12 and C14 acyl substrate. having a C18 acyl chain length; (2) the thioesterase has a 4. The recombinant microorganism of claim 1, wherein the Substrate preference for one or more acyl Substrates having an LPAAT has a substrate preference for C18 acyl substrates. acyl chain length of C12 and C14 and the LPAAT has a 65 5. The recombinant microorganism of claim 4, wherein the substrate preference for substrates having a C16 or C18 acyl thioesterase has a Substrate preference for one or more of a chain length; (3) the thioesterase has a Substrate preference C12, C14, and C16 acyl substrate. US 9,096,834 B2 145 146 6. The recombinant microorganism of claim 1, wherein the 19. The method of claim 18, wherein the fatty acid product LPAAT is a cyanobacterial LPAAT. is one or more of a free fatty acid, a fatty aldehyde, a fatty 7. The recombinant microorganism of claim 1, wherein the alcohol, a fatty acid ester, a wax ester, or a hydrocarbon. LPAAT is endogenous to the host microorganism. 20. The method of claim 19, wherein the fatty acid product 8. The recombinant microorganism of claim 7, wherein the comprises at least one C to Ce free fatty acid. nucleic acid encoding the LPAAT is operably linked to a 21. The method of claim 18, wherein the fatty acid product heterologous promoter. includes at least one C to C fatty alcohol or fatty aldehyde, 9. The recombinant microorganism of claim 8, wherein the at least one wax ester having an A chain of C2 to C and a B heterologous promoter is regulatable. chain ofC to C, or at least one C to Cisalkane oralkene. 10 22. The method of claim 18, further comprising isolating 10. The recombinant microorganism of claim 1, wherein the fatty acid product from the microorganism, the culture the thioesterase has a substrate preference for one or more of medium, or a combination thereof. a C12, C14, and C16 acyl substrate. 23. The method of claim 18, wherein at least a portion of 11. The recombinant microorganism of claim 1, wherein the fatty acid product is secreted from the recombinant micro the recombinant microorganism further comprises a dis 15 organism. rupted acyl-ACP synthetase gene. 24. The method of claim 18, wherein the amount of fatty 12. The recombinant microorganism of claim 1, wherein acid product produced by the recombinant microorganism is the recombinant microorganism produces at least 10% more at least 10% greater than the amount produced by an other fatty acid product than a microorganism Substantially identi wise identical microorganism lacking the non-native nucleic cal in all respects except that it lacks a non-native nucleic acid acid sequence encoding a thioesterase that is cultured under sequence encoding an LPAAT. the same conditions. 13. The recombinant microorganism of claim 1, wherein 25. A recombinant microorganism comprising a first non the recombinant microorganism is a photosynthetic microor native nucleic acid encoding a thioesterase having a substrate ganism. preference for one or more acyl Substrates having an acyl 14. The recombinant microorganism of claim 13, wherein 25 chain length selected from the group consisting of C12, C14. the recombinant microorganism is a microalga. and C16 and a second non-native nucleic acid encoding a 15. The recombinant microorganism of claim 14, wherein LPAAT, having a substrate preference for substrates having a the microalga is a species of Achnanthes, Amphiprora, C18 acyl chain length wherein the microorganism produces a Amphora, Ankistrodesmus, Asteromonas, Boekelovia, Borod fatty acid product, and wherein the LPAAT has at least 85% inella, Botryococcus, Bracteococcus, Chaetoceros, Carteria, 30 identity to a LPAAT comprising an amino acid sequence Chlamydomonas, Chlorococcum, Chlorogonium, Chlorella, selected from the group consisting of SEQID NO:2, SEQID Chroomonas, Chrysosphaera, Cricosphaera, Crypthecod NO:3, SEQID NO:4, SEQID NO:5, SEQID NO:6, SEQID inium, Cryptomonas, Cyclotella, Dunaliella, Ellipsoidon, NO:7, SEQIDNO:8, SEQIDNO:9, SEQID NO:10, SEQID Emiliania, Eremosphaera, Ernodesmius, Euglena, Franceia, NO:11, SEQID NO:12, SEQID NO:13, SEQID NO:14, and Fragilaria, Gloeothamnion, Haematococcus, Halocafeteria, 35 SEQID NO:15. Hymenomonas, Isochrysis, Lepocinclis, Micractinium, 26. The recombinant microorganism of claim 25, wherein Monoraphidium, Nannochloris, Nannochloropsis, Navicula, the thioesterase is an acyl-ACP thioesterase, an acyl-CoA Neochloris, Nephrochloris, Nephroselmis, Nitzschia, thioesterase, or a 4-hydroxybenzoyl-CoA thioesterase. Ochromonas, Oedogonium, Oocystis, Ostreococcus, Pav 27. The recombinant microorganism of claim 26, wherein lova, Parachlorella, Pascheria, Phaeodactylum, Phagus, 40 the thioesterase is an acyl-ACP thioesterase. Picochlorum, Platymonas, Pleurochrysis, Pleurococcus, 28. The recombinant microorganism of claim 27, wherein Prototheca, Pseudochlorella, Pseudoneochloris, Pyramino the acyl-ACP thioesterase is a higher plant acyl-ACP nas, Pyrobotrys, Scenedesmus, Skeletonema, Spyrogyra, Sti thioesterase or a prokaryotic acyl-ACP thioesterase. chococcus, Tetraselmis, Thalassiosira, Viridiella, or Volvox. 29. The recombinant microorganism of claim 25, wherein 16. The recombinant microorganism of claim 13, wherein 45 the recombinant microorganism further comprises a dis the photosynthetic microorganism is a cyanobacterium. rupted acyl-ACP synthetase gene. 17. The recombinant microorganism of claim 16, wherein 30. The recombinant microorganism of claim 25, wherein the cyanobacterium is a species of Agmenellum, Anabaena, the LPAAT has at least 90% identity to a LPAAT comprising Anabaenopsis, Anacystis, Aphanizomenon, Arthrospira, an amino acid sequence selected from the group consisting of Asterocapsa, Borzia, Calothrix, Chamaesiphon, Chloroglo 50 SEQID NO:2, SEQID NO:3, SEQID NO:4, SEQID NO:5, eopsis, Chroococcidiopsis, Chroococcus, Crinalium, Cyano SEQID NO:6, SEQID NO:7, SEQID NO:8, SEQID NO:9, bacterium, Cyanobium, Cyanocystis, Cyanospira, Cyanoth SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID eCe, Cylindrospermopsis, Cylindrospermum, NO:13, SEQID NO:14, and SEQID NO:15. Dactylococcopsis, Dermocarpella, Fischerella, Fremyella, 31. The recombinant microorganism of claim 30, wherein Geitleria, Geitlerinema, Gloeobacter; Gloeocapsa, Gloeoth 55 the LPAAT has at least 95% identity to a LPAAT comprising ece, Halospirulina, Iyengariella, Leptolyngbya, Limnothrix, an amino acid sequence selected from the group consisting of Lyngbya, Microcoleus, Microcystis, Myxosarcina, Nodu SEQID NO:2, SEQID NO:3, SEQID NO:4, SEQID NO:5, laria, Nostoc, Nostochopsis, Oscillatoria, Phormidium, SEQID NO:6, SEQID NO:7, SEQID NO:8, SEQID NO:9, Planktothrix, Pleurocapsa, Prochlorococcus, Prochloron, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID Prochlorothrix, Pseudanabaena, Rivularia, Schizothrix, Scy 60 NO:13, SEQID NO:14, and SEQID NO:15. tonema, Spirulina, Stanieria, Starria, Stigonema, Symploca, 32. A recombinant microorganism comprising a first non Synechococcus, Synechocystis, Thermosynechococcus, Toly native nucleic acid encoding an acyl-ACP thioesterase from a pothrix, Trichodesmium, Tvchonema or Xenococcus. higher plant species or from a prokaryotic organism and a 18. A method of producing a fatty acid product comprising second non-native nucleic acid encoding an LPAAT from a culturing the recombinant microorganism of claim 1 under 65 prokaryotic organism, wherein the microorganism produces a conditions in which at least one fatty acid product is pro fatty acid product, wherein the thioesterase and the LPAAT duced. have complementary acyl-ACP Substrate preferences, and US 9,096,834 B2 147 148 wherein the complementary acyl-ACP substrate preferences are selected from the group consisting of: (1) the thioesterase has a Substrate preference for one or more acyl Substrates having an acyl chain length of C12 and C14 and the LPAAT has a substrate preference for substrates having a C16 or C18 acyl chain length; (2) the thioesterase has a Substrate prefer ence for one or more acyl substrates having an acyl chain length of C12 and the LPAAT has a substrate preference for one or more Substrates having an acyl chain length selected from the group consisting of C14, C16, and C18 acyl chain 10 length; and (3) the thioesterase has a substrate preference for one or more acyl Substrates having an acyl chain length selected from the group consisting of C12, C14, and C18, and the LPAAT has a substrate preference for substrates having a C20, C22, or C24 acyl chain length. 15 k k k k k