
US007256023B2

(12) United States Patent
Metz et al.
(10) Patent No.: US 7,256,023 B2
(45) Date of Patent: Aug. 14, 2007

(54) PUFA POLYKETIDE SYNTHASE SYSTEMS AND USES THEREOF

(75) Inventors: James G. Metz, Longmont, CO (US); James H. Flatt, Longmont, CO (US); Jerry M. Kuner, Longmont, CO (US); William R. Barclay, Boulder, CO (US)

(73) Assignee: Martek Biosciences Corporation, Columbia, MD (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 347 days.

(21) Appl. No.: 11/087,085

(22) Filed: Mar. 21, 2005

(65) Prior Publication Data
US 2005/0273884 A1 Dec. 8, 2005

Related U.S. Application Data

(63) Continuation of application No. 10/124,800, filed on Apr. 16, 2002, which is a continuation-in-part of application No. 09/231,899, filed on Jan. 14, 1999, now Pat. No. 6,566,583.

(60) Provisional application No. 60/284,066, filed on Apr. 16, 2001, provisional application No. 60/298,796, filed on Jun. 15, 2001, provisional application No. 60/323,269, filed on Sep. 18, 2001.

(51) Int. Cl.
C12P 7/64 (2006.01)
C12N 1/2 (2006.01)
C07H 21/04 (2006.01)

(52) U.S. Cl.: 435/134; 435/325; 435/410; 435/252.3; 435/254.1; 435/257.1; 435/183; 435/232; 435/320.1; 536/23.2; 536/23.1; 536/23.7

(58) Field of Classification Search: None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,130,242 A 7/1992 Barclay et al.
5,246,841 A 9/1993 Yazawa et al. 435/134
5,639,790 A 6/1997 Voelker et al. 514/552
5,672,491 A 9/1997 Khosla et al. 435/148
(Continued)

FOREIGN PATENT DOCUMENTS
EP 0823475 A1 2/1998
(Continued)

OTHER PUBLICATIONS
Abbadi et al., Eur. J. Lipid Sci. Technol. 103:106-113 (2001).
(Continued)

Primary Examiner: Nashaat T. Nashed
(74) Attorney, Agent, or Firm: Sheridan Ross P.C.

(57) ABSTRACT

The invention generally relates to polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) systems isolated from or derived from non-bacterial organisms, to homologues thereof, to isolated nucleic acid molecules and recombinant nucleic acid molecules encoding biologically active domains of such a PUFA PKS system, to genetically modified organisms comprising PUFA PKS systems, to methods of making and using such systems for the production of bioactive molecules of interest, and to novel methods for identifying new bacterial and non-bacterial microorganisms having such a PUFA PKS system.

28 Claims, 3 Drawing Sheets


(56) References Cited (continued)

U.S. PATENT DOCUMENTS
5,683,898 A 11/1997 Yazawa et al. 435/136
6,033,883 A 3/2000 Barr et al. 435/148
6,140,486 A 10/2000 Facciotti et al. 536/23.2
6,566,583 B1 5/2003 Facciotti et al.
2004/0005672 A1 1/2004 Santi et al.

FOREIGN PATENT DOCUMENTS
WO WO 93/23545 11/1993
WO WO 96/21735 7/1996
WO WO 98/46764 10/1998
WO WO 98/55625 12/1998
WO WO 00/42195 7/2000

OTHER PUBLICATIONS
Allen et al., Appl. Envir. Microbiol. 65(4):1710-1720 (1999).
Bateman et al., Nucl. Acids Res., 30(1):276-280 (2002).
Bentley et al., Annu. Rev. Microbiol., 53:411-46 (1999).
Bisang et al., Nature, 401:502-505 (1999).
Bork, TIG, 12(10):425-427 (1996).
Brenner, TIG, 15(4):132-133 (1999).
Broun et al., Science, 282:1315-1317 (1998).
Chuck et al., Chem. and Bio., Current Bio. (London, GB), 4:10 (1997) pp. 757-766.
Creelman et al., Annu. Rev. Plant Physiol. Plant Mol. Biol., 48:355-81 (1997).
DeLong & Yayanos, Appl. Environ. Microbiol., 51(4):730-737 (1986).
Doerks, TIG, 14(6):248-250 (1998).
Facciotti et al., "Cloning and Characterization of Polyunsaturated Fatty Acids (PUFA) Genes from Marine Bacteria" in Proceedings of the international symposium on progress and prospect of marine biotechnology (China Pres 1999), pp. 404-405 Abstract.
GenBank Accession No. U09865, Alcaligenes eutrophus pyruvate dehydrogenase (pdhA), dihydrolipoamide acetyltransferase (pdhB), dihydrolipoamide dehydrogenase (pdhL), and ORF3 genes, complete cds (1994).
Harlow et al., Antibodies: A Laboratory Manual (1988) Cold Spring Harbor Laboratory Press, p. 76.
Heath et al., J. Biol. Chem., 271(44):27795-27801 (1996).
Hopwood & Sherman, Annu. Rev. Genet., 24:37-66 (1990).
Hutchinson, Annu. Rev. Microbiol. 49:201-238 (1995).
Jez et al., Chem. and Bio. (London), 7:12 (2000) pp. 919-930.
Jostensen & Landfald, FEMS Microbiology Letters, 151:95-101 (1997).
Katz & Donadio, Annu. Rev. Microbiol., 47:875-912 (1993).
Keating et al., Curr. Opin. Chem. Biol. 3:598-606 (1999).
Kyle et al., HortScience, 25:1523-26 (1990).
Leadlay PF, Current Opinion in Chemical Biology (1997) 1:162-168.
Magnuson, Microbiol. Rev. 57(3):522-542 (1993) Abstract.
Metz et al., Science, 293:290-293 (2001).
Nakahara, Yukagaku, 44(10):821-7 (1995).
Nasu et al., J. Ferment. Bioeng. 122:467-473 (1997).
Nichols et al., Curr. Opin. Biotechnol., 10:240-246 (1999).
Nicholson et al., Chemistry and Biology (London), 8:2 (2001) pp. 157-178.
Nogi et al., Extremophiles, 2:1-7 (1998).
Oliynyk et al., Chemistry & Biology (1996) 3:833-839.
Parker-Barnes et al., PNAS, 97(15):8284-8289 (2000).
Sánchez et al., Chemistry & Biology, 8:725-738 (2001).
Shanklin et al., Annu. Rev. Plant Physiol. Plant Mol. Biol., 49:611-41 (1998).
Smith et al., Nature Biotechnol. 15:1222-1223 (1997).
Somerville, Am. J. Clin. Nutr. 58(2 supp):270S-275S (1993).
Van de Loo, Proc. Natl. Acad. Sci. USA, 92:6743-6747 (1995).
Wallis et al., TIBS Trends in Biochemical Sci., 27:9 (2002) pp. 467-473.
Watanabe et al., J. Biochem., 122:467-473 (1997).
Wiessmann et al., Chemistry & Biology (Sep. 1995) 2:583-589.
Weissmann et al., Biochemistry (1997) 36:13849-13855.
Weissmann et al., Biochemistry (1998) 37:11012-11017.
Yalpani et al., The Plant Cell, 13:1401-1409 (2001).
Yazawa, Lipids, 31(supp):S297-S300 (1996).
Kealey et al., "Production of a polyketide natural product in non-polyketide-producing prokaryotic and eukaryotic hosts", Proceedings of the National Academy of Sciences of the United States of America, vol. 95, No. 2, Jan. 20, 1998, pp. 505-509, XP002338563.
DATABASE Geneseq [Online] Dec. 11, 2000, "S. aggregatum PKS cluster ORF6 homolog DNA," XP002368912, retrieved from EBI accession No. GSN:AAA71567, Database accession No. AAA71567 & DATABASE Geneseq [Online] Dec. 11, 2000, "S. aggregatum PKS cluster ORF6 homolog protein," XP002368914, retrieved from EBI accession No. GSP:AAB10482, Database accession No. AAB10482 & WO 00/42195 A (Calgene, LLC) Jul. 20, 2000.

[Drawing sheet 1 of 3 (figure not reproduced)]

[Drawing sheet 2 of 3 (figure not reproduced)]

PUFA POLYKETIDE SYNTHASE SYSTEMS AND USES THEREOF

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 10/124,800, filed Apr. 16, 2002, which claims the benefit of priority under 35 U.S.C. § 119(e) to: U.S. Provisional Application Ser. No. 60/284,066, filed Apr. 16, 2001, entitled "A Polyketide Synthase System and Uses Thereof"; U.S. Provisional Application Ser. No. 60/298,796, filed Jun. 15, 2001, entitled "A Polyketide Synthase System and Uses Thereof"; and U.S. Provisional Application Ser. No. 60/323,269, filed Sep. 18, 2001, entitled "Thraustochytrium PUFA PKS System and Uses Thereof". U.S. patent application Ser. No. 10/124,800 is also a continuation-in-part of copending U.S. application Ser. No. 09/231,899, filed Jan. 14, 1999, entitled "Schizochytrium PKS Genes", now U.S. Pat. No. 6,566,583. Each of the above-identified patent applications is incorporated herein by reference in its entirety.

This application does not claim the benefit of priority from U.S. application Ser. No. 09/090,793, filed Jun. 4, 1998, now U.S. Pat. No. 6,140,486, although U.S. application Ser. No. 09/090,793 is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

This invention relates to polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) systems from microorganisms, including eukaryotic organisms, such as Thraustochytrid microorganisms. More particularly, this invention relates to nucleic acids encoding non-bacterial PUFA PKS systems, to non-bacterial PUFA PKS systems, to genetically modified organisms comprising non-bacterial PUFA PKS systems, and to methods of making and using the non-bacterial PUFA PKS systems disclosed herein. This invention also relates to a method to identify bacterial and non-bacterial microorganisms comprising PUFA PKS systems.

BACKGROUND OF THE INVENTION

Polyketide synthase (PKS) systems are generally known in the art as enzyme complexes derived from fatty acid synthase (FAS) systems, but which are often highly modified to produce specialized products that typically show little resemblance to fatty acids. Researchers have attempted to exploit polyketide synthase (PKS) systems that have been described in the literature as falling into one of three basic types, typically referred to as: Type II, Type I iterative (fungal-like) and Type I modular. The Type II system is characterized by separable proteins, each of which carries out a distinct enzymatic reaction. The enzymes work in concert to produce the end product and each individual enzyme of the system typically participates several times in the production of the end product. This type of system operates in a manner analogous to the fatty acid synthase (FAS) systems found in plants and bacteria. Type I iterative (fungal-like) PKS systems are similar to the Type II system in that the enzymes are used in an iterative fashion to produce the end product. In other words, the same set of reactions is carried out in each cycle until the end product is obtained. There is no allowance for the introduction of unique reactions during the biosynthetic procedure. The Type I iterative (fungal-like) PKS system differs from Type II in that enzymatic activities, instead of being associated with separable proteins, occur as domains of larger proteins. This system is analogous to the Type I FAS systems found in animals and fungi.

In contrast to the Type I iterative (fungal-like) and Type II systems, in Type I modular PKS systems, each enzyme domain is used only once in the production of the end product. The domains are found in very large proteins that do not utilize the economy of iterative reactions (i.e., a distinct domain is required for each reaction) and the product of each reaction is passed on to another domain in the PKS protein. Additionally, in all of the PKS systems described above, if a carbon-carbon double bond is incorporated into the end product, it is always in the trans configuration.

Polyunsaturated fatty acids (PUFAs) are critical components of membrane lipids in most eukaryotes (Lauritzen et al., Prog. Lipid Res. 40, 1 (2001); McConn et al., Plant J. 15, 521 (1998)) and are precursors of certain hormones and signaling molecules (Heller et al., Drugs 55, 487 (1998); Creelman et al., Annu. Rev. Plant Physiol. Plant Mol. Biol. 48, 355 (1997)). Known pathways of PUFA synthesis involve the processing of saturated 16:0 or 18:0 fatty acids (the abbreviation X:Y indicates an acyl group containing X carbon atoms and Y cis double bonds; double-bond positions of PUFAs are indicated relative to the methyl carbon of the fatty acid chain (ω3 or ω6) with systematic methylene interruption of the double bonds) derived from fatty acid synthase (FAS) by elongation and aerobic desaturation reactions (Sprecher, Curr. Opin. Clin. Nutr. Metab. Care 2, 135 (1999); Parker-Barnes et al., Proc. Natl. Acad. Sci. USA 97, 8284 (2000); Shanklin et al., Annu. Rev. Plant Physiol. Plant Mol. Biol. 49, 611 (1998)). Starting from acetyl-CoA, the synthesis of DHA requires approximately 30 distinct enzyme activities and nearly 70 reactions including the four repetitive steps of the fatty acid synthesis cycle. Polyketide synthases (PKSs) carry out some of the same reactions as FAS (Hopwood et al., Annu. Rev. Genet. 24, 37 (1990); Bentley et al., Annu. Rev. Microbiol. 53, 411 (1999)) and use the same small protein (or domain), acyl carrier protein (ACP), as a covalent attachment site for the growing carbon chain. However, in these enzyme systems, the complete cycle of reduction, dehydration and reduction seen in FAS is often abbreviated so that a highly derivatized carbon chain is produced, typically containing many keto- and hydroxy-groups as well as carbon-carbon double bonds in the trans configuration. The linear products of PKSs are often cyclized to form complex biochemicals that include antibiotics and many other secondary products (Hopwood et al., (1990) supra; Bentley et al., (1999), supra; Keating et al., Curr. Opin. Chem. Biol. 3, 598 (1999)).

Very long chain PUFAs such as docosahexaenoic acid (DHA; 22:6ω3) and eicosapentaenoic acid (EPA; 20:5ω3) have been reported from several species of marine bacteria, including Shewanella sp. (Nichols et al., Curr. Op. Biotechnol. 10, 240 (1999); Yazawa, Lipids 31, S (1996); DeLong et al., Appl. Environ. Microbiol. 51, 730 (1986)). Analysis of a genomic fragment (cloned as plasmid pEPA) from Shewanella sp. strain SCRC2738 led to the identification of five open reading frames (Orfs), totaling 20 Kb, that are necessary and sufficient for EPA production in E. coli (Yazawa, (1996), supra). Several of the predicted protein domains were homologues of FAS enzymes, while other regions showed no homology to proteins of known function. On the basis of these observations and biochemical studies, it was suggested that PUFA synthesis in Shewanella involved the elongation of 16- or 18-carbon fatty acids produced by FAS and the insertion of double bonds by undefined aerobic desaturases (Watanabe et al., J. Biochem. 122, 467 (1997)). The recognition that this hypothesis was incorrect began with a reexamination of the protein sequences encoded by the five Shewanella Orfs. At least 11 regions within the five Orfs were identifiable as putative enzyme domains (See Metz et al., Science 293:290-293 (2001)). When compared with sequences in the gene databases, seven of these were more strongly related to PKS proteins than to FAS proteins. Included in this group were domains putatively encoding malonyl-CoA:ACP acyltransferase (MAT), 3-ketoacyl-ACP synthase (KS), 3-ketoacyl-ACP reductase (KR), acyltransferase (AT), phosphopantetheine transferase, chain length (or chain initiation) factor (CLF) and a highly unusual cluster of six ACP domains (i.e., the presence of more than two clustered ACP domains has not previously been reported in PKS or FAS sequences). However, three regions were more highly homologous to bacterial FAS proteins. One of these was similar to the newly-described Triclosan-resistant enoyl reductase (ER) from Streptococcus pneumoniae (Heath et al., Nature 406, 145 (2000); comparison of ORF8 peptide with the S. pneumoniae enoyl reductase using the LALIGN program (matrix, BLOSUM50; gap opening penalty, -10; elongation penalty, -1) indicated 49% similarity over a 386 aa overlap). Two regions were homologues of the E. coli FAS protein encoded by fabA, which catalyzes the synthesis of trans-2-decenoyl-ACP and the reversible isomerization of this product to cis-3-decenoyl-ACP (Heath et al., J. Biol. Chem., 271, 27795 (1996)). On this basis, it seemed likely that at least some of the double bonds in EPA from Shewanella are introduced by a dehydrase-isomerase mechanism catalyzed by the FabA-like domains in Orf7.

Anaerobically-grown E. coli cells harboring the pEPA plasmid accumulated EPA to the same levels as aerobic cultures (Metz et al., 2001, supra), indicating that an oxygen-dependent desaturase is not involved in EPA synthesis. When pEPA was introduced into a fabB mutant of E. coli, which is unable to synthesize monounsaturated fatty acids and requires unsaturated fatty acids for growth, the resulting cells lost their fatty acid auxotrophy. They also accumulated much higher levels of EPA than other pEPA-containing strains, suggesting that EPA competes with endogenously produced monounsaturated fatty acids for transfer to glycerolipids. When pEPA-containing E. coli cells were grown in the presence of [13C]-acetate, the data from 13C-NMR analysis of purified EPA from the cells confirmed the identity of EPA and provided evidence that this fatty acid was synthesized from acetyl-CoA and malonyl-CoA (See Metz et al., 2001, supra). A cell-free homogenate from pEPA-containing fabB cells synthesized both EPA and saturated fatty acids from [14C]-malonyl-CoA. When the homogenate was separated into a 200,000×g high-speed pellet and a membrane-free supernatant fraction, saturated fatty acid synthesis was confined to the supernatant, consistent with the soluble nature of the Type II FAS enzymes (Magnuson et al., Microbiol. Rev. 57, 522 (1993)). Synthesis of EPA was found only in the high-speed pellet fraction, indicating that EPA synthesis can occur without reliance on enzymes of the E. coli FAS or on soluble intermediates (such as 16:0-ACP) from the cytoplasmic fraction. Since the proteins encoded by the Shewanella EPA genes are not particularly hydrophobic, restriction of EPA synthesis activity to this fraction may reflect a requirement for a membrane-associated acyl acceptor molecule. Additionally, in contrast to the E. coli FAS, EPA synthesis is specifically NADPH-dependent and does not require NADH. All these results are consistent with the pEPA genes encoding a multifunctional PKS that acts independently of FAS, elongase, and desaturase activities to synthesize EPA directly. It is likely that the PKS pathway for PUFA synthesis that has been identified in Shewanella is widespread in marine bacteria. Genes with high homology to the Shewanella gene cluster have been identified in Photobacterium profundum (Allen et al., Appl. Environ. Microbiol. 65:1710 (1999)) and in Moritella marina (Vibrio marinus) (Tanaka et al., Biotechnol. Lett. 21:939 (1999)).

The biochemical and molecular-genetic analyses performed with Shewanella provide compelling evidence for polyketide synthases that are capable of synthesizing PUFAs from malonyl-CoA. A complete scheme for synthesis of EPA by the Shewanella PKS has been proposed. The identification of protein domains homologous to the E. coli FabA protein, and the observation that bacterial EPA synthesis occurs anaerobically, provide evidence for one mechanism wherein the insertion of cis double bonds occurs through the action of a bifunctional dehydratase/2-trans, 3-cis isomerase (DH/2,3I). In E. coli, condensation of the 3-cis acyl intermediate with malonyl-ACP requires a particular ketoacyl-ACP synthase and this may provide a rationale for the presence of two KS in the Shewanella gene cluster (in Orf 5 and Orf 7). However, the PKS cycle extends the chain in two-carbon increments while the double bonds in the EPA product occur at every third carbon. This disjunction can be solved if the double bonds at C-14 and C-8 of EPA are generated by 2-trans, 2-cis isomerization (DH/2,2I) followed by incorporation of the cis double bond into the elongating fatty acid chain. The enzymatic conversion of a trans double bond to the cis configuration without bond migration is known to occur, for example, in the synthesis of 11-cis-retinal in the retinoid cycle (Jang et al., J. Biol. Chem. 275, 28128 (2000)). Although such an enzyme function has not yet been identified in the Shewanella PKS, it may reside in one of the unassigned protein domains.

The PKS pathways for PUFA synthesis in Shewanella and another marine bacterium, Vibrio marinus, are described in detail in U.S. Pat. No. 6,140,486 (issued from U.S. application Ser. No. 09/090,793, filed Jun. 4, 1998, entitled "Production of Polyunsaturated Fatty Acids by Expression of Polyketide-like Synthesis Genes in Plants", which is incorporated herein by reference in its entirety).

Polyunsaturated fatty acids (PUFAs) are considered to be useful for nutritional, pharmaceutical, industrial, and other purposes. An expansive supply of PUFAs from natural sources and from chemical synthesis is not sufficient for commercial needs. Because a number of separate desaturase and elongase enzymes are required for fatty acid synthesis from linoleic acid (LA, 18:2 Δ9,12), common in most plant species, to the more unsaturated and longer chain PUFAs, engineering plant host cells for the expression of PUFAs such as EPA and DHA may require expression of five or six separate enzyme activities to achieve expression, at least for EPA and DHA. Additionally, for production of useable quantities of such PUFAs, additional engineering efforts may be required, for instance the down regulation of enzymes competing for substrate, engineering of higher enzyme activities such as by mutagenesis or targeting of enzymes to plastid organelles. Therefore it is of interest to obtain genetic material involved in PUFA biosynthesis from species that naturally produce these fatty acids and to express the isolated material alone or in combination in a heterologous system which can be manipulated to allow production of commercial quantities of PUFAs.

The discovery of a PUFA PKS system in marine bacteria such as Shewanella and Vibrio marinus (see U.S. Pat. No. 6,140,486, ibid.) provides a resource for new methods of commercial PUFA production. However, these marine bacteria have limitations which will ultimately restrict their usefulness on a commercial level. First, although U.S. Pat. No. 6,140,486 discloses that the marine bacteria PUFA PKS systems can be used to genetically modify plants, the marine bacteria naturally live and grow in cold marine environments and the enzyme systems of these bacteria do not function well above 30° C. In contrast, many crop plants, which are attractive targets for genetic manipulation using the PUFA PKS system, have normal growth conditions at temperatures above 30° C. and ranging to higher than 40° C. Therefore, the marine bacteria PUFA PKS system is not predicted to be readily adaptable to plant expression under normal growth conditions. Moreover, the marine bacteria PUFA PKS genes, being from a bacterial source, may not be compatible with the genomes of eukaryotic host cells, or at least may require significant adaptation to work in eukaryotic hosts. Additionally, the known marine bacteria PUFA PKS systems do not directly produce triglycerides, whereas direct production of triglycerides would be desirable because triglycerides are a lipid storage product in microorganisms and as a result can be accumulated at very high levels (e.g. up to 80-85% of cell weight) in microbial/plant cells (as opposed to a "structural" lipid product (e.g. phospholipids) which can generally only accumulate at low levels (e.g. less than 10-15% of cell weight at maximum)).

Therefore, there is a need in the art for other PUFA PKS systems having greater flexibility for commercial use.

SUMMARY OF THE INVENTION

One embodiment of the present invention relates to an isolated nucleic acid molecule comprising a nucleic acid sequence chosen from: (a) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (b) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (c) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of the amino acid sequence of (a), wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to the amino acid sequence of (b), wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; and (e) a nucleic acid sequence that is fully complementary to the nucleic acid sequence of (a), (b), (c), or (d). In alternate aspects, the nucleic acid sequence encodes an amino acid sequence that is at least about 70% identical, or at least about 80% identical, or at least about 90% identical, or is identical to: (1) at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; and/or (2) a nucleic acid sequence encoding an amino acid sequence that is at least about 70% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32. In a preferred embodiment, the nucleic acid sequence encodes an amino acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32 and/or biologically active fragments thereof. In one aspect, the nucleic acid sequence is chosen from: SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:12, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, and SEQ ID NO:31.

Another embodiment of the present invention relates to a recombinant nucleic acid molecule comprising the nucleic acid molecule as described above, operatively linked to at least one transcription control sequence. In another embodiment, the present invention relates to a recombinant cell transfected with the recombinant nucleic acid molecule described directly above.

Yet another embodiment of the present invention relates to a genetically modified microorganism, wherein the microorganism expresses a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The at least one domain of the PUFA PKS system is encoded by a nucleic acid sequence chosen from: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by the screening method of the present invention; (c) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (d) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and, (f) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In this embodiment, the microorganism is genetically modified to affect the activity of the PKS system. The screening method of the present invention referenced in (b) above comprises: (i) selecting a microorganism that produces at least one PUFA; and, (ii) identifying a microorganism from (i) that has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation in the fermentation medium, as compared to production of PUFAs by the microorganism under dissolved oxygen conditions of greater than 5% of saturation, and more preferably 10% of saturation, and more preferably greater than 15% of saturation, and more preferably greater than 20% of saturation in the fermentation medium.

In one aspect, the microorganism endogenously expresses a PKS system comprising the at least one domain of the PUFA PKS system, and wherein the genetic modification is in a nucleic acid sequence encoding the at least one domain of the PUFA PKS system. For example, the genetic modification can be in a nucleic acid sequence that encodes a domain having a biological activity of at least one of the following proteins: malonyl-CoA:ACP acyltransferase (MAT), β-keto acyl-ACP synthase (KS), ketoreductase (KR), acyltransferase (AT), FabA-like β-hydroxy acyl-ACP dehydrase (DH), phosphopantetheine transferase, chain length factor (CLF), acyl carrier protein (ACP), enoyl ACP-reductase (ER), an enzyme that catalyzes the synthesis of trans-2-decenoyl-ACP, an enzyme that catalyzes the reversible isomerization of trans-2-decenoyl-ACP to cis-3-decenoyl-ACP, and an enzyme that catalyzes the elongation of cis-3-decenoyl-ACP to cis-vaccenic acid. In one aspect, the genetic modification is in a nucleic acid sequence that encodes an amino acid sequence selected from the group consisting of: (a) an amino acid sequence that is at least about 70% identical, and preferably at least about 80% identical, and more preferably at least about 90% identical and more preferably identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and, (b) an amino acid sequence that is at least about 70% identical, and preferably at least about 80% identical, and more preferably at least about 90% identical and more preferably identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.

In one aspect, the genetically modified microorganism is a Thraustochytrid, which can include, but is not limited to, a Thraustochytrid from a genus chosen from Schizochytrium and Thraustochytrium. In another aspect, the microorganism has been further genetically modified to recombinantly express at least one nucleic acid molecule encoding at least one biologically active domain from a bacterial PUFA PKS system, from a Type I iterative (fungal-like) PKS system, from a Type II PKS system, and/or from a Type I modular PKS system.

In another aspect of this embodiment, the microorganism endogenously expresses a PUFA PKS system comprising the at least one biologically active domain of a PUFA PKS system, and wherein the genetic modification comprises expression of a recombinant nucleic acid molecule selected from the group consisting of a recombinant nucleic acid molecule encoding at least one biologically active domain from a second PKS system and a recombinant nucleic acid molecule encoding a protein that affects the activity of the PUFA PKS system. Preferably, the recombinant nucleic acid molecule comprises any one of the nucleic acid sequences described above.

In one aspect of this embodiment, the recombinant nucleic acid molecule encodes a phosphopantetheine transferase. In another aspect, the recombinant nucleic acid molecule comprises a nucleic acid sequence encoding at least one biologically active domain from a bacterial PUFA PKS system, from a Type I iterative (fungal-like) PKS system, from a Type II PKS system, and/or from a Type I modular PKS system.

In another aspect of this embodiment, the microorganism is genetically modified by transfection with a recombinant nucleic acid molecule encoding the at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. Such a recombinant nucleic acid molecule can include any recombinant nucleic acid molecule comprising any of the nucleic acid sequences described above. In one aspect, the microorganism has been further genetically modified to recombinantly express at least one nucleic acid molecule encoding at least one biologically active domain from a bacterial PUFA PKS system, from a Type I iterative (fungal-like) PKS system, from a Type II PKS system, or from a Type I modular PKS system.

Yet another embodiment of the present invention relates to a genetically modified plant, wherein the plant has been genetically modified to recombinantly express a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The domain can be encoded by any of the nucleic acid sequences described above. In one aspect, the plant has been further genetically modified to recombinantly express at least one nucleic acid molecule encoding at least one biologically active domain from a bacterial PUFA PKS system, from a Type I iterative (fungal-like) PKS system, from a Type II PKS system, and/or from a Type I modular PKS system.

Another embodiment of the present invention relates to a method to identify a microorganism that has a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The method includes the steps of: (a) selecting a microorganism that produces at least one PUFA; and, (b) identifying a microorganism from (a) that has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation in the fermentation medium, as compared to production of PUFAs by the microorganism under dissolved oxygen conditions of greater than 5% of saturation, more preferably 10% of saturation, more preferably greater than 15% of saturation and more preferably greater than 20% of saturation in the fermentation medium. A microorganism that produces at least one PUFA and has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation is identified as a candidate for containing a PUFA PKS system.

In one aspect of this embodiment, step (b) comprises identifying a microorganism from (a) that has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 2% of saturation, and more preferably under dissolved oxygen conditions of less than about 1% of saturation, and even more preferably under dissolved oxygen conditions of about 0% of saturation.

In another aspect of this embodiment, the microorganism selected in (a) has an ability to consume bacteria by phagocytosis. In another aspect, the microorganism selected in (a) has a simple fatty acid profile. In another aspect, the microorganism selected in (a) is a non-bacterial microorganism. In another aspect, the microorganism selected in (a) is a eukaryote. In another aspect, the microorganism selected in (a) is a member of the order Thraustochytriales. In another aspect, the microorganism selected in (a) has an ability to produce PUFAs at a temperature greater than about 15° C. and preferably greater than about 20° C., and more preferably greater than about 25° C., and even more preferably greater than about 30° C.
In another aspect, the microor least one biologically active domain from a bacterial PUFA ganism selected in (a) has an ability to produce bioactive US 7,256,023 B2 10 compounds (e.g., lipids) of interest at greater than 5% of the (PKS) system. In another aspect, the organism produces a dry weight of the organism, and more preferably greater than polyunsaturated fatty acid (PUFA) profile that differs from 10% of the dry weight of the organism. In yet another aspect, the naturally occurring organism without a genetic modifi the microorganism selected in (a) contains greater than 30% cation. In another aspect, the organism endogenously of its total fatty acids as C14:0, C16:0 and C16:1 while also expresses a non-bacterial PUFA PKS system, and wherein producing at least one long chain fatty acid with three or the genetic modification comprises Substitution of a domain more unsaturated bonds, and preferably, the microorganism from a different PKS system for a nucleic acid sequence selected in (a) contains greater than 40% of its total fatty encoding at least one domain of the non-bacterial PUFA acids as C14:0, C16:0 and C16:1 while also producing at PKS system. least one long chain fatty acid with three or more unsaturated 10 In yet another aspect, the organism endogenously bonds. In another aspect, the microorganism selected in (a) expresses a non-bacterial PUFA PKS system that has been contains greater than 30% of its total fatty acids as C14:0. modified by transfecting the organism with a recombinant C16:0 and C16:1 while also producing at least one long nucleic acid molecule encoding a protein that regulates the chain fatty acid with four or more unsaturated bonds, and chain length of fatty acids produced by the PUFA PKS more preferably while also producing at least one long chain 15 system. For example, the recombinant nucleic acid molecule fatty acid with five or more unsaturated bonds. 
encoding a protein that regulates the chain length of fatty In another aspect of this embodiment, the method further acids can replace a nucleic acid sequence encoding a chain comprises step (c) of detecting whether the organism com length factor in the non-bacterial PUFA PKS system. In prises a PUFA PKS system. In this aspect, the step of another aspect, the protein that regulates the chain length of detecting can include detecting a nucleic acid sequence in fatty acids produced by the PUFA PKS system is a chain the microorganism that hybridizes under Stringent condi length factor. In another aspect, the protein that regulates the tions with a nucleic acid sequence encoding an amino acid chain length of fatty acids produced by the PUFA PKS sequence from a Thraustochytrid PUFA PKS system. Alter system is a chain length factor that directs the synthesis of natively, the step of detecting can include detecting a nucleic C20 units. acid sequence in the organism that is amplified by oligo 25 In one aspect, the organism expresses a non-bacterial nucleotide primers from a nucleic acid sequence from a PUFA PKS system comprising a genetic modification in a Thaustochytrid PUFA PKS system. domain chosen from: a domain encoding FabA-like B-hy Another embodiment of the present invention relates to a droxy acyl-ACP dehydrase (DH) domain and a domain microorganism identified by the screening method described encoding B-ketoacyl-ACP synthase (KS), wherein the modi above, wherein the microorganism is genetically modified to 30 fication alters the ratio of long chain fatty acids produced by regulate the production of molecules by the PUFA PKS the PUFA PKS system as compared to in the absence of the system. modification. 
In one aspect, the modification comprises Yet another embodiment of the present invention relates Substituting a DH domain that does not possess isomeriza to a method to produce a bioactive molecule that is produced tion activity for a FabA-like B-hydroxyacyl-ACP dehydrase by a polyketide synthase system. The method includes the 35 (DH) in the non-bacterial PUFA PKS system. In another step of culturing under conditions effective to produce the aspect, the modification is selected from the group consist bioactive molecule a genetically modified organism that ing of a deletion of all or a part of the domain, a Substitution expresses a PKS system comprising at least one biologically of a homologous domain from a different organism for the active domain of a polyunsaturated fatty acid (PUFA) domain, and a mutation of the domain. polyketide synthase (PKS) system. The domain of the PUFA 40 PKS system is encoded by any of the nucleic acid sequences In another aspect, the organism expresses a PKS system described above. and the genetic modification comprises Substituting a FabA In one aspect of this embodiment, the organism endog like B-hydroxy acyl-ACP dehydrase (DH) domain from a enously expresses a PKS system comprising the at least one PUFA PKS system for a DH domain that does not possess domain of the PUFA PKS system, and the genetic modifi 45 isomerization activity. cation is in a nucleic acid sequence encoding the at least one In another aspect, the organism expresses a non-bacterial domain of the PUFA PKS system. For example, the genetic PUFA PKS system comprising a modification in an enoyl modification can change at least one product produced by ACP reductase (ER) domain, wherein the modification the endogenous PKS system, as compared to a wild-type results in the production of a different compound as com organism. 50 pared to in the absence of the modification. 
For example, the In another aspect of this embodiment, the organism modification can be selected from the group consisting of a endogenously expresses a PKS system comprising the at deletion of all or a part of the ER domain, a substitution of least one biologically active domain of the PUFA PKS an ER domain from a different organism for the ER domain, system, and the genetic modification comprises transfection and a mutation of the ER domain. of the organism with a recombinant nucleic acid molecule 55 In one aspect, the bioactive molecule produced by the selected from the group consisting of a recombinant nucleic present method can include, but is not limited to, an anti acid molecule encoding at least one biologically active inflammatory formulation, a chemotherapeutic agent, an domain from a second PKS system and a recombinant active excipient, an osteoporosis drug, an anti-depressant, an nucleic acid molecule encoding a protein that affects the anti-convulsant, an anti-Heliobactor pylori drug, a drug for activity of the PUFA PKS system. For example, the genetic 60 treatment of neurodegenerative disease, a drug for treatment modification can change at least one product produced by of degenerative liver disease, an antibiotic, and a cholesterol the endogenous PKS system, as compared to a wild-type lowering formulation. In one aspect, the bioactive molecule organism. is a polyunsaturated fatty acid (PUFA). In another aspect, the In yet another aspect of this embodiment, the organism is bioactive molecule is a molecule including carbon-carbon genetically modified by transfection with a recombinant 65 double bonds in the cis configuration. In another aspect, the nucleic acid molecule encoding the at least one domain of bioactive molecule is a molecule including a double bond at the polyunsaturated fatty acid (PUFA) polyketide synthase every third carbon. 
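The phrase "a double bond at every third carbon" can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it uses the conventional fatty acid numbering (omega positions counted from the methyl end, delta positions counted from the carboxyl carbon), and the function name `double_bond_positions` and the example inputs (C22:6 ω-3 for DHA, C20:5 ω-3 for EPA) are chosen here for demonstration and are not taken from the claims.

```python
def double_bond_positions(chain_length, n_double_bonds, omega_start):
    """List double-bond start positions spaced three carbons apart.

    Returns (omega_positions, delta_positions). Omega positions count from
    the methyl end; delta positions count from the carboxyl carbon.
    All names here are hypothetical and used for illustration only.
    """
    # One double bond starts at every third carbon from the first omega position.
    omega = [omega_start + 3 * i for i in range(n_double_bonds)]
    # A double bond cannot start closer to the carboxyl end than delta-2.
    if omega and omega[-1] > chain_length - 2:
        raise ValueError("double bonds do not fit on a chain of this length")
    delta = sorted(chain_length - w for w in omega)
    return omega, delta


# DHA, written C22:6 omega-3
print(double_bond_positions(22, 6, 3))
# ([3, 6, 9, 12, 15, 18], [4, 7, 10, 13, 16, 19])

# EPA, written C20:5 omega-3
print(double_bond_positions(20, 5, 3))
# ([3, 6, 9, 12, 15], [5, 8, 11, 14, 17])
```

The delta positions returned for the C22:6 ω-3 input correspond to the usual all-cis-4,7,10,13,16,19 description of DHA.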
In one aspect of this embodiment, the organism is a microorganism, and in another aspect, the organism is a plant.

Another embodiment of the present invention relates to a method to produce a plant that has a polyunsaturated fatty acid (PUFA) profile that differs from the naturally occurring plant, comprising genetically modifying cells of the plant to express a PKS system comprising at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system. The domain of the PUFA PKS system is encoded by any of the nucleic acid sequences described above.

Yet another embodiment of the present invention relates to a method to modify an endproduct containing at least one fatty acid, comprising adding to the endproduct an oil produced by a recombinant host cell that expresses at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system. The domain of a PUFA PKS system is encoded by any of the nucleic acid sequences described above. In one aspect, the endproduct is selected from the group consisting of a dietary supplement, a food product, a pharmaceutical formulation, a humanized animal milk, and an infant formula. A pharmaceutical formulation can include, but is not limited to: an anti-inflammatory formulation, a chemotherapeutic agent, an active excipient, an osteoporosis drug, an anti-depressant, an anti-convulsant, an anti-Helicobacter pylori drug, a drug for treatment of neurodegenerative disease, a drug for treatment of degenerative liver disease, an antibiotic, and a cholesterol lowering formulation. In one aspect, the endproduct is used to treat a condition selected from the group consisting of chronic inflammation, acute inflammation, gastrointestinal disorder, cancer, cachexia, cardiac restenosis, neurodegenerative disorder, degenerative disorder of the liver, blood lipid disorder, osteoporosis, osteoarthritis, autoimmune disease, preeclampsia, preterm birth, age related maculopathy, pulmonary disorder, and peroxisomal disorder.

Yet another embodiment of the present invention relates to a method to produce a humanized animal milk, comprising genetically modifying milk-producing cells of a milk producing animal with at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system. The domain of the PUFA PKS system is encoded by any of the nucleic acid sequences described above.

Yet another embodiment of the present invention relates to a method to produce a recombinant microbe, comprising genetically modifying microbial cells to express at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system. The domain of the PUFA PKS system is encoded by any of the nucleic acid sequences described above.

Yet another embodiment of the present invention relates to a recombinant host cell which has been modified to express a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, wherein the PKS catalyzes both iterative and non-iterative enzymatic reactions. The PUFA PKS system comprises: (a) at least two enoyl ACP-reductase (ER) domains; (b) at least six acyl carrier protein (ACP) domains; (c) at least two β-keto acyl-ACP synthase (KS) domains; (d) at least one acyltransferase (AT) domain; (e) at least one ketoreductase (KR) domain; (f) at least two FabA-like β-hydroxy acyl-ACP dehydrase (DH) domains; (g) at least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain. In one aspect, the PUFA PKS system is a eukaryotic PUFA PKS system. In another aspect, the PUFA PKS system is an algal PUFA PKS system, and preferably a Thraustochytriales PUFA PKS system, which can include, but is not limited to, a Schizochytrium PUFA PKS system or a Thraustochytrium PUFA PKS system.

In this embodiment, the PUFA PKS system can be expressed in a prokaryotic host cell or in a eukaryotic host cell. In one aspect, the host cell is a plant cell. Accordingly, one embodiment of the invention is a method to produce a product containing at least one PUFA, comprising growing a plant comprising such a plant cell under conditions effective to produce the product. In another aspect, the host cell is a microbial cell, and in this case, one embodiment of the present invention is a method to produce a product containing at least one PUFA, comprising culturing a culture containing such a microbial cell under conditions effective to produce the product. In one aspect, the PKS system catalyzes the direct production of triglycerides.

Yet another embodiment of the present invention relates to a genetically modified microorganism comprising a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, wherein the PKS catalyzes both iterative and non-iterative enzymatic reactions. The PUFA PKS system comprises: (a) at least two enoyl ACP-reductase (ER) domains; (b) at least six acyl carrier protein (ACP) domains; (c) at least two β-keto acyl-ACP synthase (KS) domains; (d) at least one acyltransferase (AT) domain; (e) at least one ketoreductase (KR) domain; (f) at least two FabA-like β-hydroxyacyl-ACP dehydrase (DH) domains; (g) at least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain. The genetic modification affects the activity of the PUFA PKS system. In one aspect of this embodiment, the microorganism is a eukaryotic microorganism.

Yet another embodiment of the present invention relates to a recombinant host cell which has been modified to express a non-bacterial polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, wherein the non-bacterial PUFA PKS catalyzes both iterative and non-iterative enzymatic reactions. The non-bacterial PUFA PKS system comprises: (a) at least one enoyl ACP-reductase (ER) domain; (b) multiple acyl carrier protein (ACP) domains; (c) at least two β-ketoacyl-ACP synthase (KS) domains; (d) at least one acyltransferase (AT) domain; (e) at least one ketoreductase (KR) domain; (f) at least two FabA-like β-hydroxyacyl-ACP dehydrase (DH) domains; (g) at least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a graphical representation of the domain structure of the Schizochytrium PUFA PKS system.

FIG. 2 shows a comparison of PKS domains from Schizochytrium and Shewanella.

FIG. 3 shows a comparison of PKS domains from Schizochytrium and a related PKS system from Nostoc whose product is a long chain fatty acid that does not contain any double bonds.

DETAILED DESCRIPTION OF THE INVENTION

The present invention generally relates to non-bacterial derived polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) systems, to genetically modified organisms comprising non-bacterial PUFA PKS systems, to methods of making and using such systems for the production of products of interest, including bioactive molecules, and to novel methods for identifying new eukaryotic microorganisms having such a PUFA PKS system. As used herein, a PUFA PKS system generally has the following identifying features: (1) it produces PUFAs as a natural product of the system; and (2) it comprises several multifunctional proteins assembled into a complex that conducts both iterative processing of the fatty acid chain as well as non-iterative processing, including trans-cis isomerization and enoyl reduction reactions in selected cycles (See FIG. 1, for example).

More specifically, first, a PUFA PKS system that forms the basis of this invention produces polyunsaturated fatty acids (PUFAs) as products (i.e., an organism that endogenously (naturally) contains such a PKS system makes PUFAs using this system). The PUFAs referred to herein are preferably polyunsaturated fatty acids with a carbon chain length of at least 16 carbons, and more preferably at least 18 carbons, and more preferably at least 20 carbons, and more preferably 22 or more carbons, with at least 3 or more double bonds, and preferably 4 or more, and more preferably 5 or more, and even more preferably 6 or more double bonds, wherein all double bonds are in the cis configuration. It is an object of the present invention to find or create via genetic manipulation or manipulation of the endproduct, PKS systems which produce polyunsaturated fatty acids of desired chain length and with desired numbers of double bonds. Examples of PUFAs include, but are not limited to, DHA (docosahexaenoic acid (C22:6, ω-3)), DPA (docosapentaenoic acid (C22:5, ω-6)), and EPA (eicosapentaenoic acid (C20:5, ω-3)).

Second, the PUFA PKS system described herein incorporates both iterative and non-iterative reactions, which distinguish the system from previously described PKS systems (e.g., type I iterative (fungal-like), type II or type I modular). More particularly, the PUFA PKS system described herein contains domains that appear to function during each cycle as well as those which appear to function during only some of the cycles. A key aspect of this may be related to the domains showing homology to the bacterial FabA enzymes. For example, the FabA enzyme of E. coli has been shown to possess two enzymatic activities. It possesses a dehydration activity in which a water molecule (H2O) is abstracted from a carbon chain containing a hydroxy group, leaving a trans double bond in that carbon chain. In addition, it has an isomerase activity in which the trans double bond is converted to the cis configuration. This isomerization is accomplished in conjunction with a migration of the double bond position to adjacent carbons. In PKS (and FAS) systems, the main carbon chain is extended in 2 carbon increments. One can therefore predict the number of extension reactions required to produce the PUFA products of these PKS systems. For example, to produce DHA (C22:6, all cis) requires 10 extension reactions. Since there are only 6 double bonds in the end product, it means that during some of the reaction cycles, a double bond is retained (as a cis isomer), and in others, the double bond is reduced prior to the next extension.

Before the discovery of a PUFA PKS system in marine bacteria (see U.S. Pat. No. 6,140,486), PKS systems were not known to possess this combination of iterative and selective enzymatic reactions, and they were not thought of as being able to produce carbon-carbon double bonds in the cis configuration. However, the PUFA PKS system described by the present invention has the capacity to introduce cis double bonds and the capacity to vary the reaction sequence in the cycle.

Therefore, the present inventors propose to use these features of the PUFA PKS system to produce a range of bioactive molecules that could not be produced by the previously described (Type II, Type I iterative (fungal-like) and Type I modular) PKS systems. These bioactive molecules include, but are not limited to, many of which will be discussed below. For example, using the knowledge of the PUFA PKS gene structures described herein, any of a number of methods can be used to alter the PUFA PKS genes, or combine portions of these genes with other synthesis systems, including other PKS systems, such that new products are produced. The inherent ability of this particular type of system to do both iterative and selective reactions will enable this system to yield products that would not be found if similar methods were applied to other types of PKS systems.

In one embodiment, a PUFA PKS system according to the present invention comprises at least the following biologically active domains: (a) at least two enoyl ACP-reductase (ER) domains; (b) at least six acyl carrier protein (ACP) domains; (c) at least two β-keto acyl-ACP synthase (KS) domains; (d) at least one acyltransferase (AT) domain; (e) at least one ketoreductase (KR) domain; (f) at least two FabA-like β-hydroxy acyl-ACP dehydrase (DH) domains; (g) at least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain. The functions of these domains are generally individually known in the art and will be described in detail below with regard to the PUFA PKS system of the present invention.

In another embodiment, the PUFA PKS system comprises at least the following biologically active domains: (a) at least one enoyl ACP-reductase (ER) domain; (b) multiple acyl carrier protein (ACP) domains (at least four, and preferably at least five, and more preferably at least six, and even more preferably seven, eight, nine, or more than nine); (c) at least two β-keto acyl-ACP synthase (KS) domains; (d) at least one acyltransferase (AT) domain; (e) at least one ketoreductase (KR) domain; (f) at least two FabA-like β-hydroxy acyl-ACP dehydrase (DH) domains; (g) at least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain. Preferably, such a PUFA PKS system is a non-bacterial PUFA-PKS system.

In one embodiment, a PUFA PKS system of the present invention is a non-bacterial PUFA PKS system. In other words, in one embodiment, the PUFA PKS system of the present invention is isolated from an organism that is not a bacteria, or is a homologue of or derived from a PUFA PKS system from an organism that is not a bacteria, such as a eukaryote or an archaebacterium. Eukaryotes are separated from prokaryotes based on the degree of differentiation of the cells. The higher group with more differentiation is called eukaryotic. The lower group with less differentiated cells is called prokaryotic. In general, prokaryotes do not possess a nuclear membrane, do not exhibit mitosis during cell division, have only one chromosome, their cytoplasm contains 70S ribosomes, they do not possess any mitochondria, endoplasmic reticulum, chloroplasts, lysosomes or golgi apparatus, and their flagella (if present) consist of a single fibril. In contrast, eukaryotes have a nuclear membrane, they do exhibit mitosis during cell division, they have many chromosomes, their cytoplasm contains 80S ribosomes, they do possess mitochondria, endoplasmic reticulum, chloroplasts (in ), lysosomes and golgi apparatus, and their flagella (if present) consist of many fibrils. In general, bacteria are prokaryotes, while algae, fungi, , protozoa and higher plants are eukaryotes. The PUFA PKS systems of the marine bacteria (e.g., Shewanella and Vibrio marinus) are not the basis of the present invention, although the present invention does contemplate the use of domains from these bacterial PUFA PKS systems in conjunction with domains from the non-bacterial PUFA PKS systems of the present invention. For example, according to the present invention, genetically modified organisms can be produced which incorporate non-bacterial PUFA PKS functional domains with bacterial PUFA PKS functional domains, as well as PKS functional domains or proteins from other PKS systems (type I iterative (fungal-like), type II, type I modular) or FAS systems.

Schizochytrium is a Thraustochytrid marine microorganism that accumulates large quantities of triacylglycerols rich in DHA and docosapentaenoic acid (DPA; 22:5 ω-6); e.g., 30% DHA+DPA by dry weight (Barclay et al., J. Appl. Phycol. 6, 123 (1994)). In eukaryotes that synthesize 20- and 22-carbon PUFAs by an elongation/desaturation pathway, the pools of 18-, 20- and 22-carbon intermediates are relatively large so that in vivo labeling experiments using [¹⁴C]-acetate reveal clear precursor-product kinetics for the predicted intermediates (Gellerman et al., Biochim. Biophys. Acta 573:23 (1979)). Furthermore, radiolabeled intermediates provided exogenously to such organisms are converted to the final PUFA products. The present inventors have shown that [1-¹⁴C]-acetate was rapidly taken up by Schizochytrium cells and incorporated into fatty acids, but at the shortest labeling time (1 min), DHA contained 31% of the label recovered in fatty acids, and this percentage remained essentially unchanged during the 10-15 min of [¹⁴C]-acetate incorporation and the subsequent 24 hours of culture growth (See Example 3). Similarly, DPA represented 10% of the label throughout the experiment. There is no evidence for a precursor-product relationship between 16- or 18-carbon fatty acids and the 22-carbon polyunsaturated fatty acids. These results are consistent with rapid synthesis of DHA from [¹⁴C]-acetate involving very small (possibly enzyme-bound) pools of intermediates. A cell-free homogenate derived from Schizochytrium cultures incorporated [1-¹⁴C]-malonyl-CoA into DHA, DPA, and saturated fatty acids. The same biosynthetic activities were retained by a 100,000×g supernatant fraction but were not present in the membrane pellet. Thus, DHA and DPA synthesis in Schizochytrium does not involve membrane-bound desaturases or fatty acid elongation enzymes like those described for other eukaryotes (Parker-Barnes et al., 2000, supra; Shanklin et al., 1998, supra). These fractionation data contrast with those obtained from the Shewanella enzymes (See Metz et al., 2001, supra) and may indicate use of a different (soluble) acyl acceptor molecule, such as CoA, by the Schizochytrium enzyme.

In copending U.S. application Ser. No. 09/231,899, a cDNA library from Schizochytrium was constructed and approximately 8,000 random clones (ESTs) were sequenced. Within this dataset, only one moderately expressed gene (0.3% of all sequences) was identified as a fatty acid desaturase, although a second putative desaturase was represented by a single clone (0.01%). By contrast, sequences that exhibited homology to 8 of the 11 domains of the Shewanella PKS genes shown in FIG. 2 were all identified at frequencies of 0.2-0.5%. In U.S. application Ser. No. 09/231,899, several cDNA clones showing homology to the Shewanella PKS genes were sequenced, and various clones were assembled into nucleic acid sequences representing two partial open reading frames and one complete open reading frame. Nucleotides 390-4443 of the cDNA sequence containing the first partial open reading frame described in U.S. application Ser. No. 09/231,899 (denoted therein as SEQ ID NO:69) match nucleotides 4677-8730 (plus the stop codon) of the sequence denoted herein as OrfA (SEQ ID NO:1). Nucleotides 1-4876 of the cDNA sequence containing the second partial open reading frame described in U.S. application Ser. No. 09/231,899 (denoted therein as SEQ ID NO:71) match nucleotides 1311-6177 (plus the stop codon) of the sequence denoted herein as OrfB (SEQ ID NO:3). Nucleotides 145-4653 of the cDNA sequence containing the complete open reading frame described in U.S. application Ser. No. 09/231,899 (denoted therein as SEQ ID NO:76 and incorrectly designated as a partial open reading frame) match the entire sequence (plus the stop codon) of the sequence denoted herein as OrfC (SEQ ID NO:5).

Further sequencing of cDNA and genomic clones by the present inventors allowed the identification of the full-length genomic sequence of each of OrfA, OrfB and OrfC and the complete identification of the domains with homology to those in Shewanella (see FIG. 2). It is noted that in Schizochytrium, the genomic DNA and cDNA are identical, due to the lack of introns in the organism's genome, to the best of the present inventors' knowledge. Therefore, reference to a nucleotide sequence from Schizochytrium can refer to genomic DNA or cDNA. Based on the comparison of the Schizochytrium PKS domains to Shewanella, clearly, the Schizochytrium genome encodes proteins that are highly similar to the proteins in Shewanella that are capable of catalyzing EPA synthesis. The proteins in Schizochytrium constitute a PUFA PKS system that catalyzes DHA and DPA synthesis. As discussed in detail herein, simple modification of the reaction scheme identified for Shewanella will allow for DHA synthesis in Schizochytrium. The homology between the prokaryotic Shewanella and eukaryotic Schizochytrium genes suggests that the PUFA PKS has undergone lateral gene transfer.

FIG. 1 is a graphical representation of the three open reading frames from the Schizochytrium PUFA PKS system, and includes the domain structure of this PUFA PKS system. As described in Example 1 below, the domain structure of each open reading frame is as follows:

Open Reading Frame A (OrfA):

The complete nucleotide sequence for OrfA is represented herein as SEQ ID NO:1. Nucleotides 4677-8730 of SEQ ID NO:1 correspond to nucleotides 390-4443 of the sequence denoted as SEQ ID NO:69 in U.S. application Ser. No. 09/231,899. Therefore, nucleotides 1-4676 of SEQ ID NO:1 represent additional sequence that was not disclosed in U.S. application Ser. No. 09/231,899. This novel region of SEQ ID NO:1 encodes the following domains in OrfA: (1) the ORFA-KS domain; (2) the ORFA-MAT domain; and (3) at least a portion of the ACP domain region (e.g., at least ACP domains 1-4). It is noted that nucleotides 1-389 of SEQ ID NO:69 in U.S. application Ser. No. 09/231,899 do not match with the 389 nucleotides that are upstream of position 4677 in SEQ ID NO:1 disclosed herein. Therefore, positions 1-389 of SEQ ID NO:69 in U.S. application Ser. No. 09/231,899 appear to be incorrectly placed next to nucleotides 390-4443 of that sequence. Most of these first 389 nucleotides (about positions 60-389) are a match with an upstream portion of OrfA (SEQ ID NO:1) of the present invention and therefore, it is believed that an error occurred in the effort to prepare the contig of the cDNA constructs in U.S. application Ser. No. 09/231,899. The region in which the alignment error occurred in U.S. application Ser. No. 09/231,899 is within the region of highly repetitive sequence (i.e., the ACP region, discussed below), which probably created some confusion in the assembly of that sequence from various cDNA clones.

OrfA is an 8730 nucleotide sequence (not including the stop codon) which encodes a 2910 amino acid sequence, represented herein as SEQ ID NO:2. Within OrfA are twelve domains: (a) one β-keto acyl-ACP synthase (KS) domain; (b) one malonyl-CoA:ACP acyltransferase (MAT) domain; (c) nine acyl carrier protein (ACP) domains; and (d) one ketoreductase (KR) domain.

The nucleotide sequence for OrfA has been deposited with GenBank as Accession No. AF378327 (amino acid sequence Accession No. AAK728879). OrfA was compared with known sequences in a standard BLAST search (BLAST 2.0 Basic BLAST homology search using blastp for amino acid searches, blastn for nucleic acid searches, and blastx for nucleic acid searches and searches of the translated amino acid sequence in all 6 open reading frames with standard default parameters, wherein the query sequence is filtered for low complexity regions by default (described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.” Nucleic Acids Res. 25:3389-3402,

As a class of enzymes, KS's have been well characterized. The sequences of many verified KS genes are known, the active site motifs have been identified and the crystal structures of several have been determined. Proteins (or domains of proteins) can be readily identified as belonging to the KS family of enzymes by homology to known KS sequences.

The second domain in OrfA is a MAT domain, also referred to herein as ORFA-MAT. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1723 and 1798 of SEQ ID NO:1 (OrfA) to an ending point of between about positions 2805 and 3000 of SEQ ID NO:1. The nucleotide sequence containing the sequence encoding the ORFA-MAT domain is represented herein as SEQ ID NO:9 (positions 1723-3000 of SEQ ID NO:1). The amino acid sequence containing the MAT domain spans from a starting point of between about positions 575 and 600 of SEQ ID NO:2 (ORFA) to an ending point of between about positions 935 and 1000 of SEQ ID NO:2. The amino acid sequence containing the ORFA-MAT domain is represented herein as SEQ ID NO:10 (positions 575-1000 of SEQ ID NO:2). It is noted that the ORFA-MAT domain contains an active site motif: GHS*XG (acyl binding site S7), represented herein as SEQ ID NO:11.
incorporated herein by reference in its entirety)). At the 25 According to the present invention, a domain or protein nucleic acid level, OrfA has no significant homology to any having malonyl-CoA:ACP acyltransferase (MAT) biological known nucleotide sequence. At the amino acid level, the activity (function) is characterized as one that transfers the sequences with the greatest degree of homology to ORFA malonyl moiety from malonyl-CoA to ACP. In addition to were: Nostoc sp. 7120 heterocyst glycolipid synthase (Ac the active site motif (GXSXG), these enzymes possess an cession No. NC 003272), which was 42% identical to 30 extended motif R and Qamino acids in key positions) that ORFA over 1001 amino acid residues; and Moritella mari identifies them as MAT enzymes (in contrast to the AT nus (Vibrio marinus) ORF8 (Accession No. AB025342), domain of Schizochytrium OrfB). In some PKS systems (but which was 40% identical to ORFA over 993 amino acid not the PUFA PKS domain) MAT domains will preferen residues. tially load methyl- or ethyl-malonate on to the ACP group The first domain in OrfA is a KS domain, also referred to 35 (from the corresponding CoA ester), thereby introducing herein as ORFA-KS. This domain is contained within the branches into the linear carbon chain. MAT domains can be nucleotide sequence spanning from a starting point of recognized by their homology to known MAT sequences and between about positions 1 and 40 of SEQID NO:1 (OrfA) by their extended motif structure. to an ending point of between about positions 1428 and 1500 Domains 3-11 of OrfA are nine tandem ACP domains, of SEQ ID NO:1. The nucleotide sequence containing the 40 also referred to herein as ORFA-ACP (the first domain in the sequence encoding the ORFA-KS domain is represented sequence is ORFA-ACP1, the second domain is ORFA herein as SEQID NO:7 (positions 1-1500 of SEQID NO:1). ACP2, the third domain is ORFA-ACP3, etc.). 
The first ACP The amino acid sequence containing the KS domain spans domain, ORFA-ACP1, is contained within the nucleotide from a starting point of between about positions 1 and 14 of sequence spanning from about position 3343 to about posi SEQ ID NO:2 (ORFA) to an ending point of between about 45 tion 3600 of SEQID NO:1 (OrfA). The nucleotide sequence positions 476 and 500 of SEQ ID NO:2. The amino acid containing the sequence encoding the ORFA-ACP1 domain sequence containing the ORFA-KS domain is represented is represented herein as SEQ ID NO:12 (positions 3343 herein as SEQID NO:8 (positions 1-500 of SEQID NO:2). 3600 of SEQID NO:1). The amino acid sequence containing It is noted that the ORFA-KS domain contains an active site the first ACP domain spans from about position 1115 to motif: DXAC* (*acyl binding site Cs). 50 about position 1200 of SEQ ID NO:2. The amino acid According to the present invention, a domain or protein sequence containing the ORFA-ACP1 domain is represented having 3-keto acyl-ACP synthase (KS) biological activity herein as SEQ ID NO:13 (positions 1115-1200 of SEQ ID (function) is characterized as the enzyme that carries out the NO:2). It is noted that the ORFA-ACP1 domain contains an initial step of the FAS (and PKS) elongation reaction cycle. active site motif: LGIDS* (pantetheline binding motif The acyl group destined for elongation is linked to a cysteine 55 Ss,), represented herein by SEQ ID NO:14. residue at the active site of the enzyme by a thioester bond. The nucleotide and amino acid sequences of all nine ACP In the multi-step reaction, the acyl-enzyme undergoes con domains are highly conserved and therefore, the sequence densation with malonyl-ACP to form-ketoacyl-ACP, CO for each domain is not represented herein by an individual and free enzyme. The KS plays a key role in the elongation sequence identifier. 
However, based on the information cycle and in many systems has been shown to possess 60 disclosed herein, one of skill in the art can readily determine greater Substrate specificity than other enzymes of the reac the sequence containing each of the other eight ACP tion cycle. For example, E. coli has three distinct KS domains (see discussion below). enzymes—each with its own particular role in the physiol All nine ACP domains together span a region of OrfA of ogy of the organism (Magnuson et al., Microbiol. Rev. 57. from about position 3283 to about position 6288 of SEQID 522 (1993)). The two KS domains of the PUFA-PKS sys 65 NO:1, which corresponds to amino acid positions of from tems could have distinct roles in the PUFA biosynthetic about 1095 to about 2096 of SEQ ID NO:2. The nucleotide reaction sequence. sequence for the entire ACP region containing all nine US 7,256,023 B2 19 20 domains is represented herein as SEQID NO:16. The region dependent reduction of 3-ketoacyl forms of ACP. It is the represented by SEQID NO:16 includes the linker segments first reductive step in the de novo fatty acid biosynthesis between individual ACP domains. The repeat interval for the elongation cycle and a reaction often performed in nine domains is approximately every 330 nucleotides of polyketide biosynthesis. Significant sequence similarity is SEQID NO:16 (the actual number of amino acids measured observed with one family of enoyl ACP reductases (ER), the between adjacent active site serines ranges from 104 to 116 other reductase of FAS (but not the ER family present in the amino acids). Each of the nine ACP domains contains a PUFA PKS system), and the short-chain alcohol dehydro pantetheline binding motif LGIDS (represented herein by genase family. Pfam analysis of the PUFA PKS region SEQ ID NO:14), wherein S* is the pantetheline binding site indicated above reveals the homology to the short-chain serine (S). 
The pantetheline binding site serine (S) is located 10 alcohol dehydrogenase family in the core region. Blast near the center of each ACP domain sequence. At each end analysis of the same region reveals matches in the core area of the ACP domain region and between each ACP domain is to known KR enzymes as well as an extended region of a region that is highly enriched for proline (P) and alanine homology to domains from the other characterized PUFA (A), which is believed to be a linker region. For example, PKS systems. between ACP domains 1 and 2 is the sequence: APAPV 15 KAAAPAAPVASAPAPA, represented herein as SEQ ID Open Reading Frame B (OrfB): NO:15. The locations of the active site serine residues (i.e., The complete nucleotide sequence for OrfB is represented the pantetheline binding site) for each of the nine ACP herein as SEQ ID NO:3. Nucleotides 1311-4242 and 4244 domains, with respect to the amino acid sequence of SEQID 6177 of SEQ ID NO :3 correspond to nucleotides 1-2932 NO:2, are as follows: ACP1=Ss: ACP2=Se: and 2934-4867 of the sequence denoted as SEQ ID NO:71 ACP3=S; ACP4-Sass: ACP5-S. ACP6–Ss: in U.S. application Ser. No. 09/231,899 (The cDNA ACP7=Ss: ACP8–So; and ACP9–Sosa. Given that the sequence in U.S. application Ser. No. 09/231,899 contains average size of an ACP domain is about 85 amino acids, about 345 additional nucleotides beyond the stop codon, excluding the linker, and about 110 amino acids including including a polyA tail). Therefore, nucleotides 1-1310 of the linker, with the active site serine being approximately in 25 SEQ ID NO: 1 represent additional sequence that was not the center of the domain, one of skill in the art can readily disclosed in U.S. application Ser. No. 09/231,899. This determine the positions of each of the nine ACP domains in novel region of SEQ ID NO:3 contains most of the KS OrfA. domain encoded by OrfB. 
According to the present invention, a domain or protein OrfB is a 6177 nucleotide sequence (not including the having acyl carrier protein (ACP) biological activity (func 30 stop codon) which encodes a 2059 amino acid sequence, tion) is characterized as being Small polypeptides (typically, represented herein as SEQ ID NO:4. Within OrfB are four 80 to 100 amino acids long), that function as carriers for domains: (a) one B-keto acyl-ACP synthase (KS) domain; growing fatty acyl chains via a thioester linkage to a (b) one chain length factor (CLF) domain; (c) one acyl covalently bound co-factor of the protein. They occur as transferase (AT) domain; and, (d) one enoyl ACP-reductase separate units or as domains within larger proteins. ACPs are 35 (ER) domain. converted from inactive apo-forms to functional holo-forms The nucleotide sequence for OrfB has been deposited by transfer of the phosphopantetheinyl moeity of CoA to a with GenBank as Accession No. AF378328 (amino acid highly conserved serine residue of the ACP. Acyl groups are sequence Accession No. AAK728880). OrfB was compared attached to ACP by a thioester linkage at the free terminus with known sequences in a standard BLAST search as of the phosphopantetheinyl moiety. ACPs can be identified 40 described above. At the nucleic acid level, OrfB has no by labeling with radioactive pantetheline and by sequence significant homology to any known nucleotide sequence. At homology to known ACPs. The presence of variations of the the amino acid level, the sequences with the greatest degree above mentioned motif (LGIDS) is also a signature of an of homology to ORFB were: Shewanella sp. hypothetical ACP. protein (Accession No. U73935), which was 53% identical Domain 12 in OrfA is a KR domain, also referred to 45 to ORFB over 458 amino acid residues; Moritella marinus herein as ORFA-KR. This domain is contained within the (Vibrio marinus) ORF 11 (Accession No. 
AB025342), nucleotide sequence spanning from a starting point of about which was 53% identical to ORFB over 460 amino acid position 6598 of SEQ ID NO:1 to an ending point of about residues; Photobacterium profundum omega-3 polyunsatu position 8730 of SEQ ID NO:1. The nucleotide sequence rated fatty acid synthase Pfal) (Accession No. AF409 100), containing the sequence encoding the ORFA-KR domain is 50 which was 52% identical to ORFB over 457 amino acid represented herein as SEQ ID NO:17 (positions 6598-8730 residues; and Nostoc sp. 7120 hypothetical protein (Acces of SEQ ID NO:1). The amino acid sequence containing the sion No. NC 003272), which was 53% identical to ORFB KR domain spans from a starting point of about position over 430 amino acid residues. 2200 of SEQ ID NO:2 (ORFA) to an ending point of about The first domain in OrfB is a KS domain, also referred to position 2910 of SEQ ID NO:2. The amino acid sequence 55 herein as ORFB-KS. This domain is contained within the containing the ORFA-KR domain is represented herein as nucleotide sequence spanning from a starting point of SEQ ID NO:18 (positions 2200-2910 of SEQ ID NO:2). between about positions 1 and 43 of SEQ ID NO:3 (OrfB) Within the KR domain is a core region with homology to to an ending point of between about positions 1332 and 1350 short chain aldehyde-dehydrogenases (KR is a member of of SEQ ID NO:3. The nucleotide sequence containing the this family). This core region spans from about position 60 sequence encoding the ORFB-KS domain is represented 7198 to about position 7500 of SEQ ID NO:1, which herein as SEQ ID NO:19 (positions 1-1350 of SEQ ID corresponds to amino acid positions 2400-2500 of SEQ ID NO:3). The amino acid sequence containing the KS domain NO:2. 
spans from a starting point of between about positions 1 and According to the present invention, a domain or protein 15 of SEQID NO:4 (ORFB) to an ending point of between having ketoreductase activity, also referred to as 3-ketoacyl 65 about positions 444 and 450 of SEQ ID NO:4. The amino ACP reductase (KR) biological activity (function), is char acid sequence containing the ORFB-KS domain is repre acterized as one that catalyzes the pyridine-nucleotide sented herein as SEQID NO:20 (positions 1-450 of SEQID US 7,256,023 B2 21 22 NO:4). It is noted that the ORFB-KS domain contains an identified (e.g. to malonyl-CoA:ACP acyltransferase.MAT). active site motif: DXAC*(*acyl binding site C). KS In spite of the weak homology toMAT, this AT domain is not biological activity and methods of identifying proteins or believed to function as aMAT because it does not possess an domains having Such activity is described above. extended motif structure characteristic of Such enzymes The second domain in OrfB is a CLF domain, also (seeMAT domain description, above). For the purposes of referred to herein as ORFB-CLF. This domain is contained this disclosure, the functions of the AT domain in a PUFA within the nucleotide sequence spanning from a starting PKS system include, but are not limited to: transfer of the point of between about positions 1378 and 1402 of SEQID fatty acyl group from the ORFAACP domain(s) to water (i.e. NO:3 (OrfB) to an ending point of between about positions a thioesterase-releasing the fatty acyl group as a free fatty 2682 and 2700 of SEQ ID NO:3. The nucleotide sequence 10 acid), transfer of a fatty acyl group to an acceptor Such as containing the sequence encoding the ORFB-CLF domain is CoA, transfer of the acyl group among the various ACP represented herein as SEQ ID NO:21 (positions 1378-2700 domains, or transfer of the fatty acyl group to a lipophilic of SEQ ID NO:3). The amino acid sequence containing the acceptor molecule (e.g. 
to lysophosphadic acid). CLF domain spans from a starting point of between about The fourth domain in OrfB is an ER domain, also referred positions 460 and 468 of SEQID NO:4 (ORFB) to an ending 15 to herein as ORFB-ER. This domain is contained within the point of between about positions 894 and 900 of SEQ ID nucleotide sequence spanning from a starting point of about NO:4. The amino acid sequence containing the ORFB-CLF position 4648 of SEQID NO:3 (OrfB) to an ending point of domain is represented herein as SEQ ID NO:22 (positions about position 6177 of SEQ ID NO:3. The nucleotide 460-900 of SEQID NO:4). It is noted that the ORFB-CLF sequence containing the sequence encoding the ORFB-ER domain contains a KS active site motif without the acyl domain is represented herein as SEQ ID NO:25 (positions binding cysteine. 4648-6177 of SEQ ID NO:3). The amino acid sequence According to the present invention, a domain or protein is containing the ER domain spans from a starting point of referred to as a chain length factor (CLF) based on the about position 1550 of SEQID NO:4 (ORFB) to an ending following rationale. The CLF was originally described as point of about position 2059 of SEQ ID NO:4. The amino characteristic of Type II (dissociated enzymes) PKS systems 25 acid sequence containing the ORFB-ER domain is repre and was hypothesized to play a role in determining the sented herein as SEQ ID NO:26 (positions 1550-2059 of number of elongation cycles, and hence the chain length, of SEQ ID NO:4). the end product. CLF amino acid sequences show homology According to the present invention, this domain has enoyl to KS domains (and are thought to form heterodimers with reductase (ER) biological activity. The ER enzyme reduces a KS protein), but they lack the active site cysteine. CLF's 30 the trans-double bond (introduced by the DH activity) in the role in PKS systems is currently controversial. New evi fatty acyl-ACP, resulting in fully saturating those carbons. dence (C. 
Bisang et al., Nature 401, 502 (1999)) suggests a The ER domain in the PUFA-PKS shows homology to a role in priming (providing the initial acyl group to be newly characterized family of ER enzymes (Heath et al., elongated) the PKS systems. In this role the CLF domain is Nature 406,145 (2000)). Heath and Rock identified this new thought to decarboxylate malonate (as malonyl-ACP), thus 35 class of ER enzymes by cloning a gene of interest from forming an acetate group that can be transferred to the KS Streptococcus pneumoniae, purifying a protein expressed active site. This acetate therefore acts as the priming from that gene, and showing that it had ER activity in an in molecule that can undergo the initial elongation (condensa vitro assay. The sequence of the Schizochytrium ER domain tion) reaction. Homologues of the Type II CLF have been of OrfB shows homology to the S. pneumoniae ER protein. identified as loading domains in some modular PKS sys 40 All of the PUFA PKS systems currently examined contain at tems. A domain with the sequence features of the CLF is least one domain with very high sequence homology to the found in all currently identified PUFA PKS systems and in Schizochytrium ER domain. The Schizochytrium PUFA PKS each case is found as part of a multidomain protein. system contains two ER domains (one on OrfB and one on The third domain in OrfB is an AT domain, also referred Orf(). to herein as ORFB-AT. This domain is contained within the 45 nucleotide sequence spanning from a starting point of Open Reading Frame C (Orf(r): between about positions 2701 and 3598 of SEQ ID NO:3 The complete nucleotide sequence for Orfc is represented (OrfB) to an ending point of between about positions 3975 herein as SEQ ID NO:5. Nucleotides 1-4506 of SEQ ID and 4200 of SEQ ID NO:3. 
The nucleotide sequence con NO:5 (i.e., the entire open reading frame sequence, not taining the sequence encoding the ORFB-AT domain is 50 including the stop codon) correspond to nucleotides 145 represented herein as SEQ ID NO:23 (positions 2701-4200 2768, 2770-2805, 2807-2817, and 2819-4653 of the of SEQ ID NO:3). The amino acid sequence containing the sequence denoted as SEQID NO:76 in U.S. application Ser. AT domain spans from a starting point of between about No. 09/231,899 (The cDNA sequence in U.S. application positions 901 and 1200 of SEQ ID NO:4 (ORFB) to an Ser. No. 09/231,899 contains about 144 nucleotides ending point of between about positions 1325 and 1400 of 55 upstream of the start codon for Orf and about 110 nucle SEQ ID NO:4. The amino acid sequence containing the otides beyond the stop codon, including a polyA tail). Orf ORFB-AT domain is represented herein as SEQ ID NO:24 is a 4506 nucleotide sequence (not including the stop codon) (positions 901-1400 of SEQ ID NO:4). It is noted that the which encodes a 1502 amino acid sequence, represented ORFB-AT domain contains an active site motif of GXS*XG herein as SEQID NO:6. Within Orfare three domains: (a) (acyl binding site Sao) that is characteristic of acyltrans 60 two FabA-like B-hydroxy acyl-ACP dehydrase (DH) ferse (AT) proteins. domains; and (b) one enoyl ACP-reductase (ER) domain. An “acyltransferase' or “AT” refers to a general class of The nucleotide sequence for Orfor has been deposited enzymes that can carry out a number of distinct acyl transfer with GenBank as Accession No. AF3783.29 (amino acid reactions. The Schizochytrium domain shows good homol sequence Accession No. AAK728881). Orf was compared ogy to a domain present in all of the other PUFA PKS 65 with known sequences in a standard BLAST search as systems currently examined and very weak homology to described above. 
At the nucleic acid level, Orf has no Some acyltransferases whose specific functions have been significant homology to any known nucleotide sequence. At US 7,256,023 B2 23 24 the amino acid level (Blastp), the sequences with the greatest point of about position 1502 of SEQ ID NO:6. The amino degree of homology to ORFC were: Moritella marinus acid sequence containing the ORFC-ER domain is repre (Vibrio marinus) ORF11 (Accession No. ABO25342), sented herein as SEQID NO:32 (positions 999-1502 of SEQ which is 45% identical to ORFC over 514 amino acid ID NO:6). ER biological activity has been described above. residues. Shewanella sp. hypothetical protein 8 (Accession One embodiment of the present invention relates to an No. U73935), which is 49% identical to ORFC over 447 isolated nucleic acid molecule comprising a nucleic acid amino acid residues, Nostoc sp. hypothetical protein (Acces sequence from a non-bacterial PUFA PKS system, a homo sion No. NC 003272), which is 49% identical to ORFC logue thereof, a fragment thereof, and/or a nucleic acid over 430 amino acid residues, and Shewanella sp. hypo sequence that is complementary to any of Such nucleic acid thetical protein 7 (Accession No. U73935), which is 37% 10 sequences. In one aspect, the present invention relates to an identical to ORFC over 930 amino acid residues. isolated nucleic acid molecule comprising a nucleic acid The first domain in Orf is a DH domain, also referred to sequence selected from the group consisting of: (a) a nucleic herein as ORFC-DH1. This is one of two DH domains in acid sequence encoding an amino acid sequence selected Orf , and therefore is designated DH 1. 
This domain is from the group consisting of: SEQID NO:2, SEQID NO:4, contained within the nucleotide sequence spanning from a 15 SEQID NO:6, and biologically active fragments thereof; (b) starting point of between about positions 1 and 778 of SEQ a nucleic acid sequence encoding an amino acid sequence ID NO:5 (Orf) to an ending point of between about selected from the group consisting of: SEQ ID NO:8, SEQ positions 1233 and 1350 of SEQ ID NO:5. The nucleotide ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, sequence containing the sequence encoding the ORFC-DH1 SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID domain is represented herein as SEQ ID NO:27 (positions NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically 1-1350 of SEQ ID NO:5). The amino acid sequence con active fragments thereof (c) a nucleic acid sequence encod taining the DH I domain spans from a starting point of ing an amino acid sequence that is at least about 60% between about positions 1 and 260 of SEQID NO:6 (ORFC) identical to at least 500 consecutive amino acids of said to an ending point of between about positions 411 and 450 amino acid sequence of (a), wherein said amino acid of SEQ ID NO:6. The amino acid sequence containing the 25 sequence has a biological activity of at least one domain of ORFC-DH1 domain is represented herein as SEQID NO:28 a polyunsaturated fatty acid (PUFA) polyketide synthase (positions 1-450 of SEQ ID NO:6). (PKS) system; (d) a nucleic acid sequence encoding an The characteristics of both the DH domains (see below for amino acid sequence that is at least about 60% identical to DH2) in the PUFA PKS systems have been described in the said amino acid sequence of (b), wherein said amino acid preceding sections. This class of enzyme removes HOH 30 sequence has a biological activity of at least one domain of from a B-keto acyl-ACP and leaves a trans double bond in a polyunsaturated fatty acid (PUFA) polyketide synthase the carbon chain. 
The DH domains of the PUFA PKS (PKS) system; or (e) a nucleic acid sequence that is fully systems show homology to bacterial DH enzymes associated complementary to the nucleic acid sequence of (a), (b), (c), with their FAS systems (rather than to the DH domains of or (d). In a further embodiment, nucleic acid sequences other PKS systems). A subset of bacterial DH's, the FabA 35 including a sequence encoding the active site domains or like DHs, possesses cis-trans isomerase activity (Heath et other functional motifs described above for several of the al., J. Biol. Chem., 271, 27795 (1996)). It is the homologies PUFA PKS domains are encompassed by the invention. to the FabA-like DH's that indicate that one or both of the According to the present invention, an amino acid DH domains is responsible for insertion of the cis double sequence that has a biological activity of at least one domain bonds in the PUFA PKS products. 40 of a PUFA PKS system is an amino acid sequence that has The second domain in OrFC is a DH domain, also referred the biological activity of at least one domain of the PUFA to herein as ORFC-DH2. This is the Second of two DH PKS system described in detail herein, as exemplified by the domains in Orfc, and therefore is designated DH2. This Schizochytrium PUFA PKS system. The biological activities domain is contained within the nucleotide sequence Span of the various domains within the Schizochytrium PUFA ning from a starting point of between about positions 1351 45 PKS system have been described in detail above. Therefore, and 2437 of SEQ ID NO:5 (Orf() to an ending point of an isolated nucleic acid molecule of the present invention between about positions 2607 and 2847 of SEQ ID NO:5. 
can encode the translation product of any PUFA PKS open The nucleotide sequence continuing the sequence encoding reading frame, PUFA PKS domain, biologically active frag the ORFC-DH2 domain is represented herein as SEQ ID ment thereof, or any homologue of a naturally occurring NO:29 (positions 1351-2847 of SEQ ID NO:5). The amino 50 PUFA PKS open reading frame or domain which has bio acid sequence containing the DH2 domain spans from a logical activity. A homologue of given protein or domain is starting point of between about positions 451 and 813 of a protein or polypeptide that has an amino acid sequence SEQ ID NO:6 (ORFC) to an ending point of between about which differs from the naturally occurring reference amino positions 869 and 949 of SEQ ID NO:6. The amino acid acid sequence (i.e., of the reference protein or domain) in sequence containing the ORFC-DH2 domain is represented 55 that at least one or a few, but not limited to one or a few, herein as SEQ ID NO:30 (positions 451-949 of SEQ ID amino acids have been deleted (e.g., a truncated version of NO:6). DH biological activity has been described above. the protein, Such as apeptide or fragment), inserted, inverted, The third domain in Orf is an ER domain, also referred Substituted and/or derivatized (e.g., by glycosylation, phos to herein as ORFC-ER. This domain is contained within the phorylation, acetylation, myristoylation, prenylation, palmi nucleotide sequence spanning from a starting point of about 60 tation, amidation and/or addition of glycosylphosphatidyl position 2995 of SEQID NO:5 (Orf) to an ending point of inositol). Preferred homologues of a PUFA PKS protein or about position 4506 of SEQ ID NO:5. The nucleotide domain are described in detail below. 
It is noted that sequence containing the sequence encoding the ORFC-ER homologues can include synthetically produced homo domain is represented herein as SEQ ID NO:31 (positions logues, naturally occurring allelic variants of a given protein 2995-4509 of SEQ ID NO:5). The amino acid sequence 65 or domain, or homologous sequences from organisms other containing the ER domain spans from a starting point of than the organism from which the reference sequence was about position 999 of SEQ ID NO:6 (ORFC) to an ending derived. US 7,256,023 B2 25 26 In general, the biological activity or biological action of as described herein. Protein homologues (e.g., proteins a protein or domain refers to any function(s) exhibited or encoded by nucleic acid homologues) have been discussed performed by the protein or domain that is ascribed to the in detail above. naturally occurring form of the protein or domain as mea A nucleic acid molecule homologue can be produced Sured or observed in vivo (i.e., in the natural physiological using a number of methods known to those skilled in the art environment of the protein) or in vitro (i.e., under laboratory (see, for example, Sambrook et al., Molecular Cloning. A conditions). Biological activities of PUFA PKS systems and LaboratoryManual, Cold Spring Harbor Labs Press, 1989). the individual proteins/domains that make up a PUFA PKS For example, nucleic acid molecules can be modified using system have been described in detail elsewhere herein. 
Modifications of a protein or domain, such as in a homologue or mimetic (discussed below), may result in proteins or domains having the same biological activity as the naturally occurring protein or domain, or in proteins or domains having decreased or increased biological activity as compared to the naturally occurring protein or domain. Modifications which result in a decrease in expression or a decrease in the activity of the protein or domain, can be referred to as inactivation (complete or partial), down-regulation, or decreased action of a protein or domain. Similarly, modifications which result in an increase in expression or an increase in the activity of the protein or domain, can be referred to as amplification, overproduction, activation, enhancement, up-regulation or increased action of a protein or domain. A functional domain of a PUFA PKS system is a domain (i.e., a domain can be a portion of a protein) that is capable of performing a biological function (i.e., has biological activity).

In accordance with the present invention, an isolated nucleic acid molecule is a nucleic acid molecule that has been removed from its natural milieu (i.e., that has been subject to human manipulation), its natural milieu being the genome or chromosome in which the nucleic acid molecule is found in nature. As such, "isolated" does not necessarily reflect the extent to which the nucleic acid molecule has been purified, but indicates that the molecule does not include an entire genome or an entire chromosome in which the nucleic acid molecule is found in nature. An isolated nucleic acid molecule can include a gene. An isolated nucleic acid molecule that includes a gene is not a fragment of a chromosome that includes such gene, but rather includes the coding region and regulatory regions associated with the gene, but no additional genes naturally found on the same chromosome. An isolated nucleic acid molecule can also include a specified nucleic acid sequence flanked by (i.e., at the 5' and/or the 3' end of the sequence) additional nucleic acids that do not normally flank the specified nucleic acid sequence in nature (i.e., heterologous sequences). Isolated nucleic acid molecule can include DNA, RNA (e.g., mRNA), or derivatives of either DNA or RNA (e.g., cDNA). Although the phrase "nucleic acid molecule" primarily refers to the physical nucleic acid molecule and the phrase "nucleic acid sequence" primarily refers to the sequence of nucleotides on the nucleic acid molecule, the two phrases can be used interchangeably, especially with respect to a nucleic acid molecule, or a nucleic acid sequence, being capable of encoding a protein or domain of a protein.

Preferably, an isolated nucleic acid molecule of the present invention is produced using recombinant DNA technology (e.g., polymerase chain reaction (PCR) amplification, cloning) or chemical synthesis. Isolated nucleic acid molecules include natural nucleic acid molecules and homologues thereof, including, but not limited to, natural allelic variants and modified nucleic acid molecules in which nucleotides have been inserted, deleted, substituted, and/or inverted in such a manner that such modifications provide the desired effect on PUFA PKS system biological activity

a variety of techniques including, but not limited to, classic mutagenesis techniques and recombinant DNA techniques, such as site-directed mutagenesis, chemical treatment of a nucleic acid molecule to induce mutations, restriction enzyme cleavage of a nucleic acid fragment, ligation of nucleic acid fragments, PCR amplification and/or mutagenesis of selected regions of a nucleic acid sequence, synthesis of oligonucleotide mixtures and ligation of mixture groups to "build" a mixture of nucleic acid molecules and combinations thereof. Nucleic acid molecule homologues can be selected from a mixture of modified nucleic acids by screening for the function of the protein encoded by the nucleic acid and/or by hybridization with a wild-type gene.

The minimum size of a nucleic acid molecule of the present invention is a size sufficient to form a probe or oligonucleotide primer that is capable of forming a stable hybrid (e.g., under moderate, high or very high stringency conditions) with the complementary sequence of a nucleic acid molecule useful in the present invention, or of a size sufficient to encode an amino acid sequence having a biological activity of at least one domain of a PUFA PKS system according to the present invention. As such, the size of the nucleic acid molecule encoding such a protein can be dependent on nucleic acid composition and percent homology or identity between the nucleic acid molecule and complementary sequence as well as upon hybridization conditions per se (e.g., temperature, salt concentration, and formamide concentration). The minimal size of a nucleic acid molecule that is used as an oligonucleotide primer or as a probe is typically at least about 12 to about 15 nucleotides in length if the nucleic acid molecules are GC-rich and at least about 15 to about 18 bases in length if they are AT-rich. There is no limit, other than a practical limit, on the maximal size of a nucleic acid molecule of the present invention, in that the nucleic acid molecule can include a sequence sufficient to encode a biologically active fragment of a domain of a PUFA PKS system, an entire domain of a PUFA PKS system, several domains within an open reading frame (Orf) of a PUFA PKS system, an entire Orf of a PUFA PKS system, or more than one Orf of a PUFA PKS system.

In one embodiment of the present invention, an isolated nucleic acid molecule comprises or consists essentially of a nucleic acid sequence selected from the group of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, or biologically active fragments thereof. In one aspect, the nucleic acid sequence is selected from the group of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:12, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, and SEQ ID NO:31. In one embodiment of the present invention, any of the above-described PUFA PKS amino acid sequences, as well as homologues of such sequences, can be produced with from at least one, and up to about 20, additional heterologous amino acids flanking each of the C- and/or N-terminal end of the given amino acid sequence. The resulting protein or polypeptide can be referred to as "consisting essentially of" a given amino acid sequence. According to the present invention, the heterologous amino acids are a sequence of amino acids that are not naturally found (i.e., not found in nature, in vivo) flanking the given amino acid sequence or which would not be encoded by the nucleotides that flank the naturally occurring nucleic acid sequence encoding the given amino acid sequence as it occurs in the gene, if such nucleotides in the naturally occurring sequence were translated using standard codon usage for the organism from which the given amino acid sequence is derived. Similarly, the phrase "consisting essentially of", when used with reference to a nucleic acid sequence herein, refers to a nucleic acid sequence encoding a given amino acid sequence that can be flanked by from at least one, and up to as many as about 60, additional heterologous nucleotides at each of the 5' and/or the 3' end of the nucleic acid sequence encoding the given amino acid sequence. The heterologous nucleotides are not naturally found (i.e., not found in nature, in vivo) flanking the nucleic acid sequence encoding the given amino acid sequence as it occurs in the natural gene.

The present invention also includes an isolated nucleic acid molecule comprising a nucleic acid sequence encoding an amino acid sequence having a biological activity of at least one domain of a PUFA PKS system. In one aspect, such a nucleic acid sequence encodes a homologue of any of the Schizochytrium PUFA PKS ORFs or domains, including: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID NO:32, wherein the homologue has a biological activity of at least one domain of a PUFA PKS system as described previously herein.

In one aspect of the invention, a homologue of a Schizochytrium PUFA PKS protein or domain encompassed by the present invention comprises an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein said amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In a further aspect, the amino acid sequence of the homologue is at least about 60% identical to at least about 600 consecutive amino acids, and more preferably to at least about 700 consecutive amino acids, and more preferably to at least about 800 consecutive amino acids, and more preferably to at least about 900 consecutive amino acids, and more preferably to at least about 1000 consecutive amino acids, and more preferably to at least about 1100 consecutive amino acids, and more preferably to at least about 1200 consecutive amino acids, and more preferably to at least about 1300 consecutive amino acids, and more preferably to at least about 1400 consecutive amino acids, and more preferably to at least about 1500 consecutive amino acids of any of SEQ ID NO:2, SEQ ID NO:4 and SEQ ID NO:6, or to the full length of SEQ ID NO:6. In a further aspect, the amino acid sequence of the homologue is at least about 60% identical to at least about 1600 consecutive amino acids, and more preferably to at least about 1700 consecutive amino acids, and more preferably to at least about 1800 consecutive amino acids, and more preferably to at least about 1900 consecutive amino acids, and more preferably to at least about 2000 consecutive amino acids of any of SEQ ID NO:2 or SEQ ID NO:4, or to the full length of SEQ ID NO:4. In a further aspect, the amino acid sequence of the homologue is at least about 60% identical to at least about 2100 consecutive amino acids, and more preferably to at least about 2200 consecutive amino acids, and more preferably to at least about 2300 consecutive amino acids, and more preferably to at least about 2400 consecutive amino acids, and more preferably to at least about 2500 consecutive amino acids, and more preferably to at least about 2600 consecutive amino acids, and more preferably to at least about 2700 consecutive amino acids, and more preferably to at least about 2800 consecutive amino acids, and even more preferably, to the full length of SEQ ID NO:2.

In another aspect, a homologue of a Schizochytrium PUFA PKS protein or domain encompassed by the present invention comprises an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 75% identical, and more preferably at least about 80% identical, and more preferably at least about 85% identical, and more preferably at least about 90% identical, and more preferably at least about 95% identical, and more preferably at least about 96% identical, and more preferably at least about 97% identical, and more preferably at least about 98% identical, and more preferably at least about 99% identical to an amino acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, or SEQ ID NO:6, over any of the consecutive amino acid lengths described in the paragraph above, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.

In one aspect of the invention, a homologue of a Schizochytrium PUFA PKS protein or domain encompassed by the present invention comprises an amino acid sequence that is at least about 60% identical to an amino acid sequence chosen from: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID NO:32, wherein said amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In a further aspect, the amino acid sequence of the homologue is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 75% identical, and more preferably at least about 80% identical, and more preferably at least about 85% identical, and more preferably at least about 90% identical, and more preferably at least about 95% identical, and more preferably at least about 96% identical, and more preferably at least about 97% identical, and more preferably at least about 98% identical, and more preferably at least about 99% identical to an amino acid sequence chosen from: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.

According to the present invention, the term "contiguous" or "consecutive", with regard to nucleic acid or amino acid sequences described herein, means to be connected in an unbroken sequence. For example, for a first sequence to comprise 30 contiguous (or consecutive) amino acids of a second sequence, means that the first sequence includes an unbroken sequence of 30 amino acid residues that is 100% identical to an unbroken sequence of 30 amino acid residues in the second sequence. Similarly, for a first sequence to have "100% identity" with a second sequence means that the first sequence exactly matches the second sequence with no gaps between nucleotides or amino acids.

As used herein, unless otherwise specified, reference to a percent (%) identity refers to an evaluation of homology which is performed using: (1) a BLAST 2.0 Basic BLAST homology search using blastp for amino acid searches, blastn for nucleic acid searches, and blastx for nucleic acid searches and searches of translated amino acids in all 6 open reading frames, all with standard default parameters, wherein the query sequence is filtered for low complexity regions by default (described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402, incorporated herein by reference in its entirety); (2) a BLAST 2 alignment (using the parameters described below); (3) and/or PSI-BLAST with the standard default parameters (Position-Specific Iterated BLAST). It is noted that due to some differences in the standard parameters between BLAST 2.0 Basic BLAST and BLAST 2, two specific sequences might be recognized as having significant homology using the BLAST 2 program, whereas a search performed in BLAST 2.0 Basic BLAST using one of the sequences as the query sequence may not identify the second sequence in the top matches. In addition, PSI-BLAST provides an automated, easy-to-use version of a "profile" search, which is a sensitive way to look for sequence homologues. The program first performs a gapped BLAST database search. The PSI-BLAST program uses the information from any significant alignments returned to construct a position-specific score matrix, which replaces the query sequence for the next round of database searching. Therefore, it is to be understood that percent identity can be determined by using any one of these programs.

Two specific sequences can be aligned to one another using BLAST 2 sequence as described in Tatusova and Madden, (1999), "Blast 2 sequences, a new tool for comparing protein and nucleotide sequences", FEMS Microbiol Lett. 174:247-250, incorporated herein by reference in its entirety. BLAST 2 sequence alignment is performed in blastp or blastn using the BLAST 2.0 algorithm to perform a Gapped BLAST search (BLAST 2.0) between the two sequences allowing for the introduction of gaps (deletions and insertions) in the resulting alignment. For purposes of clarity herein, a BLAST 2 sequence alignment is performed using the standard default parameters as follows.

For blastn, using 0 BLOSUM62 matrix:
Reward for match=1
Penalty for mismatch=-2
Open gap (5) and extension gap (2) penalties
gap x_dropoff (50) expect (10) word size (11) filter (on)
For blastp, using 0 BLOSUM62 matrix:
Open gap (11) and extension gap (1) penalties
gap x_dropoff (50) expect (10) word size (3) filter (on).

In another embodiment of the invention, an amino acid sequence having the biological activity of at least one domain of a PUFA PKS system of the present invention includes an amino acid sequence that is sufficiently similar to a naturally occurring PUFA PKS protein or polypeptide that a nucleic acid sequence encoding the amino acid sequence is capable of hybridizing under moderate, high, or very high stringency conditions (described below) to (i.e., with) a nucleic acid molecule encoding the naturally occurring PUFA PKS protein or polypeptide (i.e., to the complement of the nucleic acid strand encoding the naturally occurring PUFA PKS protein or polypeptide). Preferably, an amino acid sequence having the biological activity of at least one domain of a PUFA PKS system of the present invention is encoded by a nucleic acid sequence that hybridizes under moderate, high or very high stringency conditions to the complement of a nucleic acid sequence that encodes a protein comprising an amino acid sequence represented by any of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID NO:32. Methods to deduce a complementary sequence are known to those skilled in the art. It should be noted that since amino acid sequencing and nucleic acid sequencing technologies are not entirely error-free, the sequences presented herein, at best, represent apparent sequences of PUFA PKS domains and proteins of the present invention.

As used herein, hybridization conditions refer to standard hybridization conditions under which nucleic acid molecules are used to identify similar nucleic acid molecules. Such standard conditions are disclosed, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Labs Press, 1989. Sambrook et al., ibid., is incorporated by reference herein in its entirety (see specifically, pages 9.31-9.62). In addition, formulae to calculate the appropriate hybridization and wash conditions to achieve hybridization permitting varying degrees of mismatch of nucleotides are disclosed, for example, in Meinkoth et al., 1984, Anal. Biochem. 138, 267-284; Meinkoth et al., ibid., is incorporated by reference herein in its entirety.

More particularly, moderate stringency hybridization and washing conditions, as referred to herein, refer to conditions which permit isolation of nucleic acid molecules having at least about 70% nucleic acid sequence identity with the nucleic acid molecule being used to probe in the hybridization reaction (i.e., conditions permitting about 30% or less mismatch of nucleotides). High stringency hybridization and washing conditions, as referred to herein, refer to conditions which permit isolation of nucleic acid molecules having at least about 80% nucleic acid sequence identity with the nucleic acid molecule being used to probe in the hybridization reaction (i.e., conditions permitting about 20% or less mismatch of nucleotides). Very high stringency hybridization and washing conditions, as referred to herein, refer to conditions which permit isolation of nucleic acid molecules having at least about 90% nucleic acid sequence identity with the nucleic acid molecule being used to probe in the hybridization reaction (i.e., conditions permitting about 10% or less mismatch of nucleotides). As discussed above, one of skill in the art can use the formulae in Meinkoth et al., ibid. to calculate the appropriate hybridization and wash conditions to achieve these particular levels of nucleotide mismatch. Such conditions will vary, depending on whether DNA:RNA or DNA:DNA hybrids are being formed. Calculated melting temperatures for DNA:DNA hybrids are 10° C. less than for DNA:RNA hybrids. In particular embodiments, stringent hybridization conditions for DNA:DNA hybrids include hybridization at an ionic strength of 6X SSC (0.9 M Na+) at a temperature of between about 20° C. and about 35° C. (lower stringency), more preferably, between about 28° C. and about 40° C. (more stringent), and even more preferably, between about 35° C. and about 45° C. (even more stringent), with appropriate wash conditions. In particular embodiments, stringent hybridization conditions for DNA:RNA hybrids include hybridization at an ionic strength of 6X SSC (0.9 M Na+) at a temperature of between about 30° C. and about 45° C., more preferably, between about 38° C. and about 50° C., and even more preferably, between about 45° C. and about 55° C., with similarly stringent wash conditions. These values are based on calculations of a melting temperature for molecules larger than about 100 nucleotides, 0% formamide and a G+C content of about 40%. Alternatively, Tm can be calculated empirically as set forth in Sambrook et al., supra, pages 9.31 to 9.62. In general, the wash conditions should be as stringent as possible, and should be appropriate for the chosen hybridization conditions. For example, hybridization conditions can include a combination of salt and temperature conditions that are approximately 20-25° C. below the calculated Tm of a particular hybrid, and wash conditions typically include a combination of salt and temperature conditions that are approximately 12-20° C. below the calculated Tm of the particular hybrid. One example of hybridization conditions suitable for use with DNA:DNA hybrids includes a 2-24 hour hybridization in 6X SSC (50% formamide) at about 42° C., followed by washing steps that include one or more washes at room temperature in about 2X SSC, followed by additional washes at higher temperatures and lower ionic strength (e.g., at least one wash at about 37° C. in about 0.1X-0.5X SSC, followed by at least one wash at about 68° C. in about 0.1X-0.5X SSC).

Another embodiment of the present invention includes a recombinant nucleic acid molecule comprising a recombinant vector and a nucleic acid molecule comprising a nucleic acid sequence encoding an amino acid sequence having a biological activity of at least one domain of a PUFA PKS system as described herein. Such nucleic acid sequences are described in detail above. According to the present invention, a recombinant vector is an engineered (i.e., artificially produced) nucleic acid molecule that is used as a tool for manipulating a nucleic acid sequence of choice and for introducing such a nucleic acid sequence into a host cell. The recombinant vector is therefore suitable for use in cloning, sequencing, and/or otherwise manipulating the nucleic acid sequence of choice, such as by expressing and/or delivering the nucleic acid sequence of choice into a host cell to form a recombinant cell. Such a vector typically contains heterologous nucleic acid sequences, that is nucleic acid sequences that are not naturally found adjacent to nucleic acid sequence to be cloned or delivered, although the vector can also contain regulatory nucleic acid sequences (e.g., promoters, untranslated regions) which are naturally found adjacent to nucleic acid molecules of the present invention or which are useful for expression of the nucleic acid molecules of the present invention (discussed in detail below). The vector can be either RNA or DNA, either prokaryotic or eukaryotic, and typically is a plasmid. The vector can be maintained as an extrachromosomal element (e.g., a plasmid) or it can be integrated into the chromosome of a recombinant organism (e.g., a microbe or a plant). The entire vector can remain in place within a host cell, or under certain conditions, the plasmid DNA can be deleted, leaving behind the nucleic acid molecule of the present invention. The integrated nucleic acid molecule can be under chromosomal promoter control, under native or plasmid promoter control, or under a combination of several promoter controls. Single or multiple copies of the nucleic acid molecule can be integrated into the chromosome. A recombinant vector of the present invention can contain at least one selectable marker.

In one embodiment, a recombinant vector used in a recombinant nucleic acid molecule of the present invention is an expression vector. As used herein, the phrase "expression vector" is used to refer to a vector that is suitable for production of an encoded product (e.g., a protein of interest). In this embodiment, a nucleic acid sequence encoding the product to be produced (e.g., a PUFA PKS domain) is inserted into the recombinant vector to produce a recombinant nucleic acid molecule. The nucleic acid sequence encoding the protein to be produced is inserted into the vector in a manner that operatively links the nucleic acid sequence to regulatory sequences in the vector which enable the transcription and translation of the nucleic acid sequence within the recombinant host cell.

In another embodiment, a recombinant vector used in a recombinant nucleic acid molecule of the present invention is a targeting vector. As used herein, the phrase "targeting vector" is used to refer to a vector that is used to deliver a particular nucleic acid molecule into a recombinant host cell, wherein the nucleic acid molecule is used to delete or inactivate an endogenous gene within the host cell or microorganism (i.e., used for targeted gene disruption or knock-out technology). Such a vector may also be known in the art as a "knock-out" vector. In one aspect of this embodiment, a portion of the vector, but more typically, the nucleic acid molecule inserted into the vector (i.e., the insert), has a nucleic acid sequence that is homologous to a nucleic acid sequence of a target gene in the host cell (i.e., a gene which is targeted to be deleted or inactivated). The nucleic acid sequence of the vector insert is designed to bind to the target gene such that the target gene and the insert undergo homologous recombination, whereby the endogenous target gene is deleted, inactivated or attenuated (i.e., by at least a portion of the endogenous target gene being mutated or deleted).

Typically, a recombinant nucleic acid molecule includes at least one nucleic acid molecule of the present invention operatively linked to one or more transcription control sequences. As used herein, the phrase "recombinant molecule" or "recombinant nucleic acid molecule" primarily refers to a nucleic acid molecule or nucleic acid sequence operatively linked to a transcription control sequence, but can be used interchangeably with the phrase "nucleic acid molecule", when such nucleic acid molecule is a recombinant molecule as discussed herein. According to the present invention, the phrase "operatively linked" refers to linking a nucleic acid molecule to a transcription control sequence in a manner such that the molecule is able to be expressed when transfected (i.e., transformed, transduced, transfected, conjugated or conduced) into a host cell. Transcription control sequences are sequences which control the initiation, elongation, or termination of transcription. Particularly important transcription control sequences are those which control transcription initiation, such as promoter, enhancer, operator and repressor sequences. Suitable transcription control sequences include any transcription control sequence that can function in a host cell or organism into which the recombinant nucleic acid molecule is to be introduced.

Recombinant nucleic acid molecules of the present invention can also contain additional regulatory sequences, such as translation regulatory sequences, origins of replication, and other regulatory sequences that are compatible with the recombinant cell. In one embodiment, a recombinant molecule of the present invention, including those which are integrated into the host cell chromosome, also contains secretory signals (i.e., signal segment nucleic acid sequences) to enable an expressed protein to be secreted from the cell that produces the protein. Suitable signal segments include a signal segment that is naturally associated with the protein to be expressed or any heterologous signal segment capable of directing the secretion of the protein according to the present invention. In another embodiment, a recombinant molecule of the present invention comprises a leader sequence to enable an expressed protein to be delivered to and inserted into the membrane of a host cell. Suitable leader sequences include a leader sequence that is naturally associated with the protein, or any heterologous leader sequence capable of directing the delivery and insertion of the protein to the membrane of a cell.

The present inventors have found that the Schizochytrium PUFA PKS Orfs A and B are closely linked in the genome and region between the Orfs has been sequenced. The Orfs are oriented in opposite directions and 4244 base pairs separate the start (ATG) codons (i.e., they are arranged as follows: 3'OrfA5' - 4244 bp - 5'OrfB3'). Examination of the 4244 bp intergenic region did not reveal any obvious Orfs (no significant matches were found on a BlastX search). Both Orfs A and B are highly expressed in Schizochytrium, at least during the time of oil production, implying that active promoter elements are embedded in this intergenic region. These genetic elements are believed to have utility as a bi-directional promoter sequence for transgenic applications. For example, in a preferred embodiment, one could clone this region, place any genes of interest at each end and introduce the construct into Schizochytrium (or some other host in which the promoters can be shown to function). It is predicted that the regulatory elements, under the appropriate conditions, would provide for coordinated, high level expression of the two introduced genes. The complete nucleotide sequence for the regulatory region containing Schizochytrium PUFA PKS regulatory elements (e.g., a promoter) is represented herein as SEQ ID NO:36.

In a similar manner, OrfC is highly expressed in Schizochytrium during the time of oil production and regulatory elements are expected to reside in the region upstream of its start codon. A region of genomic DNA upstream of OrfC has been cloned and sequenced and is represented herein as (SEQ ID NO:37). This sequence contains the 3886 nt immediately upstream of the OrfC start codon. Examination of this region did not reveal any obvious Orfs (i.e., no significant matches were found on a BlastX search). It is believed that regulatory elements contained in this region, under the appropriate conditions, will provide for high-level expression of a gene placed behind them. Additionally, under the appropriate conditions, the level of expression may be coordinated with genes under control of the A-B intergenic region (SEQ ID NO:36).

Therefore, in one embodiment, a recombinant nucleic acid molecule useful in the present invention, as disclosed herein, can include a PUFA PKS regulatory region contained within SEQ ID NO:36 and/or SEQ ID NO:37. Such a regulatory region can include any portion (fragment) of SEQ ID NO:36 and/or SEQ ID NO:37 that has at least basal PUFA PKS transcriptional activity.

One or more recombinant molecules of the present invention can be used to produce an encoded product (e.g., a PUFA PKS domain, protein, or system) of the present invention. In one embodiment, an encoded product is produced by expressing a nucleic acid molecule as described herein under conditions effective to produce the protein. A preferred method to produce an encoded protein is by transfecting a host cell with one or more recombinant molecules to form a recombinant cell. Suitable host cells to transfect include, but are not limited to, any bacterial, fungal (e.g., yeast), insect, plant or animal cell that can be transfected. Host cells can be either untransfected cells or cells that are already transfected with at least one other recombinant nucleic acid molecule.

According to the present invention, the term "transfection" is used to refer to any method by which an exogenous nucleic acid molecule (i.e., a recombinant nucleic acid molecule) can be inserted into a cell. The term "transformation" can be used interchangeably with the term "transfection" when such term is used to refer to the introduction of nucleic acid molecules into microbial cells, such as algae, bacteria and yeast. In microbial systems, the term "transformation" is used to describe an inherited change due to the acquisition of exogenous nucleic acids by the microorganism and is essentially synonymous with the term "transfection." However, in animal cells, transformation has acquired a second meaning which can refer to changes in the growth properties of cells in culture after they become cancerous, for example. Therefore, to avoid confusion, the term "transfection" is preferably used with regard to the introduction of exogenous nucleic acids into animal cells, and the term "transfection" will be used herein to generally encompass transfection of animal cells, plant cells and transformation of microbial cells, to the extent that the terms pertain to the introduction of exogenous nucleic acids into a cell. Therefore, transfection techniques include, but are not limited to, transformation, particle bombardment, electroporation, microinjection, lipofection, adsorption, infection and protoplast fusion.

It will be appreciated by one skilled in the art that use of recombinant DNA technologies can improve control of expression of transfected nucleic acid molecules by manipulating, for example, the number of copies of the nucleic acid molecules within the host cell, the efficiency with which those nucleic acid molecules are transcribed, the efficiency with which the resultant transcripts are translated, and the efficiency of post-translational modifications. Additionally, the promoter sequence might be genetically engineered to improve the level of expression as compared to the native promoter. Recombinant techniques useful for controlling the expression of nucleic acid molecules include, but are not limited to, integration of the nucleic acid molecules into one or more host cell chromosomes, addition of vector stability sequences to plasmids, substitutions or modifications of transcription control signals (e.g., promoters, operators, enhancers), substitutions or modifications of translational control signals (e.g., ribosome binding sites, Shine-Dalgarno sequences), modification of nucleic acid molecules to correspond to the codon usage of the host cell, and deletion of sequences that destabilize transcripts.

General discussion above with regard to recombinant nucleic acid molecules and transfection of host cells is intended to be applied to any recombinant nucleic acid molecule discussed herein, including those encoding any amino acid sequence having a biological activity of at least one domain from a PUFA PKS, those encoding amino acid sequences from other PKS systems, and those encoding other proteins or domains.

This invention also relates to the use of a novel method to identify a microorganism that has a PUFA PKS system that is homologous in structure, domain organization and/or function to a Schizochytrium PUFA PKS system. In one embodiment, the microorganism is a non-bacterial microorganism, and preferably, the microorganism identified by this method is a eukaryotic microorganism. In addition, this invention relates to the microorganisms identified by such method and to the use of these microorganisms and the PUFA PKS systems from these microorganisms in the various applications for a PUFA PKS system (e.g., genetically modified organisms and methods of producing bioactive molecules) according to the present invention. The unique screening method described and demonstrated herein enables the rapid identification of new microbial strains containing a PUFA PKS system homologous to the Schizochytrium PUFA PKS system of the present invention. Applicants have used this method to discover and disclose herein that a Thraustochytrium microorganism contains a PUFA PKS system that is homologous to that found in Schizochytrium. This discovery is described in detail in Example 2 below.

Microbial organisms with a PUFA PKS system similar to that found in Schizochytrium, such as the Thraustochytrium microorganism discovered by the present inventors and described in Example 2, can be readily identified/isolated/screened by the following methods used separately or in any combination of these methods.

In general, the method to identify a non-bacterial microorganism that has a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system includes a first step of (a) selecting a microorganism that produces at least one PUFA; and a second step of (b) identifying a microorganism from (a) that has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation in the fermentation medium, as compared to production of PUFAs by said microorganism under dissolved oxygen conditions of greater than 5% of saturation, more preferably 10% of saturation, more preferably greater than 15% of saturation and more preferably greater than 20% of saturation in the fermentation medium. A microorganism that produces at least one PUFA and has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation is identified as a candidate for containing a PUFA PKS system. Subsequent to identifying a microorganism that is a strong candidate for containing a PUFA PKS system, the method can include an additional step (c) of detecting whether the organism identified in step (b) comprises a PUFA PKS system.

In one embodiment of the present invention, step (b) is

shaker table, are instead filled to greater than about 50% of their capacity, and more preferably greater than about 60%, and most preferably greater than about 75% of their capacity with culture medium. High loading of the shake flask with culture medium prevents it from mixing very well in the flask when it is placed on a shaker table, preventing oxygen diffusion into the culture. Therefore as the microbes grow, they use up the existing oxygen in the medium and naturally create a low or no oxygen environment in the shake flask.

After the culture period, the cells are harvested and analyzed for content of bioactive compounds of interest (e.g., lipids), but most particularly, for compounds containing two or more unsaturated bonds, and more preferably three or more double bonds, and even more preferably four or more double bonds. For lipids, those strains possessing such compounds at greater than about 5%, and more preferably greater than about 10%, and more preferably greater than about 15%, and even more preferably greater than about 20% of the dry weight of the microorganism are identified as predictably containing a novel PKS system of the type described above. For other bioactive compounds, such as antibiotics or compounds that are synthesized in smaller amounts, those strains possessing such compounds at greater than about 0.5%, and more preferably greater than about 0.1%, and more preferably greater than about 0.25%, and more preferably greater than about 0.5%, and more preferably greater than about 0.75%, and more preferably greater than about 1%, and more preferably greater than about 2.5%, and more preferably greater than about 5% of the dry weight of the microorganism are identified as predictably containing a novel PKS system of the type described above.
performed by culturing the microorganism selected for the Alternatively, or in conjunction with this method, pro screening process in low oxygen/anoxic conditions and spective microbial strains containing novel PUFA PKS aerobic conditions, and, in addition to measuring PUFA 35 systems as described herein can be identified by examining content in the organism, the fatty acid profile is determined, the fatty acid profile of the strain (obtained by culturing the as well as fat content. By comparing the results under low organism or through published or other readily available oxygen/anoxic conditions with the results under aerobic sources). If the microbe contains greater than about 30%, conditions, the method provides a strong indication of and more preferably greater than about 40%, and more whether the test microorganism contains a PUFA PKS 40 preferably greater than about 45%, and even more preferably system of the present invention. This preferred embodiment greater than about 50% of its total fatty acids as C 14:0, C is described in detail below. 16:0 and/or C 16:1, while also producing at least one long Initially, microbial strains to be examined for the presence chain fatty acid with three or more unsaturated bonds, and of a PUFAPKS system are cultured under aerobic conditions more preferably 4 or more double bonds, and more prefer to induce production of a large number of cells (microbial 45 ably 5 or more double bonds, and even more preferably 6 or biomass). As one element of the identification process, these more double bonds, then this microbial strain is identified as cells are then placed under low oxygen or anoxic culture a likely candidate to possess a novel PUFA PKS system of conditions (e.g., dissolved oxygen less than about 5% of the type described in this invention. 
Screening this organism saturation, more preferably less than about 2%, even more under the low oxygen conditions described above, and is preferably less than about 1%, and most preferably dis 50 confirming production of bioactive molecules containing solved oxygen of about 0% of saturation in the culture two or more unsaturated bonds would suggest the existence medium) and allowed to grow for approximately another of a novel PUFA PKS system in the organism, which could 24-72 hours. In this process, the microorganisms should be be further confirmed by analysis of the microbes genome. cultured at a temperature greater than about 15°C., and more The success of this method can also be enhanced by preferably greater than about 20° C., and even more pref 55 screening eukaryotic strains that are known to contain C17:0 erably greater than about 25°C., and even more preferably and or C17:1 fatty acids (in conjunction with the large greater than 30° C. The low or anoxic culture environment percentages of C14:0, C16:0 and C16:1 fatty acids described can be easily maintained in culture chambers capable of above) because the C17:0 and C17:1 fatty acids are poten inducing this type of atmospheric environment in the cham tial markers for a bacterial (prokaryotic) based or influenced ber (and thus in the cultures) or by culturing the cells in a 60 fatty acid production system. Another marker for identifying manner that induces the low oxygen environment directly in strains containing novel PUFA PKS systems is the produc the culture flask/vessel itself. tion of simple fatty acid profiles by the organism. According In a preferred culturing method, the microbes can be to the present invention, a “simple fatty acid profile' is cultured in shake flasks which, instead of normally contain defined as 8 or fewer fatty acids being produced by the strain ing a small amount of culture medium-less than about 50% 65 at levels greater than 10% of total fatty acids. 
of total capacity and usually less than about 25% of total Use of any of these methods or markers (singly or capacity—to keep the medium aerated as it is shaken on a preferably in combination) would enable one of skill in the US 7,256,023 B2 37 38 art to readily identify microbial strains that are highly more nucleic acid sequences that encode a domain of a predicted to contain a novel PUFA PKS system of the type PUFA PKS system as described herein. Preferably, this step described in this invention. of detection includes a suitable nucleic acid detection In a preferred embodiment combining many of the meth method, such as hybridization, amplification and or sequenc ods and markers described above, a novel biorational Screen 5 ing of one or more nucleic acid sequences in the microbe of (using shake flask cultures) has been developed for detecting interest. The probes and/or primers used in the detection microorganisms containing PUFA producing PKS systems. methods can be derived from any known PUFAPKS system, This screening system is conducted as follows: including the marine bacteria PUFA PKS systems described A portion of a culture of the strain/microorganism to be in U.S. Pat. No. 6,140,486, or the Thraustochytrid PUFA tested is placed in 250 mL baffled shake flask with 50 mL 10 PKS systems described in U.S. application Ser. No. 09/231, culture media (aerobic treatment), and another portion of 899 and herein. Once novel PUFA PKS systems are iden culture of the same strain is placed in a 250 mL non-baffled tified, the genetic material from these systems can also be shake flask with 200 mL culture medium (anoxic/low oxy used to detect additional novel PUFAPKS systems. Methods gen treatment). Various culture media can be employed of hybridization, amplification and sequencing of nucleic depending on the type and strain of microorganism being 15 evaluated. Both flasks are placed on a shaker table at 200 acids for the purpose of identification and detection of a rpm. 
After 48-72 hr of culture time, the cultures are har sequence are well known in the art. Using these detection vested by centrifugation and the cells are analyzed for fatty methods, sequence homology and domain structure (e.g., the acid methyl ester content via gas chromatography to deter presence, number and/or arrangement of various PUFAPKS mine the following data for each culture: (1) fatty acid functional domains) can be evaluated and compared to the profile; (2) PUFA content; and (3) fat content (approximated known PUFA PKS systems described herein. as amount total fatty acids/cell dry weight). In some embodiments, a PUFA PKS system can be These data are then analyzed asking the following five identified using biological assays. For example, in U.S. questions (Yes/No): application Ser. No. 09/231,899, Example 7, the results of a 25 key experiment using a well-known inhibitor of some types Comparing the Data from the low O/Anoxic Flask with the of fatty acid synthesis systems, i.e., thiolactomycin, is Data from the Aerobic Flask: described. The inventors showed that the synthesis of (1) Did the DHA (or other PUFA content) (as % FAME PUFAs in whole cells of Schizochytrium could be specifi (fatty acid methyl esters)) stay about the same or preferably cally blocked without blocking the synthesis of short chain increased in the low oxygen culture compared to the aerobic 30 saturated fatty acids. The significance of this result is as culture? follows: the inventors knew from analysis of cDNA (2) Is C14:0+C16:0+C16:1 greater than about 40%TFA in sequences from Schizochytrium that a Type I fatty acid the anoxic culture? synthase system is present in Schizochytrium. It was known (3) Are there very little (<1% as FAME) or no precursors that thiolactomycin does not inhibit Type I FAS systems, and (C18:3n-3+C18:2n-6+C18:3n-6) to the conventional oxy 35 this is consistent with the inventors data—i.e., production gen dependent elongase? 
desaturase pathway in the anoxic of the saturated fatty acids (primarily C14:0 and C16:0 in culture? Schizochytrium) was not inhibited by the thiolactomycin (4) Did fat content (as amount total fatty acids/cell dry treatment. There are no indications in the literature or in the weight) increase in the low oxygen culture compared to the inventors’ own data that thiolactomycin has any inhibitory aerobic culture? 40 effect on the elongation of C14:0 or C16:0 fatty acids or their (5) Did DHA (or other PUFA content) increase as % cell desaturation (i.e. the conversion of short chain Saturated dry weight in the low oxygen culture compared to the fatty acids to PUFAs by the classical pathway). Therefore, aerobic culture? the fact that the PUFA production in Schizochytrium was If the first three questions are answered yes, this is a good blocked by thiolactomycin strongly indicates that the clas indication that the strain contains a PKS genetic system for 45 sical PUFA synthesis pathway does not produce the PUFAs making long chain PUFAs. The more questions that are in Schizochytrium, but rather that a different pathway of answered yes (preferably the first three questions must be synthesis is involved. Further, it had previously been deter answered yes), the stronger the indication that the strain mined that the Shewanella PUFAPKS system is inhibited by contains such a PKS genetic system. If all five questions are thiolactomycin (note that the PUFA PKS system of the answered yes, then there is a very strong indication that the 50 present invention has elements of both Type I and Type II strain contains a PKS genetic system for making long chain systems), and it was known that thiolactomycin is an inhibi PUFAs. The lack of 18:3n-3/18:2n-6/18:3n-6 would indi tor of Type II FAS systems (such as that found in E. coli). 
cate that the low oxygen conditions would have turned off or Therefore, this experiment indicated that Schizochytrium inhibited the conventional pathway for PUFA synthesis. A produced PUFAs as a result of a pathway not involving the high 14:0/16:0/16:1 fatty is an preliminary indicator of a 55 Type I FAS. A similar rationale and detection step could be bacterially influenced fatty acid synthesis profile (the pres used to detect a PUFA PKS system in a microbe identified ence of C17:0 and 17:1 is also and indicator of this) and of using the novel screening method disclosed herein. a simple fatty acid profile. The increased PUFA synthesis In addition, Example 3 shows additional biochemical data and PUFA containing fat synthesis under the low oxygen which provides evidence that PUFAs in Schizochytrium are conditions is directly indicative of a PUFA PKS system, 60 not produced by the classical pathway (i.e., precursor prod since this system does not require oxygen to make highly uct kinetics between C16:0 and DHA are not observed in unsaturated fatty acids. whole cells and, in vitro PUFA synthesis can be separated Finally, in the identification method of the present inven from the membrane fraction—all of the fatty acid desatu tion, once a strong candidate is identified, the microbe is rases of the classical PUFA synthesis pathway, with the preferably screened to detect whether or not the microbe 65 exception of the delta 9 desaturase which inserts the first contains a PUFA PKS system. For example, the genome of double bond of the series, are associated with cellular the microbe can be screened to detect the presence of one or membranes). This type of biochemical data could be used to US 7,256,023 B2 39 40 detect PUFA PKS activity in microbe identified by the novel the potential for transfer of a bacterial PKS system into a screening method described above. 
eukaryotic cell if a mistake occurred and the bacterial cell Preferred microbial Strains to screen using the screening/ (or its DNA) did not get digested and instead are function identification method of the present invention are chosen ally incorporated into the eukaryotic cell. from the group consisting of bacteria, algae, fungi, protozoa 5 Strains of microbes (other than the members of the or , but most preferably from the eukaryotic Thraustochytrids) capable of bacterivory (especially by microbes consisting of algae, fungi, protozoa and protists. phagocytosis or endocytosis) can be found in the following These microbes are preferably capable of growth and pro microbial classes (including but not limited to example duction of the bioactive compounds containing two or more genera): unsaturated bonds attemperatures greater than about 15° C., 10 In the algae and algae-like microbes (including strameno more preferably greater than about 20° C., even more piles): of the class Euglenophyceae (for example genera preferably greater than about 25° C. and most preferably Euglena, and Peranema), the class Chrysophyceae (for greater than about 30° C. example the genus Ochromonas), the class Dinobryaceae In some embodiments of this method of the present (for example the genera , Platychrysis, and invention, novel bacterial PUFA PKS systems can be iden- 15 Chrysochromulina), the Dinophyceae (including the genera tified in bacteria that produce PUFAs at temperatures Crypthecodinium, Gymnodinium, Peridinium, Ceratium, exceeding about 20° C., preferably exceeding about 25°C. Gyrodinium, and Oxyrrhis), the class Cryptophyceae (for and even more preferably exceeding about 30° C. As example the genera Cryptomonas, and Rhodomonas), the described previously herein, the marine bacteria, class Xanthophyceae (for example the genus Olisthodiscus) Shewanella and Vibrio marinus, described in U.S. Pat. No. 
20 (and including forms of algae in which an amoeboid stage 6,140,486, do not produce PUFAs at higher temperatures, occurs as in the Rhizochloridaceae, and which limits the usefulness of PUFA PKS systems derived Zoospores/gametes of Aphanochaete pascheri, Bumilleria from these bacteria, particularly in plant applications under stigeoclonium and Vaucheria geminata), the class Eustig field conditions. Therefore, in one embodiment, the screen matophyceae, and the class Prymnesiopyceae (including the ing method of the present invention can be used to identify 25 genera Prymnesium and Diacronema). bacteria that have a PUFA PKS system which are capable of In the including the: Proteromonads, Opa growth and PUFA production at higher temperatures (e.g., lines, Developayella, Diplophorys, Larbrinthulids, Thraus above about 20.25, or 30°C.). In this embodiment, inhibitors tochytrids, Bicosecids, , Hypochytridiomycetes, of eukaryotic growth Such as nystatin (antifungal) or cyclo Commation, Reticulosphaera, Pelagomonas, Pelapococcus, heximide (inhibitor of eukaryotic protein synthesis) can be 30 Ollicola, Aureococcus, , Raphidiophytes, Synurids, added to agar plates used to culture/select initial strains from Rhizochromulinaales, , , Chrysom water samples/soil samples collected from the types of eridales, Sarcinochrysidales, , , and habitats/niches described below. This process would help . select for enrichment of bacterial strains without (or mini In the Fungi. Class Myxomycetes (form myxamoebae)— mal) contamination of eukaryotic strains. This selection 35 slime molds, class Acrasieae including the orders Acrasiceae process, in combination with culturing the plates at elevated (for example the genus Sappinia), class Guttulinaceae (for temperatures (e.g. 
30° C.), and then selecting strains that example the genera Guttulinopsis, and Guttulina), class produce at least one PUFA would initially identify candidate Dicty Steliaceae (for example the genera Acrasis, Dictyoste bacterial strains with a PUFA PKS system that is operative lium, Polysphondylium, and Coenonia), and class Phyco at elevated temperatures (as opposed to those bacterial 40 myceae including the orders Chytridiales, Ancylistales, strains in the prior art which only exhibit PUFA production Blastocladiales, Monoblepharidales, , Per at temperatures less than about 20° C. and more preferably onosporales. Mucorales, and Entomophthorales. below about 5° C.). In the Protozoa: Protozoa strains with life stages capable Locations for collection of the preferred types of microbes of bacterivory (including by phageocytosis) can be selected for screening for a PUFA PKS system according to the 45 from the types classified as ciliates, flagellates or amoebae. present invention include any of the following: low oxygen Protozoan ciliates include the groups: Chonotrichs, Colpo environments (or locations near these types of low oxygen dids, Cyrtophores, Haptorids, Karyorelicts, Oligohymeno environments including in the guts of animals including phora, Polyhymenophora (spirotrichs), Prostomes and Suc invertebrates that consume microbes or microbe-containing toria. Protozoan flagellates include the Biosoecids, foods (including types of filter feeding organisms), low or 50 Bodonids, Cercomonads, Chrysophytes (for example the non-oxygen containing aquatic habitats (including freshwa genera Anthophysa, Chrysanoemba, Chrysosphaerella, ter, Saline and marine), and especially at-or near-low oxygen Dendromonas, Dinobryon, Mallomonas, Ochromonas, environments (regions) in the . 
The microbial strains Paraphysomonas, Poteriodchromonas, Spumella, Syn would preferably not be obligate anaerobes but be adapted crypta, Synura, and Uroglena), Collar flagellates, Crypto to live in both aerobic and low or anoxic environments. Soil 55 phytes (for example the genera Chilomonas, Cryptomonas, environments containing both aerobic and low oxygen or Cyanomonas, and Goniomonas), Dinoflagellates, anoxic environments would also excellent environments to Diplomonads, Euglenoids, Heterolobosea, Pedinellids, find these organisms in and especially in these types of soil Pelobionts, Phalansteriids, Pseudodendromonads, Spon in aquatic habitats or temporary aquatic habitats. gomonads and Volvocales (and other flagellates including A particularly preferred microbial strain would be a strain 60 the unassigned genera of Artodiscus, Clautriavia, (selected from the group consisting of algae, fungi (includ Helkesimastix, Kathablepharis and Multicilia). Amoeboid ing yeast), protozoa or protists) that, during a portion of its protozoans include the groups: Actinophryids, Centrohelids, life cycle, is capable of consuming whole bacterial cells Desmothoricids, Diplophryids, Eumamoebae, Heterolobo (bacterivory) by mechanisms such as phagocytosis, phag sea, Leptomyxids, Nucleariid filose amoebae, Pelebionts, otrophic or endocytic capability and/or has a stage of its life 65 Testate amoebae and Vampyrellids (and including the unas cycle in which it exists as an amoeboid stage or naked signed amoebid genera Gymnophrys, Biomyxa, Microcom protoplast. This method of nutrition would greatly increase etes, Reticulomyxa, Belonocystis, Elaeorhanis, Allelogro US 7,256,023 B2 41 42 mia, Gromia or Lieberkuhnia). The protozoan orders include Genus: Thraustochytrium, Schizochytrium, Labyrinthu the following: Percolomonadeae, Heterolobosea, Lyromon loides, or Japonochytrium). 
For the present invention, mem adea, Pseudociliata, Trichomonadea, Hypermastigea, Heter bers of the labrinthulids are considered to be included in the omiteae, Telonemea, Cyathobodonea, Ebridea, Pyytomyxea, Thraustochytrids. Taxonomic changes are summarized Opalinea, Kinetomonadea, Hemimastigea, Protostelea, below. Strains of certain unicellular microorganisms dis Myxagastrea, Dictyostelea, Choanomonadea, Apicomon closed herein are members of the order Thraustochytriales. adea, Eogregarinea, Neogregarinea, Coelotrolphea, Eucoc Thraustochytrids are marine eukaryotes with a evolving cidea, Haemosporea, Piroplasmea, Spirotrichea, Prosto taxonomic history. Problems with the taxonomic placement matea, Litostomatea, Phyllopharyngea, Nassophorea, of the Thraustochytrids have been reviewed by Moss (1986), Oligohymenophorea, Colpodea, Karyorelicta, Nucleohelea, 10 Bahnweb and Jackle (1986) and Chamberlain and Moss Centrohelea, Acantharea, Sticholonchea, Polycystinea, (1988). According to the present invention, the phrases Phaeodarea, Lobosea, Filosea, Athalamea, Monothalamea, “Thraustochytrid, “Thraustochytriales microorganism” and Polythalamea, Xenophyophorea, Schizocladea, Holosea, “microorganism of the order Thraustochytriales' can be Entamoebea, Myxosporea, Actinomyxea, Halosporea, used interchangeably. Paramyxea, Rhombozoa and Orthonectea. 15 For convenience purposes, the Thraustochytrids were first A preferred embodiment of the present invention includes placed by taxonomists with other colorless Zoosporic strains of the microorganisms listed above that have been eukaryotes in the Phycomycetes (algae-like fungi). The collected from one of the preferred habitats listed above. name Phycomycetes, however, was eventually dropped from One embodiment of the present invention relates to any taxonomic status, and the Thraustochytrids were retained in microorganisms identified using the novel PUFA PKS the Oomycetes (the biflagellate Zoosporic fungi). 
It was screening method described above, to the PUFA PKS genes initially assumed that the Oomycetes were related to the and proteins encoded thereby, and to the use of Such algae, and eventually a wide range of ultrastruc microorganisms and/or PUFA PKS genes and proteins (in tural and biochemical studies, summarized by Barr (Barr, cluding homologues and fragments thereof) in any of the 1981, Biosystems 14:359-370) supported this assumption. methods described herein. In particular, the present inven 25 The Oomycetes were in fact accepted by Leedale (Leedale, tion encompasses organisms identified by the screening 1974, Taxon 23:261-270) and other phycologists as part of method of the present invention which are then genetically the heterokont algae. However, as a matter of convenience modified to regulate the production of bioactive molecules resulting from their heterotrophic nature, the Oomycetes and by said PUFA PKS system. Thraustochytrids have been largely studied by mycologists Yet another embodiment of the present invention relates 30 (scientists who study fungi) rather than phycologists (Sci to an isolated nucleic acid molecule comprising a nucleic entists who study algae). acid sequence encoding at least one biologically active From another taxonomic perspective, evolutionary biolo domain or biologically active fragment thereof of a polyun gists have developed two general Schools of thought as to saturated fatty acid (PUFA) polyketide synthase (PKS) how eukaryotes evolved. One theory proposes an exogenous system from a Thraustochytrid microorganism. As discussed 35 origin of membrane-bound organelles through a series of above, the present inventors have successfully used the endosymbioses (Margulis, 1970, Origin of Eukaryotic Cells. 
method to identify a non-bacterial microorganism that has a Yale University Press, New Haven); e.g., mitochondria were PUFA PKS system to identify additional members of the derived from bacterial endosymbionts, chloroplasts from order Thraustochytriales which contain a PUFA PKS sys cyanophytes, and flagella from Spirochetes. The other theory tem. The identification of three such microorganisms is 40 Suggests a gradual evolution of the membrane-bound described in Example 2. Specifically, the present inventors organelles from the non-membrane-bounded systems of the have used the screening method of the present invention to prokaryote ancestor via an autogenous process (Cavalier identify Thraustochytrium sp. 23B (ATCC 1020892) as Smith, 1975, Nature (Lond.) 256:462-468). Both groups of being highly predicted to contain a PUFA PKS system, evolutionary biologists however, have removed the followed by detection of sequences in the Thraustochytrium 45 Oomycetes and Thraustochytrids from the fungi and place sp. 23B genome that hybridize to the Schizochytrium PUFA them either with the chromophyte algae in the PKS genes disclosed herein. Schizochytrium limacium (IFO Chromophyta (Cavalier-Smith, 1981, BioSystems 14:461 32693) and Ulkenia (BP-5601) have also been identified as 481) (this kingdom has been more recently expanded to good candidates for containing PUFA PKS systems. Based include other protists and members of this kingdom are now on these data and on the similarities among members of the 50 called Stramenopiles) or with all algae in the kingdom order Thraustochytriales, it is believed that many other Protoctista (Margulis and Sagen, 1985, Biosystems 18:141 Thraustochytriales PUFA PKS systems can now be readily 147). identified using the methods and tools provided by the With the development of electron microscopy, studies on present invention. 
Therefore, Thraustochytriales PUFAPKS the ultrastructure of the Zoospores of two genera of Thraus systems and portions and/or homologues thereof (e.g., pro 55 tochytrids, Thraustochytrium and Schizochytrium, (Perkins, teins, domains and fragments thereof), genetically modified 1976, pp. 279-312 in “Recent Advances in Aquatic Mycol organisms comprising Such systems and portions and/or ogy’ (ed. E. B. G. Jones), John Wiley & Sons, New York; homologues thereof, and methods of using such microor Kazama, 1980, Can. J Bot. 58:2434-2446; Barr, 1981, ganisms and PUFA PKS systems, are encompassed by the Biosystems 14:359-370) have provided good evidence that present invention. 60 the Thraustochytriaceae are only distantly related to the Developments have resulted in revision of the Oomycetes. Additionally, genetic data representing a corre of the Thraustochytrids. Taxonomic theorists place Thraus spondence analysis (a form of multivariate statistics) of 5 S tochytrids with the algae or algae-like protists. However, ribosomal RNA sequences indicate that Thraustochytriales because of taxonomic uncertainty, it would be best for the are clearly a unique group of eukaryotes, completely sepa purposes of the present invention to consider the strains 65 rate from the fungi, and most closely related to the red and described in the present invention as Thraustochytrids (Or , and to members of the Oomycetes (Mannella, der: Thraustochytriales; Family: Thraustochytriaceae: et al., 1987, Mol. Evol. 24:228-235). Most taxonomists have US 7,256,023 B2 43 44 agreed to remove the Thraustochytrids from the Oomycetes ism does not necessarily endogenously (naturally) contain a (Bartnicki-Garcia, 1987, pp. 389-403 in “Evolutionary Biol PUFA PKS system, but is genetically modified to introduce ogy of the Fungi' (eds. Rayner, A. D. M., Brasier, C. M. & at least one recombinant nucleic acid molecule encoding an Moore, D.), Cambridge University Press, Cambridge). 
In summary, employing the taxonomic system of Cavalier-Smith (Cavalier-Smith, 1981, BioSystems 14:461-481, 1983; Cavalier-Smith, 1993, Microbiol Rev. 57:953-994), the Thraustochytrids are classified with the chromophyte algae in the kingdom Chromophyta (Stramenopiles). This taxonomic placement has been more recently reaffirmed by Cavalier-Smith et al. using the 18S rRNA signatures of the Heterokonta to demonstrate that Thraustochytrids are chromists not Fungi (Cavalier-Smith et al., 1994, Phil. Tran. Roy. Soc. London Series BioSciences 346:387-397). This places them in a completely different kingdom from the fungi, which are all placed in the kingdom Eufungi. The taxonomic placement of the Thraustochytrids is therefore summarized below:

Kingdom: Chromophyta (Stramenopiles)
Phylum: Heterokonta
Order: Thraustochytriales
Family: Thraustochytriaceae
Genus: Thraustochytrium, Schizochytrium, Labyrinthuloides, or Japonochytrium

Some early taxonomists separated a few original members of the genus Thraustochytrium (those with an amoeboid life stage) into a separate genus called Ulkenia. However, it is now known that most, if not all, Thraustochytrids (including Thraustochytrium and Schizochytrium) exhibit amoeboid stages and, as such, Ulkenia is not considered by some to be a valid genus. As used herein, the genus Thraustochytrium will include Ulkenia.

Despite the uncertainty of taxonomic placement within higher classifications of Phylum and Kingdom, the Thraustochytrids remain a distinctive and characteristic grouping whose members remain classifiable within the order Thraustochytriales.

Polyunsaturated fatty acids (PUFAs) are essential membrane components in higher eukaryotes and the precursors of many lipid-derived signaling molecules. The PUFA PKS system of the present invention uses pathways for PUFA synthesis that do not require desaturation and elongation of saturated fatty acids. The pathways are catalyzed by PUFA PKSs that are distinct from previously recognized PKSs in both structure and mechanism. Generation of cis double bonds is suggested to involve position-specific isomerases; these enzymes are believed to be useful in the production of new families of antibiotics.

To produce significantly high yields of various bioactive molecules using the PUFA PKS system of the present invention, an organism, preferably a microorganism or a plant, can be genetically modified to affect the activity of a PUFA PKS system. In one aspect, such an organism can endogenously contain and express a PUFA PKS system, and the genetic modification can be a genetic modification of one or more of the functional domains of the endogenous PUFA PKS system, whereby the modification has some effect on the activity of the PUFA PKS system. In another aspect, such an organism can endogenously contain and express a PUFA PKS system, and the genetic modification can be an introduction of at least one exogenous nucleic acid sequence (e.g., a recombinant nucleic acid molecule), wherein the exogenous nucleic acid sequence encodes at least one biologically active domain or protein from a second PKS system and/or a protein that affects the activity of said PUFA PKS system (e.g., a phosphopantetheinyl transferase (PPTase), discussed below). In yet another aspect, the organ

amino acid sequence having the biological activity of at least one domain of a PUFA PKS system. In this aspect, PUFA PKS activity is affected by introducing or increasing PUFA PKS activity in the organism. Various embodiments associated with each of these aspects will be discussed in greater detail below.

Therefore, according to the present invention, one embodiment relates to a genetically modified microorganism, wherein the microorganism expresses a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The at least one domain of the PUFA PKS system is encoded by a nucleic acid sequence chosen from: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by a screening method of the present invention; (c) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and, (d) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. The genetic modification affects the activity of the PKS system in the organism. The screening process referenced in part (b) has been described in detail above and includes the steps of: (a) selecting a microorganism that produces at least one PUFA; and, (b) identifying a microorganism from (a) that has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation in the fermentation medium, as compared to production of PUFAs by the microorganism under dissolved oxygen conditions of greater than about 5% of saturation, and preferably about 10%, and more preferably about 15%, and more preferably about 20% of saturation in the fermentation medium. The genetically modified microorganism can include any one or more of the above-identified nucleic acid sequences, and/or any of the other homologues of any of the Schizochytrium PUFA PKS ORFs or domains as described in detail above.

As used herein, a genetically modified microorganism can include a genetically modified bacterium, protist, microalgae, fungus, or other microbe, and particularly, any of the genera of the order Thraustochytriales (e.g., a Thraustochytrid) described herein (e.g., Schizochytrium, Thraustochytrium, Japonochytrium, Labyrinthuloides). Such a genetically modified microorganism has a genome which is modified (i.e., mutated or changed) from its normal (i.e., wild-type or naturally occurring) form such that the desired result is achieved (i.e., increased or modified PUFA PKS activity and/or production of a desired product using the PKS system). Genetic modification of a microorganism can be accomplished using classical strain development and/or molecular genetic techniques. Such techniques are known in the art and are generally disclosed for microorganisms, for example, in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Labs Press. The reference Sambrook et al., ibid, is incorporated by reference herein in its entirety. A genetically modified microorganism can include a microorganism in which nucleic acid molecules have been inserted, deleted or modified (i.e., mutated; e.g., by insertion, deletion, substitution, and/or inversion of nucleotides), in such a manner that such modifications provide the desired effect within the microorganism.

Preferred microorganism host cells to modify according to the present invention include, but are not limited to, any bacteria, protist, microalga, fungus, or protozoa. In one aspect, preferred microorganisms to genetically modify include, but are not limited to, any microorganism of the order Thraustochytriales. Particularly preferred host cells for use in the present invention could include microorganisms from a genus including, but not limited to: Thraustochytrium, Labyrinthuloides, Japonochytrium, and Schizochytrium. Preferred species within these genera include, but are not limited to: any Schizochytrium species, including Schizochytrium aggregatum, Schizochytrium limacinum, Schizochytrium minutum; any Thraustochytrium species (including former Ulkenia species such as U. visurgensis, U. amoeboida, U. sarkariana, U. profunda, U. radiata, U. minuta and Ulkenia sp. BP-5601), and including Thraustochytrium striatum, Thraustochytrium aureum, Thraustochytrium roseum; and any Japonochytrium species. Particularly preferred strains of Thraustochytriales include, but are not limited to: Schizochytrium sp. (S31)(ATCC 20888); Schizochytrium sp. (S8)(ATCC 20889); Schizochytrium sp. (LC-RM)(ATCC 18915); Schizochytrium sp. (SR21); Schizochytrium aggregatum (Goldstein et Belsky)(ATCC 28209); Schizochytrium limacinum (Honda et Yokochi)(IFO 32693); Thraustochytrium sp. (23B)(ATCC 20891); Thraustochytrium striatum (Schneider)(ATCC 24473); Thraustochytrium aureum (Goldstein)(ATCC 34304); Thraustochytrium roseum (Goldstein)(ATCC 28210); and Japonochytrium sp. (L1)(ATCC 28207). Other examples of suitable host microorganisms for genetic modification include, but are not limited to, yeast including Saccharomyces cerevisiae, Saccharomyces carlsbergensis, or other yeast such as Candida, Kluyveromyces, or other fungi, for example, filamentous fungi such as Aspergillus, Neurospora, Penicillium, etc. Bacterial cells also may be used as hosts. This includes Escherichia coli, which can be useful in fermentation processes. Alternatively, a host such as a Lactobacillus species or Bacillus species can be used as a host.

Another embodiment of the present invention relates to a genetically modified plant, wherein the plant has been genetically modified to recombinantly express a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The domain is encoded by a nucleic acid sequence chosen from: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by the screening and selection method described herein (see brief summary of method in discussion of genetically modified microorganism above); (c) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (d) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and/or (f) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. The genetically modified plant can include any one or more of the above-identified nucleic acid sequences, and/or any of the other homologues of any of the Schizochytrium PUFA PKS ORFs or domains as described in detail above.

As used herein, a genetically modified plant can include any genetically modified plant including higher plants and particularly, any consumable plants or plants useful for producing a desired bioactive molecule of the present invention. Such a genetically modified plant has a genome which is modified (i.e., mutated or changed) from its normal (i.e., wild-type or naturally occurring) form such that the desired result is achieved (i.e., increased or modified PUFA PKS activity and/or production of a desired product using the PKS system). Genetic modification of a plant can be accomplished using classical strain development and/or molecular genetic techniques. Methods for producing a transgenic plant, wherein a recombinant nucleic acid molecule encoding a desired amino acid sequence is incorporated into the genome of the plant, are known in the art. A preferred plant to genetically modify according to the present invention is preferably a plant suitable for consumption by animals, including humans.

Preferred plants to genetically modify according to the present invention (i.e., plant host cells) include, but are not limited to any higher plants, and particularly consumable plants, including crop plants and especially plants used for their oils. Such plants can include, for example: canola, soybeans, rapeseed, linseed, corn, safflowers, sunflowers and tobacco. Other preferred plants include those plants that are known to produce compounds used as pharmaceutical agents, flavoring agents, neutraceutical agents, functional food ingredients or cosmetically active agents or plants that are genetically engineered to produce these compounds/agents.

According to the present invention, a genetically modified microorganism or plant includes a microorganism or plant that has been modified using recombinant technology. As used herein, genetic modifications which result in a decrease in gene expression, in the function of the gene, or in the function of the gene product (i.e., the protein encoded by the gene) can be referred to as inactivation (complete or partial), deletion, interruption, blockage or down-regulation of a gene. For example, a genetic modification in a gene which results in a decrease in the function of the protein encoded by such gene, can be the result of a complete deletion of the gene (i.e., the gene does not exist, and therefore the protein does not exist), a mutation in the gene which results in incomplete or no translation of the protein (e.g., the protein is not expressed), or a mutation in the gene which decreases or abolishes the natural function of the protein (e.g., a protein is expressed which has decreased or no enzymatic activity or action). Genetic modifications that result in an increase in gene expression or function can be referred to as amplification, overproduction, overexpression, activation, enhancement, addition, or up-regulation of a gene.

The genetic modification of a microorganism or plant according to the present invention preferably affects the activity of the PKS system expressed by the plant, whether the PKS system is endogenous and genetically modified, endogenous with the introduction of recombinant nucleic acid molecules into the organism, or provided completely by recombinant technology. According to the present invention, to "affect the activity of a PKS system" includes any genetic modification that causes any detectable or measurable change or modification in the PKS system expressed by the organism as compared to in the absence of the genetic modification. A detectable change or modification in the PKS system can include, but is not limited to: the introduction of PKS system activity into an organism such that the organism now has measurable/detectable PKS system activity (i.e., the organism did not contain a PKS system prior to the genetic modification), the introduction into the organism of a functional domain from a different PKS system than a PKS system endogenously expressed by the organism such that the PKS system activity is modified (e.g., a bacterial PUFA PKS domain or a type I PKS domain is introduced into an organism that endogenously expresses a non-bacterial PUFA PKS system), a change in the amount of a bioactive molecule produced by the PKS system (e.g., the system produces more (increased amount) or less (decreased amount) of a given product as compared to in the absence of the genetic modification), a change in the type of a bioactive molecule produced by the PKS system (e.g., the system produces a new or different product, or a variant of a product that is naturally produced by the system), and/or a change in the ratio of multiple bioactive molecules produced by the PKS system (e.g., the system produces a different ratio of one PUFA to another PUFA, produces a completely different lipid profile as compared to in the absence of the genetic modification, or places various PUFAs in different positions in a triacylglycerol as compared to the natural configuration). Such a genetic modification includes any type of genetic modification and specifically includes modifications made by recombinant technology and by classical mutagenesis.

It should be noted that reference to increasing the activity of a functional domain or protein in a PUFA PKS system refers to any genetic modification in the organism containing the domain or protein (or into which the domain or protein is to be introduced) which results in increased functionality of the domain or protein system and can include higher activity of the domain or protein (e.g., specific activity or in vivo enzymatic activity), reduced inhibition or degradation of the domain or protein system, and overexpression of the domain or protein. For example, gene copy number can be increased, expression levels can be increased by use of a promoter that gives higher levels of expression than that of the native promoter, or a gene can be altered by genetic engineering or classical mutagenesis to increase the activity of the domain or protein encoded by the gene.

Similarly, reference to decreasing the activity of a functional domain or protein in a PUFA PKS system refers to any genetic modification in the organism containing such domain or protein (or into which the domain or protein is to be introduced) which results in decreased functionality of the domain or protein and includes decreased activity of the domain or protein, increased inhibition or degradation of the domain or protein and a reduction or elimination of expression of the domain or protein. For example, the action of a domain or protein of the present invention can be decreased by blocking or reducing the production of the domain or protein, "knocking out" the gene or portion thereof encoding the domain or protein, reducing domain or protein activity, or inhibiting the activity of the domain or protein. Blocking or reducing the production of a domain or protein can include placing the gene encoding the domain or protein under the control of a promoter that requires the presence of an inducing compound in the growth medium. By establishing conditions such that the inducer becomes depleted from the medium, the expression of the gene encoding the domain or protein (and therefore, of protein synthesis) could be turned off. Blocking or reducing the activity of a domain or protein could also include using an excision technology approach similar to that described in U.S. Pat. No. 4,743,546, incorporated herein by reference. To use this approach, the gene encoding the protein of interest is cloned between specific genetic sequences that allow specific, controlled excision of the gene from the genome. Excision could be prompted by, for example, a shift in the cultivation temperature of the culture, as in U.S. Pat. No. 4,743,546, or by some other physical or nutritional signal.

In one embodiment of the present invention, a genetic modification includes a modification of a nucleic acid sequence encoding an amino acid sequence that has a biological activity of at least one domain of a non-bacterial PUFA PKS system as described herein. Such a modification can be to an amino acid sequence within an endogenously (naturally) expressed non-bacterial PUFA PKS system, whereby a microorganism that naturally contains such a system is genetically modified by, for example, classical mutagenesis and selection techniques and/or molecular genetic techniques, including genetic engineering techniques. Genetic engineering techniques can include, for example, using a targeting recombinant vector to delete a portion of an endogenous gene, or to replace a portion of an endogenous gene with a heterologous sequence. Examples of heterologous sequences that could be introduced into a host genome include sequences encoding at least one functional domain from another PKS system, such as a different non-bacterial PUFA PKS system, a bacterial PUFA PKS system, a type I iterative (fungal-like) PKS system, a type II PKS system, or a type I modular PKS system. Other heterologous sequences to introduce into the genome of a host include a sequence encoding a protein or functional domain that is not a domain of a PKS system, but which will affect the activity of the endogenous PKS system. For example, one could introduce into the host genome a nucleic acid molecule encoding a phosphopantetheinyl transferase (discussed below). Specific modifications that could be made to an endogenous PUFA PKS system are discussed in detail below.

In another aspect of this embodiment of the invention, the genetic modification can include: (1) the introduction of a recombinant nucleic acid molecule encoding an amino acid sequence having a biological activity of at least one domain of a non-bacterial PUFA PKS system; and/or (2) the introduction of a recombinant nucleic acid molecule encoding a protein or functional domain that affects the activity of a PUFA PKS system, into a host. The host can include: (1) a host cell that does not express any PKS system, wherein all functional domains of a PKS system are introduced into the host cell, and wherein at least one functional domain is from a non-bacterial PUFA PKS system; (2) a host cell that expresses a PKS system (endogenous or recombinant) having at least one functional domain of a non-bacterial PUFA PKS system, wherein the introduced recombinant nucleic acid molecule can encode at least one additional non-bacterial PUFA PKS domain function or another protein or domain that affects the activity of the host PKS system; and (3) a host cell that expresses a PKS system (endogenous or recombinant) which does not necessarily include a domain function from a non-bacterial PUFA PKS, and wherein the introduced recombinant nucleic acid molecule includes a nucleic acid sequence encoding at least one functional domain of a non-bacterial PUFA PKS system. In other words, the present invention intends to encompass any genetically modified organism (e.g., microorganism or plant), wherein the organism comprises at least one non-bacterial PUFA PKS domain function (either endogenously or by recombinant modification), and wherein the genetic modification has a measurable effect on the non-bacterial PUFA PKS domain function or on the PKS system when the organism comprises a functional PKS system.

Therefore, using the non-bacterial PUFA PKS systems of the present invention, which, for example, make use of genes from Thraustochytrid PUFA PKS systems, gene mixing can be used to extend the range of PUFA products to include EPA, DHA, ARA, GLA, SDA and others, as well as to produce a wide variety of bioactive molecules, including antibiotics, other pharmaceutical compounds, and other desirable products. The method to obtain these bioactive molecules includes not only the mixing of genes from various organisms but also various methods of genetically modifying the non-bacterial PUFA PKS genes disclosed herein. Knowledge of the genetic basis and domain structure of the non-bacterial PUFA PKS system of the present invention provides a basis for designing novel genetically modified organisms which produce a variety of bioactive molecules. Although mixing and modification of any PKS domains and related genes are contemplated by the present inventors, by way of example, various possible manipulations of the PUFA-PKS system are discussed below with regard to genetic modification and bioactive molecule production.

For example, in one embodiment, non-bacterial PUFA PKS system products, such as those produced by Thraustochytrids, are altered by modifying the CLF (chain length factor) domain. This domain is characteristic of Type II (dissociated enzymes) PKS systems. Its amino acid sequence shows homology to KS (keto synthase pairs) domains, but it lacks the active site cysteine. CLF may function to determine the number of elongation cycles, and hence the chain length, of the end product. In this embodiment of the invention, using the current state of knowledge of FAS and PKS synthesis, a rational strategy for production of ARA by directed modification of the non-bacterial PUFA PKS system is provided. There is controversy in the literature concerning the function of the CLF in PKS systems (C. Bisang et al., Nature 401, 502 (1999)) and it is realized that other domains may be involved in determination of the chain length of the end product. However, it is significant that Schizochytrium produces both DHA (C22:6, ω-3) and DPA (C22:5, ω-6). In the PUFA-PKS system the cis double bonds are introduced during synthesis of the growing carbon chain. Since placement of the ω-3 and ω-6 double bonds occurs early in the synthesis of the molecules, one would not expect that they would affect subsequent end-product chain length determination. Thus, without being bound by theory, the present inventors believe that introduction of a factor (e.g. CLF) that directs synthesis of C20 units (instead of C22 units) into the Schizochytrium PUFA-PKS system will result in the production of EPA (C20:5, ω-3) and ARA (C20:4, ω-6). For example, in heterologous systems, one could exploit the CLF by directly substituting a CLF from an EPA producing system (such as one from Photobacterium) into the Schizochytrium gene set. The fatty acids of the resulting transformants can then be analyzed for alterations in profiles to identify the transformants producing EPA and/or ARA.

In addition to dependence on development of a heterologous system (recombinant system, such as could be introduced into plants), the CLF concept can be exploited in Schizochytrium (i.e., by modification of a Schizochytrium genome). Transformation and homologous recombination have been demonstrated in Schizochytrium. One can exploit this by constructing a clone with the CLF of OrfB replaced with a CLF from a C20 PUFA-PKS system. A marker gene will be inserted downstream of the coding region. One can then transform the wild type cells, select for the marker phenotype and then screen for those that had incorporated the new CLF. Again, one would analyze these for any effects on fatty acid profiles to identify transformants producing EPA and/or ARA. If some factor other than those associated with the CLF is found to influence the chain length of the end product, a similar strategy could be employed to alter those factors.

Another preferred embodiment involving alteration of the PUFA-PKS products involves modification or substitution of the β-hydroxy acyl-ACP dehydrase/keto synthase pairs. During cis-vaccenic acid (C18:1, Δ11) synthesis in E. coli, creation of the cis double bond is believed to depend on a specific DH enzyme, β-hydroxy acyl-ACP dehydrase, the product of the FabA gene. This enzyme removes HOH from a β-keto acyl-ACP and leaves a trans double bond in the carbon chain. A subset of DH's, FabA-like, possess cis-trans isomerase activity (Heath et al., 1996, supra). A novel aspect of bacterial and non-bacterial PUFA-PKS systems is the presence of two FabA-like DH domains. Without being bound by theory, the present inventors believe that one or both of these DH domains will possess cis-trans isomerase activity (manipulation of the DH domains is discussed in greater detail below).

Another aspect of the unsaturated fatty acid synthesis in E. coli is the requirement for a particular KS enzyme, β-ketoacyl-ACP synthase, the product of the FabB gene. This is the enzyme that carries out condensation of a fatty acid, linked to a cysteine residue at the active site (by a thio-ester bond), with a malonyl-ACP. In the multi-step reaction, CO2 is released and the linear chain is extended by two carbons. It is believed that only this KS can extend a carbon chain that contains a double bond. This extension occurs only when the double bond is in the cis configuration; if it is in the trans configuration, the double bond is reduced by enoyl-ACP reductase (ER) prior to elongation (Heath et al., 1996, supra). All of the PUFA-PKS systems characterized so far have two KS domains, one of which shows greater homology to the FabB-like KS of E. coli than the other. Again, without being bound by theory, the present inventors believe that in PUFA-PKS systems, the specificities and interactions of the DH (FabA-like) and KS (FabB-like) enzymatic domains determine the number and placement of cis double bonds in the end products. Because the number of 2-carbon elongation reactions is greater than the number of double bonds present in the PUFA-PKS end products, it can be determined that in some extension cycles complete reduction occurs. Thus the DH and KS domains can be used as targets for alteration of the DHA/DPA ratio or ratios of other long chain fatty acids. These can be modified and/or evaluated by introduction of homologous domains from other systems or by mutagenesis of these gene fragments.

In another embodiment, the ER (enoyl-ACP reductase, an enzyme which reduces the trans double bond in the fatty acyl-ACP, resulting in fully saturated carbons) domains can be modified or substituted to change the type of product made by the PKS system. For example, the present inventors know that the Schizochytrium PUFA-PKS system differs from the previously described bacterial systems in that it has two (rather than one) ER domains. Without being bound by theory, the present inventors believe these ER domains can strongly influence the resulting PKS production product. The resulting PKS product could be changed by separately knocking out the individual domains or by modifying their nucleotide sequence or by substitution of ER domains from other organisms.

In another embodiment, nucleic acid molecules encoding proteins or domains that are not part of a PKS system, but which affect a PKS system, can be introduced into an organism. For example, all of the PUFA PKS systems described above contain multiple, tandem, ACP domains. ACP (as a separate protein or as a domain of a larger protein) requires attachment of a phosphopantetheine cofactor to produce the active, holo-ACP. Attachment of phosphopantetheine to the apo-ACP is carried out by members of the superfamily of enzymes, the phosphopantetheinyl transferases (PPTases) (Lambalot R. H., et al., Chemistry and Biology, 3, 923 (1996)).

By analogy to other PKS and FAS systems, the present inventors presume that activation of the multiple ACP domains present in the Schizochytrium ORFA protein is carried out by a specific, endogenous, PPTase. The gene encoding this presumed PPTase has not yet been identified in Schizochytrium. If such a gene is present in Schizochytrium, one can envision several approaches that could be used in an attempt to identify and clone it. These could include (but would not be limited to): generation and partial sequencing of a cDNA library prepared from actively growing Schizochytrium cells (note, one sequence was identified in the currently available Schizochytrium cDNA library set which showed homology to PPTases; however, it appears to be part of a multidomain FAS protein, and as such may not encode the desired OrfA specific PPTase); use of degenerate oligonucleotide primers designed using amino acid motifs present in many PPTases in PCR reactions (to obtain a nucleic acid probe molecule to screen genomic or cDNA libraries); genetic approaches based on protein-protein interactions (e.g. a yeast two-hybrid system) in which the ORFA-ACP domains would be used as a "bait" to find a "target" (i.e. the PPTase); and purification and partial sequencing of the enzyme itself as a means to generate a nucleic acid probe for screening of genomic or cDNA libraries.

It is also conceivable that a heterologous PPTase may be capable of activating the Schizochytrium ORFA ACP domains. It has been shown that some PPTases, for example the sfp enzyme of Bacillus subtilis (Lambalot et al., supra) and the svp enzyme of Streptomyces verticillus (Sanchez et al., 2001, Chemistry & Biology 8:725-738), have a broad substrate tolerance. These enzymes can be tested to see if they will activate the Schizochytrium ACP domains. Also, a recent publication described the expression of a fungal PKS protein in tobacco (Yalpani et al., 2001, The Plant Cell 13:1401-1409). Products of the introduced PKS system (encoded by the 6-methylsalicylic acid synthase gene of Penicillium patulum) were detected in the transgenic plant, even though the corresponding fungal PPTase was not present in those plants. This suggested that an endogenous plant PPTase(s) recognized and activated the fungal PKS ACP domain. Of relevance to this observation, the present inventors have identified two sequences (genes) in the Arabidopsis whole genome database that are likely to encode PPTases. These sequences (GenBank Accession numbers: AAG51443 and AAC05345) are currently listed as encoding "Unknown Proteins". They can be identified as putative PPTases based on the presence in the translated protein sequences of several signature motifs including: G(I/V)D and WXXKE(A/S)XXK (SEQ ID NO:33), (listed in Lambalot et al., 1996 as characteristic of all PPTases). In addition, these two putative proteins contain two additional motifs typically found in PPTases typically associated with PKS and non-ribosomal peptide synthesis systems; i.e., FN(I/L/V)SHS (SEQ ID NO:34) and (I/V/L)G(I/L/V)D(I/L/V) (SEQ ID NO:35). Furthermore, these motifs occur in the expected relative positions in the protein sequences. It is likely that homologues of the Arabidopsis genes are present in other plants, such as tobacco. Again, these genes can be cloned and expressed to see if the enzymes they encode can activate the Schizochytrium ORFA ACP domains, or alternatively, OrfA could be expressed directly in the transgenic plant (either targeted to the plastid or the cytoplasm).

Another heterologous PPTase which may recognize the ORFA ACP domains as substrates is the Het I protein of Nostoc sp. PCC 7120 (formerly called Anabaena sp. PCC 7120). As noted in U.S. Pat. No. 6,140,486, several of the PUFA-PKS genes of Shewanella showed a high degree of homology to protein domains present in a PKS cluster found in Nostoc (FIG. 2 of that patent). This Nostoc PKS system is associated with the synthesis of long chain (C26 or C28) hydroxy fatty acids that become esterified to sugar moieties and form a part of the heterocyst. These Nostoc PKS domains are also highly homologous to the domains found in Orfs B and C of the Schizochytrium PKS proteins (i.e. the same ones that correspond to those found in the Shewanella PKS proteins). Until very recently, none of the Nostoc PKS domains present in the GenBank databases showed high homology to any of the domains of Schizochytrium OrfA (or the homologous Shewanella Orf5 protein). However, the complete genome of Nostoc has recently been sequenced and as a result, the sequence of the region just upstream of the PKS gene cluster is now available. In this region are three Orfs that show homology to the domains (KS, MAT, ACP and KR) of OrfA (see FIG. 3). Included in this set are two ACP domains, both of which show high homology to the ORFA ACP domains. At the end of the Nostoc PKS cluster is the gene that encodes the Het I PPTase. Previously, it was not obvious what the substrate of the Het I enzyme could be; however, the presence of tandem ACP domains in the newly identified Orf (Hgl E) of the cluster strongly suggests to the present inventors that it is those ACPs. The homology of the ACP domains of Schizochytrium and Nostoc, as well as the tandem arrangement of the domains in both proteins, makes Het I a likely candidate for heterologous activation of the Schizochytrium ORFA ACPs. The present inventors are believed to be the first to recognize and contemplate this use for Nostoc Het I PPTase.

As indicated in Metz et al., 2001, supra, one novel feature of the PUFA PKS systems is the presence of two dehydratase domains, both of which show homology to the FabA proteins of E. coli. With the availability of the new Nostoc PKS gene sequences mentioned above, one can now compare the two systems and their products. The sequence of domains in the Nostoc cluster (from HglE to Het I) as the present inventors have defined them is (see FIG. 3):

KS-MAT-2xACP, KR, KS, CLF-AT, ER (HetM, HetN) HetI

In the Schizochytrium PUFA-PKS Orfs A, B & C the sequence (OrfA-B-C) is:

KS-MAT-9xACP-KR KS-CLF-AT-ER DH-DH-ER

One can see the correspondence of the domains sequence (there is also a high amino acid sequence homology). The product of the Nostoc PKS system is a long chain hydroxy fatty acid (C26 or C28 with one or two hydroxy groups) that contains no double bonds (cis or trans). The product of the Schizochytrium PKS system is a long chain polyunsaturated fatty acid (C22, with 5 or 6 double bonds, all cis). An obvious difference between the two domain sets is the presence of the two DH domains in the Schizochytrium proteins: just the domains implicated in the formation of the cis double bonds of DHA and DPA (presumably HetM and HetN in the Nostoc system are involved in inclusion of the hydroxyl groups and also contain a DH domain whose origin differs from those found in the PUFA).

Finally, one could extend the approach by mixing domains from the PUFA-PKS system and other PKS or FAS systems (e.g., type I iterative (fungal-like), type II, type I modular) to create an entire range of new end products. For example, one could introduce the PUFA-PKS DH domains into systems that do not normally incorporate cis double bonds into their end products.

Accordingly, encompassed by the present invention are methods to genetically modify microbial or plant cells by: genetically modifying at least one nucleic acid sequence in the organism that encodes an amino acid sequence having the biological activity of at least one functional domain of a non-bacterial PUFA PKS system according to the present invention, and/or expressing at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding such amino acid sequence. Various embodiments of such sequences, methods to genetically modify an organism, and specific modifications have been described in detail above. Typically, the method is used to produce a particular genetically modified organism that produces a particular bioactive molecule or molecules.

One embodiment of the present invention relates to a recombinant host cell which has been modified to express a
Also, the polyunsaturated fatty acid (PUFA) polyketide synthase role of the duplicated ER domain in the Schizochytrium Orfs 25 (PKS) system, wherein the PKS catalyzes both iterative and B and C is not known (the second ER domain in is not non-iterative enzymatic reactions, and wherein the PUFA present other characterized PUFA PKS systems). The amino PKS system comprises: (a) at least two enoyl ACP-reductase acid sequence homology between the two sets of domains (ER) domains; (b) at least six acyl carrier protein (ACP) implies an evolutionary relationship. One can conceive of domains; (c) at least two B-keto acyl-ACP synthase (KS) the PUFA PKS gene set being derived from (in an evolu 30 domains; (d) at least one acyltransferase (AT) domain; (e) at tionary sense) an ancestral Nostoc-like PKS gene set by least one ketoreductase (KR) domain; (f) at least two FabA incorporation of the DH (FabA-like) domains. The addition like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at of the DH domains would result in the introduction of cis least one chain length factor (CLF) domain; and (h) at least double bonds in the new PKS end product structure. one malonyl-CoA:ACP acyltransferase (MAT) domain. In The comparisons of the Schizochytrium and Nostoc PKS 35 one embodiment, the PUFA PKS system is a eukaryotic domain structures as well as the comparison of the domain PUFA PKS system. In a preferred embodiment, the PUFA organization between the Schizochytrium and Shewanella PKS system is an algal PUFA PKS system. In a more PUFA-PKS proteins demonstrate nature’s ability to alter preferred embodiment, the PUFA PKS system is a Thraus domain order as well as incorporate new domains to create tochytriales PUFA PKS system. Such PUFA PKS systems novel end products. In addition, the genes can now be 40 can include, but are not limited to, a Schizochytrium PUFA manipulated in the laboratory to create new products. The PKS system, and a Thraustochytrium PUFA PKS system. 
In implication from these observations is that it should be one embodiment, the PUFAPKS system can be expressed in possible to continue to manipulate the systems in either a a prokaryotic host cell. In another embodiment, the PUFA directed or random way to influence the end products. For PKS system can be expressed in a eukaryotic host cell. example, in a preferred embodiment, one could envision 45 Another embodiment of the present invention relates to a substituting one of the DH (FabA-like) domains of the recombinant host cell which has been modified to express a PUFA-PKS system for a DH domain that did not posses non-bacterial PUFA PKS system, wherein the PKS system isomerization activity, potentially creating a molecule with catalyzes both iterative and non-iterative enzymatic reac a mix of cis- and trans-double bonds. The current products tions, and wherein the non-bacterial PUFA PKS system of the Schizochytrium PUFA PKS system are DHA and DPA 50 comprises at least the following biologically active domains: (C22:5 (O6). If one manipulated the system to produce C20 (a) at least one enoyl ACP-reductase (ER) domain; (b) fatty acids, one would expect the products to be EPA and multiple acyl carrier protein (ACP) domains (at least four); ARA (C20:4 (O6). This could provide a new source for ARA. (c) at least two B-ketoacyl-ACP synthase (KS) domains; (d) One could also substitute domains from related PUFA-PKS at least one acyltransferase (AT) domain; (e) at least one systems that produced a different DHA to DPA ratio—for 55 ketoreductase (KR) domain; (f) at least two Fab A-like example by using genes from Thraustochytrium 23B (the B-hydroxyacyl-ACP dehydrase (DH) domains; (g) at least PUFA PKS system of which is identified for the first time one chain length factor (CLF) domain; and (h) at least one herein). malonyl-CoA:ACP acyltransferase (MAT) domain. 
Additionally, one could envision specifically altering one One aspect of this embodiment of the invention relates to of the ER domains (e.g. removing, or inactivating) in the 60 a method to produce a product containing at least one PUFA, Schizochytrium PUFA PKS system (other PUFA PKS sys comprising growing a plant comprising any of the recom tems described so far do not have two ER domains) to binant host cells described above, wherein the recombinant determine its effect on the end product profile. Similar host cell is a plant cell, under conditions effective to produce strategies could be attempted in a directed manner for each the product. Another aspect of this embodiment of the of the distinct domains of the PUFA-PKS proteins using 65 invention relates to a method to produce a product contain more or less Sophisticated approaches. Of course one would ing at least one PUFA, comprising culturing a culture not be limited to the manipulation of single domains. containing any of the recombinant host cells described US 7,256,023 B2 55 56 above, wherein the host cell is a microbial cell, under in that microorganism, in another microorganism, or in a conditions effective to produce the product. In a preferred higher plant, to produce a variety of compounds. embodiment, the PKS system in the host cell catalyzes the It is recognized that many genetic alterations, either direct production of triglycerides. random or directed, which one may introduce into a native Another embodiment of the present invention relates to a (endogenous, natural) PKS system, will result in an inacti microorganism comprising a non-bacterial, polyunsaturated vation of enzymatic functions. 
A preferred embodiment of fatty acid (PUFA) polyketide synthase (PKS) system, the invention includes a system to select for only those wherein the PKS catalyzes both iterative and non-iterative modifications that do not block the ability of the PKS system enzymatic reactions, and wherein the PUFA PKS system to produce a product. For example, the FabB-strain of E. coli comprises: (a) at least two enoyl ACP-reductase (ER) 10 is incapable of synthesizing unsaturated fatty acids and domains; (b) at least six acyl carrier protein (ACP) domains; requires Supplementation of the medium with fatty acids that (c) at least two B-ketoacyl-ACP synthase (KS) domains; (d) can Substitute for its normal unsaturated fatty acids in order at least one acyltransferase (AT) domain; (e) at least one to grow (see Metz et al., 2001, Supra). However, this ketoreductase (KR) domain; (f) at least two Fab A-like requirement (for Supplementation of the medium) can be B-hydroxyacyl-ACP dehydrase (DH) domains; (g) at least 15 removed when the strain is transformed with a functional one chain length factor (CLF) domain; and (h) at least one PUFA-PKS system (i.e. one that produces a PUFA product malonyl-CoA:ACP acyltransferase (MAT) domain. Prefer in the E. coli host—see (Metz et al., 2001, Supra, FIG. 2A). ably, the microorganism is a non-bacterial microorganism The transformed FabB-strain now requires a functional and more preferably, a eukaryotic microorganism. PUFA-PKS system (to produce the unsaturated fatty acids) Yet another embodiment of the present invention relates for growth without supplementation. The key element in this to a microorganism comprising a non-bacterial, polyunsatu example is that production of a wide range of unsaturated rated fatty acid (PUFA) polyketide synthase (PKS) system, fatty acid will suffice (even unsaturated fatty acid substitutes wherein the PKS catalyzes both iterative and non-iterative such as branched chain fatty acids). 
Therefore, in another enzymatic reactions, and wherein the PUFA PKS system preferred embodiment of the invention, one could create a comprises: (a) at least one enoyl ACP-reductase (ER) 25 large number of mutations in one or more of the PUFAPKS domain; (b) multiple acyl carrier protein (ACP) domains (at genes disclosed herein, and then transform the appropriately least four); (c) at least two f-ketoacyl-ACP synthase (KS) modified FabB-strain (e.g. create mutations in an expression domains; (d) at least one acyltransferase (AT) domain; (e) at construct containing an ER domain and transform a FabB least one ketoreductase (KR) domain; (f) at least two FabA strain having the other essential domains on a separate like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at 30 plasmid-or integrated into the chromosome) and select least one chain length factor (CLF) domain; and (h) at least only for those transformants that grow without Supplemen one malonyl-CoA:ACP acyltransferase (MAT) domain. tation of the medium (i.e., that still possessed an ability to In one embodiment of the present invention, it is con produce a molecule that could complement the FabB-de templated that a mutagenesis program could be combined fect). Additional screens could be developed to look for with a selective Screening process to obtain bioactive mol 35 particular compounds (e.g. use of GC for fatty acids) being ecules of interest. This would include methods to search for produced in this selective subset of an active PKS system. a range of bioactive compounds. This search would not be One could envision a number of similar selective screens for restricted to production of those molecules with cis double bioactive molecules of interest. bonds. 
The mutagenesis methods could include, but are not As described above, in one embodiment of the present limited to: chemical mutagenesis, gene shuffling, Switching 40 invention, a genetically modified microorganism or plant regions of the genes encoding specific enzymatic domains, includes a microorganism or plant which has an enhanced or mutagenesis restricted to specific regions of those genes, ability to synthesize desired bioactive molecules (products) as well as other methods. or which has a newly introduced ability to synthesize For example, high throughput mutagenesis methods could specific products (e.g., to synthesize a specific antibiotic). be used to influence or optimize production of the desired 45 According to the present invention, “an enhanced ability to bioactive molecule. Once an effective model system has synthesize' a product refers to any enhancement, or up been developed, one could modify these genes in a high regulation, in a pathway related to the synthesis of the throughput manner. Utilization of these technologies can be product such that the microorganism or plant produces an envisioned on two levels. First, if a sufficiently selective increased amount of the product (including any production screen for production of a product of interest (e.g., ARA) can 50 of a product where there was none before) as compared to be devised, it could be used to attempt to alter the system to the wild-type microorganism or plant, cultured or grown, produce this product (e.g., in lieu of or in concert with, other under the same conditions. Methods to produce Such geneti strategies such as those discussed above). Additionally, if the cally modified organisms have been described in detail strategies outlined above resulted in a set of genes that did above. produce the product of interest, the high throughput tech 55 One embodiment of the present invention is a method to nologies could then be used to optimize the system. 
For produce desired bioactive molecules (also referred to as example, if the introduced domain only functioned at rela products or compounds) by growing or culturing a geneti tively low temperatures, selection methods could be devised cally modified microorganism or plant of the present inven to permit removing that limitation. In one embodiment of the tion (described in detail above). Such a method includes the invention, Screening methods are used to identify additional 60 step of culturing in a fermentation medium or growing in a non-bacterial organisms having novel PKS systems similar Suitable environment, such as Soil, a microorganism or plant, to the PUFA PKS system of Schizochytrium, as described respectively, that has a genetic modification as described herein (see above). Homologous PKS systems identified in previously herein and in accordance with the present inven Such organisms can be used in methods similar to those tion. In a preferred embodiment, method to produce bioac described herein for the Schizochytrium, as well as for an 65 tive molecules of the present invention includes the step of additional source of genetic material from which to create, culturing under conditions effective to produce the bioactive further modify and/or mutate a PKS system for expression molecule a genetically modified organism that expresses a US 7,256,023 B2 57 58 PKS system comprising at least one biologically active Zation. Alternatively, microorganisms producing the desired domain of a polyunsaturated fatty acid (PUFA) polyketide compound, or extracts and various fractions thereof, can be synthase (PKS) system. In this preferred aspect, at least one used without removal of the microorganism components domain of the PUFA PKS system is encoded by a nucleic from the product. 
acid sequence selected from the group consisting of: (a) a In the method for production of desired bioactive com nucleic acid sequence encoding at least one domain of a pounds of the present invention, a genetically modified plant polyunsaturated fatty acid (PUFA) polyketide synthase is cultured in a fermentation medium or grown in a Suitable (PKS) system from a Thraustochytrid microorganism; (b) a medium such as soil. An appropriate, or effective, fermen nucleic acid sequence encoding at least one domain of a tation medium has been discussed in detail above. A suitable PUFA PKS system from a microorganism identified by the 10 growth medium for higher plants includes any growth novel screening method of the present invention (described medium for plants, including, but not limited to, Soil, sand, above in detail); (c) a nucleic acid sequence encoding an any other particulate media that Support root growth (e.g. amino acid sequence selected from the group consisting of Vermiculite, perlite, etc.) or Hydroponic culture, as well as SEQ ID NO:2, SEQID NO:4, SEQ ID NO:6, and biologi Suitable light, water and nutritional Supplements which cally active fragments thereof; (d) a nucleic acid sequence 15 optimize the growth of the higher plant. The genetically encoding an amino acid sequence selected from the group modified plants of the present invention are engineered to consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID produce significant quantities of the desired product through NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, the activity of the PKS system that is genetically modified SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID according to the present invention. The compounds can be NO:30, SEQ ID NO:32, and biologically active fragments recovered through purification processes which extract the thereof (e) a nucleic acid sequence encoding an amino acid compounds from the plant. 
In a preferred embodiment, the sequence that is at least about 60% identical to at least 500 compound is recovered by harvesting the plant. In this consecutive amino acids of an amino acid sequence selected embodiment, the plant can be consumed in its natural state from the group consisting of: SEQID NO:2, SEQID NO:4, or further processed into consumable products. and SEQ ID NO:6; wherein the amino acid sequence has a 25 As described above, a genetically modified microorgan biological activity of at least one domain of a PUFA PKS ism useful in the present invention can, in one aspect, system; and, (f) a nucleic acid sequence encoding an amino endogenously contain and express a PUFAPKS system, and acid sequence that is at least about 60% identical to an amino the genetic modification can be a genetic modification of one acid sequence selected from the group consisting of: SEQID or more of the functional domains of the endogenous PUFA NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, 30 PKS system, whereby the modification has some effect on SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID the activity of the PUFAPKS system. In another aspect, such NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID an organism can endogenously contain and express a PUFA NO:32; wherein the amino acid sequence has a biological PKS system, and the genetic modification can be an intro activity of at least one domain of a PUFA PKS system. In duction of at least one exogenous nucleic acid sequence this preferred aspect of the method, the organism is geneti 35 (e.g., a recombinant nucleic acid molecule), wherein the cally modified to affect the activity of the PKS system exogenous nucleic acid sequence encodes at least one bio (described in detail above). Preferred host cells for genetic logically active domain or protein from a second PKS modification related to the PUFA PKS system of the inven system and/or a protein that affects the activity of said PUFA tion are described above. 
PKS system (e.g., a phosphopantetheinyl transferases In the method of production of desired bioactive com 40 (PPTase), discussed below). In yet another aspect, the organ pounds of the present invention, a genetically modified ism does not necessarily endogenously (naturally) contain a microorganism is cultured or grown in a Suitable medium, PUFA PKS system, but is genetically modified to introduce under conditions effective to produce the bioactive com at least one recombinant nucleic acid molecule encoding an pound. An appropriate, or effective, medium refers to any amino acid sequence having the biological activity of at least medium in which a genetically modified microorganism of 45 one domain of a PUFA PKS system. In this aspect, PUFA the present invention, when cultured, is capable of produc PKS activity is affected by introducing or increasing PUFA ing the desired product. Such a medium is typically an PKS activity in the organism. Various embodiments associ aqueous medium comprising assimilable carbon, nitrogen ated with each of these aspects have been discussed in detail and phosphate sources. Such a medium can also include above. appropriate salts, minerals, metals and other nutrients. 50 In one embodiment of the method to produce bioactive Microorganisms of the present invention can be cultured in compounds, the genetic modification changes at least one conventional fermentation bioreactors. The microorganisms product produced by the endogenous PKS system, as com can be cultured by any fermentation process which includes, pared to a wild-type organism. but is not limited to, batch, fed-batch, cell recycle, and In another embodiment, the organism endogenously continuous fermentation. Preferred growth conditions for 55 expresses a PKS system comprising the at least one biologi potential host microorganisms according to the present cally active domain of the PUFA PKS system, and the invention are well known in the art. 
The desired bioactive genetic modification comprises transfection of the organism molecules produced by the genetically modified microor with a recombinant nucleic acid molecule selected from the ganism can be recovered from the fermentation medium group consisting of a recombinant nucleic acid molecule using conventional separation and purification techniques. 60 encoding at least one biologically active domain from a For example, the fermentation medium can be filtered or second PKS system and a recombinant nucleic acid mol centrifuged to remove microorganisms, cell debris and other ecule encoding a protein that affects the activity of the PUFA particulate matter, and the product can be recovered from the PKS system. In this embodiment, the genetic modification cell-free Supernatant by conventional methods, such as, for preferably changes at least one product produced by the example, ion exchange, chromatography, extraction, Solvent 65 endogenous PKS system, as compared to a wild-type organ extraction, membrane separation, electrodialysis, reverse ism. A second PKS system can include another PUFA PKS osmosis, distillation, chemical derivatization and crystalli system (bacterial or non-bacterial), a type I iterative (fungal US 7,256,023 B2 59 60 like) PKS system, a type II PKS system, and/or a type I having a biological activity of at least one functional domain modular PKS system. Examples of proteins that affect the of a non-bacterial PUFA PKS system as described herein. activity of a PKS system have been described above (e.g., Such bioactive molecules can include, but are not limited to: PPTase). 
a polyunsaturated fatty acid (PUFA), an anti-inflammatory In another embodiment, the organism is genetically modi- 5 formulation, a chemotherapeutic agent, an active excipient, fied by transfection with a recombinant nucleic acid mol an osteoporosis drug, an anti-depressant, an anti-convulsant, ecule encoding the at least one domain of the polyunsatu an anti-Heliobactor pylori drug, a drug for treatment of rated fatty acid (PUFA) polyketide synthase (PKS) system. neurodegenerative disease, a drug for treatment of degen Such recombinant nucleic acid molecules have been erative liver disease, an antibiotic, and a cholesterol lower described in detail previously herein. 10 ing formulation. One advantage of the non-bacterial PUFA In another embodiment, the organism endogenously PKS system of the present invention is the ability of such a expresses a non-bacterial PUFAPKS system, and the genetic system to introduce carbon-carbon double bonds in the cis modification comprises Substitution of a domain from a configuration, and molecules including a double bond at different PKS system for a nucleic acid sequence encoding every third carbon. This ability can be utilized to produce a at least one domain of the non-bacterial PUFA PKS system. 15 variety of compounds. In another embodiment, the organism endogenously Preferably, bioactive compounds of interest are produced expresses a non-bacterial PUFA PKS system that has been by the genetically modified microorganism in an amount modified by transfecting the organism with a recombinant that is greater than about 0.05%, and preferably greater than nucleic acid molecule encoding a protein that regulates the about 0.1%, and more preferably greater than about 0.25%, chain length of fatty acids produced by the PUFA PKS 20 and more preferably greater than about 0.5%, and more system. 
In one aspect, the recombinant nucleic acid mol preferably greater than about 0.75%, and more preferably ecule encoding a protein that regulates the chain length of greater than about 1%, and more preferably greater than fatty acids replaces a nucleic acid sequence encoding a chain about 2.5%, and more preferably greater than about 5%, and length factor in the non-bacterial PUFA PKS system. In more preferably greater than about 10%, and more prefer another aspect, the protein that regulates the chain length of 25 ably greater than about 15%, and even more preferably fatty acids produced by the PUFA PKS system is a chain greater than about 20% of the dry weight of the microor length factor. In another aspect, the protein that regulates the ganism. For lipid compounds, preferably, such compounds chain length of fatty acids produced by the PUFA PKS are produced in an amount that is greater than about 5% of system is a chain length factor that directs the synthesis of the dry weight of the microorganism. For other bioactive C20 units. 30 compounds, such as antibiotics or compounds that are In another embodiment, the organism expresses a non synthesized in Smaller amounts, those strains possessing bacterial PUFA PKS system comprising a genetic modifi such compounds at of the dry weight of the microorganism cation in a domain selected from the group consisting of a are identified as predictably containing a novel PKS system domain encoding B-hydroxyacyl-ACP dehydrase (DH) and of the type described above. In some embodiments, particu a domain encoding B-ketoacyl-ACP synthase (KS), wherein 35 lar bioactive molecules (compounds) are secreted by the the modification alters the ratio of long chain fatty acids microorganism, rather than accumulating. Therefore, Such produced by the PUFA PKS system as compared to in the bioactive molecules are generally recovered from the culture absence of the modification. 
In one aspect of this embodi medium and the concentration of molecule produced will ment, the modification is selected from the group consisting vary depending on the microorganism and the size of the of a deletion of all or a part of the domain, a substitution of 40 culture. a homologous domain from a different organism for the One embodiment of the present invention relates to a domain, and a mutation of the domain. method to modify an endproduct containing at least one fatty In another embodiment, the organism expresses a non acid, comprising adding to said endproduct an oil produced bacterial PUFA PKS system comprising a modification in an by a recombinant host cell that expresses at least one enoyl-ACP reductase (ER) domain, wherein the modifica- 45 recombinant nucleic acid molecule comprising a nucleic tion results in the production of a different compound as acid sequence encoding at least one biologically active compared to in the absence of the modification. In one domain of a PUFA PKS system. The PUFA PKS system is aspect of this embodiment, the modification is selected from any non-bacterial PUFA PKS system, and preferably, is the group consisting of a deletion of all or a part of the ER selected from the group of: (a) a nucleic acid sequence domain, a substitution of an ER domain from a different 50 encoding at least one domain of a polyunsaturated fatty acid organism for the ER domain, and a mutation of the ER (PUFA) polyketide synthase (PKS) system from a Thraus domain. tochytrid microorganism; (b) a nucleic acid sequence encod In one embodiment of the method to produce a bioactive ing at least one domain of a PUFA PKS system from a molecule, the organism produces a polyunsaturated fatty microorganism identified by the novel screening method acid (PUFA) profile that differs from the naturally occurring 55 disclosed herein; (c) a nucleic acid sequence encoding an organism without a genetic modification. 
Many other genetic modifications useful for producing bioactive molecules will be apparent to those of skill in the art, given the present disclosure, and various other modifications have been discussed previously herein. The present invention contemplates any genetic modification related to a PUFA PKS system as described herein which results in the production of a desired bioactive molecule.

Bioactive molecules, according to the present invention, include any molecules (compounds, products, etc.) that have a biological activity, and that can be produced by a PKS system that comprises at least one amino acid sequence

amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (d) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and, (f) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. Variations of these nucleic acid sequences have been described in detail above.

Preferably, the endproduct is selected from the group consisting of a food, a dietary supplement, a pharmaceutical formulation, a humanized animal milk, and an infant formula. Suitable pharmaceutical formulations include, but are not limited to, an anti-inflammatory formulation, a chemotherapeutic agent, an active excipient, an osteoporosis drug, an anti-depressant, an anti-convulsant, an anti-Helicobacter pylori drug, a drug for treatment of neurodegenerative disease, a drug for treatment of degenerative liver disease, an antibiotic, and a cholesterol lowering formulation. In one embodiment, the endproduct is used to treat a condition selected from the group consisting of: chronic inflammation, acute inflammation, gastrointestinal disorder, cancer, cachexia, cardiac restenosis, neurodegenerative disorder, degenerative disorder of the liver, blood lipid disorder, osteoporosis, osteoarthritis, autoimmune disease, preeclampsia, preterm birth, age related maculopathy, pulmonary disorder, and peroxisomal disorder.

Suitable food products include, but are not limited to, fine bakery wares, bread and rolls, breakfast cereals, processed and unprocessed cheese, condiments (ketchup, mayonnaise, etc.), dairy products (milk, yogurt), puddings and gelatine desserts, carbonated drinks, teas, powdered beverage mixes, processed fish products, fruit-based drinks, chewing gum, hard confectionery, frozen dairy products, processed meat products, nut and nut-based spreads, pasta, processed poultry products, gravies and sauces, potato chips and other chips or crisps, chocolate and other confectionery, soups and soup mixes, soya based products (milks, drinks, creams, whiteners), vegetable oil-based spreads, and vegetable based drinks.

Yet another embodiment of the present invention relates to a method to produce a humanized animal milk. This method includes the steps of genetically modifying milk producing cells of a milk-producing animal with at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system. The PUFA PKS system is a non-bacterial PUFA PKS system, and preferably, the at least one domain of the PUFA PKS system is encoded by a nucleic acid sequence selected from the group consisting of: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by the novel screening method described previously herein; (c) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (d) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and/or (f) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.

Methods to genetically modify a host cell and to produce a genetically modified non-human, milk-producing animal, are known in the art. Examples of host animals to modify include cattle, sheep, pigs, goats, yaks, etc., which are amenable to genetic manipulation and cloning for rapid expansion of a transgene expressing population. For animals, PKS-like transgenes can be adapted for expression in target organelles, tissues and body fluids through modification of the gene regulatory regions. Of particular interest is the production of PUFAs in the breast milk of the host animal.

The following examples are provided for the purpose of illustration and are not intended to limit the scope of the present invention.

EXAMPLES

Example 1

The following example describes the further analysis of PKS related sequences from Schizochytrium.

The present inventors have sequenced the genomic DNA including the entire length of all three open reading frames (Orfs) in the Schizochytrium PUFA PKS system using the general methods outlined in Examples 8 and 9 from PCT Publication No. WO 0042195 and U.S. application Ser. No. 09/231,899. The biologically active domains in the Schizochytrium PKS proteins are depicted graphically in FIG. 1. The domain structure of the Schizochytrium PUFA PKS system is described more particularly as follows.

Open Reading Frame A (OrfA):

The complete nucleotide sequence for OrfA is represented herein as SEQ ID NO:1. OrfA is an 8730 nucleotide sequence (not including the stop codon) which encodes a 2910 amino acid sequence, represented herein as SEQ ID NO:2. Within OrfA are twelve domains:
(a) one β-keto acyl-ACP synthase (KS) domain;
(b) one malonyl-CoA:ACP acyltransferase (MAT) domain;
(c) nine acyl carrier protein (ACP) domains;
(d) one ketoreductase (KR) domain.

The domains contained within OrfA have been determined based on:
(1) results of an analysis with Pfam program (Pfam is a database of multiple alignments of protein domains or conserved protein regions. The alignments represent some evolutionary conserved structure that has implications for the protein's function. Profile hidden Markov models (profile HMMs) built from the Pfam alignments can be very useful for automatically recognizing that a new protein belongs to an existing protein family, even if the homology is weak. Unlike standard pairwise alignment methods (e.g. BLAST, FASTA), Pfam HMMs deal sensibly with multidomain proteins. The reference provided for the Pfam version used is: Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy S R, Griffiths-Jones S, Howe K L, Marshall M, Sonnhammer E L (2002) Nucleic Acids Research 30(1):276-280); and/or
(2) homology comparison to bacterial PUFA-PKS systems (e.g., Shewanella) using a BLAST 2.0 Basic BLAST homology search using blastp for amino acid searches with standard default parameters, wherein the query sequence is filtered for low complexity regions by default (described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402, incorporated herein by reference in its entirety).

Sequences provided for individual domains are believed to contain the full length of the sequence encoding a functional domain, and may contain additional flanking sequence within the Orf.

ORFA-KS

The first domain in OrfA is a KS domain, also referred to herein as ORFA-KS. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1 and 40 of SEQ ID NO:1 (OrfA) to an ending point of between about positions 1428 and 1500 of SEQ ID NO:1. The nucleotide sequence containing the sequence encoding the ORFA-KS domain is represented herein as SEQ ID NO:7 (positions 1-1500 of SEQ ID NO:1). The amino acid sequence containing the KS domain spans from a starting point of between about positions 1 and 14 of SEQ ID NO:2 (ORFA) to an ending point of between about positions 476 and 500 of SEQ ID NO:2. The amino acid sequence containing the ORFA-KS domain is represented herein as SEQ ID NO:8 (positions 1-500 of SEQ ID NO:2). It is noted that the ORFA-KS domain contains an active site motif: DXAC* (*acyl binding site C).

ORFA-MAT

The second domain in OrfA is a MAT domain, also referred to herein as ORFA-MAT. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1723 and 1798 of SEQ ID NO:1 (OrfA) to an ending point of between about positions 2805 and 3000 of SEQ ID NO:1. The nucleotide sequence containing the sequence encoding the ORFA-MAT domain is represented herein as SEQ ID NO:9 (positions 1723-3000 of SEQ ID NO:1). The amino acid sequence containing the MAT domain spans from a starting point of between about positions 575 and 600 of SEQ ID NO:2 (ORFA) to an ending point of between about positions 935 and 1000 of SEQ ID NO:2. The amino acid sequence containing the ORFA-MAT domain is represented herein as SEQ ID NO:10 (positions 575-1000 of SEQ ID NO:2). It is noted that the ORFA-MAT domain contains an active site motif: GHS*XG (*acyl binding site S), represented herein as SEQ ID NO:11.

ORFA-ACP#1-9

Domains 3-11 of OrfA are nine tandem ACP domains, also referred to herein as ORFA-ACP (the first domain in the sequence is ORFA-ACP1, the second domain is ORFA-ACP2, the third domain is ORFA-ACP3, etc.). The first ACP domain, ORFA-ACP1, is contained within the nucleotide sequence spanning from about position 3343 to about position 3600 of SEQ ID NO:1 (OrfA). The nucleotide sequence containing the sequence encoding the ORFA-ACP1 domain is represented herein as SEQ ID NO:12 (positions 3343-3600 of SEQ ID NO:1). The amino acid sequence containing the first ACP domain spans from about position 1115 to about position 1200 of SEQ ID NO:2. The amino acid sequence containing the ORFA-ACP1 domain is represented herein as SEQ ID NO:13 (positions 1115-1200 of SEQ ID NO:2). It is noted that the ORFA-ACP1 domain contains an active site motif: LGIDS* (*pantetheine binding motif S), represented herein by SEQ ID NO:14. The nucleotide and amino acid sequences of all nine ACP domains are highly conserved and therefore, the sequence for each domain is not represented herein by an individual sequence identifier. However, based on this information, one of skill in the art can readily determine the sequence for each of the other eight ACP domains. The repeat interval for the nine domains is approximately about 110 to about 330 nucleotides of SEQ ID NO:1.

All nine ACP domains together span a region of OrfA of from about position 3283 to about position 6288 of SEQ ID NO:1, which corresponds to amino acid positions of from about 1095 to about 2096 of SEQ ID NO:2. This region includes the linker segments between individual ACP domains. Each of the nine ACP domains contains a pantetheine binding motif LGIDS* (represented herein by SEQ ID NO:14), wherein * is the pantetheine binding site S. At each end of the ACP domain region and between each ACP domain is a region that is highly enriched for proline (P) and alanine (A), which is believed to be a linker region. For example, between ACP domains 1 and 2 is the sequence: APAPVKAAAPAAPVASAPAPA, represented herein as SEQ ID NO:15.

ORFA-KR

Domain 12 in OrfA is a KR domain, also referred to herein as ORFA-KR. This domain is contained within the nucleotide sequence spanning from a starting point of about position 6598 of SEQ ID NO:1 to an ending point of about position 8730 of SEQ ID NO:1. The nucleotide sequence containing the sequence encoding the ORFA-KR domain is represented herein as SEQ ID NO:17 (positions 6598-8730 of SEQ ID NO:1). The amino acid sequence containing the KR domain spans from a starting point of about position 2200 of SEQ ID NO:2 (ORFA) to an ending point of about position 2910 of SEQ ID NO:2. The amino acid sequence containing the ORFA-KR domain is represented herein as SEQ ID NO:18 (positions 2200-2910 of SEQ ID NO:2). Within the KR domain is a core region with homology to short chain aldehyde-dehydrogenases (KR is a member of this family). This core region spans from about position 7198 to about position 7500 of SEQ ID NO:1, which corresponds to amino acid positions 2400-2500 of SEQ ID NO:2.

Open Reading Frame B (OrfB):

The complete nucleotide sequence for OrfB is represented herein as SEQ ID NO:3. OrfB is a 6177 nucleotide sequence (not including the stop codon) which encodes a 2059 amino acid sequence, represented herein as SEQ ID NO:4. Within OrfB are four domains:
(a) β-ketoacyl-ACP synthase (KS) domain;
(b) one chain length factor (CLF) domain;
(c) one acyl transferase (AT) domain;
(d) one enoyl ACP-reductase (ER) domain.

The domains contained within ORFB have been determined based on: (1) results of an analysis with Pfam program, described above; and/or (2) homology comparison to bacterial PUFA-PKS systems (e.g., Shewanella) using a BLAST 2.0 Basic BLAST homology search, also described above. Sequences provided for individual domains are believed to contain the full length of the sequence encoding a functional domain, and may contain additional flanking sequence within the Orf.

ORFB-KS

The first domain in OrfB is a KS domain, also referred to herein as ORFB-KS. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1 and 43 of SEQ ID NO:3 (OrfB) to an ending point of between about positions 1332 and 1350 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-KS domain is represented herein as SEQ ID NO:19 (positions 1-1350 of SEQ ID NO:3). The amino acid sequence containing the KS domain spans from a starting point of between about positions 1 and 15 of SEQ ID NO:4 (ORFB) to an ending point of between about positions 444 and 450 of SEQ ID NO:4. The amino acid sequence containing the ORFB-KS domain is represented herein as SEQ ID NO:20 (positions 1-450 of SEQ ID NO:4). It is noted that the ORFB-KS domain contains an active site motif: DXAC* (*acyl binding site C).

ORFB-CLF

The second domain in OrfB is a CLF domain, also referred to herein as ORFB-CLF. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1378 and 1402 of SEQ ID NO:3 (OrfB) to an ending point of between about positions 2682 and 2700 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-CLF domain is represented herein as SEQ ID NO:21 (positions 1378-2700 of SEQ ID NO:3). The amino acid sequence containing the CLF domain spans from a starting point of between about positions 460 and 468 of SEQ ID NO:4 (ORFB) to an ending point of between about positions 894 and 900 of SEQ ID NO:4. The amino acid sequence containing the ORFB-CLF domain is represented herein as SEQ ID NO:22 (positions 460-900 of SEQ ID NO:4). It is noted that the ORFB-CLF domain contains a KS active site motif without the acyl binding cysteine.

ORFB-AT

The third domain in OrfB is an AT domain, also referred to herein as ORFB-AT. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 2701 and 3598 of SEQ ID NO:3 (OrfB) to an ending point of between about positions 3975 and 4200 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-AT domain is represented herein as SEQ ID NO:23 (positions 2701-4200 of SEQ ID NO:3). The amino acid sequence containing the AT domain spans from a starting point of between about positions 901 and 1200 of SEQ ID NO:4 (ORFB) to an ending point of between about positions 1325 and 1400 of SEQ ID NO:4. The amino acid sequence containing the ORFB-AT domain is represented herein as SEQ ID NO:24 (positions 901-1400 of SEQ ID NO:4). It is noted that the ORFB-AT domain contains an AT active site motif of GXS*XG (*acyl binding site S).

ORFB-ER

The fourth domain in OrfB is an ER domain, also referred to herein as ORFB-ER. This domain is contained within the nucleotide sequence spanning from a starting point of about position 4648 of SEQ ID NO:3 (OrfB) to an ending point of about position 6177 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-ER domain is represented herein as SEQ ID NO:25 (positions 4648-6177 of SEQ ID NO:3). The amino acid sequence containing the ER domain spans from a starting point of about position 1550 of SEQ ID NO:4 (ORFB) to an ending point of about position 2059 of SEQ ID NO:4. The amino acid sequence containing the ORFB-ER domain is represented herein as SEQ ID NO:26 (positions 1550-2059 of SEQ ID NO:4).

Open Reading Frame C (OrfC):

The complete nucleotide sequence for OrfC is represented herein as SEQ ID NO:5. OrfC is a 4509 nucleotide sequence (not including the stop codon) which encodes a 1503 amino acid sequence, represented herein as SEQ ID NO:6. Within OrfC are three domains:
(a) two FabA-like β-hydroxy acyl-ACP dehydrase (DH) domains;
(b) one enoyl ACP-reductase (ER) domain.

The domains contained within ORFC have been determined based on: (1) results of an analysis with Pfam program, described above; and/or (2) homology comparison to bacterial PUFA-PKS systems (e.g., Shewanella) using a BLAST 2.0 Basic BLAST homology search, also described above. Sequences provided for individual domains are believed to contain the full length of the sequence encoding a functional domain, and may contain additional flanking sequence within the Orf.

ORFC-DH1

The first domain in OrfC is a DH domain, also referred to herein as ORFC-DH1. This is one of two DH domains in OrfC, and therefore is designated DH1. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1 and 778 of SEQ ID NO:5 (OrfC) to an ending point of between about positions 1233 and 1350 of SEQ ID NO:5. The nucleotide sequence containing the sequence encoding the ORFC-DH1 domain is represented herein as SEQ ID NO:27 (positions 1-1350 of SEQ ID NO:5). The amino acid sequence containing the DH1 domain spans from a starting point of between about positions 1 and 260 of SEQ ID NO:6 (ORFC) to an ending point of between about positions 411 and 450 of SEQ ID NO:6. The amino acid sequence containing the ORFC-DH1 domain is represented herein as SEQ ID NO:28 (positions 1-450 of SEQ ID NO:6).

ORFC-DH2

The second domain in OrfC is a DH domain, also referred to herein as ORFC-DH2. This is the second of two DH domains in OrfC, and therefore is designated DH2. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1351 and 2437 of SEQ ID NO:5 (OrfC) to an ending point of between about positions 2607 and 2850 of SEQ ID NO:5. The nucleotide sequence containing the sequence encoding the ORFC-DH2 domain is represented herein as SEQ ID NO:29 (positions 1351-2850 of SEQ ID NO:5). The amino acid sequence containing the DH2 domain spans from a starting point of between about positions 451 and 813 of SEQ ID NO:6 (ORFC) to an ending point of between about positions 869 and 950 of SEQ ID NO:6. The amino acid sequence containing the ORFC-DH2 domain is represented herein as SEQ ID NO:30 (positions 451-950 of SEQ ID NO:6).

ORFC-ER

The third domain in OrfC is an ER domain, also referred to herein as ORFC-ER. This domain is contained within the nucleotide sequence spanning from a starting point of about position 2998 of SEQ ID NO:5 (OrfC) to an ending point of about position 4509 of SEQ ID NO:5. The nucleotide sequence containing the sequence encoding the ORFC-ER domain is represented herein as SEQ ID NO:31 (positions 2998-4509 of SEQ ID NO:5). The amino acid sequence containing the ER domain spans from a starting point of about position 1000 of SEQ ID NO:6 (ORFC) to an ending point of about position 1502 of SEQ ID NO:6. The amino acid sequence containing the ORFC-ER domain is represented herein as SEQ ID NO:32 (positions 1000-1502 of SEQ ID NO:6).

Example 2

The following example describes the use of the screening process of the present invention to identify three other non-bacterial organisms comprising a PUFA PKS system according to the present invention.

Thraustochytrium sp. 23B (ATCC 20892) was cultured according to the screening method described in U.S. Provisional Application Ser. No. 60/298,796 and as described in detail herein.

The biorational screen (using shake flask cultures) developed for detecting microorganisms containing PUFA producing PKS systems is as follows:

Two mL of a culture of the strain/microorganism to be tested is placed in a 250 mL baffled shake flask with 50 mL culture media (aerobic treatment) and another 2 mL of culture of the same strain is placed in a 250 mL non-baffled shake flask with 200 mL culture medium (anoxic treatment). Both flasks are placed on a shaker table at 200 rpm. After 48-72 hr of culture time, the cultures are harvested by centrifugation and the cells analyzed for fatty acid methyl esters via gas chromatography to determine the following data for each culture: (1) fatty acid profile; (2) PUFA content; (3) fat content (estimated as amount total fatty acids (TFA)).

These data are then analyzed asking the following five questions:

Selection Criteria: Low O2/Anoxic Flask vs. Aerobic Flask (Yes/No)

(1) Did the DHA (or other PUFA content) (as % FAME) stay about the same or preferably increase in the low oxygen culture compared to the aerobic culture?
(2) Is C14:0+C16:0+C16:1 greater than about 40% TFA in the anoxic culture?
(3) Is there very little (<1% as FAME) or no precursors (C18:3n-3+C18:2n-6+C18:3n-6) to the conventional oxygen dependent elongase/desaturase pathway in the anoxic culture?
(4) Did fat content (as amount total fatty acids/cell dry weight) increase in the low oxygen culture compared to the aerobic culture?
(5) Did DHA (or other PUFA content) increase as % cell dry weight in the low oxygen culture compared to the aerobic culture?

If the first three questions are answered yes, there is a good indication that the strain contains a PKS genetic system for making long chain PUFAs. The more questions that are answered yes (preferably the first three questions must be answered yes), the stronger the indication that the strain contains such a PKS genetic system. If all five questions are answered yes, then there is a very strong indication that the strain contains a PKS genetic system for making long chain PUFAs.

Following the method outlined above, a frozen vial of Thraustochytrium sp. 23B (ATCC 20892) was used to inoculate a 250 mL shake flask containing 50 mL of RCA medium. The culture was shaken on a shaker table (200 rpm) for 72 hr at 25° C. RCA medium contains the following:

RCA Medium

Deionized water                  1000 mL
Reef Crystals® sea salts         40 g/L
Glucose                          20 g/L
Monosodium glutamate (MSG)       20 g/L
Yeast extract                    1 g/L
PII metals*                      5 mL/L
Vitamin mix                      1 mL/L
pH                               7.0

*PII metal mix and vitamin mix are same as those outlined in U.S. Pat. No. 5,130,242, incorporated herein by reference in its entirety.

25 mL of the 72 hr old culture was then used to inoculate another 250 mL shake flask containing 50 mL of low nitrogen RCA medium (10 g/L MSG instead of 20 g/L) and the other 25 mL of culture was used to inoculate a 250 mL shake flask containing 175 mL of low-nitrogen RCA medium. The two flasks were then placed on a shaker table (200 rpm) for 72 hr at 25° C. The cells were then harvested via centrifugation and dried by lyophilization. The dried cells were analyzed for fat content and fatty acid profile and content using standard gas chromatograph procedures (such as those outlined in U.S. Pat. No. 5,130,242).

The screening results for Thraustochytrium 23B were as follows:

Did DHA as % FAME increase?                           Yes (38->44%)
C14:0 + C16:0 + C16:1 greater than about 40% TFA?     Yes (44%)
No C18:3(n-3) or C18:3(n-6)?                          Yes (0%)
Did fat content increase?                             Yes (2-fold increase)
Did DHA (or other HUFA content) increase?             Yes (2.3-fold increase)

The results, especially the significant increase in DHA content (as % FAME) under low oxygen conditions, strongly indicate the presence of a PUFA producing PKS system in this strain of Thraustochytrium.

In order to provide additional data confirming the presence of a PUFA PKS system, a Southern blot of Thraustochytrium 23B was conducted using PKS probes from Schizochytrium strain 20888, a strain which has already been determined to contain a PUFA producing PKS system (i.e., SEQ ID NOs: 1-32 described above). Fragments of Thraustochytrium 23B genomic DNA which are homologous to hybridization probes from PKS PUFA synthesis genes were detected using the Southern blot technique. Thraustochytrium 23B genomic DNA was digested with either ClaI or KpnI restriction endonucleases, separated by agarose gel electrophoresis (0.7% agarose, in standard Tris-Acetate-EDTA buffer), and blotted to a Schleicher & Schuell Nytran Supercharge membrane by capillary transfer. Two digoxigenin labeled hybridization probes were used: one specific for the Enoyl Reductase (ER) region of Schizochytrium PKS OrfB (nucleotides 5012-5511 of OrfB; SEQ ID NO:3), and the other specific for a conserved region at the beginning of Schizochytrium PKS OrfC (nucleotides 76-549 of OrfC; SEQ ID NO:5).

The OrfB-ER probe detected an approximately 13 kb ClaI fragment and an approximately 3.6 kb KpnI fragment in the Thraustochytrium 23B genomic DNA. The OrfC probe detected an approximately 7.5 kb ClaI fragment and an approximately 4.6 kb KpnI fragment in the Thraustochytrium 23B genomic DNA.

Finally, a recombinant genomic library, consisting of DNA fragments from Thraustochytrium 23B genomic DNA inserted into vector lambda FIX II (Stratagene), was screened using digoxigenin labeled probes corresponding to the following segments of Schizochytrium 20888 PUFA PKS genes: nucleotides 7385-7879 of OrfA (SEQ ID NO:1), nucleotides 5012-5511 of OrfB (SEQ ID NO:3), and nucleotides 76-549 of OrfC (SEQ ID NO:5). Each of these probes detected positive plaques from the Thraustochytrium 23B library, indicating extensive homology between the Schizochytrium PUFA-PKS genes and the genes of Thraustochytrium 23B.

In summary, these results demonstrate that Thraustochytrium 23B genomic DNA contains sequences that are homologous to PKS genes from Schizochytrium 20888.

This Thraustochytrid microorganism is encompassed herein as an additional source of these genes for use in the embodiments above.

Thraustochytrium 23B (ATCC 20892) is significantly different from Schizochytrium sp. (ATCC 20888) in its fatty acid profile. Thraustochytrium 23B can have DHA:DPA(n-6) ratios as high as 14:1 compared to only 2-3:1 in Schizochytrium (ATCC 20888). Thraustochytrium 23B can also have higher levels of C20:5(n-3). Analysis of the domains in the PUFA PKS system of Thraustochytrium 23B in comparison to the known Schizochytrium PUFA PKS system should provide us with key information on how to modify these domains to influence the ratio and types of PUFA produced using these systems.

The screening method described above has been utilized to identify other potential candidate strains containing a PUFA PKS system. Two additional strains that have been identified by the present inventors to have PUFA PKS systems are Schizochytrium limacinum (SR21) Honda & Yokochi (IFO32693) and Ulkenia (BP-5601). Both were screened as above but in N2 media (glucose: 60 g/L; KHPO: 4.0 g/L; yeast extract: 1.0 g/L; corn steep liquor: 1 mL/L; NH4NO3: 1.0 g/L; artificial sea salts (Reef Crystals): 20 g/L; all above concentrations mixed in deionized water). For both the Schizochytrium and Ulkenia strains, the answers to the first three screen questions discussed above for Thraustochytrium 23B were yes (Schizochytrium: DHA % FAME 32->41% aerobic vs anoxic, 58% 14:0/16:0/16:1, 0% precursors) and (Ulkenia: DHA % FAME 28->44% aerobic vs anoxic, 63% 14:0/16:0/16:1, 0% precursors), indicating that these strains are good candidates for containing a PUFA PKS system. Negative answers were obtained for the final two questions for each strain: fat decreased from 61% dry weight to 22% dry weight, and DHA from 21% to 9% dry weight in S. limacinum and fat decreased from 59% to 21% dry weight in Ulkenia and DHA from 16% to 9% dry weight. These Thraustochytrid microorganisms are also claimed herein as additional sources of the genes for use in the embodiments above.

Example 3

The following example demonstrates that DHA and DPA synthesis in Schizochytrium does not involve membrane-bound desaturases or fatty acid elongation enzymes like those described for other eukaryotes (Parker-Barnes et al., 2000, supra; Shanklin et al., 1998, supra).

Schizochytrium accumulates large quantities of triacylglycerols rich in DHA and docosapentaenoic acid (DPA; 22:5ω6); e.g., 30% DHA+DPA by dry weight. In eukaryotes that synthesize 20- and 22-carbon PUFAs by an elongation/desaturation pathway, the pools of 18-, 20- and 22-carbon intermediates are relatively large so that in vivo labeling experiments using [14C]-acetate reveal clear precursor-product kinetics for the predicted intermediates. Furthermore, radiolabeled intermediates provided exogenously to such organisms are converted to the final PUFA products.

[1-14C]acetate was supplied to a 2-day-old culture as a single pulse at zero time. Samples of cells were then harvested by centrifugation and the lipids were extracted. In addition, [1-14C]acetate uptake by the cells was estimated by measuring the radioactivity of the sample before and after centrifugation. Fatty acid methyl esters derived from the total cell lipids were separated by AgNO3-TLC (solvent, hexane:diethyl ether:acetic acid, 70:30:2 by volume). The identity of the fatty acid bands was verified by gas chromatography, and the radioactivity in them was measured by scintillation counting. Results showed that [1-14C]-acetate was rapidly taken up by Schizochytrium cells and incorporated into fatty acids, but at the shortest labeling time (1 min) DHA contained 31% of the label recovered in fatty acids and this percentage remained essentially unchanged during the 10-15 min of [14C]-acetate incorporation and the subsequent 24 hours of culture growth (data not shown). Similarly, DPA represented 10% of the label throughout the experiment. There is no evidence for a precursor-product relationship between 16- or 18-carbon fatty acids and the 22-carbon polyunsaturated fatty acids. These results are consistent with rapid synthesis of DHA from [14C]-acetate involving very small (possibly enzyme-bound) pools of intermediates.

Next, cells were disrupted in 100 mM phosphate buffer (pH 7.2), containing 2 mM DTT, 2 mM EDTA, and 10% glycerol, by vortexing with glass beads. The cell-free homogenate was centrifuged at 100,000×g for 1 hour. Equivalent aliquots of total homogenate, pellet (H-S pellet), and supernatant (H-S super) fractions were incubated in homogenization buffer supplemented with 20 µM acetyl-CoA, 100 µM [1-14C]malonyl-CoA (0.9 GBq/mol), 2 mM NADH, and 2 mM NADPH for 60 min at 25° C. Assays were extracted and fatty acid methyl esters were prepared and separated as described above before detection of radioactivity with an Instantimager (Packard Instruments, Meriden, Conn.). Results showed that a cell-free homogenate derived from Schizochytrium cultures incorporated [1-14C]-malonyl-CoA into DHA, DPA, and saturated fatty acids (data not shown). The same biosynthetic activities were retained by a 100,000×g supernatant fraction but were not present in the membrane pellet. These data contrast with those obtained during assays of the bacterial enzymes (see Metz et al., 2001, supra) and may indicate use of a different (soluble) acyl acceptor molecule. Thus, DHA and DPA synthesis in Schizochytrium does not involve membrane-bound desaturases or fatty acid elongation enzymes like those described for other eukaryotes.

While various embodiments of the present invention have been described in detail, it is apparent that modifications and adaptations of those embodiments will occur to those skilled in the art. It is to be expressly understood, however, that such modifications and adaptations are within the scope of the present invention, as set forth in the following claims.
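The nucleotide and amino acid positions recited for the domains in Example 1 can be cross-checked with simple codon arithmetic. The following is a minimal sketch, assuming 1-based inclusive positions and a reading frame that begins at nucleotide 1 of each Orf. The helper names (`nt_to_aa`, `mismatches`) and the table layout are hypothetical and introduced only for illustration; the position values are those recited in Example 1.

```python
# Illustrative sketch only; helper names and table layout are hypothetical.
# Assumes 1-based inclusive positions and a reading frame starting at nucleotide 1.

def nt_to_aa(nt_pos: int) -> int:
    """Map a 1-based nucleotide position to the 1-based index of its codon."""
    return (nt_pos - 1) // 3 + 1

# name: ((nt_start, nt_end), (aa_start, aa_end)) as recited in Example 1
RECITED = {
    "ORFA-KS": ((1, 1500), (1, 500)),
    "ORFA-MAT": ((1723, 3000), (575, 1000)),
    "ORFA-ACP1": ((3343, 3600), (1115, 1200)),
    "ORFA-ACP region": ((3283, 6288), (1095, 2096)),
    "ORFA-KR": ((6598, 8730), (2200, 2910)),
    "ORFA-KR core": ((7198, 7500), (2400, 2500)),
    "ORFB-KS": ((1, 1350), (1, 450)),
    "ORFB-CLF": ((1378, 2700), (460, 900)),
    "ORFB-AT": ((2701, 4200), (901, 1400)),
    "ORFB-ER": ((4648, 6177), (1550, 2059)),
    "ORFC-DH1": ((1, 1350), (1, 450)),
    "ORFC-DH2": ((1351, 2850), (451, 950)),
    "ORFC-ER": ((2998, 4509), (1000, 1502)),
}

def mismatches(table):
    """Return (name, computed_aa_range, recited_aa_range) where the two differ."""
    found = []
    for name, ((nt_start, nt_end), recited_aa) in table.items():
        computed_aa = (nt_to_aa(nt_start), nt_to_aa(nt_end))
        if computed_aa != recited_aa:
            found.append((name, computed_aa, recited_aa))
    return found

if __name__ == "__main__":
    for name, computed_aa, recited_aa in mismatches(RECITED):
        print(f"{name}: computed aa {computed_aa}, recited aa {recited_aa}")
```

Under these assumptions, every recited pair agrees except ORFC-ER: nucleotide 4509 maps to amino acid 1503, which is the length recited for SEQ ID NO:6, whereas the recited end position is 1502.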
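The five selection criteria of the biorational screen in Example 2 can be expressed as a short decision routine. The following is a minimal sketch, not the inventors' procedure: the class and function names are hypothetical, question (3) is read as "less than 1% as FAME", the 5% tolerance used for "about the same" in question (1) is an assumed value, and the "inconclusive" label for all other outcomes is likewise an assumption. The input values are those reported in Example 2 for Schizochytrium limacinum (SR21) and Ulkenia (BP-5601).

```python
# Illustrative sketch only; names, the 5% tolerance and the "inconclusive"
# label are assumptions made for this example.
from dataclasses import dataclass

@dataclass
class ScreenData:
    dha_fame_aerobic: float        # DHA as % FAME, aerobic flask
    dha_fame_low_o2: float         # DHA as % FAME, low oxygen flask
    c14_c16_pct_tfa_low_o2: float  # C14:0 + C16:0 + C16:1 as % TFA, low oxygen flask
    precursors_pct_low_o2: float   # C18:3n-3 + C18:2n-6 + C18:3n-6 as % FAME
    fat_pct_dw_aerobic: float      # total fatty acids as % cell dry weight
    fat_pct_dw_low_o2: float
    dha_pct_dw_aerobic: float      # DHA as % cell dry weight
    dha_pct_dw_low_o2: float

def answer_questions(d: ScreenData, same_tolerance: float = 0.05) -> list:
    """Return yes/no answers to screen questions (1) through (5), in order."""
    return [
        d.dha_fame_low_o2 >= d.dha_fame_aerobic * (1 - same_tolerance),
        d.c14_c16_pct_tfa_low_o2 > 40.0,
        d.precursors_pct_low_o2 < 1.0,
        d.fat_pct_dw_low_o2 > d.fat_pct_dw_aerobic,
        d.dha_pct_dw_low_o2 > d.dha_pct_dw_aerobic,
    ]

def indication(answers: list) -> str:
    """Grade the strength of the indication from the five answers."""
    if all(answers):
        return "very strong"
    if all(answers[:3]):
        return "good"
    return "inconclusive"

# Values reported in Example 2.
S_LIMACINUM = ScreenData(32, 41, 58, 0, 61, 22, 21, 9)
ULKENIA = ScreenData(28, 44, 63, 0, 59, 21, 16, 9)

if __name__ == "__main__":
    for name, data in (("S. limacinum", S_LIMACINUM), ("Ulkenia", ULKENIA)):
        answers = answer_questions(data)
        print(name, answers, indication(answers))
```

With the reported values, both strains answer yes to questions (1) through (3) and no to questions (4) and (5), which the sketch grades as a good, but not very strong, indication.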


-continued Glu Wall Asn Ala Met Ala Glu Ile Ala Gly Ser Ser Ala 200 cc.g cost gct gcc gct cc.g gct cc.g gcc aag got gcc cost 3654 Pro Pro Ala Ala Ala Pro Ala Pro Ala Lys Ala Ala Pro 215 gcc gct gCg cost gct tog aac gag citt citc gag aag gcc 3699 Ala Ala Ala Pro Ala Ser Asn Glu Teu Leu Glu Lys Ala 230 gag gto gto atg gag citc. gcc gcc aag act ggc tac gag 3744 Glu Wall Wall Met Glu Telu Ala Ala Lys Thr Gly Tyr Glu 245 act atg atc gag to c atg gag citc. gag act gag citc ggC 3789 Thr Met Ile Glu Ser Met Glu Teu Glu Thr Glu Lieu Gly 260 att too atc aag cgt. gag atc citc. too gag gtt Cag gcc 3834 Ile Ser Ile Lys Arg Glu Ile Teu Ser Glu Wall Glin Ala 275 atg aac gto gag gcc gac gtc gac gct citc agc cqc act 3879 Met Asn Wall Glu Ala Asp Wall Asp Ala Leu Ser Arg Thr 29 O

gtg ggit gag gtc aac gcc atg aag gct gag atc gct 392.4 Wall Gly Glu Wall Asn Ala Met Lys Ala Glu Ile Ala 305

tot gcc cc.g gCg gcc gcc gct gcc cca ggit cog gct 3969 Ser Ala Pro Ala Ala Ala Ala Ala Pro Gly Pro Ala 320

gcc cost gC9 cost gCC gCC gcc cost gct gito tcg aac 4014 Ala Pro Ala Pro Ala Ala Ala Pro Ala Wal Ser Asn 335 gag citt gag aag gcc acc gtc gto atg gag gtc. citc gcc Glu Teu Glu Lys Ala Thr Wall Wall Met Glu Wall Leu Ala 350 gcc act ggC tac gag gac atg atc. gag to C gac atg gag 4104 Ala Thr Gly Tyr Glu Asp Met Ile Glu Ser Asp Met Glu 365 citc. acc gag citc. ggC gac to c atc. aag Cgt gtc. gag att 4149 Teu Thr Glu Telu Gly Asp Ser Ile Lys Arg Val Glu Ile 38O citc. gag gto Cag gcc citc. aac gto gag gcc aag gac gto 4.194 Teu Glu Wall Glin Ala Telu Asn Wall Glu Ala Lys Asp Wall 395 gac citc. agC cgc acc act gtt ggC gag gto gtc gat gcc 4239 Asp Teu Ser Arg Thr Thr Wall Gly Glu Val Val Asp Ala 410 atg gcc gag atc gct ggC tot gcc cc.g gc g cct gcc gcc 4284 Met Ala Glu Ile Ala Gly Ser Ala Pro Ala Pro Ala Ala 425 gct cost gct cc.g gct gcc gcc cost gCg cct gcc gcc cost 4329 Ala Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Pro 4 40 gCg gct gto tog agc citt citc. gag aag gcc gag act gto 4374 Ala Ala Wall Ser Ser Telu Telu Glu Ala Glu Thr Wall 455 gto gag gto citc. gcc aag act ggC tac gag act gac atg 4 419 Wall Glu Wall Telu Ala Lys Thr Gly Tyr Glu Thir Asp Met 470 atc too gac atg gag citc. gag acc gag citc. ggc att gac too 4 464 Ile Ser Asp Met Glu Telu Glu Thr Glu Teu Gly Ile Asp Ser 480 485 US 7,256,023 B2 82

-continued atc aag cgt. gto gag att to c gag gto Cag gcc atg citc aac 4509 Ile Lys Arg Wall Glu Ile Ser Glu Wall Glin Ala Met Leu Asn 490 5 OO gto gcc aag gac gtc gct citc. agC cgc acc cgc act gtt 4554 Wall Ala Lys Asp Wall Ala Telu Ser Arg Thr Arg Thr Wall 515

gto gto gat gcc aag gcc gag atc gct ggt ggC tot 45.99 Wall Wall Asp Ala Lys Ala Glu Ile Ala Gly Gly Ser 530 gcc gCg cost gcc gcc gct cost gct cc.g gct gct gcc gcc 4 64 4 Ala Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala 545 cost cost gcc gcc cost cost gcc gcc cost gc g cct gct gto 4689 Pro Pro Ala Ala Pro Pro Ala Ala Pro Ala Pro Ala Wall 560 tog gag citt citc. gag gcc gag act gto gto atg gag gto 4734 Ser Glu Teu Telu Glu Ala Glu Thr Wall Wal Met Glu Wall 575 citc. gcc aag act ggC gag act gac atg att gag to c gac 4779 Teu Ala Lys Thr Glu Thr Asp Met le Glu Ser Asp 590 atg citc. gag acc gag ggC att gac too atc aag cqt gto 4824 Met Teu Glu Thr Glu Gly Ile Asp Ser le Lys Arg Wall 605 gag citc. too gag gtt gcc atg citc. aac gto gag gCC aag 486.9 Glu Teu Ser Glu Wall Ala Met Teu Asn Wall Glu Ala Lys 62O gac gac gct citc. agc act cgc act gtt ggit gag gtc. gto 4.914 Asp Asp Ala Telu Ser Thr Arg Thr Wall Gly Glu Val Wall 635 gat atg aag gct gag gct ggC agC too gcc tog gcq cost 4.959 Asp Met Ala Glu Ala Gly Ser Ser Ala Ser Ala Pro 650 gcc gct gct cost gct gct gct gcc gct cct gcg ccc gct 5 OO 4 Ala Ala Ala Pro Ala Ala Ala Ala Ala Pro Ala Pro Ala 665 gcc gcc cost gct gtc aac gag citt citc. gag aaa gCC gag Ala Ala Pro Ala Wall Asn Glu Teu Teu Glu Lys Ala Glu 680 act gto atg gag gtc gcc gcc aag act ggC tac gag act Thr Wall Met Glu Wall Ala Ala Thr Gly Tyr Glu Thr 695 gac atc gag to c gac gag citc. gag act gag Ctc ggC att 5 139 Asp Ile Glu Ser Asp Glu Telu Glu Thr Glu Lieu Gly Ile 710 gac atc aag cgt. gtc atc citc. too gag gtt cag goc atg 51.84 Asp Ile Arg Wall Ile Telu Ser Glu Wall Glin Ala Met 725 citc. gto gag gcc aag gtc gat gcc citc. agc cqc acc cgc 5229 Teu Wall Glu Ala Lys Wall Asp Ala Teu Ser Arg Thr Arg 740 act ggC gag gtt gtc gcc atg aag gcc gag atc gct ggit 5274 Thr Gly Glu Wall Wall Ala Met Lys Ala Glu Ile Ala Gly 755

gcc ccg gcg cct gcc gct gcc cct gct ccg gct gcc 5319
Ala Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala 1770

cct gct gtc tcg gag ctt ctc gag aag gcc gag act 5364
Pro Ala Val Ser Glu Leu Leu Glu Lys Ala Glu Thr 1785


35 40 45

Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Val 50 55 60

Thr Thr Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75

Asp Phe Ala Arg Glu Phe Gly Leu Asn Met Phe Gln Met Glu 85 90 95

Asp Ser Asp Ala Asn Gln Thr Ile Ser Leu Leu Val Lys Glu Ala 100 105 110

Leu Gln Asp Ala Gly Ile Ala Leu Gly Lys Glu Lys Asn Ile 115 120 125

Gly Cys Val Leu Gly Ile Gly Gly Gly Gln Ser Ser His Glu Phe 130 135 140

Tyr Ser Arg Leu Asn Tyr Val Val Val Glu Lys Val Leu Arg Met 145 150 155 160

Gly Met Pro Glu Glu Asp Val Lys Val Ala Val Glu Lys Ala 165 170 175

Asn Phe Pro Glu Trp Arg Leu Asp Ser Phe Pro Gly Phe Leu Gly Asn 180 185 190

Val Thr Ala Gly Thr Asn Thr Phe Asn Leu Asp Gly Met Asn 195 200 205

Val Val Asp Ala Ala Cys Ala Ser Ser Leu Ile Ala Val Val 210 215 220

Ala Ile Asp Glu Leu Leu Tyr Gly Asp Asp Met Met Val Thr Gly 225 230 235 240

Ala Thr Thr Asp Asn Ser Ile Gly Met Met Ala Phe Ser 245 250 255

Thr Pro Val Phe Ser Thr Pro Ser Val Arg Ala Asp Glu 260 265 270

Thr Gly Met Leu Ile Gly Glu Gly Ser Ala Met Leu Val Leu 275 280 285

Arg Tyr Ala Asp Ala Val Arg Asp Gly Asp Glu Ile His Ala Val Ile 290 295

Arg Gly Cys Ala Ser Ser Ser Asp Gly Lys Ala Ala Gly Ile Thr 305 310 315 320

Pro Thr Ile Ser Gly Gln Glu Glu Ala Leu Arg Arg Ala Asn Arg 325 330 335

Ala Val Asp Pro Ala Thr Val Thr Leu Val Glu Gly His Gly Thr 340 345 350

Gly Thr Pro Val Gly Asp Ile Glu Leu Thr Ala Leu Arg Asn Leu 355 360 365

Phe Asp Ala Tyr Gly Glu Gly Asn Thr Glu Lys Val Ala Val Gly 370 375

Ser Ile Ser Ser Ile Gly His Leu Ala Val Ala Gly Leu Ala 385 390 395 400

Gly Met Ile Val Ile Met Ala Leu Lys His Thr Leu Pro Gly 405 410 415

Thr Ile Asn Val Asp Asn Pro Pro Asn Leu Asp Asn Thr Pro Ile 420 425 430

Asn Glu Ser Ser Leu Tyr Ile Asn Thr Met Asn Arg Pro Trp Phe Pro 435 440 445

Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly 450 455 460


Gly Ala Asn His Ala Val Leu Glu Glu Ala Glu Pro Glu His Thr 465 470 475 480

Thr Ala Tyr Arg Leu Asn Pro Gln Pro Val Leu Met Met Ala 485 490 495

Ala Thr Pro Ala Ala Leu Gln Ser Leu Glu Ala Gln Leu Glu 500 505

Phe Glu Ala Ala Ile Glu Asn Glu Thr Val Asn Thr Ala 515 525

Ile Lys Val Lys Phe Gly Glu Gln Phe Phe Pro Gly Ser Ile 530 535 540

Pro Ala Thr Asn Ala Arg Leu Gly Phe Leu Val Asp Ala Glu Asp 545 550 555 560

Ala Ser Thr Leu Arg Ala Ile Cys Ala Gln Phe Ala Asp Val 565 570 575

Thr Glu Ala Trp Arg Leu Pro Arg Glu Gly Val Ser Phe Arg Ala 585 590

Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser Gly Gln 595 600 605

Gly Ala Gln Thr His Met Phe Ser Glu Val Ala Met Asn Trp Pro 610 615

Gln Phe Arg Gln Ser Ile Ala Ala Met Asp Ala Ala Gln Ser Val 625 630 635 640

Ala Gly Ser Asp Lys Asp Phe Glu Arg Val Ser Gln Val Leu Tyr Pro 645 650 655

Arg Pro Tyr Glu Arg Glu Pro Glu Gln Asn Pro Lys Ile Ser 660 665 670

Leu Thr Ala Ser Gln Pro Ser Thr Leu Ala Ala Leu Gly Ala 675 680 685

Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala Ala Gly 690 695 700

His Ser Leu Gly Glu Phe Ala Ala Leu Ala Ala Gly Val Asp 705 710 715 720

Arg Asp Glu Leu Phe Glu Leu Val Cys Arg Arg Ala Arg Ile Met Gly 725 730 735

Gly Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala Val Ile 740 745 750

Gly Pro Asn Ala Glu Asn Ile Lys Val Gln Ala Ala Asn Val Trp Leu 755 760 765

Gly Asn Ser Asn Ser Pro Ser Gln Thr Val Ile Thr Gly Ser Val Glu 770 775

Gly Ile Gln Ala Glu Ser Ala Leu Gln Lys Glu Gly Phe Arg Val 785 790 795

Val Pro Leu Ala Cys Glu Ser Ala Phe His Ser Pro Gln Met Glu Asn 805 810 815

Ala Ser Ser Ala Phe Asp Val Ile Ser Val Ser Phe Arg Thr 820 825

Pro Ala Glu Thr Leu Phe Ser Asn Val Ser Gly Glu Thr 835 840 845

Pro Thr Asp Ala Glu Met Leu Thr Gln His Met Thr Ser Ser Val 850 855 860

Lys Phe Leu Thr Gln Val Arg Asn Met His Gln Ala Gly Ala Arg Ile 865 870 875 880

Phe Val Glu Phe Gly Pro Lys Gln Val Leu Ser Lys Leu Val Ser Glu 885 890 895

Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn Pro Ala 900 905 910

Ser Gly Thr Asp Ser Asp Ile Gln Leu Arg Asp Ala Ala Val Gln Leu 915 920 925

Val Val Ala Gly Val Asn Leu Gln Gly Phe Asp Lys Trp Asp Ala Pro 930 935 940

Asp Ala Thr Arg Met Gln Ala Ile Lys Lys Lys Arg Thr Thr Leu Arg 945 950 955 960

Leu Ser Ala Ala Thr Tyr Val Ser Asp Lys Thr Lys Lys Val Arg Asp 965 970 975

Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Leu Lys Gly Ala Ala 980 985 990

Pro Leu Ile Lys Ala Pro Glu Pro Val Val Asp Glu Ala Ala Lys Arg 995 1000 1005

Glu Ala Glu Arg Leu Gln Lys Glu Leu Gln Asp Ala Gln Arg Gln 1010 1015 1020

Leu Asp Asp Ala Lys Arg Ala Ala Ala Glu Ala Asn Ser Lys Leu 1025 1030 1035

Ala Ala Ala Lys Glu Glu Ala Lys Thr Ala Ala Ala Ser Ala Lys

Pro Ala Val Asp Thr Ala Val Val Glu Lys His Arg Ala Ile Leu

Lys Ser Met Leu Ala Glu Leu Asp Gly Tyr Gly Ser Val Asp Ala

Ser Ser Leu Gln Gln Gln Gln Gln Gln Gln Thr Ala Pro Ala Pro

Val Lys Ala Ala Ala Pro Ala Ala Pro Val Ala Ser Ala Pro Ala

Pro Ala Val Ser Asn Glu Leu Leu Glu Lys Ala Glu Thr Val Val

Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile

Glu Ala Asp Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile

Lys Arg Val Glu Ile Leu Ser Glu Val Gln Ala Met Leu Asn Val

Glu Ala Lys Asp Val Asp Ala Leu Ser Arg Thr Arg Thr Val Gly

Glu Val Val Asn Ala Met Lys Ala Glu Ile Ala Gly Ser Ser Ala

Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Lys Ala Ala Pro

Ala Ala Ala Ala Pro Ala Val Ser Asn Glu Leu Leu Glu Lys Ala

Glu Thr Val Val Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu 1235 1240 1245

Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr Glu Leu Gly 1250 1255 1260

Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser Glu Val Gln Ala 1265 1270 1275

Met Leu Asn Val Glu Ala Lys Asp Val Asp Ala Leu Ser Arg Thr


1290

Arg Val Gly Glu Val Asn Ala Met Glu Ile Ala

Gly Ser Ala Pro Ala Ala Ala Ala Ala Gly Pro Ala

Ala Ala Pro Ala Pro Ala Ala Ala Pro Val Ser Asn

Glu Leu Glu Ala Thr Val Val Met Val Leu Ala

Ala Thr Gly Glu Asp Met Ile Glu Asp Met Glu

Leu Thr Glu Leu Gly Asp Ser Ile Val Glu Ile

Leu Glu Val Gln Ala Leu Asn Val Glu Asp Val

Asp Leu Ser Arg Thr Thr Val Gly Glu Val Asp Ala

Met Ala Glu Ile Ala Gly Ser Ala Pro Pro Ala Ala

Ala Pro Ala Pro Ala Ala Ala Pro Ala Ala Ala Pro

Ala Ala Val Ser Ser Leu Leu Glu Glu Thr Val

Val Glu Val Leu Ala Thr Gly Thr Asp Met

Ile Ser Asp Met Glu Leu Glu Thr Glu Leu Ile Asp Ser 1480

Ile Arg Val Glu Ile Leu Ser Glu Val Gln Met Leu Asn 1495

Val Ala Asp Val Ala Leu Ser Arg Arg Thr Val

Gly Val Val Asp Ala Ala Glu Ile Gly Gly Ser

Ala Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala

Pro Pro Ala Ala Pro Pro Ala Ala Pro Pro Ala Val

Ser Glu Leu Leu Glu Ala Glu Thr Val Met Glu Val

Leu Ala Thr Gly Glu Thr Asp Met Glu Ser Asp

Met Leu Glu Thr Glu Gly Ile Asp Ser Arg Val

Glu Leu Ser Glu Val Ala Met Leu Asn Glu Ala

Asp Asp Ala Leu Ser Thr Arg Thr Val Glu Val Val

Asp Met Ala Glu Ala Gly Ser Ser Ser Ala Pro

Ala Ala Ala Ala Pro Ala Ala Ala Ala Ala Ala Pro Ala 1655

Ala Ala Ala Pro Ala Val Asn Glu Leu Leu Ala Glu 1670


Thr Val Val Met Glu Val Leu Ala Ala Thr Gly Tyr Glu Thr 1685 1690 1695

Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr Leu Gly Ile 1700 1705

Asp Ile Arg Val Ile Leu Ser Glu Gln Ala Met

Leu Val Glu Ala Val Asp Ala Leu Arg Thr Arg

Thr Gly Glu Val Val Ala Met Ala Ile Ala Gly

Gly Ala Pro Ala Pro Ala Ala Ala Pro Pro Ala Ala

Ala Pro Ala Val Ser Glu Leu Leu Glu Ala Glu Thr

Val Met Glu Val Leu Ala Thr Gly Glu Thr Asp

Met Glu Ser Asp Met Leu Glu Thr Glu Gly Ile Asp

Ser Arg Val Glu Leu Ser Glu Val Ala Met Leu

Asn Glu Ala Asp Asp Ala Leu Ser Thr Arg Thr

Val Glu Val Val Asp Met Ala Glu Ala Gly Ser

Ser Pro Ala Pro Ala Ala Ala Pro Ala Ala Ala Ala

Ala Ala Pro Ala Ala Ala Pro Ala Val Ser Glu Leu

Leu Ala Glu Thr Val Met Glu Val Ala Ala

Thr Glu Thr Asp Ile Glu Ser Asp Glu Leu Glu

Thr Leu Gly Ile Asp Ile Arg Val Ile Leu Ser

Glu Gln Ala Met Leu Val Glu Ala Val Asp Ala

Leu Arg Thr Arg Thr Gly Glu Val Val Ala Met

Ala Ile Ala Gly Gly Ala Pro Ala Pro Ala Ala Ala

Pro Pro Ala Ala Ala Pro Ala Val Ser Asn Glu Leu Leu 1995

Glu Lys Ala Glu Thr Val Val Met Glu Val Leu Ala Ala Thr 2000 2005 2010

Gly Tyr Glu Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr 2015 2020 2025

Glu Leu Gly Ile Asp Ser Ile Arg Val Glu Ile Leu Ser Glu 2030 2035 2040

Val Gln Ala Met Leu Asn Val Glu Ala Asp Val Asp Ala Leu 2045 2050 2055

Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Ala Met Ala 2060 2065 2070


Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala Pro 2075 2080 2085

Ala Ser Ala Gly Ala Ala Pro Ala Val Ile Ser Val His 2090 2095

Gly Asp Asp Asp Ser Leu Met His Val Val

Asp Arg Arg Pro Asp Leu Leu Glu Pro Glu Asn

Arg Val Leu Val Val Asp Ser Glu Thr Leu Ala

Leu Arg Val Leu Gly Val Val Thr Phe Glu

Gly Gln Leu Ala Gln Ala Ala Ala Ile Arg His

Val Ala Asp Leu Ala Ser Ala Ala Ile

l Ala Glu Gln Arg Phe Gly Leu Gly Phe Ile Ser g 2200

Gln Ala Glu Arg Phe Glu Pro Glu Ile Leu Gly Phe Thr 2210 2215 2220

Leu Met Ala Phe Ala Ser Leu Cys Thr Ala Val 2225 2230 2235

Ala Gly Gly Pro Ala Phe Ile Val Ala Arg Leu Asp Gly 2240 2245 2250

Arg Leu Gly Phe Thr Ser Gln Gly Thr Ser Asp Ala Leu Arg 2255 2260 2265

Ala Gln Arg Gly Ala Ile Phe Gly Leu Thr Ile Gly Leu 2270 2275 2280

Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val Asp Ile Ala 2285 2290 2295

Gln Gly Met His Pro Glu Asp Ala Ala Val Ala Ile Val Arg Glu 2300 2305 2310

Met Ala Ala Asp Ile Arg Ile Arg Glu Val Gly Ile Gly Ala 2315 2320 2325

Asn Gln Gln Arg Thr Ile Arg Ala Ala Lys Leu Glu Thr Gly 2330 2335 2340

Asn Pro Gln Arg Gln Ile Ala Asp Asp Val Leu Leu Val Ser 2345 2350 2355

Gly Gly Ala Arg Gly Ile Thr Pro Leu Ile Arg Glu Ile Thr 2360 2365 2370

Arg Gln Ile Ala Gly Gly Lys Ile Leu Leu Gly Arg Ser 2375 2380 2385

Val Ser Ala Ser Glu Pro Ala Trp Ala Gly Ile Thr Asp Glu 2390 2395 2400

Ala Val Gln Ala Ala Thr Gln Glu Leu Lys Arg Ala Phe 2405 2410 2415

Ser Ala Gly Glu Gly Pro Lys Pro Thr Pro Ala Val Thr 2420 2425 2430

Leu Val Gly Ser Val Leu Gly Ala Arg Glu Val Arg Ser Ser 2435 2440 2445

Ala Ala Ile Glu Ala Leu Gly Gly Lys Ala Ile Tyr Ser Ser 2450 2455 2460

Asp Val Asn Ser Ala Ala Asp Val Ala Ala Val Arg Asp Ala


2465 2470 2475

Glu Ser Gln Leu Gly Ala Arg Val Ser Gly Ile Val His Ala Ser 2480 2485 2490

Gly Val Leu Arg Asp Arg Leu Ile Glu Leu Pro Asp Glu 2495 2500 2505

Phe Asp Ala Val Phe Gly Thr Val Gly Leu Glu Asn Leu 2510 2515 2520

Leu Ala Ala Val Asp Arg Ala Asn Leu His Met Val Leu Phe 2525 2530 2535

Ser Ser Leu Ala Gly Phe His Gly Asn Val Gly Gln Ser Asp 2540 2545 2550

Ala Met Ala Asn Glu Ala Leu Asn Met Gly Leu Glu Leu Ala 2555 2560 2565

Asp Val Ser Val Ser Ile Phe Gly Pro Trp Asp Gly 2570 2575 2580

Gly Met Val Thr Pro Gln Leu Gln Phe Gln Glu Met Gly 2585 2590 2595

Val Gln Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg 2600 2605 2610

Ile Val Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp 2615 2620 2625

Arg Thr Pro Ser Val Gly Ser Asp Thr Ile Thr Leu His 2630 2635 2640

Arg Lys Ile Ser Ala Ser Asn Pro Phe Leu Glu Asp His Val 2645 2650 2655

Ile Gln Gly Arg Val Leu Pro Met Thr Leu Ala Ile Gly Ser 2660 2665 2670

Leu Ala Glu Thr Leu Gly Leu Phe Pro Gly Tyr Ser Leu Trp 2675 2680 2685

Ala Ile Asp Asp Ala Gln Leu Phe Gly Val Thr Val Asp Gly 2690 2695 2700

Asp Val Asn Glu Val Thr Leu Thr Pro Ser Thr Ala Pro Ser 2705 2710 2715

Gly Arg Val Asn Val Gln Ala Thr Leu Thr Phe Ser Ser Gly 2720 2725 2730

Leu Val Pro Ala Arg Ala Val Ile Val Leu Ser Asn Gln 2735 2740 2745

Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro Pro Ser Leu Asp 2750 2755 2760

Ala Asp Pro Ala Leu Gln Gly Ser Val Asp Gly Thr Leu 2765 2770 2775

Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Leu Ser 2780 2785 2790

Thr Lys Ser Gln Leu Val Ala Ser Ala Val Pro Gly Ser 2795 2800 2805

Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 2810 2815 2820

Pro Phe Val Asn Asp Leu Ala Phe Gln Ala Met Leu Val Trp Val 2825 2830 2835

Arg Arg Thr Leu Gly Gln Ala Ala Leu Pro Asn Ser Ile Gln Arg 2840 2845 2850

Ile Val Gln His Arg Pro Val Pro Gln Asp Pro Phe Ile 2855 2860 2865


470 aag gto gct coc aag tac gCg cgt. citc. aac att gac gag 4 464 Wall Ala Pro Lys Ala Arg Teu Asn. Ile Asp Glu 485

Cag gag acc cga. gat atc citc. aac aag gac aac gcg cc.g 4509 Glin Glu Thr Arg Asp Ile Telu Asn Asp Asn Ala Pro 5 OO tot tot tot tot tot tot tot tot tot tot tot tot tot 4554 Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser 515 cc.g cost gct cost tog coc gtg Cala aag aag got gct cco 45.99 Pro Pro Ala Pro Ser Pro Wall Glin Lys Lys Ala Ala Pro 530 gcc gag acc aag gct gct tog gct gac gca citt cqc agt 4 64 4 Ala Glu Thr Lys Ala Ala Ser Ala Asp Ala Lieu Arg Ser 545 gcc citc. gat citc. gac atg citt gcg citg agc tot gcc agt 4689 Ala Teu Asp Telu Asp Met Telu Ala Teu Ser Ser Ala Ser 560 gcc ggC aac citt gtt act gCg cost agC gac goc tog gto 4734 Ala Gly Asn Telu Wall Thr Ala Pro Asp Ala Ser Wall 575 att cc.g cco tgc aac gCg gat citc. agc cqc gcc titc. 4779 Ile Pro Pro Cys Asn Ala Asp Teu Ser Arg Ala Phe 590 atg acg tac ggit gtt gCg cost citg acg ggC gCC atg 4824 Met Thr Gly Wall Ala Pro Teu Thr Gly Ala Met 605 gcc ggC att gcc tot gac citc. gto gcc gcc ggC cgc 486.9 Ala Gly Ile Ala Ser Asp Telu Wall Ala Ala Gly Arg 62O

Cag atc citt gCg to c ggC gcc ggC citt coc atg Cag 4.914 Glin Ile Teu Ala Ser Gly Ala Gly Leu Pro Met Glin 635 gtt cgt. to c atc aag att Cag gcc gcc ctd coc aat 4.959 Wall Arg Ser Ile Lys Ile Glin Ala Ala Leu Pro Asn 650

tac gtc aac atc cat tot cco titt gac agc aac 5 OO 4 Wall Asn Ile His Ser Pro Phe Asp Ser Asn 665

aag aat gtc citc. titc. citc. gag aag ggt gtc. acc Lys Asn Wall Telu Phe Teu Glu Lys Gly Val Thr 680 titt gag tog gcc atg acg citc. acc cc.g. cag gto gtg Phe Glu Ser Ala Met Thr Teu Thr Pro Glin Wall Wall 695

Cgg cgc gct acg cgc aac gcc gac ggC tog gto 5 139 Arg Arg Ala Thr Arg Asn Ala Asp Gly Ser Wall 710 aac cgc aac cgt. ggC aag gto tog cgc acc gag citc. 51.84 Asn Arg Asn Arg Gly Lys Wall Ser Arg Thr Glu Teu 725 gcc atg titc. atg cgt. gCg coc gag cac citt citt cag aag 5229 Ala Met Phe Met Arg Ala Pro Glu His Lieu. Leu Glin Lys 740 citc. gct too ggC gag aac Cag gag Cag gcc gag citc gcc 5274 Teu Ala Ser Gly Glu Asn Glin Glu Glin Ala Glu Lieu Ala 755 cgc cco gct gac atc gaa got gac tog 5319


ctg 6177
Leu

<210> SEQ ID NO 4
<211> LENGTH: 2059
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (370)..(370)
<223> OTHER INFORMATION: The 'Xaa' at location 370 stands for Leu.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (371)..(371)
<223> OTHER INFORMATION: The 'Xaa' at location 371 stands for Ala or Val.

<400> SEQUENCE: 4

Met Ala Ala Arg Asn Val Ser Ala Ala His Glu Met His Asp Glu 1 10 15

Arg Ile Ala Val Val Gly Met Ala Val Gln Ala Gly Cys 20 25 30

Asp Glu Phe Trp Glu Val Leu Met Asn Val Glu Ser 35 40 45

Val Ile Ser Asp Lys Arg Leu Gly Ser Asn Ala Glu His 50 55

Lys Ala Glu Arg Ser Lys Tyr Ala Asp Thr Phe Asn Glu Thr Tyr 65 70 75

Gly Thr Leu Asp Glu Asn Glu Ile Asp Asn Glu His Glu Leu Leu Leu 85 90 95

Asn Leu Ala Lys Gln Ala Leu Ala Glu Thr Ser Val Asp Ser Thr 100 105 110

Arg Cys Gly Ile Val Ser Gly Cys Leu Ser Phe Pro Met Asp Asn Leu 115 120 125

Gln Gly Glu Leu Leu Asn Val Tyr Gln Asn His Val Glu Leu 130 135 140

Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Gln 145 150 155 160

Ser Asn Lys Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala 165 170 175

Ser Phe Val Ala Glu Glu Leu Asn Leu Gly Ala Leu His Tyr Ser Val 180 185 190

Asp Ala Ala Cys Ala Thr Ala Leu Tyr Val Leu Arg Leu Ala Gln Asp 195 200

His Leu Val Ser Gly Ala Ala Asp Val Met Leu Cys Gly Ala Thr 210 215 220

Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Gln Ala 225 230 235 240

Met Pro Val Gly Thr Gly Gln Asn Val Ser Met Pro Leu His Lys Asp 245 250 255

Ser Gln Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu 260 265 270

Arg Leu Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly Thr Leu 275 280 285

Leu Gly Ala Asn Val Ser Asn Ser Gly Thr Gly Leu Pro Leu Pro 290 295

Leu Leu Pro Ser Glu Lys Lys Cys Leu Met Asp Thr Thr Arg Ile 305 310 315 320


Asn Val His Pro His Lys Ile Gln Tyr Val Glu Cys His Ala Thr Gly 325 330 335

Thr Pro Gln Gly Asp Arg Val Glu Ile Asp Ala Val Ala Cys Phe 340 345 350

Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Gly Asn Phe Gly His 355 360 365

Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Lys Val Leu Leu Ser 370 375

Met His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400

Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 410 415

Thr Asn Gly Glu Pro Ala Gly Leu Ser Ala Phe Gly Phe Gly 420 425 430

Gly Thr Asn Ala His Ala Val Phe Glu Glu His Asp Pro Ser Asn Ala 435 440 445

Ala Cys Thr Gly His Asp Ser Ile Ser Ala Leu Ser Ala Arg Gly 450 455 460

Gly Glu Ser Asn Met Arg Ile Ala Ile Thr Gly Met Asp Ala Thr Phe 465 470 475 480

Gly Ala Leu Gly Leu Asp Ala Phe Glu Arg Ala Ile Thr Gly 485 490 495

Ala His Gly Ala Ile Pro Leu Pro Glu Arg Trp Arg Phe Leu Gly 500 505 510

Asp Lys Asp Phe Leu Asp Leu Cys Gly Val Ala Thr Pro His 515 525

Gly Cys Ile Glu Asp Val Glu Val Asp Phe Gln Arg Leu Arg Thr 530 535 540

Pro Met Thr Pro Glu Asp Met Leu Leu Pro Gln Gln Leu Leu Ala Val 545 550 555 560

Thr Thr Ile Asp Arg Ala Ile Leu Asp Ser Gly Met Gly Gly 565 570 575

Asn Val Ala Val Phe Val Gly Leu Gly Thr Asp Leu Glu Leu Arg 585 590

His Arg Ala Arg Val Ala Leu Lys Glu Arg Val Arg Pro Glu Ala Ser 595 600 605

Lys Leu Asn Asp Met Met Gln Tyr Ile Asn Asp Gly Thr Ser 610 615 620

Thr Ser Thr Ser Tyr Ile Gly Asn Leu Val Ala Thr Arg Val Ser 625 630 635 640

Ser Gln Trp Gly Phe Thr Gly Pro Ser Phe Thr Ile Thr Glu Gly Asn 645 650 655

Asn Ser Val Tyr Arg Cys Ala Glu Leu Gly Lys Leu Leu Glu Thr 660 665 670

Gly Glu Val Asp Gly Val Val Val Ala Gly Val Asp Leu Gly Ser 675 680 685

Ala Glu Asn Leu Tyr Val Lys Ser Phe Lys Val Ser Thr Ser 690 695 700

Asp Thr Pro Ala Ser Phe Ala Ala Ala Asp Gly Phe Val 705 710 715 720

Gly Glu Gly Cys Gly Ala Phe Val Leu Lys Arg Glu Thr Ser Cys Thr 725 730 735

Asp Asp Arg Ile Tyr Ala Cys Met Asp Ala Ile Val Pro Gly Asn 740 745 750

Val Pro Ser Ala Cys Leu Arg Glu Ala Leu Asp Gln Ala Arg Val 755 760 765

Pro Gly Asp Ile Glu Met Leu Glu Leu Ser Ala Asp Ser Ala Arg His 770 775

Leu Asp Pro Ser Val Leu Pro Lys Glu Leu Thr Ala Glu Glu Glu 785 790 795 800

Ile Gly Gly Leu Gln Thr Ile Leu Arg Asp Asp Asp Leu Pro Arg 805 810 815

Asn Val Ala Thr Gly Ser Val Lys Ala Thr Val Gly Asp Thr Gly 820 825

Ala Ser Gly Ala Ala Ser Leu Ile Lys Ala Ala Leu Cys Ile Asn 835 840 845

Arg Tyr Leu Pro Ser Asn Gly Asp Asp Trp Asp Glu Pro Ala Pro Glu 850 855 860

Ala Pro Trp Asp Ser Thr Leu Phe Ala Gln Thr Ser Arg Ala Trp 865 870 875

Leu Asn Pro Gly Glu Arg Arg Tyr Ala Ala Val Ser Gly Val Ser 885 890 895

Glu Thr Arg Ser Cys Tyr Ser Val Leu Leu Ser Glu Glu Gly His 900 905 910

Glu Arg Glu Asn Arg Ile Ser Leu Asp Glu Glu Pro Leu 915 920

Ile Val Leu Arg Ala Asp Ser His Glu Glu Ile Leu Arg Leu Asp 930 935 940

Lys Ile Arg Glu Arg Phe Leu Gln Pro Thr Gly Ala Pro Arg Glu 945 950 955 960

Ser Glu Leu Ala Gln Ala Arg Arg Ile Phe Leu Leu Leu Gly 965 970 975

Glu Thr Leu Ala Gln Asp Ala Ala Ser Ser Gly Ser Lys Pro Leu 980 985 990

Ala Leu Ser Leu Val Ser Thr Pro Ser Lys Leu Gln Arg Glu Val Glu 995 1000 1005

Leu Ala Ala Gly Ile Pro Arg Cys Leu Lys Arg Asp 1010 1015

Trp Ser Pro Ser Arg Tyr Ala Pro G Pro Leu Ala

Ser Arg Val Ala Phe Glu Gly A Ser Pro

Ile Thr le His Arg Ile Trp P Glu Leu His

Glu Ile Asn Thr Asn Arg Leu Trp A Glu Gly Asp

Arg Val Met Pro Arg Ala Ser Phe Ser G Leu Glu Ser

Gln Gln Glu Phe Asp Arg Asn Met Ile Glu M Phe Arg Leu

Gly Leu Thr Ser Ile Ala Phe Thr Asn Leu A Arg Asp Val

Leu Asn Ile Thr Pro Lys Ala Ala Phe Gly Leu S Leu Gly Glu 1130

Ile Ser Met Ile Phe Ala Phe Ser Lys Asn G Leu Ile Ser


1145 1155

Asp Gln Leu Thr Asp Arg Glu Ser Asp Trp Asn 1160

Ala Leu Ala Val Glu Phe Ala Leu Arg Trp Gly Ile 1175

Pro Ser Val Pro Glu Phe Trp Ile Val

Arg Thr Gln Asp Glu Ala Ala Pro Asp Ser

Val Leu Thr Ile Asn Asp Thr Ala Leu

Ile Gly Pro Asp Ala Ala Arg Leu

Gly Asn Ile Pro Ala Pro Val Thr Met Gly

His Pro Glu Val Gly Thr Asp Ala Ile

His Asn Leu Glu Phe Val Val Asp Gly Asp Leu Trp

Thr Ile Asn Gln Leu Val Pro Arg Thr Gly Ala

Glu Trp Ala Pro Ser Phe Gly Glu Ala Gly Gln

Leu Glu Gln Ala Phe Pro Gln Ile Glu Thr Ile

Gln Asn Asp Phe Val Glu Val Pro Asn Asn

His Ser Thr Ala Val Thr Thr Leu Gly Gln Arg Asn

His Ala Gly Ala Ile Gln Asn Glu Ala Trp Thr

Thr Val Leu Val Ser Leu Ala Leu Val Pro

Gly Thr Ser Pro His Ser Val Ala Glu

Ala Ala Ala Leu Gly Pro

Phe Val Arg Ile Gln Leu Asn Arg Phe Asn

Ser Ala Asp Pro Ile Ser Ala Asp Leu Ser Phe Pro

Pro Asp Pro Ala Ile Ala Ala Ile Ser Arg Ile Met

Val Ala Pro Ala Arg Leu Ile Asp Glu

Gln Glu Thr Arg Asp Ile Leu Asn Asn Ala Pro

Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser

Pro Pro Ala Pro Ser Pro Val Gln Ala Ala Pro

Ala Ala Glu Thr Ala Ala Ser Ala Asp Leu Arg Ser 1535


Ala Leu Asp Leu Asp Ser Met Leu Ala Leu Ser Ser Ala Ser 1555 1560

Ala Gly Asn Leu Val Thr Ala Pro Ser Ala Ser Val

Ile Pro Pro Asn Ala Asp Leu Gly Arg Ala Phe

Met Thr Gly Val Ala Pro Leu Gly Ala Met

Ala Gly Ile Ala Ser Asp Leu Val Ile Ala Gly Arg

Gln Ile Leu Ala Ser Gly Ala Gly Gly Pro Met Gln

Val Arg Glu Ser Ile Ile Gln Ala Leu Pro Asn

Gly Ala Val Asn Ile His Ser Pro Asp Ser Asn

Leu Gly Asn Val Leu Phe Leu Glu Gly Val Thr

Phe Glu Ala Ser Ala Met Thr Leu Thr Gln Val Val

Arg Arg Ala Ala Gly Thr Arg Asn Ala Gly Ser Val

Asn Arg Asn Arg Ile Gly Val Ser Thr Glu Leu

Ala Met Phe Met Arg Ala Pro Glu His Leu Gln

Leu Ala Ser Gly Glu Asn Gln Glu Gln Glu Leu Ala

Arg Val Pro Val Ala Asp Ile Ala Val Ala Asp Ser

Gly His Thr Asp Asn Pro Ile His Val Leu Pro Leu

Ile Asn Leu Arg Asp Leu His Arg Glu Gly Pro

Ala Leu Arg Val Arg Gly Ala Gly Gly Ile Gly

Pro Ala Ala Leu Ala Phe Asn Met Gly Ser Phe

Val Gly Thr Val Asn Val Ala Gln Gly Thr

Asp Val Arg Gln Ala Ala Thr Ser Asp Val

Ala Pro Ala Met Phe Glu Glu Val Leu

Gln Leu Gly Met Phe Pro Ser Ala Asn

Leu Glu Leu Phe Asp Ser Phe Ser Met Pro

Pro Glu Leu Ala Arg Glu Arg Ile Ser Arg Ala

Leu Glu Val Trp Asp Thr Asn Phe Ile Asn Arg


Leu His Asn Pro Glu Ile Gln Arg Ala Glu Arg Asp Pro Lys 1940 1945 1950

Leu Lys Met Ser Leu Phe Arg Trp Leu Ser Leu Ala Ser 1955 1960 1965

Arg Trp Ala Asn Thr Gly Ala Ser Asp Arg Val Met Asp Gln 1970 1975 1980

Val Trp Gly Pro Ala Ile Gly Ser Phe Asn Asp Phe 1985 1990 1995

Gly Thr Leu Asp Pro Ala Val Ala Asn Glu Tyr Pro Cys Val 2000 2005 2010

Val Gln Ile Asn Gln Ile Leu Arg Gly Ala Cys Phe Leu Arg 2015 2020 2025

Arg Leu Glu Ile Leu Arg Asn Ala Arg Leu Ser Asp Gly Ala Ala 2030 2035 2040

Ala Leu Val Ala Ser Ile Asp Asp Thr Val Pro Ala Glu 2045 2050

Leu

<210> SEQ ID NO 5
<211> LENGTH: 4509
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(4509)

<400> SEQUENCE: 5

atg gcg ctc cgt gtc aag acg aac aag aag cca tgc tgg gag atg acc 48
Met Ala Leu Arg Val Lys Thr Asn Lys Lys Pro Cys Trp Glu Met Thr 1 5 10 15

aag gag gag ctg acc agc ggc aag acc gag gtg ttc aac tat gag gaa 96
Lys Glu Glu Leu Thr Ser Gly Lys Thr Glu Val Phe Asn Tyr Glu Glu 20 25 30

ctc ctc gag ttc gca gag ggc gac atc gcc aag gtc ttc gga ccc gag 144
Leu Leu Glu Phe Ala Glu Gly Asp Ile Ala Lys Val Phe Gly Pro Glu 35 40 45

ttc gcc gtc atc gac aag tac cc cgc gtg cgc ctg ccc gcc cgc 192
Phe Ala Val Ile Asp Lys Tyr Pro Arg Arg Val Arg Leu Pro Ala Arg 50 55 60

gag tac ctg ctc gtg acc cgc gtc acc ctc atg gac gcc gag gtc aac 240
Glu Tyr Leu Leu Val Thr Arg Val Thr Leu Met Asp Ala Glu Val Asn 65 70 75 80

aac tac cgc gtc ggc gcc cgc atg gtc acc gag tac gat ctc ccc gtc 288
Asn Tyr Arg Val Gly Ala Arg Met Val Thr Glu Tyr Asp Leu Pro Val 85 90 95

aac gga gag ctc tcc gag ggc gga gac tgc ccc tgg gcc gtc ctg gtc 336
Asn Gly Glu Leu Ser Glu Gly Gly Asp Cys Pro Trp Ala Val Leu Val 100 105 110

gag agt ggc cag tgc gat ctc atg ctc atc tcc tac atg ggc att gac 384
Glu Ser Gly Gln Cys Asp Leu Met Leu Ile Ser Tyr Met Gly Ile Asp 115 120 125

ttc cag aac cag ggc gac cgc gtc tac cgc ctg ctc aac acc acg ctc 432
Phe Gln Asn Gln Gly Asp Arg Val Tyr Arg Leu Leu Asn Thr Thr Leu 130 135 140

acc ttt tac ggc gtg gcc cac gag ggc gag acc ctc gag tac gac att 480
Thr Phe Tyr Gly Val Ala His Glu Gly Glu Thr Leu Glu Tyr Asp Ile 145 150 155 160

cgc gtc acc ggc ttc gcc aag cgt ctc gac ggc ggc atc tcc atg ttc 528
Arg Val Thr Gly Phe Ala Lys Arg Leu Asp Gly Gly Ile Ser Met Phe


1400 1405 1410

gcc agc cgc tgg gcc aac atg ggc gcc ccg gac cgc gtc atg gac 4284
Ala Ser Arg Trp Ala Asn Met Gly Ala Pro Asp Arg Val Met Asp 1415 1420 1425

tac cag gtc tgg tgt ggc ccg gcc att ggc gcc ttc aac gac ttc 4329
Tyr Gln Val Trp Cys Gly Pro Ala Ile Gly Ala Phe Asn Asp Phe 1430 1435 1440

atc aag ggc acc tac ctc gac ccc gct gtc tcc aac gag tac ccc 4374
Ile Lys Gly Thr Tyr Leu Asp Pro Ala Val Ser Asn Glu Tyr Pro 1445 1450 1455

tgt gtc gtc cag atc aac ctg caa atc ctc cgt ggt gcc tgc tac 4419
Cys Val Val Gln Ile Asn Leu Gln Ile Leu Arg Gly Ala Cys Tyr 1460 1465 1470

ctg cgc cgt ctc aac gcc ctg cgc aac gac ccg cgc att gac ctc 4464
Leu Arg Arg Leu Asn Ala Leu Arg Asn Asp Pro Arg Ile Asp Leu 1475 1480 1485

gag acc gag gat gct gcc ttt gtc tac gag ccc acc aac gcg ctc 4509
Glu Thr Glu Asp Ala Ala Phe Val Tyr Glu Pro Thr Asn Ala Leu 1490 1495 1500

<210> SEQ ID NO 6
<211> LENGTH: 1503
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 6

Met Ala Leu Arg Val Lys Thr Asn Lys Lys Pro Cys Trp Glu Met Thr 1 5 10 15

Lys Glu Glu Leu Thr Ser Gly Lys Thr Glu Val Phe Asn Tyr Glu Glu 20 25 30

Leu Leu Glu Phe Ala Glu Gly Asp Ile Ala Lys Val Phe Gly Pro Glu 35 40 45

Phe Ala Val Ile Asp Lys Tyr Pro Arg Arg Val Arg Leu Pro Ala Arg 50 55 60

Glu Tyr Leu Leu Val Thr Arg Val Thr Leu Met Asp Ala Glu Val Asn 65 70 75 80

Asn Tyr Arg Val Gly Ala Arg Met Val Thr Glu Tyr Asp Leu Pro Val 85 90 95

Asn Gly Glu Leu Ser Glu Gly Gly Asp Cys Pro Trp Ala Val Leu Val 100 105 110

Glu Ser Gly Gln Cys Asp Leu Met Leu Ile Ser Tyr Met Gly Ile Asp 115 120 125

Phe Gln Asn Gln Gly Asp Arg Val Tyr Arg Leu Leu Asn Thr Thr Leu 130 135 140

Thr Phe Tyr Gly Val Ala His Glu Gly Glu Thr Leu Glu Tyr Asp Ile 145 150 155 160

Arg Val Thr Gly Phe Ala Lys Arg Leu Asp Gly Gly Ile Ser Met Phe 165 170 175

Phe Phe Glu Tyr Asp Cys Tyr Val Asn Gly Arg Leu Leu Ile Glu Met 180 185 190

Arg Asp Gly Cys Ala Gly Phe Phe Thr Asn Glu Glu Leu Asp Ala Gly 195 200 205

Lys Gly Val Val Phe Thr Arg Gly Asp Leu Ala Ala Arg Ala Lys Ile 210 215 220

Pro Lys Gln Asp Val Ser Pro Tyr Ala Val Ala Pro Cys Leu His Lys 225 230 235 240


Thr Leu Asn Glu Lys Glu Met Gln Thr Leu Val Asp Lys Asp Trp 245 250 255

Ala Ser Val Phe Gly Ser Lys Asn Gly Met Pro Glu Ile Asn Tyr 260 265 270

Leu Ala Arg Lys Met Leu Met Ile Asp Arg Val Thr Ser Ile Asp 275 280 285

His Lys Gly Gly Val Tyr Gly Leu Gly Gln Leu Val Gly Glu Lys Ile 290 295

Leu Glu Arg Asp His Trp Tyr Phe Pro Cys His Phe Val Asp Gln 305 310 315 320

Val Met Ala Gly Ser Leu Val Ser Asp Gly Cys Ser Gln Met Leu 325 330 335

Met Met Ile Trp Leu Gly Leu His Leu Thr Thr Gly Pro Phe Asp 340 345 350

Phe Arg Pro Val Asn Gly His Pro Asn Lys Val Arg Cys Arg Gly Gln 355 360 365

Ile Ser Pro His Lys Gly Lys Leu Val Val Met Glu Ile Lys Glu 370 375

Met Gly Phe Asp Glu Asp Asn Asp Pro Tyr Ala Ile Ala Asp Val Asn 385 390 395 400

Ile Ile Asp Val Asp Phe Glu Lys Gly Gln Asp Phe Ser Leu Asp Arg 405 410 415

Ile Ser Asp Tyr Gly Lys Gly Leu Asn Ile Val Val Asp 420 425 430

Phe Gly Ile Ala Leu Lys Met Gln Lys Arg Ser Thr Asn Lys Asn 435 440 445

Pro Ser Val Gln Pro Val Phe Ala Asn Gly Ala Ala Thr Val Gly 450 455 460

Pro Glu Ala Ser Lys Ala Ser Ser Gly Ala Ser Ala Ser Ala Ser Ala 465 470 475 480

Ala Pro Ala Lys Pro Ala Phe Ser Ala Asp Val Leu Ala Pro Lys Pro 485 490 495

Val Ala Leu Pro Glu His Ile Leu Lys Gly Asp Ala Leu Ala Pro 500 505

Glu Met Ser Trp His Pro Met Ala Arg Ile Pro Gly Asn Pro Thr Pro 515 525

Ser Phe Ala Pro Ser Ala Tyr Lys Pro Asn Ile Ala Phe Thr Pro 530 535 540

Phe Pro Gly Asn Pro Asn Asp Asn Asp His Thr Pro Gly Lys Met Pro 545 550 555 560

Leu Thr Trp Phe Asn Met Ala Glu Phe Met Ala Gly Lys Val Ser Met 565 570 575

Leu Gly Pro Glu Phe Ala Lys Phe Asp Asp Ser Asn Thr Ser Arg 585 590

Ser Pro Ala Trp Asp Leu Ala Leu Val Thr Arg Ala Val Ser Val Ser 595 600 605

Asp Leu His Val Asn Tyr Arg Asn Ile Asp Leu Asp Pro Ser 610 615

Gly Thr Met Val Gly Glu Phe Pro Ala Asp Ala Trp Phe Tyr 625 630 635 640

Gly Ala Cys Asn Asp Ala His Met Pro Ser Ile Leu Met Glu 645 650 655

Ile Ala Leu Gln Thr Ser Gly Val Leu Thr Ser Val Leu Ala Pro


660 665 67 O

Leu Thr Met Glu Lys Asp Asp Ile Leu Phe Arg Asn Leu Asp Ala Asn 675 680 685

Ala Glu Phe Val Arg A Asp Leu Asp Tyr Arg Gly Lys Thr Ile Arg 690 695 700

Asn Val Thr Lys Cys Gly Tyr Ser Met Leu Gly Glu Met Gly Val 705 715 720

His Arg Phe Thr Phe Leu Tyr Val Asp Asp Val Leu Phe Tyr 725 730 735

Gly Ser Thr Ser Phe Trp Phe Val Pro Glu Val Phe Ala Ala Gln 740 745 750

Ala Gly Leu Asp Asn Arg Lys Ser Glu Pro Trp Phe Ile Glu Asn 755 760 765

Val Pro Ala Ser Val Ser Ser Phe Asp Val Arg Pro Asn Gly 770 775

Ser Gly Thr Ala Phe Ala Asn Ala Pro Ser Gly Ala Gln Leu 785 795

Asn Arg Arg Thr Asp Gly Gln Tyr Leu Asp Ala Val Asp Ile Val 805 810 815

Ser Gly Ser Gly Lys Lys Ser Leu Gly Tyr Ala His Gly Ser Lys Thr 820 825

Val Asn Pro Asn Asp Trp Phe Phe Ser Cys His Phe Trp Phe Asp Ser 835 840 845

Val Met Pro Gly Ser Leu Gly Val Glu Ser Met Phe Gln Leu Val Glu 850 855 860

Ala Ile Ala Ala His Glu Asp Leu Ala Gly Lys Ala Arg His Gln 865 870 875

Pro His Leu Cys Ala Arg Pro Arg Ala Arg Ser Ser Trp Tyr Arg 885 890 895

Gly Gln Leu Thr Pro Lys Ser Lys Lys Met Asp Ser Glu Val His Ile 900 905 910

Val Ser Val Asp Ala His Asp Gly Val Val Asp Leu Val Ala Asp Gly 915 920 925

Phe Leu Trp Ala Asp Ser Leu Arg Val Tyr Ser Val Ser Asn Ile Arg 930 935 940

Val Arg Ile Ala Ser Gly Glu Ala Pro Ala Ala Ala Ser Ser Ala Ala 945 950 955 960

Ser Val Gly Ser Ser Ala Ser Ser Val Glu Arg Thr Arg Ser Ser Pro 965 970 975

Ala Val Ala Ser Gly Pro Ala Gln Thr Ile Asp Leu Lys Gln Leu Lys 985 990

Thr Glu Leu Leu Glu Leu Asp Ala Pro Leu Tyr Leu Ser Gln Asp Pro 995 1000 1005

Thr Ser Gly Gln Leu Lys Lys His Thr Asp Val Ala Ser Gly Gln 1010 1015 1020

Ala Thr Ile Val Gln Pro Cys Thr Leu Gly Asp Leu Gly Asp Arg 1025 1030 1035

Ser Phe Met Glu Thr Tyr Gly Val Val Ala Pro Leu Tyr Thr Gly 1040 1045 1050

Ala Met Ala Lys Gly Ile Ala Ser Ala Asp Leu Val Ile Ala Ala 1055 1060 1065

Gly Lys Arg Lys Ile Leu Gly Ser Phe Gly Ala Gly Gly Leu Pro 1070 1075 1080

Met His His Val Arg Ala Leu Glu Ile Gln Ala Ala Leu O85 O95

Pro Gly Pro Ala Asn Leu Ile His Ser Pro Phe Asp 10

Ser Leu Glu Gly Val Asp Leu Phe Leu Glu Gly 25

Val Val Val Glu Ala Ala Phe Met Thr Leu Thr Pro Gln 40

Val Arg Arg Ala Gly Leu Ser Arg Asn Ala Asp Gly 55

Ser Asn Ile Arg Asn Ile Ile Gly Val Ser Arg Thr 70

Glu Ala Glu Met Phe Arg Pro Ala Pro His Leu Leu

Glu Leu Ile Ala Ser Glu Ile Thr Gln Gln Ala Glu

Leu Arg Arg Val Pro Ala Asp Asp Ile Val Glu Ala

Asp Gly Gly His Thr Asn Arg Pro Ile Val Ile Leu

Pro Ile Ile Asn Leu Asn Arg Leu His Glu Gly

Ala His Leu Arg Arg Val Gly Ala Gly Gly Val

Gly Pro Gln Ala Ala Ala Ala Leu Thr Gly Ala Ala

Phe Val Thr Gly Thr Asn Gln Val Ala Gln Ser Gly

Thr Asp Asn Val Arg Gln Leu Ser Gln Thr Ser

Asp Met Ala Pro Ala Asp Met Phe Glu Gly Val

Gln Val Leu Gly Thr Met Phe Ser Arg Ala

Asn Leu Glu Leu Asp Phe Asp Ser

Met Pro Ala Glu Leu Arg Ile Glu Ile Phe

Arg Leu Gln Glu Val Glu Glu Thr Phe Ile

Asn Leu Asn Pro Ile Gln Arg Glu His Asp

Pro Leu Met Ser Phe Arg Trp Leu Gly Leu

Ala Arg Trp Ala Asn Gly Ala Pro Asp Val Met Asp

Val Trp Gly Ala Ile Gly Ala Phe Asn Asp Phe 4 40

Gly Thr Leu Pro Ala Val Ser Asn Glu Pro 455

Val Gln Ile Asn Leu Gln Ile Leu Arg Ala 465 470


Arg Glu Ser Trp Glu Thr Ile Arg Ala Gly Ile Asp Cys Leu Ser Asp 35 40 45

Leu Pro Glu Asp Val Asp Val Thr Ala Tyr Phe Asp Pro Val 50 55 60

Thr Thr Asp Lys Ile Cys Lys Gly Gly Phe Ile Pro Glu 65 70 75

Asp Phe Asp Ala Glu Phe Gly Leu Asn Met Phe Gln Met Glu 85 90 95

Asp Ser Asp Ala Asn Gln Thr Ile Ser Leu Leu Val Lys Glu Ala 100 105 110

Leu Gln Asp Ala Gly Ile Asp Ala Leu Gly Glu Lys Asn Ile 115 120 125

Gly Cys Val Leu Gly Ile Gly Gly Gly Gln Ser Ser His Glu Phe 130 135 140

Tyr Ser Arg Leu Asn Tyr Val Val Val Glu Lys Val Leu Arg Met 145 150 155 160

Gly Met Pro Glu Glu Asp Val Lys Val Ala Val Glu Lys Ala 165 170 175

Asn Phe Pro Glu Trp Arg Leu Asp Ser Phe Gly Phe Leu Gly Asn 180 185 190

Val Thr Ala Gly Thr Asn Thr Phe Asn Leu Asp Gly Met Asn 195 200 205

Val Val Asp Ala Ala Cys Ala Ser Ser Leu Ile Ala Val Val 210 215 220

Ala Ile Asp Glu Leu Leu Gly Asp Cys Asp Met Met Val Thr Gly 225 230 235 240

Ala Thr Thr Asp Asn Ser Ile Gly Met Met Ala Phe Ser 245 250 255

Thr Pro Val Phe Ser Thr Asp Pro Ser Val Arg Ala Asp Glu 260 265 270

Thr Gly Met Leu Ile Gly Glu Gly Ser Ala Met Leu Val Leu 275 280 285

Arg Tyr Ala Asp Ala Val Arg Asp Gly Asp Glu Ile His Ala Val Ile 290 295

Arg Gly Cys Ala Ser Ser Ser Asp Gly Lys Ala Ala Gly Ile Thr 305 310 315 320

Pro Thr Ile Ser Gly Gln Glu Glu Ala Leu Arg Arg Ala Asn Arg 325 330 335

Ala Val Asp Pro Ala Thr Val Thr Leu Val Glu Gly His Gly Thr 340 345 350

Gly Thr Pro Val Gly Asp Arg Ile Glu Leu Thr Ala Leu Arg Asn Leu 355 360 365

Phe Asp Ala Tyr Gly Glu Gly Asn Thr Glu Lys Val Ala Val Gly 370 375

Ser Ile Ser Ser Ile Gly His Leu Lys Ala Val Ala Gly Leu Ala 385 390 395 400

Gly Met Ile Val Ile Met Ala Leu Lys His Thr Leu Pro Gly 405 410 415

Thr Ile Asn Val Asp Asn Pro Pro Asn Leu Asp Asn Thr Pro Ile 420 425 430

Asn Glu Ser Ser Leu Tyr Ile Asn Thr Met Asn Arg Pro Trp Phe Pro 435 440 445

Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly 450 455 460

Gly Ala Asn Tyr His Ala Val Leu Glu Glu Ala Glu Pro Glu His Thr 465 470 475 480

Thr Ala Tyr Arg Leu Asn Lys Arg Pro Gln Pro Val Leu Met Met Ala 485 490 495

Ala Thr Pro Ala 500

SEQ ID NO 9 LENGTH 1278 TYPE DNA ORGANISM: Schizochytrium sp. FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1278) <400 SEQUENCE: 9 gat gtc acc aag gag gcc tgg cgc citc. coc cgc gag ggC gtc agc titc. 48 Asp Wall Thr Lys Glu Ala Trp Arg Telu Pro Arg Glu Gly Wall Ser Phe 1 10 15 cgc gcc aag atc gcc acc aac ggC gct gto gcc gCg citc. titc. to c 96 Arg Ala Lys Ile Ala Thr Asn Gly Ala Wall Ala Ala Telu Phe Ser 25 30 ggC Cag ggC gCg Cag tac acg cac atg titt agC gag gtg gcc atg aac 144 Gly Glin Gly Ala Glin Thr His Met Phe Ser Glu Wall Ala Met Asn 35 40 45 tgg coc cag titc. cgc cag agc att gCC gCC atg gac gcc gcc cag to c 192 Trp Pro Glin Phe Arg Glin Ser Ile Ala Ala Met Asp Ala Ala Glin Ser 50 55 60 aag gtc gct gga agC gac aag gac titt gag cgc gto too Cag gtc citc. 240 Lys Wall Ala Gly Ser Lys Asp Phe Glu Arg Wall Ser Glin Wall Telu 65 75 8O tac cc.g cgc aag cc.g gag cgt. gag coc gag Cag aac coc aag aag 288 Pro Arg Lys Pro Glu Arg Glu Pro Glu Glin Asn Pro Lys Lys 85 90 95 atc to c citc. acc gcc tog Cag coc tog acc citg gcc tgc gct citc. 336 Ile Ser Telu Thr Ala Ser Glin Pro Ser Thr Teu Ala Cys Ala Telu 100 105 110 ggit gcc titt gag atc titc. aag gag gcc ggC titc. acc cc.g gac titt gcc 384 Gly Ala Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala 115 120 125 gcc ggC cat tog citc. ggit gag titc. gcc gcc citc. tac gcc gC9 ggC tgc 432 Ala Gly His Ser Teu Gly Glu Phe Ala Ala Teu Tyr Ala Ala Gly Cys 130 135 1 4 0 gto gac cgc gac gag citc. titt gag citt gtc tgc cgc cgc gcc cgc atc 480 Wall Asp Arg Asp Glu Teu Phe Glu Telu Wall Cys Arg Arg Ala Arg Ile 145 15 O 155 160 atg ggC ggC aag gac gCa cc.g gcc acc coc aag gga tgc atg gcc gcc 528 Met Gly Gly Lys Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala 1.65 170 175 gto att ggC coc aac gcc gag aac atc aag gto Cag gcc gcc aac gtc 576 Wall Ile Gly Pro Asn Ala Glu Asn Ile Lys Wall Glin Ala Ala Asn Wall 18O 185 19 O tgg citc. 
ggC aac too aac tog cost tog Cag acc gto atc acc ggC to c 624 Trp Telu Gly Asn Ser Asn Ser Pro Ser Glin Thr Wall Ile Thr Gly Ser 195 200 2O5 gto gaa ggit atc Cag gcc gag agc gcc cgc citc. Cag aag gag. ggC titc. 672 Wall Glu Gly Ile Glin Ala Glu Ser Ala Arg Teu Glin Glu Gly Phe 210 215 220 US 7,256,023 B2 155 156

-continued cgc gtc gtg cost citt gcc tgc gag agc gcc titc. cac tog coc Cag atg 720 Arg Wall Wall Pro Teu Ala Cys Glu Ser Ala Phe His Ser Pro Glin Met 225 230 235 240 gag aac gcc tog tog gcc titc. aag gac gtc atc. too aag gtc to c titc. 768 Glu Asn Ala Ser Ser Ala Phe Lys Asp Wall Ile Ser Lys Wall Ser Phe 245 250 255 cgc acc coc aag gcc gag acc aag citc. titc. agC aac gto tot ggC gag 816 Arg Thr Pro Lys Ala Glu Thr Lys Telu Phe Ser Asn Wall Ser Gly Glu 260 265 27 O acc tac coc acg gac gcc cgc gag atg citt acg Cag cac atg acc agc 864 Thr Tyr Pro Thr Asp Ala Arg Glu Met Telu Thr Glin His Met Thr Ser 275 280 285 agC gtc aag titc. citc. acc Cag gtc cgc aac atg cac Cag gcc ggit gCg 912 Ser Wall Lys Phe Teu Thr Glin Wall Arg Asn Met His Glin Ala Gly Ala 29 O 295 3OO cgc atc titt gtc gag titc. gga coc aag Cag gtg citc. too aag citt gtc 96.O Arg Ile Phe Wall Glu Phe Gly Pro Lys Glin Wall Teu Ser Lys Telu Wall 305 310 315 320

too gag acc citc. aag gat gac coc tog gtt gto acc gto tot gtc aac OO 8 Ser Glu Thr Telu Lys Asp Asp Pro Ser Wall Wall Thr Wall Ser Wall Asn 325 330 335 cc.g gcc tog ggC acg gat tog gac atc Cag citc. cgc gac gC9 gcc gtc Pro Ala Ser Gly Thr Asp Ser Asp Ile Glin Teu Arg Asp Ala Ala Wall 340 345 35 O

Cag citc. gtt gtc gct ggC gto aac citt Cag ggC titt gac aag tgg gac 104 Glin Telu Wall Wall Ala Gly Wall Asn Telu Glin Gly Phe Asp Lys Trp Asp 355 360 365 gcc coc gat gcc acc cgc atg Cag gcc atc aag aag aag cgc act acc 152 Ala Pro Asp Ala Thr Arg Met Glin Ala Ile Lys Lys Lys Arg Thr Thr 370 375 38O citc. citt tog gcc gcc acc tac gtc tog gac aag acc aag aag gtc 200 Teu Arg Telu Ser Ala Ala Thr Tyr Wall Ser Asp Thr Lys Lys Wall 385 390 395 400

gac gcc gcc atg aac gat ggC cgc tgc gto acc tac citc. aag ggC 248 Arg Asp Ala Ala Met Asn Asp Gly Arg Cys Wall Thr Telu Lys Gly 405 410 415 gcc gCa cc.g citc. atc aag gcc cc.g gag coc 278 Ala Ala Pro Telu Ile Lys Ala Pro Glu Pro 420 425

<210> SEQ ID NO 10
<211> LENGTH: 426
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 10

Asp Val Thr Lys Glu Ala Trp Leu Pro Arg Glu Gly Val Ser Phe 1 5 10 15

Arg Ala Lys Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser 25 30

Gly Gln Gly Ala Gln Tyr Thr His Met Phe Ser Glu Val Ala Met Asn 35 40 45

Trp Pro Gln Phe Gln Ser Ile Ala Ala Met Asp Ala Ala Gln Ser 50 55 60

Lys Val Ala Gly Ser Asp Phe Glu Arg Val Ser Gln Val Leu 65 70 75 80

Pro Lys Pro Tyr Glu Glu Pro Glu Gln Asn Pro Lys 85 90 95

Ile Ser Leu Thr Ala Tyr Ser Gln Pro Ser Thr Leu Ala Ala Leu 100 105 110

Gly Ala Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala 115 120 125

Ala Gly His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Cys 130 135 140

Val Asp Arg Asp Glu Leu Phe Glu Leu Val Cys Arg Arg Ala Arg Ile 145 150 155 160

Met Gly Gly Lys Asp Ala Pro Ala Thr Pro Gly Cys Met Ala Ala 165 170 175

Val Ile Gly Pro Asn Ala Glu Asn Ile Lys Val Gln Ala Ala Asn Val 180 185 190

Trp Leu Gly Asn Ser Asn Ser Pro Ser Gln Thr Val Ile Thr Gly Ser 195 200

Val Glu Gly Ile Gln Ala Glu Ser Ala Leu Gln Glu Gly Phe 210 215 220

Arg Val Val Pro Leu Ala Cys Glu Ser Ala Phe His Ser Pro Gln Met 225 230 235 240

Glu Asn Ala Ser Ser Ala Phe Lys Asp Val Ile Ser Val Ser Phe 245 250 255

Arg Thr Pro Lys Ala Glu Thr Lys Leu Phe Ser Asn Val Ser Gly Glu 260 265 270

Thr Pro Thr Asp Ala Glu Met Leu Thr Gln His Met Thr Ser 275 280 285

Ser Val Phe Leu Thr Gln Val Arg Asn Met His Gln Ala Gly Ala 290 295

Arg Ile Phe Val Glu Phe Gly Pro Lys Gln Val Leu Ser Leu Val 305 310 315 320

Ser Glu Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn 325 330 335

Pro Ala Ser Gly Thr Asp Ser Asp Ile Gln Leu Asp Ala Ala Val 340 345 350

Gln Leu Val Val Ala Gly Val Asn Leu Gln Gly Phe Asp Trp Asp 355 360 365

Ala Pro Asp Ala Thr Arg Met Gln Ala Ile Lys Thr Thr 370 375 380

Leu Arg Leu Ser Ala Ala Thr Tyr Val Ser Asp Lys Val 385 390 395 400

Arg Asp Ala Ala Met Asn Asp Gly Arg Cys Val Thr Leu Lys Gly 405 410 415

Ala Ala Pro Leu Ile Lys Ala Pro Glu Pro 420 425

<210> SEQ ID NO 11
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (4)..(4)
<223> OTHER INFORMATION: X = any amino acid

<400> SEQUENCE: 11

Gly His Ser Xaa Gly 1 5

<210> SEQ ID NO 12
<211> LENGTH: 258
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(258)

<400> SEQUENCE: 12

gct gtc tcg aac gag ctt ctt gag aag gcc gag act gtc gtc atg gag 48
Ala Val Ser Asn Glu Leu Leu Glu Lys Ala Glu Thr Val Val Met Glu 1 5 10 15

gtc ctc gcc gcc aag acc ggc tac gag acc gac atg atc gag gct gac 96
Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile Glu Ala Asp 20 25 30

atg gag ctc gag acc gag ctc ggc att gac tcc atc aag cgt gtc gag 144
Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Lys Arg Val Glu 35 40 45

atc ctc tcc gag gtc cag gcc atg ctc aat gtc gag gcc aag gat gtc 192
Ile Leu Ser Glu Val Gln Ala Met Leu Asn Val Glu Ala Lys Asp Val 50 55 60

gat gcc ctc agc cgc act cgc act gtt ggt gag gtt gtc aac gcc atg 240
Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asn Ala Met 65 70 75 80

aag gcc gag atc gct ggc 258
Lys Ala Glu Ile Ala Gly 85

<210> SEQ ID NO 13
<211> LENGTH: 86
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 13

Ala Val Ser Asn Glu Leu Leu Glu Lys Ala Glu Thr Val Val Met Glu 1 5 10 15

Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile Glu Ala Asp 25 30

Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Lys Arg Val Glu 35 40 45

Ile Leu Ser Glu Val Gln Ala Met Leu Asn Val Glu Ala Lys Asp Val 50 55 60

Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asn Ala Met 65 70 75 80

Lys Ala Glu Ile Ala Gly 85

<210> SEQ ID NO 14
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 14

Leu Gly Ile Asp Ser 1 5

<210> SEQ ID NO 15
<211> LENGTH: 21
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 15

Ala Pro Ala Pro Val Lys Ala Ala Ala Pro Ala Ala Pro Val Ala Ser


Pro Ala Glu Ile Leu Gly Phe Thr Leu Met Cys Ala Lys Phe Ala 25 30

Ala Ser Leu Cys Thr Ala Val Ala Gly Gly Arg Pro Ala Phe Ile Gly 35 40 45

Val Ala Arg Leu Asp Gly Leu Gly Phe Thr Ser Gln Gly Thr Ser 50 55 60

Asp Ala Leu Lys Ala Gln Gly Ala Ile Phe Gly Leu Lys 65 70 75 80

Thr Ile Gly Leu Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val 85 90 95

Asp Ile Ala Gln Gly Met His Pro Glu Asp Ala Ala Val Ala Ile Val 100 105 110

Arg Glu Met Ala Cys Ala Ile Arg Ile Arg Glu Val Gly Ile 115 120 125

Ala Asn Gln Gln Cys Thr Ile Arg Ala Ala Lys Leu Glu Thr 130 135 140

Asn Pro Gln Arg Gln Ile Ala Lys Asp Val Leu Leu Val Ser 145 150 155

Gly Arg Gly Ile Thr Pro Leu Cys Ile Arg Glu Ile Thr Arg 165 170 175

Ile Gly Gly Lys Tyr Ile Leu Leu Gly Arg Ser Val Ser 180 185 190

Ser Pro Ala Trp Cys Ala Gly Ile Thr Asp Glu Lys Ala Val 195 200

Ala Thr Gln Glu Leu Lys Ala Phe Ser Ala Gly Glu 215 220

Pro Pro Thr Pro Arg Ala Val Thr Lys Leu Val Gly Ser Val Leu 225 230 235 240

Gly Arg Glu Val Arg Ser Ser Ile Ala Ala Ile Glu Ala Leu Gly 245 250 255

Gly Ala Ile Tyr Ser Ser Cys Asp Val Asn Ser Ala Ala Asp Val 260 265 270

Ala Ala Val Arg Asp Ala Glu Ser Gln Leu Gly Ala Arg Val Ser 275 280 285

Gly Ile Val His Ala Ser Gly Val Leu Asp Arg Leu Ile Glu 290 295

Lys Leu Pro Asp Glu Phe Ala Val Phe Gly Thr Val Thr Gly 305 310 315 320

Leu Glu Asn Leu Leu Ala Ala Val Asp Arg Ala Asn Leu His Met 325 330 335

Val Leu Phe Ser Ser Leu Ala Gly Phe His Gly Asn Val Gly Gln Ser 340 345 350

Asp Ala Met Ala Asn Glu Ala Leu Asn Lys Met Gly Leu Glu Leu 355 360 365

Ala Lys Asp Val Ser Val Lys Ser Ile Cys Phe Gly Pro Trp Asp Gly 370 375

Gly Met Val Thr Pro Gln Leu Lys Lys Gln Phe Gln Glu Met Gly Val 385 390 395 400

Gln Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg Ile Val 405 410 415

Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp Arg Thr Pro 420 425 430

Ser Lys Lys Val Gly Ser Asp Thr Ile Thr Leu His Arg Lys Ile Ser 435 440 445

Ala Lys Ser Asn Pro Phe Leu Glu Asp His Val Ile Gln Gly Arg 450 455 460

Val Leu Pro Met Thr Leu Ala Ile Gly Ser Leu Ala Glu Thr Cys Leu 465 470 475 480

Gly Leu Phe Pro Gly Tyr Ser Leu Trp Ala Ile Asp Asp Ala Gln Leu 485 490 495

Phe Lys Gly Val Thr Val Asp Gly Asp Val Asn Glu Val Thr Leu 500 505

Thr Pro Ser Thr Ala Pro Ser Gly Arg Val Asn Val Gln Ala Thr Leu 515 520 525

Lys Thr Phe Ser Ser Gly Lys Leu Val Pro Ala Tyr Arg Ala Val Ile 530 535 540

Val Leu Ser Asn Gln Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro 545 550 555 560

Pro Ser Leu Asp Ala Asp Pro Ala Leu Gln Gly Ser Val Asp Gly 565 570 575

Lys Thr Leu Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Leu 580 585 590

Ser Cys Thr Lys Ser Gln Leu Val Ala Lys Ser Ala Val Pro Gly 595 600 605

Ser Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 610 615 620

Pro Phe Val Asn Asp Leu Ala Phe Gln Ala Met Leu Val Trp Val Arg 625 630 635 640

Arg Thr Leu Gly Gln Ala Ala Leu Pro Asn Ser Ile Gln Arg Ile Val 645 650 655

Gln His Arg Pro Val Pro Gln Asp Lys Pro Phe Ile Thr Leu 660 665 670

Ser Asn Gln Ser Gly Gly His Ser Gln His His Ala Leu Gln Phe 675 680 685

His Asn Glu Gln Gly Asp Leu Phe Ile Asp Val Gln Ala Ser Val Ile 690 695 700

Ala Thr Asp Ser Leu Ala Phe 705 710

SEQ ID NO 19 LENGTH 1350 TYPE DNA ORGANISM: Schizochytrium sp. FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1350) <400 SEQUENCE: 19 atg goc got cqg aat gtg agc goc gog cat gag atg cac gat gaa aag 48 Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys 1 5 10 15 cgc atc gcc gtc. gtc. g.gc atg goc gtc. cag tac gcc gga tgc a.a.a. acc 96 Arg Ile Ala Val Val Gly Met Ala Val Glin Ala Gly Cys Thr 2O 25 30 aag gac gag titc togg gag gtg citc atgaac ggC aag gto gag. to c aag 144 Lys Asp Glu Phe Trp Glu Val Lieu Met Asn Gly Lys Wall Glu Ser Lys 35 40 45 gtg atc agc gac aaa ciga citc ggc toc aac tac cgc gcc gag. cac tac 192 Val Ile Ser Asp Lys Arg Lieu Gly Ser Asn Arg Ala Glu His


Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Lys Wall Telu Telu Ser 370 375 38O atg aag cat ggC atc atc cc.g coc acc cc.g ggit atc gat gac gag acc 200 Met Lys His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400 aag atg gac cost citc. gto gto to c ggit gag gcc atc cca tgg cca gag 248 Lys Met Asp Pro Teu Wall Wall Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 410 415 acc aac ggC gag cco aag cgc gcc ggit citc. tog gcc titt ggc titt ggit 296 Thr Asn Gly Glu Pro Lys Arg Ala Gly Telu Ser Ala Phe Gly Phe Gly 420 425 43 O ggC acc aac gcc cat gcc gto titt gag gag cat gac cco to c aac gcc 344 Gly Thr Asn Ala His Ala Wall Phe Glu Glu His Asp Pro Ser Asn Ala 435 4 40 4 45 gcc tgc 350 Ala Cys 450

<210> SEQ ID NO 20
<211> LENGTH: 450
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (370)..(370)
<223> OTHER INFORMATION: The 'Xaa' at location 370 stands for Leu.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (371)..(371)
<223> OTHER INFORMATION: The 'Xaa' at location 371 stands for Ala or Val.

<400> SEQUENCE: 20

Met Ala Ala Arg Asn Val Ser Ala Ala His Glu Met His Asp Glu 1 5 10 15

Arg Ile Ala Val Val Gly Met Ala Val Gln Ala Gly Cys 25 30

Asp Glu Phe Trp Glu Val Leu Met Asn Val Glu Ser 35 40 45

Val Ile Ser Asp Lys Arg Leu Gly Ser Asn Ala Glu His 50 55

Lys Ala Glu Arg Ser Lys Tyr Ala Asp Thr Phe Asn Glu Thr Tyr 65 70 75

Gly Thr Leu Asp Glu Asn Glu Ile Asp Asn Glu His Glu Leu Leu Leu 85 90 95

Asn Leu Ala Lys Gln Ala Leu Ala Glu Thr Ser Val Asp Ser Thr 100 105 110

Arg Gly Ile Val Ser Gly Cys Leu Ser Phe Pro Met Asp Asn Leu 115 120 125

Gln Gly Glu Leu Leu Asn Val Tyr Gln Asn His Val Glu Leu 130 135 140

Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Gln 145 150 155 160

Ser Asn Pro Glu Ala Gly Arg Ile Phe Met Asp Pro Ala 165 170 175

Ser Phe Val Ala Glu Glu Leu Asn Leu Gly Ala Leu His Tyr Ser Val 180 185 190

Asp Ala Ala Cys Ala Thr Ala Leu Tyr Val Leu Arg Leu Ala Gln Asp 195 200 205

His Leu Val Ser Gly Ala Ala Asp Val Met Leu Cys Gly Ala Thr 210 215 220

Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Gln Ala 225 230 235 240

Met Pro Val Gly Thr Gly Gln Asn Val Ser Met Pro Leu His Lys Asp 245 250 255

Ser Gln Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu 260 265 270

Arg Leu Asp Asp Ala Ile Asp Gly Asp His Ile Tyr Gly Thr Leu 275 280 285

Leu Gly Ala Asn Val Ser Asn Ser Gly Thr Gly Leu Pro Leu Pro 290 295

Leu Leu Pro Ser Glu Lys Lys Cys Leu Met Asp Thr Thr Arg Ile 305 310 315 320

Asn Val His Pro His Lys Ile Gln Tyr Val Glu His Ala Thr Gly 325 330 335

Thr Pro Gln Gly Asp Val Glu Ile Asp Ala Val Ala Phe 340 345 350

Glu Gly Lys Val Pro Phe Gly Thr Thr Gly Asn Phe Gly His 355 360 365

Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Lys Val Leu Leu Ser 370 375

Met His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400

Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 410 415

Thr Asn Gly Glu Pro Lys Ala Gly Leu Ser Ala Phe Gly Phe Gly 420 425 430

Gly Thr Asn Ala His Ala Val Phe Glu Glu His Asp Pro Ser Asn Ala 435 440 445

Ala Cys 450

SEQ ID NO 21 LENGTH 1323 TYPE DNA ORGANISM: Schizochytrium sp. FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1323) <400 SEQUENCE: 21 tog gcc cgc tgc ggC ggit gaa agc aac atg cgc atc gcc atc. act ggit 48 Ser Ala Arg Cys Gly Gly Glu Ser Asn Met Arg Ile Ala Ile Thr Gly 1 5 10 15 atg gac gcc acc titt ggC gct citc. aag gga citc. gac gcc titc. gag cgc 96 Met Asp Ala Thr Phe Gly Ala Telu Lys Gly Teu Asp Ala Phe Glu Arg 2O 25 30 gcc att tac acc ggC gct cac ggit gcc atc cca citc. cca gaa aag cgc 144 Ala Ile Tyr Thr Gly Ala His Gly Ala Ile Pro Teu Pro Glu Lys Arg 35 40 45 tgg cgc titt citc. ggC aag gac aag gac titt citt gac citc. tgc ggC gtc 192 Trp Arg Phe Telu Gly Lys Asp Lys Asp Phe Teu Asp Teu Cys Gly Wall 50 55 60 aag gcc acc cc.g cac ggC tgc tac att gaa gat gtt gag gtc gac titc. 240 Lys Ala Thr Pro His Gly Cys Tyr Ile Glu Asp Wall Glu Wall Asp Phe 65 70 75 8O


gaa ccc gcc cct gag gcg ccc tgg gac agc acc ctc ttt gcg tgc cag 1248
Glu Pro Ala Pro Glu Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Gln 405 410 415

acc tcg cgc gct tgg ctc aag aac cct ggc gag cgt cgc tat gcg gcc 1296
Thr Ser Arg Ala Trp Leu Lys Asn Pro Gly Glu Arg Arg Tyr Ala Ala 420 425 430

gtc tcg ggc gtc tcc gag acg cgc tcg 1323
Val Ser Gly Val Ser Glu Thr Arg Ser 435 440

<210> SEQ ID NO 22
<211> LENGTH: 441
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 22

Ser Ala Arg Cys Gly Gly Glu Ser Asn Met Arg Ile Ala Ile Thr Gly 1 5 10 15

Met Asp Ala Thr Phe Gly Ala Leu Lys Gly Leu Asp Ala Phe Glu Arg 20 25 30

Ala Ile Tyr Thr Gly Ala His Gly Ala Ile Pro Leu Pro Glu Arg 35 40 45

Trp Arg Phe Leu Gly Lys Lys Asp Phe Leu Asp Leu Gly Val 50 55 60

Lys Ala Thr Pro His Gly Cys Tyr Ile Glu Asp Val Glu Val Asp Phe 65 70 75

Gln Arg Leu Thr Pro Met Thr Pro Glu Asp Met Leu Leu Pro Gln 85 90 95

Gln Leu Leu Ala Val Thr Thr Ile Asp Arg Ala Ile Leu Asp Ser Gly 100 105 110

Met Lys Gly Gly Asn Val Ala Val Phe Val Gly Leu Gly Thr Asp 115 120 125

Leu Glu Leu His Arg Ala Arg Val Ala Leu Glu Arg Val 130 135 140

Arg Pro Glu Ala Ser Lys Lys Leu Asn Asp Met Met Gln Ile Asn 145 150 155 160

Asp Gly Thr Ser Thr Ser Tyr Thr Ser Ile Gly Asn Leu Val 165 170 175

Ala Thr Arg Val Ser Ser Gln Trp Gly Phe Gly Pro Ser Phe Thr 180 185 190

Ile Thr Glu Gly Asn Asn Ser Val Tyr Arg Ala Glu Leu Gly 195 200

Leu Leu Glu Thr Gly Glu Val Asp Gly Val Val Val Ala Gly Val 210 215 220

Asp Leu Gly Ser Ala Glu Asn Leu Val Ser Arg Arg Phe 225 230 235 240

Val Ser Thr Ser Asp Thr Pro Arg Ala Ser Phe Asp Ala Ala Ala 245 250 255

Asp Gly Tyr Phe Val Gly Glu Gly Cys Gly Ala Phe Val Leu Arg 260 265 270

Glu Thr Ser Thr Asp Arg Ile Ala Cys Met Asp Ala 275 280 285

Ile Val Pro Gly Asn Val Pro Ser Ala Cys Leu Arg Glu Ala Leu Asp 290 295

Gln Ala Arg Val Lys Pro Gly Asp Ile Glu Met Leu Glu Leu Ser Ala 305 310 315 320

Asp Ser Ala Arg His Leu Pro Ser Val Leu Pro Glu Leu 325 330 335

Thr Ala Glu Glu Glu Ile Gly Gly Leu Gln Thr Ile Leu Arg Asp Asp 340 345 350

Asp Leu Pro Asn Val Ala Thr Gly Ser Val Lys Ala Thr Val 355 360 365

Gly Asp Thr Gly Tyr Ala Ser Gly Ala Ala Ser Leu Ile Ala Ala 370 375 380

Leu Ile Tyr Asn Arg Leu Pro Ser Asn Gly Asp Asp Trp Asp 385 390 395 400

Glu Pro Ala Pro Glu Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Gln 405 410 415

Thr Ser Arg Ala Trp Leu Asn Pro Gly Glu Arg Arg Tyr Ala Ala 420 425 430

Val Ser Gly Val Ser Glu Thr Arg Ser 435 440

SEQ ID NO 23 LENGTH 15 OO TYPE DNA ORGANISM: Schizochytrium sp. FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1500) <400 SEQUENCE: 23

tat to c gtg citc. citc. too gaa gcc gag ggC cac tac gag. cgc gag 48 Tyr Ser Wall Teu Teu Ser Glu Glu Gly His Glu Arg Glu 5 10 15 aac cgc atc tog citc. gac gag gag coc aag citc. att gtg citt cgc 96 Asn Arg Ile Ser Teu Asp Glu Glu Pro Lys Teu Ile Wall Telu Arg 2O 30 gcc gac to c cac gag gag atc citt cgc citc. gac aag atc. cgc gag 144 Ala Asp Ser His Glu Glu Ile Telu Arg Teu Asp Lys Ile Arg Glu 35 40 45 cgc titc. ttg Cag cco acg ggC gcc cc.g cgc gag too gag. citc. aag 192 Arg Phe Telu Glin Pro Thr Gly Ala Pro Arg Glu Ser Glu Telu Lys 50 55 60 gCg Cag gcc cgc cgc atc titc. citc. citc. citc. ggC gag acc citt gcc 240 Ala Glin Ala Arg Arg Ile Phe Telu Telu Teu Gly Glu Thr Telu Ala 65 70 75 8O

Cag gat gcc gct tot toa ggC tog aag cco citc. gct citc. agc citc. 288 Glin Asp Ala Ala Ser Ser Gly Ser Lys Pro Teu Ala Telu Ser Telu 85 90 95 gto to c acg coc too aag citc. Cag cgc gag gto gag citc. gC9 gcc aag 336 Wall Ser Thr Pro Ser Lys Teu Glin Arg Glu Wall Glu Teu Ala Ala Lys 100 105 110 ggit atc cc.g cgc tgc citc. aag atg cgc gat tgg agC to c cost gct 384 Gly Ile Pro Arg Cys Teu Lys Met Arg Arg Asp Trp Ser Ser Pro Ala 115 120 125 ggC agc tac gCg cost gag cc.g citc. gcc agC gac gtc gcc titc. 432 Gly Ser Arg Ala Pro Glu Pro Telu Ala Ser Asp Arg Wall Ala Phe 130 135 1 4 0 atg tac ggC gaa ggit cgc agC cost tac tac ggC atc acc Cala gac att 480 Met Gly Glu Gly Arg Ser Pro Tyr Gly Ile Thr Glin Asp Ile 145 15 O 155 160 cac cgc att tgg cco gaa citc. cac gag gtc atc. aac gaa aag acg aac 528 His Arg Ile Trp Pro Glu Teu His Glu Wall Ile Asn Glu Lys Thr Asn

Ala Trp Thr Thr Ile Val Lys Leu Val Ala Ser Leu Lys Ala His Leu 485 490 495

gtt cct ggc gtc 1500
Val Pro Gly Val 500

<210> SEQ ID NO 24
<211> LENGTH: 500
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 24

Cys Tyr Ser Val Leu Leu Ser Glu Ala Glu Gly His Glu Arg Glu 1 5 10 15

Asn Arg Ile Ser Leu Asp Glu Glu Ala Pro Leu Ile Val Leu Arg 30

Ala Asp Ser His Glu Glu Ile Leu Gly Arg Leu Asp Lys Ile Arg Glu 35 40 45

Arg Phe Leu Gln Pro Thr Gly Ala Ala Pro Arg Glu Ser Glu Leu 50 55 60

Ala Gln Ala Arg Arg Ile Phe Leu Glu Leu Leu Gly Glu Thr Leu Ala 65 70 75 80

Gln Asp Ala Ala Ser Ser Gly Ser Gln Lys Pro Leu Ala Leu Ser Leu 85 90 95

Val Ser Thr Pro Ser Lys Leu Gln Arg Glu Val Glu Leu Ala Ala 100 105 110

Gly Ile Pro Arg Cys Leu Lys Met Arg Asp Trp Ser Ser Pro Ala 115 120 125

Gly Ser Arg Tyr Ala Pro Glu Pro Leu Ala Ser Asp Arg Val Ala Phe 130 135 140

Met Tyr Gly Glu Gly Arg Ser Pro Tyr Tyr Gly Ile Thr Gln Asp Ile 145 150 155 160

His Arg Ile Trp Pro Glu Leu His Glu Val Ile Asn Glu Thr Asn 165 170 175

Arg Leu Trp Ala Glu Gly Val Met Pro Arg Ala Ser Phe 180 185 190

Lys Ser Glu Leu Glu Ser Gln Gln Gln Glu Phe Asp Arg Asn Met Ile 195 200

Glu Met Phe Arg Leu Gly Ile Leu Thr Ser Ile Ala Phe Thr Asn Leu 210 215 220

Ala Arg Asp Val Leu Asn Ile Thr Pro Lys Ala Ala Phe Gly Leu Ser 225 230 235 240

Leu Gly Glu Ile Ser Met Ile Phe Ala Phe Ser Asn Gly Leu 245 250 255

Ile Ser Asp Gln Leu Thr Lys Asp Leu Glu Ser Asp Val Trp Asn 260 265 270

Lys Ala Leu Ala Val Glu Phe Asn Ala Leu Arg Glu Ala Trp Gly Ile 275 280 285

Pro Gln Ser Val Pro Lys Asp Glu Phe Trp Gln Gly Tyr Ile Val Arg 290 295

Gly Thr Lys Gln Asp Ile Glu Ala Ala Ile Ala Pro Asp Ser Tyr 305 310 315 320

Val Arg Leu Thr Ile Ile Asn Asp Ala Asn Thr Ala Leu Ile Ser Gly 325 330 335

Lys Pro Asp Ala Cys Lys Ala Ala Ile Ala Arg Leu Gly Gly Asn Ile 340 345 350

Pro Ala Leu Pro Val Thr Gln Gly Met Gly His Cys Pro Glu Val 355 360 365

Gly Pro Tyr Thr Lys Ile Ala Lys Ile His Ala Asn Leu Glu Phe 370 375 380

Pro Val Val Asp Gly Leu Asp Leu Trp Thr Thr Ile Asn Gln Arg 385 390 395 400

Leu Val Pro Ala Thr Gly Ala Lys Asp Glu Trp Ala Pro Ser Ser 405 410 415

Phe Gly Glu Tyr Ala Gly Gln Leu Tyr Glu Gln Ala Asn Phe Pro 420 425 430

Gln Ile Val Glu Thr Ile Tyr Lys Gln Asn Asp Val Phe Val Glu 435 440 445

Val Gly Pro Asn Asn His Arg Ser Thr Ala Val Arg Thr Thr Leu Gly 450 455 460

Pro Gln Arg Asn His Leu Ala Gly Ala Ile Asp Gln Asn Glu Asp 465 470 475 480

Ala Trp Thr Thr Ile Val Lys Leu Val Ala Ser Leu Ala His Leu 485 490 495

Val Pro Gly Val 500

SEQ ID NO 25 LENGTH 1530 TYPE DNA ORGANISM: Schizochytrium sp. FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1530) <400 SEQUENCE: 25 citg citc. gat citc. gac agt atg citt gCg citg agC tot gcc agt gcc to c 48 Teu Telu Asp Telu Asp Ser Met Telu Ala Telu Ser Ser Ala Ser Ala Ser 1 5 10 15 ggC aac citt gtt gag act gCg cost agc gac gcc tog gto att gtg cc.g 96 Gly Asn Telu Wall Glu Thr Ala Pro Ser Asp Ala Ser Wall Ile Wall Pro 2O 25 30 cco tgc aac att gCg gat citc. ggC agc cgc gcc titc. atg a.a.a. acg tac 144 Pro Cys Asn Ile Ala Asp Teu Gly Ser Arg Ala Phe Met Thr 35 40 45 ggit gtt tog gCg cost citg tac acg gcc atg gcc aag ggc att gcc 192 Gly Wall Ser Ala Pro Teu Tyr Thr Ala Met Ala Lys Gly Ile Ala 50 55 60 tot gCg gac citc. gto att gcc gcc cgc Cag ggC atc citt gCg to c 240 Ser Ala Asp Telu Wall Ile Ala Ala Arg Glin Gly Ile Telu Ala Ser 65 70 75 8O titt ggC gcc ggC gga citt cco atg gtt gtg cgt. gag to c atc gaa 288 Phe Gly Ala Gly Gly Teu Pro Met Wall Wall Arg Glu Ser Ile Glu 85 90 95 aag att Cag gcc gcc citg cco aat cc.g tac gct gto aac citt atc 336 Ile Glin Ala Ala Teu Pro Asn Pro Tyr Ala Wall Asn Telu Ile 100 110 cat tot coc titt gac agC aac citc. aag ggC aat gto gat citc. titc. 384 His Ser Pro Phe Asp Ser Asn Telu Lys Gly Asn Wall Asp Telu Phe 115 120 125 citc. gag aag ggit gto acc titt gtc gcc tog gcc titt atg acg citc. 432 Teu Glu Lys Gly Wall Thr Phe Wall Ala Ser Ala Phe Met Thr Telu 130 135 1 4 0