US007645597B2

(12) United States Patent
Metz et al.
(10) Patent No.: US 7,645,597 B2
(45) Date of Patent: Jan. 12, 2010

(54) PUFA POLYKETIDE SYNTHASE SYSTEMS AND USES THEREOF

(75) Inventors: James G. Metz, Longmont, CO (US); Craig A. Weaver, Boulder, CO (US); William R. Barclay, Boulder, CO (US); James H. Flatt, Colorado Springs, CO (US)

(73) Assignee: Martek Biosciences Corporation, Columbia, MD (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 11/778,608

(22) Filed: Jul. 16, 2007

(65) Prior Publication Data
US 2008/0044871 A1 Feb. 21, 2008

Related U.S. Application Data

(60) Division of application No. 11/676,971, filed on Feb. 20, 2007, which is a division of application No. 10/810,352, filed on Mar. 26, 2004, now Pat. No. 7,211,418, and a continuation-in-part of application No. 10/124,800, filed on Apr. 16, 2002, now Pat. No. 7,247,461, and a continuation-in-part of application No. 09/231,899, filed on Jan. 14, 1999, now Pat. No. 6,566,583.

(60) Provisional application No. 60/457,979, filed on Mar. 26, 2003, provisional application No. 60/284,066, filed on Apr. 16, 2001, provisional application No. 60/298,796, filed on Jun. 15, 2001, provisional application No. 60/323,269, filed on Sep. 18, 2001.

(51) Int. Cl.
C12N 15/53 (2006.01)
C12N 15/74 (2006.01)
C12N 15/79 (2006.01)
C12N 5/82 (2006.01)
C12P 7/64 (2006.01)

(52) U.S. Cl. ...... 435/134; 435/419; 435/189; 435/252.3; 435/320.1; 435/410; 536/23.2

(58) Field of Classification Search ...... None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,130,242 A 7/1992 Barclay et al.
(Continued)

FOREIGN PATENT DOCUMENTS
CA 2520795 10/2004
(Continued)

OTHER PUBLICATIONS
Abbadi et al., Eur. J. Lipid Sci. Technol. 103: 106-113 (2001).
(Continued)

Primary Examiner: Anand U Desai
Assistant Examiner: William W. Moore
(74) Attorney, Agent, or Firm: Sheridan Ross P.C.

(57) ABSTRACT

The invention generally relates to polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) systems, to homologues thereof, to isolated nucleic acid molecules and recombinant nucleic acid molecules encoding biologically active domains of such a PUFA PKS system, to genetically modified organisms comprising PUFA PKS systems, to methods of making and using such systems for the production of bioactive molecules of interest, and to novel methods for identifying new bacterial and non-bacterial microorganisms having such a PUFA PKS system.

21 Claims, 3 Drawing Sheets

[Front-page figure: domain organization of the open reading frames. Labels recovered: Orf A; Orf B (KS, CLF, AT, ER); Orf C; 166 kDa.]

U.S. PATENT DOCUMENTS WO WO 2004/087879 10, 2004 WO WO 2006/008099 1, 2006 5,246,841 A 9, 1993 Yazawa et al. WO WO 2006/034228 3, 2006 5,310,242 A 5, 1994 Golder 5,639,790 A 6, 1997 Voelker et al. OTHER PUBLICATIONS 5,672.491 A 9, 1997 Khosia et al. 5,683,898 A 11/1997 Yazawa et al. Allen et al., Appl. Envir. Microbiol. 65(4): 1710-1720 (1999). 5,798,259 A 8, 1998 Yazawa et al. Bateman et al., Nucl. Acids Res., 30(1):276-280 (2002). 5,908,622 A 6/1999 Barclay Bentley et al., Annu. Rev. Microbiol., 53:411-46 (1999). 6,033,883. A 3, 2000 Barr et al. Bisang et al., Nature, 401:502-505 (1999). 6,140,486 A 10, 2000 Facciotti et al. Bork, TIG, 12(10):425-427 (1996). 6,503,706 B1 1/2003. Abken et al. Brenner, TIG, 15(4): 132-133 (1999). 6,566,583 B1 5, 2003 Facciotti et al. Broun et al., Science, 282:1315-1317 (1998). 7,001,772 B2 2/2006 Roessler et al. Creelman et al., Annu. Rev. Plan Physiol. Plant Mol. Biol. 48:355-81 7,125,672 B2 10/2006 Picataggio et al. (1997). 7,211,418 B2 5, 2007 Metz et al. DeLong & Yayanos, Appl. Environ. Microbiol. 51(4):730-737 7,214,853 B2 5, 2007 Facciotti et al. (1986). 7,217,856 B2 5, 2007 Weaver et al. Doerks, TIG, 14(6):248-250 (1998). 7,256,022 B2 8, 2007 Metz et al. Facciotti et al., “Cloning and Characterization of Polyunsaturated 7,256,023 B2 8, 2007 Metz et al. Fatty Acids (PUFA) Genes from Marine Bacteria” in Proceedings of 7,259,295 B2 8, 2007 Metz et al. the international Symposium on progress and prospect of marine 7,271,315 B2 9, 2007 Metz et al. biotechnology (China Ocean Pres 1999), pp. 404-405 Abstract. 2002/0138874 A1 9/2002 Mukerji et al. Heath et al., J. Biol. Chem., 271 (44):27795-27801 (1996). 2002/0156254 A1 10/2002 Qiu et al. Hopwood & Sherman, Annu. Rev. Genet. 24:37-66 (1990). 2002/0194641 A1 12/2002 Metz et al. Hutchinson, Annu. Rev. Microbiol. 49:201-238 (1995). 2004/OOO5672 A1 1/2004 Santi et al. Jostensen & Landfald, FEMS Microbiology Letters, 151:95-101 2004/0010817 A1 1/2004 Shockey et al. (1997). 
2004/O139498 A1 7/2004 Jaworski et al. Katz & Donadio, Annu. Rev. Microbiol., 47:875-912 (1993). 2004/0172682 A1 9/2004 Kinney et al. Kealing et al., Curr. Opin. Chem. Biol. 3:598-606 (1999). 2005, OO14231 A1 1/2005 Mukerji et al. Kyle et al., HortScience, 25:1523-26 (1990). 2005.0089865 A1 4/2005 Napier et al. Magnuson, Microbil. Rev. 57(3):522-542 (1993) Abstract. 2005, 0164.192 A1 7/2005 Graham et al. Metz et al., Science, 293:290-293 (2001). 2007/0244192 A1 10, 2007 Metz Nakahara, Yukagaku, 44(10):821-7 (1995). 2007/0245431 A1 10, 2007 Metz et al. Nasu et al., J. Ferment. Bioeng, 122:467-473 (1997). 2007,0256146 A1 11, 2007 Metz et al. Nichols et al., Curr. Opin Biotechnol., 10:240-246 (1999). 2007/0270494 A1 11, 2007 Metz et al. Nogi et al., Extremophiles, 2:1-7 (1998). 2008/0022422 A1 1/2008 Weaver et al. Parker-Barnes et al., PNAS,97(15):8284-8289 (2000). 2008/0026434 A1 1/2008 Weaver et al. Sánchez et al., Chemistry & Biolosy, 8:725-738 (2001). 2008/0026435 A1 1/2008 Weaver et al. Shanklin et al., Annu. Rev. Plant Physiol. Plant Mol. Biol. 49:611-41 2008, 0026438 A1 1/2008 Metz et al. (1998). 2008, 0026439 A1 1/2008 Metz et al. Smith et al., Nature Biotechnol. 15:1222-1223 (1997). 2008, OO323510026440 A1,A1 2,1/2008 2008 Metz et al. Somervilleille Am. J. Clin. Nutt,Nutr., 58(2 supp):270S-275Ssupp) (1993) 2008, OO38378 A1 2/2008 Metz et al. Van de Loo, Proc. Natl. Acad. Sci. USA, 92:6743-6747 (1995). 2008.OO38379 A1 2/2008 Metz et al. Watanabe et al., J. Biochem., 122:467-473 (1997). 2008.OO38790 A1 2/2008 Metz et al. Yalpaniet al., The Plant Cell, 13:1401-1409 (2001). 2008.OO38792 A1 2/2008 Metz et al. Metz et al. Production of Polyunsaturated Fatty Acids by Polyketide 2008.OO38793 A1 2/2008 Metz et al. Synthase in Both Prokaryotes and Eukaryotes. Science Jul. 12, 2001, 2008, OO38794 A1 2/2008 Metz et al. vol. 293, pp. 290-293. 2008.OO38795 A1 2/2008 Metz et al. Qiu et al. 
Identification of a delta4 fatty acid desaturase from 2008, OO387.96 A1 2/2008 Metz et al. Thraustochytrium sp. involved in the biosynthesis. J. Biol. Chem. 2008, OO387.97 A1 2/2008 Metz et al. Aug. 24, 2001, vol. 276, No. 34, pp. 31561-31566. 2008/0040822 A1 2/2008 Metz et al. Yokochi et al. Optimization of docosahexaenoic acid production. 2008/0044867 A1 2/2008 Metz et al. App. Microbiol. Biotechnol. 1998, vol. 49, pp. 72-76. 2008/0044868 A1 2/2008 Metz et al. Nakahara et al. Production of docosahexaenoic and 2008/0044869 A1 2/2008 Metz et al. docosapentaenoic acids by Schizochytrium sp. isolated from Yap 2008/0044870 A1 2/2008 Metz et al. Islands. 1996 J. Am. Oil Chem. Soc. 1996, vol. 73, No. 11, pp. 2008/0044872 A1 2/2008 Metz et al. 1421-1426. 2008/0044873 A1 2/2008 Metz et al. Weete et al. Lipids and Ultrasctructure of Thrauchytrium sp. 2008, OOSO790 A1 2/2008 Metz et al. ATCC26.185. 1997, Am Oil Chem. Soc. vol. 32, No. 8, pp. 839-845. 2008. O148433 A1 6, 2008 Metz et al. Singh et al. Microbial Production of Docosahexaenoic Acid (DHA. C22:6). Adv. Appl. Microbial, 1997. vol. 45, pp. 271-312. FOREIGN PATENT DOCUMENTS Napier, Trends Plant Sci. Feb. 2002; 7(2): 51-4. International Search Report for International (PCT) Patent Applica EP O594.868 5, 1994 tion No. PCT/US04/09323, mailed Apr. 4, 2007. EP O823475 2, 1998 Written Opinion for International (PCT) Patent Application No. WO 9.323545 11, 1993 PCT/US04/09323, mailed Apr. 4, 2007. WO 962.1735 T 1996 International Preliminary Report on Patentability for International WO 98.46764 10, 1998 (PCT) Patent Application No. PCT/US04/09323, mailed May 9, WO 98.55625 12/1998 2007. WO OO42.195 T 2000 Examiner's First Report for Australian Patent Application No. WO WO O2/O83870 10, 2002 2004225485, mailed Nov. 17, 2006. US 7,645,597 B2 Page 3

First Examination Report for Indian Patent Application No. 4359. Wallis et al., “Polyunsaturated fatty acid synthesis: what will they DELNP/2005, dated Mar. 14, 2007. think of next?', Tibs Trends in Bio Sciences, Elsevier Publ., Cam U.S. Appl. No. 1 1/674,574, Facciotti et al. (Feb. 13, 2007). bridge, EN, vol. 27, No. 9, Sep. 2002, pp. 467-473, XP004378766. U.S. Appl. No. 1 1/777.277, Metz et al. (Jul 12, 2007). Wiesmann et al. “The molecular basis of Celmer's rules: the U.S. Appl. No. 1 1/778,594, Metz et al. (Jul 16, 2007). Stereochemistry of the condensation step in chain extension on the Allen E.A. et al. 2002 "Structure and regulation of the omega-3 erythromycin polyketide synthase.” Biochemistry (1997) 36: 13849 polyunsaturated fatty acid synthase genes from the deep-sea bacte 13855. rium Photobacterium profundum strain SS9” Microbiology vol. 148 Wiesmann et al. “Origin of starter units for erythromycin pp. 1903-1913. biosynthesis.” Biochemistry (1998). 37: 11012-11017. Cane et al., “Harnessing the Biosynthetic Code: Combinations, Per Wiesmann et al. “Polyketide synthesis in vitro on a modular mutations, and Mutations.” Science 1998, vol. 282, pp. 63-68. polyketide synthase.” Chemistry & Biology (Sep. 1995) 2: 583-589. Chuck et al., “Molecular recognition of diketide substrates by a International Search Report for International (PCT) Patent Applica beta-ketoacyl-acyl carrier protein synthase domain within a tion No. PCT/US02/12254, mailed Nov. 15, 2002. bimodular polyketide synthase'. Chem and Bio, Current Bio, (Lon International Preliminary Examination Report for International don), GB, vol. 4, No. 10, 1997, pp. 757-766, XP000884721. (PCT) Patent Application No. PCT/US02/12254, mailed Oct. 16, Database Geneseq 'Online Dec. 11, 2000, "S. aggregatum PKS clus 2006. ter ORF6 homolog DNA” XP002368912, retrieved from EBI acces International Search Report for International (PCT) Patent Applica Sion No. GSN:AAA71567Database accession No. AAA71567 & tion No. 
PCT/US00/00956, mailed Jul. 6, 2000. Database Geneseq 'Online! Dec. 11, 2000, "S. aggregatum PKS Written Opinion for International (PCT) Patent Application No. cluster ORF6 homolog protein.” XP002368914 retrieved from EBI PCT/US00/00956, mailed Dec. 19, 2000. accession No. GSP:AAB10482 Database accession No. AAB10482 International Preliminary Examination Report for International & WO 00/42.195 A (Calgene, LLC) Jul. 20, 2000. (PCT) Patent Application No. PCT/US00/00956, mailed Apr. 19, GenBank Accession No. AF4091 00. (Allen et al.) 2002. 2001. GenBank Accession No. U09865. Alcaligenes eutrophus pyruvate International Search Report for International (PCT) Patent Applica dehydrogenase (pdhA), dihydrolipoamide acetyltransferase (pdhB), tion No. PCT/US05/36998, mailed Mar. 22, 2007. dihydrolipoamide dehydrogenase (pdhL), and ORF3 genes, com Written Opinion for International (PCT) Patent Application No. plete cols (1994). PCT/US05/36998, mailed Mar. 22, 2007. Harlow et al. Antibodies: A Laboratory Manual (1988) Cold Spring International Search Report for International (PCT) Patent Applica Harbor Laboratory Press, p. 76. tion No. PCT/US08/63835, mailed Nov. 3, 2008. Jezet al., “Structural control of polyketide formation in plant-specific Written Opinion for International (PCT) Patent Application No. polyketide synthases”. Chem and Bio (London), vol. 7, No. 12, Dec. PCT/US08/63835, mailed Nov. 3, 2008. 2000, pp. 919-930, XP0023.38564. International Search Report for International (PCT) Patent Applica Kaulmann et al. “Biosynthesis of Polyunsaturated Fatty Acids by tion No. PCT/US06/22893, mailed Feb. 29, 2008. Polyketide Synthases”. Angew. Chem. Int. Ed. 2002, 41, No. 11, pp. Written Opinion for International (PCT) Patent Application No. PCT/US06/22893, mailed Feb. 29, 2008. 1866-1869. International Search Report for International (PCT) Patent Applica Kealey et al., “Production of a polyketide natural product in non tion No. PCT/US07/64105, mailed Nov. 23, 2007. 
polyketide-producing prokaryotic and eukaryotic hosts'. Proceed Written Opinion for International (PCT) Patent Application No. ings of the Natural Academy of Sciences of the United States of PCT/US07/64105, mailed Nov. 23, 2007. America, vol. 95, No. 2, Jan. 20, 1998, pp. 505-509, XP0023.38563. International Preliminary Report on Patentability for International Khosla et al., “Tolerance and Specificity of Polyketide Synthases”. (PCT) Patent Application No. PCT/US07/64105, mailed Sep. 25, Annu. Rev. Biochem. 1999. 68:219-253. 2008. Leadlay PF. “Combinatorial Approaches to Polyketides International Search Report for International (PCT) Patent Applica Biosynthesis' Current Opinion in Chemical Biology (1997) 1: 162 tion No. PCT/US07/64104, mailed Dec. 5, 2008. 168. Written Opinion for International (PCT) Patent Application No. Nasu et al., “Efficient Transformation of Marchantia polymorpha PCT/US07/64104, mailed Dec. 5, 2008. That is Haploid and Has Very Small Genome DNA.” Journal of International Search Report for International (PCT) Patent Applica Fermentation and Bioengineering vol. 84, No. 6, 519-523 1997. tion No. PCT/US2007/064106, mailed Sep. 16, 2008. Nicholson et al., “Design and utility of oligonucleotide gene probes Written Opinion for International (PCT) Patent Application No. for fungal polyketide synthases”. Chem & Bio (London) vol. 8, No. PCT/US2007/064106, mailed Sep. 16, 2008. 2, Feb. 2001, pp. 157-178, XP0023.38562. International Preliminary Report on Patentability for International Oliynuk et al. "A hybrid modular polyketide synthase obtained by (PCT) Patent Application No. PCT/US2007/064106, mailed Oct. 30, domain Swapping.” Chemistry & Biology (1996) 3: 833-839. 2008. Orikasa et al. Characterization of the eicosapentaenoic acid Fan K W et al: "Eicosapentaenoic and docosahexaenoic acids pro biosynthesis gene cluster from Shewanella sp. 
strain SCRC-2738, duction by and okara-utilizing potential of thraustochytrids' Journal Cellular and Molecular Biology (Noisy-le-grand), Jul. 2004, vol. 50. of Industrial Microbiology and Biotechnology, Basingstoke, GB, No. 5, pp. 625-630. vol. 27, No. 4, Oct. 1, 2001 (Oct. 1, 2001), pp. 199-202, Satomi et al. Shewanelia marinintesina sp. nov. Shewanella XPOO2393382 ISSN: 1367-5435 Schlegeliana sp. nov. and Shewanelia Sairae sp. nov. novel Wolff et al. Arachidonic, Eicosapentaenoic and Biosynthetically eicosapentaenoic-acid-producing marine bacteria isolated from See Related Fatty Acids in Seed Lipids from a primitive Gymnosperm. animal intestines. Internat. J. Syst. Evol. Microbiol. 2003, vol.53, pp. Agathis robusta. Lipids 34(10), 1994, 1083-1097. 491–499. Grimsley et al., “Fatty acid composition of mutants of the moss Takeyama et al. Expression of eicosapentaenoic acid synthesis gene Physcomitrella patens” Phytochemistry 2007): 1519-1524, 1981. cluster from Shewanella sp. in transgenic marine cyanobacterium. Bedford et al. "A functional chimeric modular polyketide synthase Synechecoccus sp. Microbiology. 1997, vol. 143, pp. 2725-2731. generated via domain replacement.” Chemistry & Biology 3: UniProt Accession No. Q93CG6 PHOPR, (Allen et al.) 2002. 827-831, Oct 1996. U.S. Patent Jan. 12, 2010 Sheet 1 of 3 US 7,645,597 B2


PUFA POLYKETIDE SYNTHASE SYSTEMS AND USES THEREOF

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of U.S. application Ser. No. 11/676,971, filed Feb. 20, 2007, which is a divisional of U.S. application Ser. No. 10/810,352, filed Mar. 26, 2004, now U.S. Pat. No. 7,211,418, which claims the benefit of priority under 35 U.S.C. § 119(e) from U.S. Provisional Application Ser. No. 60/457,979, filed Mar. 26, 2003, entitled “Modification of a Schizochytrium PKS System to Facilitate Production of Lipids Rich in Polyunsaturated Fatty Acids”. U.S. application Ser. No. 10/810,352 is also a continuation-in-part of U.S. patent application Ser. No. 10/124,800, filed Apr. 16, 2002, now U.S. Pat. No. 7,247,461, which claims the benefit of priority under 35 U.S.C. § 119(e) to: U.S. Provisional Application Ser. No. 60/284,066, filed Apr. 16, 2001; U.S. Provisional Application Ser. No. 60/298,796, filed Jun. 15, 2001; and U.S. Provisional Application Ser. No. 60/323,269, filed Sep. 18, 2001. U.S. patent application Ser. No. 10/124,800, supra, is also a continuation-in-part of U.S. application Ser. No. 09/231,899, filed Jan. 14, 1999, now U.S. Pat. No. 6,566,583. Each of the above-identified patent applications is incorporated herein by reference in its entirety.

This application does not claim the benefit of priority from U.S. application Ser. No. 09/090,793, filed Jun. 4, 1998, now U.S. Pat. No. 6,140,486, although U.S. application Ser. No. 09/090,793 is incorporated herein by reference in its entirety.

REFERENCE TO SEQUENCE LISTING

This application contains a Sequence Listing submitted as an electronic text file named “Sequence Listing.txt”, having a size in bytes of 593 kb, and created on 26 Mar. 2004. The information contained in this electronic file is hereby incorporated by reference in its entirety pursuant to 37 CFR § 1.52(e)(5).

FIELD OF THE INVENTION

This invention relates to polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) systems from microorganisms, including eukaryotic organisms, such as Thraustochytrid microorganisms. More particularly, this invention relates to nucleic acids encoding non-bacterial PUFA PKS systems, to non-bacterial PUFA PKS systems, to genetically modified organisms comprising non-bacterial PUFA PKS systems, and to methods of making and using the non-bacterial PUFA PKS systems disclosed herein. This invention also relates to genetically modified microorganisms and methods to efficiently produce lipids (triacylglycerols (TAG), as well as membrane-associated phospholipids (PL)) enriched in various polyunsaturated fatty acids (PUFAs) and particularly, eicosapentaenoic acid (C20:5, ω-3; EPA) by manipulation of a PUFA polyketide synthase (PKS) system.

BACKGROUND OF THE INVENTION

Polyketide synthase (PKS) systems are generally known in the art as enzyme complexes derived from fatty acid synthase (FAS) systems, but which are often highly modified to produce specialized products that typically show little resemblance to fatty acids. It has now been shown, however, that polyketide synthase systems exist in marine bacteria and certain microalgae that are capable of synthesizing PUFAs from malonyl-CoA. The PKS pathways for PUFA synthesis in Shewanella and another marine bacteria, Vibrio marinus, are described in detail in U.S. Pat. No. 6,140,486. The PKS pathways for PUFA synthesis in the eukaryotic Thraustochytrid, Schizochytrium is described in detail in U.S. Pat. No. 6,566,583. Finally, the PKS pathways for PUFA synthesis in eukaryotes such as members of Thraustochytriales, including the complete structural description of the PUFA PKS pathway in Schizochytrium and the identification of the PUFA PKS pathway in Thraustochytrium, including details regarding uses of these pathways, are described in detail in U.S. Patent Application Publication No. 20020194641, published Dec. 19, 2002 (corresponding to U.S. patent application Ser. No. 10/124,800, filed Apr. 16, 2002).

Researchers have attempted to exploit polyketide synthase (PKS) systems that have been described in the literature as falling into one of three basic types, typically referred to as: Type II, Type I and modular. The Type II system is characterized by separable proteins, each of which carries out a distinct enzymatic reaction. The enzymes work in concert to produce the end product and each individual enzyme of the system typically participates several times in the production of the end product. This type of system operates in a manner analogous to the fatty acid synthase (FAS) systems found in plants and bacteria. Type I PKS systems are similar to the Type II system in that the enzymes are used in an iterative fashion to produce the end product. The Type I differs from Type II in that enzymatic activities, instead of being associated with separable proteins, occur as domains of larger proteins. This system is analogous to the Type I FAS systems found in animals and fungi.

In contrast to the Type I and II systems, in modular PKS systems, each enzyme domain is used only once in the production of the end product. The domains are found in very large proteins and the product of each reaction is passed on to another domain in the PKS protein. Additionally, in all of the PKS systems described above, if a carbon-carbon double bond is incorporated into the end product, it is always in the trans configuration.

In the Type I and Type II PKS systems described above, the same set of reactions is carried out in each cycle until the end product is obtained. There is no allowance for the introduction of unique reactions during the biosynthetic procedure. The modular PKS systems require huge proteins that do not utilize the economy of iterative reactions (i.e., a distinct domain is required for each reaction). Additionally, as stated above, carbon-carbon double bonds are introduced in the trans configuration in all of the previously described PKS systems.

Polyunsaturated fatty acids (PUFAs) are critical components of membrane lipids in most eukaryotes (Lauritzen et al., Prog. Lipid Res. 40, 1 (2001); McConn et al., Plant J. 15, 521 (1998)) and are precursors of certain hormones and signaling molecules (Heller et al., Drugs 55, 487 (1998); Creelman et al., Annu. Rev. Plant Physiol. Plant Mol. Biol. 48, 355 (1997)). Known pathways of PUFA synthesis involve the processing of saturated 16:0 or 18:0 fatty acids (the abbreviation X:Y indicates an acyl group containing X carbon atoms and Y double bonds (usually cis in PUFAs); double-bond positions of PUFAs are indicated relative to the methyl carbon of the fatty acid chain (ω3 or ω6) with systematic methylene interruption of the double bonds) derived from fatty acid synthase (FAS) by elongation and aerobic desaturation reactions (Sprecher, Curr. Opin. Clin. Nutr. Metab. Care 2, 135 (1999); Parker-Barnes et al., Proc. Natl. Acad. Sci. USA 97, 8284 (2000); Shanklin et al., Annu. Rev. Plant Physiol. Plant Mol. Biol. 49, 611 (1998)). Starting from acetyl-CoA, the synthesis of docosahexaenoic acid (DHA) requires approximately 30 distinct enzyme activities and nearly 70 reactions including the four repetitive steps of the fatty acid synthesis cycle. Polyketide synthases (PKSs) carry out some of the same reactions as FAS (Hopwood et al., Annu. Rev. Genet. 24, 37 (1990); Bentley et al., Annu. Rev. Microbiol. 53, 411 (1999)) and use the same small protein (or domain), acyl carrier protein (ACP), as a covalent attachment site for the growing carbon chain. However, in these enzyme systems, the complete cycle of reduction, dehydration and reduction seen in FAS is often abbreviated so that a highly derivatized carbon chain is produced, typically containing many keto and hydroxy-groups as well as carbon-carbon double bonds in the trans configuration. The linear products of PKSs are often cyclized to form complex biochemicals that include antibiotics and many other secondary products (Hopwood et al., (1990) supra; Bentley et al., (1999), supra; Keating et al., Curr. Opin. Chem. Biol. 3, 598 (1999)).

Very long chain PUFAs such as docosahexaenoic acid (DHA; 22:6ω3) and eicosapentaenoic acid (EPA; 20:5ω3) have been reported from several species of marine bacteria, including Shewanella sp. (Nichols et al., Curr. Op. Biotechnol. 10, 240 (1999); Yazawa, Lipids 31, S (1996); DeLong et al., Appl. Environ. Microbiol. 51, 730 (1986)). Analysis of a genomic fragment (cloned as plasmid pEPA) from Shewanella sp. strain SCRC2738 led to the identification of five open reading frames (Orfs), totaling 20 Kb, that are necessary and sufficient for EPA production in E. coli (Yazawa, (1996), supra). Several of the predicted protein domains were homologues of FAS enzymes, while other regions showed no homology to proteins of known function. At least 11 regions within the five Orfs were identifiable as putative enzyme domains (See Metz et al., Science 293:290-293 (2001)). When compared with sequences in the gene databases, seven of these were more strongly related to PKS proteins than to FAS proteins. Included in this group were domains putatively encoding malonyl-CoA:ACP acyltransferase (MAT), β-ketoacyl-ACP synthase (KS), β-ketoacyl-ACP reductase (KR), acyltransferase (AT), phosphopantetheine transferase, chain length (or chain initiation) factor (CLF) and a highly unusual cluster of six ACP domains (i.e., the presence of more than two clustered ACP domains had not previously been reported in PKS or FAS sequences). It is likely that the PKS pathway for PUFA synthesis that has been identified in Shewanella is widespread in marine bacteria. Genes with high homology to the Shewanella gene cluster have been identified in Photobacterium profundum (Allen et al., Appl. Environ. Microbiol. 65:1710 (1999)) and in Moritella marina (Vibrio marinus) (see U.S. Pat. No. 6,140,486, ibid., and Tanaka et al., Biotechnol. Lett. 21:939 (1999)).

Polyunsaturated fatty acids (PUFAs) are considered to be useful for nutritional, pharmaceutical, industrial, and other purposes. An expansive supply of PUFAs from natural sources and from chemical synthesis are not sufficient for commercial needs. A major current source for PUFAs is from marine fish; however, fish stocks are declining, and this may not be a sustainable resource. Additionally, contamination, both heavy metal and toxic organic molecules, is a serious issue with oil derived from marine fish. Vegetable oils derived from oil seed crops are relatively inexpensive and do not have the contamination issues associated with fish oils. However, the PUFAs found in commercially developed plant oils are typically limited to linoleic acid (eighteen carbons with 2 double bonds, in the delta 9 and 12 positions: 18:2 delta 9,12) and linolenic acid (18:3 delta 9,12,15). In the conventional pathway for PUFA synthesis, medium chain-length saturated fatty acids (products of a fatty acid synthase (FAS) system) are modified by a series of elongation and desaturation reactions. Because a number of separate desaturase and elongase enzymes are required for fatty acid synthesis from linoleic and linolenic acids to produce the more saturated and longer chain PUFAs, engineering plant host cells for the expression of PUFAs such as EPA and docosahexaenoic acid (DHA) may require expression of several separate enzymes to achieve synthesis. Additionally, for production of useable quantities of such PUFAs, additional engineering efforts may be required, for example, engineering the down regulation of enzymes that compete for substrate, engineering of higher enzyme activities such as by mutagenesis or targeting of enzymes to plastid organelles. Therefore it is of interest to obtain genetic material involved in PUFA biosynthesis from species that naturally produce these fatty acids and to express the isolated material alone or in combination in a heterologous system which can be manipulated to allow production of commercial quantities of PUFAs.

The discovery of a PUFA PKS system in marine bacteria such as Shewanella and Vibrio marinus (see U.S. Pat. No. 6,140,486, ibid.) provides a resource for new methods of commercial PUFA production. However, these marine bacteria have limitations which may ultimately restrict their usefulness on a commercial level. First, although U.S. Pat. No. 6,140,486 discloses that these marine bacteria PUFA PKS systems can be used to genetically modify plants, the marine bacteria naturally live and grow in cold marine environments and the enzyme systems of these bacteria do not function well above 22° C. In contrast, many crop plants, which are attractive targets for genetic manipulation using the PUFA PKS system, have normal growth conditions at temperatures above 22° C. and ranging to higher than 40° C. Therefore, the PUFA PKS systems from these marine bacteria are not predicted to be readily adaptable to plant expression under normal growth conditions. Additionally, the known marine bacteria PUFA PKS systems do not directly produce triacylglycerols (TAG), whereas direct production of TAG would be desirable because TAG are a lipid storage product, and as a result, can be accumulated at very high levels in cells, as opposed to a “structural” lipid product (e.g. phospholipids), which can generally only accumulate at low levels.

With regard to the production of eicosapentaenoic acid (EPA) in particular, researchers have tried to produce EPA with microbes by growing them in both photosynthetic and heterotrophic cultures. They have also used both classical and directed genetic approaches in attempts to increase the productivity of the organisms under culture conditions. Other researchers have attempted to produce EPA in oil-seed crop plants by introduction of genes encoding various desaturase and elongase enzymes.

Researchers have attempted to use cultures of red microalgae (Monodus), diatoms (e.g. Phaeodactylum), other microalgae and fungi (e.g. Mortierella cultivated at low temperatures). However, in all cases, productivity was low compared to existing commercial microbial production systems for other long chain PUFAs such as DHA. In many cases, the EPA occurred primarily in the phospholipids (PL) rather than the triacylglycerols (TAG). Since productivity of microalgae under heterotrophic growth conditions can be much higher than under phototrophic conditions, researchers have attempted, and achieved, trophic conversion by introduction of genes encoding specific sugar transporters. However, even with the newly acquired heterotrophic capability, productivity in terms of oil remained relatively low.

Efforts to produce EPA in oil-seed crop plants by modification of the endogenous fatty acid biosynthesis pathway have only yielded plants with very low levels of the PUFA in their oils. As discussed above, several marine bacteria have been shown to produce PUFAs (EPA as well as DHA). However, these bacteria do not produce TAG and the EPA is found primarily in the PL membranes. The levels of EPA produced as well as the growth characteristics of these particular marine bacteria (discussed above) limit their utility for commercial production of EPA.

Therefore, there is a need in the art for other PUFA PKS systems having greater flexibility for commercial use, and for a biological system that efficiently produces quantities of lipids (PL and TAG) enriched in desired PUFAs, such as EPA, in a commercially useful production process.

SUMMARY OF THE INVENTION

One embodiment of the present invention relates to an isolated nucleic acid molecule. The nucleic acid molecule comprises a nucleic acid sequence selected from: (a) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof; (b) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (c) a nucleic acid sequence encoding an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to SEQ ID NO:54, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) a nucleic acid sequence encoding an amino acid sequence that is at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:60, SEQ ID NO:62 and SEQ ID NO:64, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 80% identical, and more

SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, and SEQ ID NO:67.

Another embodiment of the present invention relates to a recombinant nucleic acid molecule comprising any of the above-described nucleic acid molecules, operatively linked to at least one transcription control sequence.

Yet another embodiment of the present invention relates to a recombinant cell transfected with any of the above-described recombinant nucleic acid molecules.

Another embodiment of the present invention relates to a genetically modified microorganism, wherein the microorganism expresses a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, wherein the at least one domain of the PUFA PKS system comprises an amino acid sequence selected from: (a) an amino acid sequence selected from the group consisting of SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof; (b) an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (c) an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to SEQ ID NO:54, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) an amino acid sequence that is at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:60, SEQ ID NO:62 and SEQ ID NO:64, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; and/or (e) an amino acid sequence that is at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:41, SEQ ID NO:66, SEQ ID
preferably at least about 90% identical, to an amino NO:68, wherein the amino acid sequence has a biological acid sequence selected from the group consisting of: SEQID activity of at least one domain of a polyunsaturated fatty acid NO:41, SEQID NO:66, SEQID NO:68, wherein the amino (PUFA) polyketide synthase (PKS) system. The microorgan acid sequence has a biological activity of at least one domain 55 ism is genetically modified to affect the activity of the PKS of a polyunsaturated fatty acid (PUFA) polyketide synthase system. (PKS) system; and/or (f) a nucleic acid sequence that is fully In one aspect, the microorganism is genetically modified complementary to the nucleic acid sequence of (a), (b), (c), by transfection with a recombinant nucleic acid molecule (d), or (e). In one aspect, the nucleic acid sequence encodes an encoding the at least one domain of a polyunsaturated fatty amino acid sequence selected from: SEQID NO:39, SEQID 60 acid (PUFA) polyketide synthase (PKS) system. For NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, example, the microorganism can include a Thraustochytrid, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID Such as a Schizochytrium. In one aspect, Such a microorgan NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, ism has been further genetically modified to recombinantly SEQ ID NO:64, SEQID NO:66, SEQID NO:68, and bio express at least one nucleic acid molecule encoding at least logically active fragments thereof. In one aspect, the nucleic 65 one biologically active domain from a PKS system selected acid sequence is selected from the group consisting of: SEQ from the group consisting of: a bacterial PUFA PKS system, ID NO:38, SEQID NO:40, SEQID NO:42, SEQID NO:44, a Type IPKS system, a Type II PKS system, a modular PKS US 7,645,597 B2 7 8 system, and a non-bacterial PUFA PKS system. 
The non-bacterial PUFA PKS system can include a Thraustochytrid PUFA PKS system and in one aspect, a Schizochytrium PUFA PKS system.

In another aspect, the microorganism endogenously expresses a PKS system comprising the at least one domain of the PUFA PKS system, and wherein the genetic modification is in a nucleic acid sequence encoding at least one domain of the PUFA PKS system. In another aspect, such a microorganism has been further genetically modified to recombinantly express at least one nucleic acid molecule encoding at least one biologically active domain from a PKS system selected from the group consisting of: a bacterial PUFA PKS system, a Type I PKS system, a Type II PKS system, a modular PKS system, and a non-bacterial PUFA PKS system (e.g., a Thraustochytrid PUFA PKS system, such as a Schizochytrium PUFA PKS system).

In another aspect, the microorganism endogenously expresses a PUFA PKS system comprising the at least one biologically active domain of a PUFA PKS system, and wherein the genetic modification comprises expression of a recombinant nucleic acid molecule selected from the group consisting of a recombinant nucleic acid molecule encoding at least one biologically active domain from a second PKS system and a recombinant nucleic acid molecule encoding a protein that affects the activity of the endogenous PUFA PKS system. The biologically active domain from a second PKS system can include, but is not limited to: (a) a domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a domain of a PUFA PKS system from a microorganism identified by the following method: (i) selecting a microorganism that produces at least one PUFA; and, (ii) identifying a microorganism from (i) that has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation in the fermentation medium, as compared to production of PUFAs by the microorganism under dissolved oxygen conditions of greater than about 5% of saturation in the fermentation medium; (c) a domain comprising an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof, and (d) a domain comprising an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to the amino acid sequence of (c), wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. In one aspect, recombinant nucleic acid molecule encodes a phosphopantetheine transferase. In one aspect, the second PKS system is selected from the group consisting of: a bacterial PUFA PKS system, a type I PKS system, a type II PKS system, a modular PKS system, and a non-bacterial PUFA PKS system (e.g., a eukaryotic PUFA PKS system, such as a Thraustochytrid PUFA PKS system, including, but not limited to a Schizochytrium PUFA PKS system).

Yet another embodiment of the present invention relates to a genetically modified plant, wherein the plant has been genetically modified to recombinantly express a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, wherein the domain comprises an amino acid sequence selected from the group consisting of: (a) an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof; (b) an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (c) an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to SEQ ID NO:54, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) an amino acid sequence that is at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:60, SEQ ID NO:62 and SEQ ID NO:64, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; and/or (e) an amino acid sequence that is at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:41, SEQ ID NO:66, SEQ ID NO:68, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. In one aspect, the at least one domain of the PUFA PKS system comprises an amino acid sequence selected from the group consisting of SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66 and SEQ ID NO:68 and biologically active fragments thereof. In one aspect, the plant has been further genetically modified to recombinantly express at least one nucleic acid molecule encoding at least one biologically active domain from a PKS system selected from the group consisting of a bacterial PUFA PKS system, a Type I PKS system, a Type II PKS system, a modular PKS system, and a non-bacterial PUFA PKS system (e.g., a Thraustochytrid PUFA PKS system, such as a Schizochytrium PUFA PKS system).

Yet another embodiment of the present invention relates to a method to produce a bioactive molecule that is produced by a polyketide synthase system, comprising culturing under conditions effective to produce the bioactive molecule a genetically modified organism that expresses a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, wherein the at least one domain of the PUFA PKS system comprises an amino acid sequence selected from the group consisting of: (a) an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof; (b) an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (c) an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to SEQ ID NO:54, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) an amino acid sequence that is at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:60, SEQ ID NO:62 and SEQ ID NO:64, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; and/or (e) an amino acid sequence that is at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:41, SEQ ID NO:66, SEQ ID NO:68, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system.

In one aspect, the organism endogenously expresses a PKS system comprising the at least one domain of the PUFA PKS system, and wherein the genetic modification is in a nucleic acid sequence encoding the at least one domain of the PUFA PKS system. In one aspect, the genetic modification changes at least one product produced by the endogenous PKS system, as compared to an organism wherein the PUFA PKS system has not been genetically modified.

In another aspect, the organism endogenously expresses a PKS system comprising the at least one biologically active domain of the PUFA PKS system, and the genetic modification comprises transfection of the organism with a recombinant nucleic acid molecule selected from the group consisting of a recombinant nucleic acid molecule encoding at least one biologically active domain from a second PKS system and a recombinant nucleic acid molecule encoding a protein that affects the activity of the PUFA PKS system. In one aspect, the genetic modification changes at least one product produced by the endogenous PKS system, as compared to an organism that has not been genetically modified to affect PUFA production.

In another aspect, the organism is genetically modified by transfection with a recombinant nucleic acid molecule encoding the at least one domain of the polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system.

In another aspect, the organism produces a polyunsaturated fatty acid (PUFA) profile that differs from the naturally occurring organism without a genetic modification.

In another aspect, the organism endogenously expresses a non-bacterial PUFA PKS system, and wherein the genetic modification comprises substitution of a domain from a different PKS system for a nucleic acid sequence encoding at least one domain of the non-bacterial PUFA PKS system.

In yet another aspect, the organism endogenously expresses a non-bacterial PUFA PKS system that has been modified by transfecting the organism with a recombinant nucleic acid molecule encoding a protein that regulates the chain length of fatty acids produced by the PUFA PKS system.

In another aspect, the bioactive molecule is selected from: an anti-inflammatory formulation, a chemotherapeutic agent, an active excipient, an osteoporosis drug, an anti-depressant, an anti-convulsant, an anti-Helicobacter pylori drug, a drug for treatment of neurodegenerative disease, a drug for treatment of degenerative liver disease, an antibiotic, and/or a cholesterol lowering formulation. In one aspect, the bioactive molecule is an antibiotic. In another aspect, the bioactive molecule is a polyunsaturated fatty acid (PUFA). In yet another aspect, the bioactive molecule is a molecule including carbon-carbon double bonds in the cis configuration. In one aspect, the bioactive molecule is a molecule including a double bond at every third carbon. In one aspect, the organism is a microorganism. In another aspect, the organism is a plant.

Another embodiment of the present invention relates to a method to produce a plant that has a polyunsaturated fatty acid (PUFA) profile that differs from the naturally occurring plant, comprising genetically modifying cells of the plant to express a PKS system comprising at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system, wherein the at least one domain of the PUFA PKS system comprises an amino acid sequence selected from the group consisting of: (a) an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof; (b) an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (c) an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to SEQ ID NO:54, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) an amino acid sequence that is at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:60, SEQ ID NO:62 and SEQ ID NO:64, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; and (e) an amino acid sequence that is at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of SEQ ID NO:41, SEQ ID NO:66, SEQ ID NO:68, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system.

Another embodiment of the present invention relates to a method to modify an endproduct containing at least one fatty acid, comprising adding to the endproduct an oil produced by a recombinant host cell that expresses at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system, wherein the at least one domain of a PUFA PKS system comprises an amino acid sequence selected from the group consisting of: (a) an amino acid sequence selected from the group consisting of SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof; (b) an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (c) an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to SEQ ID NO:54, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) an amino acid sequence that is at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:60, SEQ ID NO:62 and SEQ ID NO:64, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; and (e) an amino acid sequence that is at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:41, SEQ ID NO:66, SEQ ID NO:68, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. In one aspect, the endproduct is selected from: a dietary supplement, a food product, a pharmaceutical formulation, a humanized animal milk, and an infant formula. In one aspect, the pharmaceutical formulation is selected from the group consisting of an anti-inflammatory formulation, a chemotherapeutic agent, an active excipient, an osteoporosis drug, an anti-depressant, an anti-convulsant, an anti-Helicobacter pylori drug, a drug for treatment of neurodegenerative disease, a drug for treatment of degenerative liver disease, an antibiotic, and a cholesterol lowering formulation. In one aspect, the endproduct is used to treat a condition selected from the group consisting of chronic inflammation, acute inflammation, gastrointestinal disorder, cancer, cachexia, cardiac restenosis, neurodegenerative disorder, degenerative disorder of the liver, blood lipid disorder, osteoporosis, osteoarthritis, autoimmune disease, preeclampsia, preterm birth, age related maculopathy, pulmonary disorder, and peroxisomal disorder.

Yet another embodiment of the present invention relates to a method to produce a humanized animal milk, comprising genetically modifying milk-producing cells of a milk-producing animal with at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system, wherein the at least one domain of the PUFA PKS system comprises an amino acid sequence selected from the group consisting of: (a) an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof; (b) an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (c) an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to SEQ ID NO:54, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) an amino acid sequence that is at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:60, SEQ ID NO:62 and SEQ ID NO:64, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; and (e) an amino acid sequence that is at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence selected from the group consisting of: SEQ ID NO:41, SEQ ID NO:66, SEQ ID NO:68, wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system.

Another embodiment of the present invention relates to a genetically modified Thraustochytrid microorganism, wherein the microorganism has an endogenous polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, and wherein the endogenous PUFA PKS system has been genetically modified to alter the expression profile of a polyunsaturated fatty acid (PUFA) by the Thraustochytrid microorganism as compared to the Thraustochytrid microorganism in the absence of the genetic modification.

In one aspect, the endogenous PUFA PKS system has been modified by mutagenesis of a nucleic acid sequence that encodes at least one domain of the endogenous PUFA PKS system. In one aspect, the modification is produced by targeted mutagenesis. In another aspect, the modification is produced by classical mutagenesis and screening.

In another aspect, the endogenous PUFA PKS system has been modified by deleting at least one nucleic acid sequence that encodes at least one domain of the endogenous PUFA PKS system and inserting therefore a nucleic acid sequence encoding a homologue of the endogenous domain to alter the PUFA production profile of the Thraustochytrid microorganism, wherein the homologue has a biological activity of at least one domain of a PKS system. In one aspect, the homologue of the endogenous domain comprises a modification, as compared to the endogenous domain, selected from the group consisting of at least one deletion, insertion or substitution that results in an alteration of PUFA production profile by the microorganism. In another aspect, the amino acid sequence of the homologue is at least about 60% identical, and more preferably about 70% identical, and more preferably about 80% identical, and more preferably about 90% identical to the amino acid sequence of the endogenous domain. In one aspect, homologue of the endogenous domain is a domain from a PUFA PKS system of another Thraustochytrid microorganism.

In another aspect, the endogenous PUFA PKS system has been modified by deleting at least one nucleic acid sequence that encodes at least one domain of the endogenous PUFA PKS system and inserting therefore a nucleic acid sequence encoding at least one domain of a PKS system from a different microorganism. In one aspect, the nucleic acid sequence encoding at least one domain of a PKS system from a different microorganism is from a bacterial PUFA PKS system. For example, the different microorganism can be a marine bacteria having a PUFA PKS system that naturally produces PUFAs at a temperature of about 25° C. or greater. In one aspect, the marine bacteria is selected from the group consisting of Shewanella olleyana and Shewanella japonica. In one aspect, the domain of a PKS system from a different microorganism is from a PKS system selected from the group consisting of a Type I PKS system, a Type II PKS system, a modular PKS system, and a PUFA PKS system from a different Thraustochytrid microorganism.

In any of the above aspects, the domain of the endogenous PUFA PKS system can include, but is not limited to, a domain having a biological activity of at least one of the following proteins: malonyl-CoA:ACP acyltransferase (MAT), β-ketoacyl-ACP synthase (KS), ketoreductase (KR), acyltransferase (AT), FabA-like β-hydroxyacyl-ACP dehydrase (DH), phosphopantetheine transferase, chain length factor (CLF), acyl carrier protein (ACP), enoyl ACP-reductase (ER), an enzyme that catalyzes the synthesis of trans-2-acyl-ACP, an enzyme that catalyzes the reversible isomerization of trans-2-acyl-ACP to cis-3-acyl-ACP, and an enzyme that catalyzes the elongation of cis-3-acyl-ACP to cis-5-β-keto-acyl-ACP.

In any of the above aspects, the domain of the endogenous PUFA PKS system can include an amino acid sequence selected from the group consisting of: (a) an amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof, and (b) an amino acid sequence that is at least about 60% identical, and more preferably at least about 70% identical, and more preferably at least about 80% identical, and more preferably at least about 90% identical, to an amino acid sequence of (a), wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system.

In one aspect, the PUFA production profile is altered to initiate, increase or decrease production of eicosapentaenoic acid (EPA) by the microorganism. In another aspect, the PUFA production profile is altered to initiate, increase or decrease production of docosahexaenoic acid (DHA) by the microorganism. In another aspect, the PUFA production profile is altered to initiate, increase or decrease production of one or both isomers of docosapentaenoic acid (DPA) by the microorganism. In another aspect, the PUFA production profile is altered to initiate, increase or decrease production of arachidonic acid (ARA) by the microorganism. In another aspect, the Thraustochytrid is from a genus selected from the group consisting of Schizochytrium, Thraustochytrium, and Japonochytrium. In another aspect, the Thraustochytrid is from the genus Schizochytrium. In another aspect, the Thraustochytrid is from a Schizochytrium species selected from the group consisting of Schizochytrium aggregatum, Schizochytrium limacinum, and Schizochytrium minutum. In another aspect, the Thraustochytrid is from the genus Thraustochytrium.

Yet another embodiment of the present invention relates to a genetically modified Schizochytrium that produces eicosapentaenoic acid (EPA), wherein the Schizochytrium has an endogenous polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system comprising a genetic modification in at least one nucleic acid sequence that encodes at least one domain of the endogenous PUFA PKS system that results in the production of EPA by the Schizochytrium. In one aspect, the Schizochytrium comprises a genetic modification in at least one nucleic acid sequence encoding at least one domain having a biological activity of at least one of the following proteins: malonyl-CoA:ACP acyltransferase (MAT), β-ketoacyl-ACP synthase (KS), ketoreductase (KR), acyltransferase (AT), FabA-like β-hydroxyacyl-ACP dehydrase (DH), phosphopantetheine transferase, chain length factor (CLF), acyl carrier protein (ACP), enoyl ACP-reductase (ER), an enzyme that catalyzes the synthesis of trans-2-acyl-ACP, an enzyme that catalyzes the reversible isomerization of trans-2-acyl-ACP to cis-3-acyl-ACP, and an enzyme that catalyzes the elongation of cis-3-acyl-ACP to cis-5-β-keto-acyl-ACP. In one aspect, the Schizochytrium comprises a genetic modification in at least one nucleic acid sequence encoding at least one domain from the open reading frame encoding SEQ ID NO:2 of the endogenous PUFA PKS system. In one aspect, the Schizochytrium comprises a genetic modification in at least one nucleic acid sequence encoding at least one domain from the open reading frame encoding SEQ ID NO:4 of the endogenous PUFA PKS system. In one aspect, the Schizochytrium comprises a genetic modification in at least one nucleic acid sequence encoding at least one domain from the open reading frame encoding SEQ ID NO:6 of the endogenous PUFA PKS system. In one aspect, the Schizochytrium comprises a genetic modification in at least one nucleic acid sequence encoding at least one domain having a biological activity of at least one of the following proteins: β-ketoacyl-ACP synthase (KS), FabA-like β-hydroxy acyl-ACP dehydrase (DH), chain length factor (CLF), an enzyme that catalyzes the synthesis of trans-2-acyl-ACP, an enzyme that catalyzes the reversible isomerization of trans-2-acyl-ACP to cis-3-acyl-ACP, and an enzyme that catalyzes the elongation of cis-3-acyl-ACP to cis-5-β-keto-acyl-ACP. In one aspect, the Schizochytrium comprises a genetic modification in at least one nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:28 and SEQ ID NO:30 of the endogenous PUFA PKS system. In one aspect, the Schizochytrium has been modified by deleting at least one nucleic acid sequence that encodes at least one domain of the endogenous PUFA PKS system and inserting therefore a nucleic acid sequence encoding at least one domain of a PKS system from a non-Schizochytrium microorganism. In one aspect, the non-Schizochytrium microorganism grows and produces PUFAs at temperature of at least about 15° C., and more preferably at least about 20° C., and more preferably at least about 25° C., and more preferably at least about 30° C., and more preferably between about 20° C. and about 40° C. In one aspect, the nucleic acid sequence encoding at least one domain of a PKS system from a non-Schizochytrium microorganism is from a bacterial PUFA PKS system. In one aspect, the bacterial PUFA PKS system is from a bacterium selected from the group consisting of Shewanella olleyana and Shewanella japonica. In another aspect, the nucleic acid sequence encoding at least one domain of a PKS system is selected from the group consisting of a Type I PKS system, a Type II PKS system, a modular PKS system, and a non-bacterial PUFA PKS system (e.g., a eukaryotic PUFA PKS

enoyl reduction reactions in selected cycles (See FIG. 1, for example). Reference to a PUFA PKS system refers collectively to all of the genes and their encoded products that work in a complex to produce PUFAs in an organism. Therefore, the PUFA PKS system refers specifically to a PKS system for which the natural products are PUFAs.
system, such as a Thraustochytrid PUFA PKS system). More specifically, first, a PUFA PKS system that forms the Another embodiment of the present invention relates to a basis of this invention produces polyunsaturated fatty acids genetically modified Schizochytrium that produces increased (PUFAs) as products (i.e., an organism that endogenously amounts of docosahexaenoic acid (DHA) as compared to a 10 (naturally) contains such a PKS system makes PUFAs using non-genetically modified Schizochytrium, wherein the this system). The PUFAs referred to herein are preferably Schizochytrium has an endogenous polyunsaturated fatty acid polyunsaturated fatty acids with a carbon chain length of at (PUFA) polyketide synthase (PKS) system comprising a least 16 carbons, and more preferably at least 18 carbons, and genetic modification in at least one nucleic sequence that more preferably at least 20 carbons, and more preferably 22 encodes at least one domain of the endogenous PUFA PKS 15 or more carbons, with at least 3 or more double bonds, and system that results in increased the production of DHA by the preferably 4 or more, and more preferably 5 or more, and even Schizochytrium. In one aspect, at least one domain of the more preferably 6 or more double bonds, wherein all double endogenous PUFA PKS system has been modified by substi bonds are in the cis configuration. It is an object of the present tution for at least one domain of a PUFA PKS system from invention to find or create via genetic manipulation or Thraustochytrium. In one aspect, the ratio of DHA to DPA manipulation of the endproduct, PKS systems which produce produced by the Schizochytrium is increased as compared to polyunsaturated fatty acids of desired chain length and with a non-genetically modified Schizochytrium. desired numbers of double bonds. 
Examples of PUFAs Another embodiment of the present invention relates to a include, but are not limited to, DHA (docosahexaenoic acid method to produce lipids enriched for at least one selected (C22:6, ()-3)), ARA (eicosatetraenoic acid or arachidonic polyunsaturated fatty acid (PUFA), comprising culturing 25 acid (C20:4, n-6)), DPA (docosapentaenoic acid (C22:5, (D-6 under conditions effective to produce the lipids a genetically or (O-3)), and EPA (eicosapentaenoic acid (C20:5, c)-3)). modified Thraustochytrid microorganism as described above Second, the PUFA PKS system described herein incorpo or a genetically modified Schizochytrium as described above. rates both iterative and non-iterative reactions, which distin In one aspect, the selected PUFA is eicosapentaenoic acid guish the system from previously described PKS systems (EPA). 30 (e.g., type I, type II or modular). More particularly, the PUFA Yet another embodiment of the present invention relates to PKS system described herein contains domains that appear to a method to produce eicosapentaenoic acid (EPA)-enriched function during each cycle as well as those which appear to lipids, comprising culturing under conditions effective to pro function during only some of the cycles. A key aspect of this duce the EPA-enriched lipids a genetically modified Thraus functionality may be related to the domains showing homol tochytrid microorganism, wherein the microorganism has an 35 ogy to the bacterial Fab-A enzymes. For example, the Fab-A endogenous polyunsaturated fatty acid (PUFA) polyketide enzyme of E. coli has been shown to possess two enzymatic synthase (PKS) system, and wherein the endogenous PUFA activities. 
It possesses a dehydration activity in which a water PKS system has been genetically modified in at least one molecule (H2O) is abstracted from a carbon chain containing domain to initiate or increase the production of EPA in the a hydroxy group, leaving a trans double bond in that carbon lipids of the microorganism as compared to in the absence of 40 chain. In addition, it has an isomerase activity in which the the modification. trans double bond is converted to the cis configuration. This isomerization is accomplished in conjunction with a migra BRIEF DESCRIPTION OF THE FIGURES tion of the double bond position to adjacent carbons. In PKS (and FAS) systems, the main carbon chain is extended in 2 FIG. 1 is a graphical representation of the domainstructure 45 carbon increments. One can therefore predict the number of of the Schizochytrium PUFA PKS system. extension reactions required to produce the PUFA products of FIG. 2 shows a comparison of domains of PUFA PKS these PKS systems. For example, to produce DHA (C22:6, all systems from Schizochytrium and Shewanella. cis) requires 10 extension reactions. Since there are only 6 FIG. 3 shows a comparison of domains of PUFA PKS double bonds in the end product, it means that during some of systems from Schizochytrium and a related PKS system from 50 the reaction cycles, a double bond is retained (as a cis isomer), Nostoc whose product is a long chain fatty acid that does not and in others, the double bond is reduced prior to the next contain any double bonds. extension. Before the discovery of a PUFA PKS system in marine DETAILED DESCRIPTION OF THE INVENTION bacteria (see U.S. Pat. No. 
6,140.486), PKS systems were not 55 known to possess this combination of iterative and selective The present invention generally relates to polyunsaturated enzymatic reactions, and they were not thought of as being fatty acid (PUFA) polyketide synthase (PKS) systems, to able to produce carbon-carbon double bonds in the cis con genetically modified organisms comprising such PUFA PKS figuration. However, the PUFA PKS system described by the systems, to methods of making and using such systems for the present invention has the capacity to introduce cis double production of products of interest, including bioactive mol 60 bonds and the capacity to vary the reaction sequence in the ecules and particularly, PUFAs, such as DHA, DPA and EPA. cycle. As used herein, a PUFA PKS system generally has the fol The present inventors propose to use these features of the lowing identifying features: (1) it produces PUFAs as a natu PUFA PKS system to produce a range of bioactive molecules ral product of the system; and (2) it comprises several multi that could not be produced by the previously described (Type functional proteins assembled into a complex that conducts 65 II, Type I and modular) PKS systems. These bioactive mol both iterative processing of the fatty acid chain as well non ecules include, but are not limited to, polyunsaturated fatty iterative processing, including trans-cis isomerization and acids (PUFAs), antibiotics or other bioactive compounds, US 7,645,597 B2 17 18 many of which will be discussed below. 
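The extension-reaction arithmetic given above for DHA can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the disclosure: the names `Pufa`, `extension_cycles` and `cycle_budget` are hypothetical, the 2-carbon starter unit is an assumption chosen because it reproduces the figure of 10 extension reactions stated for C22, and the sketch further assumes that each cycle retains at most one double bond.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pufa:
    """A fatty acid in C<carbons>:<double bonds> shorthand, e.g. DHA = C22:6."""
    name: str
    carbons: int
    double_bonds: int


def extension_cycles(carbons: int, starter_carbons: int = 2) -> int:
    """Count the 2-carbon extension reactions needed to reach `carbons`.

    Assumption (hypothetical, not stated in the text): the chain starts
    from a 2-carbon unit, so C22 needs (22 - 2) / 2 = 10 extensions.
    """
    remaining = carbons - starter_carbons
    if remaining < 0 or remaining % 2 != 0:
        raise ValueError(
            f"cannot reach C{carbons} from C{starter_carbons} in 2-carbon steps"
        )
    return remaining // 2


def cycle_budget(fa: Pufa) -> dict:
    """Split the cycles into those that keep a cis double bond and those
    in which the double bond is reduced before the next extension.

    Assumption: at most one double bond is retained per cycle.
    """
    cycles = extension_cycles(fa.carbons)
    if fa.double_bonds > cycles:
        raise ValueError("more double bonds than extension cycles")
    return {
        "extension_cycles": cycles,
        "double_bonds_retained": fa.double_bonds,
        "fully_reduced_cycles": cycles - fa.double_bonds,
    }


if __name__ == "__main__":
    for fa in (Pufa("DHA", 22, 6), Pufa("EPA", 20, 5)):
        print(fa.name, cycle_budget(fa))
```

Under these assumptions the sketch returns 10 extension cycles for DHA (C22:6), of which 6 retain a double bond and 4 are fully reduced, and 9 extension cycles for EPA (C20:5), of which 5 retain a double bond and 4 are fully reduced.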
For example, using the knowledge of the PUFA PKS gene structures described herein, any of a number of methods can be used to alter the PUFA PKS genes, or combine portions of these genes with other synthesis systems, including other PKS systems, such that new products are produced. The inherent ability of this particular type of system to do both iterative and selective reactions will enable this system to yield products that would not be found if similar methods were applied to other types of PKS systems.

Much of the structure of the PKS system for PUFA synthesis in the eukaryotic Thraustochytrid, Schizochytrium, has been described in detail in U.S. Pat. No. 6,566,583. Complete sequencing of cDNA and genomic clones in Schizochytrium by the present inventors allowed the identification of the full-length genomic sequence of each of OrfA, OrfB and OrfC and the complete identification of the specific domains in these Schizochytrium Orfs with homology to those in Shewanella (see FIG. 2 and U.S. patent application Ser. No. 10/124,800, supra). In U.S. patent application Ser. No. 10/124,800, the inventors also identified a Thraustochytrium species as meeting the criteria for having a PUFA PKS system and then demonstrated that this organism was likely to contain genes with homology to Schizochytrium PUFA PKS genes by Southern blot analysis. However, the isolation and determination of the structure of such genes and the domain organization of the genes was not described in U.S. patent application Ser. No. 10/124,800. In the present invention, the inventors have now cloned and sequenced the full-length genomic sequence of homologous open reading frames (Orfs) in this Thraustochytrid of the genus Thraustochytrium (specifically, Thraustochytrium sp. 23B (ATCC 20892)), and have identified the domains comprising the PUFA PKS system in this Thraustochytrium. Therefore, the present invention solves the above-mentioned problem of providing additional PUFA PKS systems that have the flexibility for commercial use. The Thraustochytrium PUFA PKS system is described in detail below.

The present invention also solves the above-identified problem for production of commercially valuable lipids enriched in a desired PUFA, such as EPA, by the present inventors' development of genetically modified microorganisms and methods for efficiently producing lipids (triacylglycerols (TAG) as well as membrane-associated phospholipids (PL)) enriched in PUFAs by manipulation of the polyketide synthase-like system that produces PUFAs in eukaryotes, including members of the order Thraustochytriales such as Schizochytrium and Thraustochytrium. Specifically, and by way of example, the present inventors describe herein a strain of Schizochytrium that has previously been optimized for commercial production of oils enriched in PUFA, primarily docosahexaenoic acid (DHA; C22:6 n-3) and docosapentaenoic acid (DPA; C22:5 n-6), and that will now be genetically modified such that EPA (C20:5 n-3) production (or other PUFA production) replaces the DHA production, without sacrificing the oil productivity characteristics of the organism. In addition, the present inventors describe herein the genetic modification of Schizochytrium with PUFA PKS genes from Thraustochytrium to improve the DHA production by the Schizochytrium organism, specifically by altering the ratio of DHA to DPA produced by the microorganism through the modification of the PUFA PKS system. These are only a few examples of the technology encompassed by the invention, as the concepts of the invention can readily be applied to other production organisms and other desired PUFAs as described in detail below.

In one embodiment, a PUFA PKS system according to the present invention comprises at least the following biologically active domains: (a) at least two enoyl-ACP reductase (ER) domains; (b) at least six acyl carrier protein (ACP) domains; (c) at least two β-ketoacyl-ACP synthase (KS) domains; (d) at least one acyltransferase (AT) domain; (e) at least one β-ketoacyl-ACP reductase (KR) domain; (f) at least two FabA-like β-hydroxyacyl-ACP dehydrase (DH) domains; (g) at least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain. The functions of these domains are generally individually known in the art and will be described in detail below with regard to the PUFA PKS system of the present invention.

In another embodiment, the PUFA PKS system comprises at least the following biologically active domains: (a) at least one enoyl-ACP reductase (ER) domain; (b) multiple acyl carrier protein (ACP) domains (at least from one to four, and preferably at least five, and more preferably at least six, and even more preferably seven, eight, nine, or more than nine); (c) at least two β-ketoacyl-ACP synthase (KS) domains; (d) at least one acyltransferase (AT) domain; (e) at least one β-ketoacyl-ACP reductase (KR) domain; (f) at least two FabA-like β-hydroxyacyl-ACP dehydrase (DH) domains; (g) at least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain. Preferably, such a PUFA PKS system is a non-bacterial PUFA PKS system.

In one embodiment, a PUFA PKS system of the present invention is a non-bacterial PUFA PKS system. In other words, in one embodiment, the PUFA PKS system of the present invention is isolated from an organism that is not a bacterium, or is a homologue of, or derived from, a PUFA PKS system from an organism that is not a bacterium, such as a eukaryote or an archaebacterium. Eukaryotes are separated from prokaryotes based on the degree of differentiation of the cells, with eukaryotes having more highly differentiated cells and prokaryotes having less differentiated cells. In general, prokaryotes do not possess a nuclear membrane, do not exhibit mitosis during cell division, have only one chromosome, their cytoplasm contains 70S ribosomes, they do not possess any mitochondria, endoplasmic reticulum, chloroplasts, lysosomes or Golgi apparatus, and their flagella (if present) consist of a single fibril. In contrast, eukaryotes have a nuclear membrane, they do exhibit mitosis during cell division, they have many chromosomes, their cytoplasm contains 80S ribosomes, they do possess mitochondria, endoplasmic reticulum, chloroplasts (in ), lysosomes and Golgi apparatus, and their flagella (if present) consist of many fibrils. In general, bacteria are prokaryotes, while algae, fungi, protists, protozoa and higher plants are eukaryotes.

The PUFA PKS systems of the marine bacteria (e.g., Shewanella sp. strain SCRC2738 and Vibrio marinus) are not the basis of the present invention, although the present invention does contemplate the use of domains from these bacterial PUFA PKS systems in conjunction with domains from the non-bacterial PUFA PKS systems of the present invention. In addition, the present invention does contemplate the isolation and use of PUFA PKS gene sets (and proteins and domains encoded thereby) isolated from other bacteria (e.g., Shewanella olleyana and Shewanella japonica) that will be particularly suitable for use as sources of PUFA PKS genes for modifying or combining with the non-bacterial PUFA PKS genes described herein to produce hybrid constructs and genetically modified microorganisms and plants. For example, according to the present invention, genetically modified organisms can be produced which incorporate non-bacterial PUFA PKS functional domains with bacterial PUFA PKS functional domains, as well as PKS functional domains or proteins from other PKS systems (type I, type II, modular) or FAS systems. As discussed in more detail below, PUFA PKS genes from two species of Shewanella, namely Shewanella olleyana or Shewanella japonica, are exemplary bacterial genes that are preferred for use in genetically modified microorganisms, plants, and methods of the invention. PUFA PKS systems (genes and the proteins and domains encoded thereby) from such marine bacteria (e.g., Shewanella olleyana or Shewanella japonica) are encompassed by the present invention as novel PUFA PKS sequences.

According to the present invention, the terms/phrases "Thraustochytrid", "Thraustochytriales microorganism" and "microorganism of the order Thraustochytriales" can be used interchangeably and refer to any members of the order Thraustochytriales, which includes both the family Thraustochytriaceae and the family Labyrinthulaceae. The terms "Labyrinthulid" and "Labyrinthulaceae" are used herein to specifically refer to members of the family Labyrinthulaceae. To specifically reference Thraustochytrids that are members of the family Thraustochytriaceae, the term "Thraustochytriaceae" is used herein. Thus, for the present invention, members of the Labyrinthulids are considered to be included in the Thraustochytrids.

Developments have resulted in frequent revision of the taxonomy of the Thraustochytrids. Taxonomic theorists generally place Thraustochytrids with the algae or algae-like protists. However, because of taxonomic uncertainty, it would be best for the purposes of the present invention to consider the strains described in the present invention as Thraustochytrids to include the following organisms: Order: Thraustochytriales; Family: Thraustochytriaceae (Genera: Thraustochytrium, Schizochytrium, Japonochytrium, Aplanochytrium, or Elina) or Labyrinthulaceae (Genera: Labyrinthula, Labyrinthuloides, or Labyrinthomyxa). Also, the following genera are sometimes included in either family Thraustochytriaceae or Labyrinthulaceae: Althornia, Corallochytrium, Diplophrys, and Pyrrhosorus, and for the purposes of this invention are encompassed by reference to a Thraustochytrid or a member of the order Thraustochytriales. It is recognized that at the time of this invention, revision in the taxonomy of Thraustochytrids places the genus Labyrinthuloides in the family of Labyrinthulaceae and confirms the placement of the two families Thraustochytriaceae and Labyrinthulaceae within the Stramenopile lineage. It is noted that the Labyrinthulaceae are sometimes commonly called labyrinthulids or labyrinthula, or labyrinthuloides and the Thraustochytriaceae are commonly called thraustochytrids, although, as discussed above, for the purposes of clarity of this invention, reference to Thraustochytrids encompasses any member of the order Thraustochytriales and/or includes members of both Thraustochytriaceae and Labyrinthulaceae. Recent taxonomic changes are summarized below.

Strains of certain unicellular microorganisms disclosed herein are members of the order Thraustochytriales. Thraustochytrids are marine eukaryotes with an evolving taxonomic history. Problems with the taxonomic placement of the Thraustochytrids have been reviewed by Moss (1986), Bahnweg and Jackle (1986) and Chamberlain and Moss (1988).

For convenience purposes, the Thraustochytrids were first placed by taxonomists with other colorless zoosporic eukaryotes in the Phycomycetes (algae-like fungi). The name Phycomycetes, however, was eventually dropped from taxonomic status, and the Thraustochytrids were retained in the Oomycetes (the biflagellate zoosporic fungi). It was initially assumed that the Oomycetes were related to the algae, and eventually a wide range of ultrastructural and biochemical studies, summarized by Barr (Barr, 1981, Biosystems 14:359-370) supported this assumption. The Oomycetes were in fact accepted by Leedale (Leedale, 1974, Taxon 23:261-270) and other phycologists as part of the heterokont algae. However, as a matter of convenience resulting from their heterotrophic nature, the Oomycetes and Thraustochytrids have been largely studied by mycologists (scientists who study fungi) rather than phycologists (scientists who study algae).

From another taxonomic perspective, evolutionary biologists have developed two general schools of thought as to how eukaryotes evolved. One theory proposes an exogenous origin of membrane-bound organelles through a series of endosymbioses (Margulis, 1970, Origin of Eukaryotic Cells. Yale University Press, New Haven); e.g., mitochondria were derived from bacterial endosymbionts, chloroplasts from cyanophytes, and flagella from spirochaetes. The other theory suggests a gradual evolution of the membrane-bound organelles from the non-membrane-bounded systems of the prokaryote ancestor via an autogenous process (Cavalier-Smith, 1975, Nature (Lond.) 256:462-468). Both groups of evolutionary biologists, however, have removed the Oomycetes and Thraustochytrids from the fungi and place them either with the chromophyte algae in the kingdom Chromophyta (Cavalier-Smith, 1981, BioSystems 14:461-481) (this kingdom has been more recently expanded to include other protists and members of this kingdom are now called Stramenopiles) or with all algae in the kingdom Protoctista (Margulis and Sagan, 1985, Biosystems 18:141-147).

With the development of electron microscopy, studies on the ultrastructure of the zoospores of two genera of Thraustochytrids, Thraustochytrium and Schizochytrium (Perkins, 1976, pp. 279-312 in "Recent Advances in Aquatic Mycology" (ed. E. B. G. Jones), John Wiley & Sons, New York; Kazama, 1980, Can. J. Bot. 58:2434-2446; Barr, 1981, Biosystems 14:359-370), have provided good evidence that the Thraustochytriaceae are only distantly related to the Oomycetes. Additionally, genetic data representing a correspondence analysis (a form of multivariate statistics) of 5S ribosomal RNA sequences indicate that Thraustochytriales are clearly a unique group of eukaryotes, completely separate from the fungi, and most closely related to the red and brown algae, and to members of the Oomycetes (Mannella, et al., 1987, Mol. Evol. 24:228-235). Most taxonomists have agreed to remove the Thraustochytrids from the Oomycetes (Bartnicki-Garcia, 1987, pp. 389-403 in "Evolutionary Biology of the Fungi" (eds. Rayner, A. D. M., Brasier, C. M. & Moore, D.), Cambridge University Press, Cambridge).

In summary, employing the taxonomic system of Cavalier-Smith (Cavalier-Smith, 1981, BioSystems 14:461-481, 1983; Cavalier-Smith, 1993, Microbiol Rev. 57:953-994), the Thraustochytrids are classified with the chromophyte algae in the kingdom Chromophyta (Stramenopiles). This taxonomic placement has been more recently reaffirmed by Cavalier-Smith et al. using the 18S rRNA signatures of the Heterokonta to demonstrate that Thraustochytrids are chromists not Fungi (Cavalier-Smith et al., 1994, Phil. Tran. Roy. Soc. London Series BioSciences 346:387-397). This places the Thraustochytrids in a completely different kingdom from the fungi, which are all placed in the kingdom Eufungi.

Currently, there are 71 distinct groups of eukaryotic organisms (Patterson 1999) and within these groups four major lineages have been identified with some confidence: (1) Alveolates, (2) Stramenopiles, (3) a Land Plant-green algae-Rhodophyte-Glaucophyte ("plant") clade and (4) an Opisthokont clade (Fungi and Animals). Formerly these four major lineages would have been labeled Kingdoms but use of the "kingdom" concept is no longer considered useful by some researchers.

As noted by Armstrong, Stramenopile refers to three-parted tubular hairs, and most members of this lineage have flagella bearing such hairs. Motile cells of the Stramenopiles (unicellular organisms, sperm, zoospores) are asymmetrical, having two laterally inserted flagella, one long, bearing three-parted tubular hairs that reverse the thrust of the flagellum, and one short and smooth. Formerly, when the group was less broad, the Stramenopiles were called Kingdom Chromista or the heterokont (=different flagella) algae because those groups consisted of the Brown Algae or Phaeophytes, along with the yellow-green Algae, Golden-brown Algae, Eustigmatophytes and Diatoms. Subsequently some heterotrophic, fungal-like organisms, the water molds, and labyrinthulids (slime net amoebas), were found to possess similar motile cells, so a group name referring to photosynthetic pigments or algae became inappropriate. Currently, two of the families within the Stramenopile lineage are the Labyrinthulaceae and the Thraustochytriaceae. Historically, there have been numerous classification strategies for these unique microorganisms and they are often classified under the same order (i.e., Thraustochytriales). Relationships of the members in these groups are still developing. Porter and Leander have developed data based on 18S small subunit ribosomal DNA indicating the thraustochytrid-labyrinthulid clade is monophyletic. However, the clade is supported by two branches; the first contains three species of Thraustochytrium and Ulkenia profunda, and the second includes three species of Labyrinthula, two species of Labyrinthuloides and Schizochytrium aggregatum.

The taxonomic placement of the Thraustochytrids as used in the present invention is therefore summarized below:

Kingdom: Chromophyta (Stramenopiles)
Phylum: Heterokonta
Order: Thraustochytriales (Thraustochytrids)
Family: Thraustochytriaceae or Labyrinthulaceae
Genera: Thraustochytrium, Schizochytrium, Japonochytrium, Aplanochytrium, Elina, Labyrinthula, Labyrinthuloides, or Labyrinthomyxa

Some early taxonomists separated a few original members of the genus Thraustochytrium (those with an amoeboid life stage) into a separate genus called Ulkenia. However, it is now known that most, if not all, Thraustochytrids (including Thraustochytrium and Schizochytrium) exhibit amoeboid stages and as such, Ulkenia is not considered by some to be a valid genus. As used herein, the genus Thraustochytrium will include Ulkenia.

Despite the uncertainty of taxonomic placement within higher classifications of Phylum and Kingdom, the Thraustochytrids remain a distinctive and characteristic grouping whose members remain classifiable within the order Thraustochytriales.

Schizochytrium is a Thraustochytrid marine microorganism that accumulates large quantities of triacylglycerols rich in DHA and docosapentaenoic acid (DPA; 22:5 ω-6); e.g., 30% DHA+DPA by dry weight (Barclay et al., J. Appl. Phycol. 6, 123 (1994)). In eukaryotes that synthesize 20- and 22-carbon PUFAs by an elongation/desaturation pathway, the pools of 18-, 20- and 22-carbon intermediates are relatively large so that in vivo labeling experiments using [¹⁴C]-acetate reveal clear precursor-product kinetics for the predicted intermediates (Gellerman et al., Biochim. Biophys. Acta 573:23 (1979)). Furthermore, radiolabeled intermediates provided exogenously to such organisms are converted to the final PUFA products. The present inventors have shown that [1-¹⁴C]-acetate was rapidly taken up by Schizochytrium cells and incorporated into fatty acids, but at the shortest labeling time (1 min), DHA contained 31% of the label recovered in fatty acids, and this percentage remained essentially unchanged during the 10-15 min of [¹⁴C]-acetate incorporation and the subsequent 24 hours of culture growth. Similarly, DPA represented 10% of the label throughout the experiment. There is no evidence for a precursor-product relationship between 16- or 18-carbon fatty acids and the 22-carbon polyunsaturated fatty acids. These results are consistent with rapid synthesis of DHA from [¹⁴C]-acetate involving very small (possibly enzyme-bound) pools of intermediates. A cell-free homogenate derived from Schizochytrium cultures incorporated [1-¹⁴C]-malonyl-CoA into DHA, DPA, and saturated fatty acids. The same biosynthetic activities were retained by a 100,000×g supernatant fraction but were not present in the membrane pellet. Thus, DHA and DPA synthesis in Schizochytrium does not involve membrane-bound desaturases or fatty acid elongation enzymes like those described for other eukaryotes (Parker-Barnes et al., 2000, supra; Shanklin et al., 1998, supra). These fractionation data contrast with those obtained from the Shewanella enzymes (see Metz et al., 2001, supra) and may indicate use of a different (soluble) acyl acceptor molecule, such as CoA, by the Schizochytrium enzyme. It is expected that Thraustochytrium will have a similar biochemistry.

In U.S. Pat. No. 6,566,583, a cDNA library from Schizochytrium was constructed and approximately 8500 random clones (ESTs) were sequenced. Sequences that exhibited homology to 8 of the 11 domains of the Shewanella PKS genes shown in FIG. 2 were all identified at frequencies of 0.2-0.5%. In U.S. Pat. No. 6,566,583, several cDNA clones from Schizochytrium showing homology to the Shewanella PKS genes were sequenced, and various clones were assembled into nucleic acid sequences representing two partial open reading frames and one complete open reading frame.

Further sequencing of cDNA and genomic clones by the present inventors allowed the identification of the full-length genomic sequence of each of OrfA, OrfB and OrfC in Schizochytrium and the complete identification of the domains in Schizochytrium with homology to those in Shewanella (see FIG. 2). These genes are described in detail in U.S. patent application Ser. No. 10/124,800, supra and are described in some detail below.

The present inventors have now identified, cloned, and sequenced the full-length genomic sequence of homologous Orfs in a Thraustochytrid of the genus Thraustochytrium (specifically, Thraustochytrium sp. 23B (ATCC 20892)) and have identified the domains comprising the PUFA PKS system in this Thraustochytrium.

Based on the comparison of the domains of the PUFA PKS system of Schizochytrium with the domains of the PUFA PKS system of Shewanella, clearly, the Schizochytrium genome encodes proteins that are highly similar to the proteins in Shewanella that are capable of catalyzing EPA synthesis. The proteins in Schizochytrium constitute a PUFA PKS system that catalyzes DHA and DPA synthesis. Simple modification of the reaction scheme identified for Shewanella will allow for DHA synthesis in Schizochytrium. The homology between the prokaryotic Shewanella and eukaryotic Schizochytrium genes suggests that the PUFA PKS has undergone lateral gene transfer.

A similar comparison can be made for Thraustochytrium. In all cases, comparison of the Thraustochytrium 23B (Th. 23B) PUFA PKS proteins or domains to other known sequences revealed that the closest match was one of the Schizochytrium PUFA PKS proteins (OrfA, B or C, or a domain therefrom) as described in U.S. patent application Ser. No. 10/124,800, supra. The next closest matches in all cases were to one of the PUFA PKS proteins from marine bacteria (Shewanella SCRC-2738, Shewanella oneidensis, Photobacterium profundum and Moritella marina) or from a related system found in nitrogen fixing cyanobacteria (e.g., Nostoc punctiforme and Nostoc sp. PCC 7120). The products of the cyanobacterial enzyme systems lack double bonds and the proteins lack domains related to the DH domains implicated in cis double bond formation (i.e., the FabA related DH domains).

According to the present invention, the phrase "open reading frame" is denoted by the abbreviation "Orf". It is noted that the protein encoded by an open reading frame can also be denoted in all upper case letters as "ORF" and a nucleic acid sequence for an open reading frame can also be denoted in all lower case letters as "orf", but for the sake of consistency, the spelling "Orf" is preferentially used herein to describe either the nucleic acid sequence or the protein encoded thereby. It will be obvious from the context of the usage of the term whether a protein or nucleic acid sequence is referenced.

Schizochytrium PUFA PKS

FIG. 1 is a graphical representation of the three open reading frames from the Schizochytrium PUFA PKS system, and includes the domain structure of this PUFA PKS system. As described in detail in U.S. patent application Ser. No.

and similar derivatives. The acyl group destined for elongation is linked to a cysteine residue at the active site of the enzyme by a thioester bond. In the multi-step reaction, the acyl-enzyme undergoes condensation with malonyl-ACP to form β-ketoacyl-ACP, CO2 and free enzyme. The KS plays a key role in the elongation cycle and in many systems has been shown to possess greater substrate specificity than other enzymes of the reaction cycle. For example, E. coli has three distinct KS enzymes, each with its own particular role in the physiology of the organism (Magnuson et al., Microbiol. Rev. 57, 522 (1993)). The two KS domains of the PUFA-PKS systems could have distinct roles in the PUFA biosynthetic reaction sequence.

As a class of enzymes, KS's have been well characterized. The sequences of many verified KS genes are known, the active site motifs have been identified and the crystal structures of several have been determined. Proteins (or domains of proteins) can be readily identified as belonging to the KS family of enzymes by homology to known KS sequences.

The second domain in OrfA is a malonyl-CoA:ACP acyltransferase (MAT) domain, also referred to herein as OrfA-MAT. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1723 and 1798 of SEQ ID NO:1 (OrfA) to an ending point of between about positions 2805 and 3000 of SEQ ID NO:1. The nucleotide sequence containing the sequence encoding the OrfA-MAT domain is represented herein as SEQ ID NO:9 (positions 1723-3000 of SEQ ID NO:1). The
10/124, amino acid sequence containing the MAT domain spans from 800, the domain structure of each open reading frame is as 30 a starting point of between about positions 575 and 600 of follows: SEQ ID NO:2 (OrfA) to an ending point of between about Open Reading Frame A (OrfA): positions 935 and 1000 of SEQID NO:2. The amino acid The complete nucleotide sequence for OrfA is represented sequence containing the OrfA-MAT domain is represented herein as SEQID NO:1. OrfA is a 8730 nucleotide sequence herein as SEQ ID NO:10 (positions 575-1000 of SEQ ID (not including the stop codon) which encodes a 2910 amino 35 NO:2). It is noted that the OrfA-MAT domain contains an acid sequence, represented herein as SEQ ID NO:2. Within active site motif: GHS*XG (*acyl binding site S7), repre OrfA are twelve domains: (a) one B-ketoacyl-ACP synthase sented herein as SEQID NO:11. (KS) domain; (b) one malonyl-CoA:ACP acyltransferase According to the present invention, a domain or protein (MAT) domain; (c) nine acyl carrier protein (ACP) domains: having malonyl-CoA:ACP acyltransferase (MAT) biological and (d) one B-ketoacyl-ACP reductase (KR) domain. The 40 activity (function) is characterized as one that transfers the nucleotide sequence for OrfA has been deposited with Gen malonyl moiety from malonyl-CoA to ACP. The term “malo Bank as Accession No. AF378327 (amino acid sequence nyl-CoA:ACP acyltransferase' can be used interchangeably Accession No. AAK728879). with “malonyl acyltransferase' and similar derivatives. In The first domain in Schizochytrium OrfA is a f-ketoacyl addition to the active site motif (GXSXG), these enzymes ACP synthase (KS) domain, also referred to herein as OrfA 45 possess an extended motif (R and Q amino acids in key KS. This domain is contained within the nucleotide sequence positions) that identifies them as MAT enzymes (in contrast to spanning from a starting point of between about positions 1 the AT domain of Schizochytrium Orf B). 
In some PKS sys and 40 of SEQID NO:1 (OrfA) to an ending point of between tems (but not the PUFA PKS domain) MAT domains will about positions 1428 and 1500 of SEQID NO:1. The nucle preferentially load methyl- or ethyl-malonate on to the ACP otide sequence containing the sequence encoding the OrfA 50 group (from the corresponding CoA ester), thereby introduc KS domain is represented herein as SEQID NO:7 (positions ing branches into the linear carbon chain. MAT domains can 1-1500 of SEQID NO:1). The amino acid sequence contain be recognized by their homology to known MAT sequences ing the KS domain spans from a starting point of between and by their extended motif structure. about positions 1 and 14 of SEQID NO:2 (OrfA) to an ending Domains 3-11 of OrfA are nine tandem acyl carrier protein point of between about positions 476 and 500 of SEQ ID 55 (ACP) domains, also referred to hereinas OrfA-ACP (the first NO:2. The amino acid sequence containing the OrfA-KS domain in the sequence is OrfA-ACP1, the second domain is domain is represented herein as SEQ ID NO:8 (positions OrfA-ACP2, the third domain is OrfA-ACP3, etc.). The first 1-500 of SEQID NO:2). It is noted that the OrfA-KS domain ACP domain, OrfA-ACP1, is contained within the nucleotide contains an active site motif: DXAC* (*acyl binding site sequence spanning from about position 3343 to about posi C21s). 60 tion 3600 of SEQID NO:1 (OrfA). The nucleotide sequence According to the present invention, a domain or protein containing the sequence encoding the OrfA-ACP1 domain is having B-ketoacyl-ACP synthase (KS) biological activity represented hereinas SEQID NO:12 (positions 3343-3600 of (function) is characterized as the enzyme that carries out the SEQID NO:1). The amino acid sequence containing the first initial step of the FAS (and PKS) elongation reaction cycle. ACP domain spans from about position 1115 to about posi The term “B-ketoacyl-ACP synthase' can be used inter 65 tion 1200 of SEQ ID NO:2. 
The amino acid sequence con changeably with the terms “3-keto acyl-ACP synthase'. taining the OrfA-ACP1 domain is represented herein as SEQ “B-ketoacyl-ACP synthase', and “keto-acyl ACP synthase'. ID NO:13 (positions 1115-1200 of SEQID NO:2). It is noted US 7,645,597 B2 25 26 that the OrfA-ACP1 domain contains an active site motif: point of about position 2200 of SEQID NO:2 (OrfA) to an LGIDS* (*pantetheline binding motif Ss), represented ending point of about position 2910 of SEQ ID NO:2. The herein by SEQID NO:14. amino acid sequence containing the OrfA-KR domain is rep The nucleotide and amino acid sequences of all nine ACP resented herein as SEQ ID NO:18 (positions 2200-2910 of domains are highly conserved and therefore, the sequence for SEQ ID NO:2). Within the KR domain is a core region with each domain is not represented herein by an individual homology to short chain aldehyde-dehydrogenases (KR is a sequence identifier. However, based on the information dis member of this family). This core region spans from about closed herein, one of skill in the art can readily determine the position 7198 to about position 7500 of SEQID NO:1, which sequence containing each of the other eight ACP domains corresponds to amino acid positions 2400-2500 of SEQ ID (see discussion below). 10 NO:2. All nine ACP domains together span a region of OrfA of According to the present invention, a domain or protein from about position 3283 to about position 6288 of SEQID having fB-ketoacyl-ACP reductase (KR) activity is character NO:1, which corresponds to amino acid positions of from ized as one that catalyzes the pyridine-nucleotide-dependent about 1095 to about 2096 of SEQ ID NO:2. The nucleotide reduction of 3-ketoacyl forms of ACP. The term “B-ketoacyl sequence for the entire ACP region containing all nine 15 ACP reductase' can be used interchangeably with the terms domains is represented herein as SEQID NO:16. The region “ketoreductase”, “3-ketoacyl-ACP reductase'. 
“keto-acyl represented by SEQID NO:16 includes the linker segments ACP reductase' and similar derivatives of the term. It is the between individual ACP domains. The repeat interval for the first reductive step in the de novo fatty acid biosynthesis nine domains is approximately every 330 nucleotides of SEQ elongation cycle and a reaction often performed in polyketide ID NO:16 (the actual number of amino acids measured biosynthesis. Significant sequence similarity is observed with between adjacent active site serines ranges from 104 to 116 one family of enoyl-ACP reductases (ER), the other reductase amino acids). Each of the nine ACP domains contains a pan of FAS (but not the ER family present in the PUFA PKS tetheline binding motif LGIDS (represented herein by SEQ system), and the short-chain alcohol dehydrogenase family. ID NO:14), wherein S* is the pantetheine binding site serine Pfam analysis of the PUFA PKS region indicated above (S). The pantetheline binding site serine (S) is located near the 25 reveals the homology to the short-chain alcohol dehydroge center of each ACP domain sequence. At each end of the ACP nase family in the core region. Blast analysis of the same domain region and between each ACP domain is a region that region reveals matches in the core area to known KR enzymes is highly enriched for proline (P) and alanine (A), which is as well as an extended region of homology to domains from believed to be a linker region. For example, between ACP the other characterized PUFA PKS systems. domains 1 and 2 is the sequence: APAPVKAAA 30 PAAPVASAPAPA, represented herein as SEQ ID NO:15. Open Reading Frame B (OrfB): The locations of the active site serine residues (i.e., the pan The complete nucleotide sequence for OrfB is represented tetheline binding site) for each of the nine ACP domains, with herein as SEQID NO:3. 
OrfB is a 6177 nucleotide sequence respect to the amino acid sequence of SEQID NO:2, are as (not including the stop codon) which encodes a 2059 amino follows: ACP1=Ss: ACP2=S; ACP3-S; 35 acid sequence, represented herein as SEQ ID NO:4. Within ACP4-Sass: ACP5-Sea; ACP6=Szs: ACP7=Ss: OrfB are four domains: (a) one B-ketoacyl-ACP synthase ACP8-Soo and ACP9–Sosa. Given that the average size of (KS) domain; (b) one chain length factor (CLF) domain; (c) an ACP domain is about 85 amino acids, excluding the linker, one acyltransferase (AT) domain; and, (d) one enoyl-ACP and about 110 amino acids including the linker, with the reductase (ER) domain. The nucleotide sequence for OrfB active site serine being approximately in the center of the 40 has been deposited with GenBank as Accession No. domain, one of skill in the art can readily determine the AF378328 (amino acid sequence Accession No. positions of each of the nine ACP domains in OrfA. AAK728880). According to the present invention, a domain or protein The first domain in OrfB is a B-ketoacyl-ACP synthase having acyl carrier protein (ACP) biological activity (func (KS) domain, also referred to herein as OrfB-KS. This tion) is characterized as being Small polypeptides (typically, 45 domain is contained within the nucleotide sequence spanning 80 to 100 amino acids long), that function as carriers for from a starting point of between about positions 1 and 43 of growing fatty acyl chains via a thioester linkage to a SEQ ID NO:3 (OrfB) to an ending point of between about covalently bound co-factor of the protein. They occur as positions 1332 and 1350 of SEQ ID NO:3. The nucleotide separate units or as domains within larger proteins. ACPs are sequence containing the sequence encoding the OrfB-KS converted from inactive apo-forms to functional holo-forms 50 domain is represented herein as SEQ ID NO:19 (positions by transfer of the phosphopantetheinyl moeity of CoA to a 1-1350 of SEQID NO:3). 
The amino acid sequence contain highly conserved serine residue of the ACP. Acyl groups are ing the KS domain spans from a starting point of between attached to ACP by a thioester linkage at the free terminus of about positions 1 and 15 of SEQID NO:4 (OrfB) to an ending the phosphopantetheinyl moiety. ACPs can be identified by point of between about positions 444 and 450 of SEQ ID labeling with radioactive pantetheline and by sequence 55 NO:4. The amino acid sequence containing the OrfB-KS homology to known ACPs. The presence of variations of the domain is represented herein as SEQ ID NO:20 (positions above mentioned motif (LGIDS) is also a signature of an 1-450 of SEQID NO:4). It is noted that the OrfB-KS domain ACP. contains an active site motif: DXAC* (*acyl binding site Domain 12 in OrfA is a B-ketoacyl-ACP reductase (KR) Co.). KS biological activity and methods of identifying pro domain, also referred to herein as OrfA-KR. This domain is 60 teins or domains having Such activity is described above. contained within the nucleotide sequence spanning from a The second domain in OrfB is a chain length factor (CLF) starting point of about position 6598 of SEQID NO:1 to an domain, also referred to herein as OrfB-CLF. This domain is ending point of about position 8730 of SEQ ID NO:1. The contained within the nucleotide sequence spanning from a nucleotide sequence containing the sequence encoding the starting point of between about positions 1378 and 1402 of OrfA-KR domain is represented herein as SEQ ID NO:17 65 SEQ ID NO:3 (OrfB) to an ending point of between about (positions 6598-8730 of SEQ ID NO:1). The amino acid positions 2682 and 2700 of SEQ ID NO:3. The nucleotide sequence containing the KR domain spans from a starting sequence containing the sequence encoding the OrfB-CLF US 7,645,597 B2 27 28 domain is represented herein as SEQ ID NO:21 (positions among the various ACP domains, or transfer of the fatty acyl 1378-2700 of SEQID NO:3). 
The amino acid sequence con group to a lipophilic acceptor molecule (e.g. to lysophos taining the CLF domain spans from a starting point of phadic acid). between about positions 460 and 468 of SEQID NO:4(OrfB) The fourth domain in OrfB is an ER domain, also referred to an ending point of between about positions 894 and 900 of 5 to herein as OrfB-ER. This domain is contained within the SEQID NO:4. The amino acid sequence containing the OrfB nucleotide sequence spanning from a starting point of about CLF domain is represented herein as SEQID NO:22 (posi position 4648 of SEQID NO:3 (OrfB) to an ending point of tions 460-900 of SEQID NO:4). It is noted that the OrfB-CLF about position 6177 of SEQ ID NO:3. The nucleotide domain contains a KS active site motif without the acyl sequence containing the sequence encoding the OrfB-ER binding cysteine. 10 domain is represented herein as SEQ ID NO:25 (positions According to the present invention, a domain or protein is 4648-6177 of SEQID NO:3). The amino acid sequence con referred to as a chain length factor (CLF) based on the fol taining the ER domain spans from a starting point of about lowing rationale. The CLF was originally described as char position 1550 of SEQID NO:4 (OrfB) to an ending point of acteristic of Type II (dissociated enzymes) PKS systems and about position 2059 of SEQ ID NO:4. The amino acid was hypothesized to play a role in determining the number of 15 sequence containing the OrfB-ER domain is represented elongation cycles, and hence the chain length, of the end herein as SEQ ID NO:26 (positions 1550-2059 of SEQ ID product. CLF amino acid sequences show homology to KS NO:4). domains (and are thought to form heterodimers with a KS According to the present invention, this domain has enoyl protein), but they lack the active site cysteine. CLF's role in ACP reductase (ER) biological activity. According to the PKS systems is currently controversial. New evidence (C. 
present invention, the term “enoyl-ACP reductase' can be Bisang et al., Nature 401, 502 (1999)) suggests a role in used interchangeably with “enoyl reductase”, “enoyl ACP priming (providing the initial acyl group to be elongated) the reductase' and "enoyl acyl-ACP reductase'. The ER enzyme PKS systems. In this role the CLF domain is thought to reduces the trans-double bond (introduced by the DH activ decarboxylate malonate (as malonyl-ACP), thus forming an ity) in the fatty acyl-ACP, resulting in fully saturating those acetate group that can be transferred to the KS active site. This 25 carbons. The ER domain in the PUFA-PKS shows homology acetate therefore acts as the priming molecule that can to a newly characterized family of ER enzymes (Heath et al., undergo the initial elongation (condensation) reaction. Nature 406, 145 (2000)). Heath and Rock identified this new Homologues of the Type II CLF have been identified as class of ER enzymes by cloning a gene of interest from loading domains in Some modular PKS systems. A domain Streptococcus pneumoniae, purifying a protein expressed with the sequence features of the CLF is found in all currently 30 from that gene, and showing that it had ER activity in an in identified PUFA PKS systems and in each case is found as vitro assay. The sequence of the Schizochytrium ER domain part of a multidomain protein. of OrfB shows homology to the S. pneumoniae ER protein. The third domain in OrfB is an AT domain, also referred to All of the PUFA PKS systems currently examined contain at herein as OrfB-AT. This domain is contained within the least one domain with very high sequence homology to the nucleotide sequence spanning from a starting point of 35 Schizochytrium ER domain. The Schizochytrium PUFA PKS between about positions 2701 and 3598 of SEQ ID NO:3 system contains two ER domains (one on OrfB and one on (OrfB) to an ending point of between about positions 3975 Orf(). and 4200 of SEQID NO:3. 
The nucleotide sequence contain ing the sequence encoding the OrfB-AT domain is repre Open Reading Frame C (OrfQ): sented hereinas SEQID NO:23 (positions 2701-4200 of SEQ 40 The complete nucleotide sequence for Orfc is represented ID NO:3). The amino acid sequence containing the AT herein as SEQID NO:5. Orf is a 4509 nucleotide sequence domain spans from a starting point of between about posi (not including the stop codon) which encodes a 1503 amino tions 901 and 1200 of SEQID NO:4(OrfB) to an ending point acid sequence, represented herein as SEQ ID NO:6. Within of between about positions 1325 and 1400 of SEQID NO:4. OrfQ are three domains: (a) two FabA-like B-hydroxyacyl The amino acid sequence containing the OrfB-AT domain is 45 ACP dehydrase (DH) domains; and (b) one enoyl-ACP reduc represented herein as SEQID NO:24 (positions 901-1400 of tase (ER) domain. The nucleotide sequence for Orf has been SEQID NO:4). It is noted that the OrfB-AT domain contains deposited with GenBank as Accession No. AF3783.29 (amino an active site motif of GXS*XG (acyl binding site Sao) that acid sequence Accession No. AAK728881). is characteristic of acyltransferase (AT) proteins. The first domain in Orf is a DH domain, also referred to An “acyltransferase' or “AT” refers to a general class of 50 hereinas Orf-DH1. This is one of two DH domains in Orf, enzymes that can carry out a number of distinct acyl transfer and therefore is designated DH1. This domain is contained reactions. The term “acyltransferase' can be used inter within the nucleotide sequence spanning from a starting point changeably with the term “acyl transferase’. The of between about positions 1 and 778 of SEQID NO:5 (OrfQ) Schizochytrium domain shows good homology to a domain to an ending point of between about positions 1233 and 1350 present in all of the other PUFA PKS systems currently exam 55 of SEQ ID NO:5. 
The nucleotide sequence containing the ined and very weak homology to some acyltransferases sequence encoding the Orfc-DH1 domain is represented whose specific functions have been identified (e.g. to malo hereinas SEQIDNO:27 (positions 1-1350 of SEQIDNO:5). nyl-CoA:ACP acyltransferase, MAT). In spite of the weak The amino acid sequence containing the DH1 domain spans homology to MAT, this AT domain is not believed to function from a starting point of between about positions 1 and 260 of as a MAT because it does not possess an extended motif 60 SEQ ID NO:6 (OrfQ) to an ending point of between about structure characteristic of such enzymes (see MAT domain positions 411 and 450 of SEQ ID NO:6. The amino acid description, above). For the purposes of this disclosure, the sequence containing the Orfo-DH1 domain is represented functions of the AT domain in a PUFA PKS system include, herein as SEQID NO:28 (positions 1-450 of SEQID NO:6). but are not limited to: transfer of the fatty acyl group from the According to the present invention, this domain has FabA OrfAACP domain(s) to water (i.e. a thioesterase—releasing 65 like B-hydroxyacyl-ACP dehydrase (DH) biological activity. the fatty acyl group as a free fatty acid), transfer of a fatty acyl The term “FabA-like B-hydroxyacyl-ACP dehydrase' can be group to an acceptor Such as CoA, transfer of the acyl group used interchangeably with the terms "Fab A-like B-hydroxy US 7,645,597 B2 29 30 acyl-ACP dehydrase”, “B-hydroxyacyl-ACP dehydrase'. (1997) “Gapped BLAST and PSI-BLAST: a new generation “dehydrase' and similar derivatives. The characteristics of of protein database search programs. Nucleic Acids Res. both the DH domains (see below for DH2) in the PUFA PKS 25:3389-3402, incorporated herein by reference in its systems have been described in the preceding sections. This entirety))). At the amino acid level, the sequences with the class of enzyme removes HOH from a -ketoacyl-ACP and greatest degree of homology to Th. 
23B OrfA was leaves a trans double bond in the carbon chain. The DH Schizochytrium Orf A (gb AAK72879.1) (SEQ ID NO:2). domains of the PUFA PKS systems show homology to bac The alignment extends over the entire query but is broken into terial DH enzymes associated with their FAS systems (rather 2 pieces (due to the difference in numbers of ACP repeats). than to the DH domains of other PKS systems). A subset of SEQ ID NO:39 first aligns at positions 6 through 1985 (in bacterial DH's, the Fab A-like DHs, possesses cis-trans 10 cluding 8 ACP domains) with SEQ ID NO:2 and shows a isomerase activity (Heath et al., J. Biol. Chem., 271, 27795 sequence identity to SEQID NO:2 of 54% over 2017 amino (1996)). It is the homologies to the Fab A-like DH's that acids. SEQ ID NO:39 also aligns at positions 980 through indicate that one or both of the DH domains is responsible for 2811 with SEQ ID NO:2 and shows a sequence identity to insertion of the cis double bonds in the PUFA PKS products. SEQID NO:2 of 43% over 1861 amino acids. In this second The second domain in Orf is a DH domain, also referred 15 alignment, the match is evident for the Th. 23B 8xACPs in the to hereinas Orf-DH2. This is the second of two DH domains regions of the conserved pantetheline attachment site motif. in Orf , and therefore is designated DH2. This domain is but is very poor over the 1st Schizochytrium ACP domain (i.e., contained within the nucleotide sequence spanning from a there is not a 9” ACP domain in the Th. 23B query sequence, starting point of between about positions 1351 and 2437 of but the Blastp output under theses conditions attempts to align SEQ ID NO:5 (Orf) to an ending point of between about them anyway). SEQID NO:39 shows the next closest identity positions 2607 and 2850 of SEQ ID NO:5. The nucleotide with sequences from Shewanella Oneidensis (Accession No. sequence containing the sequence encoding the Orf-DH2 NP 717214) and Photobacter profindum (Accession No. 
domain is represented herein as SEQ ID NO:29 (positions AAL01060). 1351-2850 of SEQID NO:5). The amino acid sequence con The first domain in Th. 23B OrfA is a KS domain, also taining the DH2 domain spans from a starting point of 25 referred to herein as Th. 23B OrfA-KS. KS domain function between about positions 451 and 813 of SEQID NO:6 (Orf) has been described in detail above. This domain is contained to an ending point of between about positions 869 and 950 of within the nucleotide sequence spanning from about position SEQID NO:6. The amino acid sequence containing the Orf - 1 to about position 1500 of SEQ ID NO:38, represented DH2 domain is represented herein as SEQID NO:30 (posi herein as SEQID NO:40. The amino acid sequence contain tions 451-950 of SEQID NO:6). DH biological activity has 30 ing the Th. 23B KS domain is a region of SEQ ID NO:39 been described above. spanning from about position 1 to about position 500 of SEQ The third domain in Orf(C is an ER domain, also referred to ID NO:39, represented hereinas SEQID NO:41. This region herein as Orf-ER. This domain is contained within the of SEQ ID NO:39 has a Pfam match to FabB (B-ketoacyl nucleotide sequence spanning from a starting point of about ACP synthase) spanning from position 1 to about position position 2998 of SEQID NO:5 (Orf) to an ending point of 35 450 of SEQID NO:39 (also positions 1 to about 450 of SEQ about position 4509 of SEQ ID NO:5. The nucleotide ID NO:41). It is noted that the Th. 23B OrfA-KS domain sequence containing the sequence encoding the Orf-ER contains an active site motif: DXAC* (*acyl binding site domain is represented herein as SEQ ID NO:31 (positions Co.,). Also, a characteristic motifat the end of the Th. 23B KS 2998-4509 of SEQID NO:5). The amino acid sequence con region, GFGG, is present in positions 453-456 of SEQ ID taining the ER domain spans from a starting point of about 40 NO:39 (also positions 453-456 of SEQ ID NO:41). 
The position 1000 of SEQID NO:6 (Orf) to an ending point of amino acid sequence spanning positions 1-500 of SEQ ID about position 1502 of SEQ ID NO:6. The amino acid NO:39 is about 79% identical to Schizochytrium OrfA (SEQ sequence containing the Orf-ER domain is represented ID NO:2) over 496 amino acids. The amino acid sequence herein as SEQ ID NO:32 (positions 1000-1502 of SEQ ID spanning positions 1-450 of SEQ ID NO:39 is about 81% NO:6). ER biological activity has been described above. 45 identical to Schizochytrium OrfA (SEQ ID NO:2) over 446 amino acids. Thraustochytrium 23B PUFA PKS The second domain in Th. 23B OrfA is a MAT domain, also Th. 23B Open Reading Frame A (OrfA): referred to herein as Th. 23B OrfA-MAT. MAT domain func The complete nucleotide sequence for Th. 23B OrfA is tion has been described in detail above. This domain is con represented herein as SEQ ID NO:38. SEQ ID NO:38 50 tained within the nucleotide sequence spanning from between encodes the following domains in Th. 23B OrfA: (a) one about position 1503 and about position 3000 of SEQ ID B-ketoacyl-ACP synthase (KS) domain; (b) one malonyl NO:38, represented hereinas SEQID NO:42. The amino acid CoA:ACP acyltransferase (MAT) domain; (c) eight acyl car sequence containing the Th. 23B MAT domain is a region of rier protein (ACP) domains; and (d) one f-ketoacyl-ACP SEQ ID NO:39 spanning from about position 501 to about reductase (KR) domain. This domain organization is the same 55 position 1000, represented herein by SEQ ID NO:43. This as is present in Schizochytrium OrfA(SEQID NO:1) with the region of SEQID NO:39 has a Pfammatch to FabD (malonyl exception that the Th. 23B OrfA has 8 adjacent ACP domains, CoA:ACP acyltransferase) spanning from about position 580 while Schizochytrium OrfA has 9 adjacent ACP domains. Th. to about position 900 of SEQID NO:39 (positions 80-400 of 23B OrfA is a 8433 nucleotide sequence (not including the SEQ ID NO:43). It is noted that the Th. 
stop codon) which encodes a 2811 amino acid sequence, represented herein as SEQ ID NO:39. The Th. 23B OrfA amino acid sequence (SEQ ID NO:39) was compared with known sequences in a standard BLAST search (BLAST parameters: Blastp, low complexity filter Off, program - BLOSUM62, Gap cost Existence: 11, Extension 1; (BLAST described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J.

23B OrfA-MAT domain contains an active site motif: GHS*XG (acyl binding site S), represented by positions 695-699 of SEQ ID NO:39. The amino acid sequence spanning positions 501-1000 of SEQ ID NO:39 is about 46% identical to Schizochytrium OrfA (SEQ ID NO:2) over 481 amino acids. The amino acid sequence spanning positions 580-900 of SEQ ID NO:39 is about 50% identical to Schizochytrium OrfA (SEQ ID NO:2) over 333 amino acids.

Domains 3-10 of Th. 23B OrfA are eight tandem ACP domains, also referred to herein as Th. 23B OrfA-ACP (the first domain in the sequence is OrfA-ACP1, the second domain is OrfA-ACP2, the third domain is OrfA-ACP3, etc.). The function of ACP domains has been described in detail above. The first Th. 23B ACP domain, Th. 23B OrfA-ACP1, is contained within the nucleotide sequence spanning from about position 3205 to about position 3555 of SEQ ID NO:38 (OrfA), represented herein as SEQ ID NO:44. The amino acid sequence containing the first Th. 23B ACP domain is a region of SEQ ID NO:39 spanning from about position 1069 to about position 1185 of SEQ ID NO:39, represented herein by SEQ ID NO:45. The amino acid sequence spanning positions 1069-1185 of SEQ ID NO:39 is about 65% identical to Schizochytrium OrfA (SEQ ID NO:2) over 85 amino acids. Th. 23B OrfA-ACP1 has a similar identity to any one of the nine ACP domains in Schizochytrium OrfA.

The eight ACP domains in Th. 23B OrfA are adjacent to one another and can be identified by the presence of the phosphopantetheine binding site motif, LGXDS* (represented by SEQ ID NO:46), wherein the S* is the phosphopantetheine attachment site. The amino acid position of each of the eight S* sites, with reference to SEQ ID NO:39, are 1128 (ACP1), 1244 (ACP2), 1360 (ACP3), 1476 (ACP4), 1592 (ACP5), 1708 (ACP6), 1824 (ACP7) and 1940 (ACP8). The nucleotide and amino acid sequences of all eight Th. 23B ACP domains are highly conserved and therefore, the sequence for each domain is not represented herein by an individual sequence identifier. However, based on the information disclosed herein, one of skill in the art can readily determine the sequence containing each of the other seven ACP domains in SEQ ID NO:38 and SEQ ID NO:39.

All eight Th. 23B ACP domains together span a region of Th. 23B OrfA of from about position 3205 to about position 5994 of SEQ ID NO:38, which corresponds to amino acid positions of from about 1069 to about 1998 of SEQ ID NO:39. The nucleotide sequence for the entire ACP region containing all eight domains is represented herein as SEQ ID NO:47. SEQ ID NO:47 encodes an amino acid sequence represented herein by SEQ ID NO:48. SEQ ID NO:48 includes the linker segments between individual ACP domains. The repeat interval for the eight domains is approximately every 116 amino acids of SEQ ID NO:48, and each domain can be considered to consist of about 116 amino acids centered on the active site motif (described above). It is noted that the linker regions between the nine adjacent ACP domains in OrfA in Schizochytrium are highly enriched in proline and alanine residues, while the linker regions between the eight adjacent ACP domains in OrfA of Thraustochytrium are highly enriched in serine residues (and not proline or alanine residues).

The last domain in Th. 23B OrfA is a KR domain, also referred to herein as Th. 23B OrfA-KR. KR domain function has been discussed in detail above. This domain is contained within the nucleotide sequence spanning from between about position 6001 to about position 8433 of SEQ ID NO:38, represented herein by SEQ ID NO:49. The amino acid sequence containing the Th. 23B KR domain is a region of SEQ ID NO:39 spanning from about position 2001 to about position 2811 of SEQ ID NO:39, represented herein by SEQ ID NO:50. This region of SEQ ID NO:39 has a Pfam match to FabG (β-ketoacyl-ACP reductase) spanning from about position 2300 to about 2550 of SEQ ID NO:39 (positions 300-550 of SEQ ID NO:50). The amino acid sequence spanning positions 2001-2811 of SEQ ID NO:39 is about 40% identical to Schizochytrium OrfA (SEQ ID NO:2) over 831 amino acids. The amino acid sequence spanning positions 2300-2550 of SEQ ID NO:39 is about 51% identical to Schizochytrium OrfA (SEQ ID NO:2) over 235 amino acids.

Th. 23B Open Reading Frame B (OrfB):

The complete nucleotide sequence for Th. 23B OrfB is represented herein as SEQ ID NO:51. SEQ ID NO:51 encodes the following domains in Th. 23B OrfB: (a) one β-ketoacyl-ACP synthase (KS) domain; (b) one chain length factor (CLF) domain; (c) one acyltransferase (AT) domain; and, (d) one enoyl-ACP reductase (ER) domain. This domain organization is the same as in Schizochytrium OrfB (SEQ ID NO:3) with the exception that the linker region between the AT and ER domains of the Schizochytrium protein is longer than that of Th. 23B by about 50-60 amino acids. Also, this linker region in Schizochytrium has a specific area that is highly enriched in serine residues (it contains 15 adjacent serine residues, in addition to other serines in the region), whereas the corresponding linker region in Th. 23B OrfB is not enriched in serine residues. This difference in the AT/ER linker region most likely accounts for a break in the alignment between Schizochytrium OrfB and Th. 23B OrfB at the start of this region.

Th. 23B OrfB is a 5805 nucleotide sequence (not including the stop codon) which encodes a 1935 amino acid sequence, represented herein as SEQ ID NO:52. The Th. 23B OrfB amino acid sequence (SEQ ID NO:52) was compared with known sequences in a standard BLAST search (BLAST parameters: Blastp, low complexity filter Off, program - BLOSUM62, Gap cost Existence: 11, Extension 1; (BLAST described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402, incorporated herein by reference in its entirety)). At the amino acid level, the sequences with the greatest degree of homology to Th. 23B OrfB were Schizochytrium OrfB (gb AAK72880.1) (SEQ ID NO:4), over most of OrfB; and Schizochytrium OrfC (gb AAK72881.1) (SEQ ID NO:6), over the last domain (the alignment is broken into 2 pieces, as mentioned above). SEQ ID NO:52 first aligns at positions 10 through about 1479 (including the KS, CLF and AT domains) with SEQ ID NO:4 and shows a sequence identity to SEQ ID NO:4 of 52% over 1483 amino acids. SEQ ID NO:52 also aligns at positions 1491 through 1935 (including the ER domain) with SEQ ID NO:6 and shows a sequence identity to SEQ ID NO:4 of 64% over 448 amino acids.

The first domain in the Th. 23B OrfB is a KS domain, also referred to herein as Th. 23B OrfB-KS. KS domain function has been described in detail above. This domain is contained within the nucleotide sequence spanning from between about position 1 and about position 1500 of SEQ ID NO:51 (Th. 23B OrfB), represented herein as SEQ ID NO:53. The amino acid sequence containing the Th. 23B KS domain is a region of SEQ ID NO:52 spanning from about position 1 to about position 500 of SEQ ID NO:52, represented herein as SEQ ID NO:54. This region of SEQ ID NO:52 has a Pfam match to FabB (β-ketoacyl-ACP synthase) spanning from about position 1 to about position 450 (positions 1-450 of SEQ ID NO:54). It is noted that the Th. 23B OrfB-KS domain contains an active site motif: DXAC*, where C* is the site of acyl group attachment and wherein the C* is at position 201 of SEQ ID NO:52. Also, a characteristic motif at the end of the KS region, GFGG, is present in amino acid positions 434-437 of SEQ ID NO:52. The amino acid sequence spanning positions 1-500 of SEQ ID NO:52 is about 64% identical to Schizochytrium OrfB (SEQ ID NO:4) over 500 amino acids. The amino acid sequence spanning positions 1-450 of SEQ ID NO:52 is about 67% identical to Schizochytrium OrfB (SEQ ID NO:4) over 442 amino acids.

The second domain in Th. 23B OrfB is a CLF domain, also referred to herein as Th. 23B OrfB-CLF. CLF domain function has been described in detail above. This domain is contained within the nucleotide sequence spanning from between about position 1501 and about position 3000 of SEQ ID NO:51 (OrfB), represented herein as SEQ ID NO:55. The amino acid sequence containing the CLF domain is a region of SEQ ID NO:52 spanning from about position 501 to about position 1000 of SEQ ID NO:52, represented herein as SEQ ID NO:56. This region of SEQ ID NO:52 has a Pfam match to FabB (β-ketoacyl-ACP synthase) spanning from about position 550 to about position 910 (positions 50-410 of SEQ ID NO:56). Although CLF has homology to KS proteins, it lacks an active site cysteine to which the acyl group is attached in KS proteins. The amino acid sequence spanning positions 501-1000 of SEQ ID NO:52 is about 49% identical to Schizochytrium OrfB (SEQ ID NO:4) over 517 amino acids. The amino acid sequence spanning positions 550-910 of SEQ ID NO:52 is about 54% identical to Schizochytrium OrfB (SEQ ID NO:4) over 360 amino acids.

The third domain in Th. 23B OrfB is an AT domain, also referred to herein as Th. 23B OrfB-AT. AT domain function has been described in detail above. This domain is contained within the nucleotide sequence spanning from between about position 3001 and about position 4500 of SEQ ID NO:51 (Th. 23B OrfB), represented herein as SEQ ID NO:58. The amino acid sequence containing the Th. 23B AT domain is a region of SEQ ID NO:52 spanning from about position 1001 to about position 1500 of SEQ ID NO:52, represented herein as SEQ ID NO:58. This region of SEQ ID NO:52 has a Pfam match to FabD (malonyl-CoA:ACP acyltransferase) spanning from about position 1100 to about position 1375 (positions 100-375 of SEQ ID NO:58). Although this AT domain of the PUFA synthases has homology to MAT proteins, it lacks the extended motif of the MAT (key arginine and glutamine residues) and it is not thought to be involved in malonyl-CoA transfers. The GXS*XG motif of acyltransferases is present, with the S* being the site of acyl attachment and located at position 1123 with respect to SEQ ID NO:52. The amino acid sequence spanning positions 1001-1500 of SEQ ID NO:52 is about 44% identical to Schizochytrium OrfB (SEQ ID NO:4) over 459 amino acids. The amino acid sequence spanning positions 1100-1375 of SEQ ID NO:52 is about 45% identical to Schizochytrium OrfB (SEQ ID NO:4) over 283 amino acids.

The fourth domain in Th. 23B OrfB is an ER domain, also referred to herein as Th. 23B OrfB-ER. ER domain function has been described in detail above. This domain is contained within the nucleotide sequence spanning from between about position 4501 and about position 5805 of SEQ ID NO:51 (OrfB), represented herein as SEQ ID NO:59. The amino acid sequence containing the Th. 23B ER domain is a region of SEQ ID NO:52 spanning from about position 1501 to about position 1935 of SEQ ID NO:52, represented herein as SEQ ID NO:60. This region of SEQ ID NO:52 has a Pfam match to a family of dioxygenases related to 2-nitropropane dioxygenases spanning from about position 1501 to about position 1810 (positions 1-310 of SEQ ID NO:60). That this domain functions as an ER can be further predicted due to homology to a newly characterized ER enzyme from Streptococcus pneumoniae. The amino acid sequence spanning positions 1501-1935 of SEQ ID NO:52 is about 66% identical to Schizochytrium OrfB (SEQ ID NO:4) over 433 amino acids. The amino acid sequence spanning positions 1501-1810 of SEQ ID NO:52 is about 70% identical to Schizochytrium OrfB (SEQ ID NO:4) over 305 amino acids.

Th. 23B Open Reading Frame C (OrfC):

The complete nucleotide sequence for Th. 23B OrfC is represented herein as SEQ ID NO:61. SEQ ID NO:61 encodes the following domains in Th. 23B OrfC: (a) two FabA-like β-hydroxyacyl-ACP dehydrase (DH) domains, both with homology to the FabA protein (an enzyme that catalyzes the synthesis of trans-2-decenoyl-ACP and the reversible isomerization of this product to cis-3-decenoyl-ACP); and (b) one enoyl-ACP reductase (ER) domain with high homology to the ER domain of Schizochytrium OrfB. This domain organization is the same as in Schizochytrium OrfC (SEQ ID NO:5).

Th. 23B OrfC is a 4410 nucleotide sequence (not including the stop codon) which encodes a 1470 amino acid sequence, represented herein as SEQ ID NO:62. The Th. 23B OrfC amino acid sequence (SEQ ID NO:62) was compared with known sequences in a standard BLAST search (BLAST parameters: Blastp, low complexity filter Off, program - BLOSUM62, Gap cost Existence: 11, Extension 1; (BLAST described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402, incorporated herein by reference in its entirety)). At the amino acid level, the sequences with the greatest degree of homology to Th. 23B OrfC was Schizochytrium OrfC (gb AAK72881.1) (SEQ ID NO:6). SEQ ID NO:52 is 66% identical to Schizochytrium OrfC (SEQ ID NO:6).

The first domain in Th. 23B OrfC is a DH domain, also referred to herein as Th. 23B OrfC-DH1. DH domain function has been described in detail above. This domain is contained within the nucleotide sequence spanning from between about position 1 to about position 1500 of SEQ ID NO:61 (OrfC), represented herein as SEQ ID NO:63. The amino acid sequence containing the Th. 23B DH1 domain is a region of SEQ ID NO:62 spanning from about position 1 to about position 500 of SEQ ID NO:62, represented herein as SEQ ID NO:64. This region of SEQ ID NO:62 has a Pfam match to FabA, as mentioned above, spanning from about position 275 to about position 400 (positions 275-400 of SEQ ID NO:64). The amino acid sequence spanning positions 1-500 of SEQ ID NO:62 is about 66% identical to Schizochytrium OrfC (SEQ ID NO:6) over 526 amino acids. The amino acid sequence spanning positions 275-400 of SEQ ID NO:62 is about 81% identical to Schizochytrium OrfC (SEQ ID NO:6) over 126 amino acids.

The second domain in Th. 23B OrfC is also a DH domain, also referred to herein as Th. 23B OrfC-DH2. This is the second of two DH domains in OrfC, and therefore is designated DH2. This domain is contained within the nucleotide sequence spanning from between about position 1501 to about 3000 of SEQ ID NO:61 (OrfC), represented herein as SEQ ID NO:65. The amino acid sequence containing the Th. 23B DH2 domain is a region of SEQ ID NO:62 spanning from about position 501 to about position 1000 of SEQ ID NO:62, represented herein as SEQ ID NO:66. This region of SEQ ID NO:62 has a Pfam match to FabA, as mentioned above, spanning from about position 800 to about position 925 (positions 300-425 of SEQ ID NO:66). The amino acid sequence spanning positions 501-1000 of SEQ ID NO:62 is about 56% identical to Schizochytrium OrfC (SEQ ID NO:6) over 518 amino acids. The amino acid sequence spanning positions 800-925 of SEQ ID NO:62 is about 58% identical to Schizochytrium OrfC (SEQ ID NO:6) over 124 amino acids.

The third domain in Th. 23B OrfC is an ER domain, also referred to herein as Th. 23B OrfC-ER. ER domain function has been described in detail above. This domain is contained within the nucleotide sequence spanning from between about position 3001 to about position 4410 of SEQ ID NO:61 (OrfC), represented herein as SEQ ID NO:67. The amino acid sequence containing the Th. 23B ER domain is a region of SEQ ID NO:62 spanning from about position 1001 to about position 1470 of SEQ ID NO:62, represented herein as SEQ ID NO:68. This region of SEQ ID NO:62 has a Pfam match to the dioxygenases related to 2-nitropropane dioxygenases, as mentioned above, spanning from about position 1025 to about position 1320 (positions 25-320 of SEQ ID NO:68). This domain function as an ER can also be predicted due to homology to a newly characterized ER enzyme from Streptococcus pneumoniae. The amino acid sequence spanning positions 1001-1470 of SEQ ID NO:62 is about 75% identical to Schizochytrium OrfB (SEQ ID NO:4) over 474 amino acids. The amino acid sequence spanning positions 1025-1320 of SEQ ID NO:62 is about 81% identical to Schizochytrium OrfB (SEQ ID NO:4) over 296 amino acids.

One embodiment of the present invention relates to an isolated protein or domain from a non-bacterial PUFA PKS system, a homologue thereof, and/or a fragment thereof. Also included in the invention are isolated nucleic acid molecules encoding any of the proteins, domains or peptides described herein (discussed in detail below). According to the present invention, an isolated protein or peptide, such as a protein or peptide from a PUFA PKS system, is a protein or a fragment thereof (including a polypeptide or peptide) that has been removed from its natural milieu (i.e., that has been subject to human manipulation) and can include purified proteins, partially purified proteins, recombinantly produced proteins, and synthetically produced proteins, for example. As such, "isolated" does not reflect the extent to which the protein has been purified. Preferably, an isolated protein of the present invention is produced recombinantly. An isolated peptide can be produced synthetically (e.g., chemically, such as by peptide synthesis) or recombinantly. In addition, and by way of example, a "Thraustochytrium PUFA PKS protein" refers to a PUFA PKS protein (generally including a homologue of a naturally occurring PUFA PKS protein) from a Thraustochytrium microorganism, or to a PUFA PKS protein that has been otherwise produced from the knowledge of the structure (e.g., sequence), and perhaps the function, of a naturally occurring PUFA PKS protein from Thraustochytrium. In other words, general reference to a Thraustochytrium PUFA PKS protein includes any PUFA PKS protein that has substantially similar structure and function of a naturally occurring PUFA PKS protein from Thraustochytrium or that is a biologically active (i.e., has biological activity) homologue of a naturally occurring PUFA PKS protein from Thraustochytrium as described in detail herein. As such, a Thraustochytrium PUFA PKS protein can include purified, partially purified, recombinant, mutated/modified and synthetic proteins. The same description applies to reference to other proteins or peptides described herein, such as the PUFA PKS proteins and domains from Schizochytrium or from other microorganisms.

According to the present invention, the terms "modification" and "mutation" can be used interchangeably, particularly with regard to the modifications/mutations to the primary amino acid sequences of a protein or peptide (or nucleic acid sequences) described herein. The term "modification" can also be used to describe post-translational modifications to a protein or peptide including, but not limited to, methylation, farnesylation, carboxymethylation, geranylgeranylation, glycosylation, phosphorylation, acetylation, myristoylation, prenylation, palmitation, and/or amidation. Modifications can also include, for example, complexing a protein or peptide with another compound. Such modifications can be considered to be mutations, for example, if the modification is different than the post-translational modification that occurs in the natural, wild-type protein or peptide.

As used herein, the term "homologue" is used to refer to a protein or peptide which differs from a naturally occurring protein or peptide (i.e., the "prototype" or "wild-type" protein) by one or more minor modifications or mutations to the naturally occurring protein or peptide, but which maintains the overall basic protein and side chain structure of the naturally occurring form (i.e., such that the homologue is identifiable as being related to the wild-type protein). Such changes include, but are not limited to: changes in one or a few amino acid side chains; changes one or a few amino acids, including deletions (e.g., a truncated version of the protein or peptide) insertions and/or substitutions; changes in stereochemistry of one or a few atoms; and/or minor derivatizations, including but not limited to: methylation, farnesylation, geranylgeranylation, glycosylation, carboxymethylation, phosphorylation, acetylation, myristoylation, prenylation, palmitation, and/or amidation. A homologue can have either enhanced, decreased, or substantially similar properties as compared to the naturally occurring protein or peptide. Preferred homologues of a PUFA PKS protein or domain are described in detail below. It is noted that homologues can include synthetically produced homologues, naturally occurring allelic variants of a given protein or domain, or homologous sequences from organisms other than the organism from which the reference sequence was derived.

Conservative substitutions typically include substitutions within the following groups: glycine and alanine; valine, isoleucine and leucine; aspartic acid, glutamic acid, asparagine, and glutamine; serine and threonine; lysine and arginine; and phenylalanine and tyrosine. Substitutions may also be made on the basis of conserved hydrophobicity or hydrophilicity (Kyte and Doolittle, J. Mol. Biol. (1982) 157: 105-132), or on the basis of the ability to assume similar polypeptide secondary structure (Chou and Fasman, Adv. Enzymol. (1978) 47: 45-148, 1978).

Homologues can be the result of natural allelic variation or natural mutation. A naturally occurring allelic variant of a nucleic acid encoding a protein is a gene that occurs at essentially the same locus (or loci) in the genome as the gene which encodes such protein, but which, due to natural variations caused by, for example, mutation or recombination, has a similar but not identical sequence. Allelic variants typically encode proteins having similar activity to that of the protein encoded by the gene to which they are being compared. One class of allelic variants can encode the same protein but have different nucleic acid sequences due to the degeneracy of the genetic code. Allelic variants can also comprise alterations in the 5' or 3' untranslated regions of the gene (e.g., in regulatory control regions). Allelic variants are well known to those skilled in the art.

Homologues can be produced using techniques known in the art for the production of proteins including, but not limited to, direct modifications to the isolated, naturally occurring protein, direct protein synthesis, or modifications to the nucleic acid sequence encoding the protein using, for example, classic or recombinant DNA techniques to effect random or targeted mutagenesis.

Modifications or mutations in protein homologues, as compared to the wild-type protein, either increase, decrease, or do not substantially change, the basic biological activity of the homologue as compared to the naturally occurring (wild-type) protein. In general, the biological activity or biological action of a protein refers to any function(s) exhibited or performed by the protein that is ascribed to the naturally occurring form of the protein as measured or observed in vivo (i.e., in the natural physiological environment of the protein) or in vitro (i.e., under laboratory conditions). Biological activities of PUFA PKS systems and the individual proteins/domains that make up a PUFA PKS system have been described in detail elsewhere herein. Modifications of a protein, such as in a homologue or mimetic (discussed below), may result in proteins having the same biological activity as the naturally occurring protein, or in proteins having decreased or increased biological activity as compared to the naturally occurring protein. Modifications which result in a decrease in protein expression or a decrease in the activity of the protein, can be referred to as inactivation (complete or partial), down-regulation, or decreased action (or activity) of a protein. Similarly, modifications which result in an increase in protein expression or an increase in the activity of the protein, can be referred to as amplification, overproduction, activation, enhancement, up-regulation or increased action (or activity) of a protein. It is noted that general reference to a homologue having the biological activity of the wild-type protein does not necessarily mean that the homologue has identical biological activity as the wild-type protein, particularly with regard to the level of biological activity. Rather, a homologue can perform the same biological activity as the wild-type protein, but at a reduced or increased level of activity as compared to the wild-type protein. A functional domain of a PUFA PKS system is a domain (i.e., a domain can be a portion of a protein) that is capable of performing a biological function (i.e., has biological activity).

Methods of detecting and measuring PUFA PKS protein or domain biological activity include, but are not limited to, measurement of transcription of a PUFA PKS protein or domain, measurement of translation of a PUFA PKS protein or domain, measurement of posttranslational modification of a PUFA PKS protein or domain, measurement of enzymatic activity of a PUFA PKS protein or domain, and/or measurement production of one or more products of a PUFA PKS system (e.g., PUFA production). It is noted that an isolated protein of the present invention (including a homologue) is not necessarily required to have the biological activity of the wild-type protein. For example, a PUFA PKS protein or domain can be a truncated, mutated or inactive protein, for example. Such proteins are useful in screening assays, for example, or for other purposes such as antibody production. In a preferred embodiment, the isolated proteins of the present invention have biological activity that is similar to that of the wild-type protein (although not necessarily equivalent, as discussed above).

Methods to measure protein expression levels generally include, but are not limited to: Western blot, immunoblot, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), immunoprecipitation, surface plasmon resonance, chemiluminescence, fluorescent polarization, phosphorescence, immunohistochemical analysis, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, microcytometry, microarray, microscopy, fluorescence activated cell sorting (FACS), and flow cytometry, as well as assays based on a property of the protein including but not limited to enzymatic activity or interaction with other protein partners. Binding assays are also well known in the art. For example, a BIAcore machine can be used to determine the binding constant of a complex between two proteins. The dissociation constant for the complex can be determined by monitoring changes in the refractive index with respect to time as buffer is passed over the chip (O'Shannessy et al. Anal. Biochem. 212:457-468 (1993); Schuster et al., Nature 365:343-347 (1993)). Other suitable assays for measuring the binding of one protein to another include, for example, immunoassays such as enzyme linked immunoabsorbent assays (ELISA) and radioimmunoassays (RIA); or determination of binding by monitoring the change in the spectroscopic or optical properties of the proteins through fluorescence, UV absorption, circular dichroism, or nuclear magnetic resonance (NMR).

In one embodiment, the present invention relates to an isolated protein comprising an amino acid sequence selected from the group consisting of: (a) an amino acid sequence selected from the group consisting of: SEQ ID NO:39, SEQ ID NO:52, SEQ ID NO:62, and biologically active fragments thereof; (b) an amino acid sequence selected from the group consisting of: SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68 and biologically active fragments thereof; (c) an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of the amino acid sequence of (a), wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; and/or (d) an amino acid sequence that is at least about 60% identical to the amino acid sequence of (b), wherein the amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. In a further embodiment, an amino acid sequence including the active site domains or other functional motifs described above for several of the PUFA PKS domains are encompassed by the invention. In one embodiment, the amino acid sequence described above does not include any of the following amino acid sequences: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32.

In one aspect of the invention, a PUFA PKS protein or domain encompassed by the present invention, including a homologue of a particular PUFA PKS protein or domain described herein, comprises an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:52, or SEQ ID NO:62, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In a further aspect, the amino acid sequence of the protein is at least about 60% identical to at least about 600 consecutive amino acids, and more preferably to at least about 700 consecutive amino acids, and more preferably to at least about 800 consecutive amino acids, and more preferably to at least about 900 consecutive amino acids, and more preferably to at least about 1000 consecutive amino acids, and more preferably to at least about 1100 consecutive amino acids, and more preferably to at least about 1200 consecutive amino acids, and more preferably to at least about 1300 consecutive amino acids, and more preferably to at least about 1400 consecutive amino acids of any of SEQ ID NO:39, SEQ ID NO:52, or SEQ ID NO:62, or to the full length of SEQ ID NO:62. In a further aspect, the amino acid sequence of the protein is at least about 60% identical to at least about 1500 consecutive amino acids, and more preferably to at least about 1600 consecutive amino acids, and more preferably to at least about 1700 consecutive amino acids, and more preferably to at least about 1800 consecutive amino acids, and more preferably to at least about 1900 consecutive amino acids, of any of SEQ ID NO:39 or SEQ ID NO:52, or to the full length of SEQ ID NO:52. In a further aspect, the amino acid sequence of the protein is at least about 60% identical to at least about 2000 consecutive amino acids, and more preferably to at least about 2100 consecutive amino acids, and more preferably to at least about 2200 consecutive amino acids, and more preferably to at least about 2300 consecutive amino acids, and more preferably to at least about 2400 consecutive amino acids, and more preferably to at least about 2500 consecutive amino acids, and more preferably to at least about 2600 consecutive amino acids, and more preferably to at least about 2700 consecutive amino acids, and more preferably to at least about 2800 consecutive amino acids, and even more preferably, to the full length of SEQ ID NO:39. In one embodiment, the amino acid sequence described above does not include any of the following amino acid sequences: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32.

In another aspect, a PUFA PKS protein or domain encompassed by the present invention, including homologues as described above, comprises an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 75% identical, and more preferably at least about 80% identical, and more preferably at least about 85% identical, and more preferably at least about 90% identical, and more preferably at least about 95% identical, and more preferably at least about 96% identical, and more preferably at least about 97% identical, and more preferably at least about 98% identical, and more preferably at least about 99% identical to an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:52, or SEQ ID NO:62, over any of the consecutive amino acid lengths described in the paragraph above, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In one embodiment, the amino acid sequence described above does not include any of the following amino acid sequences: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32.

In one aspect of the invention, a PUFA PKS protein or domain encompassed by the present invention, including a homologue as described above, comprises an amino acid sequence that is at least about 60% identical to an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In a further aspect, the amino acid sequence of the protein is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 75% identical, and more preferably at least about 80% identical, and more preferably at least about 85% identical, and more preferably at least about 90% identical, and more preferably at least about 95% identical, and more preferably at least about 96% identical, and more preferably at least about 97% identical, and more preferably at least about 98% identical, and more preferably at least about 99% identical to an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In one embodiment, the amino acid sequence described above does not include any of the following amino acid sequences: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32.

In another aspect, a PUFA PKS protein or domain encompassed by the present invention, including a homologue as described above, comprises an amino acid sequence that is at least about 50% identical to an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In another aspect, the amino acid sequence of the protein is at least about 55% identical, and more preferably at least about 60% identical, to an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In a further aspect, the amino acid sequence of the protein is at least about 65% identical to an amino acid sequence chosen from SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56 and SEQ ID NO:58, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In another aspect, the amino acid sequence of the protein is at least about 70% identical, and more preferably at least about 75% identical, to an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, and SEQ ID NO:64, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In another aspect, the amino acid sequence of the protein is at least about 80% identical, and more preferably at least about 85% identical, and more preferably at least about 90% identical, and more preferably at least about 95% identical, and more preferably at least about 96% identical, and more preferably at least about 97% identical, and more preferably at least about 98% identical, and more preferably at least about 99% identical, to an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In one embodiment, the amino acid sequence described above does not include any of the following amino acid sequences: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32.

In a preferred embodiment an isolated protein or domain of the present invention comprises, consists essentially of, or consists of an amino acid sequence chosen from: SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, or any biologically active fragments thereof, including any fragments that have a biological activity of at least one domain of a PUFA PKS system.

In one aspect of the present invention, the following Schizochytrium proteins and domains are useful in one or more embodiments of the present invention, all of which have been previously described in detail in U.S.

In another aspect of the invention, a PUFA PKS protein or domain useful in one or more embodiments of the present invention comprises an amino acid sequence that is at least about 60% identical to an amino acid sequence chosen from: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID NO:32, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In a
patent application 10 further aspect, the amino acid sequence of the protein is at Ser. No. 10/124,800, supra. In one aspect of the invention, a least about 65% identical, and more preferably at least about PUFA PKS protein or domain useful in the present invention 70% identical, and more preferably at least about 75% iden comprises an amino acid sequence that is at least about 60% tical, and more preferably at least about 80% identical, and identical to at least 500 consecutive amino acids of an amino more preferably at least about 85% identical, and more pref acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, 15 erably at least about 90% identical, and more preferably at and SEQ ID NO:6; wherein the amino acid sequence has a least about 95% identical, and more preferably at least about biological activity of at least one domain of a PUFA PKS 96% identical, and more preferably at least about 97% iden system. In a further aspect, the amino acid sequence of the tical, and more preferably at least about 98% identical, and protein is at least about 60% identical to at least about 600 more preferably at least about 99% identical to an amino acid consecutive amino acids, and more preferably to at least about sequence chosen from: SEQID NO:8, SEQID NO:10, SEQ 700 consecutive amino acids, and more preferably to at least ID NO:13, SEQID NO:18, SEQID NO:20, SEQID NO:22, about 800 consecutive amino acids, and more preferably to at SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID least about 900 consecutive amino acids, and more preferably NO:30, SEQID NO:32, wherein the amino acid sequence has to at least about 1000 consecutive amino acids, and more a biological activity of at least one domain of a PUFA PKS preferably to at least about 1100 consecutive amino acids, and 25 system. 
more preferably to at least about 1200 consecutive amino In yet another aspect of the invention, a PUFA PKS protein acids, and more preferably to at least about 1300 consecutive or domain useful in one or more embodiments of the present amino acids, and more preferably to at least about 1400 invention comprises, consists essentially of, or consists of, an consecutive amino acids, and more preferably to at least about amino acid sequence chosen from: SEQ ID NO:2, SEQ ID 1500 consecutive amino acids of any of SEQID NO:2, SEQ 30 NO:4, SEQIDNO:6, SEQIDNO:8, SEQID NO:10, SEQID ID NO.4 and SEQID NO:6, or to the full length of SEQID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, NO:6. In a further aspect, the amino acid sequence of the SEQ ID NO:24, SEQ ID NO:26, SEQID NO:28, SEQ ID protein is at least about 60% identical to at least about 1600 NO:30, SEQID NO:32 or any biologically active fragments consecutive amino acids, and more preferably to at least about thereof, including any fragments that have a biological activ 1700 consecutive amino acids, and more preferably to at least 35 ity of at least one domain of a PUFA PKS system. about 1800 consecutive amino acids, and more preferably to According to the present invention, the term "contiguous’ at least about 1900 consecutive amino acids, and more pref or “consecutive', with regard to nucleic acid or amino acid erably to at least about 2000 consecutive amino acids of any sequences described herein, means to be connected in an of SEQID NO:2 or SEQID NO:4, or to the full length of SEQ unbroken sequence. For example, for a first sequence to com ID NO:4. 
In a further aspect, the amino acid sequence of the 40 prise 30 contiguous (or consecutive) amino acids of a second protein is at least about 60% identical to at least about 2100 sequence, means that the first sequence includes an unbroken consecutive amino acids, and more preferably to at least about sequence of 30 amino acid residues that is 100% identical to 2200 consecutive amino acids, and more preferably to at least an unbroken sequence of 30 amino acid residues in the second about 2300 consecutive amino acids, and more preferably to sequence. Similarly, for a first sequence to have “100% iden at least about 2400 consecutive amino acids, and more pref 45 tity” with a second sequence means that the first sequence erably to at least about 2500 consecutive amino acids, and exactly matches the second sequence with no gaps between more preferably to at least about 2600 consecutive amino nucleotides or amino acids. acids, and more preferably to at least about 2700 consecutive As used herein, unless otherwise specified, reference to a amino acids, and more preferably to at least about 2800 percent (%) identity refers to an evaluation of homology consecutive amino acids, and even more preferably, to the full 50 which is performed using: (1) a BLAST 2.0 Basic BLAST length of SEQID NO:2. homology search using blastp foramino acid searches, blastin In another aspect, a PUFAPKS protein or domain useful in for nucleic acid searches, and blastX for nucleic acid searches one or more embodiments of the present invention comprises and searches of translated amino acids in all 6 open reading an amino acid sequence that is at least about 65% identical, frames, all with standard default parameters, wherein the and more preferably at least about 70% identical, and more 55 query sequence is filtered for low complexity regions by preferably at least about 75% identical, and more preferably default (described in Altschul, S.F., Madden, T. 
L., Schäffer, at least about 80% identical, and more preferably at least A. A., Zhang, J., Zhang, Z. Miller, W. & Lipman, D.J. (1997) about 85% identical, and more preferably at least about 90% “Gapped BLAST and PSI-BLAST: a new generation of pro identical, and more preferably at least about 95% identical, tein database search programs. Nucleic Acids Res. 25:3389 and more preferably at least about 96% identical, and more 60 3402, incorporated herein by reference in its entirety); (2) a preferably at least about 97% identical, and more preferably BLAST 2 alignment (using the parameters described below); at least about 98% identical, and more preferably at least (3) and/or PSI-BLAST with the standard default parameters about 99% identical to an amino acid sequence chosen from: (Position-Specific Iterated BLAST). It is noted that due to SEQID NO:2, SEQID NO:4, or SEQID NO:6, over any of some differences in the standard parameters between BLAST the consecutive amino acid lengths described in the paragraph 65 2.0 Basic BLAST and BLAST 2, two specific sequences above, wherein the amino acid sequence has a biological might be recognized as having significant homology using activity of at least one domain of a PUFA PKS system. the BLAST 2 program, whereas a search performed in US 7,645,597 B2 43 44 BLAST 2.0 Basic BLAST using one of the sequences as the deduce a complementary sequence are known to those skilled query sequence may not identify the second sequence in the in the art. It should be noted that since amino acid sequencing top matches. In addition, PSI-BLAST provides an automated, and nucleic acid sequencing technologies are not entirely easy-to-use version of a “profile search, which is a sensitive error-free, the sequences presented herein, at best, represent way to look for sequence homologues. The program first 5 apparent sequences of PUFA PKS domains and proteins of performs a gapped BLAST database search. The PSI-BLAST the present invention. 
program uses the information from any significant align As used herein, hybridization conditions refer to standard ments returned to construct a position-specific score matrix, hybridization conditions under which nucleic acid molecules which replaces the query sequence for the next round of are used to identify similar nucleic acid molecules. Such database searching. Therefore, it is to be understood that 10 standard conditions are disclosed, for example, in Sambrook percent identity can be determined by using any one of these et al., Molecular Cloning. A Laboratory Manual, Cold Spring programs. Harbor Labs Press, 1989. Sambrook et al., ibid., is incorpo Two specific sequences can be aligned to one another using rated by reference herein in its entirety (see specifically, pages BLAST 2 sequence as described in Tatusova and Madden, 9.31-9.62). In addition, formulae to calculate the appropriate (1999), "Blast 2 sequences—a new tool for comparing pro 15 hybridization and wash conditions to achieve hybridization tein and nucleotide sequences. FEMS Microbiol Lett. 174, permitting varying degrees of mismatch of nucleotides are 247, incorporated herein by reference in its entirety. BLAST disclosed, for example, in Meinkoth et al., 1984, Anal. Bio 2 sequence alignment is performed in blastp or blastin using chem. 138,267-284; Meinkoth et al., ibid., is incorporated by the BLAST 2.0 algorithm to perform a Gapped BLAST reference herein in its entirety. search (BLAST 2.0) between the two sequences allowing for More particularly, moderate stringency hybridization and the introduction of gaps (deletions and insertions) in the washing conditions, as referred to herein, refer to conditions resulting alignment. For purposes of clarity herein, a BLAST which permit isolation of nucleic acid molecules having at 2 sequence alignment is performed using the standard default least about 70% nucleic acid sequence identity with the parameters as follows. 
nucleic acid molecule being used to probe in the hybridiza For Blastin, using 0 BLOSUM62 Matrix: 25 tion reaction (i.e., conditions permitting about 30% or less Reward for match=1 mismatch of nucleotides). High Stringency hybridization and Penalty for mismatch=-2 washing conditions, as referred to herein, refer to conditions Open gap (5) and extension gap (2) penalties which permit isolation of nucleic acid molecules having at gap X dropoff (50) expect (10) word size (11) filter (on) least about 80% nucleic acid sequence identity with the 30 nucleic acid molecule being used to probe in the hybridiza For Blastp, using 0 BLOSUM62 Matrix: tion reaction (i.e., conditions permitting about 20% or less Open gap (11) and extension gap (1) penalties mismatch of nucleotides). Very high stringency hybridization gap X dropoff (50) expect (10) word size (3) filter (on). and washing conditions, as referred to herein, refer to condi According to the present invention, an amino acid tions which permit isolation of nucleic acid molecules having sequence that has a biological activity of at least one domain 35 at least about 90% nucleic acid sequence identity with the of a PUFA PKS system is an amino acid sequence that has the nucleic acid molecule being used to probe in the hybridiza biological activity of at least one domain of the PUFA PKS tion reaction (i.e., conditions permitting about 10% or less system described in detail herein, as previously exemplified mismatch of nucleotides). As discussed above, one of skill in by the Schizochytrium PUFA PKS system or as additionally the art can use the formulae in Meinkoth et al., ibid. to calcu exemplified herein by the Thraustochytrium PUFA PKS sys 40 late the appropriate hybridization and wash conditions to tem. The biological activities of the various domains within achieve these particular levels of nucleotide mismatch. 
Such the Schizochytrium or Thraustochytrium PUFA PKS systems conditions will vary, depending on whether DNA:RNA or have been described in detail above. Therefore, an isolated DNA:DNA hybrids are being formed. Calculated melting protein useful in the present invention can include the trans temperatures for DNA:DNA hybrids are 10° C. less than for lation product of any PUFA PKS open reading frame, any 45 DNA:RNA hybrids. In particular embodiments, stringent PUFA PKS domain, biologically active fragment thereof, or hybridization conditions for DNA:DNA hybrids include any homologue of a naturally occurring PUFA PKS open hybridization at an ionic strength of 6xSSC (0.9 M. Na") at a reading frame product or domain which has biological activ temperature of between about 20° C. and about 35°C. (lower ity. stringency), more preferably, between about 28°C. and about In another embodiment of the invention, an amino acid 50 40° C. (more stringent), and even more preferably, between sequence having the biological activity of at least one domain about 35° C. and about 45° C. (even more stringent), with of a PUFA PKS system of the present invention includes an appropriate wash conditions. In particular embodiments, amino acid sequence that is sufficiently similar to a naturally stringent hybridization conditions for DNA:RNA hybrids occurring PUFA PKS protein or polypeptide that a nucleic include hybridization at an ionic strength of 6xSSC (0.9 M acid sequence encoding the amino acid sequence is capable of 55 Na") at a temperature of between about 30° C. and about 45° hybridizing under moderate, high, or very high Stringency C., more preferably, between about 38°C. and about 50°C., conditions (described below) to (i.e., with) a nucleic acid and even more preferably, between about 45° C. and about molecule encoding the naturally occurring PUFA PKS pro 55°C., with similarly stringent wash conditions. 
These values tein or polypeptide (i.e., to the complement of the nucleic acid are based on calculations of a melting temperature for mol strand encoding the naturally occurring PUFAPKS protein or 60 ecules larger than about 100 nucleotides, 0% formamide and polypeptide). Preferably, an amino acid sequence having the a G+C content of about 40%. Alternatively, T, can be calcu biological activity of at least one domain of a PUFA PKS lated empirically as set forth in Sambrook et al., Supra, pages system of the present invention is encoded by a nucleic acid 9.31 to 9.62. In general, the wash conditions should be as sequence that hybridizes under moderate, high or very high stringent as possible, and should be appropriate for the chosen stringency conditions to the complement of a nucleic acid 65 hybridization conditions. For example, hybridization condi sequence that encodes any of the above-described amino acid tions can include a combination of salt and temperature con sequences for a PUFA PKS protein or domain. Methods to ditions that are approximately 20-25°C. below the calculated US 7,645,597 B2 45 46 T of a particular hybrid, and wash conditions typically at least about 100 amino acids in length, or at least about 150 include a combination of salt and temperature conditions that amino acids in length, or at least about 200 amino acids in are approximately 12-20° C. below the calculated T of the length, or at least about 250 amino acids in length, or at least particular hybrid. 
One example of hybridization conditions about 300 amino acids in length, or at least about 350 amino suitable for use with DNA:DNA hybrids includes a 2-24 hour acids in length, or at least about 400 amino acids in length, or hybridization in 6xSSC (50% formamide) at about 42°C., at least about 450 amino acids in length, or at least about 500 followed by washing steps that include one or more washes at amino acids in length, or at least about 750 amino acids in room temperature in about 2xSSC, followed by additional length, and so on, in any length between 8amino acids and up washes at higher temperatures and lower ionic strength (e.g., to the full length of a protein or domain of the invention or at least one wash as about 37° C. in about 0.1x-0.5xSSC, 10 longer, in whole integers (e.g., 8, 9, 10, ... 25, 26, ... 500, followed by at least one wash at about 68°C. in about 0.1x 501, . . . 1234, 1235. . . .). There is no limit, other than a 0.5xSSC). practical limit, on the maximum size of such a protein in that The present invention also includes a fusion protein that the protein can include a portion of a PUFA PKS protein, includes any PUFAPKS protein or domain or any homologue domain, or biologically active or useful fragment thereof, or or fragment thereof attached to one or more fusion segments. 15 a full-length PUFA PKS protein or domain, plus additional Suitable fusion segments for use with the present invention sequence (e.g., a fusion protein sequence), if desired. include, but are not limited to, segments that can: enhance a Further embodiments of the present invention include iso protein's stability; provide other desirable biological activity; lated nucleic acid molecules comprising, consisting essen and/or assist with the purification of the protein (e.g., by tially of, or consisting of nucleic acid sequences that encode affinity chromatography). 
A Suitable fusion segment can be a any of the above-identified proteins or domains, including a domain of any size that has the desired function (e.g., imparts homologue or fragment thereof, as well as nucleic acid increased stability, Solubility, biological activity; and/or sim sequences that are fully complementary thereto. In accor plifies purification of a protein). Fusion segments can be dance with the present invention, an isolated nucleic acid joined to amino and/or carboxyl termini of the protein and can molecule is a nucleic acid molecule that has been removed be susceptible to cleavage in order to enable straight-forward 25 from its natural milieu (i.e., that has been Subject to human recovery of the desired protein. Fusion proteins are preferably manipulation), its natural milieu being the genome or chro produced by culturing a recombinant cell transfected with a mosome in which the nucleic acid molecule is found in fusion nucleic acid molecule that encodes a protein including nature. As such, “isolated does not necessarily reflect the the fusion segment attached to either the carboxyl and/or extent to which the nucleic acid molecule has been purified, amino terminal end of the protein of the invention as dis 30 but indicates that the molecule does not include an entire cussed above. genome or an entire chromosome in which the nucleic acid In one embodiment of the present invention, any of the molecule is found in nature. An isolated nucleic acid mol above-described PUFAPKS amino acid sequences, as well as ecule can include a gene. An isolated nucleic acid molecule homologues of such sequences, can be produced with from at that includes a gene is not a fragment of a chromosome that least one, and up to about 20, additional heterologous amino 35 includes such gene, but rather includes the coding region and acids flanking each of the C- and/or N-terminal end of the regulatory regions associated with the gene, but no additional given amino acid sequence. 
The resulting protein or polypep genes naturally found on the same chromosome. An isolated tide can be referred to as “consisting essentially of a given nucleic acid molecule can also include a specified nucleic amino acid sequence. According to the present invention, the acid sequence flanked by (i.e., at the 5' and/or the 3' end of the heterologous amino acids are a sequence of amino acids that 40 sequence) additional nucleic acids that do not normally flank are not naturally found (i.e., not found in nature, in vivo) the specified nucleic acid sequence in nature (i.e., heterolo flanking the given amino acid sequence or which would not be gous sequences). Isolated nucleic acid molecule can include encoded by the nucleotides that flank the naturally occurring DNA, RNA (e.g., mRNA), or derivatives of either DNA or nucleic acid sequence encoding the given amino acid RNA (e.g., cDNA). Although the phrase “nucleic acid mol sequence as it occurs in the gene, if Such nucleotides in the 45 ecule' primarily refers to the physical nucleic acid molecule naturally occurring sequence were translated using standard and the phrase “nucleic acid sequence' primarily refers to the codon usage for the organism from which the given amino sequence of nucleotides on the nucleic acid molecule, the two acid sequence is derived. Similarly, the phrase “consisting phrases can be used interchangeably, especially with respect essentially of, when used with reference to a nucleic acid to a nucleic acid molecule, or a nucleic acid sequence, being sequence herein, refers to a nucleic acid sequence encoding a 50 capable of encoding a protein or domain of a protein. 
given amino acid sequence that can be flanked by from at least Preferably, an isolated nucleic acid molecule of the present one, and up to as many as about 60, additional heterologous invention is produced using recombinant DNA technology nucleotides at each of the 5' and/or the 3' end of the nucleic (e.g., polymerase chain reaction (PCR) amplification, clon acid sequence encoding the given amino acid sequence. The ing) or chemical synthesis. Isolated nucleic acid molecules heterologous nucleotides are not naturally found (i.e., not 55 include natural nucleic acid molecules and homologues found in nature, in vivo) flanking the nucleic acid sequence thereof, including, but not limited to, natural allelic variants encoding the given amino acid sequence as it occurs in the and modified nucleic acid molecules in which nucleotides natural gene. have been inserted, deleted, substituted, and/or inverted in The minimum size of a protein or domain and/or a homo Such a manner that Such modifications provide the desired logue or fragment thereof of the present invention is, in one 60 effect on PUFA PKS system biological activity as described aspect, a size Sufficient to have the requisite biological activ herein. Protein homologues (e.g., proteins encoded by ity, or Sufficient to serve as an antigen for the generation of an nucleic acid homologues) have been discussed in detail antibody or as a target in an in vitro assay. In one embodiment, above. a protein of the present invention is at least about 8 amino A nucleic acid molecule homologue can be produced using acids in length (e.g., Suitable for an antibody epitope or as a 65 a number of methods known to those skilled in the art (see, for detectable peptide in an assay), or at least about 25 amino example, Sambrook et al., Molecular Cloning. A Laboratory acids in length, or at least about 50 amino acids in length, or Manual, Cold Spring Harbor Labs Press, 1989). 
For example, US 7,645,597 B2 47 48 nucleic acid molecules can be modified using a variety of in detail above. According to the present invention, a recom techniques including, but not limited to, classic mutagenesis binant vector is an engineered (i.e., artificially produced) techniques and recombinant DNA techniques, such as site nucleic acid molecule that is used as a tool for manipulating a directed mutagenesis, chemical treatment of a nucleic acid nucleic acid sequence of choice and for introducing Such a molecule to induce mutations, restriction enzyme cleavage of 5 nucleic acid sequence into a host cell. The recombinant vector a nucleic acid fragment, ligation of nucleic acid fragments, is therefore Suitable for use in cloning, sequencing, and/or PCR amplification and/or mutagenesis of selected regions of otherwise manipulating the nucleic acid sequence of choice, a nucleic acid sequence, synthesis of oligonucleotide mix Such as by expressing and/or delivering the nucleic acid tures and ligation of mixture groups to “build a mixture of sequence of choice into a host cell to form a recombinant cell. nucleic acid molecules and combinations thereof. Nucleic 10 Such a vector typically contains heterologous nucleic acid acid molecule homologues can be selected from a mixture of sequences, that is nucleic acid sequences that are not naturally modified nucleic acids by screening for the function of the found adjacent to nucleic acid sequence to be cloned or deliv protein encoded by the nucleic acid and/or by hybridization ered, although the vector can also contain regulatory nucleic with a wild-type gene. 
acid sequences (e.g., promoters, untranslated regions) which The minimum size of a nucleic acid molecule of the present 15 are naturally found adjacent to nucleic acid molecules of the invention is a size sufficient to form a probe or oligonucle present invention or which are useful for expression of the otide primer that is capable of forming a stable hybrid (e.g., nucleic acid molecules of the present invention (discussed in under moderate, high or very high Stringency conditions) detail below). The vector can be either RNA or DNA, either with the complementary sequence of a nucleic acid molecule prokaryotic or eukaryotic, and typically is a plasmid. The useful in the present invention, or of a size sufficient to encode vector can be maintained as an extrachromosomal element an amino acid sequence having a biological activity of at least (e.g., a plasmid) or it can be integrated into the chromosome one domain of a PUFA PKS system according to the present of a recombinant organism (e.g., a microbe or a plant). The invention. As such, the size of the nucleic acid molecule entire vector can remain in place within a host cell, or under encoding Such a protein can be dependent on nucleic acid certain conditions, the plasmid DNA can be deleted, leaving composition and percent homology or identity between the 25 behind the nucleic acid molecule of the present invention. The nucleic acid molecule and complementary sequence as well integrated nucleic acid molecule can be under chromosomal as upon hybridization conditions per se (e.g., temperature, promoter control, under native or plasmid promoter control, salt concentration, and formamide concentration). The mini or under a combination of several promoter controls. Single mal size of a nucleic acid molecule that is used as an oligo or multiple copies of the nucleic acid molecule can be inte nucleotide primer or as a probe is typically at least about 12 to 30 grated into the chromosome. 
about 15 nucleotides in length if the nucleic acid molecules are GC-rich and at least about 15 to about 18 bases in length if they are AT-rich. There is no limit, other than a practical limit, on the maximal size of a nucleic acid molecule of the present invention, in that the nucleic acid molecule can include a sequence sufficient to encode a biologically active fragment of a domain of a PUFA PKS system, an entire domain of a PUFA PKS system, several domains within an open reading frame (Orf) of a PUFA PKS system, an entire Orf of a PUFA PKS system, or more than one Orf of a PUFA PKS system.

In one embodiment of the present invention, an isolated nucleic acid molecule comprises, consists essentially of, or consists of a nucleic acid sequence encoding any of the above-described amino acid sequences, including any of the amino acid sequences, or homologues thereof, from a Schizochytrium or Thraustochytrium described herein. In one aspect, the nucleic acid sequence is selected from the group of: SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:12, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, or SEQ ID NO:67, or homologues (including sequences that are at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to such sequences), or fragments thereof, or any complementary sequences thereof.

Another embodiment of the present invention includes a recombinant nucleic acid molecule comprising a recombinant vector and a nucleic acid sequence encoding protein or peptide having a biological activity of at least one domain (or homologue or fragment thereof) of a PUFA PKS system as described herein. Such nucleic acid sequences are described

A recombinant vector of the present invention can contain at least one selectable marker.

In one embodiment, a recombinant vector used in a recombinant nucleic acid molecule of the present invention is an expression vector. As used herein, the phrase “expression vector” is used to refer to a vector that is suitable for production of an encoded product (e.g., a protein of interest). In this embodiment, a nucleic acid sequence encoding the product to be produced (e.g., a PUFA PKS domain) is inserted into the recombinant vector to produce a recombinant nucleic acid molecule. The nucleic acid sequence encoding the protein to be produced is inserted into the vector in a manner that operatively links the nucleic acid sequence to regulatory sequences in the vector which enable the transcription and translation of the nucleic acid sequence within the recombinant host cell.

In another embodiment, a recombinant vector used in a recombinant nucleic acid molecule of the present invention is a targeting vector. As used herein, the phrase “targeting vector” is used to refer to a vector that is used to deliver a particular nucleic acid molecule into a recombinant host cell, wherein the nucleic acid molecule is used to delete or inactivate an endogenous gene within the host cell or microorganism (i.e., used for targeted gene disruption or knock-out technology). Such a vector may also be known in the art as a “knock-out” vector. In one aspect of this embodiment, a portion of the vector, but more typically, the nucleic acid molecule inserted into the vector (i.e., the insert), has a nucleic acid sequence that is homologous to a nucleic acid sequence of a target gene in the host cell (i.e., a gene which is targeted to be deleted or inactivated). The nucleic acid sequence of the vector insert is designed to bind to the target gene such that the target gene and the insert undergo homologous recombination, whereby the endogenous target gene is deleted, inactivated or attenuated (i.e., by at least a portion of the endogenous target gene being mutated or deleted). The use of this type of recombinant vector to replace an endogenous Schizochytrium gene with a recombinant gene is described in the Examples section, and the general technique for genetic transformation of Thraustochytrids is described in detail in U.S. patent application Ser. No. 10/124,807, published as U.S. Patent Application Publication No. 20030166207, published Sep. 4, 2003.

Typically, a recombinant nucleic acid molecule includes at least one nucleic acid molecule of the present invention operatively linked to one or more expression control sequences. As used herein, the phrase “recombinant molecule” or “recombinant nucleic acid molecule” primarily refers to a nucleic acid molecule or nucleic acid sequence operatively linked to an expression control sequence, but can be used interchangeably with the phrase “nucleic acid molecule”, when such nucleic acid molecule is a recombinant molecule as discussed herein. According to the present invention, the phrase “operatively linked” refers to linking a nucleic acid molecule to an expression control sequence (e.g., a transcription control sequence and/or a translation control sequence) in a manner such that the molecule is able to be expressed when transfected (i.e., transformed, transduced, transfected, conjugated or conduced) into a host cell. Transcription control sequences are sequences which control the initiation, elongation, or termination of transcription. Particularly important transcription control sequences are those which control transcription initiation, such as promoter, enhancer, operator and repressor sequences. Suitable transcription control sequences include any transcription control sequence that can function in a host cell or organism into which the recombinant nucleic acid molecule is to be introduced.

Recombinant nucleic acid molecules of the present invention can also contain additional regulatory sequences, such as translation regulatory sequences, origins of replication, and other regulatory sequences that are compatible with the recombinant cell. In one embodiment, a recombinant molecule of the present invention, including those which are integrated into the host cell chromosome, also contains secretory signals (i.e., signal segment nucleic acid sequences) to enable an expressed protein to be secreted from the cell that produces the protein. Suitable signal segments include a signal segment that is naturally associated with the protein to be expressed or any heterologous signal segment capable of directing the secretion of the protein according to the present invention. In another embodiment, a recombinant molecule of the present invention comprises a leader sequence to enable an expressed protein to be delivered to and inserted into the membrane of a host cell. Suitable leader sequences include a leader sequence that is naturally associated with the protein, or any heterologous leader sequence capable of directing the delivery and insertion of the protein to the membrane of a cell.

The present inventors have found that the Schizochytrium PUFA PKS Orfs A and B are closely linked in the genome and the region between the Orfs has been sequenced. The Orfs are oriented in opposite directions and 4244 base pairs separate the start (ATG) codons (i.e., they are arranged as follows: 3'OrfA5' - 4244 bp - 5'OrfB3'). Examination of the 4244 bp intergenic region did not reveal any obvious Orfs (no significant matches were found on a BlastX search). Both Orfs A and B are highly expressed in Schizochytrium, at least during the time of oil production, implying that active promoter elements are embedded in this intergenic region. These genetic elements are believed to have utility as a bi-directional promoter sequence for transgenic applications. For example, in a preferred embodiment, one could clone this region, place any genes of interest at each end and introduce the construct into Schizochytrium (or some other host in which the promoters can be shown to function). It is predicted that the regulatory elements, under the appropriate conditions, would provide for coordinated, high level expression of the two introduced genes. The complete nucleotide sequence for the regulatory region containing Schizochytrium PUFA PKS regulatory elements (e.g., a promoter) is represented herein as SEQ ID NO:36.

In a similar manner, OrfC is highly expressed in Schizochytrium during the time of oil production and regulatory elements are expected to reside in the region upstream of its start codon. A region of genomic DNA upstream of OrfC has been cloned and sequenced and is represented herein as (SEQ ID NO:37). This sequence contains the 3886 nt immediately upstream of the OrfC start codon. Examination of this region did not reveal any obvious Orfs (i.e., no significant matches were found on a BlastX search). It is believed that regulatory elements contained in this region, under the appropriate conditions, will provide for high-level expression of a gene placed behind them. Additionally, under the appropriate conditions, the level of expression may be coordinated with genes under control of the A-B intergenic region (SEQ ID NO:36).

Therefore, in one embodiment, a recombinant nucleic acid molecule useful in the present invention, as disclosed herein, can include a PUFA PKS regulatory region contained within SEQ ID NO:36 and/or SEQ ID NO:37. Such a regulatory region can include any portion (fragment) of SEQ ID NO:36 and/or SEQ ID NO:37 that has at least basal PUFA PKS transcriptional activity.

One or more recombinant molecules of the present invention can be used to produce an encoded product (e.g., a PUFA PKS domain, protein, or system) of the present invention. In one embodiment, an encoded product is produced by expressing a nucleic acid molecule as described herein under conditions effective to produce the protein. A preferred method to produce an encoded protein is by transfecting a host cell with one or more recombinant molecules to form a recombinant cell. Suitable host cells to transfect include, but are not limited to, any bacterial, fungal (e.g., yeast), insect, plant or animal cell that can be transfected. In one embodiment of the invention, a preferred host cell is a Thraustochytrid host cell (described in detail below) or a plant host cell. Host cells can be either untransfected cells or cells that are already transfected with at least one other recombinant nucleic acid molecule.

According to the present invention, the term “transfection” is used to refer to any method by which an exogenous nucleic acid molecule (i.e., a recombinant nucleic acid molecule) can be inserted into a cell. The term “transformation” can be used interchangeably with the term “transfection” when such term is used to refer to the introduction of nucleic acid molecules into microbial cells, such as algae, bacteria and yeast, or into plants. In microbial systems, the term “transformation” is used to describe an inherited change due to the acquisition of exogenous nucleic acids by the microorganism or plant and is essentially synonymous with the term “transfection.” However, in animal cells, transformation has acquired a second meaning which can refer to changes in the growth properties of cells in culture after they become cancerous, for example. Therefore, to avoid confusion, the term “transfection” is preferably used with regard to the introduction of exogenous nucleic acids into animal cells, and the term “transfection” will be used herein to generally encompass transfection of animal cells, and transformation of microbial cells or plant cells, to the extent that the terms pertain to the introduction of exogenous nucleic acids into a cell. Therefore, transfection techniques include, but are not limited to, transformation, particle bombardment, diffusion, active transport, bath sonication, electroporation, microinjection, lipofection, adsorption, infection and protoplast fusion.

It will be appreciated by one skilled in the art that use of recombinant DNA technologies can improve control of expression of transfected nucleic acid molecules by manipulating, for example, the number of copies of the nucleic acid molecules within the host cell, the efficiency with which those nucleic acid molecules are transcribed, the efficiency with which the resultant transcripts are translated, and the efficiency of post-translational modifications. Additionally, the promoter sequence might be genetically engineered to improve the level of expression as compared to the native promoter. Recombinant techniques useful for controlling the expression of nucleic acid molecules include, but are not limited to, integration of the nucleic acid molecules into one or more host cell chromosomes, addition of vector stability sequences to plasmids, substitutions or modifications of transcription control signals (e.g., promoters, operators, enhancers), substitutions or modifications of translational control signals (e.g., ribosome binding sites, Shine-Dalgarno sequences), modification of nucleic acid molecules to correspond to the codon usage of the host cell, and deletion of sequences that destabilize transcripts.

General discussion above with regard to recombinant nucleic acid molecules and transfection of host cells is intended to be applied to any recombinant nucleic acid molecule discussed herein, including those encoding any amino acid sequence having a biological activity of at least one domain from a PUFA PKS, those encoding amino acid sequences from other PKS systems, and those encoding other proteins or domains.

Polyunsaturated fatty acids (PUFAs) are essential membrane components in higher eukaryotes and the precursors of many lipid-derived signaling molecules. The PUFA PKS system of the present invention uses pathways for PUFA synthesis that do not require desaturation and elongation of saturated fatty acids. The pathways catalyzed by PUFA PKSs are distinct from previously recognized PKSs in both structure and mechanism. Generation of cis double bonds is suggested to involve position-specific isomerases; these enzymes are believed to be useful in the production of new families of antibiotics.

To produce significantly high yields of one or more desired polyunsaturated fatty acids or other bioactive molecules, an organism, preferably a microorganism or a plant, and most preferably a Thraustochytrid microorganism, can be genetically modified to alter the activity and particularly, the end product, of the PUFA PKS system in the microorganism or plant.

Therefore, one embodiment of the present invention relates to a genetically modified microorganism, wherein the microorganism expresses a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The domain of the PUFA PKS system can include any of the domains, including homologues thereof, for PUFA PKS systems as described above (e.g., for Schizochytrium and Thraustochytrium), and can also include any domain of a PUFA PKS system from any other non-bacterial microorganism, including any eukaryotic microorganism, including any Thraustochytrid microorganism or any domain of a PUFA PKS system from a microorganism identified by a screening method as described in U.S. patent application Ser. No. 10/124,800, supra. The genetic modification affects the activity of the PKS system in the organism. The screening process described in U.S. patent application Ser. No. 10/124,800 includes the steps of: (a) selecting a microorganism that produces at least one PUFA; and, (b) identifying a microorganism from (a) that has an ability to produce increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation in the fermentation medium, as compared to production of PUFAs by the microorganism under dissolved oxygen conditions of greater than about 5% of saturation, and preferably about 10%, and more preferably about 15%, and more preferably about 20% of saturation in the fermentation medium.

In one aspect, such an organism can endogenously contain and express a PUFA PKS system, and the genetic modification can be a genetic modification of one or more of the functional domains of the endogenous PUFA PKS system, whereby the modification has some effect on the activity of the PUFA PKS system. In another aspect, such an organism can endogenously contain and express a PUFA PKS system, and the genetic modification can be an introduction of at least one exogenous nucleic acid sequence (e.g., a recombinant nucleic acid molecule), wherein the exogenous nucleic acid sequence encodes at least one biologically active domain or protein from a second PKS system and/or a protein that affects the activity of the PUFA PKS system (e.g., a phosphopantetheinyl transferase (PPTase), discussed below). In yet another aspect, the organism does not necessarily endogenously (naturally) contain a PUFA PKS system, but is genetically modified to introduce at least one recombinant nucleic acid molecule encoding an amino acid sequence having the biological activity of at least one domain of a PUFA PKS system. In this aspect, PUFA PKS activity is affected by introducing or increasing PUFA PKS activity in the organism. Various embodiments associated with each of these aspects will be discussed in greater detail below.

It is to be understood that a genetic modification of a PUFA PKS system or an organism comprising a PUFA PKS system can involve the modification of at least one domain of a PUFA PKS system (including a portion of a domain), more than one or several domains of a PUFA PKS system (including adjacent domains, non-contiguous domains, or domains on different proteins in the PUFA PKS system), entire proteins of the PUFA PKS system, and the entire PUFA PKS system (e.g., all of the proteins encoded by the PUFA PKS genes). As such, modifications can include a small modification to a single domain of an endogenous PUFA PKS system; to substitution, deletion or addition to one or more domains or proteins of a given PUFA PKS system; up to replacement of the entire PUFA PKS system in an organism with the PUFA PKS system from a different organism. One of skill in the art will understand that any genetic modification to a PUFA PKS system is encompassed by the invention.

As used herein, a genetically modified microorganism can include a genetically modified bacterium, protist, microalgae, fungus, or other microbe, and particularly, any of the genera of the order Thraustochytriales (e.g., a Thraustochytrid) described herein (e.g., Schizochytrium, Thraustochytrium, Japonochytrium, Labyrinthula, Labyrinthuloides, etc.). Such a genetically modified microorganism has a genome which is modified (i.e., mutated or changed) from its normal (i.e., wild-type or naturally occurring) form such that the desired result is achieved (i.e., increased or modified PUFA PKS activity and/or production of a desired product using the PKS system). Genetic modification of a microorganism can be accomplished using classical strain development and/or molecular genetic techniques. Such techniques are known in the art and are generally disclosed for microorganisms, for example, in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Labs Press. The reference Sambrook et al., ibid., is incorporated by reference herein in its entirety. A genetically modified microorganism can include a microorganism in which nucleic acid molecules have been inserted, deleted or modified (i.e., mutated; e.g., by insertion, deletion, substitution, and/or inversion of nucleotides), in such a manner that such modifications provide the desired effect within the microorganism.

Preferred microorganism host cells to modify according to the present invention include, but are not limited to, any bacteria, protist, microalga, fungus, or protozoa. In one aspect, preferred microorganisms to genetically modify include, but are not limited to, any microorganism of the order Thraustochytriales, including any microorganism in the families Thraustochytriaceae and Labyrinthulaceae. Particularly preferred host cells for use in the present invention could include microorganisms from a genus including, but not limited to: Thraustochytrium, Japonochytrium, Aplanochytrium, Elina and Schizochytrium within the Thraustochytriaceae and Labyrinthula, Labyrinthuloides, and Labyrinthomyxa within the Labyrinthulaceae. Preferred species within these genera include, but are not limited to: any species within Labyrinthula, including Labyrinthula sp., Labyrinthula algeriensis, Labyrinthula cienkowski, Labyrinthula chattonii, Labyrinthula coenocystis, Labyrinthula macrocystis, Labyrinthula macrocystis atlantica, Labyrinthula macrocystis macrocystis, Labyrinthula magnifica, Labyrinthula minuta, Labyrinthula roscoffensis, Labyrinthula valkanovii, Labyrinthula vitellina, Labyrinthula vitellina pacifica, Labyrinthula vitellina vitellina, Labyrinthula zopfi; any Labyrinthuloides species, including Labyrinthuloides sp., Labyrinthuloides minuta, Labyrinthuloides schizochytrops; any Labyrinthomyxa species, including Labyrinthomyxa sp., Labyrinthomyxa pohlia, Labyrinthomyxa sauvageaui; any Aplanochytrium species, including Aplanochytrium sp. and Aplanochytrium kerguelensis; any Elina species, including Elina sp., Elina marisalba, Elina sinorifica; any Japonochytrium species, including Japonochytrium sp., Japonochytrium marinum; any Schizochytrium species, including Schizochytrium sp., Schizochytrium aggregatum, Schizochytrium limacinum, Schizochytrium minutum, Schizochytrium octosporum; and any Thraustochytrium species, including Thraustochytrium sp., Thraustochytrium aggregatum, Thraustochytrium arudimentale, Thraustochytrium aureum, Thraustochytrium benthicola, Thraustochytrium globosum, Thraustochytrium kinnei, Thraustochytrium motivum, Thraustochytrium pachydermum, Thraustochytrium proliferum, Thraustochytrium roseum, Thraustochytrium striatum, Ulkenia sp., Ulkenia minuta, Ulkenia profunda, Ulkenia radiata, Ulkenia sarkariana, and Ulkenia visurgensis. Particularly preferred species within these genera include, but are not limited to: any Schizochytrium species, including Schizochytrium aggregatum, Schizochytrium limacinum, Schizochytrium minutum; any Thraustochytrium species (including former Ulkenia species such as U. visurgensis, U. amoeboida, U. sarkariana, U. profunda, U. radiata, U. minuta and Ulkenia sp. BP-5601), and including Thraustochytrium striatum, Thraustochytrium aureum, Thraustochytrium roseum; and any Japonochytrium species. Particularly preferred strains of Thraustochytriales include, but are not limited to: Schizochytrium sp. (S31)(ATCC 20888); Schizochytrium sp. (S8)(ATCC 20889); Schizochytrium sp. (LC-RM)(ATCC 18915); Schizochytrium sp. (SR21); Schizochytrium aggregatum (Goldstein et Belsky)(ATCC 28209); Schizochytrium limacinum (Honda et Yokochi)(IFO 32693); Thraustochytrium sp. (23B)(ATCC 20891); Thraustochytrium striatum (Schneider)(ATCC 24473); Thraustochytrium aureum (Goldstein)(ATCC 34304); Thraustochytrium roseum (Goldstein)(ATCC 28210); and Japonochytrium sp. (L1)(ATCC 28207).

Other examples of suitable host microorganisms for genetic modification include, but are not limited to, yeast including Saccharomyces cerevisiae, Saccharomyces carlsbergensis, or other yeast such as Candida, Kluyveromyces, or other fungi, for example, filamentous fungi such as Aspergillus, Neurospora, Penicillium, etc. Bacterial cells also may be used as hosts. These include, but are not limited to, Escherichia coli, which can be useful in fermentation processes. Alternatively, and only by way of example, a host such as a Lactobacillus species or Bacillus species can be used as a host.

Another embodiment of the present invention relates to a genetically modified plant, wherein the plant has been genetically modified to recombinantly express a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The domain of the PUFA PKS system can include any of the domains, including homologues thereof, for PUFA PKS systems as described above (e.g., for Schizochytrium and/or Thraustochytrium), and can also include any domain of a PUFA PKS system from any non-bacterial microorganism (including any eukaryotic microorganism and any other Thraustochytrid microorganism) or any domain of a PUFA PKS system from a microorganism identified by a screening method as described in U.S. patent application Ser. No. 10/124,800, supra. The plant can also be further modified with at least one domain or biologically active fragment thereof of another PKS system, including, but not limited to, bacterial PUFA PKS or PKS systems, Type I PKS systems, Type II PKS systems, modular PKS systems, and/or any non-bacterial PUFA PKS system (e.g., eukaryotic, Thraustochytrid, Thraustochytriaceae or Labyrinthulaceae, Schizochytrium, etc.).

As used herein, a genetically modified plant can include any genetically modified plant including higher plants and particularly, any consumable plants or plants useful for producing a desired bioactive molecule of the present invention. Such a genetically modified plant has a genome which is modified (i.e., mutated or changed) from its normal (i.e., wild-type or naturally occurring) form such that the desired result is achieved (i.e., increased or modified PUFA PKS activity and/or production of a desired product using the PKS system). Genetic modification of a plant can be accomplished using classical strain development and/or molecular genetic techniques. Methods for producing a transgenic plant, wherein a recombinant nucleic acid molecule encoding a desired amino acid sequence is incorporated into the genome of the plant, are known in the art. A preferred plant to genetically modify according to the present invention is preferably a plant suitable for consumption by animals, including humans.

Preferred plants to genetically modify according to the present invention (i.e., plant host cells) include, but are not limited to any higher plants, and particularly consumable plants, including crop plants and especially plants used for their oils. Such plants can include, for example: canola, soybeans, rapeseed, linseed, corn, safflowers, sunflowers and tobacco. Other preferred plants include those plants that are known to produce compounds used as pharmaceutical agents, flavoring agents, nutraceutical agents, functional food ingredients or cosmetically active agents or plants that are genetically engineered to produce these compounds/agents.

According to the present invention, a genetically modified microorganism or plant includes a microorganism or plant that has been modified using recombinant technology or by classical mutagenesis and screening techniques. As used herein, genetic modifications which result in a decrease in gene expression, in the function of the gene, or in the function of the gene product (i.e., the protein encoded by the gene) can be referred to as inactivation (complete or partial), deletion, interruption, blockage or down-regulation of a gene. For example, a genetic modification in a gene which results in a decrease in the function of the protein encoded by such gene, can be the result of a complete deletion of the gene (i.e., the gene does not exist, and therefore the protein does not exist), a mutation in the gene which results in incomplete or no translation of the protein (e.g., the protein is not expressed), or a mutation in the gene which decreases or abolishes the natural function of the protein (e.g., a protein is expressed which has decreased or no enzymatic activity or action). Genetic modifications that result in an increase in gene expression or function can be referred to as amplification, overproduction, overexpression, activation, enhancement, addition, or up-regulation of a gene.

The genetic modification of a microorganism or plant according to the present invention preferably affects the activity of the PKS system expressed by the microorganism or plant, whether the PKS system is endogenous and genetically modified, endogenous with the introduction of recombinant nucleic acid molecules into the organism (with the option of modifying the endogenous system or not), or provided completely by recombinant technology. To alter the PUFA production profile of a PUFA PKS system or organism expressing such system includes causing any detectable or measurable change in the production of any one or more PUFAs by the host microorganism or plant as compared to in the absence of the genetic modification (i.e., as compared to the unmodified, wild-type microorganism or plant or the microorganism or plant that is unmodified at least with respect to PUFA synthesis; i.e., the organism might have other modifications not related to PUFA synthesis). To affect the activity of a PKS system includes any genetic modification that causes any detectable or measurable change or modification in the PKS system expressed by the organism as compared to in the absence of the genetic modification. A detectable change or modification in the PKS system can include, but is not limited to: a change or modification (introduction of, increase or decrease) of the expression and/or biological activity of any one or more of the domains in a modified PUFA PKS system as compared to the endogenous PUFA PKS system in the absence of genetic modification, the introduction of PKS system activity into an organism such that the organism now has measurable/detectable PKS system activity (i.e., the organism did not contain a PKS system prior to the genetic modification), the introduction into the organism of a functional domain from a different PKS system than a PKS system endogenously expressed by the organism such that the PKS system activity is modified (e.g., a bacterial PUFA PKS domain or a type I PKS domain is introduced into an organism that endogenously expresses a non-bacterial PUFA PKS system), a change in the amount of a bioactive molecule (e.g., a PUFA) produced by the PKS system (e.g., the system produces more (increased amount) or less (decreased amount) of a given product as compared to in the absence of the genetic modification), a change in the type of a bioactive molecule (e.g., a change in the type of PUFA) produced by the PKS system (e.g., the system produces an additional or different PUFA, a new or different product, or a variant of a PUFA or other product that is naturally produced by the system), and/or a change in the ratio of multiple bioactive molecules produced by the PKS system (e.g., the system produces a different ratio of one PUFA to another PUFA, produces a completely different lipid profile as compared to in the absence of the genetic modification, or places various PUFAs in different positions in a triacylglycerol as compared to the natural configuration). Such a genetic modification includes any type of genetic modification and specifically includes modifications made by recombinant technology and by classical mutagenesis.

It should be noted that reference to increasing the activity of a functional domain or protein in a PUFA PKS system refers to any genetic modification in the organism containing the domain or protein (or into which the domain or protein is to be introduced) which results in increased functionality of the domain or protein system and can include higher activity of the domain or protein (e.g., specific activity or in vivo enzymatic activity), reduced inhibition or degradation of the domain or protein system, and overexpression of the domain or protein. For example, gene copy number can be increased, expression levels can be increased by use of a promoter that gives higher levels of expression than that of the native promoter, or a gene can be altered by genetic engineering or classical mutagenesis to increase the activity of the domain or protein encoded by the gene.

Similarly, reference to decreasing the activity of a functional domain or protein in a PUFA PKS system refers to any genetic modification in the organism containing such domain or protein (or into which the domain or protein is to be introduced) which results in decreased functionality of the domain or protein and includes decreased activity of the domain or protein, increased inhibition or degradation of the domain or protein and a reduction or elimination of expression of the domain or protein. For example, the action of domain or protein of the present invention can be decreased by blocking or reducing the production of the domain or protein, “knocking out” the gene or portion thereof encoding the domain or protein, reducing domain or protein activity, or inhibiting the activity of the domain or protein. Blocking or reducing the production of a domain or protein can include placing the gene encoding the domain or protein under the control of a promoter that requires the presence of an inducing compound in the growth medium. By establishing conditions such that the inducer becomes depleted from the medium, the expression of the gene encoding the domain or protein (and therefore, of protein synthesis) could be turned off. The present inventors demonstrate the ability to delete (knockout) targeted genes in a Thraustochytrid microorganism in the Examples section. Blocking or reducing the activity of domain or protein could also include using an excision technology approach similar to that described in U.S. Pat. No. 4,743,546, incorporated herein by reference. To use this approach, the gene encoding the protein of interest is cloned between specific genetic sequences that allow specific, controlled excision of the gene from the genome. Excision could be prompted by, for example, a shift in the cultivation temperature of the culture, as in U.S. Pat. No. 4,743,546, or by some other physical or nutritional signal.

In one embodiment of the present invention, a genetic modification includes a modification of a nucleic acid sequence encoding an amino acid sequence that has a biological activity of at least one domain of a non-bacterial PUFA PKS system as described herein (e.g., a domain, more than one domain, a protein, or the entire PUFA PKS system, of an endogenous PUFA PKS system of a Thraustochytrid host). Such a modification can be made to an amino acid sequence within an endogenously (naturally) expressed non-bacterial PUFA PKS system, whereby a microorganism that naturally contains such a system is genetically modified by, for example, classical mutagenesis and selection techniques and/or molecular genetic techniques, including genetic engineering techniques. Genetic engineering techniques can include, for example, using a targeting recombinant vector to delete a portion of an endogenous gene (demonstrated in the Examples), or to replace a portion of an endogenous gene with a heterologous sequence (demonstrated in the Examples). Examples of heterologous sequences that could be introduced into a host genome include sequences encoding at least one functional domain from another PKS system, such as a different non-bacterial PUFA PKS system (e.g., from a eukaryote, including another Thraustochytrid), a bacterial PUFA PKS system, a type I PKS system, a type II PKS system, or a modular PKS system. A heterologous sequence can also include an entire PUFA PKS system (e.g., all genes associated with the PUFA PKS system) that is used to replace the entire endogenous PUFA PKS system (e.g., all genes of the endogenous PUFA PKS system) in a host. A heterologous sequence can also include a sequence encoding a modified functional domain (a homologue) of a natural domain from a PUFA PKS system of a host Thraustochytrid (e.g., a nucleic acid sequence encoding a modified domain from OrfB of a Schizochytrium, wherein the modified domain will, when used to replace the naturally occurring domain expressed in the Schizochytrium, alter the PUFA production profile by the Schizochytrium). Other heterologous sequences to introduce into the genome of a host include a sequence encoding a protein or functional domain that is not a domain of a PKS system, but which will affect the activity of the endogenous PKS system. For example, one could introduce into the host genome a nucleic acid molecule encoding a phosphopantetheinyl transferase (discussed below). Specific modifications that could be made to an endogenous PUFA PKS system are discussed in detail herein.

In another aspect of this embodiment of the invention, the genetic modification can include: (1) the introduction of a recombinant nucleic acid molecule encoding an amino acid sequence having a biological activity of at least one domain of a PUFA PKS system; and/or (2) the introduction of a recombinant nucleic acid molecule encoding a protein or functional domain that affects the activity of a PUFA PKS system, into a host. The host can include: (1) a host cell that does not express any PKS system, wherein all functional domains of a

tions and methods described herein. For example, microbial organisms with a PUFA PKS system similar to that found in Schizochytrium, such as the Thraustochytrium microorganism discovered by the present inventors and described in Example 1, can be readily identified/isolated/screened by methods to identify other non-bacterial microorganisms that have a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system that are described in detail in U.S. Patent Application Publication No. 20020194641, supra (corresponding to U.S. patent application Ser. No. 10/124,800).

Locations for collection of the preferred types of microbes for screening for a PUFA PKS system according to the present invention include any of the following: low oxygen environments (or locations near these types of low oxygen environments) including in the guts of animals including invertebrates that consume microbes or microbe-containing foods (including types of filter feeding organisms), low or non-oxygen containing aquatic habitats (including freshwater, saline and marine), and especially at- or near-low oxygen environments (regions) in the oceans. The microbial strains would preferably not be obligate anaerobes but be adapted to live in both aerobic and low or anoxic environments. Soil environments containing both aerobic and low oxygen or anoxic environments would also be excellent environments to find these organisms in and especially in these types of soil in aquatic habitats or temporary aquatic habitats.

A particularly preferred non-bacterial microbial strain to screen for use as a host and/or a source of PUFA PKS genes according to the present invention would be a strain (selected from the group consisting of algae, fungi (including yeast), protozoa or protists) that, during a portion of its life cycle, is capable of consuming whole bacterial cells (bacterivory) by mechanisms such as phagocytosis, phagotrophic or endocytic capability and/or has a stage of its life cycle in which it exists as an amoeboid stage or naked protoplast. This method of nutrition would greatly increase the potential for transfer of a bacterial PKS system into a eukaryotic cell if a mistake occurred and the bacterial cell (or its DNA) did not get digested and instead is functionally incorporated into the eukaryotic cell.
PKS system are introduced into the host cell, and wherein at Included in the present invention as sources of PUFAPKS least one functional domain is from a non-bacterial PUFA genes (and proteins and domains encoded thereby) are any PKS system; (2) a host cell that expresses a PKS system Thraustochytrids other than those specifically described (endogenous or recombinant) having at least one functional herein that contain a PUFA PKS system. Such Thraus domain of a non-bacterial PUFA PKS system, wherein the 45 tochytrids include, but are not limited to, but are not limited introduced recombinant nucleic acid molecule can encode at to, any microorganism of the order Thraustochytriales, least one additional non-bacterial PUFA PKS domain func including any microorganism in the families Thraustochytri tion or another protein or domain that affects the activity of aceae and Labyrinthulaceae, which further comprise a genus the host PKS system; and (3) a host cell that expresses a PKS including, but not limited to: Thraustochytrium, system (endogenous or recombinant) which does not neces 50 Japonochytrium, Aplanochytrium, Elina and Schizochytrium sarily include a domain function from a non-bacterial PUFA within the Thraustochytriaceae and Labyrinthula, Labyrin PKS, and wherein the introduced recombinant nucleic acid thuloides, and Labyrinthomyxa within the Labyrinthulaceae. molecule includes a nucleic acid sequence encoding at least Preferred species within these genera include, but are not one functional domain of a non-bacterial PUFA PKS system. 
limited to: any species within Labyrinthula, including In other words, the present invention intends to encompass 55 Labrinthula sp., Labyrinthula algeriensis, Labyrinthula cien any genetically modified organism (e.g., microorganism or kowski, Labyrinthula chattonii, Labyrinthula coenocystis, plant), wherein the organism comprises at least one non Labyrinthula macrocystis, Labyrinthula macrocystis atlan bacterial PUFA PKS domain function (either endogenously tica, Labyrinthula macrocystis macrocystis, Labyrinthula or introduced by recombinant modification), and wherein the magnifica, Labyrinthula minuta, Labyrinthula roscoffensis, genetic modification has a measurable effect on the non 60 Labyrinthula valkanovii, Labyrinthula vitellina, Labyrin bacterial PUFA PKS domain function or on the PKS system thula vitellina pacifica, Labyrinthula vitellina vitellina, when the organism comprises a functional PKS system. Labyrinthula Zopfi; any Labyrinthuloides species, including The present invention encompasses many possible non Labyrinthuloides sp., Labyrinthuloides minuta, Labyrinthu bacterial and bacterial microorganisms as either possible host loides schizochytrops; any Labyrinthomyxa species, includ cells for the PUFA PKS systems described herein and/or as 65 ing Labyrinthomyxa sp., Labyrinthomyxa pohlia, Labyrinth sources for additional genetic material encoding PUFA PKS Onyxa Sauvageaui, any Aplanochytrium species, including system proteins and domains for use in the genetic modifica Aplanochytrium sp. 
and Aplanochytrium kerguelensis; any US 7,645,597 B2 59 60 Elina species, including Elina sp., Elina marisalba, Elina Cercomonads, Chrysophytes (for example the genera Antho Sinorifica; any Japanochytrium species, including physa, Chrysanoemba, Chrysosphaerella, Dendromonas, Japanochytrium sp., Japanochytrium marinum; any , Mallomonas, Ochromonas, Paraphysomonas, Schizochytrium species, including Schizochytrium sp., Poteriodchromonas, Spumella, Syncrypta, Synura, and Schizochytrium aggregatum, Schizochytrium limacinum, Uroglena), Collar flagellates, Cryptophytes (for example the Schizochytrium minutum, Schizochytrium Octosporum; and genera Chilomonas, Cryptomonas, Cyanomonas, and Goni any Thraustochytrium species, including Thraustochytrium Omonas), Dinoflagellates, Diplomonads, Euglenoids, Heter sp., Thraustochytrium aggregatum, Thraustochytrium arudi olobosea, Pedinellids, Pelobionts, Phalansteriids, Pseudo mentale, Thraustochytrium aureum, Thraustochytrium dendromonads, Spongomonads and Volvocales (and other benthicola, Thraustochytrium globosum, Thraustochytrium 10 flagellates including the unassigned flagellate genera of Arto kinnei, Thraustochytrium motivum, Thraustochytrium pachy discus, Clautriavia, Helkesimastix, Kathablepharis and Mul dermum, Thraustochytrium proliferum, Thraustochytrium ticilia). Amoeboid protozoans include the groups: Acti roseum, Thraustochytrium striatum, Ulkenia sp., Ulkenia nophryids, Centrohelids, Desmothoricids, Diplophryids, minuta, Ulkenia profilinda, Ulkenia radiate, Ulkenia Sarkari Eumamoebae, Heterolobosea, Leptomyxids, Nucleariid ana, and Ulkenia visurgensis. 
15 filose amoebae, Pelebionts, Testate amoebae and Vampyrel It is noted that, without being bound by theory, the present lids (and including the unassigned amoebid genera Gym inventors consider Labyrinthula and other Labyrinthulaceae nophrys, Biomyxa, Microcometes, Reticulomyxa, Belonocys as sources of PUFA PKS genes because the Labyrinthulaceae tis, Elaeorhanis, Allelogromia, Gromia or Lieberkuhnia). are closely related to the Thraustochytriaceae which are The protozoan orders include the following: Percolomon known to possess PUFA PKS genes, the Labyrinthulaceae are adeae, Heterolobosea, Lyromonadea, Pseudociliata, Tri known to be bactivorous/phagocytotic, and some members of chomonadea, Hypermastigea, Heteromiteae, Telonemea, the Labyrinthulaceae have fatty acid/PUFA profiles consis Cyathobodonea, Ebridea, Pyytomyxea, Opalinea, Kineto tent with having a PUFA PKS system. monadea, Hemimastigea, Protostelea, Myxagastrea, Dicty Strains of microbes (other than the members of the Thraus ostelea, Choanomonadea, Apicomonadea, Eogregarinea, tochytrids) capable of bacterivory (especially by phagocyto 25 Neogregarinea, Coelotrolphea, Eucoccidea, Haemosporea, sis or endocytosis) can be found in the following microbial Piroplasmea, Spirotrichea, Prostomatea, Litostomatea, Phyl classes (including but not limited to example genera): lopharyngea, Nassophorea, Oligohymenophorea, Colpodea, In the algae and algae-like microbes (including Strameno Karyorelicta, Nucleohelea, Centrohelea, Acantharea, Sti piles): of the class Euglenophyceae (for example genera cholonchea, Polycystinea, Phaeodarea, Lobosea, Filosea, Euglena, and Peranema), the class Chrysophyceae (for 30 Athalamea, Monothalamea, Polythalamea, Xenophyopho example the genus Ochromonas), the class Dinobryaceae (for rea, Schizocladea, Holosea, Entamoebea, Myxosporea, Acti example the genera Dinobryon, Platychrysis, and Chrysoch nomyxea, Halosporea, Paramyxea, Rhombozoa and Ortho romulina), the Dinophyceae (including the 
genera Crypth nectea. ecodinium, Gymnodinium, Peridinium, Ceratium, Gyrod A preferred embodiment of the present invention includes inium, and Oxyrrhis), the class Cryptophyceae (for example 35 strains of the microorganisms listed above that have been the genera Cryptomonas, and Rhodomonas), the class Xan collected from one of the preferred habitats listed above. thophyceae (for example the genus Olisthodiscus) (and In some embodiments of this method of the present inven including forms of algae in which an amoeboid stage occurs tion, PUFA PKS systems from bacteria, including genes and as in the flagellates Rhizochloridaceae, and Zoospores/ga portions thereof (encoding entire PUFA PKS systems, pro metes of Aphanochaete pascheri, Bumilleria Stigeoclonium 40 teins thereof and/or domains thereof) can be used to geneti and Vaucheria geminata), the class Eustigmatophyceae, and cally modify other PUFA PKS systems (e.g., any non-bacte the class Prymnesiopyceae (including the genera Prynne rial PUFA PKS system) and/or microorganisms containing sium and Diacronema). the same (or vice versa) in the embodiments of the invention. In the Stramenopiles including the: Proteromonads, Opa In one aspect, novel PUFA PKS systems can be identified in lines, Developavella, Diplophorys, Labyrinthulids, Thraus 45 bacteria that are expected to be particularly useful for creating tochytrids, Bicosecids, Oomycetes, Hypochytridiomycetes, genetically modified microorganisms (e.g., genetically modi Commation, Reticulosphaera, Pelagomonas, Pelapococcus, fied Thraustochytrids) and/or novel hybrid constructs encod Ollicola, Aureococcus, Parmales, Raphidiophytes, Synurids, ing PUFAPKS systems for use in the methods and genetically Rhizochromulinaales, Pedinellales, Dictyochales, Chrysom modified microorganisms and plants of the present invention. eridales, Sarcinochrysidales, Hydrurales, Hibberdiales, and 50 In one aspect, bacteria that may be particularly useful in the . 
embodiments of the present invention have PUFA PKS sys In the Fungi. Class Myxomycetes (form myxamoebae)— tems, wherein the PUFA PKS system is capable of producing slime molds, class Acrasieae including the orders Acrasiceae PUFAs at temperatures exceeding about 20° C., preferably (for example the genus Sappinia), class Guttulinaceae (for exceeding about 25°C. and even more preferably exceeding example the genera Guttulinopsis, and Guttulina), class Dic 55 about 30°C. As described previously herein, the marine bac ty Steliaceae (for example the genera Acrasis, Dictyostelium, teria, Shewanella and Vibrio marinus, described in U.S. Pat. Polysphondylium, and Coenonia), and class Phycomyceae No. 6,140,486, do not produce PUFAs at higher temperatures, including the orders Chytridiales, Ancylistales, Blastocladi which limits the usefulness of PUFA PKS systems derived ales, Monoblepharidales, Saprolegniales, Peronosporales, from these bacteria, particularly in plant applications under Mucorales, and Entomophthorales. 60 field conditions. Therefore, in one embodiment, the screening In the Protozoa: Protozoa strains with life stages capable of method of the present invention can be used to identify bac bacterivory (including by phageocytosis) can be selected teria that have a PUFA PKS system, wherein the bacteria are from the types classified as ciliates, flagellates or amoebae. capable of growth and PUFA production at higher tempera Protozoan ciliates include the groups: Chonotrichs, Colpo tures (e.g., above about 15° C., 20° C., 25°C., or 30° C. or dids, Cyrtophores, Haptorids, Karyorelicts, Oligohymeno 65 even higher). However, even if the bacteria sources do not phora, Polyhymenophora (spirotrichs), Prostomes and Suc grow well and/or produce PUFAs at the higher temperatures, toria. 
Protozoan flagellates include the Biosoecids, Bodonids, the present invention encompasses the identification, isola US 7,645,597 B2 61 62 tion and use of the PUFA PKS systems (genes and proteins/ see Example 7). Alternatively, whole DNA fragments can be domains encoded thereby), wherein the PUFA PKS systems cloned directly from purified environmental DNA by any of from the bacteria have enzymatic/biological activity at tem several methods known to the art. Sequence of the DNA peratures above about 15°C., 20°C., 25°C., or 30° C. or even fragments thus obtained can reveal homologs to known genes higher. In one aspect of this embodiment, inhibitors of 5 such as PUFA PKS genes. Homologs of OrfB and Orf eukaryotic growth such as nystatin (antifungal) or cyclohex (referring to the domain structure of Schizochytrium and imide (inhibitor of eukaryotic protein synthesis) can be added Thraustochytrium, for example) may be particularly useful in to agar plates used to culture/select initial Strains from water defining the PUFA PKS end product. Whole coding regions samples/soil samples collected from the types of habitats/ of PUFA PKS genes can then be expressed in host organisms niches such as marine or estuarian habits, or any other habitat 10 (such as Escherichia coli or yeast) in combination with each where such bacteria can be found. This process would help other or with known PUFA PKS gene or gene fragment com select for enrichment of bacterial strains without (or minimal) binations to evaluate their effect on PUFA production. As contamination of eukaryotic strains. This selection process, described above, activity in cell-free extracts can be used to in combination with culturing the plates at elevated tempera determine function at desired temperatures. Isolated PUFA tures (e.g. 
30° C.), and then selecting strains that produce at 15 PKS genes can also be transformed directly into appropriate least one PUFA would initially identify candidate bacterial Schizochytrium or other suitable strains to measure function. strains with a PUFA PKS system that is operative at elevated PUFA PKS system-encoding constructs identified or pro temperatures (as opposed to those bacterial strains in the prior duced in Such a manner, including hybrid constructs, can also art which only exhibit PUFA production attemperatures less be used to transform other organisms. Such as plants. than about 20° C. and more preferably below about 5°C.). Therefore, using the non-bacterial PUFA PKS systems of However, even in bacteria that do not grow well (or at all) the present invention, which, for example, makes use of genes at higher temperatures, or that do not produce at least one from Thraustochytrid PUFA PKS systems, as well as PUFA PUFA at higher temperatures, such strains can be identified PKS systems and PKS systems from bacteria, gene mixing and selected as comprising a PUFA PKS system by the iden can be used to extend the range of PUFA products to include tification of the ability of the bacterium to produce PUFAs 25 EPA, DHA, ARA, GLA, SDA and others (described in detail under any conditions and/or by Screening the genome of the below), as well as to produce a wide variety of bioactive bacterium for genes that are homologous to other known molecules, including antibiotics, other pharmaceutical com PUFA PKS genes from bacteria or non-bacterial organisms pounds, and other desirable products. The method to obtain (e.g., see Example 7). 
To evaluate PUFA PKS function at these bioactive molecules includes not only the mixing of higher temperatures for genes from any bacterial source, one 30 genes from various organisms but also various methods of can produce cell-free extracts and test for PUFA production at genetically modifying the non-bacterial PUFA PKS genes various temperatures, followed by selection of microorgan disclosed herein. Knowledge of the genetic basis and domain isms that contain PUFA PKS genes that have enzymatic/ structure of the non-bacterial PUFA PKS system of the biological activity at higher temperature ranges (e.g., 15°C., present invention provides a basis for designing novel geneti 20° C., 25°C., or 30° C. or even higher). 35 cally modified organisms which produce a variety of bioac Suitable bacteria to use as hosts for genetic modification tive molecules. Although mixing and modification of any include any bacterial strain as discussed above. Particularly PKS domains and related genes are contemplated by the suitable bacteria to use as a source of PUFA PKS genes (and present inventors, by way of example, various possible proteins and domains encoded thereby) for the production of manipulations of the PUFA-PKS system are discussed below genetically modified sequences and organisms according to 40 with regard to genetic modification and bioactive molecule the present invention include any bacterium that comprises a production. PUFA PKS system. Such bacteria are typically isolated from Accordingly, encompassed by the present invention are marine or estuarian habitats and can be readily identified by methods to genetically modify microbial or plant cells by: their ability to product PUFAs and/or by the presence of one genetically modifying at least one nucleic acid sequence in or more genes having homology to known PUFA PKS genes 45 the organism that encodes an amino acid sequence having the in the organism. 
Suchbacteria can include, but are not limited biological activity of at least one functional domain of a to, bacteria of the genera Shewanella and Vibrio. Preferred non-bacterial PUFA PKS system according to the present bacteria for use in the present invention include those with invention, and/or expressing at least one recombinant nucleic PUFA PKS systems that are biologically active at higher acid molecule comprising a nucleic acid sequence encoding temperatures (e.g., above about 15° C., 20° C., 25°C., or 30° 50 Such amino acid sequence. Various embodiments of Such C. or even higher). The present inventors have identified two sequences, methods to genetically modify an organism, and exemplary bacteria (e.g. Shewanella Olleyana and specific modifications have been described in detail above. Shewanella japonica; see Examples 7 and 8) that will be Typically, the method is used to produce a particular geneti particularly suitable for use as sources of PUFA PKS genes, cally modified organism that produces a particular bioactive and others can be readily identified or are known to comprise 55 molecule or molecules. PUFA PKS genes and may be useful in an embodiment of the One embodiment of the present invention relates to a present invention (e.g., Shewanella gelidimarina). genetically modified Thraustochytrid microorganism, Furthermore, it is recognized that not all bacterial or non wherein the microorganism has an endogenous polyunsatu bacterial microorganisms can be readily cultured from natu rated fatty acid (PUFA) polyketide synthase (PKS) system, ral habitats. However, genetic characteristics of such un-cul 60 and wherein the endogenous PUFA PKS system has been turable microorganisms can be evaluated by isolating genes genetically modified to alter the expression profile of a poly from DNA prepared en mass from mixed or crude environ unsaturated fatty acid (PUFA) by the microorganism as com mental samples. 
Particularly Suitable to the present invention, pared to the Thraustochytrid microorganism in the absence of PUFA PKS genes derived from un-culturable microorgan the modification. Thraustochytrid microorganisms useful as isms can be isolated from environmental DNA samples by 65 host organisms in the present invention endogenously contain degenerate PCR using primers designed to generally match and express a PUFA PKS system. The genetic modification regions of high similarity in known PUFA PKS genes (e.g., can be a genetic modification of one or more of the functional US 7,645,597 B2 63 64 domains of the endogenous PUFA PKS system, whereby the identify other organisms useful in the present method and all modification alters the PUFA production profile of the endog Such organisms are encompassed herein. enous PUFAPKS system. In addition, or as an alternative, the This embodiment of the present invention can be illustrated genetic modification can be an introduction of at least one as follows. By way of example, based on the present inven exogenous nucleic acid sequence (e.g., a recombinant nucleic acid molecule) to the microorganism, wherein the exogenous tors' current understanding of PUFA synthesis and accumu nucleic acid sequence encodes at least one biologically active lation in Schizochytrium, the overall biochemical process can domain or protein from a second PKS system and/or a protein be divided into three parts. that affects the activity of the PUFA PKS system (e.g., a First, the PUFAs that accumulate in Schizochytrium oil phosphopantetheinyl transferases (PPTase)). The second 10 (DHA and DPA) are the product of a PUFA PKS system as PKS system can be any PKS system, including other PUFA discussed above. 
The PUFA PKS system in Schizochytrium PKS systems and including homologues of genes from the converts malonyl-CoA into the end product PUFA without Thraustochytrid PUFA PKS system to be genetically modi release of significant amounts of intermediate compounds. In fied. Schizochytrium, three genes have been identified (Orfs A, B This embodiment of the invention is particularly useful for 15 and C; also represented by SEQID NO:1, SEQID NO:3 and the production of commercially valuable lipids enriched in a SEQID NO:5, respectively) that encode all of the enzymatic desired PUFA, such as EPA, via the present inventors devel domains known to be required for actual synthesis of PUFAs. opment of genetically modified microorganisms and methods Similar sets of genes (encoding proteins containing homolo for efficiently producing lipids (triacylglyerols (TAG) as well gous sets of enzymatic domains) have been cloned and char as membrane-associated phospholipids (PL)) enriched in acterized from several other non-eukaryotic organisms that PUFAS. produce PUFAs, namely, several strains of marine bacteria. In This particular embodiment of the present invention is addition, the present inventors have identified and now derived in part from the following knowledge: (1) utilization sequenced PUFA PKS genes in at least one other marine of the inherent TAG production capabilities of selected protist (Thraustochytrium strain 23B) (described in detail microorganisms, and particularly, of Thraustochytrids. Such 25 as the commercially developed Schizochytrium strain below). 
described herein; (2) the present inventors detailed under The PUFA products of marine bacteria include EPA (e.g., standing of PUFA PKS biosynthetic pathways (i.e., PUFA produced by Shewanella SRC2738 and Photobacter profiln PKS systems) in eukaryotes and in particular, in members of dum) as well as DHA (Vibrio marinus, now known as Mori the order Thraustochytriales; and, (3) utilization of a homolo 30 tella marina) (described in U.S. Pat. No. 6,140,486, supra; gous genetic recombination system in Schizochytrium. Based and in U.S. Pat. No. 6,566,583, supra). It is an embodiment of on the inventors knowledge of the systems involved, the the invention that any PUFA PKS gene set could be envi same general approach may be exploited to produce PUFAS sioned to substitute for the Schizochytrium genes described in other than EPA. the example herein, as long as the physiological growth In one embodiment of the invention, the endogenous 35 requirements of the production organism (e.g., Thraustochytrid PUFA PKS genes, such as the Schizochytrium) in fermentation conditions were satisfied. In Schizochytrium genes encoding PUFA PKS enzymes that particular, the PUFA-producing bacterial strains described normally produce DHA and DPA, are modified by random or above grow only at relatively low temperatures (typically less targeted mutagenesis, replaced with genes from other organ than 20° C.) which further indicates that their PUFA PKS isms that encode homologous PKS proteins (e.g., from bac 40 gene products will not function at standard growth tempera teria or other sources), or replaced with genetically modified tures for Schizochytrium (25-30°C.). However, the inventors Schizochytrium, Thraustochytrium or other Thraustochytrid have recently identified at least two other marine bacteria that PUFA PKS genes. 
The product of the enzymes encoded by grow and produce EPA at Standard growth temperatures for these introduced and/or modified genes can be EPA, for Schizochytrium and other Thraustochytrids (see Example 7). example, or it could be some other related molecule, includ 45 These alternate marine bacteria have been shown to possess ing other PUFAs. One feature of this method is the utilization PUFA-PKS-like genes that will serve as material for modifi of endogenous components of Thraustochytrid PUFA synthe cation of Schizochytrium and other Thraustochytrids by sis and accumulation machinery that is essential for efficient methods described herein. It will be apparent to those skilled production and incorporation of the PUFA into PL and TAG. in the art from this disclosure that other currently unstudied or In particular, this embodiment of the invention is directed to 50 unidentified PUFA-producing bacteria could also contain the modification of the type of PUFA produced by the organ PUFA PKS genes useful for modification of Thraus ism, while retaining the high oil productivity of the parent tochytrids. strain. Second, in addition to the genes that encode the enzymes Although some of the following discussion uses the organ directly involved in PUFA synthesis, an “accessory” enzyme ism Schizochytrium as an exemplary host organism, any 55 is required. The gene encodes a phosphopantetheline trans Thraustochytrid can be modified according to the present ferase (PPTase) that activates the acyl-carrier protein (ACP) invention, including members of the genera Thraus domains present in the PUFA PKS complex. Activation of the tochytrium, Labyrinthuloides, and Japonochytrium. For ACP domains by addition of this co-factor is required for the example, the genes encoding the PUFA PKS system for a PUFA PKS enzyme complex to function. 
All of the ACP species of Thraustochytrium have been identified (see 60 domains of the PUFA PKS systems identified so far show a Example 6), and this organism can also serve as a host organ high degree of amino acid sequence conservation and, with ism for genetic modification using the methods described out being bound by theory, the present inventors believe that herein, although it is more likely that the Thraustochytrium the PPTase of Schizochytrium and other Thraustochytrids will PKS genes will be used to modify the endogenous PUFAPKS recognize and activate ACP domains from other PUFA PKS genes of another Thraustochytrid, Such as Schizochytrium. 65 systems. As proof of principle that heterologous PPTases and Furthermore, using methods for screening organisms as set PUFA PKS genes can function together to produce a PUFA forth in U.S. application Ser. No. 10/124,800, supra, one can product, the present inventors demonstrate herein the use of US 7,645,597 B2 65 66 two different heterologous PPTases with the PUFA PKS Second, the present inventors have previously found that genes from Schizochytrium to produce a PUFA in a bacterial PPTases can activate heterologous PUFA PKSACP domains. host cell. Production of DHA in E. coli transformed with the PUFA Third, in Schizochytrium, the products of the PUFA PKS PKS genes from Vibrio marinus occurred only when an system are efficiently channeled into both the phospholipids appropriate PPTase gene (in this case, from Shewanella (PL) and triacylglycerols (TAG). The present inventors’ data SCRC2738) was also present (see U.S. Pat. No. 6,140,486, suggest that the PUFA is transferred from the ACP domains of supra). This demonstrated that the Shewanella PPTase was the PKS complex to coenzyme A (CoA). As in other eukary able to activate the Vibrio PUFA PKS ACP domains. 
Addi otic organisms, this acyl-CoA would then serve as the Sub tionally, the present inventors have now demonstrated the strate for the various acyl-transferases that form the PL and 10 activation (pantetheinylation) of ACP domains from TAG molecules. In contrast, the data indicate that in bacteria, Schizochytrium Orf A using a PPTase (sfp) from Bacillus transfer to CoA does not occur; rather, there is a direct transfer subtilus (see Example 2). The present inventors have also from the ACP domains of the PKS complex to the acyl demonstrated activation (pantetheinylation) of ACP domains transferases that form PL. The enzymatic system in from Schizochytrium Orf A by a PPTase called Het I from Schizochytrium that transfers PUFA from ACP to CoA clearly 15 Nostoc (see Example 2). The HetI enzyme was additionally used as the PPTase in the experiments discussed above for the can recognize both DHA and DPA and therefore, the present production of DHA and DPA in E. coli using the recombinant inventors believe that it is predictable that any PUFA product Schizochytrium PUFA PKS genes (Example 2). of the PUFA PKS system (as attached to the PUFA PKSACP Third, data indicate that DHA-CoA and DPA-CoA may be domains) will serve as a Substrate. metabolic intermediates in the Schizochytrium TAG and PL Therefore, in one embodiment of the present invention, the synthesis pathway. Published biochemical data Suggest that present inventors propose to alter the genes encoding the in bacteria, the newly synthesized PUFAs are transferred components of the PUFA PKS enzyme complex (part 1) directly from the PUFA PKS ACP domains to the phospho while utilizing the endogenous PPTase from Schizochytrium lipid synthesis enzymes. 
In contrast, the present inventors or another Thraustochytrid host (part 2) and PUFA-ACP to 25 data indicate that in Schizochytrium, a eukaryotic organism, PUFA-CoA transferase activity and TAG/PL synthesis sys there may be an intermediate between the PUFA on the PUFA tems (or other endogenous PUFAACP to TAG/PL mecha PKSACP domains and the target TAG and PL molecules. The nism) (part 3). These methods of the present invention are typical carrier of fatty acids in the eukaryotic cytoplasm is Supported by experimental data, Some of which are presented CoA. The inventors examined extracts of Schizochytrium in the Examples section in detail. 30 cells and found significant levels of compounds that co-mi First, the present inventors have found that the PUFA PKS grated during HPLC fractionation with authentic standards of system can be transferred between organisms, and that some DHA-CoA, DPA-CoA, 16:0-CoA and 18:1-CoA. The iden parts are interchangeable. More particularly, it has been pre tity of the putative DHA-CoA and DPA-CoA peaks were viously shown that the PUFA PKS pathways of the marine confirmed using mass spectroscopy. In contrast, the inventors bacteria, Shewanella SCR2738 (Yazawa, 1996, Lipids 35 were notable to detect DHA-CoA in extracts of Vibrio mari 31:S297-300) and Vibrio marinus (along with the PPTase nus, again Suggesting that a different mechanism exists in from Shewanella) (U.S. Pat. No. 6,140.486), can be success bacteria for transfer of the PUFA to its final target (e.g., direct fully transferred to a heterologous host (i.e., to E. coli). Addi transfer to PL). The data indicate a mechanism likely exists in tionally, the degree of structural homology between the sub Schizochytrium for transfer of the newly synthesized PUFA to units of the PUFA PKS enzymes from these two organisms 40 CoA (probably via a direct transfer from the ACP to CoA). 
(Shewanella SCRC2738 and Vibrio marinus) is such that it Both TAG and PL synthesis enzymes could then access this has been possible to mix and match genes from the two PUFA-CoA. The observation that both DHA and DPA CoA systems (U.S. Pat. No. 6,140,486, supra). The PUFA end are produced Suggests that the enzymatic transfer machinery product of the mixed sets of genes varied depending on the may recognize a range of PUFAs. origins of the specific gene homologues. At least one open 45 Fourth, the present inventors have now created knockouts reading frame (Shewanella's Orf 7 and its Vibrio marinus of OrfA, OrfB, and Orfo in Schizochytrium (see Example 3). homologue; see FIG. 13 of U.S. Pat. No. 6,140.486; note that The knockout strategy relies on the homologous recombina the nomenclature for this Orfhas changed; it is labeled as Orf tion that has been demonstrated to occur in Schizochytrium 8 in the patent, but was submitted to Genbank as Orf7, and is (see U.S. patent application Ser. No. 10/124.807, supra). Sev now referred to by its GenBank designation) could be asso 50 eral strategies can be employed in the design of knockout ciated with determination of whether DHA or EPA would be constructs. The specific strategy used to inactivate these three the product of the composite system. The functional domains genes utilized insertion of a ZeocinTM resistance gene of all of the PUFA PKS enzymes identified so far show coupled to a tubulin promoter (derived from pMON50000, sequence homology to one another. Similarly, these data indi see U.S. patent application Ser. No. 10/124,807) into a cloned cated that PUFA PKS systems, including those from the 55 portion of the Orf. The new construct containing the inter marine bacteria, can be transferred to, and will function in, rupted coding region was then used for the transformation of Schizochytrium and other Thraustochytrids. 
wildtype Schizochytrium cells via particle bombardment (see The present inventors have now expressed the PUFA PKS U.S. patent application Ser. No. 10/124,807). Bombarded genes (Orfs A, B and C) from Schizochytrium in an E. coli cells were spread on plates containing both ZeocinTM and a host and have demonstrated that the cells made DHA and 60 supply of PUFA (see below). Colonies that grew on these DPA in about the same ratio as the endogenous production of plates were then streaked onto ZeocinTM plates that were not these PUFAs in Schizochytrium (see Example 2). Therefore, supplemented with PUFAs. Those colonies that required it has been demonstrated that the recombinant PUFA supplementation for growth were candidates for hav Schizochytrium PUFA PKS genes encode a functional PUFA ing had the PUFA PKS Orf inactivated via homologous synthesis system. Additionally, all or portions of the Thraus 65 recombination. In all three cases, this presumption was con tochytrium 23B OrfA and Orf genes have been shown to firmed by rescuing the knockout by transforming the cells function in Schizochytrium (see Example 6). with a full-length genomic DNA clones of the respective US 7,645,597 B2 67 68 Schizochytrium Orfs. Furthermore, in some cases, it was The present invention can make use of genes and nucleic found that the ZeocinTM resistance gene had been removed acid sequences which encode proteins or domains from PKS (see Example 5), indicating that the introduced functional systems other than the PUFA PKS system described herein gene had integrated into the original site by double homolo and in U.S. patent application Ser. No. 10/124.800, and gous recombination (i.e. deleting the resistance marker). One include genes and nucleic acid sequences from bacterial and key to the Success of this strategy was Supplementation of the non-bacterial PKS systems, including PKS systems of Type growth medium with PUFAs. 
In the present case, an effective II, Type I and modular, described above. Organisms which means of Supplementation was found to be sequestration of express each of these types of PKS systems are known in the the PUFA by mixing with partially methylated beta-cyclo art and can serve as sources for nucleic acids useful in the dextrin prior to adding to the growth medium (see Example 10 genetic modification process of the present invention. 5). Together, these experiments demonstrate the principle that In a preferred embodiment, genes and nucleic acid one of skill in the art, given the guidance provided herein, can sequences which encode proteins or domains from PKS sys inactivate one or more of the PUFA PKS genes in a PUFA tems other than the PUFA PKS system or from other PUFA PKS-containing microorganism such as Schizochytrium, and PKS systems are isolated or derived from organisms which create a PUFA auxotroph which can then be used for further 15 have preferred growth characteristics for production of genetic modification (e.g., by introducing other PKS genes) PUFAs. In particular, it is desirable to be able to culture the according to the present invention (e.g., to alter the fatty acid genetically modified Thraustochytrid microorganism attem profile of the recombinant organism). peratures greater than about 15° C., greater than 20° C. One important element of the genetic modification of the greater than 25°C., greater than 30°C., greater than 35°C., organisms of the present invention is the ability to directly greater than 40°C., or in one embodiment, at any temperature transform a Thraustochytridgenome. In U.S. application Ser. between about 20° C. and 40°C. Therefore, PKS proteins or No. 10/124.807, Supra, transformation of Schizochytrium via domains having functional enzymatic activity at these tem single crossover homologous recombination and targeted peratures are preferred. 
For example, the present inventors gene replacement via double crossover homologous recom describe herein the use of PKS genes from Shewanella bination were demonstrated. As discussed above, the present 25 Olleyana or Shewanella japonica, which are marine bacteria inventors have now used this technique for homologous that naturally produce EPA and grow at temperatures up to recombination to inactivate Orf A, Orf B and Orf of the 30° C. and 35°C., respectively (see Example 7). PKS proteins PUFA-PKA system in Schizochytrium. The resulting mutants or domains from these organisms are examples of proteins are dependent on supplementation of the media with PUFA. and domains that can be mixed with Thraustochytrid PUFA Several markers of transformation, promoter elements for 30 PKS proteins and domains as described herein to produce a high level expression of introduced genes and methods for genetically modified organism that has a specifically delivery of exogenous genetic material have been developed designed or modified PUFA production profile. and are available. Therefore, the tools are in place for knock In another preferred embodiment, the genes and nucleic ing out endogenous PUFA PKS genes in Thraustochytrids acid sequences that encode proteins or domains from a PUFA and other eukaryotes having similar PUFA PKS systems and 35 PKS system that produces one fatty acid profile are used to replacing them with genes from other organisms (or with modify another PUFA PKS system and thereby alter the fatty modified Schizochytrium genes) as proposed above. acid profile of the host. For example, Thraustochytrium 23B In one approach for production of EPA-rich TAG, the (ATCC 20892) is significantly different from Schizochytrium PUFA PKS system of Schizochytrium can be altered by the sp. (ATCC 20888) in its fatty acid profile. 
Thraustochytrium addition of heterologous genes encoding a PUFAPKS system 40 23B can have DHA: DPA(n-6) ratios as high as 40:1 compared whose product is EPA. It is anticipated that the endogenous to only 2-3:1 in Schizochytrium (ATCC 20888). Thraus PPTase will activate the ACP domains of that heterologous tochytrium 23B can also have higher levels of C20:5(n-3). PUFA PKS system. Additionally, it is anticipated that the However, Schizochytrium (ATCC 20888) is an excellent oil EPA will be converted to EPA-CoA and will readily be incor producer as compared to Thraustochytrium 23B. porated into Schizochytrium TAG and PL membranes. In one 45 Schizochytrium accumulates large quantities of triacylglyc modification of this approach, techniques can be used to erols rich in DHA and docosapentaenoic acid (DPA; 22:5c)6); modify the relevant domains of the endogenous e.g., 30% DHA+DPA by dry weight. Therefore, the present Schizochytrium system (either by introduction of specific inventors describe herein the modification of the regions of heterologous genes or by mutagenesis of the Schizochytrium endogenous PUFA PKS system with Thraus Schizochytrium genes themselves) Such that its end product is 50 tochytrium 23B PUFA PKS genes to create a genetically EPA rather than DHA and DPA. This is an exemplary modified Schizochytrium with a DHA: DPA profile more approach, as this technology can be applied to the production similar to Thraustochytrium 23B (i.e., a “super-DHA-pro of other PUFA end products and to any eukaryotic microor ducer Schizochytrium, wherein the production capabilities ganism that comprises a PUFA PKS system and that has the of the Schizochytrium combine with the DHA:DPA ratio of ability to efficiently channel the products of the PUFA PKS 55 Thraustochytrium). system into both the phospholipids (PL) and triacylglycerols Therefore, the present invention makes use of genes from (TAG). 
In particular, the invention is applicable to any Thraustochytrid PUFA PKS systems, and further utilizes Thraustochytrid microorganism or any other eukaryote that gene mixing to extend and/or alter the range of PUFA prod has an endogenous PUFA PKS system, which is described in ucts to include EPA, DHA, DPA, ARA, GLA, SDA and detail below by way of example. In addition, the invention is 60 others. The method to obtain these altered PUFA production applicable to any suitable host organism, into which the modi profiles includes not only the mixing of genes from various fied genetic material for production of various PUFA profiles organisms into the Thrasustochytrid PUFA PKS genes, but as described herein can be transformed. For example, in the also various methods of genetically modifying the endog Examples, the PUFA PKS system from Schizochytrium is enous Thraustochytrid PUFA PKS genes disclosed herein. transformed into an E. coli. Such a transformed organism 65 Knowledge of the genetic basis and domain structure of the could then be further modified to alter the PUFA production Thraustochytrid PUFA PKS system of the present invention profile using the methods described herein. (e.g., described in detail for Schizochytrium above) provides US 7,645,597 B2 69 70 a basis for designing novel genetically modified organisms and then screen for those that had incorporated the new CLF. which produce a variety of PUFA profiles. Novel PUFA PKS Again, one would analyze these transformants for any effects constructs prepared in microorganisms such as a Thraus on fatty acid profiles to identify transformants producing EPA tochytrid can be isolated and used to transform plants to and/or ARA. If some factor other than those associated with impart similar PUFA production properties onto the plants. 
the CLF is found to influence the chain length of the end Any one or more of the endogenous Thraustochytrid PUFA product, a similar strategy could be employed to alter those PKS domains can be altered or replaced according to the factors. present invention, provided that the modification produces In another aspect of the invention, modification or Substi the desired result (i.e., alteration of the PUFA production tution of the B-hydroxy acyl-ACP dehydrase/keto synthase profile of the microorganism). Particularly preferred domains 10 pairs is contemplated. During cis-vaccenic acid (C18:1, A11) to alter or replace include, but are not limited to, any of the synthesis in E. coli, creation of the cis double bond is believed domains corresponding to the domains in Schizochytrium to depend on a specific DH enzyme, B-hydroxy acyl-ACP OrfB or Orf (B-keto acyl-ACP synthase (KS), acyltrans dehydrase, the product of the fabA gene. This enzyme ferase (AT), FabA-like B-hydroxyacyl-ACP dehydrase (DH), removes HOH from a B-keto acyl-ACP and leaves a trans chain length factor (CLF), enoyl ACP-reductase (ER), an 15 double bond in the carbon chain. A subset of DH's, FabA enzyme that catalyzes the synthesis of trans-2-acyl-ACP, an like, possess cis-trans isomerase activity (Heath et al., 1996, enzyme that catalyzes the reversible isomerization of trans supra). A novel aspect of bacterial and non-bacterial PUFA 2-acyl-ACP to cis-3-acyl-ACP, and an enzyme that catalyzes PKS systems is the presence of two FabA-like DH domains. the elongation of cis-3-acyl-ACP to cis-5-3-keto-acyl-ACP). Without being bound by theory, the present inventors believe In one embodiment, preferred domains to alter or replace that one or both of these DH domains will possess cis-trans include, but are not limited to. B-keto acyl-ACP synthase isomerase activity (manipulation of the DH domains is dis (KS), FabA-like B-hydroxy acyl-ACP dehydrase (DH), and cussed in greater detail below). chain length factor (CLF). 
Another aspect of the unsaturated fatty acid synthesis in E. In one aspect of the invention, Thraustochytrid PUFA-PKS coli is the requirement for a particular KS enzyme, B-ketoa PUFA production is altered by modifying the CLF (chain 25 cyl-ACP synthase, the product of the fabB gene. This is the length factor) domain. This domain is characteristic of Type II enzyme that carries out condensation of a fatty acid, linked to (dissociated enzymes) PKS systems. Its amino acid sequence a cysteine residue at the active site (by a thio-ester bond), with shows homology to KS (keto synthase pairs) domains, but it a malonyl-ACP. In the multi-step reaction, CO, is released lacks the active site cysteine. CLF may function to determine and the linear chain is extended by two carbons. It is believed the number of elongation cycles, and hence the chain length, 30 that only this KS can extend a carbon chain that contains a of the end product. In this embodiment of the invention, using double bond. This extension occurs only when the double the current state of knowledge of FAS and PKS synthesis, a bond is in the cis configuration; if it is in the trans configura rational strategy for production of ARA by directed modifi tion, the double bond is reduced by enoyl-ACP reductase cation of the non-bacterial PUFA-PKS system is provided. (ER) prior to elongation (Heath et al., 1996, supra). All of the There is controversy in the literature concerning the function 35 PUFA-PKS systems characterized so far have two KS of the CLF in PKS systems (Bisang et al., Nature 401, 502 domains, one of which shows greater homology to the FabB (1999);Yi et al., J. Am. Chem. Soc. 125, 12708 (2003)) and it like KS of E. coli than the other. Again, without being bound is realized that other domains may be involved in determina by theory, the present inventors believe that in PUFA-PKS tion of the chain length of the end product. 
However, it is systems, the specificities and interactions of the DH (FabA significant that Schizochytrium produces both DHA (C22:6. 40 like) and KS (FabB-like) enzymatic domains determine the (0-3) and DPA (C22:5, co-6). In the PUFA-PKS system the cis number and placement of cis double bonds in the end prod double bonds are introduced during synthesis of the growing ucts. Because the number of 2-carbon elongation reactions is carbon chain. Since placement of the co-3 and (O-6 double greater than the number of double bonds present in the PUFA bonds occurs early in the synthesis of the molecules, one PKS end products, it can be determined that in some exten would not expect that they would affect subsequent end 45 sion cycles complete reduction occurs. Thus the DH and KS product chain length determination. Thus, without being domains can be used as targets for alteration of the DHA/DPA bound by theory, the present inventors believe that introduc ratio or ratios of other long chain fatty acids. These can be tion of a factor (e.g. CLF) that directs synthesis of C20 units modified and/or evaluated by introduction of homologous (instead of C22 units) into the Schizochytrium PUFA-PKS domains from other systems or by mutagenesis of these gene system will result in the production of EPA (C20:5, co-3) and 50 fragments. ARA (C20:4. (O-6). For example, in heterologous systems, In another embodiment, the ER (enoyl-ACP reductase—an one could exploit the CLF by directly substituting a CLF from enzyme which reduces the trans-double bond in the fatty an EPA producing system (such as one from Photobacterium, acyl-ACP resulting in fully saturated carbons) domains can or preferably from a microorganism with the preferred be modified or substituted to change the type of product made growth requirements as described below) into the 55 by the PKS system. For example, the present inventors know Schizochytrium gene set. 
The fatty acids of the resulting trans that Schizochytrium PUFA-PKS system differs from the pre formants can then be analyzed for alterations in profiles to viously described bacterial systems in that it has two (rather identify the transformants producing EPA and/or ARA. than one) ER domains. Without being bound by theory, the By way of example, in this aspect of the invention, one present inventors believe these ER domains can strongly could construct a clone with the CLF of OrfB replaced with a 60 influence the resulting PKS production product. The resulting CLF from a C20 PUFA-PKS system. A marker gene could be PKS product could be changed by separately knocking out inserted downstream of the coding region. More specifically, the individual domains or by modifying their nucleotide one can use the homologous recombination system for trans sequence or by Substitution of ER domains from other organ formation of Thraustochytrids as described herein and in 1SS. detail in U.S. patent application Ser. No. 10/124.807, supra. 65 In another aspect of the invention, substitution of one of the One can then transform the wild type Thraustochytrid cells DH (FabA-like) domains of the PUFA-PKS system for a DH (e.g., Schizochytrium cells), select for the marker phenotype, domain that does not posses isomerization activity is contem US 7,645,597 B2 71 72 plated, potentially creating a molecule with a mix of cis- and systems into the Thraustochytridgenes (including genes from trans-double bonds. The current products of the non-Thraustochytrid microorganisms and genes from differ Schizochytrium PUFA PKS system are DHA and DPA (C22:5 ent Thraustochytrid microorganisms). The PUFA PKS sys (O6). If one manipulated the system to produce C20 fatty tem can be expressed in the E. coli and the PUFA production acids, one would expect the products to be EPA and ARA profile measured. In this manner, potential genetic modifica (C20:4 (O6). This could provide a new source for ARA. 
One tions can be evaluated prior to manipulation of the Thraus could also substitute domains from related PUFA-PKS sys tochytrid PUFA production organism. tems that produced a different DHA to DPA ratio for The present invention includes the manipulation of endog example by using genes from Thraustochytrium 23B (the enous nucleic acid molecules and/or the use of isolated PUFA PKS system of which is identified in U.S. patent appli 10 nucleic acid molecules comprising a nucleic acid sequence cation Ser. No. 10/124.800, supra). from a Thraustochytrid PUFA PKS system or a homologue Additionally, in one embodiment, one of the ER domains is thereof. In one aspect, the present invention relates to the altered in the Thraustochytrid PUFA PKS system (e.g. by modification and/or use of a nucleic acid molecule compris removing or inactivating) to alter the end product profile. ing a nucleic acid sequence encoding a domain from a PUFA Similar strategies could be attempted in a directed manner for 15 PKS system having a biological activity of at least one of the each of the distinct domains of the PUFA-PKS proteins using following proteins: malonyl-CoA:ACP acyltransferase more or less Sophisticated approaches. Of course one would (MAT), 3-ketoacyl-ACP synthase (KS), ketoreductase (KR), not be limited to the manipulation of single domains. Finally, acyltransferase (AT), Fab A-like B-hydroxyacyl-ACP dehy one could extend the approach by mixing domains from the drase (DH), phosphopantetheline transferase, chain length PUFA-PKS system and other PKS or FAS systems (e.g., type factor (CLF), acyl carrier protein (ACP), enoyl ACP-reduc I, type II, modular) to create an entire range of new PUFA end tase (ER), an enzyme that catalyzes the synthesis of trans-2- products. 
acyl-ACP, an enzyme that catalyzes the reversible isomeriza It is recognized that many genetic alterations, either ran tion of trans-2-acyl-ACP to cis-3-acyl-ACP, and/or an dom or directed, which one may introduce into a native (en enzyme that catalyzes the elongation of cis-3-acyl-ACP to dogenous, natural) PKS system, will result in an inactivation 25 cis-5-3-keto-acyl-ACP. Preferred domains to modify in order of enzymatic functions. Therefore, in order to test for the to alter the PUFA production profile of a host Thraustochytrid effects of genetic manipulation of a Thraustochytrid PUFA have been discussed previously herein. PKS system in a controlled environment, one could first use a The genetic modification of a Thraustochytrid microorgan recombinant system in another host, such as E. coli, to ism according to the present invention preferably affects the manipulate various aspects of the system and evaluate the 30 type, amounts, and/or activity of the PUFAs produced by the results. For example, the FabB-strain of E. coli is incapable of microorganism, whether the endogenous PUFA PKS system synthesizing unsaturated fatty acids and requires supplemen is genetically modified and/or whether recombinant nucleic tation of the medium with fatty acids that can substitute for its acid molecules are introduced into the organism. According normal unsaturated fatty acids in order to grow (see Metz, et to the present invention, to affect an activity of a PUFA PKS al., 2001, Supra). However, this requirement (for Supplemen 35 system, such as to affect the PUFA production profile, tation of the medium) can be removed when the strain is includes any genetic modification in the PUFAPKS system or transformed with a functional PUFA-PKS system (i.e. one genes that interact with the PUFAPKS system that causes any that produces a PUFA product in the E. coli host—see (Metz detectable or measurable change or modification in any bio et al., 2001, supra, FIG. 2A). 
The transformed FabB-strain logical activity the PUFA PKS system expressed by the now requires a functional PUFA-PKS system (to produce the 40 organism as compared to in the absence of the genetic modi unsaturated fatty acids) for growth without Supplementation. fication. According to the present invention, the phrases The key element in this example is that production of a wide “PUFA profile”, “PUFA expression profile” and “PUFA pro range of unsaturated fatty acid will Suffice (even unsaturated duction profile' can be used interchangeably and describe the fatty acid Substitutes such as branched chain fatty acids). overall profile of PUFAs expressed/produced by a microor Therefore, in another preferred embodiment of the invention, 45 ganism. The PUFA expression profile can include the types of one could create a large number of mutations in one or more PUFAs expressed by the microorganism, as well as the abso of the PUFA PKS genes disclosed herein, and then transform lute and relative amounts of the PUFAs produced. Therefore, the appropriately modified FabB-strain (e.g. create mutations a PUFA profile can be described in terms of the ratios of in an expression construct containing an ER domain and PUFAS to one another as produced by the microorganism, in transform a FabB-strain having the other essential domains 50 terms of the types of PUFAs produced by the microorganism, on a separate plasmid—or integrated into the chromosome) and/or in terms of the types and absolute or relative amounts and select only for those transformants that grow without of PUFAs produced by the microorganism. Supplementation of the medium (i.e., that still possessed an As discussed above, while the host microorganism can ability to produce a molecule that could complement the include any eukaryotic microorganism with an endogenous FabB-defect). 
55 PUFA PKS system and the ability to efficiently channel the One test system for genetic modification of a PUFA PKS is products of the PUFA PKS system into both the phospholip exemplified in the Examples section. Briefly, a host microor ids (PL) and triacylglycerols (TAG), the preferred host micro ganism such as E. coli is transformed with genes encoding a organism is any member of the order Thraustochytriales, PUFA PKS system including all or a portion of a Thraus including the families Thraustochytriaceae and Labyrinthu tochytrid PUFA PKS system (e.g., Orfs A, B and C of 60 laceae. Particularly preferred host cells for use in the present Schizochytrium) and a gene encoding a phosphopantetheinyl invention could include microorganisms from a genus includ transferases (PPTase), which is required for the attachment of ing, but not limited to: Thraustochytrium, Japonochytrium, a phosphopantetheline cofactor to produce the active, holo Aplanochytrium, Elina, and Schizochytrium within the ACP in the PKS system. The genes encoding the PKS system Thraustochytriaceae, and Labyrinthula, Labyrinthuloides, can be genetically engineered to introduce one or more modi 65 and Labyrinthomyxa within the Labyrinthulaceae. Preferred fications to the Thraustochytrid PUFA PKS genes and/or to species within these genera include, but are not limited to: any introduce nucleic acids encoding domains from other PKS species within Labyrinthula, including Labrinthula sp., US 7,645,597 B2 73 74 Labyrinthula algeriensis, Labyrinthula cienkowski, Laby PUFA profile of interest, the high throughput technologies rinthula chattonii, Labyrinthula coenocystis, Labyrinthula could then be used to optimize the system. 
For example, if the macrocystis, Labyrinthula macrocystis atlantica, Labyrin introduced domain only functioned at relatively low tempera thula macrocystis macrocystis, Labyrinthula magnifica, tures, selection methods could be devised to permit removing Labyrinthula minuta, Labyrinthula roscoffensis, Labyrin that limitation. thula valkanovii, Labyrinthula vitellina, Labyrinthula In one embodiment of the present invention, a genetically vitellina pacifica, Labyrinthula vitellina vitellina, Labyrin modified Thraustochytrid microorganism has an enhanced thula Zopfii; any Labyrinthuloides species, including Laby ability to synthesize desired PUFAs and/or has a newly intro rinthuloides sp., Labyrinthuloides minuta, Labyrinthuloides duced ability to synthesize a different profile of PUFAs. schizochytrops; any Labyrinthomyxa species, including 10 According to the present invention, “an enhanced ability to Labyrinthomyxa sp., Labyrinthomyxa pohlia, Labyrinth synthesize' a product refers to any enhancement, or up-regu Onyxa Sauvageaui, any Aplanochytrium species, including lation, in a pathway related to the synthesis of the product Aplanochytrium sp. and Aplanochytrium kerguelensis; any Such that the microorganism produces an increased amount of Elina species, including Elina sp., Elina marisalba, Elina the product (including any production of a product where Sinorifica; any Japanochytrium species, including 15 there was none before) as compared to the wild-type micro Japanochytrium sp., Japanochytrium marinum; any organism, cultured or grown, under the same conditions. Schizochytrium species, including Schizochytrium sp., Methods to produce Such genetically modified organisms Schizochytrium aggregatum, Schizochytrium limacinum, have been described in detail above. 
Schizochytrium minutum, Schizochytrium Octosporum; and As described above, in one embodiment of the present any Thraustochytrium species, including Thraustochytrium invention, a genetically modified microorganism or plant sp., Thraustochytrium aggregatum, Thraustochytrium arudi includes a microorganism or plant which has an enhanced mentale, Thraustochytrium aureum, Thraustochytrium ability to synthesize desired bioactive molecules (products) benthicola, Thraustochytrium globosum, Thraustochytrium or which has a newly introduced ability to synthesize specific kinnei, Thraustochytrium motivum, Thraustochytrium pachy products (e.g., to synthesize a specific antibiotic). According dermum, Thraustochytrium proliferum, Thraustochytrium 25 to the present invention, “an enhanced ability to synthesize a roseum, Thraustochytrium striatum, Ulkenia sp., Ulkenia product refers to any enhancement, or up-regulation, in a minuta, Ulkenia profilinda, Ulkenia radiate, Ulkenia Sarkari pathway related to the synthesis of the product such that the ana, and Ulkenia visurgensis. Particularly preferred species microorganism or plant produces an increased amount of the within these genera include, but are not limited to: any product (including any production of a product where there Schizochytrium species, including Schizochytrium aggrega 30 was none before) as compared to the wild-type microorgan tum, Schizochytrium limacinum, Schizochytrium minutum; ism or plant, cultured or grown, under the same conditions. any Thraustochytrium species (including former Ulkenia Methods to produce such genetically modified organisms species Such as U. visurgensis, U. amoeboida, U. Sarkariana, have been described in detail above. U. profinda, U. radiata, U. minuta and Ulkenia sp. 
BP-5601), One embodiment of the present invention is a method to and including Thraustochytrium striatum, Thraustochytrium 35 produce desired bioactive molecules (also referred to as prod aureum, Thraustochytrium roseum; and any Japonochytrium ucts or compounds) by growing or culturing a genetically species. Particularly preferred strains of Thraustochytriales modified microorganism or plant of the present invention include, but are not limited to: Schizochytrium sp. (S31) (described in detail above). Such a method includes the step (ATCC 20888); Schizochytrium sp. (S8)(ATCC 20889); of culturing in a fermentation medium or growing in a Suit Schizochytrium sp. (LC-RM)(ATCC 18915); Schizochytrium 40 able environment, Such as soil, a microorganism or plant, sp. (SR21); Schizochytrium aggregatum (Goldstein et Bel respectively, that has a genetic modification as described sky)(ATCC 28209); Schizochytrium limacinum (Honda et previously herein and in accordance with the present inven Yokochi)(IFO 32693); Thraustochytrium sp. (23B)(ATCC tion. Preferred host cells for genetic modification related to 20891); Thraustochytrium striatum (Schneider)(ATCC the PUFA PKS system of the invention are described above. 24473); Thraustochytrium aureum (Goldstein)(ATCC 45 One embodiment of the present invention is a method to 34304); Thraustochytrium roseum (Goldstein)(ATCC produce desired PUFAs by culturing a genetically modified 28210); and Japonochytrium sp. (L1)(ATCC 28207). Thraustochytrid microorganism of the present invention (de In one embodiment of the present invention, it is contem scribed in detail above). Such a method includes the step of plated that a mutagenesis program could be combined with a culturing in a fermentation medium and under conditions selective screening process to obtain a Thraustochytrid 50 effective to produce the PUFA(s) a Thraustochytrid microor microorganism with the PUFA production profile of interest. 
ganism that has a genetic modification as described previ The mutagenesis methods could include, but are not limited ously herein and in accordance with the present invention. An to: chemical mutagenesis, gene shuffling, Switching regions appropriate, or effective, medium refers to any medium in of the genes encoding specific enzymatic domains, or which a genetically modified microorganism of the present mutagenesis restricted to specific regions of those genes, as 55 invention, including Thraustochytrids and other microorgan well as other methods. isms, when cultured, is capable of producing the desired For example, high throughput mutagenesis methods could PUFA product(s). Such a medium is typically an aqueous be used to influence or optimize production of the desired medium comprising assimilable carbon, nitrogen and phos PUFA profile. Once an effective model system has been phate sources. Such a medium can also include appropriate developed, one could modify these genes in a high throughput 60 salts, minerals, metals and other nutrients. Any microorgan manner. Utilization of these technologies can be envisioned isms of the present invention can be cultured in conventional on two levels. First, if a sufficiently selective screen for pro fermentation bioreactors. The microorganisms can be cul duction of a product of interest (e.g., EPA) can be devised, it tured by any fermentation process which includes, but is not could be used to attempt to alter the system to produce this limited to, batch, fed-batch, cell recycle, and continuous fer product (e.g., in lieu of or in concert with, other strategies 65 mentation. Preferred growth conditions for Thraustochytrid Such as those discussed above). Additionally, if the strategies microorganisms according to the present invention are well outlined above resulted in a set of genes that did produce the known in the art and are described in detail, for example, in US 7,645,597 B2 75 76 U.S. Pat. No. 5,130,242, U.S. Pat. No. 
5,340,742, and U.S. Pat. No. 5,698,244, each of which is incorporated herein by reference in its entirety.

In one embodiment, the genetically modified microorganism is cultured at a temperature of greater than about 15° C., and in another embodiment, greater than about 20° C., and in another embodiment, greater than about 25° C., and in another embodiment, greater than about 30° C., and in another embodiment, greater than about 35° C., and in another embodiment, greater than about 40° C., and in one embodiment, at any temperature between about 20° C. and 40° C.

The desired PUFA(s) and/or other bioactive molecules produced by the genetically modified microorganism can be recovered from the fermentation medium using conventional separation and purification techniques. For example, the fermentation medium can be filtered or centrifuged to remove microorganisms, cell debris and other particulate matter, and the product can be recovered from the cell-free supernatant by conventional methods, such as, for example, ion exchange, chromatography, extraction, solvent extraction, phase separation, membrane separation, electrodialysis, reverse osmosis, distillation, chemical derivatization and crystallization. Alternatively, microorganisms producing the PUFA(s), or extracts and various fractions thereof, can be used without removal of the microorganism components from the product.

Preferably, a genetically modified Thraustochytrid microorganism of the invention produces one or more polyunsaturated fatty acids including, but not limited to, EPA (C20:5, ω-3), DHA (C22:6, ω-3), DPA (C22:5, ω-6), ARA (C20:4, ω-6), GLA (C18:3, n-6), and SDA (C18:4, n-3). In one preferred embodiment, a Schizochytrium that, in wild-type form, produces high levels of DHA and DPA, is genetically modified according to the invention to produce high levels of EPA. As discussed above, one advantage of using genetically modified Thraustochytrid microorganisms to produce PUFAs is that the PUFAs are directly incorporated into both the phospholipids (PL) and triacylglycerides (TAG).

Preferably, PUFAs are produced in an amount that is greater than about 5% of the dry weight of the microorganism, and in one aspect, in an amount that is greater than 6%, and in another aspect, in an amount that is greater than 7%, and in another aspect, in an amount that is greater than 8%, and in another aspect, in an amount that is greater than 9%, and in another aspect, in an amount that is greater than 10%, and so on in whole integer percentages, up to greater than 90% dry weight of the microorganism (e.g., 15%, 20%, 30%, 40%, 50%, and any percentage in between).

In the method for production of desired bioactive compounds of the present invention, a genetically modified plant is cultured in a fermentation medium or grown in a suitable medium such as soil. An appropriate, or effective, fermentation medium has been discussed in detail above. A suitable growth medium for higher plants includes any growth medium for plants, including, but not limited to, soil, sand, any other particulate media that support root growth (e.g. vermiculite, perlite, etc.) or hydroponic culture, as well as suitable light, water and nutritional supplements which optimize the growth of the higher plant. The genetically modified plants of the present invention are engineered to produce significant quantities of the desired product through the activity of the PKS system that is genetically modified according to the present invention. The compounds can be recovered through purification processes which extract the compounds from the plant. In a preferred embodiment, the compound is recovered by harvesting the plant. In this embodiment, the plant can be consumed in its natural state or further processed into consumable products.

Many genetic modifications useful for producing bioactive molecules will be apparent to those of skill in the art, given the present disclosure, and various other modifications have been discussed previously herein. The present invention contemplates any genetic modification related to a PUFA PKS system as described herein which results in the production of a desired bioactive molecule.

Bioactive molecules, according to the present invention, include any molecules (compounds, products, etc.) that have a biological activity, and that can be produced by a PKS system that comprises at least one amino acid sequence having a biological activity of at least one functional domain of a non-bacterial PUFA PKS system as described herein. Such bioactive molecules can include, but are not limited to: a polyunsaturated fatty acid (PUFA), an anti-inflammatory formulation, a chemotherapeutic agent, an active excipient, an osteoporosis drug, an anti-depressant, an anti-convulsant, an anti-Helicobacter pylori drug, a drug for treatment of neurodegenerative disease, a drug for treatment of degenerative liver disease, an antibiotic, and a cholesterol lowering formulation. One advantage of the non-bacterial PUFA PKS system of the present invention is the ability of such a system to introduce carbon-carbon double bonds in the cis configuration, and molecules including a double bond at every third carbon. This ability can be utilized to produce a variety of compounds.

Preferably, bioactive compounds of interest are produced by the genetically modified microorganism in an amount that is greater than about 0.05%, and preferably greater than about 0.1%, and more preferably greater than about 0.25%, and more preferably greater than about 0.5%, and more preferably greater than about 0.75%, and more preferably greater than about 1%, and more preferably greater than about 2.5%, and more preferably greater than about 5%, and more preferably greater than about 10%, and more preferably greater than about 15%, and even more preferably greater than about 20% of the dry weight of the microorganism. For lipid compounds, preferably, such compounds are produced in an amount that is greater than about 5% of the dry weight of the microorganism. For other bioactive compounds, such as antibiotics or compounds that are synthesized in smaller amounts, those strains possessing such compounds at of the dry weight of the microorganism are identified as predictably containing a novel PKS system of the type described above. In some embodiments, particular bioactive molecules (compounds) are secreted by the microorganism, rather than accumulating. Therefore, such bioactive molecules are generally recovered from the culture medium and the concentration of molecule produced will vary depending on the microorganism and the size of the culture.

One embodiment of the present invention relates to a method to modify an endproduct containing at least one fatty acid, comprising adding to the endproduct an oil produced by a recombinant host cell that expresses at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system. The PUFA PKS system includes any suitable bacterial or non-bacterial PUFA PKS system described herein, including the PUFA PKS systems from Thraustochytrium and Schizochytrium, or any PUFA PKS system from bacteria that normally (i.e., under normal or natural conditions) are capable of growing and producing PUFAs at temperatures above 22° C., such as Shewanella olleyana or Shewanella japonica.

Preferably, the endproduct is selected from the group consisting of a food, a dietary supplement, a pharmaceutical formulation, a humanized animal milk, and an infant formula. Suitable pharmaceutical formulations include, but are not limited to, an anti-inflammatory formulation, a chemotherapeutic agent, an active excipient, an osteoporosis drug, an anti-depressant, an anti-convulsant, an anti-Helicobacter pylori drug, a drug for treatment of neurodegenerative disease, a drug for treatment of degenerative liver disease, an antibiotic, and a cholesterol lowering formulation. In one embodiment, the endproduct is used to treat a condition selected from the group consisting of chronic inflammation, acute inflammation, gastrointestinal disorder, cancer, cachexia, cardiac restenosis, neurodegenerative disorder, degenerative disorder of the liver, blood lipid disorder, osteoporosis, osteoarthritis, autoimmune disease, preeclampsia, preterm birth, age related maculopathy, pulmonary disorder, and peroxisomal disorder.

Suitable food products include, but are not limited to, fine bakery wares, bread and rolls, breakfast cereals, processed and unprocessed cheese, condiments (ketchup, mayonnaise, etc.), dairy products (milk, yogurt), puddings and gelatin desserts, carbonated drinks, teas, powdered beverage mixes, processed fish products, fruit-based drinks, chewing gum, hard confectionery, frozen dairy products, processed meat products, nut and nut-based spreads, pasta, processed poultry products, gravies and sauces, potato chips and other chips or crisps, chocolate and other confectionery, soups and soup mixes, soya based products (milks, drinks, creams, whiteners), vegetable oil-based spreads, and vegetable-based drinks.

Yet another embodiment of the present invention relates to a method to produce a humanized animal milk. This method includes the steps of genetically modifying milk-producing cells of a milk-producing animal with at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system as described herein.

Methods to genetically modify a host cell and to produce a genetically modified non-human, milk-producing animal, are known in the art. Examples of host animals to modify include cattle, sheep, pigs, goats, yaks, etc., which are amenable to genetic manipulation and cloning for rapid expansion of a transgene expressing population. For animals, PKS-like transgenes can be adapted for expression in target organelles, tissues and body fluids through modification of the gene regulatory regions. Of particular interest is the production of PUFAs in the breast milk of the host animal.

The following examples are provided for the purpose of illustration and are not intended to limit the scope of the present invention.

EXAMPLES

Example 1

The following example, from U.S. patent application Ser. No. 10/124,800, describes the use of the screening process of the present invention to identify other non-bacterial organisms comprising a PUFA PKS system according to the present invention.

Thraustochytrium sp. 23B (ATCC 20892) was cultured as described in detail herein.

A frozen vial of Thraustochytrium sp. 23B (ATCC 20892) was used to inoculate a 250 mL shake flask containing 50 mL of RCA medium. The culture was shaken on a shaker table (200 rpm) for 72 hr at 25° C. RCA medium contains the following:

RCA Medium

Deionized water                 1000 mL
Reef Crystals® sea salts        40 g/L
Glucose                         20 g/L
Monosodium glutamate (MSG)      20 g/L
Yeast extract                   1 g/L
PII metals*                     5 mL/L
Vitamin mix*                    1 mL/L
pH                              7.0

*PII metal mix and vitamin mix are same as those outlined in U.S. Pat. No. 5,130,742, incorporated herein by reference in its entirety.

25 mL of the 72 hr old culture was then used to inoculate another 250 mL shake flask containing 50 mL of low nitrogen RCA medium (10 g/L MSG instead of 20 g/L) and the other 25 mL of culture was used to inoculate a 250 mL shake flask containing 175 mL of low-nitrogen RCA medium. The two flasks were then placed on a shaker table (200 rpm) for 72 hr at 25° C. The cells were then harvested via centrifugation and dried by lyophilization. The dried cells were analyzed for fat content and fatty acid profile and content using standard gas chromatograph procedures.

The screening results for Thraustochytrium 23B under low oxygen conditions relative to high oxygen conditions were as follows:

Did DHA as % FAME increase?                               Yes (38 -> 44%)
C14:0 + C16:0 + C16:1 greater than about 40% TFA?         Yes (44%)
No C18:3(n-3) or C18:3(n-6)?                              Yes (0%)
Did fat content increase?                                 Yes (2-fold increase)
Did DHA (or other HUFA content increase)?                 Yes (2.3-fold increase)

The results, especially the significant increase in DHA content (as % FAME) under low oxygen conditions, strongly indicate the presence of a PUFA producing PKS system in this strain of Thraustochytrium.

In order to provide additional data confirming the presence of a PUFA PKS system, a Southern blot of Thraustochytrium 23B was conducted using PKS probes from Schizochytrium strain 20888, a strain which has already been determined to contain a PUFA producing PKS system (i.e., SEQ ID NOs: 1-32 described above). Fragments of Thraustochytrium 23B genomic DNA which are homologous to hybridization probes from PKS PUFA synthesis genes were detected using the Southern blot technique. Thraustochytrium 23B genomic DNA was digested with either ClaI or KpnI restriction endonucleases, separated by agarose gel electrophoresis (0.7% agarose, in standard tris-acetate-EDTA buffer), and blotted to a Schleicher & Schuell Nytran Supercharge membrane by capillary transfer. Two digoxigenin labeled hybridization probes were used: one specific for the enoyl-ACP reductase (ER) region of Schizochytrium PKS Orf B (nucleotides 5012-5511 of Orf B; SEQ ID NO:3), and the other specific for a conserved region at the beginning of Schizochytrium PKS Orf C (nucleotides 76-549 of Orf C; SEQ ID NO:5).

The OrfB-ER probe detected an approximately 13 kb ClaI fragment and an approximately 3.6 kb KpnI fragment in the Thraustochytrium 23B genomic DNA. The Orf C probe detected an approximately 7.5 kb ClaI fragment and an approximately 4.6 kb KpnI fragment in the Thraustochytrium 23B genomic DNA.

Finally, a recombinant genomic library, consisting of DNA fragments from Thraustochytrium 23B genomic DNA inserted into vector lambda FIX II (Stratagene), was screened using digoxigenin labeled probes corresponding to the following segments of Schizochytrium 20888 PUFA-PKS genes: nucleotides 7385-7879 of Orf A (SEQ ID NO:1), nucleotides 5012-5511 of Orf B (SEQ ID NO:3), and nucleotides 76-549 of Orf C (SEQ ID NO:5). Each of these probes detected positive plaques from the Thraustochytrium 23B library, indicating extensive homology between the Schizochytrium PUFA-PKS genes and the genes of Thraustochytrium 23B.

These results demonstrate that Thraustochytrium 23B genomic DNA contains sequences that are homologous to PKS genes from Schizochytrium 20888.

Example 2

The following example demonstrates that Schizochytrium Orfs A, B and C encode a functional DHA/DPA synthesis enzyme via functional expression in E. coli.

General Preparation of E. coli Transformants

The three genes encoding the Schizochytrium PUFA PKS system that produces DHA and DPA in Schizochytrium (Orfs A, B & C; SEQ ID NO:1, SEQ ID NO:3 and SEQ ID NO:5, respectively) were cloned into a single E. coli expression vector (derived from pET21c (Novagen)). The genes are transcribed as a single message (by the T7 RNA-polymerase), and a ribosome-binding site cloned in front of each of the genes initiates translation. Modification of the Orf B coding sequence was needed to obtain production of a full-length Orf B protein in E. coli (see below). An accessory gene, encoding a PPTase (see below) was cloned into a second plasmid (derived from pACYC184, New England Biolabs).

OrfB

The Orf B gene is predicted to encode a protein with a mass of ~224 kDa. Initial attempts at expression of the gene in E. coli resulted in accumulation of a protein with an apparent molecular mass of ~165 kDa (as judged by comparison to proteins of known mass during SDS-PAGE). Examination of the Orf B nucleotide sequence revealed a region containing 15 sequential serine codons, all of them being the TCT codon. The genetic code contains 6 different serine codons, and three of these are used frequently in E. coli. The present inventors used four overlapping oligonucleotides in combination with a polymerase chain reaction protocol to resynthesize a small portion of the Orf B gene (a ~195 base pair, BspHI to SacII restriction enzyme fragment) that contained the serine codon repeat region. In the synthetic Orf B fragment, a random mixture of the 3 serine codons commonly used by E. coli was used, and some other potentially problematic codons were changed as well (i.e., other codons rarely used by E. coli). The BspHI to SacII fragment present in the original Orf B was replaced by the resynthesized fragment (to yield Orf B*) and the modified gene was cloned into the relevant expression vectors. The modified Orf B* still encodes the amino acid sequence of SEQ ID NO:4. Expression of the modified Orf B* clone in E. coli resulted in the appearance of a ~224 kDa protein, indicating that the full-length product of Orf B was produced. The sequence of the resynthesized Orf B* BspHI to SacII fragment is shown in SEQ ID NO:80. Referring to SEQ ID NO:80, the nucleotide sequence of the resynthesized BspHI to SacII region of Orf B is shown. The BspHI restriction site and the SacII restriction site are identified. The BspHI site starts at nucleotide 4415 of the Orf B CDS (SEQ ID NO:3) (note: there are a total of three BspHI sites in the Orf B CDS, while the SacII site is unique). The sequence of the unmodified Orf B CDS is given in GenBank Accession number AF378328 and in SEQ ID NO:3.

PPTase

The ACP domains of the Orf A protein (SEQ ID NO:2 in Schizochytrium) must be activated by addition of a phosphopantetheine group in order to function. The enzymes that catalyze this general type of reaction are called phosphopantetheine transferases (PPTases). E. coli contains two endogenous PPTases, but it was anticipated that they would not recognize the Orf A ACP domains from Schizochytrium. This was confirmed by expressing Orfs A, B* (see above) and C in E. coli without an additional PPTase. In this transformant, no DHA production was detected. The inventors tested two heterologous PPTases in the E. coli PUFA PKS expression system: (1) sfp (derived from Bacillus subtilis) and (2) Het I (from the cyanobacterium Nostoc strain 7120).

The sfp PPTase has been well characterized and is widely used due to its ability to recognize a broad range of substrates. Based on published sequence information (Nakano, et al., 1992, Molecular and General Genetics 232: 313-321), an expression vector for sfp was built by cloning the coding region, along with defined up- and downstream flanking DNA sequences, into a pACYC-184 cloning vector. The oligonucleotides:

CGGGGTACCCGGGAGCCGCCTTGGCTTTGT (forward; SEQ ID NO: 73)

AAACTGCAGCCCGGGTCCAGCTGGCAGGCACCCTG (reverse; SEQ ID NO: 74)

were used to amplify the region of interest from genomic B. subtilis DNA. Convenient restriction enzyme sites were included in the oligonucleotides to facilitate cloning in an intermediate, high copy number vector and finally into the EcoRV site of pACYC184 to create the plasmid: pBR301. Examination of extracts of E. coli transformed with this plasmid revealed the presence of a novel protein with the mobility expected for Sfp. Co-expression of the sfp construct in cells expressing the Orf A, B, C proteins, under certain conditions, resulted in DHA production. This experiment demonstrated that Sfp was able to activate the Schizochytrium Orf A ACP domains. In addition, the regulatory elements associated with the sfp gene were used to create an expression cassette into which other genes could be inserted. Specifically, the sfp coding region (along with three nucleotides immediately upstream of the ATG) in pBR301 was replaced with a 53 base pair section of DNA designed so that it contains several unique (for this construct) restriction enzyme sites. The initial restriction enzyme site in this region is NdeI (CATATG; SEQ ID NO:79). The ATG sequence embedded in this site is utilized as the initiation methionine codon for introduced genes. The additional restriction sites (BglII, NotI, SmaI, PmeI, HindIII, SpeI and XhoI) were included to facilitate the cloning process. The functionality of this expression vector cassette was tested by using PCR to generate a version of sfp with an NdeI site at the 5' end and an XhoI site at the 3' end. This fragment was cloned into the expression cassette and transferred into E. coli along with the Orf A, B and C expression vector. Under appropriate conditions, these cells accumulated DHA, demonstrating that a functional Sfp had been produced.

To the present inventors' knowledge, Het I has not been tested previously in a heterologous situation. Het I is present in a cluster of genes in Nostoc known to be responsible for the synthesis of long chain hydroxy-fatty acids that are a component of a glyco-lipid layer present in heterocysts of that organism. The present inventors, without being bound by theory, believe that Het I activates the ACP domains of a protein, HglE, present in that cluster.
The two ACP domains of HglE have a high degree of sequence homology to the ACP domains found in Schizochytrium Orf A. The endogenous start codon of Het I has not been identified (there is no methionine present in the putative protein). There are several potential alternative start codons (e.g., TTG and ATT) near the 5' end of the open reading frame. The sequence of the region of Nostoc DNA encoding the Het I gene is shown in SEQ ID NO:81. SEQ ID NO:82 represents the amino acid sequence encoded by SEQ ID NO:81. Referring to SEQ ID NO:81, the limit to the upstream coding region is indicated by the in-frame nonsense triplet (TAA) at positions 1-3 of SEQ ID NO:81, and the coding region ends with the stop codon (TGA) at positions 715-717 of SEQ ID NO:81. No methionine codons (ATG) are present in the sequence. Potential alternative initiation codons are: 3 TTG codons (positions 4-6, 7-9 and 49-51 of SEQ ID NO:81), ATT (positions 76-78 of SEQ ID NO:81) and GTG (positions 235-237 of SEQ ID NO:81). A Het I expression construct was made by using PCR to replace the furthest 5' potential alternative start codon (TTG) with a methionine codon (ATG, as part of the above described NdeI restriction enzyme recognition site), and introducing an XhoI site at the 3' end of the coding sequence. The modified Het I coding sequence was then inserted into the NdeI and XhoI sites of the pACYC184 vector construct containing the sfp regulatory elements. Expression of this Het I construct in E. coli resulted in the appearance of a new protein of the size expected from the sequence data. Co-expression of Het I with Schizochytrium Orfs A, B*, C in E. coli under several conditions resulted in the accumulation of DHA and DPA in those cells. In all of the experiments in which sfp and Het I were compared, more DHA and DPA accumulated in the cells containing the Het I construct than in cells containing the sfp construct.

Production of DHA and DPA in E. coli Transformants

The two plasmids encoding: (1) the Schizochytrium PUFA PKS genes (Orfs A, B* and C) and (2) the PPTase (from sfp or from Het I) were transformed into E. coli strain BL21 which contains an inducible T7 RNA polymerase gene. Synthesis of the Schizochytrium proteins was induced by addition of IPTG to the medium, while PPTase expression was controlled by a separate regulatory element (see above). Cells were grown under various defined conditions and using either of the two heterologous PPTase genes. The cells were harvested and the fatty acids were converted to methyl-esters (FAME) and analyzed using gas-liquid chromatography.

Under several conditions, DHA and DPA were detected in E. coli cells expressing the Schizochytrium PUFA PKS genes, plus either of the two heterologous PPTases. No DHA or DPA was detected in FAMEs prepared from control cells (i.e., cells transformed with a plasmid lacking one of the Orfs). The ratio of DHA to DPA observed in E. coli approximates that of the endogenous DHA and DPA production observed in Schizochytrium. The highest level of PUFA (DHA plus DPA), representing ~17% of the total FAME, was found in cells grown at 32° C. in 765 medium (recipe available from the American Type Culture Collection) supplemented with 10% (by weight) glycerol. Note that PUFA accumulation was also observed when cells were grown in Luria Broth supplemented with 5 or 10% glycerol, and when grown at 20° C. Selection for the presence of the respective plasmids was maintained by inclusion of the appropriate antibiotics during the growth and IPTG (to a final concentration of 0.5 mM) was used to induce expression of Orfs A, B and C.

FIG. 4 shows an example chromatogram from gas-liquid chromatographic analysis of FAMEs derived from control cells and from cells expressing the Schizochytrium PUFA PKS genes plus a PPTase (in this case Het I). Identity of the labeled FAMEs has been confirmed using mass spectroscopy.

Example 3

The following example demonstrates that genes encoding the Schizochytrium PUFA PKS enzyme complex can be selectively inactivated (knocked out), and that this is a lethal phenotype unless the medium is supplemented with polyunsaturated fatty acids.

Homologous recombination has been demonstrated in Schizochytrium (see copending U.S. patent application Ser. No. 10/124,807, incorporated herein by reference in its entirety). A plasmid designed to inactivate Schizochytrium Orf A (SEQ ID NO:1) was made by inserting a Zeocin™ resistance marker into the SmaI site of a clone containing the Orf A coding sequence. The Zeocin™ resistance marker was obtained from the plasmid pMON50000; expression of the Zeocin™ resistance gene is driven by a Schizochytrium derived tubulin promoter element (see U.S. patent application Ser. No. 10/124,807, ibid.). The knock-out construct thus consists of 5' Schizochytrium Orf A coding sequence, the tub-Zeocin™ resistance element and 3' Schizochytrium Orf A coding sequence, all cloned into pBluescript II SK (+) vector (Stratagene).

The plasmid was introduced into Schizochytrium cells by particle bombardment and transformants were selected on plates containing Zeocin™ and supplemented with polyunsaturated fatty acids (PUFA) (see Example 4). Colonies that grew on the Zeocin™ plus PUFA plates were tested for ability to grow on plates without the PUFA supplementation and several were found that required the PUFA. These PUFA auxotrophs are putative Orf A knockouts. Northern blot analysis of RNA extracted from several of these mutants confirmed that a full-length Orf A message was not produced in these mutants.

These experiments demonstrate that a Schizochytrium gene (e.g., Orf A) can be inactivated via homologous recombination, that inactivation of Orf A results in a lethal phenotype, and that those mutants can be rescued by supplementation of the media with PUFA.

Similar sets of experiments directed to the inactivation of Schizochytrium Orf B (SEQ ID NO:3) and Orf C (SEQ ID NO:5) have yielded similar results. That is, Orf B and Orf C can be individually inactivated by homologous recombination and those cells require PUFA supplementation for growth.

Example 4

The following example shows that PUFA auxotrophs can be maintained on medium supplemented with EPA, demonstrating that EPA can substitute for DHA in Schizochytrium.

As indicated in Example 3, Schizochytrium cells in which the PUFA PKS complex has been inactivated required supplementation with PUFA to survive. Aside from demonstrating that Schizochytrium is dependent on the products of this system for growth, this experimental system permits the testing of various fatty acids for their ability to rescue the mutants. It was discovered that the mutant cells (in which any of the three genes have been inactivated) grew as well on media supplemented with EPA as they did on media supplemented with DHA. This result indicates that, if the endogenous PUFA PKS complex which produces DHA were replaced with one whose product was EPA, the cells would be viable. Additionally, these mutant cells could be rescued by supplementation with either ARA or GLA, demonstrating the feasibility of producing genetically modified Schizochytrium that produce these products. It is noted that a preferred method for supplementation with PUFAs involves combining the free fatty acids with partially methylated beta-cyclodextrin prior to addition of the PUFAs to the medium.

Example 5

The following example shows that inactivated PUFA genes can be replaced at the same site with active forms of the genes in order to restore PUFA synthesis.

Double homologous recombination at the acetolactate synthase gene site has been demonstrated in Schizochytrium (see U.S. patent application Ser. No. 10/124,807, supra). The present inventors tested this concept for replacement of the Schizochytrium PUFA PKS genes by transformation of a Schizochytrium Orf A knockout strain (described in Example 2) with a full-length Schizochytrium Orf A genomic clone. The transformants were selected by their ability to grow on media without supplemental PUFAs. These PUFA prototrophs were then tested for resistance to Zeocin™ and several were found that were sensitive to the antibiotic. These results indicate that the introduced Schizochytrium Orf A has replaced the Zeocin™ resistance gene in the knockout strain via double homologous recombination. This experiment demonstrates the proof of concept for gene replacement within the PUFA PKS genes. Similar experiments for Schizochytrium Orf B and Orf C knock-outs have given identical results.

Example 6

This example shows that all or some portions of the Thraustochytrium 23B PUFA PKS genes can function in Schizochytrium.

As described in U.S. patent application Ser. No. 10/124,800 (supra), the DHA-producing protist Thraustochytrium 23B (Th. 23B) has been shown to contain orfA, orfB, and orfC homologs. Complete genomic clones of the three Th. 23B genes were used to transform the Schizochytrium strain containing the cognate orf "knock-out". Direct selection for complemented transformants was carried out in the absence of PUFA supplementation. By this method, it was shown that the Th. 23B orfA and orfC genes could complement the Schizochytrium orfA and orfC knock-out strains, respectively, to PUFA prototrophy. Complemented transformants were found that either retained or lost Zeocin™ resistance (the marker inserted into the Schizochytrium genes thereby defining the knock-outs). The Zeocin™-resistant complemented transformants are likely to have arisen by a single cross-over integration of the entire Thraustochytrium gene into the Schizochytrium genome outside of the respective orf region. This result suggests that the entire Thraustochytrium gene is functioning in Schizochytrium. The Zeocin™-sensitive complemented transformants are likely to have arisen by double cross-over events in which portions (or conceivably all) of the Thraustochytrium genes functionally replaced the cognate regions of the Schizochytrium genes that had contained the disruptive Zeocin™ resistance marker. This result suggests that a fraction of the Thraustochytrium gene is functioning in Schizochytrium.

Example 7

The following example shows that certain EPA-producing bacteria contain PUFA PKS-like genes that appear to be suitable for modification of Schizochytrium.

Two EPA-producing marine bacterial strains of the genus Shewanella have been shown to grow at temperatures typical of Schizochytrium fermentations and to possess PUFA PKS-like genes. Shewanella olleyana (Australian Collection of Antarctic Microorganisms (ACAM) strain number 644; Skerratt et al., Int. J. Syst. Evol. Microbiol. 52, 2101 (2002)) produces EPA and grows up to 30° C. Shewanella japonica (American Type Culture Collection (ATCC) strain number BAA-316; Ivanova et al., Int. J. Syst. Evol. Microbiol. 51, 1027 (2001)) produces EPA and grows up to 35° C.

To identify and isolate the PUFA-PKS genes from these bacterial strains, degenerate PCR primer pairs for the KS-MAT region of bacterial orf5/pfaA genes and the DH-DH region of bacterial orf7/pfaC genes were designed based on published gene sequences for Shewanella SCRC-2738; Shewanella oneidensis MR-1; Shewanella sp. GA-22; Photobacter profundum; and Moritella marina (see discussion above). Specifically, the primers and PCR conditions were designed as follows:

Primers for the KS/AT region; based on the following published sequences: Shewanella sp. SCRC-2738; Shewanella oneidensis MR-1; Photobacter profundum; Moritella marina:

prRZ23
GGYATGMTGRTTGGTGAAGG (forward; SEQ ID NO: 69)

prRZ24
TRTTSASRTAYTGYGAACCTTG (reverse; SEQ ID NO: 70)

Primers for the DH region; based on the following published sequences: Shewanella sp. GA-22; Shewanella sp. SCRC-2738; Photobacter profundum; Moritella marina:

prRZ28
ATGKCNGAAGGTTGTGGCCA (forward; SEQ ID NO: 71)

prRZ29
CCWGARATRAAGCCRTTDGGTTG (reverse; SEQ ID NO: 72)

The PCR conditions (with bacterial chromosomal DNA as templates) were as follows:

Reaction Mixture:
0.2 µM dNTPs
0.1 µM each primer
8% DMSO
250 ng chromosomal DNA
2.5 U Herculase® DNA polymerase (Stratagene)
1x Herculase® buffer
50 µL total volume

PCR Protocol: (1) 98° C. for 3 min.; (2) 98° C. for 40 sec.; (3) 56° C. for 30 sec.; (4) 72° C. for 90 sec.; (5) Repeat steps 2-4 for 29 cycles; (6) 72° C. for 10 min.; (7) Hold at 6° C.

For both primer pairs, PCR gave distinct products with expected sizes using chromosomal DNA templates from either Shewanella olleyana or Shewanella japonica. The four respective PCR products were cloned into pCR-BLUNT II TOPO (Invitrogen) and insert sequences were determined using the M13 forward and reverse primers. In all cases, the DNA sequences thus obtained were highly homologous to known bacterial PUFA PKS gene regions.

The DNA sequences obtained from the bacterial PCR products were compared with known sequences and with PUFA PKS genes from Schizochytrium ATCC 20888 in a standard Blastx search (BLAST parameters: Low Complexity filter: On; Matrix: BLOSUM62; Word Size: 3; Gap Costs: Existence 11, Extension 1 (BLAST described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402, incorporated herein by reference in its entirety)).

At the amino acid level, the sequences with the greatest degree of homology to the Shewanella olleyana ACAM644 ketoacyl synthase/acyl transferase (KS-AT) deduced amino acid sequence encoded by SEQ ID NO:76 were: Photobacter profundum pfaA (identity=70%; positives=81%); Shewanella oneidensis MR-1 "multi-domain β-ketoacyl synthase" (identity=66%; positives=77%); and Moritella marina ORF8 (identity=56%; positives=71%). The Schizochytrium sp. ATCC20888 orfA was 41% identical and 56% positive to the deduced amino acid sequence encoded by SEQ ID NO:76.

Schizochytrium PUFA production. PUFA PKS genes and the proteins and domains encoded thereby from either of Shewanella olleyana or Shewanella japonica are explicitly encompassed by the present invention.

Example 8

This example demonstrates how the bacterial PUFA PKS gene fragments described in Example 7 can be used to modify PUFA production in Schizochytrium.

All presently-known examples of PUFA PKS genes from bacteria exist as four closely linked genes that contain the same domains as in the three-gene Schizochytrium set. It is anticipated that the PUFA PKS genes from Shewanella
15 Olleyana and Shewanella japonica will likewise be found in At the amino acid level, the sequences with the greatest this tightly clustered arrangement. The homologous regions degree of homology to the Shewanella japonica ATCC BAA identified in Example 7 are used to isolate the PUFA PKS 316 ketoacyl synthase/acyl transferase (KS-AT) deduced gene clusters from clone banks of Sh. Olleyana and Sh. amino acid sequence encoded by SEQ ID NO:78 were: japonica DNAS. Clone banks can be constructed in bacte Shewanella Oneidensis MR-1 “multi-domain B-ketoacyl syn riophage lambda Vectors, cosmid vectors, bacterial artificial thase” (identity=67%; positives=79%); Shewanella sp. chromosome (“BAC) vectors, or by other methods known in SCRC-2738 orf5 (identity=69%; positives=77%); and Mori the art. Desired clones containing bacterial PUFA PKS genes tella marina ORF8 (identity=56%; positives–70%). The can be identified by colony or plaque hybridization (as Schizochytrium sp. ATCC20888 orfA was 41% identical and described in Example 1) using probes generated by PCR of 55% positive to the deduced amino acid sequence encoded by 25 the partial gene sequences of Example 7 employing primers SEQID NO:78. designed from these sequences. The complete DNA sequence At the amino acid level, the sequences with the greatest of the new bacterial PUFA PKS gene sets are then used to degree of homology to the Shewanella Olleyana ACAM644 design vectors for transformation of Schizochytrium strains dehydrogenase (DH) deduced amino acid sequence encoded defective in the endogenous PUFA PKS genes (e.g., see by SEQ ID NO:75 were: Shewanella sp. SCRC-2738 orf7 30 Examples 3, 5, and 6). 
Whole bacterial genes (coding (identity=77%; positives=86%); Photobacter profiundum sequences) may be used to replace whole Schizochytrium pfaC (identity=72%; positives 81%); and Shewanella genes (coding sequences), thus utilizing the Schizochytrium Oneidensis MR-1 “multi-domain B-ketoacyl synthase” (iden gene expression regions, and the fourth bacterial gene may be tity 75%; positives=83%). The Schizochytrium sp. targeted to a different location within the genome. Alterna ATCC20888 orfc was 26% identical and 42% positive to the 35 tively, individual bacterial PUFA PKS functional domains deduced amino acid sequence encoded by SEQID NO:75. may be “swapped' or exchanged with the analogous At the amino acid level, the sequences with the greatest Schizochytrium domains by similar techniques of homolo degree of homology to the Shewanella japonica ATCC BAA gous recombination. It is understood that the sequence of the 316 dehydrogenase (DH) deduced amino acid sequence bacterial PUFA PKS genes or domains may have to be modi encoded by SEQ ID NO:77 were: Shewanella sp. SCRC 40 fied to accommodate details of Schizochytrium codon usage, 2738 orf7 (identity=77%; positives=86%); Photobacter pro but this is within the ability of those of skill in the art. findum pfaC (identity=73%; positives=83%) and Each publication cited or discussed herein is incorporated Shewanella Oneidensis MR-1 “multi-domain B-ketoacyl syn herein by reference in its entirety. thase” (identity–74%: positives=81%). The Schizochytrium While various embodiments of the present invention have sp. ATCC20888 orfc was 27% identical and 42% positive to 45 been described in detail, it is apparent that modifications and the deduced amino acid sequence encoded by SEQID NO:77. adaptations of those embodiments will occur to those skilled It is expected that the PUFA PKS gene sets from these two in the art. 
It is to be expressly understood, however, that such Shewanella strains will provide beneficial sources of whole modifications and adaptations are within the scope of the genes or individual domains for the modification of present invention, as set forth in the following claims.
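
By way of illustration only, and not as part of the disclosed method, the degenerate primers of Example 7 (prRZ23, prRZ24, prRZ28 and prRZ29) can be read as pools of concrete oligonucleotides, because letters such as Y, R, M, S, W, K, D and N are standard IUPAC ambiguity codes. The following Python sketch counts and enumerates the members of each pool. The function names `degeneracy` and `expand` are hypothetical helpers introduced here, and the primer strings are taken as printed in Example 7.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as printed in Example 7 (SEQ ID NOs: 69-72).
PRIMERS = {
    "prRZ23": "GGYATGMTGRTTGGTGAAGG",
    "prRZ24": "TRTTSASRTAYTGYGAACCTTG",
    "prRZ28": "ATGKCNGAAGGTTGTGGCCA",
    "prRZ29": "CCWGARATRAAGCCRTTDGGTTG",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer)
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Enumerating with `expand` is practical here because each pool is small; for highly degenerate primers, `degeneracy` alone gives the pool size without building the list.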

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 82

<210> SEQ ID NO 1
<211> LENGTH: 8730
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(8730)

<400> SEQUENCE: 1
atg gcg gcc ct ctg cag gag caa aag gga ggc gag atg gat acc cc
Met Ala Ala Arg Leu Gln Glu Gln Lys Gly Gly Glu Met Asp Thr Arg

Lys Leu Val Pro Ala Tyr Arg Ala Val Ile Val Leu Ser Asn Gln 2735 2740 2745

ggc gcg ccc ccg gcc aac gcc acc atg cag ccg ccc tcg ctc gat 8289
Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro Pro Ser Leu Asp 2750 2755 2760

gcc gat ccg gcg ctc cag ggc tcc gtc tac gac ggc aag acc ctc 8334
Ala Asp Pro Ala Leu Gln Gly Ser Val Gly Lys Thr Leu 2765 2770 2775

ttc cac ggc ccg gcc ttc cgc ggc atc gat gac gtg ctc tcg tgc 8379
Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Leu Ser Cys 2780 2785 2790

acc aag agc cag ctt gtg gcc aag tgc agc gct gtc ccc ggc tcc 8424
Thr Lys Ser Gln Leu Val Ala Lys Cys Ser Ala Val Pro Gly Ser 2795 2800 2805

gac gcc gct cgc ggc gag ttt gcc acg gac act gac gcc cat gac 8469
Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 2810 2815 2820

ccc ttc gtg aac gac ctg gcc ttt cag gcc atg ctc gtc tgg gtg 8514
Pro Phe Val Asn Asp Leu Ala Phe Gln Ala Met Leu Val Trp Val 2825 2830 2835

cgc cgc acg ctc ggc cag gct gcg ctc ccc aac tcg atc cag cgc 8559
Arg Arg Thr Leu Gly Gln Ala Ala Leu Pro Asn Ser Ile Gln Arg 2840 2845 2850

atc gtc cag cac cgc ccg gtc ccg cag gac aag ccc ttc tac att
Ile Val Gln His Arg Pro Val Pro Gln Asp Lys Pro Phe Tyr Ile 2855 2860 2865

acc ctc cgc tcc aac cag tcg ggc ggt cac tcc cag cac aag cac 8649
Thr Leu Arg Ser Asn Gln Ser Gly Gly His Ser Gln His Lys His 2870 2875 2880

gcc ctt cag ttc cac aac gag cag ggc gat ctc ttc att gat gtc 8694
Ala Leu Gln Phe His Asn Glu Gln Gly Asp Leu Phe Ile Asp Val 2885 2890 2895

cag gct tcg gtc atc gcc acg gac agc ctt gcc ttc 8730
Gln Ala Ser Val Ile Ala Thr Asp Ser Leu Ala Phe 2900 2905 2910
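
For orientation only, the pairing of codon rows and amino acid rows in the CDS feature above can be checked mechanically. The Python sketch below translates the final codon row of SEQ ID NO:1 under two assumptions that are not stated in the listing itself: that the standard genetic code applies, and that the row reads `cag gct tcg gtc atc gcc acg gac agc ctt gcc ttc` once scanning artifacts are repaired. The helper name `translate` is hypothetical.

```python
# Standard genetic code (assumed), laid out in t, c, a, g order at each codon position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# One-letter to three-letter amino acid codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(codon_row: str) -> str:
    """Translate a whitespace-separated row of codons into three-letter codes."""
    codons = codon_row.lower().split()
    return " ".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)


# Final codon row of SEQ ID NO:1, with assumed repair of scanning artifacts.
FINAL_ROW = "cag gct tcg gtc atc gcc acg gac agc ctt gcc ttc"

if __name__ == "__main__":
    print(translate(FINAL_ROW))
```

The same check can be applied to any other complete codon row; rows in which a codon is illegible cannot be translated this way and are best left as printed.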

<210> SEQ ID NO 2
<211> LENGTH: 2910
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 2

Met Ala Ala Arg Leu Gln Glu Gln Lys Gly Gly Glu Met Asp Thr Arg 1 5 10 15

Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Gly Th Thr Val 20 25 30

Arg Glu Ser Trp Glu Thr Ile Arg Ala Gly Ile Asp Cys Leu Ser Asp 35 40 45

Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Val Lys 50 55 60

Thr Thr Lys Asp Lys Ile Tyr Cys Lys Air Gly Phe Ile Pro Glu 65 70

Tyr Asp Phe Asp Ala Arg Glu Phe Gly Leu Asn Met Phe Gln Met Glu 85 90 95

Asp Ser Asp Ala Asn Gln Thr Ile Ser Leu Leu Val Lys Glu Ala 100 105 110

Leu Gln Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn Ile 115 120 125

Gly Cys Val Leu Gly Ile Gly Gly Gly Gln Lys Ser Ser His Glu Phe 130 135 140

Tyr Ser Arg Leu Asn Tyr Val Val Val Glu Lys Val Leu Arg Lys Met 145 150 155 160

Gly Met Pro Glu Glu Asp Val Val Ala Val Glu Lys Ala 165 175

Asn Phe Pro Glu Trp Arg Leu Asp Ser Phe Pro Gly Phe Leu Gly Asn 180 185 190

Val Thr Ala Gly Arg Thr Asn Thr Phe Asn Leu Asp Gly Met Asn 195

Val Val Asp Ala Ala Cys Ala Ser Ser Leu Ile Ala Val Val 210 215

Ala Ile Asp Glu Leu Leu Gly Asp Asp Met Met Val Thr Gly 225 230 235 240

Ala Thr Thr Asp Asn Ser Ile Gly Met Met Ala Phe Ser 245 250 255

Thr Pro Val Phe Ser Thr Asp Pro Ser Val Arg Ala Asp Glu 260 265 270

Thr Gly Met Leu Ile Gly Glu Gly Ser Ala Met Leu Val Leu 275 280 285

Arg Tyr Ala Asp Ala Val Arg Asp Gly Asp Glu Ile His Ala Val Ile 290 295 300

Arg Gly Ala Ser Ser Ser Asp Gly Ala Ala Gly Ile Thr 305 310 315 320

Pro Thr Ile Ser Gly Gln Glu Glu Ala Leu Arg Arg Ala Asn Arg 325 330 335

Ala Val Asp Pro Ala Thr Val Thr Leu Val Glu Gly His Gly Thr 340 345 350

Gly Thr Pro Val Gly Asp Arg Ile Glu Leu Thr Ala Leu Arg Asn Leu 355 360 365

Phe Asp Ala Tyr Gly Glu Gly Asn Thr Glu Lys Val Ala Val Gly 370 375

Ser Ile Ser Ser Ile Gly His Leu Ala Val Ala Gly Leu Ala 385 390 395 400

Gly Met Ile Val Ile Met Ala Leu Lys His Thr Leu Gly 405

Thr Ile Asn Val Asp Asn Pro Pro Asn Leu Tyr Asp Asn Thr Ile 420 425 430

Asn Glu Ser Ser Leu Ile Asn Thr Met Asn Arg Pro Trp Phe Pro 435 440 445

Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly 450 455 460

Gly Ala Asn His Ala Val Leu Glu Glu Ala Glu Pro Glu His Thr 465 470

Thr Ala Arg Leu Asn Arg Pro Gln Pro Val Leu Met Met Ala 485 490 495

Ala Thr Pro Ala Ala Leu Gln Ser Leu Glu Ala Gln Leu Glu 500 505

Phe Glu Ala Ala Ile Glu Asn Glu Thr Val Asn Thr Ala Tyr 515 525

Ile Lys Val Lys Phe Gly Glu Gln Phe Phe Pro Gly Ser Ile 530 535 540

Pro Ala Thr Asn Ala Arg Leu Gly Phe Leu Val Asp Ala Glu Asp 545 550 555 560

Ala Ser Thr Leu Arg Ala Ile Ala Gln Phe Ala Asp Val 565 570 575

Thr Glu Ala Trp Arg Leu Pro Arg Glu Gly Val Ser Phe Arg Ala 580 585 590

Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser Gly Gln 595 605

Gly Ala Gln Thr His Met Phe Ser Glu Val Ala Met Asn Trp Pro 610 615

Gln Phe Arg Gln Ser Ile Ala Ala Met Asp Ala Ala Gln Ser Val 625 630 635 640

Ala Gly Ser Asp Lys Asp Phe Glu Arg Val Ser Gln Val Leu Tyr Pro 645 650 655

Arg Pro Tyr Glu Arg Glu Pro Glu Gln Asn Pro Lys Ile Ser 660 665 670

Leu Thr Ala Ser Gln Pro Ser Thr Leu Ala Ala Leu Gly Ala 675 685

Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala Ala Gly 690 695 700

His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Val Asp 705

Arg Asp Glu Leu Phe Glu Leu Val Arg Arg Ala Arg Ile Met Gly 725 730 735

Gly Asp Ala Pro Ala Thr Pro Lys Gly Met Ala Ala Val Ile 740 745 750

Gly Pro Asn Ala Glu Asn Ile Lys Val Gln Ala Ala Asn Val Trp Leu 760 765

Gly Asn Ser Asn Ser Pro Ser Gln Thr Val Ile Thr Gly Ser Val Glu 770 775

Gly Ile Gln Ala Glu Ser Ala Arg Leu Gln Lys Glu Gly Phe Arg Val 790 795

Val Pro Leu Ala Cys Glu Ser Ala Phe His Ser Pro Gln Met Glu Asn 805 810 815

Ala Ser Ser Ala Phe Asp Val Ile Ser Val Ser Phe Arg Thr 825 830

Pro Ala Glu Thr Leu Phe Ser Asn Val Ser Gly Glu Thr 835 840 845

Pro Thr Asp Ala Arg Glu Met Leu Thr Gln His Met Thr Ser Ser Val 850 855 860

Lys Phe Leu Thr Gln Val Arg Asn Met His Gln Ala Gly Ala Arg Ile 865

Phe Val Glu Phe Gly Pro Gln Val Leu Ser Leu Val Ser Glu 885 890 895

Thr Leu Asp Asp Pro Ser Val Val Thr Val Ser Val Asn Pro Ala 900 905 910

Ser Gly Thr Asp Ser Asp Ile Gln Leu Arg Asp Ala Ala Val Gln Leu 915 920 925

Val Val Ala Gly Val Asn Leu Gln Gly Phe Asp Lys Trp Asp Ala Pro 930 935 940

Asp Ala Thr Arg Met Gln Ala Ile Lys Arg Thr Thr Leu Arg 945 950 955 960

Leu Ser Ala Ala Thr Val Ser Asp Lys Thr Val Arg Asp 965 970 975

Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Leu Lys Gly Ala Ala 985 990

Pro Leu Ile Lys Ala Pro Glu Pro Val Val Asp Glu Ala Ala Lys Arg 995 1000 1005

Glu Ala Glu Arg Leu Gln Glu Leu Gln Asp Ala Gln Arg Gln 1010

Leu Asp Ala Lys Arg Ala Ala Glu Ala Ser Lys Leu

Ala Ala Glu Glu Thr Ala Ala Ser Ala

Pro Val Asp Thr Ala Val Glu Lys His Ala Ile Leu

Ser Met Leu Ala Glu Asp Gly Tyr Gly Val Asp Ala 1070

Ser Ser Leu Gln Gln Gln Gln Gln Gln Thr Pro Ala Pro 1085

Val Ala Ala Ala Pro Ala Pro Val Ala Ala Pro Ala

Pro Val Ser Asn Glu Leu Glu Lys Ala Thr Val Val

Met Val Leu Ala Ala Thr Gly Tyr Glu Asp Met Ile

Glu Ala Asp Met Glu Leu Thr Glu Leu Gly Asp Ser Ile

Val Glu Ile Leu Glu Val Gln Ala Leu Asn Val

Glu Ala Asp Val Asp Leu Ser Arg Thr Thr Val Gly

Val Asn Ala Met Ala Glu Ile Ala Ser Ser Ala

Pro Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala Pro

Ala Ala Ala Pro Ala Ser Asn Glu Leu Leu Glu Ala 1230

Glu Val Val Met Glu Leu Ala Ala Gly Tyr Glu 1245

Thr Met Ile Glu Ser Met Glu Leu Glu Glu Leu Gly 1260

Ile Ser Ile Glu Ile Leu Ser Glu Val Gln Ala 1275

Met Leu Asn Val Glu Ala Asp Val Asp Ala Leu Ser Arg Thr 1280 1290

Val Gly Glu Val Asn Ala Met Ala Glu Ile Ala 1295 1305

Ser Ala Pro Ala Ala Ala Ala Ala Pro Gly Pro Ala 1320

Ala Ala Pro Ala Pro Ala Ala Ala Pro Ala Val Ser Asn 1335

Glu Leu Glu Ala Glu Thr Val Val Met Glu Val Leu Ala 1345 1350

Ala Thr Gly Glu Asp Met Ile Glu Asp Met Glu 1360 1365

Leu Glu Thr Glu Leu Gly Asp Ser Ile Lys Arg Val Glu Ile 1370

Leu Ser Glu Val Gln Ala Leu Asn Val Glu Ala Lys Asp Val 1385 1395

Asp Leu Ser Arg Thr Thr Val Gly Glu Val Val Asp Ala 1410

Met Ala Glu Ile Ala Gly Ser Ala Pro Ala Pro Ala Ala 1425

Ala Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Pro 1440

Ala Pro Ala Val Ser Ser Leu Leu Glu Lys Ala Glu Thr Val 1445 1455

Val Met Glu Val Leu Ala Lys Thr Gly Tyr Glu Thr Asp Met 1460 1470

Ile Glu Ser Asp Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser 1480 1485

Ile Arg Val Glu Ile Leu Ser Glu Val Gln Ala Met Leu Asn 1490 1495 1500

Val Glu Ala Asp Val Ala Leu Ser Arg Thr Arg Thr Val 1505 1515

Gly Glu Val Val Asp Ala Ala Glu Ile Ala Gly Gly Ser 1520 1530

Ala Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala 1535 1545

Pro Pro Ala Ala Pro Pro Ala Ala Pro Ala Pro Ala Val 1550 1560

Ser Ser Glu Leu Leu Glu Ala Glu Thr Val Val Met Glu Val 1565 1575

Leu Ala Thr Gly Glu Thr Asp Met Ile Glu Ser Asp 1590

Met Leu Glu Thr Glu Gly Ile Asp Ser Ile Arg Val 1605

Glu Leu Ser Glu Val Ala Met Leu Asn Val Glu Ala 1620

Asp Asp Ala Leu Ser Thr Arg Thr Val Glu Val Val 1635

Asp Met Ala Glu Ala Gly Ser Ser Ala Ser Ala Pro 1650

Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala 1660 1665

Ala Ala Pro Ala Val Ser Asn Glu Leu Leu Glu Ala Glu 1675 1680

Thr Val Met Glu Val Leu Ala Ala Thr Glu Thr 1690 1695

Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr Leu Gly Ile 1700 1705

Asp Ser Ile Arg Val Glu Ile Leu Ser Glu Val Gln Ala Met 1715 1720 1725

Leu Asn Val Glu Ala Val Asp Ala Leu Arg Thr Arg 1730 1740

Thr Gly Glu Val Val Ala Met Ala Glu Ile Ala Gly 1745 1755

Gly Ser Ala Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala 1765 1770

Ala Pro Ala Val Ser Asn Glu Leu Leu Glu Lys Ala Glu Thr 1780 1785

Val Met Glu Val Leu Ala Thr Gly Glu Thr Asp

Met Glu Ser Asp Met Leu Glu Thr Glu Leu Gly Ile Asp 1815

Ser Arg Val Glu Leu Ser Glu Val Gln Ala Met Leu 1830

Asn Glu Ala Asp Asp Ala Leu Ser Thr Arg Thr 1845

Val Glu Val Val Asp Met Ala Glu Ile Ala Gly Ser 1860

Ser Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala 1875

Ala Pro Ala Pro Ala Ala Ala Pro Ala Val Ser Glu Leu 1880 1890

Leu Glu Ala Glu Thr Val Met Glu Val Leu Ala Ala 1895 1905

Thr Glu Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu 1910 1915 1920

Thr Glu Leu Gly Ile Asp Ser Ile Arg Val Glu Ile Leu Ser 1925 1930 1935

Glu Gln Ala Met Leu Asn Val Glu Ala Val Asp Ala 1940 1945

Leu Ser Arg Thr Arg Thr Gly Glu Val Val Ala Met 1955 1960

Ala Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala 1970 1975 1980

Pro Pro Ala Ala Ala Pro Ala Val Ser Asn Glu Leu Leu 1985 1990 1995

Glu Lys Ala Glu Thr Val Val Met Glu Val Leu Ala Ala Thr 2000 2005 2010

Gly Tyr Glu Thr Asp Met Glu Ser Asp Met Glu Leu Glu Thr 2015 2025

Glu Leu Gly Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser Glu 2030 2035 2040

Val Gln Ala Met Leu Asn Val Glu Ala Asp Val Asp Ala Leu 2045 2050 2055

Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Met Ala 2060 2065

Glu Ile Ala Gly Gly Ser Pro Ala Pro Ala Ala Ala Pro 2075

Ala Ser Ala Gly Ala Ala Pro Ala Val Ile Ser Val His 2090 2095

Gly Ala Asp Asp Asp Leu Ser Leu Met His Val Val 2105 2110

Asp Ile Arg Arg Pro Asp Glu Leu Ile Leu Glu Pro Glu Asn 2120 2125

Arg Pro Val Leu Val Val Asp Asp Gly Ser Glu Thr Leu Ala 2135 2140

Leu Val Arg Val Leu Gly Ala Cys Ala Val Val Thr Phe Glu 2150 2155

Gly Leu Gln Leu Ala Gln Arg Ala Ala Ala Ala Ile Arg His 2165 2170 2175

Val Leu Ala Asp Leu Ser Ala Ser Ala Glu Lys Ala Ile 2180 2185 2190

Glu Ala Glu Gln Arg Phe Gly Leu Gly Gly Phe Ile Ser 2195 2200 2205

Gln Gln Ala Glu Arg Phe Glu Pro Glu Ile Leu Gly Phe Thr 2210 2215 2220

Leu Met Ala Phe Ala Lys Ser Leu Cys Thr Ala Val 2225 2230 2235

Ala Gly Gly Arg Pro Ala Phe Ile Val Ala Arg Leu Asp Gly 2240 2245 2250

Arg Leu Gly Phe Thr Ser Gln Gly Thr Ser Asp Ala Leu Arg 2255 2260 2265

Ala Gln Arg Gly Ala Ile Phe Gly Leu Thr Ile Gly Leu 2270 2275 2280

Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val Asp Ile Ala 2285 2290 2295

Gln Gly Met His Pro Glu Asp Ala Ala Val Ala Ile Val Arg Glu 2300 2305 2310

Met Ala Ala Asp Ile Arg Ile Arg Glu Val Gly Ile Gly Ala 2315 2320 2325

Asn Gln Gln Arg Thr Ile Arg Ala Ala Leu Glu Thr Gly 2330 2335 2340

Asn Pro Gln Arg Gln Ile Ala Lys Asp Asp Val Leu Leu Val Ser 2345 2350 2355

Gly Gly Ala Arg Gly Ile Thr Pro Leu Ile Arg Glu Ile Thr 2360 2365 2370

Arg Gln Ile Ala Gly Gly Lys Tyr Ile Leu Leu Gly Arg Ser 2375 2380 2385

Val Ser Ala Ser Glu Pro Ala Trp Ala Gly Ile Thr Asp Glu 2390 2395 2400

Ala Val Gln Ala Ala Thr Gln Glu Leu Lys Arg Ala Phe 2405 2410 2415

Ser Ala Gly Glu Gly Pro Lys Pro Thr Pro Arg Ala Val Thr 2420 2425 2430

Leu Val Gly Ser Val Leu Gly Ala Arg Glu Val Arg Ser Ser Ile 2435 2440 2445

Ala Ala Ile Glu Ala Leu Gly Gly Ala Ile Tyr Ser Ser 2450 2455 2460

Asp Val Asn Ser Ala Ala Asp Val Ala Ala Val Arg Asp Ala 2465 2470 2475

Glu Ser Gln Leu Gly Ala Arg Val Ser Ile Val His Ala Ser 2480 2485 2490

Gly Val Leu Arg Asp Arg Leu Ile Glu Leu Pro Asp Glu 2495 2500 2505

Phe Asp Ala Val Phe Gly Thr Val Thr Gly Leu Glu Asn Leu 2510 2515 2520

Leu Ala Ala Val Asp Arg Ala Asn Leu His Met Val Leu Phe 2525 2530 2535

Ser Ser Leu Ala Gly Phe His Gly Asn Val Gly Gln Ser Asp 2540 2545 2550

Ala Met Ala Asn Glu Ala Leu Asn Met Gly Leu Glu Leu Ala 2555 2560 2565

Lys Asp Val Ser Val Lys Ser Ile Phe Gly Pro Trp Asp Gly 2570 2575 2580

Gly Met Val Thr Pro Gln Leu Lys Gln Phe Gln Glu Met Gly 2585 2590 2595

Val Gln Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg 2600 2605 2610

Ile Val Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp 2615 2620 2625

Arg Thr Pro Ser Lys Lys Val Gly Ser Asp Thr Ile Thr Leu His 2630 2635 2640

Arg Lys Ile Ser Ala Lys Ser Asn Pro Phe Leu Glu Asp His Val 2645 2650 2655

Ile Gln Gly Arg Arg Val Leu Pro Met Thr Leu Ala Ile Gly Ser 2660 2665 2670

Leu Ala Glu Thr Cys Leu Gly Leu Phe Pro Gly Tyr Ser Leu Trp 2675 2680 2685

Ala Ile Asp Asp Ala Gln Leu Phe Gly Val Thr Val Asp Gly 2690 2695 2700

Asp Val Asn Cys Glu Val Thr Leu Thr Pro Ser Thr Ala Pro Ser 2705 2710 2715

Gly Arg Val Asn Val Gln Ala Thr Leu Thr Phe Ser Ser Gly 2720 2725 2730

Lys Leu Val Pro Ala Tyr Arg Ala Val Ile Val Leu Ser Asn Gln 2735 2740 2745

Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro Pro Ser Leu Asp 2750 2755 2760

Ala Asp Pro Ala Leu Gln Gly Ser Val Asp Gly Thr Leu 2765 2770 2775

Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Leu Ser 2780 2785 2790

Thr Lys Ser Gln Leu Val Ala Lys Ser Ala Val Pro Gly Ser 2795 2800 2805

Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 2810 2815 2820

Pro Phe Val Asn Asp Leu Ala Phe Gln Ala Met Leu Val Trp Val 2825 2830 2835

Arg Arg Thr Leu Gly Gln Ala Ala Leu Pro Asn Ser Ile Gln Arg 2840 2845 2850

Ile Val Gln His Arg Pro Val Pro Gln Asp Pro Phe Tyr Ile 2855 2860 2865

Thr Leu Arg Ser Asn Gln Ser Gly Gly His Ser Gln His His 2870 2875 2880

Ala Leu Gln Phe His Asn Glu Gln Gly Asp Leu Phe Ile Asp Val 2885 2890 2895

Gln Ala Ser Val Ile Ala Thr Asp Ser Leu Ala Phe 2900 2905 2910

<210> SEQ ID NO 3
<211> LENGTH: 6177
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS

att agc ggc aag ccc gac gcc tgc aag gct gcg atc gcg cgt ctc 3744
Ile Ser Gly Lys Pro Asp Ala Cys Lys Ala Ala Ile Ala Arg Leu 1235 1240 1245

ggt aac att cct gcg ccc gtg acc cag ggc atg tgc ggc 3789
Gly Asn Ile Pro Ala Pro Val Thr Gln Gly Met Cys Gly 1260

cac ccc gag gtg gga tat acc aag gat atc gcc aag atc 3834
His Pro Glu Val Gly Thr Lys Asp Ile Ala Lys Ile 1275

cat aac ctt gag ttc gtt gtc gac ggc ctt gac ctc tgg 3879
His Asn Leu Glu Phe Val Val Asp Gly Leu Asp Leu Trp 1290

acc atc aac cag aag ctc gtg cca cgc gcc acg ggc gcc 3924
Thr Ile Asn Gln Lys Leu Val Pro Arg Ala Thr Gly Ala 1305

aag gaa tgg gcc cct tcc ttt ggc gag tac gcc ggc cag 3969
Lys Glu Trp Ala Pro Ser Phe Gly Glu Tyr Ala Gly Gln 1320

ctc gag aag cag gct ttc ccc caa atc gtc gag acc att 4014
Leu Glu Lys Gln Ala Phe Pro Gln Ile Val Glu Thr Ile 1335

tac caa aac tac gac ttt gtc gag gtt ggg ccc aac aac 4059
Gln Asn Asp Phe Val Glu Val Gly Pro Asn Asn 1350

cac agc acc gca gtg acc acg ctt ggt ccc cag cc aac 4104
His Ser Thr Ala Val Thr Thr Leu Gly Pro Gln Arg Asn 1365

cac gct ggc gcc atc aag cag aac gag gat gct tgg acg 4149
His Ala Gly Ala Ile Lys Gln Asn Glu Asp Ala Trp Thr 1380

acc gtc aag ctt gtg tcg ctc aag gcc cac ctt gtt cct 4194
Thr Val Leu Val Ser Leu Ala His Leu Val Pro 1395

ggc acg atc tcg ccg tac cac tcc aag ctt gtg gcg gag 4239
Gly Thr Ile Ser Pro His Ser Leu Val Ala Glu 1410

gct gct tac gct ctc tgc aag ggt gaa aag ccc aag 4284
Ala Ala Ala Leu Cys Gly Glu Lys Pro 1425

aag aag ttt gtg cgc att cag ctc aac ggt cgc ttc aac 4329
Phe Val Arg Ile Gln Leu Asn Gly Arg Phe Asn 1440

agc gcg gac ccc atc tcg gcc gat ctt gcc agc ttt ccg 4374
Ser Ala Asp Pro Ile Ser Ala Asp Leu Ala Ser Phe Pro 1455

cct gac cct gcc att gcc gcc atc tcg agc cgc atc atg 4419
Pro Asp Pro Ala Ile Ala Ala Ile Ser Ser Arg Ile Met 1470

aag gtc gct ccc aag tac gcg ctc aac att gac gag 4464
Val Ala Pro Lys Ala Arg Leu Asn Ile Asp Glu 1485

cag gag acc cga gat atc ctc aac aag gac aac gcg ccg 4509
Gln Glu Thr Arg Asp Ile Leu Asn Lys Asp Asn Ala Pro 1500

tct tct tct tct tct tct tct tct tct tct tct tct tct 4554
Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser 1515

ccg cct gct cct tcg ccc gtg caa aag aag gct gct ccc 4599
Pro Pro Ala Pro Ser Pro Val Gln Lys Lys Ala Ala Pro 1530

<400> SEQUENCE: 4

Met Ala Ala Arg Asn Val Ser Ala Ala His Glu Met His Asp Glu Lys 1 15

Arg Ile Ala Val Val Gly Met Ala Val Gln Tyr Ala Gly Cys Thr 25

Lys Asp Glu Phe Trp Glu Val Leu Met Asn Gly Val Glu Ser Lys 35 40 45

Val Ile Ser Asp Lys Arg Leu Gly Ser Asn Ala Glu His Tyr 50 55

Lys Ala Glu Arg Ser Lys Ala Asp Thr Phe Asn Glu Thr Tyr 65 70

Gly Thr Leu Asp Glu Asn Glu Ile Asp Asn Glu His Glu Leu Leu Leu 85 90 95

Asn Leu Ala Lys Gln Ala Leu Ala Glu Thr Ser Val Asp Ser Thr 105 110

Arg Gly Ile Val Ser Gly Cys Leu Ser Phe Pro Met Asp Asn Leu 115 120 125

Gln Gly Glu Leu Leu Asn Val Gln Asn His Val Glu Leu 130 135 140

Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Gln 145 150 155 160

Ser Asn Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala 165 175

Ser Phe Val Ala Glu Glu Leu Asn Leu Gly Ala Leu His Tyr Ser Val 180 185 190

Asp Ala Ala Ala Thr Ala Leu Val Leu Arg Leu Ala Gln Asp 195

His Leu Val Ser Gly Ala Ala Asp Val Met Leu Cys Gly Ala Thr Cys 210 215 220

Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Gln Ala 225 230 235 240

Met Pro Val Gly Thr Gly Gln Asn Val Ser Met Pro Leu His Lys Asp 245 250 255

Ser Gln Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu Lys 260 265 270

Arg Leu Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly Thr Leu 285

Leu Gly Ala Asn Val Ser Asn Ser Gly Thr Gly Leu Pro Leu Pro 290 295 300

Leu Leu Pro Ser Glu Lys Leu Met Asp Thr Thr Arg Ile 305 310 315

Asn Val His Pro His Ile Gln Val Glu His Ala Thr Gly 325 330 335

Thr Pro Gln Gly Asp Arg Val Glu Ile Asp Ala Val Ala Phe 340 345 350

Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Gly Asn Phe Gly His 355 360 365

Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Cys Lys Val Leu Leu Ser 370 375

Met His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400

Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 415

Thr Asn Gly Glu Pro Arg Ala Gly Leu Ser Ala Phe Gly Phe Gly 420 425 430

Gly Thr Asn Ala His Ala Val Phe Glu Glu His Asp Pro Ser Asn Ala 435 440 445

Ala Cys Thr Gly His Asp Ser Ile Ser Ala Leu Ser Ala Arg Gly 450 455 460

Gly Glu Ser Asn Met Arg Ile Ala Ile Thr Gly Met Asp Ala Thr Phe 465 470

Gly Ala Leu Gly Leu Asp Ala Phe Glu Arg Ala Ile Thr Gly 485 490 495

Ala His Gly Ala Ile Pro Leu Pro Glu Arg Trp Arg Phe Leu Gly 500 505 510

Asp Lys Asp Phe Leu Asp Leu Gly Val Ala Thr Pro His 515 525

Gly Cys Ile Glu Asp Val Glu Val Asp Phe Gln Arg Leu Arg Thr 530 535 540

Pro Met Thr Pro Glu Asp Met Leu Leu Pro Gln Gln Leu Leu Ala Val 545 550 555 560

Thr Thr Ile Asp Arg Ala Ile Leu Asp Ser Gly Met Gly Gly 565 570 575

Asn Val Ala Val Phe Val Gly Leu Gly Thr Asp Leu Glu Leu Tyr 585 590

His Arg Ala Arg Val Ala Leu Lys Glu Arg Val Arg Pro Glu Ala Ser 595 605

Lys Leu Asn Asp Met Met Gln Tyr Ile Asn Asp Gly Thr Ser 610 615

Thr Ser Thr Ser Tyr Ile Gly Asn Leu Val Ala Thr Arg Val Ser 625 630 635 640

Ser Gln Trp Gly Phe Thr Gly Pro Ser Phe Thr Ile Thr Glu Gly Asn 645 650 655

Asn Ser Val Tyr Arg Ala Glu Leu Gly Lys Leu Leu Glu Thr 660 665 670

Gly Glu Val Asp Gly Val Val Val Ala Gly Val Asp Leu Gly Ser 675 685

Ala Glu Asn Leu Tyr Val Lys Ser Arg Arg Phe Lys Val Ser Thr Ser 690 695 700

Asp Thr Pro Arg Ala Ser Phe Asp Ala Ala Ala Asp Gly Tyr Phe Val 705

Gly Glu Gly Gly Ala Phe Val Leu Lys Arg Glu Thr Ser Cys Thr 725 730 735

Asp Asp Arg Ile Ala Met Asp Ala Ile Val Pro Gly Asn 740 745 750

Val Pro Ser Ala Leu Arg Glu Ala Leu Asp Gln Ala Arg Val Lys 760 765

Pro Gly Asp Ile Glu Met Leu Glu Leu Ser Ala Asp Ser Ala Arg His 770 775

Leu Asp Pro Ser Val Leu Pro Glu Leu Thr Ala Glu Glu Glu 790 795

Ile Gly Gly Leu Gln Thr Ile Leu Arg Asp Asp Asp Leu Pro Arg 805 810 815

Asn Val Ala Thr Gly Ser Val Ala Thr Val Gly Asp Thr Gly Tyr 820 825 830

Ala Ser Gly Ala Ala Ser Leu Ile Lys Ala Ala Leu Cys Ile Tyr Asn 835 840 845

Arg Tyr Leu Pro Ser Asn Gly Asp Asp Trp Asp Glu Pro Ala Pro Glu 850 855 860

Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Gln Thr Ser Arg Ala Trp 865 870 875 880

Leu Asn Pro Gly Glu Arg Arg Tyr Ala Ala Val Ser Gly Val Ser 885 890 895

Glu Thr Arg Ser Cys Tyr Ser Val Leu Leu Ser Glu Ala Glu Gly His 900 905 910

Glu Arg Glu Asn Arg Ile Ser Leu Asp Glu Glu Ala Pro Lys Leu 915 920 925

Ile Val Leu Arg Ala Asp Ser His Glu Glu Ile Leu Gly Arg Leu Asp 930 935 940

Lys Ile Arg Glu Arg Phe Leu Gln Pro Thr Gly Ala Ala Pro Arg Glu 945 950 955 960

Ser Glu Leu Lys Ala Gln Ala Arg Arg Ile Phe Leu Glu Leu Leu Gly 965 970 975

Glu Thr Leu Ala Gln Asp Ala Ala Ser Ser Gly Ser Gln Lys Pro Leu 980 985 990

Ala Leu Ser Leu Val Ser Thr Pro Ser Lys Leu Gln Arg Glu Val Glu 995 1000 1005

Leu Ala Ala Lys Gly Ile Pro Arg Cys Leu Lys Met Arg Arg Asp 1010 1015 1020

Trp Ser Pro Ala Gly Ser Arg Tyr Ala Pro Glu Pro Leu Ala 1030 1035

Ser Arg Val Ala Phe Met Tyr Gly Glu Gly Arg Ser Pro Tyr 1045 1050

Ile Thr Gln Asp Ile His Arg Ile Trp Pro Glu Leu His 1060 1065

Glu Ile Asn Glu Lys Thr Asn Arg Leu Trp Ala Glu Gly Asp O7 1080

Arg Val Met Pro Arg Ala Ser Phe Lys Ser Glu Leu Glu Ser 1090 1095

Gln Gln Glu Phe Asp Arg Asn Met Ile Glu Met Phe Arg Leu 1105 1110

Gly Leu Thr Ser Ile Ala Phe Thr Asn Leu Ala Arg Asp Val 1120 1125

Leu Ile Thr Pro Lys Ala Ala Phe Gly Leu Ser Leu Gly Glu 1135 1140

Ile Met Ile Phe Ala Phe Ser Lys Lys Asn Gly Leu Ile Ser 1150 1155

Asp Leu Thr Lys Asp Leu Arg Glu Ser Asp Val Trp Asn Lys 1165 1170

Ala Ala Val Glu Phe Asn Ala Leu Arg Glu Ala Trp Gly Ile 1180 1185

Pro Ser Val Pro Lys Asp Glu Phe Trp Gln Gly Tyr Ile Val 1195 1200

Arg Thr Lys Gln Asp e Glu Ala Ala Ile Ala Pro Asp Ser 1210 1215

Val Arg Leu Thr e Ile Asn Asp Ala Asn Thr Ala Leu 1220 1225 1230

- Continued Ile Gly Pro Asp Ala Cys Ala Ala le Ala Arg Lieu. 24 O 245

Gly Asn Ile Pro Ala Leu Pro Val Thr Gln Gly Met Gly 255 260

His Pro Glu Val Gly Pro Thr Asp Ile Ala Ile 270 275

His Asn Leu Glu Phe Pro Val Val Asp Gly Leu Asp Leu Trp 285 290

Thr Ile Asn Gln Lys Leu Val Pro Arg Ala Thr Gly Ala 305

Glu Trp Ala Pro Ser Ser Phe Gly Glu Ala Gly Gln 315

Leu Glu Gln Ala Asn Phe Pro Gln Ile Val Glu Thr Ile 330 335

Gln Asn Asp Phe Val Glu Val Pro Asn Asn

His Ser Thr Ala Val Thr Thr Leu Gly Gln Arg Asn

His Ala Gly Ala Ile Lys Gln Asn Glu Ala Trp Thr

Thr Val Leu Val Ser Leu Ala Leu Val Pro

Gly Thr Ile Ser Pro His Ser Leu Val Ala Glu 410

Ala Gln Ala Ala Leu Gly Glu Pro 415 425

Asn Phe Val Arg Ile Gln Leu Asn Arg Phe Asn 430 440

Ser Ala Asp Pro Ile Ser Ser Ala Asp Leu Ala Ser Phe Pro 450 455

Pro Asp Pro Ala Ile Glu Ala Ala Ile Ser Arg Ile Met 465

Val Ala Pro Lys Phe Ala Arg Leu Ile Asp Glu 480

Gln Glu Thr Arg Asp Pro Ile Leu Asn Asn Ala Pro 495

Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser 505 510

Pro Ser Pro Ala Pro Ser Pro Val Gln Ala Ala Pro

Ala Glu Thr Ala Ala Ser Ala Asp Leu Arg Ser 535

Ala Leu Leu Asp Leu Asp Ser Met Leu Ala Leu Ser Ala Ser 550 555

Ala Ser Gly Asn Leu Val Glu Thr Ala Pro Ser Ala Ser Val 565 570

Ile Pro Pro Asn Ala Asp Leu Gly Arg Ala Phe 585

Met Thr Gly Val Ser Ala Pro Leu Gly Ala Met 600 605

Ala Gly Ile Ala Ser Asp Leu Val Ile Ala Ala Gly Arg 615 620

Gln Ile Leu Ala Ser Phe Gly Ala Gly Gly Leu Pro Met Gln

625 630 635

Val Arg Glu Ser Ile Glu Ile Gln Ala Ala Leu Pro Asn 640 645 650

Gly Pro Ala Val Asn Leu Ile His Ser Pro Phe Asp Ser Asn 655 660 665

Leu Glu Gly Asn Val Leu Phe Leu Glu Gly Val Thr 670 675

Phe Glu Ala Ser Ala Phe Met Thr Leu Thr Gln Val Val 685 690

Arg Tyr Arg Ala Ala Gly Leu Thr Arg Asn Ala Gly Ser Val 700 705

Asn Arg Asn Arg Ile Gly Val Ser Thr Glu Leu

Ala Met Phe Met Arg Ala Pro Glu His Leu Leu Gln 740

Leu Ala Ser Gly Glu Asn Gln Glu Gln Ala Glu Leu Ala 755

Arg Val Pro Val Ala Asp Ile Ala Val Glu Ala Asp Ser 770

Gly His Thr Asp Asn Pro Ile His Val Leu Pro Leu

Ile Asn Leu Arg Asp Leu His Arg Glu Gly Pro

Ala Asn Leu Arg Val Arg Gly Ala Gly Gly Ile Gly 805

Pro Gln Ala Ala Leu Ala Phe Asn Met Gly Ser Phe Ile 825

Val Gly Thr Val Asn Gln Val Ala Gln Gly Thr 835 840

Asp Asn Val Arg Gln Leu Ala Ala Thr Ser Asp Val 850 855

Met Ala Pro Ala Ala Met Phe Glu Glu Val Leu 865 870

Gln Leu Gly Met Phe Pro Ser Ala Asn

Leu Glu Leu Phe Tyr Asp Ser Phe Glu Ser Met Pro 905

Pro Glu Leu Ala Arg Glu Arg Ile Phe Ser Arg Ala 920

Leu Glu Val Trp Asp Thr Asn Phe Tyr Ile Asn Arg 935

Leu Asn Pro Glu Gln Arg Ala Glu Asp Pro

Leu Met Ser Leu Arg Trp Leu Leu Ala Ser

Arg Ala Asn Thr Gly Ser Asp Arg Val Asp Gln

Val Gly Pro Ala Gly Ser Phe Asn Phe Ile 985

Gly Leu Asp Pro Ala Val Ala Asn Glu Pro Val 2000 2005

Val Gln Ile Asn Gln Leu Arg Gly Ala Cys Phe Leu Arg 2015 2025

ctg cgc cgt ctc aac gcc ctg cgc aac gac ccg cgc att gac ctc 4464
Leu Arg Arg Leu Asn Ala Leu Arg Asn Asp Pro Arg Ile Asp Leu 1475 1480 1485

gag acc gag gat gct gcc ttt gtc tac gag ccc acc aac gcg ctc 4509
Glu Thr Glu Asp Ala Ala Phe Val Glu Pro Thr Asn Ala Leu 1490 1495 1500

<210> SEQ ID NO 6
<211> LENGTH: 1503
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 6

Met Ala Leu Arg Val Thr Asn Lys Pro Trp Glu Met Thr 1 5 15

Lys Glu Glu Leu Thr Ser Gly Thr Glu Val Phe Asn Tyr Glu Glu 25

Leu Leu Glu Phe Ala Glu Gly Asp Ile Ala Val Phe Gly Pro Glu 35 40 45

Phe Ala Val Ile Asp Tyr Pro Arg Arg Val Arg Leu Pro Ala Arg 50 55 60

Glu Leu Leu Val Thr Arg Val Thr Leu Met Asp Ala Glu Val Asn 65 70

Asn Arg Val Gly Ala Arg Met Val Thr Glu Asp Leu Pro Val 85 90 95

Asn Gly Glu Leu Ser Glu Gly Gly Asp Pro Ala Val Leu Val 105 110

Glu Ser Gly Gln Asp Leu Met Leu Ile Ser Met Gly Ile Asp 115 120 125

Phe Gln Asn Gln Gly Asp Arg Val Arg Leu Leu Asn Thr Thr Leu 130 135 140

Thr Phe Gly Val Ala His Glu Gly Glu Thr Leu Glu Asp Ile 145 150 155 160

Arg Val Thr Gly Phe Ala Arg Leu Asp Gly Gly Ile Ser Met Phe 165 170

Phe Phe Glu Tyr Asp Val Asn Gly Arg Leu Leu Ile Glu Met 180 185 190

Arg Asp Gly Ala Gly Phe Phe Thr Asn Glu Glu Leu Asp Ala Gly 195

Gly Val Val Phe Thr Arg Gly Asp Leu Ala Ala Arg Ala Ile 210 215

Pro Gln Asp Val Ser Pro Ala Val Ala Pro Leu His Lys 225 230 235 240

Thr Leu Asn Glu Glu Met Gln Thr Leu Val Asp Asp Trp 245 250 255

Ala Ser Val Phe Gly Ser Asn Gly Met Pro Glu Ile Asn Lys 260 265 270

Leu Ala Arg Met Leu Met Ile Asp Arg Val Thr Ser Ile Asp 285

His Lys Gly Gly Val Gly Leu Gly Gln Leu Val Gly Glu Ile 290 295 300

Leu Glu Arg Asp His Trp Phe Pro His Phe Val Asp Gln 305 310 315

Val Met Ala Gly Ser Leu Val Ser Asp Gly Ser Gln Met Leu Lys 325 330 335

Met Met Ile Trp Leu Gly Leu His Leu Thr Thr Gly Pro Phe Asp 340 345 350

Phe Arg Pro Val Asn Gly His Pro Asn Val Arg Cys Arg Gly Gln 355 360 365

Ile Ser Pro His Gly Lys Leu Val Val Met Glu Ile Glu 370 375

Met Gly Phe Asp Glu Asp Asn Asp Pro Ala Ile Ala Asp Val Asn 385 390 395 400

Ile Ile Asp Val Asp Phe Glu Gly Gln Asp Phe Ser Leu Asp Arg 405 410 415

Ile Ser Asp Tyr Gly Gly Asp Leu Asn Lys Ile Val Val Asp 425 430

Phe Gly Ile Ala Leu Met Gln Arg Ser Thr Asn Asn 435 440 445

Pro Ser Val Gln Pro Val Phe Ala Asn Gly Ala Ala Thr Val Gly 450 455 460

Pro Glu Ala Ser Ala Ser Ser Gly Ala Ser Ala Ser Ala Ser Ala 465 470

Ala Pro Ala Pro Ala Phe Ser Ala Asp Val Leu Ala Pro Lys Pro 485 490 495

Val Ala Leu Pro Glu His Ile Leu Lys Gly Asp Ala Leu Ala Pro 500 505

Glu Met Ser Trp His Pro Met Ala Arg Ile Pro Gly Asn Pro Thr Pro 515 525

Ser Phe Ala Pro Ser Ala Tyr Pro Arg Asn Ile Ala Phe Thr Pro 530 535 540

Phe Pro Gly Asn Pro Asn Asp Asn Asp His Thr Pro Gly Met Pro 545 550 555 560

Leu Thr Trp Phe Asn Met Glu Phe Met Ala Gly Val Ser Met 565 570 575

Leu Gly Pro Glu Phe Phe Asp Asp Ser Asn Thr Ser Arg 585 590

Ser Pro Ala Trp Asp Leu Leu Val Thr Arg Ala Val Ser Val Ser 595 605

Asp Leu His Val Asn Arg Asn Ile Asp Leu Asp Pro Ser 610

Gly Thr Met Val Gly Glu Asp Pro Ala Asp Ala Trp Phe Tyr 625 630 635 640

Gly Ala Asn Asp His Met Pro Ser Ile Leu Met Glu 645 650 655

Ile Ala Leu Gln Thr Ser Val Leu Thr Ser Val Leu Lys Ala Pro 660 665 670

Leu Thr Met Glu Asp Asp Ile Leu Phe Arg Asn Leu Asp Ala Asn 675 685

Ala Glu Phe Val Arg Ala Asp Leu Asp Arg Gly Lys Thr Ile Arg 690 695 700

Asn Val Thr Cys Thr Gly Tyr Ser Met Leu Gly Glu Met Gly Val 705 720

His Arg Phe Thr Phe Glu Leu Tyr Val Asp Asp Val Leu Phe Tyr Lys 72 730 73

Gly Ser Thr Ser Phe Gly Trp Phe Val Pro Glu Val Phe Ala Ala Gln 740 745 7 0

Ala Gly Leu Asp Asn Gly Arg Lys Ser Glu Pro Trp Phe Ile Glu Asn 760 765

Val Pro Ala Ser Gln Val Ser Ser Phe Asp Val Arg Pro Asn Gly 770 775

Ser Gly Arg Thr Ala Ile Phe Ala Asn Ala Pro Ser Gly Ala Gln Leu 790 795

Asn Arg Arg Thr Asp Gln Gly Gln Leu Asp Ala Val Asp Ile Val 805 810 815

Ser Gly Ser Gly Lys Ser Leu Gly Tyr Ala His Gly Ser Lys Thr 820 825 830

Val Asn Pro Asn Asp Trp Phe Phe Ser His Phe Trp Phe Asp Ser 835 840 845

Val Met Pro Gly Ser Leu Gly Val Glu Ser Met Phe Gln Leu Val Glu 850 855 860

Ala Ile Ala Ala His Glu Asp Leu Ala Gly Lys Ala Arg His Cys Gln 865

Pro His Leu Cys Ala Arg Pro Arg Ala Arg Ser Ser Trp Arg 885 890 895

Gly Gln Leu Thr Pro Ser Lys Met Asp Ser Glu Val His Ile 900 905 910

Val Ser Val Asp Ala His Asp Gly Val Val Asp Leu Val Ala Asp Gly 915 920 925

Phe Leu Trp Ala Asp Ser Leu Arg Val Ser Val Ser Asn Ile Arg 930 935 940

Val Arg Ile Ala Ser Gly Glu Ala Pro Ala Ala Ala Ser Ser Ala Ala 945 950 955 960

Ser Val Gly Ser Ser Ala Ser Ser Val Glu Arg Thr Arg Ser Ser Pro 965 970 975

Ala Val Ala Ser Gly Pro Ala Gln Thr Ile Asp Leu Lys Gln Leu Lys 980 985 990

Thr Glu Leu Leu Glu Leu Asp Ala Pro Leu Tyr Leu Ser Gln Asp Pro 995 1000

Thr Ser Gly Gln Leu Lys Lys His Thr Asp Val Ala Ser Gly Gln 010 5

Ala Ile Val Gln Pro Cys Thr Leu Gly Asp Leu Gly Asp Arg 025 0 035

Ser Phe Met Glu Thr Tyr Gly Val Val Ala Pro Leu Tyr Thr Gly 040 050

Ala Met Ala Ile Ala Ser Ala Asp Leu Val Ile Ala Ala 055

Arg Lys Ile Leu Gly Ser Phe Gly Ala G Gly Leu Pro

Met His Val Arg Ala Ala Leu Glu Lys Ile G Ala Ala Leu

Pro Gln Gly Pro Tyr Ala Val Asn Leu Ile His Ser Pro Phe Asp 100

Ser Asn Leu Glu Lys Gly Asn Val Asp Leu Phe L Glu Lys Gly 115

Val Val Val Glu Ala Ser Ala Phe Met Thr L Thr Pro Gln 130

Val Arg Ala Ala Gly Leu Ser Arg A Ala Asp Gly 145

Ser Val Asn Ile Arg Asn Arg Ile Ile Gly Lys Val Ser Arg Thr

160 165 170

Glu Leu Ala Glu Met Phe Ile Arg Pro Ala Pro Glu His Leu Leu 175 180 185

Glu Lys Leu Ile Ala Ser Gly Glu Ile Thr Gln Glu Gln Ala Glu 190 195 200

Leu Ala Arg Arg Val Pro Val Ala Asp Asp Ile Ala Val Glu Ala 205 210 215

Asp Ser Gly Gly His Thr Asp Asn Arg Pro Ile Val Ile Leu 220 225

Pro Leu Ile Ile Asn Leu Arg Asn Arg Leu His Glu Gly 235 240

Tyr Pro Ala His Leu Arg Val Arg Val Gly Ala Gly Gly Val 250 255

Gly Cys Pro Gln Ala Ala Ala Ala Ala Leu Thr Gly Ala Ala 265 270

Phe Ile Val Thr Gly Thr Val Asn Gln Val Ala Gln Ser Gly 280 285

Thr Cys Asp Asn Val Arg Lys Gln Leu Ser Gln Ala Thr Ser 295 300 305

Asp Ile Cys Met Ala Pro Ala Ala Asp Met Phe Glu Glu Gly Val 310 315 320

Lys Leu Gln Val Leu Lys Lys Gly Thr Met Phe Pro Ser Arg Ala 325 330 335

Asn Lys Leu Tyr Glu Leu Phe Cys Asp Phe Asp Ser 340 345

Met Pro Pro Ala Glu Leu Glu Arg Ile Glu Ile Phe 355 360

Arg Ala Leu Gln Glu Val Trp Glu Glu Thr Phe Ile 370 375

Asn Gly Leu Lys Asn Pro Glu Lys Ile Gln Arg Ala Glu His Asp 385 390 395

Pro Lys Leu Lys Met Ser Leu Cys Phe Arg Trp Tyr Leu Gly Leu 400 405 410

Ala Ser Arg Trp Ala Asn Met Gly Ala Pro Asp Val Met Asp 415 420 425

Tyr Gln Val Trp Cys Gly Pro Ala Ile Gly Ala Phe Asn Asp Phe 430 435 440

Ile Lys Gly Thr Tyr Leu Asp Pro Ala Val Ser Glu Pro 445 450

Cys Val Val Gln Ile Asn Leu Gln Ile Leu Arg Ala 460 465

Leu Arg Arg Leu Asn Ala Leu Arg Asn Asp Pro Ile Asp Leu 475 480

Glu Thr Glu Asp Ala Ala Phe Val Glu Pro Asn Ala Leu 490 495 500

<210> SEQ ID NO 7
<211> LENGTH: 600
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(600)

<400> SEQUENCE: 7

atg gcg gcc cgt ctg cag gag caa aag gga ggc gag atg gat acc cgc 48
Met Ala Ala Arg Leu Gln Glu Gln Lys Gly Gly Glu Met Asp Thr Arg 15

att gcc atc atc ggc atg tcg gcc atc ctc ccc tgc ggc acg acc gtg 96
Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Cys Gly Thr Thr Val 20 25 30

cgc gag tcg tgg gag acc atc cgc gcc ggc atc gac tgc ctg tcg gat 144
Arg Glu Ser Trp Glu Thr Ile Arg Ala Gly Ile Asp Cys Leu Ser Asp 35 40 45

ctc ccc gag gac cgc gtc gac gtg acg gcg tac ttt gac ccc gtc aag 192
Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Val Lys 50 55 60

acc acc aag gac aag atc tac tgc aag cgc ggc ttc att ccc gag 240
Thr Thr Lys Asp Lys Ile Cys Lys Arg Gly Phe Ile Pro Glu 65 70 80

tac gac ttt gac gcc gag ttc gga ctc aac atg ttc cag atg gag 288
Asp Phe Asp Ala Arg Glu Phe Gly Leu Asn Met Phe Gln Met Glu 85 90 95

gac tcg gac gca aac cag acc atc tcg ctt ctc aag gtc aag gag gcc 336
Asp Ser Asp Ala Asn Gln Thr Ile Ser Leu Leu Val Lys Glu Ala 100 105 110

ctc cag gac gcc ggc atc gac gcc ctc ggc aag gaa aag aag aac atc 384
Leu Gln Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn Ile 115 120 125

ggc tgc gtg ctc ggc att ggc ggc ggc caa aag tcc agc cac gag ttc 432
Gly Cys Val Leu Gly Ile Gly Gly Gly Gln Lys Ser Ser His Glu Phe 130 135 140

tac tcg cgc ctt aat tat gtc gtg gag aag gtc ctc cgc aag atg
Tyr Ser Arg Leu Asn Tyr Val Val Val Glu Lys Val Leu Arg Lys Met 145 150 155 160

ggc atg ccc gag gag gac gtc aag gtc gcc gtc gaa aag tac aag gcc 528
Gly Met Pro Glu Glu Asp Val Lys Val Ala Val Glu Lys Lys Ala 165 170 175

aac ttc ccc gag tgg cgc ctc gac tcc ttc cct ggc ttc ctc ggc aac 576
Asn Phe Pro Glu Trp Arg Leu Asp Ser Phe Pro Gly Phe Leu Gly Asn 180 185 190

gtc acc gcc ggt cgc tgc acc aac
Val Thr Ala Gly Arg Cys Thr Asn 195 200

<210> SEQ ID NO 8
<211> LENGTH:
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 8

Met Ala Ala Arg Leu Gln Glu Gln Gly Gly Glu Met Asp Thr Arg 1 5 10 15

Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Gly Thr Thr Val 25

Arg Glu Ser Trp Glu Thr Ile Arg Ala Gly Ile Asp Cys Leu Ser Asp 35 40 45

Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Val Lys 50 55 60

Thr Thr Asp Ile Gly Gly Phe Ile Pro Glu 65 70

Asp Phe Asp Ala Arg Glu Phe Gly Leu Asn Met Phe Gln Met Glu 85 90 95

Asp Ser Asp Ala Asn Gln Thr Ile Ser Leu Leu Lys Val Lys Glu Ala 105 110

Leu Gln Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn Ile 115 120 125

Gly Cys Val Leu Gly Ile Gly Gly Gly Gln Lys Ser Ser His Glu Phe 130 135 140

Tyr Ser Arg Leu Asn Tyr Val Val Val Glu Lys Val Leu Arg Met 145 150 155 160

Gly Met Pro Glu Glu Asp Val Val Ala Val Glu Lys Ala 165 170 175

Asn Phe Pro Glu Trp Arg Leu Asp Ser Phe Pro Gly Phe Leu Gly Asn 180 185 190

Val Thr Ala Gly Arg Thr Asn 195 200

<210> SEQ ID NO 9
<211> LENGTH: 1278
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(1278)

<400> SEQUENCE: 9

gat gtc acc aag gag gcc tgg cgc ctc ccc cgc gag ggc gtc agc ttc 48
Asp Val Thr Lys Glu Ala Trp Arg Leu Pro Arg Glu Gly Val Ser Phe 1 5 10 15

cgc gcc aag ggc atc gcc acc aac ggc gct gtc gcc gcg ctc ttc tcc 96
Arg Ala Lys Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser 25 30

ggc cag ggc gcg cag tac acg cac atg ttt agc gag gtg gcc atg aac 144
Gly Gln Gly Ala Gln Thr His Met Phe Ser Glu Val Ala Met Asn 35 40 45

tgg ccc cag ttc cgc cag agc att gcc gcc atg gac gcc gcc cag tcc 192
Trp Pro Gln Phe Gln Ser Ile Ala Ala Met Asp Ala Ala Gln Ser 50 55 60

aag gtc gct gga agc gac aag gac ttt gag cgc gtc tcc cag gtc ctc 240
Lys Val Ala Gly Ser Asp Phe Glu Arg Val Ser Gln Val Leu 65 75 80

tac ccg cgc aag ccg gag cgt gag ccc gag cag aac ccc aag aag 288
Pro Arg Lys Pro Glu Arg Glu Pro Glu Gln Asn Pro Lys Lys 85 90 95

atc tcc ctc acc gcc tcg cag ccc tcg acc ctg gcc tgc gct ctc 336
Ile Ser Leu Thr Ala Ser Gln Pro Ser Thr Leu Ala Cys Ala Leu 100 105 110

ggt gcc ttt gag atc ttc aag gag gcc ggc ttc acc ccg gac ttt gcc 384
Gly Ala Phe Glu Ile Phe Glu Ala Gly Phe Thr Pro Asp Phe Ala 115 120 125

gcc ggc cat tcg ctc ggt gag ttc gcc gcc ctc tac gcc gcg ggc tgc 432
Ala Gly His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Cys 130 135 140

gtc gac cgc gac gag ctc ttt gag ctt gtc tgc cgc gcc cgc atc
Val Asp Arg Asp Glu Leu Phe Glu Leu Val Cys Arg Arg Ala Arg Ile 145 150 155 160

atg ggc ggc aag gac gca ccg gcc acc ccc aag gga tgc atg gcc gcc 528
Met Gly Gly Lys Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala 165 170 175

gtc att ggc ccc aac gcc gag aac atc aag gtc cag gcc gcc aac gtc 576
Val Ile Gly Pro Asn Ala Glu Asn Ile Lys Val Gln Ala Ala Asn Val 180 185 190

tgg ctc ggc aac tcc aac tcg cct tcg cag acc gtc atc acc ggc tcc 624
Trp Leu Gly Asn Ser Asn Ser Pro Ser Gln Thr Val Ile Thr Gly Ser 195 200 205

gtc gaa ggt atc cag gcc gag agc gcc cgc ctc cag aag gag ggc ttc 672
Val Glu Gly Ile Gln Glu Ser Ala Arg Leu Gln Lys Glu Gly Phe 210 215 220

cgc gtc gtg cct ctt tgc gag agc gcc ttc cac tcg ccc cag atg 720
Arg Val Val Pro Leu Cys Glu Ser Ala Phe His Ser Pro Gln Met 225 235 240

gag aac gcc tcg tcg ttc aag gac gtc atc tcc aag gtc tcc ttc 768
Glu Asn Ala Ser Ser Phe Lys Asp Val Ile Ser Val Ser Phe 245 250 255

cgc acc ccc aag gcc acc aag ctc ttc agc aac gtc tct ggc gag 816
Arg Thr Pro Lys Ala Thr Lys Leu Phe Ser Asn Val Ser Gly Glu 260 265 270

acc tac ccc acg gac cgc gag atg ctt acg cag cac atg acc agc 864
Thr Pro Thr Asp Arg Glu Met Leu Thr Gln His Met Thr Ser 275 280 285

agc gtc aag ttc ctc acc cag gtc cgc aac atg cac cag gcc ggt gcg 912
Ser Val Lys Phe Leu Thr Gln Val Arg Asn Met His Gln Ala Gly Ala 290 295 300

cgc atc ttt gtc gag ttc gga ccc aag cag gtg ctc tcc aag ctt gtc 960
Arg Ile Phe Val Glu Phe Gly Pro Lys Gln Val Leu Ser Lys Leu Val 305 310 315 320

tcc gag acc ctc aag gat gac ccc tcg gtt gtc acc gtc tct gtc aac 008
Ser Glu Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn 325 330 335

ccg gcc tcg ggc acg gat tcg gac atc cag ctc cgc gac gcg gcc gtc
Pro Ala Ser Gly Thr Asp Ser Asp Ile Gln Leu Arg Asp Ala Ala Val 340 345 350

cag ctc gtt gtc gct ggc gtc aac ctt cag ggc ttt gac aag tgg gac 104
Gln Leu Val Val Ala Gly Val Asn Leu Gln Gly Phe Asp Lys Trp Asp 355 360 365

gcc ccc gat gcc acc cgc atg cag gcc atc aag aag aag cgc act acc 152
Ala Pro Asp Ala Thr Arg Met Gln Ala Ile Lys Lys Lys Arg Thr Thr 370 375 380

ctc ctt tcg gcc gcc acc tac gtc tcg gac aag acc aag aag gtc 200
Leu Arg Leu Ser Ala Ala Thr Val Ser Asp Lys Thr Lys Lys Val 385 390 395 400

gac gcc gcc atg aac gat ggc cgc gtc acc tac ctc aag ggc 248
Arg Asp Ala Ala Met Asn Asp Gly Arg Val Thr Leu Lys Gly 405 415

gcc gca ccg ctc atc aag gcc ccg gag ccc 278
Ala Ala Pro Leu Ile Lys Ala Pro Glu Pro 420 425

<210> SEQ ID NO 10
<211> LENGTH: 426
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 10

Asp Val Thr Lys Glu Ala Trp Arg Leu Pro Arg Glu Gly Val Ser Phe 1 5 10 15

Arg Ala Lys Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser 25 30

Gly Gln Gly Ala Gln Thr His Met Phe Ser Glu Val Ala Met Asn 35 40 45

Trp Pro Gln Phe Arg Gln Ser Ile Ala Ala Met Asp Ala Ala Gln Ser

50 55 60

Lys Val Ala Gly Ser Asp Asp Phe Glu Arg Val Ser Gln Val Leu 65 70

Pro Arg Pro Glu Arg Glu Pro Glu Gln Asn Pro Lys 85 90 95

Ile Ser Leu Thr Ala Ser Gln Pro Ser Thr Leu Ala Cys Ala Leu 100 105 110

Gly Ala Phe Glu Ile Phe Glu Ala Gly Phe Thr Pro Asp Phe Ala 115 120 125

Ala Gly His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly 130 135 140

Val Asp Arg Asp Glu Leu Phe Glu Leu Val Cys Arg Arg Ala Arg Ile 145 150 155 160

Met Gly Gly Asp Ala Pro Ala Thr Pro Gly Met Ala Ala 165 170 175

Val Ile Gly Pro Asn Ala Glu Asn Ile Lys Val Gln Ala Ala Asn Val 180 185 190

Trp Leu Gly Asn Ser Asn Ser Pro Ser Gln Thr Val Ile Thr Gly Ser 195 205

Val Glu Gly Ile Gln Ala Glu Ser Ala Arg Leu Gln Glu Gly Phe 210 215

Arg Val Val Pro Leu Ala Glu Ser Ala Phe His Ser Pro Gln Met 225 230 235 240

Glu Asn Ala Ser Ser Ala Phe Asp Val Ile Ser Val Ser Phe 245 250 255

Arg Thr Pro Lys Ala Glu Thr Leu Phe Ser Asn Val Ser Gly Glu 260 265 270

Thr Pro Thr Asp Ala Arg Glu Met Leu Thr Gln His Met Thr Ser 280 285

Ser Val Phe Leu Thr Gln Val Arg Asn Met His Gln Ala Gly Ala 290 295 300

Arg Ile Phe Val Glu Phe Gly Pro Gln Val Leu Ser Leu Val 305 310 315

Ser Glu Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn 325 330 335

Pro Ala Ser Gly Thr Asp Ser Asp Ile Gln Leu Arg Asp Ala Ala Val 340 345 350

Gln Leu Val Val Ala Gly Val Asn Leu Gln Gly Phe Asp Trp Asp 355 360 365

Ala Pro Asp Ala Thr Arg Met Gln Ala Ile Lys Thr Thr 370 375

Leu Arg Leu Ser Ala Ala Thr Val Ser Asp Thr Val 385 390 395 400

Arg Asp Ala Ala Met Asn Asp Gly Arg Cys Val Thr Leu Lys Gly 405 415

Ala Ala Pro Leu Ile Ala Pro Glu Pro 420 425

<210> SEQ ID NO 11
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (1)..(5)
<223> OTHER INFORMATION: Xaa = any amino acid

<400> SEQUENCE: 11

Gly His Ser Xaa Gly 1 5

<210> SEQ ID NO 12
<211> LENGTH: 258
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(258)

<400> SEQUENCE: 12

gct gtc tcg aac gag ctt ctt gag aag gcc gag act gtc gtc atg gag 48
Ala Val Ser Asn Glu Leu Leu Glu Lys Ala Glu Thr Val Val Met Glu 1 5 10 15

gtc ctc gcc gcc aag acc ggc tac gag acc gac atg atc gag gct gac 96
Val Leu Ala Ala Lys Thr Gly Glu Thr Asp Met Ile Glu Ala Asp 20 25 30

atg gag ctc gag acc gag ctc ggc att gac tcc atc aag cgt gtc gag 144
Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Lys Arg Val Glu 35 40 45

atc ctc tcc gag gtc cag gcc atg ctc aat gtc gag gcc aag gat gtc 192
Ile Leu Ser Glu Val Gln Ala Met Leu Asn Val Glu Ala Lys Asp Val 50 55 60

gat gcc ctc agc cgc act cgc act gtt ggt gag gtt gtc aac gcc atg 240
Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asn Ala Met 65 70 75 80

aag gcc gag atc gct ggc 258
Lys Ala Glu Ile Ala Gly 85

<210> SEQ ID NO 13
<211> LENGTH: 86
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 13

Ala Val Ser Asn Glu Leu Leu Glu Ala Glu Thr Val Val Met Glu 1 5 10 15

Val Leu Ala Ala Lys Thr Gly Glu Thr Asp Met Ile Glu Ala Asp 25 30

Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Lys Arg Val Glu 35 40 45

Ile Leu Ser Glu Val Gln Ala Met Leu Asn Val Glu Ala Asp Val 50 55 60

Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asn Ala Met 65 70 75 80

Ala Glu Ile Ala Gly 85

<210> SEQ ID NO 14
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 14

Leu Gly Ile Asp Ser 1 5

<210> SEQ ID NO 18
<211> LENGTH: 711
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 18

Phe Gly Ala Leu Gly Gly Phe Ile Ser Gln Gln Ala Glu Arg Phe Glu 1 5 15

Pro Ala Glu Ile Leu Gly Phe Thr Leu Met Cys Ala Phe Ala 20 25

Ala Ser Leu Thr Ala Val Ala Gly Gly Arg Pro Ala Phe Ile Gly 35 40 45

Val Ala Arg Leu Asp Gly Arg Leu Gly Phe Thr Ser Gln Gly Thr Ser 50 55 60

Asp Ala Leu Ala Gln Arg Gly Ala Ile Phe Gly Leu Lys 65 70

Thr Ile Gly Leu Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val 85 90 95

Asp Ile Ala Gln Gly Met His Pro Glu Asp Ala Ala Val Ala Ile Val 105 110

Arg Glu Met Ala Ala Asp Ile Arg Ile Arg Glu Val Gly Ile 115 120 125

Ala Asn Gln Gln Arg Thr Ile Arg Ala Ala Lys Leu Glu Thr 130 135 140

Asn Pro Gln Arg Gln Ile Ala Asp Asp Val Leu Leu Val Ser 145 150 155

Gly Arg Gly Ile Thr Pro Leu Ile Arg Glu Ile Thr Arg 165 170

Ile Gly Gly Lys Ile Leu Leu Gly Arg Ser Val Ser 180 185 190

Ser Pro Ala Trp Ala Gly Ile Thr Asp Glu Lys Ala Val 195

Ala Thr Gln Glu Leu Arg Ala Phe Ser Ala Gly Glu 215 220

Pro Pro Thr Pro Arg Ala Val Thr Leu Val Gly Ser Val Leu 225 230 235 240

Gly Arg Glu Val Arg Ser Ser Ile Ala Ala Ile Glu Ala Leu Gly 245 250 255

Gly Ala Ile Tyr Ser Ser Asp Val Asn Ser Ala Ala Asp Val 260 265 270

Ala Ala Val Arg Asp Ala Glu Ser Gln Leu Gly Ala Arg Val Ser 285

Gly Ile Val His Ala Ser Gly Val Leu Arg Asp Arg Leu Ile Glu 290 295 300

Lys Leu Pro Asp Glu Phe Asp Ala Val Phe Gly Thr Val Thr Gly 305 310 315

Leu Glu Asn Leu Leu Ala Ala Val Asp Arg Ala Asn Leu His Met 325 330 335

Val Leu Phe Ser Ser Leu Ala Gly Phe His Gly Asn Val Gly Gln Ser 340 345 350

Asp Ala Met Ala Asn Glu Ala Leu Asn Lys Met Gly Leu Glu Leu 355 360 365

Ala Lys Asp Val Ser Val Lys Ser Ile Phe Gly Pro Trp Asp Gly 370 375 380

Gly Met Val Thr Pro Gln Leu Gln Phe Gln Glu Met Gly Val 385 390 395 400

Gln Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg Ile Val 405 415

Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp Arg Thr Pro 425 430

Ser Lys Val Gly Ser Asp Thr Ile Thr Leu His Arg Lys Ile Ser 435 440 445

Ala Lys Ser Asn Pro Phe Leu Glu Asp His Val Ile Gln Gly Arg Arg 450 455 460

Val Leu Pro Met Thr Leu Ala Ile Gly Ser Leu Ala Glu Thr Leu 465 470

Gly Leu Phe Pro Gly Ser Leu Trp Ala Ile Asp Asp Ala Gln Leu 485 490 495

Phe Gly Val Thr Val Asp Gly Asp Val Asn Glu Val Thr Leu 500 505

Thr Ser Thr Ala Pro Ser Gly Arg Val Asn Val Gln Ala Thr Leu 515 525

Thr Phe Ser Ser Gly Lys Leu Val Pro Ala Tyr Arg Ala Val Ile 530 535 540

Val Leu Ser Asn Gln Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro 545 550 555 560

Pro Ser Leu Asp Ala Asp Pro Ala Leu Gln Gly Ser Val Asp Gly 565 570 575

Thr Leu Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Leu 585 590

Ser Thr Ser Gln Leu Val Ala Ser Ala Val Pro Gly 595 605

Ser Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 610 615 620

Pro Phe Val Asn Asp Leu Ala Phe Gln Ala Met Leu Val Trp Val Arg 625 630 635 640

Arg Thr Leu Gly Gln Ala Ala Leu Pro Asn Ser Ile Gln Arg Ile Val 645 650 655

Gln His Arg Pro Val Pro Gln Asp Lys Pro Phe Ile Thr Leu Arg 660 665 670

Ser Asn Gln Ser Gly Gly His Ser Gln His His Ala Leu Gln Phe 675 685

His Asn Glu Gln Gly Asp Leu Phe Ile Asp Val Gln Ala Ser Val Ile 690 695 700

Ala Thr Asp Ser Leu Ala Phe 705 710

<210> SEQ ID NO 19
<211> LENGTH: 1350
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(1350)

<400> SEQUENCE: 19

atg gcc gct cgg aat gtg agc gcc gcg cat gag atg cac gat gaa aag
Met Ala Ala Arg Asn Val Ser Ala Ala His Glu Met His Asp Glu Lys 1 5 15

cgc atc gcc gtc gtc ggc atg gcc gtc cag tac gcc gga tgc aaa acc

acg ccc cag ggt gat cgt gtg gaa atc gac gcc gtc aag gcc tgc ttt
Thr Pro Gln Gly Asp Arg Val Glu Ile Asp Ala Val Lys Ala Cys Phe 340 345 350

gaa ggc aag gtc ccc cgt ttc ggt acc aca aag ggc aac ttt gga cac 104
Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Lys Gly Asn Phe Gly His 355 360 365

acc cts gca gcc ggc ttt gcc ggt atg tgc aag gtc ctc ctc tcc 152
Thr Xaa Ala Ala Gly Phe Ala Gly Met Cys Lys Val Leu Leu Ser 370 375 380

atg aag cat ggc atc atc ccg ccc acc ccg ggt atc gat gac gag acc 200
Met Lys His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400

aag atg gac cct ctc gtc gtc tcc ggt gag gcc atc cca tgg cca gag 248
Lys Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 410 415

acc aac ggc gag ccc aag cgc gcc ggt ctc tcg gcc ttt ggc ttt ggt 296
Thr Asn Gly Glu Pro Lys Arg Ala Gly Leu Ser Ala Phe Gly Phe Gly 420 425 430

ggc acc aac gcc cat gcc gtc ttt gag gag cat gac ccc tcc aac gcc 344
Gly Thr Asn Ala His Ala Val Phe Glu Glu His Asp Pro Ser Asn Ala 435 440 445

gcc tgc 350
Ala Cys 450

<210> SEQ ID NO 20
<211> LENGTH: 450
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (370)..(370)
<223> OTHER INFORMATION: The 'Xaa' at location 370 stands for Leu.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (371)..(371)
<223> OTHER INFORMATION: The 'Xaa' at location 371 stands for Ala, Val.

<400> SEQUENCE: 20

Met Ala Ala Arg Asn Val Ser Ala Ala His Glu Met His Asp Glu Lys 1 5 15

Arg Ile Ala Val Val Gly Met Ala Val Gln Ala Gly Cys Thr 25

Lys Asp Glu Phe Trp Glu Val Leu Met Asn Val Glu Ser Lys 35 40 45

Val Ile Ser Asp Lys Arg Leu Gly Ser Asn Ala Glu His Tyr 50 55

Lys Ala Glu Arg Ser Lys Ala Asp Thr Phe Asn Glu Thr Tyr 65 70

Gly Thr Leu Asp Glu Asn Glu Ile Asp Asn Glu His Glu Leu Leu Leu 85 90 95

Asn Leu Ala Lys Gln Ala Leu Ala Glu Thr Ser Val Asp Ser Thr 105 110

Arg Gly Ile Val Ser Gly Cys Leu Ser Phe Pro Met Asp Asn Leu 115 120 125

Gln Gly Glu Leu Leu Asn Val Gln Asn His Val Glu Leu 130 135 140

Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Gln 145 150 155 160