US007842796 B2

(12) United States Patent
Metz et al.

(10) Patent No.: US 7,842,796 B2
(45) Date of Patent: *Nov. 30, 2010

(54) PUFA POLYKETIDE SYNTHASE SYSTEMS AND USES THEREOF

(75) Inventors: James G. Metz, Longmont, CO (US); James H. Flatt, Colorado Springs, CO (US); Jerry M. Kuner, Longmont, CO (US); William R. Barclay, Boulder, CO (US)

(73) Assignee: Martek Biosciences Corporation, Columbia, MD (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 186 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 11/689,608

(22) Filed: Mar. 22, 2007

(65) Prior Publication Data
US 2008/0026440 A1 Jan. 31, 2008

Related U.S. Application Data

(60) Division of application No. 10/124,800, filed on Apr. 16, 2002, now Pat. No. 7,247,461, and a continuation-in-part of application No. 09/231,899, filed on Jan. 14, 1999, now Pat. No. 6,566,583.

(60) Provisional application No. 60/323,269, filed on Sep. 18, 2001, provisional application No. 60/298,796, filed on Jun. 15, 2001, provisional application No. 60/284,066, filed on Apr. 16, 2001.

(51) Int. Cl.
C12N 15/53 (2006.01)
C12N 15/74 (2006.01)
C12N 15/79 (2006.01)
C12N 15/82 (2006.01)
C12N 5/14 (2006.01)
C12P 7/64 (2006.01)
C12N 9/02 (2006.01)

(52) U.S. Cl. ...... 536/23.2; 435/69.1; 435/134; 435/189; 435/252.3; 435/257.2; 435/320.1; 435/419

(58) Field of Classification Search ...... None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,130,242 A 7/1992 Barclay et al.
(Continued)

FOREIGN PATENT DOCUMENTS
CA 2520795 10/2004
(Continued)

OTHER PUBLICATIONS
U.S. Appl. No. 11/689,587, filed Mar. 22, 2007, Metz et al.
(Continued)

Primary Examiner: Nashaat T Nashed
Assistant Examiner: William W. Moore
(74) Attorney, Agent, or Firm: Sterne, Kessler, Goldstein & Fox P.L.L.C.

(57) ABSTRACT

The invention generally relates to polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) systems isolated from or derived from non-bacterial organisms, to homologues thereof, to isolated nucleic acid molecules and recombinant nucleic acid molecules encoding biologically active domains of such a PUFA PKS system, to genetically modified organisms comprising PUFA PKS systems, to methods of making and using such systems for the production of bioactive molecules of interest, and to novel methods for identifying new bacterial and non-bacterial microorganisms having such a PUFA PKS system.

22 Claims, 3 Drawing Sheets

[Front-page figure: Orf A (307 kDa): KS, MAT, 9x ACP, KR; 225 kDa; Orf C: DH, DH, ER]


[Drawing Sheets 1 and 2 of 3: figure content not recoverable from the text extraction.]

PUFA POLYKETIDE SYNTHASE SYSTEMS AND USES THEREOF

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of U.S. application Ser. No. 10/124,800, filed Apr. 16, 2002, now U.S. Pat. No. 7,247,461, entitled “PUFA Polyketide Synthase Systems and Uses Thereof,” which claims the benefit of priority under 35 U.S.C. §119(e) to: U.S. Provisional Application Ser. No. 60/284,066, filed Apr. 16, 2001, entitled “A Polyketide Synthase System and Uses Thereof”; U.S. Provisional Application Ser. No. 60/298,796, filed Jun. 15, 2001, entitled “A Polyketide Synthase System and Uses Thereof”; and U.S. Provisional Application Ser. No. 60/323,269, filed Sep. 18, 2001, entitled “Thraustochytrium PUFA PKS System and Uses Thereof.” U.S. application Ser. No. 10/124,800 is also a continuation-in-part of copending U.S. application Ser. No. 09/231,899, now U.S. Pat. No. 6,566,583, filed Jan. 14, 1999, entitled “Schizochytrium PKS Genes”. Each of the above-identified patent applications is incorporated herein by reference in its entirety.

This application does not claim the benefit of priority from U.S. application Ser. No. 09/090,793, filed Jun. 4, 1998, now U.S. Pat. No. 6,140,486, although U.S. application Ser. No. 09/090,793 is incorporated herein by reference in its entirety.

REFERENCE TO SEQUENCE LISTING

This application contains a Sequence Listing submitted as an electronic text file named “2997-29 corrected ST25.txt”, having a size in bytes of 280 kb, and created on 4 Mar. 2007. The information contained in this electronic file is hereby incorporated by reference in its entirety pursuant to 37 CFR §1.52(e)(5).

FIELD OF THE INVENTION

This invention relates to polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) systems from microorganisms, including eukaryotic organisms, such as Thraustochytrid microorganisms. More particularly, this invention relates to nucleic acids encoding non-bacterial PUFA PKS systems, to non-bacterial PUFA PKS systems, to genetically modified organisms comprising non-bacterial PUFA PKS systems, and to methods of making and using the non-bacterial PUFA PKS systems disclosed herein. This invention also relates to a method to identify bacterial and non-bacterial microorganisms comprising PUFA PKS systems.

BACKGROUND OF THE INVENTION

Polyketide synthase (PKS) systems are generally known in the art as enzyme complexes derived from fatty acid synthase (FAS) systems, but which are often highly modified to produce specialized products that typically show little resemblance to fatty acids. Researchers have attempted to exploit polyketide synthase (PKS) systems that have been described in the literature as falling into one of three basic types, typically referred to as: Type II, Type I and modular. The Type II system is characterized by separable proteins, each of which carries out a distinct enzymatic reaction. The enzymes work in concert to produce the end product and each individual enzyme of the system typically participates several times in the production of the end product. This type of system operates in a manner analogous to the fatty acid synthase (FAS) systems found in plants and bacteria. Type I PKS systems are similar to the Type II system in that the enzymes are used in an iterative fashion to produce the end product. The Type I differs from Type II in that enzymatic activities, instead of being associated with separable proteins, occur as domains of larger proteins. This system is analogous to the Type I FAS systems found in animals and fungi.

In contrast to the Type I and II systems, in modular PKS systems, each enzyme domain is used only once in the production of the end product. The domains are found in very large proteins and the product of each reaction is passed on to another domain in the PKS protein. Additionally, in all of the PKS systems described above, if a carbon-carbon double bond is incorporated into the end product, it is always in the trans configuration.

In the Type I and Type II PKS systems described above, the same set of reactions is carried out in each cycle until the end product is obtained. There is no allowance for the introduction of unique reactions during the biosynthetic procedure. The modular PKS systems require huge proteins that do not utilize the economy of iterative reactions (i.e., a distinct domain is required for each reaction). Additionally, as stated above, carbon-carbon double bonds are introduced in the trans configuration in all of the previously described PKS systems.

Polyunsaturated fatty acids (PUFAs) are critical components of membrane lipids in most eukaryotes (Lauritzen et al., Prog. Lipid Res. 40, 1 (2001); McConn et al., Plant J. 15, 521 (1998)) and are precursors of certain hormones and signaling molecules (Heller et al., Drugs 55, 487 (1998); Creelman et al., Annu. Rev. Plant Physiol. Plant Mol. Biol. 48, 355 (1997)). Known pathways of PUFA synthesis involve the processing of saturated 16:0 or 18:0 fatty acids (the abbreviation X:Y indicates an acyl group containing X carbon atoms and Y cis double bonds; double-bond positions of PUFAs are indicated relative to the methyl carbon of the fatty acid chain (ω3 or ω6) with systematic methylene interruption of the double bonds) derived from fatty acid synthase (FAS) by elongation and aerobic desaturation reactions (Sprecher, Curr. Opin. Clin. Nutr. Metab. Care 2, 135 (1999); Parker-Barnes et al., Proc. Natl. Acad. Sci. USA 97, 8284 (2000); Shanklin et al., Annu. Rev. Plant Physiol. Plant Mol. Biol. 49, 611 (1998)). Starting from acetyl-CoA, the synthesis of DHA requires approximately 30 distinct enzyme activities and nearly 70 reactions including the four repetitive steps of the fatty acid synthesis cycle. Polyketide synthases (PKSs) carry out some of the same reactions as FAS (Hopwood et al., Annu. Rev. Genet. 24, 37 (1990); Bentley et al., Annu. Rev. Microbiol. 53, 411 (1999)) and use the same small protein (or domain), acyl carrier protein (ACP), as a covalent attachment site for the growing carbon chain. However, in these enzyme systems, the complete cycle of reduction, dehydration and reduction seen in FAS is often abbreviated so that a highly derivatized carbon chain is produced, typically containing many keto- and hydroxy-groups as well as carbon-carbon double bonds in the trans configuration. The linear products of PKSs are often cyclized to form complex biochemicals that include antibiotics and many other secondary products (Hopwood et al., (1990) supra; Bentley et al., (1999), supra; Keating et al., Curr. Opin. Chem. Biol. 3, 598 (1999)).

Very long chain PUFAs such as docosahexaenoic acid (DHA; 22:6ω3) and eicosapentaenoic acid (EPA; 20:5ω3) have been reported from several species of marine bacteria, including Shewanella sp. (Nichols et al., Curr. Op. Biotechnol. 10, 240 (1999); Yazawa, Lipids 31, S (1996); DeLong et al., Appl. Environ. Microbiol. 51, 730 (1986)). Analysis of a genomic fragment (cloned as plasmid pEPA) from Shewanella sp. strain SCRC2738 led to the identification of five open reading frames (Orfs), totaling 20 Kb, that are necessary and sufficient for EPA production in E. coli (Yazawa, (1996), supra). Several of the predicted protein domains were homologues of FAS enzymes, while other regions showed no homology to proteins of known function. On the basis of these observations and biochemical studies, it was suggested that PUFA synthesis in Shewanella involved the elongation of 16- or 18-carbon fatty acids produced by FAS and the insertion of double bonds by undefined aerobic desaturases (Watanabe et al., J. Biochem. 122, 467 (1997)). The recognition that this hypothesis was incorrect began with a reexamination of the protein sequences encoded by the five Shewanella Orfs. At least 11 regions within the five Orfs were identifiable as putative enzyme domains (See Metz et al., Science 293:290-293 (2001)). When compared with sequences in the gene databases, seven of these were more strongly related to PKS proteins than to FAS proteins. Included in this group were domains putatively encoding malonyl-CoA:ACP acyltransferase (MAT), 3-ketoacyl-ACP synthase (KS), 3-ketoacyl-ACP reductase (KR), acyltransferase (AT), phosphopantetheine transferase, chain length (or chain initiation) factor (CLF) and a highly unusual cluster of six ACP domains (i.e., the presence of more than two clustered ACP domains has not previously been reported in PKS or FAS sequences). However, three regions were more highly homologous to bacterial FAS proteins. One of these was similar to the newly-described Triclosan-resistant enoyl reductase (ER) from Streptococcus pneumoniae (Heath et al., Nature 406, 145 (2000); comparison of ORF8 peptide with the S. pneumoniae enoyl reductase using the LALIGN program (matrix, BLOSUM50; gap opening penalty, -10; elongation penalty -1) indicated 49% similarity over a 386aa overlap). Two regions were homologues of the E. coli FAS protein encoded by fabA, which catalyzes the synthesis of trans-2-decenoyl-ACP and the reversible isomerization of this product to cis-3-decenoyl-ACP (Heath et al., J. Biol. Chem., 271, 27795 (1996)). On this basis, it seemed likely that at least some of the double bonds in EPA from Shewanella are introduced by a dehydrase-isomerase mechanism catalyzed by the FabA-like domains in Orf7.

Anaerobically-grown E. coli cells harboring the pEPA plasmid accumulated EPA to the same levels as aerobic cultures (Metz et al., 2001, supra), indicating that an oxygen-dependent desaturase is not involved in EPA synthesis. When pEPA was introduced into a fabB mutant of E. coli, which is unable to synthesize monounsaturated fatty acids and requires unsaturated fatty acids for growth, the resulting cells lost their fatty acid auxotrophy. They also accumulated much higher levels of EPA than other pEPA-containing strains, suggesting that EPA competes with endogenously produced monounsaturated fatty acids for transfer to glycerolipids. When pEPA-containing E. coli cells were grown in the presence of 13C-acetate, the data from 13C-NMR analysis of purified EPA from the cells confirmed the identity of EPA and provided evidence that this fatty acid was synthesized from acetyl-CoA and malonyl-CoA (See Metz et al., 2001, supra). A cell-free homogenate from pEPA-containing fabB cells synthesized both EPA and saturated fatty acids from 14C-malonyl-CoA. When the homogenate was separated into a 200,000×g high-speed pellet and a membrane-free supernatant fraction, saturated fatty acid synthesis was confined to the supernatant, consistent with the soluble nature of the Type II FAS enzymes (Magnuson et al., Microbiol. Rev. 57, 522 (1993)). Synthesis of EPA was found only in the high-speed pellet fraction, indicating that EPA synthesis can occur without reliance on enzymes of the E. coli FAS or on soluble intermediates (such as 16:0-ACP) from the cytoplasmic fraction. Since the proteins encoded by the Shewanella EPA genes are not particularly hydrophobic, restriction of EPA synthesis activity to this fraction may reflect a requirement for a membrane-associated acyl acceptor molecule. Additionally, in contrast to the E. coli FAS, EPA synthesis is specifically NADPH-dependent and does not require NADH. All these results are consistent with the pEPA genes encoding a multifunctional PKS that acts independently of FAS, elongase, and desaturase activities to synthesize EPA directly. It is likely that the PKS pathway for PUFA synthesis that has been identified in Shewanella is widespread in marine bacteria. Genes with high homology to the Shewanella gene cluster have been identified in Photobacterium profundum (Allen et al., Appl. Environ. Microbiol. 65:1710 (1999)) and in Moritella marina (Vibrio marinus) (Tanaka et al., Biotechnol. Lett. 21:939 (1999)).

The biochemical and molecular-genetic analyses performed with Shewanella provide compelling evidence for polyketide synthases that are capable of synthesizing PUFAs from malonyl-CoA. A complete scheme for synthesis of EPA by the Shewanella PKS has been proposed. The identification of protein domains homologous to the E. coli FabA protein, and the observation that bacterial EPA synthesis occurs anaerobically, provide evidence for one mechanism wherein the insertion of cis double bonds occurs through the action of a bifunctional dehydratase/2-trans, 3-cis isomerase (DH/2,3I). In E. coli, condensation of the 3-cis acyl intermediate with malonyl-ACP requires a particular ketoacyl-ACP synthase and this may provide a rationale for the presence of two KS in the Shewanella gene cluster (in Orf5 and Orf7). However, the PKS cycle extends the chain in two-carbon increments while the double bonds in the EPA product occur at every third carbon. This disjunction can be solved if the double bonds at C-14 and C-8 of EPA are generated by 2-trans, 2-cis isomerization (DH/2,2I) followed by incorporation of the cis double bond into the elongating fatty acid chain. The enzymatic conversion of a trans double bond to the cis configuration without bond migration is known to occur, for example, in the synthesis of 11-cis-retinal in the retinoid cycle (Jang et al., J. Biol. Chem. 275, 28128 (2000)). Although such an enzyme function has not yet been identified in the Shewanella PKS, it may reside in one of the unassigned protein domains.

The PKS pathways for PUFA synthesis in Shewanella and another marine bacteria, Vibrio marinus, are described in detail in U.S. Pat. No. 6,140,486 (issued from U.S. application Ser. No. 09/090,793, filed Jun. 4, 1998, entitled “Production of Polyunsaturated Fatty Acids by Expression of Polyketide-like Synthesis Genes in Plants”, which is incorporated herein by reference in its entirety).

Polyunsaturated fatty acids (PUFAs) are considered to be useful for nutritional, pharmaceutical, industrial, and other purposes. An expansive supply of PUFAs from natural sources and from chemical synthesis are not sufficient for commercial needs. Because a number of separate desaturase and elongase enzymes are required for fatty acid synthesis from linoleic acid (LA, 18:2 Δ9,12), common in most plant species, to the more saturated and longer chain PUFAs, engineering plant host cells for the expression of PUFAs such as EPA and DHA may require expression of five or six separate enzyme activities to achieve expression, at least for EPA and DHA. Additionally, for production of useable quantities of such PUFAs, additional engineering efforts may be required, for instance the down regulation of enzymes competing for substrate, engineering of higher enzyme activities such as by mutagenesis or targeting of enzymes to plastid organelles.
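
By way of illustration only, the fatty acid shorthand used in this background (X:Y, ω3 or ω6, and Δ positions such as 18:2 Δ9,12) can be checked with a short calculation. The following Python sketch is not part of the disclosure: the function names are hypothetical, it relies only on the stated convention that the double bonds are methylene-interrupted (spaced three carbons apart), and it assumes the commonly used ω6 designation for linoleic acid, which the text does not state.

```python
# Illustrative sketch only (not from the patent): relates the "X:Y omega-n"
# shorthand to double-bond positions counted from the carboxyl end (delta).
# Function names are hypothetical.

def omega_positions(n_double_bonds, omega):
    """Double-bond positions counted from the methyl end of the chain.

    Methylene interruption places successive double bonds three carbons
    apart, starting at the omega carbon (3 for omega-3, 6 for omega-6).
    """
    return [omega + 3 * i for i in range(n_double_bonds)]


def delta_positions(n_carbons, n_double_bonds, omega):
    """The same double bonds counted from the carboxyl end (delta numbering)."""
    return sorted(n_carbons - w for w in omega_positions(n_double_bonds, omega))


# EPA is 20:5 omega-3 and DHA is 22:6 omega-3 (as stated in the text);
# linoleic acid (LA) is 18:2, assumed here to be omega-6.
epa = delta_positions(20, 5, 3)
dha = delta_positions(22, 6, 3)
la = delta_positions(18, 2, 6)

print("EPA delta positions:", epa)
print("DHA delta positions:", dha)
print("LA delta positions: ", la)
```

Under these assumptions the computed positions for linoleic acid agree with the Δ9,12 designation given above, and the positions for EPA include C-8 and C-14, the two double bonds discussed in connection with 2-trans, 2-cis isomerization. Successive double bonds are three carbons apart, whereas each PKS cycle extends the chain by two carbons.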
Therefore it is of interest to obtain genetic material involved in PUFA biosynthesis from species that naturally produce these fatty acids and to express the isolated material alone or in combination in a heterologous system which can be manipulated to allow production of commercial quantities of PUFAs.

The discovery of a PUFA PKS system in marine bacteria such as Shewanella and Vibrio marinus (see U.S. Pat. No. 6,140,486, ibid.) provides a resource for new methods of commercial PUFA production. However, these marine bacteria have limitations which will ultimately restrict their usefulness on a commercial level. First, although U.S. Pat. No. 6,140,486 discloses that the marine bacteria PUFA PKS systems can be used to genetically modify plants, the marine bacteria naturally live and grow in cold marine environments and the enzyme systems of these bacteria do not function well above 30° C. In contrast, many crop plants, which are attractive targets for genetic manipulation using the PUFA PKS system, have normal growth conditions at temperatures above 30° C. and ranging to higher than 40° C. Therefore, the marine bacteria PUFA PKS system is not predicted to be readily adaptable to plant expression under normal growth conditions. Moreover, the marine bacteria PUFA PKS genes, being from a bacterial source, may not be compatible with the genomes of eukaryotic host cells, or at least may require significant adaptation to work in eukaryotic hosts. Additionally, the known marine bacteria PUFA PKS systems do not directly produce triglycerides, whereas direct production of triglycerides would be desirable because triglycerides are a lipid storage product in microorganisms and as a result can be accumulated at very high levels (e.g. up to 80-85% of cell weight) in microbial/plant cells (as opposed to a “structural” lipid product (e.g. phospholipids) which can generally only accumulate at low levels (e.g. less than 10-15% of cell weight at maximum)).

Therefore, there is a need in the art for other PUFA PKS systems having greater flexibility for commercial use.

SUMMARY OF THE INVENTION

One embodiment of the present invention relates to an isolated nucleic acid molecule comprising a nucleic acid sequence chosen from: (a) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (b) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (c) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 [...]

[...] amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; and/or (2) a nucleic acid sequence encoding an amino acid sequence that is at least about 70% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32. In a preferred embodiment, the nucleic acid sequence encodes an amino acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32 and/or biologically active fragments thereof. In one aspect, the nucleic acid sequence is chosen from: SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:12, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, and SEQ ID NO:31.

Another embodiment of the present invention relates to a recombinant nucleic acid molecule comprising the nucleic acid molecule as described above, operatively linked to at least one transcription control sequence. In another embodiment, the present invention relates to a recombinant cell transfected with the recombinant nucleic acid molecule described directly above.

Yet another embodiment of the present invention relates to a genetically modified microorganism, wherein the microorganism expresses a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The at least one domain of the PUFA PKS system is encoded by a nucleic acid sequence chosen from: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by the screening method of the present invention; (c) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (d) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain
consecutive of a PUFA PKS system; and, (f) a nucleic acid sequence amino acids of the amino acid sequence of (a), wherein the encoding an amino acid sequence that is at least about 60% amino acid sequence has a biological activity of at least one 55 identical to an amino acid sequence selected from the group domain of a polyunsaturated fatty acid (PUFA) polyketide consisting of: SEQID NO:8, SEQID NO:10, SEQID NO:13, synthase (PKS) system; (d) a nucleic acid sequence encoding SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID an amino acid sequence that is at least about 60% identical to NO:24, SEQID NO:26, SEQID NO:28, SEQID NO:30, and the amino acid sequence of (b), wherein the amino acid SEQ ID NO:32; wherein the amino acid sequence has a sequence has a biological activity of at least one domain of a 60 biological activity of at least one domain of a PUFA PKS polyunsaturated fatty acid (PUFA) polyketide synthase system. In this embodiment, the microorganism is genetically (PKS) system; and (e) a nucleic acid sequence that is fully modified to affect the activity of the PKS system. The screen complementary to the nucleic acid sequence of (a), (b), (c), or ing method of the present invention referenced in (b) above (d). 
In alternate aspects, the nucleic acid sequence encodes an comprises: (i) selecting a microorganism that produces at amino acid sequence that is at least about 70% identical, or at 65 least one PUFA; and, (ii) identifying a microorganism from least about 80% identical, or at least about 90% identical, or (i) that has an ability to produce increased PUFAs under is identical to: (1) at least 500 consecutive amino acids of an dissolved oxygen conditions of less than about 5% of satura US 7,842,796 B2 7 8 tion in the fermentation medium, as compared to production cally active domain from a bacterial PUFAPKS system, from of PUFAs by the microorganism under dissolved oxygen a type IPKS system, from a type II PKS system, and/or from conditions of greater than 5% of saturation, and more prefer a modular PKS system. ably 10% of saturation, and more preferably greater than 15% In another aspect of this embodiment, the microorganism is of saturation, and more preferably greater than 20% of satu 5 genetically modified by transfection with a recombinant ration in the fermentation medium. nucleic acid molecule encoding the at least one domain of a In one aspect, the microorganism endogenously expresses polyunsaturated fatty acid (PUFA) polyketide synthase a PKS system comprising the at least one domain of the PUFA (PKS) system. Such a recombinant nucleic acid molecule can PKS system, and wherein the genetic modification is in a include any recombinant nucleic acid molecule comprising nucleic acid sequence encoding the at least one domain of the 10 any of the nucleic acid sequences described above. In one PUFA PKS system. 
For example, the genetic modification aspect, the microorganism has been further genetically modi can be in a nucleic acid sequence that encodes a domain fied to recombinantly express at least one nucleic acid mol having a biological activity of at least one of the following ecule encoding at least one biologically active domain from a proteins: malonyl-CoA:ACP acyltransferase (MAT), B-keto bacterial PUFA PKS system, from a Type IPKS system, from acyl-ACP synthase (KS), ketoreductase (KR), acyltrans 15 a Type II PKS system, or from a modular PKS system. ferase (AT), FabA-like B-hydroxyacyl-ACP dehydrase (DH), Yet another embodiment of the present invention relates to phosphopantetheline transferase, chain length factor (CLF), a genetically modified plant, wherein the plant has been acyl carrier protein (ACP), enoyl ACP-reductase (ER), an genetically modified to recombinantly express a PKS system enzyme that catalyzes the synthesis of trans-2-decenoyl-ACP, comprising at least one biologically active domain of a poly an enzyme that catalyzes the reversible isomerization of unsaturated fatty acid (PUFA) polyketide synthase (PKS) trans-2-decenoyl-ACP to cis-3-decenoyl-ACP, and an system. The domain can be encoded by any of the nucleic acid enzyme that catalyzes the elongation of cis-3-decenoyl-ACP sequences described above. In one aspect, the plant has been to cis-vaccenic acid. In one aspect, the genetic modification is further genetically modified to recombinantly express at least in a nucleic acid sequence that encodes an amino acid one nucleic acid molecule encoding at least one biologically sequence selected from the group consisting of: (a) an amino 25 active domain from a bacterial PUFA PKS system, from a acid sequence that is at least about 70% identical, and pref Type I PKS system, from a Type II PKS system, and/from a erably at least about 80% identical, and more preferably at modular PKS system. 
least about 90% identical and more preferably identical to at Another embodiment of the present invention relates to a least 500 consecutive amino acids of an amino acid sequence method to identify a microorganism that has a polyunsatu selected from the group consisting of: SEQID NO:2, SEQID 30 rated fatty acid (PUFA) polyketide synthase (PKS) system. NO:4, and SEQID NO:6; wherein the amino acid sequence The method includes the steps of: (a) selecting a microorgan has a biological activity of at leastone domain of a PUFAPKS ism that produces at least one PUFA; and, (b) identifying a system; and, (b) an amino acid sequence that is at least about microorganism from (a) that has an ability to produce 70% identical, and preferably at least about 80% identical, increased PUFAs under dissolved oxygen conditions of less and more preferably at least about 90% identical and more 35 than about 5% of saturation in the fermentation medium, as preferably identical to an amino acid sequence selected from compared to production of PUFAs by the microorganism the group consisting of: SEQID NO:8, SEQID NO:10, SEQ under dissolved oxygen conditions of greater than 5% of ID NO:13, SEQID NO:18, SEQID NO:20, SEQID NO:22, saturation, more preferably 10% of saturation, more prefer SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID ably greater than 15% of saturation and more preferably NO:30, and SEQID NO:32; wherein the amino acid sequence 40 greater than 20% of saturation in the fermentation medium. A has a biological activity of at least one domain of a PUFAPKS microorganism that produces at least one PUFA and has an system. ability to produce increased PUFAs under dissolved oxygen In one aspect, the genetically modified microorganism is a conditions of less than about 5% of saturation is identified as Thraustochytrid, which can include, but is not limited to, a a candidate for containing a PUFA PKS system. 
Thraustochytrid from a genus chosen from Schizochytrium 45 In one aspect of this embodiment, step (b) comprises iden and Thraustochytrium. In another aspect, the microorganism tifying a microorganism from (a) that has an ability to pro has been further genetically modified to recombinantly duce increased PUFAs under dissolved oxygen conditions of express at least one nucleic acid molecule encoding at least less than about 2% of saturation, and more preferably under one biologically active domain from a bacterial PUFA PKS dissolved oxygen conditions of less than about 1% of satura system, from a Type I PKS system, from a Type II PKS 50 tion, and even more preferably under dissolved conditions of system, and/or from a modular PKS system. about 0% of saturation. In another aspect of this embodiment, the microorganism In another aspect of this embodiment, the microorganism endogenously expresses a PUFA PKS system comprising the selected in (a) has an ability to consume bacteria by phago at least one biologically active domain of a PUFA PKS sys cytosis. In another aspect, the microorganism selected in (a) tem, and wherein the genetic modification comprises expres 55 has a simple fatty acid profile. In another aspect, the micro sion of a recombinant nucleic acid molecule selected from the organism selected in (a) is a non-bacterial microorganism. In group consisting of a recombinant nucleic acid molecule another aspect, the microorganism selected in (a) is a eukary encoding at least one biologically active domain from a sec ote. In another aspect, the microorganism selected in (a) is a ond PKS system and a recombinant nucleic acid molecule member of the order Thraustochytriales. In another aspect, encoding a protein that affects the activity of the PUFA PKS 60 the microorganism selected in (a) has an ability to produce system. 
Preferably, the recombinant nucleic acid molecule PUFAs at a temperature greater than about 15°C., and pref comprises any one of the nucleic acid sequences described erably greater than about 20°C., and more preferably greater above. than about 25° C., and even more preferably greater than In one aspect of this embodiment, the recombinant nucleic about 30°C. In another aspect, the microorganism selected in acid molecule encodes a phosphopantetheline transferase. In 65 (a) has an ability to produce bioactive compounds (e.g., lip another aspect, the recombinant nucleic acid molecule com ids) of interest at greater than 5% of the dry weight of the prises a nucleic acid sequence encoding at least one biologi organism, and more preferably greater than 10% of the dry US 7,842,796 B2 9 10 weight of the organism. In yet another aspect, the microor tion. In another aspect, the organism endogenously expresses ganism selected in (a) contains greater than 30% of its total a non-bacterial PUFA PKS system, and wherein the genetic fatty acids as C14:0, C16:0 and C16:1 while also producing at modification comprises Substitution of a domain from a dif least one long chain fatty acid with three or more unsaturated ferent PKS system for a nucleic acid sequence encoding at bonds, and preferably, the microorganism selected in (a) con 5 least one domain of the non-bacterial PUFA PKS system. tains greater than 40% of its total fatty acids as C14:0, C16:0 In yet another aspect, the organism endogenously and C16:1 while also producing at least one long chain fatty expresses a non-bacterial PUFA PKS system that has been acid with three or more unsaturated bonds. 
In another aspect, modified by transfecting the organism with a recombinant the microorganism selected in (a) contains greater than 30% nucleic acid molecule encoding a protein that regulates the of its total fatty acids as C14:0, C16:0 and C16:1 while also 10 chain length of fatty acids produced by the PUFA PKS sys producing at least one long chain fatty acid with four or more tem. For example, the recombinant nucleic acid molecule unsaturated bonds, and more preferably while also producing encoding a protein that regulates the chain length of fatty at least one long chain fatty acid with five or more unsaturated acids can replace a nucleic acid sequence encoding a chain bonds. length factor in the non-bacterial PUFA PKS system. In In another aspect of this embodiment, the method further 15 another aspect, the protein that regulates the chain length of comprises step (c) of detecting whether the organism com fatty acids produced by the PUFA PKS system is a chain prises a PUFA PKS system. In this aspect, the step of detect length factor. In another aspect, the protein that regulates the ing can include detecting a nucleic acid sequence in the chain length offatty acids produced by the PUFAPKS system microorganism that hybridizes under stringent conditions is a chain length factor that directs the synthesis of C20 units. with a nucleic acid sequence encoding an amino acid In one aspect, the organism expresses a non-bacterial sequence from a Thraustochytrid PUFA PKS system. Alter PUFA PKS system comprising a genetic modification in a natively, the step of detecting can include detecting a nucleic domain chosen from: a domain encoding FabA-like B-hy acid sequence in the organism that is amplified by oligonucle droxy acyl-ACP dehydrase (DH) domain and a domain otide primers from a nucleic acid sequence from a Thraus encoding B-ketoacyl-ACP synthase (KS), wherein the modi tochytrid PUFA PKS system. 
25 fication alters the ratio of long chain fatty acids produced by Another embodiment of the present invention relates to a the PUFA PKS system as compared to in the absence of the microorganism identified by the screening method described modification. In one aspect, the modification comprises Sub above, wherein the microorganism is genetically modified to stituting a DH domain that does not possess isomerization regulate the production of molecules by the PUFA PKS sys activity for a FabA-like B-hydroxyacyl-ACP dehydrase (DH) tem. 30 in the non-bacterial PUFAPKS system. In another aspect, the Yet another embodiment of the present invention relates to modification is selected from the group consisting of a dele a method to produce a bioactive molecule that is produced by tion of all or a part of the domain, a substitution of a homolo a polyketide synthase system. The method includes the step gous domain from a different organism for the domain, and a of culturing under conditions effective to produce the bioac mutation of the domain. tive molecule a genetically modified organism that expresses 35 In another aspect, the organism expresses a PKS system a PKS system comprising at least one biologically active and the genetic modification comprises Substituting a FabA domain of a polyunsaturated fatty acid (PUFA) polyketide like B-hydroxy acyl-ACP dehydrase (DH) domain from a synthase (PKS) system. The domain of the PUFA PKS sys PUFA PKS system for a DH domain that does not posses tem is encoded by any of the nucleic acid sequences described isomerization activity. above. 
40 In another aspect, the organism expresses a non-bacterial In one aspect of this embodiment, the organism endog PUFA PKS system comprising a modification in an enoyl enously expresses a PKS system comprising the at least one ACP reductase (ER) domain, wherein the modification domain of the PUFA PKS system, and the genetic modifica results in the production of a different compound as compared tion is in a nucleic acid sequence encoding the at least one to in the absence of the modification. For example, the modi domain of the PUFA PKS system. For example, the genetic 45 fication can be selected from the group consisting of a dele modification can change at least one product produced by the tion of all or a part of the ER domain, a substitution of an ER endogenous PKS system, as compared to a wild-type organ domain from a different organism for the ER domain, and a 1S. mutation of the ER domain. In another aspect of this embodiment, the organism endog In one aspect, the bioactive molecule produced by the enously expresses a PKS system comprising the at least one 50 present method can include, but is not limited to, an anti biologically active domain of the PUFA PKS system, and the inflammatory formulation, a chemotherapeutic agent, an genetic modification comprises transfection of the organism active excipient, an osteoporosis drug, an anti-depressant, an with a recombinant nucleic acid molecule selected from the anti-convulsant, an anti-Heliobactor pylori drug, a drug for group consisting of a recombinant nucleic acid molecule treatment of neurodegenerative disease, a drug for treatment encoding at least one biologically active domain from a sec 55 of degenerative liver disease, an antibiotic, and a cholesterol ond PKS system and a recombinant nucleic acid molecule lowering formulation. In one aspect, the bioactive molecule is encoding a protein that affects the activity of the PUFA PKS a polyunsaturated fatty acid (PUFA). In another aspect, the system. 
For example, the genetic modification can change at bioactive molecule is a molecule including carbon-carbon least one product produced by the endogenous PKS system, double bonds in the cis configuration. In another aspect, the as compared to a wild-type organism. 60 bioactive molecule is a molecule including a double bond at In yet another aspect of this embodiment, the organism is every third carbon. genetically modified by transfection with a recombinant In one aspect of this embodiment, the organism is a micro nucleic acid molecule encoding the at least one domain of the organism, and in another aspect, the organism is a plant. polyunsaturated fatty acid (PUFA) polyketide synthase Another embodiment of the present invention relates to a (PKS) system. In another aspect, the organism produces a 65 method to produce a plant that has a polyunsaturated fatty polyunsaturated fatty acid (PUFA) profile that differs from acid (PUFA) profile that differs from the naturally occurring the naturally occurring organism without a genetic modifica plant, comprising genetically modifying cells of the plant to US 7,842,796 B2 11 12 express a PKS system comprising at least one recombinant In this embodiment, the PUFA PKS system can be nucleic acid molecule comprising a nucleic acid sequence expressed in a prokaryotic host cell or in a eukaryotic host encoding at least one biologically active domain of a PUFA cell. In one aspect, the host cell is a plant cell. Accordingly, PKS system. The domain of the PUFA PKS system is one embodiment of the invention is a method to produce a encoded by any of the nucleic acid sequences described product containing at least one PUFA, comprising growing a above. plant comprising Such a plant cell under conditions effective Yet another embodiment of the present invention relates to to produce the product. 
The host cell is a microbial cell and in a method to modify an endproduct containing at least one this case, one embodiment of the present invention is a fatty acid, comprising adding to the endproduct an oil pro method to produce a product containing at least one PUFA, duced by a recombinant host cell that expresses at least one 10 comprising culturing a culture containing Such a microbial recombinant nucleic acid molecule comprising a nucleic acid cell under conditions effective to produce the product. In one sequence encoding at least one biologically active domain of aspect, the PKS system catalyzes the direct production of a PUFA PKS system. The domain of a PUFA PKS system is triglycerides. encoded by any of the nucleic acid sequences described Yet another embodiment of the present invention relates to above. In one aspect, the endproduct is selected from the 15 a genetically modified microorganism comprising a polyun group consisting of a dietary Supplement, a food product, a saturated fatty acid (PUFA) polyketide synthase (PKS) sys pharmaceutical formulation, a humanized animal milk, and tem, wherein the PKS catalyzes both iterative and non-itera an infant formula. A pharmaceutical formulation can include, tive enzymatic reactions. The PUFA PKS system comprises: but is not limited to: an anti-inflammatory formulation, a (a) at least two enoyl ACP-reductase (ER) domains; (b) at chemotherapeutic agent, an active excipient, an osteoporosis least six acyl carrier protein (ACP) domains; (c) at least two drug, an anti-depressant, an anti-convulsant, an anti-Helio B-keto acyl-ACP synthase (KS) domains; (d) at least one bactor pyloridrug, a drug for treatment of neurodegenerative acyltransferase (AT) domain; (e) at least one ketoreductase disease, a drug for treatment of degenerative liver disease, an (KR) domain; (f) at least two FabA-like B-hydroxyacyl-ACP antibiotic, and a cholesterol lowering formulation. 
In one dehydrase (DH) domains; (g) at least one chain length factor aspect, the endproduct is used to treat a condition selected 25 (CLF) domain; and (h) at least one malonyl-CoA:ACP acyl from the group consisting of chronic inflammation, acute transferase (MAT) domain. The genetic modification affects inflammation, gastrointestinal disorder, cancer, cachexia, the activity of the PUFA PKS system. In one aspect of this cardiac restenosis, neurodegenerative disorder, degenerative embodiment, the microorganism is a eukaryotic microorgan disorder of the liver, blood lipid disorder, osteoporosis, 1S. osteoarthritis, autoimmune disease, preeclampsia, preterm 30 Yet another embodiment of the present invention relates to birth, age related maculopathy, pulmonary disorder, and per a recombinant host cell which has been modified to express a oxisomal disorder. non-bacterial polyunsaturated fatty acid (PUFA) polyketide Yet another embodiment of the present invention relates to synthase (PKS) system, wherein the non-bacterial PUFA a method to produce a humanized animal milk, comprising PKS catalyzes both iterative and non-iterative enzymatic genetically modifying milk-producing cells of a milk-pro 35 reactions. The non-bacterial PUFA PKS system comprises: ducing animal with at least one recombinant nucleic acid (a) at least one enoyl ACP-reductase (ER) domain; (b) mul molecule comprising a nucleic acid sequence encoding at tiple acyl carrier protein (ACP) domains; (c) at least two least one biologically active domain of a PUFA PKS system. B-keto acyl-ACP synthase (KS) domains; (d) at least one The domain of the PUFAPKS system is encoded by any of the acyltransferase (AT) domain; (e) at least one ketoreductase nucleic acid sequences described above. 
40 (KR) domain; (f) at least two FabA-like B-hydroxyacyl-ACP Yet another embodiment of the present invention relates to dehydrase (DH) domains; (g) at least one chain length factor a method produce a recombinant microbe, comprising geneti (CLF) domain; and (h) at least one malonyl-CoA:ACP acyl cally modifying microbial cells to express at least one recom transferase (MAT) domain. binant nucleic acid molecule comprising a comprising a nucleic acid sequence encoding at least one biologically 45 BRIEF DESCRIPTION OF THE FIGURES active domain of a PUFA PKS system. The domain of the PUFA PKS system is encoded by any of the nucleic acid FIG. 1 is a graphical representation of the domain structure sequences described above. of the Schizochytrium PUFA PKS system. Yet another embodiment of the present invention relates to FIG. 2 shows a comparison of PKS domains from a recombinant host cell which has been modified to express a 50 Schizochytrium and Shewanella. polyunsaturated fatty acid (PUFA) polyketide synthase FIG. 3 shows a comparison of PKS domains from (PKS) system, wherein the PKS catalyzes both iterative and Schizochytrium and a related PKS system from Nostoc whose non-iterative enzymatic reactions. The PUFA PKS system product is a long chain fatty acid that does not contain any comprises: (a) at least two enoyl ACP-reductase (ER) double bonds. domains; (b) at least six acyl carrier protein (ACP) domains; 55 (c) at least two B-ketoacyl-ACP synthase (KS) domains; (d) DETAILED DESCRIPTION OF THE INVENTION at least one acyltransferase (AT) domain; (e) at least one ketoreductase (KR) domain; (f) at least two FabA-like B-hy The present invention generally relates to non-bacterial droxy acyl-ACP dehydrase (DH) domains; (g) at least one derived polyunsaturated fatty acid (PUFA) polyketide syn chain length factor (CLF) domain; and (h) at least one malo 60 thase (PKS) systems, to genetically modified organisms com nyl-CoA:ACP acyltransferase (MAT) domain. 
In one aspect, prising non-bacterial PUFA PKS systems, to methods of the PUFA PKS system is a eukaryotic PUFA PKS system. In making and using Such systems for the production of products another aspect, the PUFA PKS system is an algal PUFA PKS of interest, including bioactive molecules, and to novel meth system, and preferably a Thraustochytriales PUFA PKS sys ods for identifying new eukaryotic microorganisms having tem, which can include, but is not limited to, a Schizochytrium 65 such a PUFA PKS system. As used herein, a PUFA PKS PUFA PKS system or a Thraustochytrium PUFA PKS sys system generally has the following identifying features: (1) it tem. produces PUFAs as a natural product of the system; and (2) it US 7,842,796 B2 13 14 comprises several multifunctional proteins assembled into a alter the PUFA PKS genes, or combine portions of these complex that conducts both iterative processing of the fatty genes with other synthesis systems, including other PKS acid chain as well non-iterative processing, including trans systems, such that new products are produced. The inherent cis isomerization and enoyl reduction reactions in selected ability of this particular type of system to do both iterative and cycles (See FIG. 1, for example). 5 selective reactions will enable this system to yield products More specifically, first, a PUFA PKS system that forms the that would not be found if similar methods were applied to basis of this invention produces polyunsaturated fatty acids other types of PKS systems. (PUFAs) as products (i.e., an organism that endogenously In one embodiment, a PUFA PKS system according to the (naturally) contains such a PKS system makes PUFAs using present invention comprises at least the following biologi this system). 
The PUFAs referred to herein are preferably 10 polyunsaturated fatty acids with a carbon chain length of at cally active domains: (a) at least two enoyl ACP-reductase least 16 carbons, and more preferably at least 18 carbons, and (ER) domains; (b) at least six acyl carrier protein (ACP) more preferably at least 20 carbons, and more preferably 22 domains; (c) at least two B-keto acyl-ACP synthase (KS) or more carbons, with at least 3 or more double bonds, and domains; (d) at least one acyltransferase (AT) domain; (e) at preferably 4 or more, and more preferably 5 or more, and even 15 least one ketoreductase (KR) domain; (f) at least two FabA more preferably 6 or more double bonds, wherein all double like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at bonds are in the cis configuration. It is an object of the present least one chain length factor (CLF) domain; and (h) at least invention to find or create via genetic manipulation or one malonyl-CoA:ACP acyltransferase (MAT) domain. The manipulation of the endproduct, PKS systems which produce functions of these domains are generally individually known polyunsaturated fatty acids of desired chain length and with in the art and will be described in detail below with regard to desired numbers of double bonds. Examples of PUFAs the PUFA PKS system of the present invention. include, but are not limited to, DHA (docosahexaenoic acid In another embodiment, the PUFA PKS system comprises (C22:6, co-3)), DPA (docosapentaenoic acid (C22:5, (D-6)), at least the following biologically active domains: (a) at least and EPA (eicosapentaenoic acid (C20:5, co-3)). 
one enoyl ACP-reductase (ER) domain; (b) multiple acyl Second, the PUFA PKS system described herein incorpo 25 rates both iterative and non-iterative reactions, which distin carrier protein (ACP) domains (at least four, and preferably at guish the system from previously described PKS systems least five, and more preferably at least six, and even more (e.g., type I, type II or modular). More particularly, the PUFA preferably seven, eight, nine, or more than nine); (c) at least PKS system described herein contains domains that appear to two B-ketoacyl-ACP synthase (KS) domains; (d) at least one function during each cycle as well as those which appear to 30 acyltransferase (AT) domain; (e) at least one ketoreductase function during only some of the cycles. A key aspect of this (KR) domain; (f) at least two FabA-like B-hydroxyacyl-ACP may be related to the domains showing homology to the dehydrase (DH) domains; (g) at least one chain length factor bacterial Fab. A enzymes. For example, the Fab. A enzyme of (CLF) domain; and (h) at least one malonyl-CoA:ACP acyl E. coli has been shown to possess two enzymatic activities. It transferase (MAT) domain. Preferably, such a PUFA PKS possesses a dehydration activity in which a water molecule 35 system is a non-bacterial PUFA-PKS system. (H2O) is abstracted from a carbon chain containing a hydroxy In one embodiment, a PUFA PKS system of the present group, leaving a trans double bond in that carbon chain. In invention is a non-bacterial PUFA PKS system. In other addition, it has an isomerase activity in which the trans double words, in one embodiment, the PUFA PKS system of the bond is converted to the cis configuration. This isomerization present invention is isolated from an organism that is not a is accomplished in conjunction with a migration of the double 40 bacteria, or is a homologue of or derived from a PUFA PKS bond position to adjacent carbons. 
In PKS (and FAS) systems, system from an organism that is not a bacteria, such as a the main carbon chain is extended in 2 carbon increments. eukaryote or an archaebacterium. Eukaryotes are separated One can therefore predict the number of extension reactions from prokaryotes based on the degree of differentiation of the required to produce the PUFA products of these PKS systems. cells. The higher group with more differentiation is called For example, to produce DHA (C22:6, all cis) requires 10 45 eukaryotic. The lower group with less differentiated cells is extension reactions. Since there are only 6 double bonds in called prokaryotic. In general, prokaryotes do no possess a the end product, it means that during some of the reaction nuclear membrane, do not exhibit mitosis during cell divi cycles, a double bond is retained (as a cis isomer), and in Sion, have only one chromosome, their cytoplasm contains others, the double bond is reduced prior to the next extension. 70S ribosomes, they do not possess any mitochondria, endo Before the discovery of a PUFA PKS system in marine 50 plasmic reticulum, chloroplasts, lysosomes or golgi appara bacteria (see U.S. Pat. No. 6,140.486), PKS systems were not tus, their flagella (if present) consists of a single fibril. In known to possess this combination of iterative and selective contrast eukaryotes have a nuclear membrane, they do exhibit enzymatic reactions, and they were not thought of as being mitosis during cell division, they have many chromosomes, able to produce carbon-carbon double bonds in the cis con their cytoplasm contains 80S ribosomes, they do possess figuration. However, the PUFA PKS system described by the 55 mitochondria, endoplasmic reticulum, chloroplasts (in present invention has the capacity to introduce cis double algae), lysosomes and golgi apparatus, and their flagella (if bonds and the capacity to vary the reaction sequence in the present) consists of many fibrils. In general, bacteria are cycle. 
prokaryotes, while algae, fungi, , protozoa and higher Therefore, the present inventors propose to use these fea plants are eukaryotes. The PUFA PKS systems of the marine tures of the PUFAPKS system to produce a range of bioactive 60 bacteria (e.g., Shewanella and Vibrio marinus) are not the molecules that could not be produced by the previously basis of the present invention, although the present invention described (Type II, Type I and modular) PKS systems. These does contemplate the use of domains from these bacterial bioactive molecules include, but are limited to, polyunsatu PUFA PKS systems in conjunction with domains from the rated fatty acids (PUFAs), antibiotics or other bioactive com non-bacterial PUFA PKS systems of the present invention. pounds, many of which will be discussed below. For example, 65 For example, according to the present invention, genetically using the knowledge of the PUFA PKS gene structures modified organisms can be produced which incorporate non described herein, any of a number of methods can be used to bacterial PUFA PKS functional domains with bacteria PUFA US 7,842,796 B2 15 16 PKS functional domains, as well as PKS functional domains therein as SEQ ID NO:76 and incorrectly designated as a or proteins from other PKS systems (type I, type II, modular) partial open reading frame) match the entire sequence (plus or FAS systems. the stop codon) of the sequence denoted hereinas Orf (SEQ Schizochytrium is a Thraustochytrid marine microorgan ID NO:5). ism that accumulates large quantities of triacylglycerols rich 5 Further sequencing of cDNA and genomic clones by the in DHA and docosapentaenoic acid (DPA; 22:5 (0-6); e.g., present inventors allowed the identification of the full-length 30% DHA+DPA by dry weight (Barclay et al., J. Appl. Phy genomic sequence of each of OrfA, OrfB and Orf and the col. 6, 123 (1994)). 
In eukaryotes that synthesize 20- and complete identification of the domains with homology to 22-carbon PUFAs by an elongation/desaturation pathway, the those in Shewanella (see FIG. 2). It is noted that in pools of 18-, 20- and 22-carbon intermediates are relatively 10 Schizochytrium, the genomic DNA and cDNA are identical, large so that in vivo labeling experiments using 'C-acetate due to the lack of introns in the organism genome, to the best reveal clear precursor-product kinetics for the predicted inter of the present inventors knowledge. Therefore, reference to a mediates (Gellerman et al., Biochim. Biophys. Acta 573:23 nucleotide sequence from Schizochytrium can refer to (1979)). Furthermore, radiolabeled intermediates provided genomic DNA or cDNA. Based on the comparison of the exogenously to Such organisms are converted to the final 15 Schizochytrium PKS domains to Shewanella, clearly, the PUFA products. The present inventors have shown that Schizochytrium genome encodes proteins that are highly 1-C-acetate was rapidly taken up by Schizochytrium cells similar to the proteins in Shewanella that are capable of and incorporated into fatty acids, but at the shortest labeling catalyzing EPA synthesis. The proteins in Schizochytrium time (1 min), DHA contained 31% of the label recovered in constitute a PUFA PKS system that catalyzes DHA and DPA fatty acids, and this percentage remained essentially synthesis. As discussed in detail herein, simple modification unchanged during the 10-15 min of ''C-acetate incorpora of the reaction scheme identified for Shewanella will allow tion and the subsequent 24 hours of culture growth (See for DHA synthesis in Schizochytrium. The homology Example 3). Similarly, DPA represented 10% of the label between the prokaryotic Shewanella and eukaryotic throughout the experiment. 
There is no evidence for a precur Schizochytrium genes suggests that the PUFAPKS has under sor-product relationship between 16- or 18-carbon fatty acids 25 gone lateral gene transfer. and the 22-carbon polyunsaturated fatty acids. These results FIG. 1 is a graphical representation of the three open read are consistent with rapid synthesis of DHA from C-ac ing frames from the Schizochytrium PUFA PKS system, and etate involving very Small (possibly enzyme-bound) pools of includes the domain structure of this PUFA PKS system. As intermediates. A cell-free homogenate derived from described in Example 1 below, the domain structure of each Schizochytrium cultures incorporated 1-''C-malonyl-CoA 30 open reading frame is as follows: into DHA, DPA, and saturated fatty acids. The same biosyn thetic activities were retained by a 100,000xg supernatant Open Reading Frame A (OrfA): fraction but were not present in the membrane pellet. Thus, The complete nucleotide sequence for OrfA is represented DHA and DPA synthesis in Schizochytrium does not involve herein as SEQID NO:1. Nucleotides 4677-8730 of SEQ ID membrane-bound desaturases or fatty acid elongation 35 NO:1 correspond to nucleotides 390-4443 of the sequence enzymes like those described for other eukaryotes (Parker denoted as SEQ ID NO:69 in U.S. application Ser. No. Barnes et al., 2000, supra; Shanklin et al., 1998, supra). These 09/231,899. Therefore, nucleotides 1-4676 of SEQID NO:1 fractionation data contrast with those obtained from the represent additional sequence that was not disclosed in U.S. Shewanella enzymes (See Metz et al., 2001, supra) and may application Ser. No. 09/231,899. This novel region of SEQID indicate use of a different (soluble) acyl acceptor molecule, 40 NO:1 encodes the following domains in OrfA: (1) the ORFA such as CoA, by the Schizochytrium enzyme. KS domain; (2) the ORFA-MAT domain; and (3) at least a In copending U.S. application Ser. No. 
09/231,899, a portion of the ACP domain region (e.g., at least ACP domains cDNA library from Schizochytrium was constructed and 1-4). It is noted that nucleotides 1-389 of SEQID NO:69 in approximately 8,000 random clones (ESTs) were sequenced. U.S. application Ser. No. 09/231,899 do not match with the Within this dataset, only one moderately expressed gene 45 389 nucleotides that are upstream of position 4677 in SEQID (0.3% of all sequences) was identified as a fatty acid desatu NO:1 disclosed herein. Therefore, positions 1-389 of SEQID rase, although a second putative desaturase was represented NO:69 in U.S. application Ser. No. 09/231,899 appear to be by a single clone (0.01%). By contrast, sequences that exhib incorrectly placed next to nucleotides 390-4443 of that ited homology to 8 of the 11 domains of the Shewanella PKS sequence. Most of these first 389 nucleotides (about positions genes shown in FIG. 2 were all identified at frequencies of 50 60–389) are a match with an upstream portion of OrfA (SEQ 0.2-0.5%. In U.S. application Ser. No. 09/231,899, several ID NO: 1) of the present invention and therefore, it is believed cDNA clones showing homology to the Shewanella PKS that an error occurred in the effort to prepare the contig of the genes were sequenced, and various clones were assembled cDNA constructs in U.S. application Ser. No. 09/231,899. into nucleic acid sequences representing two partial open The region in which the alignment error occurred in U.S. reading frames and one complete open reading frame. Nucle 55 application Ser. No. 09/231,899 is within the region of highly otides 390-4443 of the cDNA sequence containing the first repetitive sequence (i.e., the ACP region, discussed below), partial open reading frame described in U.S. application Ser. which probably created some confusion in the assembly of No. 09/231,899 (denoted therein as SEQID NO:69) match that sequence from various cloNA clones. 
nucleotides 4677-8730 (plus the stop codon) of the sequence OrfA is a 8730 nucleotide sequence (not including the stop denoted herein as OrfA (SEQID NO:1). Nucleotides 1-4876 60 codon) which encodes a 2910 amino acid sequence, repre of the cDNA sequence containing the second partial open sented herein as SEQ ID NO:2. Within OrfA are twelve reading frame described in U.S. application Ser. No. 09/231, domains: (a) one B-ketoacyl-ACP synthase (KS) domain; (b) 899 (denoted therein as SEQID NO:71) matches nucleotides one malonyl-CoA:ACP acyltransferase (MAT) domain; (c) 1311-6177 (plus the stop codon) of the sequence denoted nine acyl carrier protein (ACP) domains; and (d) one ketore hereinas OrfB (SEQID NO:3). Nucleotides 145-4653 of the 65 ductase (KR) domain. cDNA sequence containing the complete open reading frame The nucleotide sequence for OrfA has been deposited with described in U.S. application Ser. No. 09/231,899 (denoted GenBank as Accession No. AF378327 (amino acid sequence US 7,842,796 B2 17 18 Accession No. AAK728879). OrfA was compared with ID NO:1). The amino acid sequence containing the MAT known sequences in a standard BLAST search (BLAST 2.0 domain spans from a starting point of between about posi Basic BLAST homology search using blastp for amino acid tions 575 and 600 of SEQIDNO:2 (ORFA) to an ending point searches, blastin for nucleic acid searches, and blastX for of between about positions 935 and 1000 of SEQ ID NO:2. nucleic acid searches and searches of the translated amino 5 The amino acid sequence containing the ORFA-MAT domain acid sequence in all 6 open reading frames with standard is represented herein as SEQID NO:10 (positions 575-1000 default parameters, wherein the query sequence is filtered for of SEQ ID NO:2). It is noted that the ORFA-MAT domain low complexity regions by default (described in Altschul, S. contains an active site motif: GHS*XG (acyl binding site F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z. 
So), represented herein as SEQID NO:11. Miller, W. & Lipman, D. J. (1997) “Gapped BLAST and 10 According to the present invention, a domain or protein PSI-BLAST: a new generation of protein database search having malonyl-CoA:ACP acyltransferase (MAT) biological programs. Nucleic Acids Res. 25:3389-3402, incorporated activity (function) is characterized as one that transfers the herein by reference in its entirety)). At the nucleic acid level, malonyl moiety from malonyl-CoA to ACP. In addition to the OrfA has no significant homology to any known nucleotide active site motif (GXSXG), these enzymes possess an sequence. At the amino acid level, the sequences with the 15 extended Motif R and Q amino acids in key positions) that greatest degree of homology to ORFA were: Nostoc sp. 7120 identifies them as MAT enzymes (in contrast to the AT heterocyst glycolipid synthase (Accession No. domain of Schizochytrium OrfB). In some PKS systems (but NC 003272), which was 42% identical to ORFA over 1001 not the PUFAPKS domain) MAT domains will preferentially amino acid residues; and Moritella marinus (Vibrio marinus) load methyl- or ethyl-malonate on to the ACP group (from the ORF8 (Accession No. AB025342), which was 40% identical corresponding CoA ester), thereby introducing branches into to ORFA over 993 amino acid residues. the linear carbon chain. MAT domains can be recognized by The first domain in OrfA is a KS domain, also referred to their homology to known MAT sequences and by their herein as ORFA-KS. This domain is contained within the extended motif structure. nucleotide sequence spanning from a starting point of Domains 3-11 of OrfA are nine tandem ACP domains, also between about positions 1 and 40 of SEQID NO:1 (OrfA) to 25 referred to herein as ORFA-ACP (the first domain in the an ending point of between about positions 1428 and 1500 of sequence is ORFA-ACP1, the second domain is ORFA SEQ ID NO:1. 
The nucleotide sequence containing the ACP2, the third domain is ORFA-ACP3, etc.). The first ACP sequence encoding the ORFA-KS domain is represented domain, ORFA-ACP1, is contained within the nucleotide herein as SEQID NO:7 (positions 1-1500 of SEQID NO:1). sequence spanning from about position 3343 to about posi The amino acid sequence containing the KS domain spans 30 tion 3600 of SEQID NO:1 (OrfA). The nucleotide sequence from a starting point of between about positions 1 and 14 of containing the sequence encoding the ORFA-ACP1 domain SEQID NO:2 (ORFA) to an ending point of between about is represented hereinas SEQID NO:12 (positions 3343-3600 positions 476 and 500 of SEQ ID NO:2. The amino acid of SEQ ID NO:1). The amino acid sequence containing the sequence containing the ORFA-KS domain is represented first ACP domain spans from about position 1115 to about herein as SEQID NO:8 (positions 1-500 of SEQID NO:2). It 35 position 1200 of SEQ ID NO:2. The amino acid sequence is noted that the ORFA-KS domain contains an active site containing the ORFA-ACP1 domain is represented herein as motif: DXAC* (acyl binding site Cs). SEQID NO:13 (positions 1115-1200 of SEQID NO:2). It is According to the present invention, a domain or protein noted that the ORFA-ACP1 domain contains an active site having 3-keto acyl-ACP synthase (KS) biological activity motif: LGIDS (pantetheline binding motif Ss.), repre (function) is characterized as the enzyme that carries out the 40 sented herein by SEQID NO:14. initial step of the FAS (and PKS) elongation reaction cycle. The nucleotide and amino acid sequences of all nine ACP The acyl group destined for elongation is linked to a cysteine domains are highly conserved and therefore, the sequence for residue at the active site of the enzyme by a thioester bond. In each domain is not represented herein by an individual the multi-step reaction, the acyl-enzyme undergoes conden sequence identifier. 
However, based on the information dis sation with malonyl-ACP to form -ketoacyl-ACP, CO and 45 closed herein, one of skill in the art can readily determine the free enzyme. The KS plays a key role in the elongation cycle sequence containing each of the other eight ACP domains and in many systems has been shown to possess greater (see discussion below). Substrate specificity than other enzymes of the reaction cycle. All nine ACP domains together span a region of OrfA of For example, E. coli has three distinct KS enzymes—each from about position 3283 to about position 6288 of SEQID with its own particular role in the physiology of the organism 50 NO:1, which corresponds to amino acid positions of from (Magnuson et al., Microbiol. Rev. 57, 522 (1993)). The two about 1095 to about 2096 of SEQ ID NO:2. The nucleotide KS domains of the PUFA-PKS systems could have distinct sequence for the entire ACP region containing all nine roles in the PUFA biosynthetic reaction sequence. domains is represented herein as SEQID NO:16. The region As a class of enzymes, KS's have been well characterized. represented by SEQID NO:16 includes the linker segments The sequences of many verified KS genes are know, the active 55 between individual ACP domains. The repeat interval for the site motifs have been identified and the crystal structures of nine domains is approximately every 330 nucleotides of SEQ several have been determined. Proteins (or domains of pro ID NO:16 (the actual number of amino acids measured teins) can be readily identified as belonging to the KS family between adjacent active site serines ranges from 104 to 116 of enzymes by homology to known KS sequences. amino acids). Each of the nine ACP domains contains a pan The second domain in OrfA is a MAT domain, also referred 60 tetheline binding motif LGIDS (represented herein by SEQ to hereinas ORFA-MAT. 
This domain is contained within the ID NO:14), wherein S is the pantetheine binding site nucleotide sequence spanning from a starting point of serine (S). The pantetheline binding site serine (S) is located between about positions 1723 and 1798 of SEQ ID NO:1 near the center of each ACP domain sequence. At each end of (OrfA) to an ending point of between about positions 2805 the ACP domain region and between each ACP domain is a and 3000 of SEQID NO:1. The nucleotide sequence contain 65 region that is highly enriched for proline (P) and alanine (A), ing the sequence encoding the ORFA-MAT domain is repre which is believed to be a linker region. For example, between sented herein as SEQID NO:9 (positions 1723-3000 of SEQ ACP domains 1 and 2 is the sequence: APAPVKAAA US 7,842,796 B2 19 20 PAAPVASAPAPA, represented herein as SEQ ID NO:15. 6177 of SEQID NO:3 correspond to nucleotides 1-2932 and The locations of the active site serine residues (i.e., the pan 2934-4867 of the sequence denoted as SEQID NO:71 in U.S. tetheline binding site) for each of the nine ACP domains, with application Ser. No. 09/231,899 (The cDNA sequence in U.S. respect to the amino acid sequence of SEQID NO:2, are as application Ser. No. 09/231,899 contains about 345 addi follows: ACP1 =Ss: ACP2=S: ACP3-S77; 5 tional nucleotides beyond the stop codon, including a polyA ACP4-Sass: ACP5-Sea; ACP6=Szs: ACP7=Ss: tail). Therefore, nucleotides 1-1310 of SEQID NO:1 repre ACP8–So, and ACP9–S. Given that the average size of sent additional sequence that was not disclosed in U.S. appli an ACP domain is about 85 amino acids, excluding the linker, cation Ser. No. 09/231,899. This novel region of SEQ ID and about 110 amino acids including the linker, with the NO:3 contains most of the KS domain encoded by OrfB. 
active site serine being approximately in the center of the 10 OrfB is a 6177 nucleotide sequence (not including the stop domain, one of skill in the art can readily determine the codon) which encodes a 2059 amino acid sequence, repre positions of each of the nine ACP domains in OrfA. sented herein as SEQ ID NO:4. Within OrfB are four According to the present invention, a domain or protein domains: (a) one B-ketoacyl-ACP synthase (KS) domain; (b) having acyl carrier protein (ACP) biological activity (func one chain length factor (CLF) domain; (c) one acyl trans tion) is characterized as being Small polypeptides (typically, 15 ferase (AT) domain; and, (d) one enoyl ACP-reductase (ER) 80 to 100 amino acids long), that function as carriers for domain. growing fatty acyl chains via a thioester linkage to a The nucleotide sequence for OrfB has been deposited with covalently bound co-factor of the protein. They occur as GenBank as Accession No. AF378328 (amino acid sequence separate units or as domains within larger proteins. ACPs are Accession No. AAK728880). OrfB was compared with converted from inactive apo-forms to functional holo-forms known sequences in a standard BLAST search as described by transfer of the phosphopantetheinyl moeity of CoA to a above. At the nucleic acid level, OrfB has no significant highly conserved serine residue of the ACP. Acyl groups are homology to any known nucleotide sequence. At the amino attached to ACP by a thioester linkage at the free terminus of acid level, the sequences with the greatest degree of homol the phosphopantetheinyl moiety. ACPs can be identified by ogy to ORFB were: Shewanella sp. hypothetical protein (Ac labeling with radioactive pantetheline and by sequence 25 cession No. U73935), which was 53% identical to ORFB over homology to known ACPs. The presence of variations of the 458 amino acid residues; Moritella marinus (Vibrio marinus) above mentioned motif (LGIDS) is also a signature of an ORF11 (Accession No. 
AB025342), which was 53% identi ACP. cal to ORFB over 460 amino acid residues; Photobacterium Domain 12 in OrfA is a KR domain, also referred to herein profundum omega-3 polyunsaturated fatty acid synthase as ORFA-KR. This domain is contained within the nucleotide 30 PfalD (Accession No. AF409100), which was 52% identical sequence spanning from a starting point of about position to ORFB over 457 amino acid residues; and Nostoc sp. 7120 6598 of SEQID NO:1 to an ending point of about position hypothetical protein (Accession No. NC 003272), which 8730 of SEQID NO:1. The nucleotide sequence containing was 53% identical to ORFB over 430 amino acid residues. the sequence encoding the ORFA-KR domain is represented The first domain in OrfB is a KS domain, also referred to herein as SEQ ID NO:17 (positions 6598-8730 of SEQ ID 35 herein as ORFB-KS. This domain is contained within the NO:1). The amino acid sequence containing the KR domain nucleotide sequence spanning from a starting point of spans from a starting point of about position 2200 of SEQID between about positions 1 and 43 of SEQID NO:3 (OrfB) to NO:2 (ORFA) to an ending point of about position 2910 of an ending point of between about positions 1332 and 1350 of SEQ ID NO:2. The amino acid sequence containing the SEQ ID NO:3. The nucleotide sequence containing the ORFA-KR domain is represented herein as SEQ ID NO:18 40 sequence encoding the ORFB-KS domain is represented (positions 2200-2910 of SEQ ID NO:2). Within the KR hereinas SEQIDNO:19 (positions 1-1350 of SEQIDNO:3). domain is a core region with homology to short chain alde The amino acid sequence containing the KS domain spans hyde-dehydrogenases (KR is a member of this family). This from a starting point of between about positions 1 and 15 of core region spans from about position 7198 to about position SEQ ID NO:4 (ORFB) to an ending point of between about 7500 of SEQ ID NO:1, which corresponds to amino acid 45 positions 444 and 450 of SEQ ID NO:4. 
The amino acid positions 2400-2500 of SEQID NO:2. sequence containing the ORFB-KS domain is represented According to the present invention, a domain or protein herein as SEQID NO:20 (positions 1-450 of SEQID NO:4). having ketoreductase activity, also referred to as 3-ketoacyl It is noted that the ORFB-KS domain contains an active site ACP reductase (KR) biological activity (function), is charac motif: DXAC* (*acyl binding site Co.). KS biological activ terized as one that catalyzes the pyridine-nucleotide-depen 50 ity and methods of identifying proteins or domains having dent reduction of 3-keto acyl forms of ACP. It is the first such activity is described above. reductive step in the de novo fatty acid biosynthesis elonga The second domain in OrfB is a CLF domain, also referred tion cycle and a reaction often performed in polyketide bio to hereinas ORFB-CLF. This domain is contained within the synthesis. Significant sequence similarity is observed with nucleotide sequence spanning from a starting point of one family of enoyl ACP reductases (ER), the other reductase 55 between about positions 1378 and 1402 of SEQ ID NO:3 of FAS (but not the ER family present in the PUFA PKS (OrfB) to an ending point of between about positions 2682 system), and the short-chain alcohol dehydrogenase family. and 2700 of SEQID NO:3. The nucleotide sequence contain Pfam analysis of the PUFA PKS region indicated above ing the sequence encoding the ORFB-CLF domain is repre reveals the homology to the short-chain alcohol dehydroge sented hereinas SEQID NO:21 (positions 1378-2700 of SEQ nase family in the core region. Blast analysis of the same 60 ID NO:3). The amino acid sequence containing the CLF region reveals matches in the core area to known KR enzymes domain spans from a starting point of between about posi as well as an extended region of homology to domains from tions 460 and 468 of SEQIDNO:4(ORFB) to an ending point the other characterized PUFA PKS systems. 
of between about positions 894 and 900 of SEQID NO:4. The amino acid sequence containing the ORFB-CLF domain is Open Reading Frame B (OrfB): 65 represented herein as SEQ ID NO:22 (positions 460-900 of The complete nucleotide sequence for OrfB is represented SEQ ID NO:4). It is noted that the ORFB-CLF domain con herein as SEQ ID NO:3. Nucleotides 1311-4242 and 4244 tains a KS active site motif without the acyl-binding cysteine. US 7,842,796 B2 21 22 According to the present invention, a domain or protein is taining the ER domain spans from a starting point of about referred to as a chain length factor (CLF) based on the fol position 1550 of SEQID NO:4 (ORFB) to an ending point of lowing rationale. The CLF was originally described as char about position 2059 of SEQ ID NO:4. The amino acid acteristic of Type II (dissociated enzymes) PKS systems and sequence containing the ORFB-ER domain is represented was hypothesized to play a role in determining the number of herein as SEQ ID NO:26 (positions 1550-2059 of SEQ ID elongation cycles, and hence the chain length, of the end NO:4). product. CLF amino acid sequences show homology to KS According to the present invention, this domain has enoyl domains (and are thought to form heterodimers with a KS reductase (ER) biological activity. The ER enzyme reduces protein), but they lack the active site cysteine. CLF's role in the trans-double bond (introduced by the DH activity) in the PKS systems is currently controversial. New evidence (C. 10 fatty acyl-ACP, resulting in fully saturating those carbons. Bisang et al., Nature 401, 502 (1999)) suggests a role in The ER domain in the PUFA-PKS shows homology to a priming (providing the initial acyl group to be elongated) the newly characterized family of ER enzymes (Heath et al., PKS systems. In this role the CLF domain is thought to Nature 406, 145 (2000)). 
Heath and Rock identified this new decarboxylate malonate (as malonyl-ACP), thus forming an class of ER enzymes by cloning a gene of interest from acetate group that can be transferred to the KS active site. This 15 Streptococcus pneumoniae, purifying a protein expressed acetate therefore acts as the priming molecule that can from that gene, and showing that it had ER activity in an in undergo the initial elongation (condensation) reaction. vitro assay. The sequence of the Schizochytrium ER domain Homologues of the Type II CLF have been identified as of OrfB shows homology to the S. pneumoniae ER protein. loading domains in Some modular PKS systems. A domain All of the PUFA PKS systems currently examined contain at with the sequence features of the CLF is found in all currently least one domain with very high sequence homology to the identified PUFA PKS systems and in each case is found as Schizochytrium ER domain. The Schizochytrium PUFA PKS part of a multidomain protein. system contains two ER domains (one on OrfB and one on The third domain in OrfB is an AT domain, also referred to Orf(). herein as ORFB-AT. This domain is contained within the nucleotide sequence spanning from a starting point of 25 Open Reading Frame C (OrfQ): between about positions 2701 and 3598 of SEQ ID NO:3 The complete nucleotide sequence for Orfc is represented (OrfB) to an ending point of between about positions 3975 hereinas SEQIDNO:5. Nucleotides 1-4506 of SEQID NO:5 and 4200 of SEQID NO:3. The nucleotide sequence contain (i.e., the entire open reading frame sequence, not including ing the sequence encoding the ORFB-AT domain is repre the stop codon) correspond to nucleotides 145-2768,2770 sented hereinas SEQID NO:23 (positions 2701-4200 of SEQ 30 2805, 2807-2817, and 2819-4653 of the sequence denoted as ID NO:3). The amino acid sequence containing the AT SEQID NO:76 in U.S. application Ser. No. 
09/231,899 (The domain spans from a starting point of between about posi cDNA sequence in U.S. application Ser. No. 09/231,899 con tions 901 and 1200 of SEQ ID NO:4 (ORFB) to an ending tains about 144 nucleotides upstream of the start codon for point of between about positions 1325 and 1400 of SEQID OrfQ and about 110 nucleotides beyond the stop codon, NO:4. The amino acid sequence containing the ORFB-AT 35 including a polyA tail). Orf is a 4506 nucleotide sequence domain is represented herein as SEQ ID NO:24 (positions (not including the stop codon) which encodes a 1502 amino 901-1400 of SEQ ID NO:4). It is noted that the ORFB-AT acid sequence, represented herein as SEQ ID NO:6. Within domain contains an active site motif of GXS*XG (acyl bind OrfQ are three domains: (a) two FabA-like B-hydroxyacyl ing site Sao) that is characteristic of acyltransferase (AT) ACP dehydrase (DH) domains; and (b) one enoyl ACP-re proteins. 40 ductase (ER) domain. An “acyltransferase' or “AT” refers to a general class of The nucleotide sequence for Orf has been deposited with enzymes that can carry out a number of distinct acyl transfer GenBank as Accession No. AF3783.29 (amino acid sequence reactions. The Schizochytrium domain shows good homology Accession No. AAK728881). Orf was compared with to a domain present in all of the other PUFA PKS systems known sequences in a standard BLAST search as described currently examined and very weak homology to some acyl 45 above. At the nucleic acid level, Orf has no significant transferases whose specific functions have been identified homology to any known nucleotide sequence. At the amino (e.g. to malonyl-CoA:ACP acyltransferase, MAT). In spite of acid level (Blastp), the sequences with the greatest degree of the weak homology to MAT, this AT domain is not believed to homology to ORFC were: Moritella marinus (Vibrio mari function as a MAT because it does not possess an extended nus) ORF11 (Accession No. 
ABO25342), which is 45% iden motif structure characteristic of such enzymes (see MAT 50 tical to ORFC over 514 amino acid residues. Shewanella sp. domain description, above). For the purposes of this disclo hypothetical protein 8 (Accession No. U73935), which is sure, the functions of the AT domain in a PUFA PKS system 49% identical to ORFC over 447 amino acid residues, Nostoc include, but are not limited to: transfer of the fatty acyl group sp. hypothetical protein (Accession No. NC 003272), which from the ORFAACP domain(s) to water (i.e. a thioesterase— is 49% identical to ORFC over 430 amino acid residues, and releasing the fatty acyl group as a free fatty acid), transfer of 55 Shewanella sp. hypothetical protein 7 (Accession No. a fatty acyl group to an acceptor Such as CoA, transfer of the U73935), which is 37% identical to ORFC over 930 amino acyl group among the various ACP domains, or transfer of the acid residues. fatty acyl group to a lipophilic acceptor molecule (e.g. to The first domain in Orf is a DH domain, also referred to lysophosphadic acid). herein as ORFC-DH1. This is one of two DH domains in The fourth domain in OrfB is an ER domain, also referred 60 OrfQ, and therefore is designated DH1. This domain is con to herein as ORFB-ER. This domain is contained within the tained within the nucleotide sequence spanning from a start nucleotide sequence spanning from a starting point of about ing point of between about positions 1 and 778 of SEQ ID position 4648 of SEQID NO:3 (OrfB) to an ending point of NO:5 (OrfQ) to an ending point of between about positions about position 6177 of SEQ ID NO:3. The nucleotide 1233 and 1350 of SEQ ID NO:5. The nucleotide sequence sequence containing the sequence encoding the ORFB-ER 65 containing the sequence encoding the ORFC-DH1 domain is domain is represented herein as SEQ ID NO:25 (positions represented herein as SEQ ID NO:27 (positions 1-1350 of 4648-6177 of SEQID NO:3). The amino acid sequence con SEQID NO:5). 
The amino acid sequence containing the DH1 domain spans from a starting point of between about positions 1 and 260 of SEQ ID NO:6 (ORFC) to an ending point of between about positions 411 and 450 of SEQ ID NO:6. The amino acid sequence containing the ORFC-DH1 domain is represented herein as SEQ ID NO:28 (positions 1-450 of SEQ ID NO:6).

The characteristics of both the DH domains (see below for DH2) in the PUFA PKS systems have been described in the preceding sections. This class of enzyme removes HOH from a β-keto acyl-ACP and leaves a trans double bond in the carbon chain. The DH domains of the PUFA PKS systems show homology to bacterial DH enzymes associated with their FAS systems (rather than to the DH domains of other PKS systems). A subset of bacterial DH's, the FabA-like DHs, possesses cis-trans isomerase activity (Heath et al., J. Biol. Chem., 271, 27795 (1996)). It is the homologies to the

ing an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of said amino acid sequence of (a), wherein said amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; (d) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to said amino acid sequence of (b), wherein said amino acid sequence has a biological activity of at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system; or (e) a nucleic acid sequence that is fully complementary to the nucleic acid sequence of (a), (b), (c), or (d). In a further embodiment, nucleic acid sequences including a sequence encoding the active site domains or other functional motifs described above for several of the PUFA PKS domains are encompassed by the invention.
According to the present invention, an amino acid sequence that has a biological activity of at least one domain of a PUFA PKS system is an amino acid sequence that has the biological activity of at least one domain of the PUFA PKS system described in detail herein, as exemplified by the Schizochytrium PUFA PKS system. The biological activities of the various domains within the Schizochytrium PUFA PKS system have been described in detail above. Therefore, an isolated nucleic acid molecule of the present invention can encode the translation product of any PUFA PKS open reading frame, PUFA PKS domain, biologically active fragment thereof, or any homologue of a naturally occurring PUFA PKS open reading frame or domain which has biological activity. A homologue of a given protein or domain is a protein or polypeptide that has an amino acid sequence which differs from the naturally occurring reference amino acid sequence

FabA-like DHs that indicate that one or both of the DH domains is responsible for insertion of the cis double bonds in the PUFA PKS products.

The second domain in OrfC is a DH domain, also referred to herein as ORFC-DH2. This is the second of two DH domains in OrfC, and therefore is designated DH2. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1351 and 2437 of SEQ ID NO:5 (OrfC) to an ending point of between about positions 2607 and 2847 of SEQ ID NO:5. The nucleotide sequence containing the sequence encoding the ORFC-DH2 domain is represented herein as SEQ ID NO:29 (positions 1351-2847 of SEQ ID NO:5). The amino acid sequence containing the DH2 domain spans from a starting point of between about positions 451 and 813 of SEQ ID NO:6 (ORFC) to an ending point of between about positions 869 and 949 of SEQ ID NO:6.
(i.e., of the reference protein or domain) in that at least one or a few, but not limited to one or a few, amino acids have been deleted (e.g., a truncated version of the protein, such as a peptide or fragment), inserted, inverted, substituted and/or derivatized (e.g., by glycosylation, phosphorylation, acetylation, myristoylation, prenylation, palmitation, amidation and/or addition of glycosylphosphatidyl inositol). Preferred homologues of a PUFA PKS protein or domain are described in detail below. It is noted that homologues can include synthetically produced homologues, naturally occurring allelic variants of a given protein or domain, or homologous sequences from organisms other than the organism from which the reference sequence was derived.

In general, the biological activity or biological action of a protein or domain refers to any function(s) exhibited or performed by the protein or domain that is ascribed to the naturally occurring form of the protein or domain as measured or

The amino acid sequence containing the ORFC-DH2 domain is represented herein as SEQ ID NO:30 (positions 451-949 of SEQ ID NO:6). DH biological activity has been described above.

The third domain in OrfC is an ER domain, also referred to herein as ORFC-ER. This domain is contained within the nucleotide sequence spanning from a starting point of about position 2995 of SEQ ID NO:5 (OrfC) to an ending point of about position 4506 of SEQ ID NO:5. The nucleotide sequence containing the sequence encoding the ORFC-ER domain is represented herein as SEQ ID NO:31 (positions 2995-4506 of SEQ ID NO:5). The amino acid sequence containing the ER domain spans from a starting point of about position 999 of SEQ ID NO:6 (ORFC) to an ending point of about position 1502 of SEQ ID NO:6. The amino acid sequence containing the ORFC-ER domain is represented herein as SEQ ID NO:32 (positions 999-1502 of SEQ ID NO:6). ER biological activity has been described above.
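The nucleotide and amino acid coordinates quoted for the OrfC domains are related by simple codon arithmetic: a nucleotide at position n of the coding sequence falls in codon ceil(n/3). The following is a minimal sketch of that check, assuming 1-based positions and a reading frame that begins at nucleotide 1 of SEQ ID NO:5. The function and table names are illustrative only and do not come from the specification; the coordinates are the outer boundaries given for SEQ ID NO:27 to SEQ ID NO:32.

```python
def nt_to_aa(nt_pos: int) -> int:
    """Map a 1-based nucleotide position to its 1-based codon (amino acid) index."""
    if nt_pos < 1:
        raise ValueError("positions are 1-based")
    return (nt_pos + 2) // 3  # integer form of ceil(nt_pos / 3)


# Hypothetical lookup table: (nt start, nt end) in SEQ ID NO:5 and
# (aa start, aa end) in SEQ ID NO:6, as quoted in the text.
ORFC_DOMAINS = {
    "ORFC-DH1": ((1, 1350), (1, 450)),
    "ORFC-DH2": ((1351, 2847), (451, 949)),
    "ORFC-ER": ((2995, 4506), (999, 1502)),
}


def check_domains(domains=ORFC_DOMAINS) -> dict:
    """Return, per domain, whether the nucleotide span maps onto the stated residue span."""
    return {
        name: (nt_to_aa(nt[0]), nt_to_aa(nt[1])) == aa
        for name, (nt, aa) in domains.items()
    }
```

Under these assumptions every stated nucleotide boundary maps onto the stated residue boundary, for example nucleotide 2995 onto residue 999 and nucleotide 4506 onto residue 1502.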
One embodiment of the present invention relates to an isolated nucleic acid molecule comprising a nucleic acid sequence from a non-bacterial PUFA PKS system, a homologue thereof, a fragment thereof, and/or a nucleic acid sequence that is complementary to any of such nucleic acid sequences. In one aspect, the present invention relates to an isolated nucleic acid molecule comprising a nucleic acid sequence selected from the group consisting of: (a) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (b) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20,

observed in vivo (i.e., in the natural physiological environment of the protein) or in vitro (i.e., under laboratory conditions). Biological activities of PUFA PKS systems and the individual proteins/domains that make up a PUFA PKS system have been described in detail elsewhere herein. Modifications of a protein or domain, such as in a homologue or mimetic (discussed below), may result in proteins or domains having the same biological activity as the naturally occurring protein or domain, or in proteins or domains having decreased or increased biological activity as compared to the naturally occurring protein or domain. Modifications which result in a decrease in expression or a decrease in the activity of the protein or domain, can be referred to as inactivation (complete or partial), down-regulation, or decreased action of a protein or domain.
SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (c) a nucleic acid sequence encod-

with the complementary sequence of a nucleic acid molecule useful in the present invention, or of a size sufficient to encode an amino acid sequence having a biological activity of at least one domain of a PUFA PKS system according to the present invention. As such, the size of the nucleic acid molecule encoding such a protein can be dependent on nucleic acid composition and percent homology or identity between the nucleic acid molecule and complementary sequence as well as upon hybridization conditions per se (e.g., temperature, salt concentration, and formamide concentration). The minimal size of a nucleic acid molecule that is used as an oligonucleotide primer or as a probe is typically at least about 12 to about 15 nucleotides in length if the nucleic acid molecules

Similarly, modifications which result in an increase in expression or an increase in the activity of the protein or domain, can be referred to as amplification, overproduction, activation, enhancement, up-regulation or increased action of a protein or domain. A functional domain of a PUFA PKS system is a domain (i.e., a domain can be a portion of a protein) that is capable of performing a biological function (i.e., has biological activity).

In accordance with the present invention, an isolated nucleic acid molecule is a nucleic acid molecule that has been removed from its natural milieu (i.e., that has been subject to human manipulation), its natural milieu being the genome or chromosome in which the nucleic acid molecule is found in nature. As such, "isolated" does not necessarily reflect the extent to which the nucleic acid molecule has been purified, but indicates that the molecule does not include an entire genome or an entire chromosome in which the nucleic acid molecule is found in nature.
are GC-rich and at least about 15 to about 18 bases in length if they are AT-rich. There is no limit, other than a practical limit, on the maximal size of a nucleic acid molecule of the present invention, in that the nucleic acid molecule can include a sequence sufficient to encode a biologically active fragment of a domain of a PUFA PKS system, an entire domain of a PUFA PKS system, several domains within an open reading frame (Orf) of a PUFA PKS system, an entire Orf of a PUFA PKS system, or more than one Orf of a PUFA PKS system.

In one embodiment of the present invention, an isolated nucleic acid molecule comprises or consists essentially of a

An isolated nucleic acid molecule can include a gene. An isolated nucleic acid molecule that includes a gene is not a fragment of a chromosome that includes such gene, but rather includes the coding region and regulatory regions associated with the gene, but no additional genes naturally found on the same chromosome. An isolated nucleic acid molecule can also include a specified nucleic acid sequence flanked by (i.e., at the 5' and/or the 3' end of the sequence) additional nucleic acids that do not normally flank the specified nucleic acid sequence in nature (i.e., heterologous sequences). Isolated nucleic acid molecule can include DNA, RNA (e.g., mRNA), or derivatives of either DNA or RNA (e.g., cDNA).
nucleic acid sequence selected from the group of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, or biologically active fragments thereof. In one aspect, the nucleic acid sequence is selected from the group of: SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:12, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, and SEQ ID NO:31. In one embodiment of the present invention, any of the above-described PUFA PKS amino acid sequences, as well as homologues of such sequences, can be produced with from at least one, and up to about 20, additional heterologous amino acids flanking each of the C- and/or N-terminal end of the given amino acid sequence. The resulting protein or polypeptide can be

Although the phrase "nucleic acid molecule" primarily refers to the physical nucleic acid molecule and the phrase "nucleic acid sequence" primarily refers to the sequence of nucleotides on the nucleic acid molecule, the two phrases can be used interchangeably, especially with respect to a nucleic acid molecule, or a nucleic acid sequence, being capable of encoding a protein or domain of a protein.

Preferably, an isolated nucleic acid molecule of the present invention is produced using recombinant DNA technology (e.g., polymerase chain reaction (PCR) amplification, cloning) or chemical synthesis. Isolated nucleic acid molecules include natural nucleic acid molecules and homologues thereof, including, but not limited to, natural allelic variants and modified nucleic acid molecules in which nucleotides have been inserted, deleted, substituted, and/or inverted in such a manner that such modifications provide the desired effect on PUFA PKS system biological activity as described herein.
Protein homologues (e.g., proteins encoded by nucleic acid homologues) have been discussed in detail above.

A nucleic acid molecule homologue can be produced using a number of methods known to those skilled in the art (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Labs Press, 1989). For example, nucleic acid molecules can be modified using a variety of techniques including, but not limited to, classic mutagenesis techniques and recombinant DNA techniques, such as site-directed mutagenesis, chemical treatment of a nucleic acid molecule to induce mutations, restriction enzyme cleavage of a nucleic acid fragment, ligation of nucleic acid fragments, PCR amplification and/or mutagenesis of selected regions of a nucleic acid sequence, synthesis of oligonucleotide mixtures and ligation of mixture groups to "build" a mixture of nucleic acid molecules and combinations thereof. Nucleic

referred to as "consisting essentially of" a given amino acid sequence. According to the present invention, the heterologous amino acids are a sequence of amino acids that are not naturally found (i.e., not found in nature, in vivo) flanking the given amino acid sequence or which would not be encoded by the nucleotides that flank the naturally occurring nucleic acid sequence encoding the given amino acid sequence as it occurs in the gene, if such nucleotides in the naturally occurring sequence were translated using standard codon usage for the organism from which the given amino acid sequence is derived. Similarly, the phrase "consisting essentially of", when used with reference to a nucleic acid sequence herein, refers to a nucleic acid sequence encoding a given amino acid sequence that can be flanked by from at least one, and up to as many as about 60, additional heterologous nucleotides at each of the 5' and/or the 3' end of the nucleic acid sequence encoding the given amino acid sequence.
acid molecule homologues can be selected from a mixture of modified nucleic acids by screening for the function of the protein encoded by the nucleic acid and/or by hybridization with a wild-type gene.

The minimum size of a nucleic acid molecule of the present invention is a size sufficient to form a probe or oligonucleotide primer that is capable of forming a stable hybrid (e.g., under moderate, high or very high stringency conditions)

The heterologous nucleotides are not naturally found (i.e., not found in nature, in vivo) flanking the nucleic acid sequence encoding the given amino acid sequence as it occurs in the natural gene.

The present invention also includes an isolated nucleic acid molecule comprising a nucleic acid sequence encoding an amino acid sequence having a biological activity of at least one domain of a PUFA PKS system. In one aspect, such a nucleic acid sequence encodes a homologue of any of the Schizochytrium PUFA PKS ORFs or domains, including: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID NO:32, wherein the homologue has a biological activity of at least one domain of a PUFA PKS system as described previously herein.

In one aspect of the invention, a homologue of a Schizochytrium PUFA PKS protein or domain encompassed by the present invention comprises an amino acid sequence

In one aspect of the invention, a homologue of a Schizochytrium PUFA PKS protein or domain encompassed by the present invention comprises an amino acid sequence that is at least about 60% identical to an amino acid sequence chosen from: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID NO:32, wherein said amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.
that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein said amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. In a further aspect, the amino acid sequence of the homologue is at least about 60% identical to at least about 600 consecutive amino acids, and more preferably to at least about 700 consecutive amino acids, and more preferably to at least about 800 consecutive amino acids, and more preferably to at least about 900 consecutive amino acids, and more preferably to at least about 1000 consecutive amino acids, and more preferably to at least about 1100 consecutive amino acids, and more preferably to at least about 1200 consecutive amino acids, and more preferably to at least about 1300 consecutive amino acids, and more preferably to

In a further aspect, the amino acid sequence of the homologue is at least about 65% identical, and more preferably at least about 70% identical, and more preferably at least about 75% identical, and more preferably at least about 80% identical, and more preferably at least about 85% identical, and more preferably at least about 90% identical, and more preferably at least about 95% identical, and more preferably at least about 96% identical, and more preferably at least about 97% identical, and more preferably at least about 98% identical, and more preferably at least about 99% identical to an amino acid sequence chosen from: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID NO:32, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.
at least about 1400 consecutive amino acids, and more preferably to at least about 1500 consecutive amino acids of any of SEQ ID NO:2, SEQ ID NO:4 and SEQ ID NO:6, or to the full length of SEQ ID NO:6. In a further aspect, the amino acid sequence of the homologue is at least about 60% identical to at least about 1600 consecutive amino acids, and more preferably to at least about 1700 consecutive amino acids, and more preferably to at least about 1800 consecutive amino acids, and more preferably to at least about 1900 consecutive amino acids, and more preferably to at least about 2000 consecutive amino acids of any of SEQ ID NO:2 or SEQ ID NO:4, or to the full length of SEQ ID NO:4. In a further

According to the present invention, the term "contiguous" or "consecutive", with regard to nucleic acid or amino acid sequences described herein, means to be connected in an unbroken sequence. For example, for a first sequence to comprise 30 contiguous (or consecutive) amino acids of a second sequence, means that the first sequence includes an unbroken sequence of 30 amino acid residues that is 100% identical to an unbroken sequence of 30 amino acid residues in the second sequence. Similarly, for a first sequence to have "100% identity" with a second sequence means that the first sequence exactly matches the second sequence with no gaps between nucleotides or amino acids.
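The definition of "contiguous" or "consecutive" residues can be illustrated with a short sketch. The function below is a hypothetical helper, not part of the specification: it treats sequences as plain strings of one-letter residue codes and asks whether a first sequence contains an unbroken run of n residues that is 100% identical to an unbroken run of n residues in a second sequence.

```python
def shares_contiguous_run(first: str, second: str, n: int) -> bool:
    """True if `first` includes an unbroken run of n residues that exactly
    matches an unbroken run of n residues in `second`."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(first) < n or len(second) < n:
        return False
    # Every window of length n in the second sequence.
    windows = {second[i:i + n] for i in range(len(second) - n + 1)}
    # Any window of length n in the first sequence that appears in that set.
    return any(first[i:i + n] in windows for i in range(len(first) - n + 1))
```

For the example given in the text, a call such as `shares_contiguous_run(first, second, 30)` would express "the first sequence comprises 30 contiguous amino acids of the second sequence". A single internal mismatch or gap breaks the run, which is the point of the "unbroken" requirement.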
aspect, the amino acid sequence of the homologue is at least about 60% identical to at least about 2100 consecutive amino acids, and more preferably to at least about 2200 consecutive amino acids, and more preferably to at least about 2300 consecutive amino acids, and more preferably to at least about 2400 consecutive amino acids, and more preferably to at least about 2500 consecutive amino acids, and more preferably to at least about 2600 consecutive amino acids, and more preferably to at least about 2700 consecutive amino acids, and more preferably to at least about 2800 consecutive amino acids, and even more preferably, to the full length of SEQ ID NO:2.

In another aspect, a homologue of a Schizochytrium PUFA PKS protein or domain encompassed by the present invention comprises an amino acid sequence that is at least about 65% identical, and more preferably at least about 70% identical,

As used herein, unless otherwise specified, reference to a percent (%) identity refers to an evaluation of homology which is performed using: (1) a BLAST 2.0 Basic BLAST homology search using blastp for amino acid searches, blastn for nucleic acid searches, and blastx for nucleic acid searches and searches of translated amino acids in all 6 open reading frames, all with standard default parameters, wherein the query sequence is filtered for low complexity regions by default (described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402, incorporated herein by reference in its entirety); (2) a BLAST 2 alignment (using the parameters described below); (3) and/or PSI-BLAST with the standard default parameters (Position-Specific Iterated BLAST).
and more preferably at least about 75% identical, and more preferably at least about 80% identical, and more preferably at least about 85% identical, and more preferably at least about 90% identical, and more preferably at least about 95% identical, and more preferably at least about 96% identical, and more preferably at least about 97% identical, and more preferably at least about 98% identical, and more preferably at least about 99% identical to an amino acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, or SEQ ID NO:6, over any of the consecutive amino acid lengths described in the paragraph above, wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.

which permit isolation of nucleic acid molecules having at least about 70% nucleic acid sequence identity with the

It is noted that due to some differences in the standard parameters between BLAST 2.0 Basic BLAST and BLAST 2, two specific sequences might be recognized as having significant homology using the BLAST 2 program, whereas a search performed in BLAST 2.0 Basic BLAST using one of the sequences as the query sequence may not identify the second sequence in the top matches. In addition, PSI-BLAST provides an automated, easy-to-use version of a "profile" search, which is a sensitive way to look for sequence homologues. The program first performs a gapped BLAST database search. The PSI-BLAST program uses the information from any significant alignments returned to construct a position-specific score matrix, which replaces the query sequence for the next round of database searching. Therefore, it is to be understood that percent identity can be determined by using any one of these programs.
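As a rough illustration of what a percent identity figure summarizes, the sketch below computes it from two already-aligned sequences of equal length, with "-" marking gaps. This is an assumption-laden simplification: the specification defines percent identity by reference to the BLAST programs themselves, and the choice of denominator here (all alignment columns, gaps included) is an illustrative convention rather than something stated in the text. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the two strings are rows of one pairwise alignment (equal length),
    and counts gap columns in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * identical / len(aligned_a)
```

A homologue threshold such as "at least about 60% identical" would then correspond to `percent_identity(...) >= 60.0` over the aligned region, under the stated assumptions.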
Two specific sequences can be aligned to one another using BLAST 2 sequence as described in Tatusova and Madden, (1999), "Blast 2 sequences - a new tool for comparing protein and nucleotide sequences", FEMS Microbiol Lett. 174:247-250, incorporated herein by reference in its entirety. BLAST 2 sequence alignment is performed in blastp or blastn using the BLAST 2.0 algorithm to perform a Gapped BLAST search (BLAST 2.0) between the two sequences allowing for the introduction of gaps (deletions and insertions) in the resulting alignment. For purposes of clarity herein, a BLAST 2 sequence alignment is performed using the standard default parameters as follows.
For blastn, using 0 BLOSUM62 matrix:
Reward for match=1
Penalty for mismatch=-2
Open gap (5) and extension gap (2) penalties

nucleic acid molecule being used to probe in the hybridization reaction (i.e., conditions permitting about 30% or less mismatch of nucleotides). High stringency hybridization and washing conditions, as referred to herein, refer to conditions which permit isolation of nucleic acid molecules having at least about 80% nucleic acid sequence identity with the nucleic acid molecule being used to probe in the hybridization reaction (i.e., conditions permitting about 20% or less mismatch of nucleotides). Very high stringency hybridization and washing conditions, as referred to herein, refer to conditions which permit isolation of nucleic acid molecules having at least about 90% nucleic acid sequence identity with the nucleic acid molecule being used to probe in the hybridization reaction (i.e., conditions permitting about 10% or less mismatch of nucleotides). As discussed above, one of skill in the art can use the formulae in Meinkoth et al., ibid. to calculate the appropriate hybridization and wash conditions to achieve these particular levels of nucleotide mismatch.
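The three stringency tiers are defined by a minimum nucleic acid sequence identity to the probe (about 70%, 80% and 90%) and, equivalently, a maximum permitted mismatch (about 30%, 20% and 10%). A minimal sketch of that relationship follows; the table and function names are hypothetical, and the "about" in the text is deliberately flattened to hard cut-offs for the sake of illustration.

```python
# Minimum percent identity to the probe for each tier, as stated in the text.
STRINGENCY_MIN_IDENTITY = {
    "moderate": 70.0,
    "high": 80.0,
    "very high": 90.0,
}


def max_mismatch(tier: str) -> float:
    """Maximum percent nucleotide mismatch permitted under the given tier."""
    return 100.0 - STRINGENCY_MIN_IDENTITY[tier]


def tiers_permitting(identity_pct: float) -> list:
    """Tiers under which a hybrid with this percent identity would be isolated."""
    return [
        tier
        for tier, minimum in STRINGENCY_MIN_IDENTITY.items()
        if identity_pct >= minimum
    ]
```

For instance, a molecule 85% identical to the probe would be expected to be recovered under moderate and high stringency conditions but not under very high stringency conditions.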
gap x_dropoff (50) expect (10) word size (11) filter (on)
For blastp, using 0 BLOSUM62 matrix:
Open gap (11) and extension gap (1) penalties
gap x_dropoff (50) expect (10) word size (3) filter (on).

In another embodiment of the invention, an amino acid sequence having the biological activity of at least one domain of a PUFA PKS system of the present invention includes an amino acid sequence that is sufficiently similar to a naturally occurring PUFA PKS protein or polypeptide that a nucleic acid sequence encoding the amino acid sequence is capable of hybridizing under moderate, high, or very high stringency conditions (described below) to (i.e., with) a nucleic acid molecule encoding the naturally occurring PUFA PKS protein or polypeptide (i.e., to the complement of the nucleic acid strand encoding the naturally occurring PUFA PKS protein or polypeptide). Preferably, an amino acid sequence having the biological activity of at least one domain of a PUFA PKS

Such conditions will vary, depending on whether DNA:RNA or DNA:DNA hybrids are being formed. Calculated melting temperatures for DNA:DNA hybrids are 10° C. less than for DNA:RNA hybrids. In particular embodiments, stringent hybridization conditions for DNA:DNA hybrids include hybridization at an ionic strength of 6xSSC (0.9 M Na+) at a temperature of between about 20° C. and about 35° C. (lower stringency), more preferably, between about 28° C. and about 40° C. (more stringent), and even more preferably, between about 35° C. and about 45° C. (even more stringent), with appropriate wash conditions. In particular embodiments, stringent hybridization conditions for DNA:RNA hybrids include hybridization at an ionic strength of 6xSSC (0.9 M Na+) at a temperature of between about 30° C. and about 45° C., more preferably, between about 38° C. and about 50° C., and even more preferably, between about 45° C. and about 55° C., with similarly stringent wash conditions.
These values are based on calculations of a melting temperature for molecules larger than about 100 nucleotides, 0% formamide and a G+C content of about 40%. Alternatively, Tm can be calculated empirically as set forth in Sambrook et al., supra, pages 9.31 to 9.62. In general, the wash conditions should be as stringent as possible, and should be appropriate for the chosen hybridization conditions. For example, hybridization conditions can include a combination of salt and temperature conditions that are approximately 20-25° C. below the calculated Tm of a particular hybrid, and wash conditions typically include a combination of salt and temperature conditions that are approximately 12-20° C. below the calculated Tm of the particular hybrid. One example of hybridization conditions suitable for use with DNA:DNA hybrids includes a 2-24 hour hybridization in 6xSSC (50% formamide) at about 42° C., followed by washing steps that include one or more washes at room temperature in about 2xSSC, followed by additional

system of the present invention is encoded by a nucleic acid sequence that hybridizes under moderate, high or very high stringency conditions to the complement of a nucleic acid sequence that encodes a protein comprising an amino acid sequence represented by any of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID NO:32. Methods to deduce a complementary sequence are known to those skilled in the art. It should be noted that since amino acid sequencing and nucleic acid sequencing technologies are not entirely error-free, the sequences presented herein, at best, represent apparent sequences of PUFA PKS domains and proteins of the present invention.

As used herein, hybridization conditions refer to standard hybridization conditions under which nucleic acid molecules are used to identify similar nucleic acid molecules.
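The rule of thumb in this passage, hybridizing approximately 20-25° C. below the calculated melting temperature and washing approximately 12-20° C. below it, can be written as a small helper. This is only a sketch: the melting temperature is taken as an input, since the specification defers its calculation to Meinkoth et al. and Sambrook et al., and the function names are hypothetical.

```python
def hybridization_window(tm_c: float) -> tuple:
    """Approximate hybridization temperature range (deg C): 20-25 deg C below Tm."""
    return (tm_c - 25.0, tm_c - 20.0)


def wash_window(tm_c: float) -> tuple:
    """Approximate wash temperature range (deg C): 12-20 deg C below Tm."""
    return (tm_c - 20.0, tm_c - 12.0)
```

Given a calculated Tm for a particular hybrid, `hybridization_window(tm)` and `wash_window(tm)` return the lower and upper bounds of the corresponding ranges; salt and formamide concentrations, which the text treats together with temperature, are outside the scope of this sketch.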
Such standard conditions are disclosed, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Labs Press, 1989. Sambrook et al., ibid., is incorporated by reference herein in its entirety (see specifically, pages 9.31-9.62). In addition, formulae to calculate the appropriate hybridization and wash conditions to achieve hybridization permitting varying degrees of mismatch of nucleotides are disclosed, for example, in Meinkoth et al., 1984, Anal. Biochem. 138, 267-284; Meinkoth et al., ibid., is incorporated by reference herein in its entirety.

More particularly, moderate stringency hybridization and washing conditions, as referred to herein, refer to conditions

washes at higher temperatures and lower ionic strength (e.g., at least one wash at about 37° C. in about 0.1x-0.5xSSC, followed by at least one wash at about 68° C. in about 0.1x-0.5xSSC).

Another embodiment of the present invention includes a recombinant nucleic acid molecule comprising a recombinant vector and a nucleic acid molecule comprising a nucleic acid sequence encoding an amino acid sequence having a biological activity of at least one domain of a PUFA PKS system as described herein. Such nucleic acid sequences are described in detail above. According to the present invention, a recombinant vector is an engineered (i.e., artificially produced) nucleic acid molecule that is used as a tool for manipulating a nucleic acid sequence of choice and for introducing such a nucleic acid sequence into a host cell. The recombinant vector is therefore suitable for use in cloning, sequencing, and/or otherwise manipulating the nucleic acid sequence of

operatively linked to a transcription control sequence, but can be used interchangeably with the phrase "nucleic acid molecule", when such nucleic acid molecule is a recombinant molecule as discussed herein.
According to the present invention, the phrase "operatively linked" refers to linking a nucleic acid molecule to a transcription control sequence in a manner such that the molecule is able to be expressed when transfected (i.e., transformed, transduced, transfected, conjugated or conduced) into a host cell. Transcription control sequences are sequences which control the initiation, elongation, or termination of transcription. Particularly important transcription control sequences are those which control transcription initiation, such as promoter, enhancer, operator and repressor sequences. Suitable transcription control sequences include any transcription control sequence that can function in a host cell or organism into which the recombinant nucleic acid molecule is to be introduced.

Recombinant nucleic acid molecules of the present inven-

choice, such as by expressing and/or delivering the nucleic acid sequence of choice into a host cell to form a recombinant cell. Such a vector typically contains heterologous nucleic acid sequences, that is nucleic acid sequences that are not naturally found adjacent to nucleic acid sequence to be cloned or delivered, although the vector can also contain regulatory nucleic acid sequences (e.g., promoters, untranslated regions) which are naturally found adjacent to nucleic acid molecules of the present invention or which are useful for expression of the nucleic acid molecules of the present invention (discussed in detail below). The vector can be either RNA or DNA, either prokaryotic or eukaryotic, and typically is a plasmid. The vector can be maintained as an extrachromosomal element (e.g., a plasmid) or it can be integrated into the chromosome of a recombinant organism (e.g., a microbe or a plant).
The tion can also contain additional regulatory sequences, such as entire vector can remain in place within a host cell, or under translation regulatory sequences, origins of replication, and certain conditions, the plasmid DNA can be deleted, leaving other regulatory sequences that are compatible with the behind the nucleic acid molecule of the present invention. The recombinant cell. In one embodiment, a recombinant mol integrated nucleic acid molecule can be under chromosomal ecule of the present invention, including those which are promoter control, under native or plasmid promoter control, integrated into the host cell chromosome, also contains secre or under a combination of several promoter controls. Single 25 tory signals (i.e., signal segment nucleic acid sequences) to or multiple copies of the nucleic acid molecule can be inte enable an expressed protein to be secreted from the cell that grated into the chromosome. A recombinant vector of the produces the protein. Suitable signal segments include a sig present invention can contain at least one selectable marker. nal segment that is naturally associated with the protein to be In one embodiment, a recombinant vector used in a recom expressed or any heterologous signal segment capable of binant nucleic acid molecule of the present invention is an 30 directing the secretion of the protein according to the present expression vector. As used herein, the phrase “expression invention. In another embodiment, a recombinant molecule vector” is used to refer to a vector that is suitable for produc of the present invention comprises a leader sequence to tion of an encoded product (e.g., a protein of interest). In this enable an expressed protein to be delivered to and inserted embodiment, a nucleic acid sequence encoding the product to into the membrane of a host cell. 
Suitable leader sequences be produced (e.g., a PUFA PKS domain) is inserted into the 35 include a leader sequence that is naturally associated with the recombinant vector to produce a recombinant nucleic acid protein, or any heterologous leader sequence capable of molecule. The nucleic acid sequence encoding the protein to directing the delivery and insertion of the protein to the mem be produced is inserted into the vector in a manner that opera brane of a cell. tively links the nucleic acid sequence to regulatory sequences The present inventors have found that the Schizochytrium in the vector which enable the transcription and translation of 40 PUFAPKSOrfs A and B are closely linked in the genome and the nucleic acid sequence within the recombinant host cell. region between the Orfs has been sequenced. The Orfs are In another embodiment, a recombinant vector used in a oriented in opposite directions and 4244 base pairs separate recombinant nucleic acid molecule of the present invention is the start (ATG) codons (i.e. they are arranged as follows: a targeting vector. As used herein, the phrase “targeting vec 3'OrfA5'-4244 bp-5'OrfB3'). Examination of the 4244 bp tor” is used to refer to a vector that is used to deliver a 45 intergenic region did not reveal any obvious Orfs (no signifi particular nucleic acid molecule into a recombinant host cell, cant matches were found on a BlastX search). Both Orfs. A wherein the nucleic acid molecule is used to delete or inacti and B are highly expressed in Schizochytrium, at least during vate an endogenous gene within the host cell or microorgan the time of oil production, implying that active promoter ism (i.e., used for targeted gene disruption or knock-out tech elements are embedded in this intergenic region. These nology). Such a vector may also be known in the art as a 50 genetic elements are believed to have utility as a bi-direc “knock-out' Vector. 
In one aspect of this embodiment, a por tional promoter sequence for transgenic applications. For tion of the vector, but more typically, the nucleic acid mol example, in a preferred embodiment, one could clone this ecule inserted into the vector (i.e., the insert), has a nucleic region, place any genes of interest at each end and introduce acid sequence that is homologous to a nucleic acid sequence the construct into Schizochytrium (or some other host in of a target gene in the host cell (i.e., a gene which is targeted 55 which the promoters can be shown to function). It is predicted to be deleted or inactivated). The nucleic acid sequence of the that the regulatory elements, under the appropriate condi vectorinsert is designed to bind to the target gene Such that the tions, would provide for coordinated, high level expression of target gene and the insert undergo homologous recombina the two introduced genes. The complete nucleotide sequence tion, whereby the endogenous target gene is deleted, inacti for the regulatory region containing Schizochytrium PUFA vated or attenuated (i.e., by at least a portion of the endog 60 PKS regulatory elements (e.g., a promoter) is represented enous target gene being mutated or deleted). herein as SEQID NO:36. Typically, a recombinant nucleic acid molecule includes at In a similar manner, Orf is highly expressed in least one nucleic acid molecule of the present invention Schizochytrium during the time of oil production and regula operatively linked to one or more transcription control tory elements are expected to reside in the region upstream of sequences. As used herein, the phrase “recombinant mol 65 its start codon. A region of genomic DNA upstream of Orf ecule' or “recombinant nucleic acid molecule' primarily has been cloned and sequenced and is represented herein as refers to a nucleic acid molecule or nucleic acid sequence (SEQID NO:37). 
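As a purely illustrative sketch of the divergent arrangement described for the Orf A/Orf B intergenic region (3'OrfA5'-4244 bp-5'OrfB3'), the following Python fragment assembles the top-strand sequence of a construct in which two coding sequences face away from a shared regulatory region. The helper names (`reverse_complement`, `build_bidirectional_construct`) and the short toy sequences are assumptions introduced here for illustration; they are not SEQ ID NO:36 or any sequence disclosed herein, and no claim is made that this reflects the cloning strategy actually used.

```python
# Illustrative sketch only: toy sequences and hypothetical helper names.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_bidirectional_construct(gene_left: str,
                                  promoter_region: str,
                                  gene_right: str) -> str:
    """Top-strand sequence for 3'-geneLeft-5' / promoter / 5'-geneRight-3'.

    Both genes are given as coding sequences written 5'->3' from their ATG.
    gene_left sits on the opposite strand, so it appears reverse-complemented
    on the top strand; both start codons then flank the shared promoter region.
    """
    return reverse_complement(gene_left) + promoter_region + gene_right


if __name__ == "__main__":
    left_cds = "ATGAAATAA"    # toy coding sequence (assumption)
    right_cds = "ATGCCCTGA"   # toy coding sequence (assumption)
    promoter = "GGGTTT"       # placeholder, NOT the 4244 bp region
    print(build_bidirectional_construct(left_cds, promoter, right_cds))
```

In this toy layout the left gene's start codon reads as CAT on the top strand immediately adjacent to the promoter placeholder, mirroring the head-to-head orientation of the two Orfs.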
This sequence contains the 3886 nt imme US 7,842,796 B2 33 34 diately upstream of the Orfc start codon. Examination of this sequences to plasmids, Substitutions or modifications of tran region did not reveal any obvious Orfs (i.e., no significant Scription control signals (e.g., promoters, operators, enhanc matches were found on a BlastX search). It is believed that ers), Substitutions or modifications of translational control regulatory elements contained in this region, under the appro signals (e.g., ribosome binding sites, Shine-Dalgarno priate conditions, will provide for high-level expression of a sequences), modification of nucleic acid molecules to corre gene placed behind them. Additionally, under the appropriate spond to the codon usage of the host cell, and deletion of conditions, the level of expression may be coordinated with sequences that destabilize transcripts. genes under control of the A-B intergenic region (SEQ ID General discussion above with regard to recombinant NO:36). nucleic acid molecules and transfection of host cells is Therefore, in one embodiment, a recombinant nucleic acid 10 intended to be applied to any recombinant nucleic acid mol molecule useful in the present invention, as disclosed herein, ecule discussed herein, including those encoding any amino can include a PUFA PKS regulatory region contained within acid sequence having a biological activity of at least one SEQ ID NO:36 and/or SEQ ID NO:37. Such a regulatory domain from a PUFA PKS, those encoding amino acid region can include any portion (fragment) of SEQID NO:36 sequences from other PKS systems, and those encoding other and/or SEQ ID NO:37 that has at least basal PUFA PKS 15 proteins or domains. transcriptional activity. 
This invention also relates to the use of a novel method to One or more recombinant molecules of the present inven identify a microorganism that has a PUFA PKS system that is tion can be used to produce an encoded product (e.g., a PUFA homologous in structure, domain organization and/or func PKS domain, protein, or system) of the present invention. In tion to a Schizochytrium PUFA PKS system. In one embodi one embodiment, an encoded product is produced by express ment, the microorganism is a non-bacterial microorganism, ing a nucleic acid molecule as described herein under condi and preferably, the microorganism identified by this method tions effective to produce the protein. A preferred method to is a eukaryotic microorganism. In addition, this invention produce an encoded protein is by transfecting a host cell with relates to the microorganisms identified by Such method and one or more recombinant molecules to form a recombinant to the use of these microorganisms and the PUFA PKS sys cell. Suitable host cells to transfect include, but are not limited 25 tems from these microorganisms in the various applications to, any bacterial, fungal (e.g., yeast), insect, plant or animal for a PUFAPKS system (e.g., genetically modified organisms cell that can be transfected. Host cells can be either untrans and methods of producing bioactive molecules) according to fected cells or cells that are already transfected with at least the present invention. The unique screening method one other recombinant nucleic acid molecule. described and demonstrated herein enables the rapid identi According to the present invention, the term “transfection' 30 fication of new microbial strains containing a PUFA PKS is used to refer to any method by which an exogenous nucleic system homologous to the Schizochytrium PUFAPKS system acid molecule (i.e., a recombinant nucleic acid molecule) can of the present invention. Applicants have used this method to be inserted into a cell. 
The term “transformation' can be used discover and disclose herein that a Thraustochytrium micro interchangeably with the term “transfection' when such term organism contains a PUFAPKS system that is homologous to is used to refer to the introduction of nucleic acid molecules 35 that found in Schizochytrium. This discovery is described in into microbial cells. Such as algae, bacteria and yeast. In detail in Example 2 below. microbial systems, the term “transformation' is used to Microbial organisms with a PUFA PKS system similar to describe an inherited change due to the acquisition of exog that found in Schizochytrium, such as the Thraustochytrium enous nucleic acids by the microorganism and is essentially microorganism discovered by the present inventors and synonymous with the term “transfection.” However, in ani 40 described in Example 2, can be readily identified/isolated/ mal cells, transformation has acquired a second meaning screened by the following methods used separately or in any which can refer to changes in the growth properties of cells in combination of these methods. culture after they become cancerous, for example. Therefore, In general, the method to identify a non-bacterial microor to avoid confusion, the term “transfection' is preferably used ganism that has a polyunsaturated fatty acid (PUFA) with regard to the introduction of exogenous nucleic acids 45 polyketide synthase (PKS) system includes a first step of (a) into animal cells, and the term “transfection' will be used selecting a microorganism that produces at least one PUFA; herein to generally encompass transfection of animal cells, and a second step of (b) identifying a microorganism from (a) plant cells and transformation of microbial cells, to the extent that has an ability to produce increased PUFAs under dis that the terms pertainto the introduction of exogenous nucleic solved oxygen conditions of less than about 5% of saturation acids into a cell. 
Therefore, transfection techniques include, 50 in the fermentation medium, as compared to production of but are not limited to, transformation, particle bombardment, PUFAS by said microorganism under dissolved oxygen con electroporation, microinjection, lipofection, adsorption, ditions of greater than 5% of saturation, more preferably 10% infection and protoplast fusion. of saturation, more preferably greater than 15% of saturation It will be appreciated by one skilled in the art that use of and more preferably greater than 20% of saturation in the recombinant DNA technologies can improve control of 55 fermentation medium. A microorganism that produces at expression of transfected nucleic acid molecules by manipu least one PUFA and has an ability to produce increased lating, for example, the number of copies of the nucleic acid PUFAs under dissolved oxygen conditions of less than about molecules within the host cell, the efficiency with which those 5% of Saturation is identified as a candidate for containing a nucleic acid molecules are transcribed, the efficiency with PUFA PKS system. Subsequent to identifying a microorgan which the resultant transcripts are translated, and the effi 60 ism that is a strong candidate for containing a PUFA PKS ciency of post-translational modifications. Additionally, the system, the method can include an additional step (c) of promoter sequence might be genetically engineered to detecting whether the organism identified in step (b) com improve the level of expression as compared to the native prises a PUFA PKS system. promoter. 
Recombinant techniques useful for controlling the In one embodiment of the present invention, step (b) is expression of nucleic acid molecules include, but are not 65 performed by culturing the microorganism selected for the limited to, integration of the nucleic acid molecules into one screening process in low oxygen/anoxic conditions and aero or more host cell chromosomes, addition of vector stability bic conditions, and, in addition to measuring PUFA content in US 7,842,796 B2 35 36 the organism, the fatty acid profile is determined, as well as fat through published or other readily available sources). If the content. By comparing the results under low oxygen/anoxic microbe contains greater than about 30%, and more prefer conditions with the results under aerobic conditions, the ably greater than about 40%, and more preferably greater than method provides a strong indication of whether the test about 45%, and even more preferably greater than about 50% microorganism contains a PUFA PKS system of the present of its total fatty acids as C14:0, C16:0 and/or C16:1, while invention. This preferred embodiment is described in detail also producing at least one long chain fatty acid with three or below. more unsaturated bonds, and more preferably 4 or more Initially, microbial strains to be examined for the presence double bonds, and more preferably 5 or more double bonds, of a PUFA PKS system are cultured under aerobic conditions and even more preferably 6 or more double bonds, then this to induce production of a large number of cells (microbial 10 biomass). As one element of the identification process, these microbial strain is identified as a likely candidate to possess a cells are then placed under low oxygen or anoxic culture novel PUFA PKS system of the type described in this inven conditions (e.g., dissolved oxygen less than about 5% of tion. 
Screening this organism under the low oxygen condi saturation, more preferably less than about 2%, even more tions described above, and confirming production of bioac preferably less than about 1%, and most preferably dissolved 15 tive molecules containing two or more unsaturated bonds oxygen of about 0% of saturation in the culture medium) and would suggest the existence of a novel PUFA PKS system in allowed to grow for approximately another 24-72 hours. In the organism, which could be further confirmed by analysis of this process, the microorganisms should be cultured at a tem the microbes genome. perature greater than about 15° C., and more preferably The Success of this method can also be enhanced by Screen greater than about 20°C., and even more preferably greater ing eukaryotic strains that are known to contain C 17:0 and or than about 25°C., and even more preferably greater than 30° C17:1 fatty acids (in conjunction with the large percentages C. The low or anoxic culture environment can be easily main of C14:0, C16:0 and C16:1 fatty acids described above)— tained in culture chambers capable of inducing this type of because the C17:0 and C17:1 fatty acids are potential markers atmospheric environment in the chamber (and thus in the for a bacterial (prokaryotic) based or influenced fatty acid cultures) or by culturing the cells in a manner that induces the 25 production system. Another marker for identifying strains low oxygen environment directly in the culture flask/vessel containing novel PUFA PKS systems is the production of itself. simple fatty acid profiles by the organism. 
According to the In a preferred culturing method, the microbes can be cul tured in shake flasks which, instead of normally containing a present invention, a “simple fatty acid profile' is defined as 8 small amount of culture medium—less than about 50% of 30 or fewer fatty acids being produced by the strain at levels total capacity and usually less than about 25% of total capac greater than 10% of total fatty acids. ity to keep the medium aerated as it is shaken on a shaker Use of any of these methods or markers (singly or prefer table, are instead filled to greater than about 50% of their ably in combination) would enable one of skill in the art to capacity, and more preferably greater than about 60%, and readily identify microbial strains that are highly predicted to most preferably greater than about 75% of their capacity with 35 contain a novel PUFA PKS system of the type described in culture medium. High loading of the shake flask with culture this invention. medium prevents it from mixing very well in the flask when it In a preferred embodiment combining many of the meth is placed on a shaker table, preventing oxygen diffusion into ods and markers described above, a novel biorational Screen the culture. Therefore as the microbes grow, they use up the (using shake flask cultures) has been developed for detecting existing oxygen in the medium and naturally create a low or 40 microorganisms containing PUFA producing PKS systems. no oxygen environment in the shake flask. 
After the culture period, the cells are harvested and ana This screening system is conducted as follows: lyzed for content of bioactive compounds of interest (e.g., A portion of a culture of the Strain/microorganism to be lipids), but most particularly, for compounds containing two tested is placed in 250 mL baffled shake flask with 50 mL or more unsaturated bonds, and more preferably three or more 45 culture media (aerobic treatment), and another portion of double bonds, and even more preferably four or more double culture of the same strain is placed in a 250 mL non-baffled bonds. For lipids, those strains possessing such compounds at shake flask with 200 mL culture medium (anoxic/low oxygen greater than about 5%, and more preferably greater than about treatment). Various culture media can be employed depend 10%, and more preferably greater than about 15%, and even ing on the type and strain of microorganism being evaluated. more preferably greater than about 20% of the dry weight of 50 Both flasks are placed on a shaker table at 200 rpm. After the microorganism are identified as predictably containing a 48-72 hr of culture time, the cultures are harvested by cen novel PKS system of the type described above. For other trifugation and the cells are analyzed for fatty acid methyl bioactive compounds, such as antibiotics or compounds that ester content via gas chromatography to determine the fol are synthesized in Smaller amounts, those strains possessing lowing data for each culture: (1) fatty acid profile; (2) PUFA Such compounds at greater than about 0.5%, and more pref 55 content; and (3) fat content (approximated as amount total erably greater than about 0.1%, and more preferably greater fatty acids/cell dry weight). 
than about 0.25%, and more preferably greater than about These data are then analyzed asking the following five 0.5%, and more preferably greater than about 0.75%, and questions (Yes/No): more preferably greater than about 1%, and more preferably greater than about 2.5%, and more preferably greater than 60 Comparing the Data from the Low O/Anoxic Flask with the about 5% of the dry weight of the microorganism are identi Data from the Aerobic Flask: fied as predictably containing a novel PKS system of the type (1) Did the DHA (or other PUFA content) (as % FAME described above. (fatty acid methyl esters)) stay about the same or preferably Alternatively, or in conjunction with this method, prospec increased in the low oxygen culture compared to the aerobic tive microbial strains containing novel PUFAPKS systems as 65 culture? described herein can be identified by examining the fatty acid (2) Is C14:0+C16:0+C16:1 greater than about 40%TFA in profile of the strain (obtained by culturing the organism or the anoxic culture? US 7,842,796 B2 37 38 (3) Are there very little (<1% as FAME) or no precursors with the inventors’ data—i.e., production of the Saturated (C18:3n-3+C 18:2n-6+C 18:3n-6) to the conventional oxygen fatty acids (primarily C14:0 and C16:0 in Schizochytrium) dependent elongase? desaturase pathway in the anoxic cul was not inhibited by the thiolactomycin treatment. There are ture? no indications in the literature or in the inventors’ own data (4) Did fat content (as amount total fatty acids/cell dry that thiolactomycin has any inhibitory effect on the elonga weight) increase in the low oxygen culture compared to the tion of C14:0 or C16:0fatty acids or their desaturation (i.e. the aerobic culture? conversion of short chain saturated fatty acids to PUFAs by (5) Did DHA (or other PUFA content) increase as % cell the classical pathway). 
Therefore, the fact that the PUFA dry weight in the low oxygen culture compared to the aerobic production in Schizochytrium was blocked by thiolactomycin culture? 10 strongly indicates that the classical PUFA synthesis pathway If the first three questions are answered yes, this is a good does not produce the PUFAs in Schizochytrium, but rather indication that the strain contains a PKS genetic system for that a different pathway of synthesis is involved. Further, it making long chain PUFAs. The more questions that are had previously been determined that the Shewanella PUFA answered yes (preferably the first three questions must be PKS system is inhibited by thiolactomycin (note that the answered yes), the stronger the indication that the strain con 15 PUFA PKS system of the present invention has elements of tains such a PKS genetic system. If all five questions are both Type I and Type II systems), and it was known that answered yes, then there is a very strong indication that the thiolactomycin is an inhibitor of Type II FAS systems (such as strain contains a PKS genetic system for making long chain that found in E. coli). Therefore, this experiment indicated PUFAs. The lack of 18:3n-3/18:2n-6/18:3n-6 would indicate that Schizochytrium produced PUFAs as a result of a pathway that the low oxygen conditions would have turned off or not involving the Type I FAS. A similar rationale and detec inhibited the conventional pathway for PUFA synthesis. A tion step could be used to detect a PUFA PKS system in a high 14:0/16:0/16:1 fatty is an preliminary indicator of a microbe identified using the novel Screening method dis bacterially influenced fatty acid synthesis profile (the pres closed herein. ence of C17:0 and 17:1 is also and indicator of this) and of a In addition, Example 3 shows additional biochemical data simple fatty acid profile. 
The increased PUFA synthesis and 25 which provides evidence that PUFAs in Schizochytrium are PUFA containing fat synthesis under the low oxygen condi not produced by the classical pathway (i.e., precursor product tions is directly indicative of a PUFA PKS system, since this kinetics between C16:0 and DHA are not observed in whole system does not require oxygen to make highly unsaturated cells and, in vitro PUFA synthesis can be separated from the fatty acids. membrane fraction—all of the fatty acid desaturases of the Finally, in the identification method of the present inven 30 classical PUFA synthesis pathway, with the exception of the tion, once a strong candidate is identified, the microbe is delta 9 desaturase which inserts the first double bond of the preferably screened to detect whether or not the microbe series, are associated with cellular membranes). This type of contains a PUFA PKS system. For example, the genome of biochemical data could be used to detect PUFA PKS activity the microbe can be screened to detect the presence of one or in microbe identified by the novel screening method more nucleic acid sequences that encode a domain of a PUFA 35 described above. PKS system as described herein. Preferably, this step of Preferred microbial Strains to Screen using the screening/ detection includes a suitable nucleic acid detection method, identification method of the present invention are chosen Such as hybridization, amplification and or sequencing of one from the group consisting of bacteria, algae, fungi, protozoa or more nucleic acid sequences in the microbe of interest. The or , but most preferably from the eukaryotic microbes probes and/or primers used in the detection methods can be 40 consisting of algae, fungi, protozoa and protists. These derived from any known PUFA PKS system, including the microbes are preferably capable of growth and production of marine bacteria PUFA PKS systems described in U.S. Pat. 
the bioactive compounds containing two or more unsaturated No. 6,140,486, or the Thraustochytrid PUFA PKS systems bonds at temperatures greater than about 15° C., more pref described in U.S. application Ser. No. 09/231,899 and herein. erably greater than about 20°C., even more preferably greater Once novel PUFA PKS systems are identified, the genetic 45 than about 25°C. and most preferably greater than about 30° material from these systems can also be used to detect addi C. tional novel PUFA PKS systems. Methods of hybridization, In some embodiments of this method of the present inven amplification and sequencing of nucleic acids for the purpose tion, novel bacterial PUFA PKS systems can be identified in of identification and detection of a sequence are well known bacteria that produce PUFAs attemperatures exceeding about in the art. Using these detection methods, sequence homology 50 20° C., preferably exceeding about 25° C. and even more and domain structure (e.g., the presence, number and/or preferably exceeding about 30° C. As described previously arrangement of various PUFA PKS functional domains) can herein, the marine bacteria, Shewanella and Vibrio marinus, be evaluated and compared to the known PUFA PKS systems described in U.S. Pat. No. 6,140,486, do not produce PUFAs described herein. at higher temperatures, which limits the usefulness of PUFA In some embodiments, a PUFA PKS system can be iden 55 PKS systems derived from these bacteria, particularly in plant tified using biological assays. For example, in U.S. applica applications under field conditions. Therefore, in one tion Ser. No. 09/231,899, Example 7, the results of a key embodiment, the Screening method of the present invention experiment using a well-known inhibitor of Some types of can be used to identify bacteria that have a PUFAPKS system fatty acid synthesis systems, i.e., thiolactomycin, is which are capable of growth and PUFA production at higher described. 
The inventors showed that the synthesis of PUFAs 60 temperatures (e.g., above about 20, 25, or 30° C.). In this in whole cells of Schizochytrium could be specifically embodiment, inhibitors of eukaryotic growth Such as nystatin blocked without blocking the synthesis of short chain satu (antifungal) or cycloheximide (inhibitor of eukaryotic protein rated fatty acids. The significance of this result is as follows: synthesis) can be added to agar plates used to culture/select the inventors knew from analysis of cDNA sequences from initial strains from water samples/Soil samples collected from Schizochytrium that a Type I fatty acid synthase system is 65 the types of habitats/niches described below. This process present in Schizochytrium. It was known that thiolactomycin would help select for enrichment of bacterial strains without does not inhibit Type I FAS systems, and this is consistent (or minimal) contamination of eukaryotic strains. This selec US 7,842,796 B2 39 40 tion process, in combination with culturing the plates at example the genera Guttulinopsis, and Guttulina), class Dic elevated temperatures (e.g. 30°C.), and then selecting strains ty Steliaceae (for example the genera Acrasis, Dictyostelium, that produce at least one PUFA would initially identify can Polysphondylium, and Coenonia), and class Phycomyceae didate bacterial strains with a PUFA PKS system that is including the orders Chytridiales, Ancylistales, Blastocladi operative at elevated temperatures (as opposed to those bac ales, Monoblepharidales, Saprolegniales, Peronosporales, terial strains in the prior art which only exhibit PUFA pro Mucorales, and Entomophthorales. duction at temperatures less than about 20° C. and more In the Protozoa: Protozoa strains with life stages capable of preferably below about 5° C.). bacterivory (including byphageocytosis) can be selected from Locations for collection of the preferred types of microbes the types classified as ciliates, or amoebae. 
Proto for screening for a PUFA PKS system according to the 10 Zoan ciliates include the groups: Chonotrichs, Colpodids, present invention include any of the following: low oxygen Cyrtophores, Haptorids, Karyorelicts, Oligohymenophora, environments (or locations near these types of low oxygen Polyhymenophora (spirotrichs), Prostomes and Suctoria. environments including in the guts of animals including Protozoan flagellates include the Biosoecids, Bodonids, Cer invertebrates that consume microbes or microbe-containing comonads, Chrysophytes (for example the genera Antho foods (including types of filter feeding organisms), low or 15 physa, Chrysanoemba, Chrysosphaerella, Dendromonas, non-oxygen containing aquatic habitats (including freshwa Dinobryon, Mallomonas, Ochromonas, Paraphysomonas, ter, saline and marine), and especially at- or near-low oxygen Poteriodchromonas, Spumella, Syncrypta, Synura, and environments (regions) in the oceans. The microbial strains Uroglena), Collar flagellates, Cryptophytes (for example the would preferably not be obligate anaerobes but be adapted to genera Chilomonas, Cryptomonas, Cyanomonas, and Goni live in both aerobic and low or anoxic environments. Soil Omonas), Dinoflagellates, Diplomonads, Euglenoids, Heter environments containing both aerobic and low oxygen or olobosea, Pedinellids, Pelobionts, Phalansteriids, Pseudo anoxic environments would also excellent environments to dendromonads, Spongomonads and Volvocales (and other find these organisms in and especially in these types of soil in flagellates including the unassigned genera of Arto aquatic habitats or temporary aquatic habitats. discus, Clautriavia, Helkesimastix, Kathablepharis and Mul A particularly preferred microbial strain would be a strain 25 ticilia). 
(selected from the group consisting of algae, fungi (including yeast), protozoa or protists) that, during a portion of its life cycle, is capable of consuming whole bacterial cells (bacterivory) by mechanisms such as phagocytosis, phagotrophic or endocytic capability and/or has a stage of its life cycle in which it exists as an amoeboid stage or naked protoplast. This method of nutrition would greatly increase the potential for transfer of a bacterial PKS system into a eukaryotic cell if a mistake occurred and the bacterial cell (or its DNA) did not get digested and instead was functionally incorporated into the eukaryotic cell.

Strains of microbes (other than the members of the Thraustochytrids) capable of bacterivory (especially by phagocytosis or endocytosis) can be found in the following microbial classes (including but not limited to example genera):

In the algae and algae-like microbes (including stramenopiles): of the class Euglenophyceae (for example genera Euglena, and Peranema), the class Chrysophyceae (for example the genus Ochromonas), the class Dinobryaceae (for example the genera Dinobryon, Platychrysis, and Chrysochromulina), the Dinophyceae (including the genera Crypthecodinium, Gymnodinium, Peridinium, Ceratium, Gyrodinium, and Oxyrrhis), the class Cryptophyceae (for example the genera Cryptomonas, and Rhodomonas), the class Xanthophyceae (for example the genus Olisthodiscus) (and including forms of algae in which an amoeboid stage occurs as in the flagellates Rhizochloridaceae, and zoospores/gametes of Aphanochaete pascheri, Bumilleria Stigeoclonium and Vaucheria geminata), the class Eustigmatophyceae, and the class Prymnesiophyceae (including the genera Prymnesium and Diacronema).

In the Stramenopiles including the: Proteromonads, Opalines, Developayella, Diplophorys, Labyrinthulids, Thraustochytrids, Bicosecids, Oomycetes, Hypochytridiomycetes, Commation, Reticulosphaera, Pelagomonas, Pelapococcus, Ollicola, Aureococcus, Parmales, Raphidiophytes, Synurids, Rhizochromulinaales, Pedinellales, Dictyochales, Chrysomeridales, Sarcinochrysidales, Hydrurales, Hibberdiales, and Chromulinales.

In the Fungi: Class Myxomycetes (form myxamoebae), the slime molds; class Acrasieae including the orders Acrasiceae

Amoeboid protozoans include the groups: Actinophryids, Centrohelids, Desmothoricids, Diplophryids, Eumamoebae, Heterolobosea, Leptomyxids, Nucleariid filose amoebae, Pelebionts, and Vampyrellids (and including the unassigned amoebid genera Gymnophrys, Biomyxa, Microcometes, , Belonocystis, Elaeorhanis, Allelogromia, or Lieberkuhnia).

The protozoan orders include the following: Percolomonadeae, Heterolobosea, Lyromonadea, Pseudociliata, Trichomonadea, Hypermastigea, Heteromiteae, Telonemea, Cyathobodonea, Ebridea, Pyytomyxea, Opalinea, Kinetomonadea, Hemimastigea, Protostelea, Myxagastrea, Dictyostelea, Choanomonadea, Apicomonadea, Eogregarinea, Neogregarinea, Coelotrolphea, Eucoccidea, Haemosporea, Piroplasmea, Spirotrichea, Prostomatea, Litostomatea, Phyllopharyngea, Nassophorea, Oligohymenophorea, Colpodea, Karyorelicta, Nucleohelea, Centrohelea, , Sticholonchea, Polycystinea, , Lobosea, Filosea, Athalamea, Monothalamea, Polythalamea, Xenophyophorea, Schizocladea, Holosea, Entamoebea, Myxosporea, Actinomyxea, Halosporea, Paramyxea, Rhombozoa and Orthonectea.

A preferred embodiment of the present invention includes strains of the microorganisms listed above that have been collected from one of the preferred habitats listed above.

One embodiment of the present invention relates to any microorganisms identified using the novel PUFA PKS screening method described above, to the PUFA PKS genes and proteins encoded thereby, and to the use of such microorganisms and/or PUFA PKS genes and proteins (including homologues and fragments thereof) in any of the methods described herein. In particular, the present invention encompasses organisms identified by the screening method of the present invention which are then genetically modified to regulate the production of bioactive molecules by said PUFA PKS system.

Yet another embodiment of the present invention relates to an isolated nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain or biologically active fragment thereof of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism.
(for example the genus Sappinia), class Guttulinaceae (for

As discussed above, the present inventors have successfully used the method to identify a non-bacterial microorganism that has a PUFA PKS system to identify additional members of the order Thraustochytriales which contain a PUFA PKS system. The identification of three such microorganisms is described in Example 2. Specifically, the present inventors have used the screening method of the present invention to identify Thraustochytrium sp. 23B (ATCC 20892) as being highly predicted to contain a PUFA PKS system, followed by detection of sequences in the Thraustochytrium sp. 23B genome that hybridize to the Schizochytrium PUFA PKS genes disclosed herein. Schizochytrium limacinum (IFO 32693) and Ulkenia (BP-5601) have also been identified as good candidates for containing PUFA PKS systems. Based on these data and on the similarities among members of the order Thraustochytriales, it is believed that many other Thraustochytriales PUFA PKS systems can now be readily identified using the methods and tools provided by the present invention. Therefore, Thraustochytriales PUFA PKS systems and portions and/or homologues thereof (e.g., proteins, domains and fragments thereof), genetically modified organisms comprising such systems and portions and/or homologues thereof, and methods of using such microorganisms and PUFA PKS systems, are encompassed by the present invention.

Developments have resulted in revision of the taxonomy of the Thraustochytrids. Taxonomic theorists place Thraustochytrids with the algae or algae-like protists. However, because of taxonomic uncertainty, it would be best for the purposes of the present invention to consider the strains described in the present invention as Thraustochytrids (Order: Thraustochytriales; Family: Thraustochytriaceae; Genus: Thraustochytrium, Schizochytrium, Labyrinthuloides, or Japonochytrium). For the present invention, members of the labyrinthulids are considered to be included in the Thraustochytrids. Taxonomic changes are summarized below.

Strains of certain unicellular microorganisms disclosed herein are members of the order Thraustochytriales. Thraustochytrids are marine eukaryotes with an evolving taxonomic history. Problems with the taxonomic placement of the Thraustochytrids have been reviewed by Moss (1986), Bahnweb and Jackle (1986) and Chamberlain and Moss (1988). According to the present invention, the phrases "Thraustochytrid", "Thraustochytriales microorganism" and "microorganism of the order Thraustochytriales" can be used interchangeably.

For convenience purposes, the Thraustochytrids were first placed by taxonomists with other colorless zoosporic eukaryotes in the Phycomycetes (algae-like fungi). The name Phycomycetes, however, was eventually dropped from taxonomic status, and the Thraustochytrids were retained in the Oomycetes (the biflagellate zoosporic fungi). It was initially assumed that the Oomycetes were related to the heterokont algae, and eventually a wide range of ultrastructural and biochemical studies, summarized by Barr (Barr, 1981, Biosystems 14:359-370) supported this assumption. The Oomycetes were in fact accepted by Leedale (Leedale, 1974, Taxon 23:261-270) and other phycologists as part of the heterokont algae. However, as a matter of convenience resulting from their heterotrophic nature, the Oomycetes and Thraustochytrids have been largely studied by mycologists (scientists who study fungi) rather than phycologists (scientists who study algae).

From another taxonomic perspective, evolutionary biologists have developed two general schools of thought as to how eukaryotes evolved. One theory proposes an exogenous origin of membrane-bound organelles through a series of endosymbioses (Margulis, 1970, Origin of Eukaryotic Cells. Yale University Press, New Haven); e.g., mitochondria were derived from bacterial endosymbionts, chloroplasts from cyanophytes, and flagella from Spirochaetes. The other theory suggests a gradual evolution of the membrane-bound organelles from the non-membrane-bounded systems of the prokaryote ancestor via an autogenous process (Cavalier-Smith, 1975, Nature (Lond.) 256:462-468). Both groups of evolutionary biologists, however, have removed the Oomycetes and Thraustochytrids from the fungi and placed them either with the chromophyte algae in the kingdom Chromophyta (Cavalier-Smith, 1981, BioSystems 14:461-481) (this kingdom has been more recently expanded to include other protists and members of this kingdom are now called Stramenopiles) or with all algae in the kingdom Protoctista (Margulis and Sagan, 1985, Biosystems 18:141-147).

With the development of electron microscopy, studies on the ultrastructure of the zoospores of two genera of Thraustochytrids, Thraustochytrium and Schizochytrium, (Perkins, 1976, pp. 279-312 in "Recent Advances in Aquatic Mycology" (ed. E. B. G. Jones), John Wiley & Sons, New York; Kazama, 1980, Can. J. Bot. 58:2434-2446; Barr, 1981, Biosystems 14:359-370) have provided good evidence that the Thraustochytriaceae are only distantly related to the Oomycetes. Additionally, genetic data representing a correspondence analysis (a form of multivariate statistics) of 5S ribosomal RNA sequences indicate that Thraustochytriales are clearly a unique group of eukaryotes, completely separate from the fungi, and most closely related to the red and brown algae, and to members of the Oomycetes (Mannella, et al., 1987, Mol. Evol. 24:228-235). Most taxonomists have agreed to remove the Thraustochytrids from the Oomycetes (Bartnicki-Garcia, 1987, pp. 389-403 in "Evolutionary Biology of the Fungi" (eds. Rayner, A. D. M., Brasier, C. M. & Moore, D.), Cambridge University Press, Cambridge).

In summary, employing the taxonomic system of Cavalier-Smith (Cavalier-Smith, 1981, BioSystems 14:461-481, 1983; Cavalier-Smith, 1993, Microbiol Rev. 57:953-994), the Thraustochytrids are classified with the chromophyte algae in the kingdom Chromophyta (Stramenopiles). This taxonomic placement has been more recently reaffirmed by Cavalier-Smith et al. using the 18s rRNA signatures of the Heterokonta to demonstrate that Thraustochytrids are chromists not Fungi (Cavalier-Smith et al., 1994, Phil. Tran. Roy. Soc. London Series BioSciences 346:387-397). This places them in a completely different kingdom from the fungi, which are all placed in the kingdom Eufungi. The taxonomic placement of the Thraustochytrids is therefore summarized below:

Kingdom: Chromophyta (Stramenopiles)
Phylum: Heterokonta
Order: Thraustochytriales
Family: Thraustochytriaceae
Genus: Thraustochytrium, Schizochytrium, Labyrinthuloides, or Japonochytrium

Some early taxonomists separated a few original members of the genus Thraustochytrium (those with an amoeboid life stage) into a separate genus called Ulkenia. However, it is now known that most, if not all, Thraustochytrids (including Thraustochytrium and Schizochytrium) exhibit amoeboid stages and, as such, Ulkenia is not considered by some to be a valid genus. As used herein, the genus Thraustochytrium will include Ulkenia.

Despite the uncertainty of taxonomic placement within higher classifications of Phylum and Kingdom, the Thraustochytrids remain a distinctive and characteristic grouping whose members remain classifiable within the order Thraustochytriales.

Polyunsaturated fatty acids (PUFAs) are essential membrane components in higher eukaryotes and the precursors of many lipid-derived signaling molecules.

affects the activity of the PKS system in the organism. The screening process referenced in part (b) has been described in detail above and includes the steps of: (a) selecting a microorganism that produces at least one PUFA; and, (b) identifying a microorganism from (a) that has an ability to produce
increased PUFAs under dissolved oxygen conditions of less than about 5% of saturation in the fermentation medium, as compared to production of PUFAs by the microorganism under dissolved oxygen conditions of greater than about 5% of saturation, and preferably about 10%, and more preferably about 15%, and more preferably about 20% of saturation in the fermentation medium. The genetically modified microorganism can include any one or more of the above-identified nucleic acid sequences, and/or any of the other homologues of any of the Schizochytrium PUFA PKS ORFs or domains as described in detail above.

As used herein, a genetically modified microorganism can include a genetically modified bacterium, protist, microalgae, fungus, or other microbe, and particularly, any of the genera of the order Thraustochytriales (e.g., a Thraustochytrid) described herein (e.g., Schizochytrium, Thraustochytrium, Japonochytrium, Labyrinthuloides). Such a genetically modified microorganism has a genome which is modified (i.e., mutated or changed) from its normal (i.e., wild-type or naturally occurring) form such that the desired result is achieved (i.e., increased or modified PUFA PKS activity and/or production of a desired product using the PKS system). Genetic modification of a microorganism can be accomplished using classical strain development and/or molecular genetic techniques. Such techniques are known in the art and are generally disclosed for microorganisms, for example, in Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Labs Press. The reference Sambrook et al., ibid., is incorporated by reference herein in its entirety. A genetically modified microorganism can include a microorganism in which nucleic acid molecules have been inserted, deleted or modified (i.e., mutated; e.g., by insertion, deletion, substitution, and/or inversion of nucleotides), in

The PUFA PKS system of the present invention uses pathways for PUFA synthesis that do not require desaturation and elongation of saturated fatty acids. The pathways catalyzed by PUFA PKSs are distinct from previously recognized PKSs in both structure and mechanism. Generation of cis double bonds is suggested to involve position-specific isomerases; these enzymes are believed to be useful in the production of new families of antibiotics.

To produce significantly high yields of various bioactive molecules using the PUFA PKS system of the present invention, an organism, preferably a microorganism or a plant, can be genetically modified to affect the activity of a PUFA PKS system. In one aspect, such an organism can endogenously contain and express a PUFA PKS system, and the genetic modification can be a genetic modification of one or more of the functional domains of the endogenous PUFA PKS system, whereby the modification has some effect on the activity of the PUFA PKS system. In another aspect, such an organism can endogenously contain and express a PUFA PKS system, and the genetic modification can be an introduction of at least one exogenous nucleic acid sequence (e.g., a recombinant nucleic acid molecule), wherein the exogenous nucleic acid sequence encodes at least one biologically active domain or protein from a second PKS system and/or a protein that affects the activity of said PUFA PKS system (e.g., a phosphopantetheinyl transferase (PPTase), discussed below). In yet another aspect, the organism does not necessarily endogenously (naturally) contain a PUFA PKS system, but is genetically modified to introduce at least one recombinant nucleic acid molecule encoding an amino acid sequence having the biological activity of at least one domain of a PUFA PKS system. In this aspect, PUFA PKS activity is affected by introducing or increasing PUFA PKS activity in the organism.
Various embodiments associated with each of these aspects will be discussed in greater detail below.

Therefore, according to the present invention, one embodiment relates to a genetically modified microorganism, wherein the microorganism expresses a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The at least one domain of the PUFA PKS system is encoded by a nucleic acid sequence chosen from: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by a screening method of the present invention; (c) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and, (d) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. The genetic modification

such a manner that such modifications provide the desired effect within the microorganism.

Preferred microorganism host cells to modify according to the present invention include, but are not limited to, any bacteria, protist, microalga, fungus, or protozoa. In one aspect, preferred microorganisms to genetically modify include, but are not limited to, any microorganism of the order Thraustochytriales. Particularly preferred host cells for use in the present invention could include microorganisms from a genus including, but not limited to: Thraustochytrium, Labyrinthuloides, Japonochytrium, and Schizochytrium. Preferred species within these genera include, but are not limited to: any Schizochytrium species, including Schizochytrium aggregatum, Schizochytrium limacinum, Schizochytrium minutum; any Thraustochytrium species (including former Ulkenia species such as U. visurgensis, U. amoeboida, U. sarkariana, U. profunda, U. radiata, U. minuta and Ulkenia sp. BP-5601), and including Thraustochytrium striatum, Thraustochytrium aureum, Thraustochytrium roseum; and any Japonochytrium species. Particularly preferred strains of Thraustochytriales include, but are not limited to: Schizochytrium sp. (S31)(ATCC 20888); Schizochytrium sp. (S8)(ATCC 20889); Schizochytrium sp. (LC-RM)(ATCC 18915); Schizochytrium sp. (SR21); Schizochytrium aggregatum (Goldstein et Belsky)(ATCC 28209); Schizochytrium limacinum (Honda et Yokochi)(IFO 32693); Thraustochytrium sp. (23B)(ATCC 20891); Thraustochytrium striatum (Schneider)(ATCC 24473); Thraustochytrium aureum (Goldstein)(ATCC 34304); Thraustochytrium roseum (Goldstein)(ATCC 28210); and Japonochytrium sp. (L1)(ATCC 28207). Other examples of suitable host microorganisms for genetic modification include, but are not limited to, yeast including Saccharomyces cerevisiae, Saccharomyces carlsbergensis, or other yeast such as Candida, Kluyveromyces, or other fungi, for example, filamentous fungi such as Aspergillus, Neurospora, Penicillium, etc. Bacterial cells also may be used as hosts. This includes Escherichia coli, which can be useful in fermentation processes. Alternatively, a host such as a Lactobacillus species or Bacillus species can be used as a host.

Another embodiment of the present invention relates to a genetically modified plant, wherein the plant has been genetically modified to recombinantly express a PKS system comprising at least one biologically active domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. The domain is encoded by a nucleic acid sequence

cally modify according to the present invention is preferably a plant suitable for consumption by animals, including humans.

Preferred plants to genetically modify according to the present invention (i.e., plant host cells) include, but are not limited to any higher plants, and particularly consumable plants, including crop plants and especially plants used for their oils. Such plants can include, for example: canola, soybeans, rapeseed, linseed, corn, safflowers, sunflowers and tobacco. Other preferred plants include those plants that are known to produce compounds used as pharmaceutical agents, flavoring agents, nutraceutical agents, functional food ingredients or cosmetically active agents or plants that are genetically engineered to produce these compounds/agents.

According to the present invention, a genetically modified microorganism or plant includes a microorganism or plant that has been modified using recombinant technology.
As used herein, genetic modifications which result in a decrease in gene expression, in the function of the gene, or in the function of the gene product (i.e., the protein encoded by the gene) can be referred to as inactivation (complete or partial), deletion, interruption, blockage or down-regulation of a gene. For example, a genetic modification in a gene which results in a decrease in the function of the protein encoded by such gene, can be the result of a complete deletion of the gene (i.e., the gene does not exist, and therefore the protein does not exist), a mutation in the gene which results in incomplete or no translation of the protein (e.g., the protein is not expressed), or a mutation in the gene which decreases or abolishes the natural function of the protein (e.g., a protein is expressed which has decreased or no enzymatic activity or action). Genetic modifications that result in an increase in gene expression or function can be referred to as amplification, overproduction, overexpression, activation, enhancement, addition, or up-regulation of a gene.

The genetic modification of a microorganism or plant according to the present invention preferably affects the activity of the PKS system expressed by the plant, whether the PKS system is endogenous and genetically modified, endogenous with the introduction of recombinant nucleic acid molecules into the organism, or provided completely by recombinant technology. According to the present invention, to "affect the activity of a PKS system" includes any genetic modification that causes any detectable or measurable change or modification in the PKS system expressed by the organism as compared to in the absence of the genetic modification. A detectable change or modification in the PKS system can

chosen from: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by the screening and selection method described herein (see brief summary of method in discussion of genetically modified microorganism above); (c) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (d) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and/or (f) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.
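The identity thresholds recited in items (e) and (f) above (at least about 60% identical, in item (e) over at least 500 consecutive amino acids) can be illustrated with a minimal sketch. It is not the method used in this specification: it assumes the two sequences have already been aligned position by position with no gaps, whereas in practice percent identity is determined with a sequence alignment program. The function names `best_window_identity` and `meets_identity_criterion`, and the toy sequences in the usage note, are hypothetical.

```python
from typing import List


def best_window_identity(seq_a: str, seq_b: str, window: int) -> float:
    """Return the highest percent identity found over any run of
    `window` consecutive positions of two pre-aligned, gap-free sequences.

    Assumption: seq_a and seq_b are already aligned to equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if window <= 0 or window > len(seq_a):
        raise ValueError("window must be between 1 and the aligned length")

    # 1 where the residues match, 0 where they differ.
    matches: List[int] = [int(a == b) for a, b in zip(seq_a, seq_b)]

    # Sliding-window sum of matches.
    current = sum(matches[:window])
    best = current
    for i in range(window, len(matches)):
        current += matches[i] - matches[i - window]
        best = max(best, current)
    return 100.0 * best / window


def meets_identity_criterion(seq_a: str, seq_b: str,
                             window: int = 500,
                             threshold: float = 60.0) -> bool:
    """True if some stretch of `window` consecutive positions reaches
    at least `threshold` percent identity (defaults mirror item (e))."""
    return best_window_identity(seq_a, seq_b, window) >= threshold
```

For example, with toy ten-residue sequences `"AAAAAAAAAA"` and `"AAAAACCCCC"`, `best_window_identity(..., 5)` finds a fully matching five-residue stretch, while a window of 10 gives 50% and would fail a 60% threshold. Item (f) corresponds to setting `window` to the full aligned length.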
The genetically modified plant can include any one or more of the above-identified nucleic acid sequences, and/or any of the other homologues of any of the Schizochytrium PUFA PKS ORFs or domains as described in detail above.

As used herein, a genetically modified plant can include any genetically modified plant including higher plants and particularly, any consumable plants or plants useful for producing a desired bioactive molecule of the present invention. Such a genetically modified plant has a genome which is modified (i.e., mutated or changed) from its normal (i.e., wild-type or naturally occurring) form such that the desired result is achieved (i.e., increased or modified PUFA PKS activity and/or production of a desired product using the PKS system). Genetic modification of a plant can be accomplished using classical strain development and/or molecular genetic techniques. Methods for producing a transgenic plant, wherein a recombinant nucleic acid molecule encoding a desired amino acid sequence is incorporated into the genome of the plant, are known in the art. A preferred plant to geneti

include, but is not limited to: the introduction of PKS system activity into an organism such that the organism now has measurable/detectable PKS system activity (i.e., the organism did not contain a PKS system prior to the genetic modification), the introduction into the organism of a functional domain from a different PKS system than a PKS system endogenously expressed by the organism such that the PKS system activity is modified (e.g., a bacterial PUFA PKS domain or a type I PKS domain is introduced into an organism that endogenously expresses a non-bacterial PUFA PKS system), a change in the amount of a bioactive molecule produced by the PKS system (e.g., the system produces more (increased amount) or less (decreased amount) of a given product as compared to in the absence of the genetic modification), a change in the type of a bioactive molecule produced by the PKS system (e.g., the system produces a new or different product, or a variant of a product that is naturally produced by the system), and/or a change in the ratio of multiple bioactive molecules produced by the PKS system (e.g., the system produces a different ratio of one PUFA to another PUFA, produces a completely different lipid profile as compared to in the absence of the genetic modification, or places various PUFAs in different positions in a triacylglycerol as compared to the natural configuration). Such a genetic modification includes any type of genetic modification and specifically includes modifications made by recombinant technology and by classical mutagenesis.

It should be noted that reference to increasing the activity of a functional domain or protein in a PUFA PKS system refers to any genetic modification in the organism containing the domain or protein (or into which the domain or protein is to be introduced) which results in increased functionality of the domain or protein system and can include higher activity

gene with a heterologous sequence. Examples of heterologous sequences that could be introduced into a host genome include sequences encoding at least one functional domain from another PKS system, such as a different non-bacterial PUFA PKS system, a bacterial PUFA PKS system, a type I PKS system, a type II PKS system, or a modular PKS system. Other heterologous sequences to introduce into the genome of a host include a sequence encoding a protein or functional domain that is not a domain of a PKS system, but which will affect the activity of the endogenous PKS system. For example, one could introduce into the host genome a nucleic acid molecule encoding a phosphopantetheinyl transferase (discussed below).
Specific modifications that could be made to an endogenous PUFA PKS system are discussed in detail below.

In another aspect of this embodiment of the invention, the genetic modification can include: (1) the introduction of a recombinant nucleic acid molecule encoding an amino acid sequence having a biological activity of at least one domain of a non-bacterial PUFA PKS system; and/or (2) the introduction of a recombinant nucleic acid molecule encoding a protein or functional domain that affects the activity of a PUFA PKS system, into a host. The host can include: (1) a host cell that does not express any PKS system, wherein all functional domains of a PKS system are introduced into the host cell, and wherein at least one functional domain is from a non-bacterial PUFA PKS system; (2) a host cell that expresses a PKS system (endogenous or recombinant) having at least one functional domain of a non-bacterial PUFA PKS system, wherein the introduced recombinant nucleic acid molecule can encode at least one additional non-bacterial PUFA PKS domain function or another protein or domain that affects the activity of the host PKS system; and (3) a host cell that expresses a PKS system (endogenous or recombinant) which does not necessarily include a domain function from a non-bacterial PUFA PKS, and wherein the introduced recombinant nucleic acid molecule includes a nucleic acid sequence encoding at least one functional domain of a non-bacterial PUFA PKS system. In other words, the present invention

of the domain or protein (e.g., specific activity or in vivo enzymatic activity), reduced inhibition or degradation of the domain or protein system, and overexpression of the domain or protein. For example, gene copy number can be increased, expression levels can be increased by use of a promoter that gives higher levels of expression than that of the native promoter, or a gene can be altered by genetic engineering or classical mutagenesis to increase the activity of the domain or protein encoded by the gene.

Similarly, reference to decreasing the activity of a functional domain or protein in a PUFA PKS system refers to any genetic modification in the organism containing such domain or protein (or into which the domain or protein is to be introduced) which results in decreased functionality of the domain or protein and includes decreased activity of the domain or protein, increased inhibition or degradation of the domain or protein and a reduction or elimination of expression of the domain or protein. For example, the action of a domain or protein of the present invention can be decreased by blocking or reducing the production of the domain or protein, "knocking out" the gene or portion thereof encoding the domain or protein, reducing domain or protein activity, or inhibiting the activity of the domain or protein. Blocking or reducing the production of a domain or protein can include placing the gene encoding the domain or protein under the control of a promoter that requires the presence of an inducing compound in the growth medium.
By establishing conditions such that the inducer becomes depleted from the medium, the expression of the gene encoding the domain or protein (and therefore, of protein synthesis) could be turned off. Blocking or reducing the activity of the domain or protein could also include using an excision technology approach similar to that described in U.S. Pat. No. 4,743,546, incorporated herein by reference. To use this approach, the gene encoding the protein of interest is cloned between specific genetic sequences that allow specific, controlled excision of the gene from the genome. Excision could be prompted by, for example, a shift in the cultivation temperature of the culture, as in U.S. Pat. No. 4,743,546, or by some other physical or nutritional signal.

In one embodiment of the present invention, a genetic modification includes a modification of a nucleic acid

intends to encompass any genetically modified organism (e.g., microorganism or plant), wherein the organism comprises at least one non-bacterial PUFA PKS domain function (either endogenously or by recombinant modification), and wherein the genetic modification has a measurable effect on the non-bacterial PUFA PKS domain function or on the PKS system when the organism comprises a functional PKS system.

Therefore, using the non-bacterial PUFA PKS systems of the present invention, which, for example, make use of genes from Thraustochytrid PUFA PKS systems, gene mixing can be used to extend the range of PUFA products to include EPA, DHA, ARA, GLA, SDA and others, as well as to produce a wide variety of bioactive molecules, including antibiotics, other pharmaceutical compounds, and other desirable products.
The method to obtain these bioactive molecules includes sequence encoding an amino acid sequence that has a bio not only the mixing of genes from various organisms but also logical activity of at least one domain of a non-bacterial various methods of genetically modifying the non-bacterial PUFA PKS system as described herein. Such a modification PUFAPKS genes disclosed herein. Knowledge of the genetic can be to an amino acid sequence within an endogenously basis and domain structure of the non-bacterial PUFA PKS (naturally) expressed non-bacterial PUFA PKS system, 60 system of the present invention provides a basis for designing whereby a microorganism that naturally contains such a sys novel genetically modified organisms which produce a vari tem is genetically modified by, for example, classical ety of bioactive molecules. Although mixing and modifica mutagenesis and selection techniques and/or molecular tion of any PKS domains and related genes are contemplated genetic techniques, include genetic engineering techniques. by the present inventors, by way of example, various possible Genetic engineering techniques can include, for example, 65 manipulations of the PUFA-PKS system are discussed below using a targeting recombinant vector to delete a portion of an with regard to genetic modification and bioactive molecule endogenous gene, or to replace a portion of an endogenous production. US 7,842,796 B2 49 50 For example, in one embodiment, non-bacterial PUFA Another aspect of the unsaturated fatty acid synthesis in E. PKS system products, such as those produced by Thraus coli is the requirement for a particular KS enzyme, B-ketoa tochytrids, are altered by modifying the CLF (chain length cyl-ACP synthase, the product of the FabB gene. This is the factor) domain. This domain is characteristic of Type II (dis enzyme that carries out condensation of a fatty acid, linked to Sociated enzymes) PKS systems. 
Its amino acid sequence a cysteine residue at the active site (by a thio-ester bond), with shows homology to KS (keto synthase pairs) domains, but it a malonyl-ACP. In the multi-step reaction, CO is released lacks the active site cysteine. CLF may function to determine and the linear chain is extended by two carbons. It is believed the number of elongation cycles, and hence the chain length, that only this KS can extend a carbon chain that contains a of the end product. In this embodiment of the invention, using double bond. This extension occurs only when the double 10 bond is in the cis configuration; if it is in the trans configura the current state of knowledge of FAS and PKS synthesis, a tion, the double bond is reduced by enoyl-ACP reductase rational strategy for production of ARA by directed modifi (ER) prior to elongation (Heath et al., 1996, supra). All of the cation of the non-bacterial PUFA-PKS system is provided. PUFA-PKS systems characterized so far have two KS There is controversy in the literature concerning the function domains, one of which shows greater homology to the FabB of the CLF in PKS systems (C. Bisang et al., Nature 401, 502 15 like KS of E. coli than the other. Again, without being bound (1999)) and it is realized that other domains may be involved by theory, the present inventors believe that in PUFA-PKS in determination of the chain length of the end product. How systems, the specificities and interactions of the DH (FabA ever, it is significant that Schizochytrium produces both DHA like) and KS (FabB-like) enzymatic domains determine the (C22:6, c)-3) and DPA (C22:5, co-6). In the PUFA-PKS sys number and placement of cis double bonds in the end prod tem the cis double bonds are introduced during synthesis of ucts. Because the number of 2-carbon elongation reactions is the growing carbon chain. 
Since placement of the ()-3 and greater than the number of double bonds present in the PUFA ()-6 double bonds occurs early in the synthesis of the mol PKS end products, it can be determined that in some exten ecules, one would not expect that they would affect subse sion cycles complete reduction occurs. Thus the DH and KS quent end-product chain length determination. Thus, without domains can be used as targets for alteration of the DHA/DPA being bound by theory, the present inventors believe that 25 ratio or ratios of other long chain fatty acids. These can be introduction of a factor (e.g. CLF) that directs synthesis of modified and/or evaluated by introduction of homologous C20 units (instead of C22 units) into the Schizochytrium domains from other systems or by mutagenesis of these gene PUFA-PKS system will result in the production of EPA (C20: fragments. 5. (O-3) and ARA (C20:4. (O-6). For example, in heterologous systems, one could exploit the CLF by directly substituting a In another embodiment, the ER (enoyl-ACP reductase—an CLF from an EPA producing system (such as one from Pho 30 enzyme which reduces the trans-double bond in the fatty tobacterium) into the Schizochytrium gene set. The fatty acids acyl-ACP resulting in fully saturated carbons) domains can of the resulting transformants can then be analyzed for alter be modified or substituted to change the type of product made ations in profiles to identify the transformants producing EPA by the PKS system. For example, the present inventors know and/or ARA. that Schizochytrium PUFA-PKS system differs from the pre 35 viously described bacterial systems in that it has two (rather In addition to dependence on development of a heterolo than one) ER domains. Without being bound by theory, the gous system (recombinant system, such as could be intro present inventors believe these ER domains can strongly duced into plants), the CLF concept can be exploited in influence the resulting PKS production product. 
The resulting Schizochytrium (i.e., by modification of a Schizochytrium PKS product could be changed by separately knocking out genome). Transformation and homologous recombination 40 the individual domains or by modifying their nucleotide has been demonstrated in Schizochytrium. One can exploit sequence or by Substitution of ER domains from other organ this by constructing a clone with the CLF of OrfB replaced 1SS. with a CLF from a C20 PUFA-PKS system. A marker gene In another embodiment, nucleic acid molecules encoding will be inserted downstream of the coding region. One can proteins or domains that are not part of a PKS system, but then transform the wild type cells, select for the marker phe 45 which affect a PKS system, can be introduced into an organ notype and then screen for those that had incorporated the ism. For example, all of the PUFA PKS systems described new CLF. Again, one would analyze these for any effects on above contain multiple, tandem, ACP domains. ACP (as a fatty acid profiles to identify transformants producing EPA separate protein or as a domain of a larger protein) requires and/or ARA. If some factor other than those associated with attachment of a phosphopantetheline cofactor to produce the the CLF are found to influence the chain length of the end 50 active, holo-ACP. Attachment of phosphopantetheline to the product, a similar strategy could be employed to alter those apo-ACP is carried out by members of the superfamily of factors. enzymes—the phosphopantetheinyl transferases (PPTase) Another preferred embodiment involving alteration of the (Lambalot R. H., et al., Chemistry and Biology, 3, 923 PUFA-PKS products involves modification or substitution of (1996)). the f-hydroxyacyl-ACP dehydrase/keto synthase pairs. Dur 55 By analogy to other PKS and FAS systems, the present ing cis-vaccenic acid (C18:1, A11) synthesis in E. 
coli, cre inventors presume that activation of the multiple ACP ation of the cis double bond is believed to depend on a specific domains present in the Schizochytrium ORFA protein is car DH enzyme, B-hydroxyacyl-ACP dehydrase, the product of ried out by a specific, endogenous, PPTase. The gene encod the FabA gene. This enzyme removes HOH from a B-keto ing this presumed PPTase has not yet been identified in acyl-ACP and leaves a trans double bond in the carbon chain. 60 Schizochytrium. If such a gene is present in Schizochytrium, A subset of DH's, FabA-like, possess cis-trans isomerase one can envision several approaches that could be used in an activity (Heath et al., 1996, supra). A novel aspect of bacterial attempt to identify and clone it. These could include (but and non-bacterial PUFA-PKS systems is the presence of two would not be limited to): generation and partial sequencing of FabA-like DH domains. Without being bound by theory, the a cDNA library prepared from actively growing present inventors believe that one or both of these DH 65 Schizochytrium cells (note, one sequence was identified in the domains will possess cis-trans isomerase activity (manipula currently available Schizochytrium cDNA library set which tion of the DH domains is discussed in greater detail below). showed homology to PPTases; however, it appears to be part US 7,842,796 B2 51 52 of a multidomain FAS protein, and as Such may not encode as a result, the sequence of the region just upstream of the the desired OrfA specific PPTase); use of degenerate oligo PKS gene cluster is now available. In this region are three nucleotide primers designed using amino acid motifs present Orfs that show homology to the domains (KS, MAT, ACP and in many PPTase's in PCR reactions (to obtain a nucleic acid KR) of OrfA (see FIG. 3). 
Included in this set are two ACP probe molecule to screen genomic or cDNA libraries): domains, both of which show high homology to the ORFA genetic approaches based on protein-protein interactions (e.g. ACP domains. At the end of the Nostoc PKS cluster is the a yeast two-hybrid system) in which the ORFA-ACP domains gene that encodes the Het I PPTase. Previously, it was not would be used as a “bait to find a “target' (i.e. the PPTase); obvious what the substrate of the Het I enzyme could be, and purification and partial sequencing of the enzyme itself as however the presence of tandem ACP domains in the newly a means to generate a nucleic acid probe for Screening of 10 identified Orf (Hgl E) of the cluster strongly suggests to the genomic or cDNA libraries. present inventors that it is those ACPs. The homology of the It is also conceivable that a heterologous PPTase may be ACP domains of Schizochytrium and Nostoc, as well as the capable of activating the Schizochytrium ORFA ACP tandem arrangement of the domains in both proteins, makes domains. It has been shown that some PPTases, for example Het I a likely candidate for heterologous activation of the the sfp enzyme of Bacillus subtilis (Lambalot et al., Supra) 15 Schizochytrium ORFA ACPs. The present inventors are and the SVp enzyme of Streptomyces verticillus (Sanchez et believed to be the first to recognize and contemplate this use al., 2001, Chemistry & Biology 8:725-738), have a broad for NoStoc Het IPPTase. substrate tolerance. These enzymes can be tested to see if they As indicated in Metz et al., 2001, supra, one novel feature will activate the Schizochytrium ACP domains. Also, a recent of the PUFA PKS systems is the presence of two dehydratase publication described the expression of a fungal PKS protein domains, both of which show homology to the FabA proteins in tobacco (Yalpani et al., 2001, The Plant Cell 13:1401 of E. coli. With the availability of the new Nostoc PKS gene 1409). 
Products of the introduced PKS system (encoded by sequences mentioned above, one can now compare the two the 6-methylsalicyclic acid synthase gene of Penicillium systems and their products. The sequence of domains in the patulum) were detected in the transgenic plant, even though Nostoc cluster (from HglE to Het I) as the present inventors the corresponding fungal PPTase was not present in those 25 have defined them is (see FIG. 3): plants. This suggested that an endogenous plant PPTase(s) KS-MAT-2XACP, KR, KS, CLF-AT, ER (HetM, HetN) recognized and activated the fungal PKS ACP domain. Of Het relevance to this observation, the present inventors have iden tified two sequences (genes) in the Arabidopsis whole In the Schizochytrium PUFA-PKSOrfs A, B&C the sequence genome database that are likely to encode PPTases. These 30 (OrfA-B-C) is: sequences (GenBank Accession numbers; AAG51443 and KS-MAT-9XACP-KRKS-CLF-AT-ERDH-DH-ER AAC05345) are currently listed as encoding “Unknown Pro One can see the correspondence of the domains sequence teins’. They can be identified as putative PPTases based on (there is also a high amino acid sequence homology). The the presence in the translated protein sequences of several product of the Nostoc PKS system is a long chain hydroxy signature motifs including; G(I/V)D and WXXKE(A/S)XXK 35 fatty acid (C26 or C28 with one or two hydroxy groups) that (SEQ ID NO:33), (listed in Lambalot et al., 1996 as charac contains no double bonds (cis or trans). The product of the teristic of all PPTases). In addition, these two putative pro Schizochytrium PKS system is a long chain polyunsaturated teins contain two additional motifs typically found in PPTases fatty acid (C22, with 5 or 6 double bonds—all cis). 
An obvi typically associated with PKS and non-ribosomal peptide ous difference between the two domain sets is the presence of synthesis systems; i.e., FN(IL/V)SHS (SEQID NO:34) and 40 the two DH domains in the Schizochytrium proteins just the (I/V/L)G(IL/V)D(IL/V) (SEQ ID NO:35). Furthermore, domains implicated in the formation of the cis double bonds these motifs occur in the expected relative positions in the of DHA and DPA (presumably HetMand HetN in the Nostoc protein sequences. It is likely that homologues of the Arabi system are involved in inclusion of the hydroxyl groups and dopsis genes are present in other plants, such as tobacco. also contain a DH domain whose origin differs from the those Again, these genes can be cloned and expressed to see if the 45 found in the PUFA). Also, the role of the duplicated ER enzymes they encode can activate the Schizochytrium ORFA domain in the Schizochytrium Orfs Band C is not known (the ACP domains, or alternatively, OrfA could be expressed second ER domain in is not present other characterized PUFA directly in the transgenic plant (either targeted to the plastid or PKS systems). The amino acid sequence homology between the cytoplasm). the two sets of domains implies an evolutionary relationship. Another heterologous PPTase which may recognize the 50 One can conceive of the PUFA PKS gene set being derived ORFA ACP domains as substrates is the Het I protein of from (in an evolutionary sense) an ancestral Nostoc-like PKS Nostoc sp. PCC 7120 (formerly called Anabaena sp. PCC gene set by incorporation of the DH (FabA-like) domains. 7120). As noted in U.S. Pat. No. 6,140,486, several of the The addition of the DH domains would result in the introduc PUFA-PKS genes of Shewanella showed a high degree of tion ofcis double bonds in the new PKS end product structure. homology to protein domains present in a PKS cluster found 55 The comparisons of the Schizochytrium and Nostoc PKS in Nostoc (FIG. 2 of that patent). 
This Nostoc PKS system is domain structures as well as the comparison of the domain associated with the synthesis of long chain (C26 or C28) organization between the Schizochytrium and Shewanella hydroxy fatty acids that become esterified to Sugar moieties PUFA-PKS proteins demonstrate nature's ability to alter and form a part of the heterocyst cell wall. These Nostoc PKS domain order as well as incorporate new domains to create domains are also highly homologous to the domains found in 60 novel end products. In addition, the genes can now be Orfs B and C of the Schizochytrium PKS proteins (i.e. the manipulated in the laboratory to create new products. The same ones that correspond to those found in the Shewanella implication from these observations is that it should be pos PKS proteins). Until very recently, none of the Nostoc PKS sible to continue to manipulate the systems in eithera directed domains present in the GenBank databases showed high or random way to influence the end products. For example, in homology to any of the domains of Schizochytrium OrfA (or 65 a preferred embodiment, one could envision Substituting one the homologous Shewanella Orf5 protein). However, the of the DH (FabA-like) domains of the PUFA-PKS system for complete genome of Nostoc has recently been sequenced and a DH domain that did not posses isomerization activity, US 7,842,796 B2 53 54 potentially creating a molecule with a mix of cis- and trans prises at least the following biologically active domains: (a) at double bonds. The current products of the Schizochytrium least one enoyl ACP-reductase (ER) domain; (b) multiple PUFA PKS system are DHA and DPA (C22:5 (06). If one acyl carrier protein (ACP) domains (at least four); (c) at least manipulated the system to produce C20 fatty acids, one two B-ketoacyl-ACP synthase (KS) domains; (d) at least one would expect the products to be EPA and ARA (C20:4 (O6). 
acyltransferase (AT) domain; (e) at least one ketoreductase This could provide a new source for ARA. One could also (KR) domain; (f) at least two FabA-like B-hydroxyacyl-ACP substitute domains from related PUFA-PKS systems that pro dehydrase (DH) domains; (g) at least one chain length factor duced a different DHA to DPA ratio—for example by using (CLF) domain; and (h) at least one malonyl-CoA:ACP acyl genes from Thraustochytrium 23B (the PUFA PKS system of transferase (MAT) domain. which is identified for the first time herein). 10 One aspect of this embodiment of the invention relates to a Additionally, one could envision specifically altering one method to produce a product containing at least one PUFA, of the ER domains (e.g. removing, or inactivating) in the comprising growing a plant comprising any of the recombi Schizochytrium PUFA PKS system (other PUFA PKS sys nant host cells described above, wherein the recombinant host tems described so far do not have two ER domains) to deter cell is a plant cell, under conditions effective to produce the mine its effect on the end product profile. Similar strategies 15 product. Another aspect of this embodiment of the invention could be attempted in a directed manner for each of the relates to a method to produce a product containing at least distinct domains of the PUFA-PKS proteins using more or one PUFA, comprising culturing a culture containing any of less Sophisticated approaches. Of course one would not be the recombinant host cells described above, wherein the host limited to the manipulation of single domains. Finally, one cell is a microbial cell, under conditions effective to produce could extend the approach by mixing domains from the the product. In a preferred embodiment, the PKS system in PUFA-PKS system and other PKS or FAS systems (e.g., type the host cell catalyzes the direct production of triglycerides. 
I, type II, modular) to create an entire range of new end Another embodiment of the present invention relates to a products. For example, one could introduce the PUFA-PKS microorganism comprising a non-bacterial, polyunsaturated DH domains into systems that do not normally incorporate cis fatty acid (PUFA) polyketide synthase (PKS) system, double bonds into their end products. 25 wherein the PKS catalyzes both iterative and non-iterative Accordingly, encompassed by the present invention are enzymatic reactions, and wherein the PUFA PKS system methods to genetically modify microbial or plant cells by: comprises: (a) at least two enoyl ACP-reductase (ER) genetically modifying at least one nucleic acid sequence in domains; (b) at least six acyl carrier protein (ACP) domains; the organism that encodes an amino acid sequence having the (c) at least two B-ketoacyl-ACP synthase (KS) domains; (d) biological activity of at least one functional domain of a 30 at least one acyltransferase (AT) domain; (e) at least one non-bacterial PUFA PKS system according to the present ketoreductase (KR) domain; (f) at least two FabA-like B-hy invention, and/or expressing at least one recombinant nucleic droxy acyl-ACP dehydrase (DH) domains; (g) at least one acid molecule comprising a nucleic acid sequence encoding chain length factor (CLF) domain; and (h) at least one malo Such amino acid sequence. Various embodiments of Such nyl-CoA:ACP acyltransferase (MAT) domain. Preferably, the sequences, methods to genetically modify an organism, and 35 microorganism is a non-bacterial microorganism and more specific modifications have been described in detail above. preferably, a eukaryotic microorganism. Typically, the method is used to produce a particular geneti Yet another embodiment of the present invention relates to cally modified organism that produces a particular bioactive a microorganism comprising a non-bacterial, polyunsatu molecule or molecules. 
rated fatty acid (PUFA) polyketide synthase (PKS) system, One embodiment of the present invention relates to a 40 wherein the PKS catalyzes both iterative and non-iterative recombinant host cell which has been modified to express a enzymatic reactions, and wherein the PUFA PKS system polyunsaturated fatty acid (PUFA) polyketide synthase comprises: (a) at least one enoyl ACP-reductase (ER) (PKS) system, wherein the PKS catalyzes both iterative and domain; (b) multiple acyl carrier protein (ACP) domains (at non-iterative enzymatic reactions, and wherein the PUFA least four); (c) at least two f-ketoacyl-ACP synthase (KS) PKS system comprises: (a) at least two enoyl ACP-reductase 45 domains; (d) at least one acyltransferase (AT) domain; (e) at (ER) domains; (b) at least six acyl carrier protein (ACP) least one ketoreductase (KR) domain; (f) at least two FabA domains; (c) at least two B-keto acyl-ACP synthase (KS) like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at domains; (d) at least one acyltransferase (AT) domain; (e) at least one chain length factor (CLF) domain; and (h) at least least one ketoreductase (KR) domain; (f) at least two FabA one malonyl-CoA:ACP acyltransferase (MAT) domain. like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at 50 In one embodiment of the present invention, it is contem least one chain length factor (CLF) domain; and (h) at least plated that a mutagenesis program could be combined with a one malonyl-CoA:ACP acyltransferase (MAT) domain. In selective screening process to obtain bioactive molecules of one embodiment, the PUFA PKS system is a eukaryotic interest. This would include methods to search for a range of PUFA PKS system. In a preferred embodiment, the PUFA bioactive compounds. This search would not be restricted to PKS system is an algal PUFA PKS system. In a more pre 55 production of those molecules with cis double bonds. 
The ferred embodiment, the PUFA PKS system is a Thraustochy mutagenesis methods could include, but are not limited to: triales PUFA PKS system. Such PUFA PKS systems can chemical mutagenesis, gene shuffling, Switching regions of include, but are not limited to, a Schizochytrium PUFA PKS the genes encoding specific enzymatic domains, or mutagen system, and a Thraustochytrium PUFA PKS system. In one esis restricted to specific regions of those genes, as well as embodiment, the PUFA PKS system can be expressed in a 60 other methods. prokaryotic host cell. In another embodiment, the PUFAPKS For example, high throughput mutagenesis methods could system can be expressed in a eukaryotic host cell. be used to influence or optimize production of the desired Another embodiment of the present invention relates to a bioactive molecule. Once an effective model system has been recombinant host cell which has been modified to express a developed, one could modify these genes in a high throughput non-bacterial PUFA PKS system, wherein the PKS system 65 manner. Utilization of these technologies can be envisioned catalyzes both iterative and non-iterative enzymatic reac on two levels. First, if a sufficiently selective screen for pro tions, and wherein the non-bacterial PUFA PKS system com duction of a product of interest (e.g., ARA) can be devised, it US 7,842,796 B2 55 56 could be used to attempt to alter the system to produce this Methods to produce Such genetically modified organisms product (e.g., in lieu of or in concert with, other strategies have been described in detail above. Such as those discussed above). Additionally, if the strategies One embodiment of the present invention is a method to outlined above resulted in a set of genes that did produce the produce desired bioactive molecules (also referred to as prod product of interest, the high throughput technologies could ucts or compounds) by growing or culturing a genetically then be used to optimize the system. 
For example, if the modified microorganism or plant of the present invention introduced domain only functioned at relatively low tempera (described in detail above). Such a method includes the step tures, selection methods could be devised to permit removing of culturing in a fermentation medium or growing in a Suit that limitation. In one embodiment of the invention, Screening able environment, Such as soil, a microorganism or plant, methods are used to identify additional non-bacterial organ 10 respectively, that has a genetic modification as described isms having novel PKS systems similar to the PUFA PKS previously herein and in accordance with the present inven system of Schizochytrium, as described herein (see above). tion. In a preferred embodiment, method to produce bioactive Homologous PKS systems identified in Such organisms can molecules of the present invention includes the step of cul be used in methods similar to those described herein for the turing under conditions effective to produce the bioactive Schizochytrium, as well as for an additional source of genetic 15 molecule a genetically modified organism that expresses a material from which to create, further modify and/or mutate a PKS system comprising at least one biologically active PKS system for expression in that microorganism, in another domain of a polyunsaturated fatty acid (PUFA) polyketide microorganism, or in a higher plant, to produce a variety of synthase (PKS) system. In this preferred aspect, at least one compounds. domain of the PUFAPKS system is encoded by a nucleic acid It is recognized that many genetic alterations, either ran sequence selected from the group consisting of: (a) a nucleic dom or directed, which one may introduce into a native (en acid sequence encoding at least one domain of a polyunsatu dogenous, natural) PKS system, will result in an inactivation rated fatty acid (PUFA) polyketide synthase (PKS) system of enzymatic functions. 
A preferred embodiment of the inven from a Thraustochytrid microorganism; (b) a nucleic acid tion includes a system to select for only those modifications sequence encoding at least one domain of a PUFA PKS sys that do not block the ability of the PKS system to produce a 25 tem from a microorganism identified by the novel Screening product. For example, the FabB-strain of E. coli is incapable method of the present invention (described above in detail); of synthesizing unsaturated fatty acids and requires Supple (c) a nucleic acid sequence encoding an amino acid sequence mentation of the medium with fatty acids that can substitute selected from the group consisting of: SEQID NO:2, SEQID for its normal unsaturated fatty acids in order to grow (see NO:4, SEQ ID NO:6, and biologically active fragments Metz et al., 2001, Supra). However, this requirement (for 30 thereof. (d) a nucleic acid sequence encoding an amino acid supplementation of the medium) can be removed when the sequence selected from the group consisting of SEQ ID strain is transformed with a functional PUFA-PKS system NO:8, SEQID NO:10, SEQID NO:13, SEQID NO:18, SEQ (i.e. one that produces a PUFA product in the E. coli host— ID NO:20, SEQID NO:22, SEQID NO:24, SEQID NO:26, see (Metz et al., 2001, supra, FIG. 2A). The transformed SEQ ID NO:28, SEQID NO:30, SEQID NO:32, and bio FabB- strain now requires a functional PUFA-PKS system 35 logically active fragments thereof, (e) a nucleic acid sequence (to produce the unsaturated fatty acids) for growth without encoding an amino acid sequence that is at least about 60% Supplementation. The key element in this example is that identical to at least 500 consecutive amino acids of an amino production of a wide range of unsaturated fatty acid will acid sequence selected from the group consisting of: SEQID Suffice (even unsaturated fatty acid Substitutes such as NO:2, SEQID NO:4, and SEQID NO:6; wherein the amino branched chain fatty acids). 
Therefore, in another preferred 40 acid sequence has a biological activity of at least one domain embodiment of the invention, one could create a large number of a PUFA PKS system; and, (f) a nucleic acid sequence of mutations in one or more of the PUFAPKS genes disclosed encoding an amino acid sequence that is at least about 60% herein, and then transform the appropriately modified FabB identical to an amino acid sequence selected from the group strain (e.g. create mutations in an expression construct con consisting of: SEQID NO:8, SEQID NO:10, SEQID NO:13, taining an ER domain and transform a FabB- strain having 45 SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID the other essential domains on a separate plasmid-or inte NO:24, SEQID NO:26, SEQID NO:28, SEQID NO:30, and grated into the chromosome) and select only for those trans SEQ ID NO:32; wherein the amino acid sequence has a formants that grow without Supplementation of the medium biological activity of at least one domain of a PUFA PKS (i.e., that still possessed an ability to produce a molecule that system. In this preferred aspect of the method, the organism is could complement the FabB- defect). Additional screens 50 genetically modified to affect the activity of the PKS system could be developed to look for particular compounds (e.g. use (described in detail above). Preferred host cells for genetic of GC for fatty acids) being produced in this selective subset modification related to the PUFAPKS system of the invention of an active PKS system. One could envision a number of are described above. similar selective screens for bioactive molecules of interest. 
As described above, in one embodiment of the present invention, a genetically modified microorganism or plant includes a microorganism or plant which has an enhanced ability to synthesize desired bioactive molecules (products) or which has a newly introduced ability to synthesize specific products (e.g., to synthesize a specific antibiotic). According to the present invention, "an enhanced ability to synthesize" a product refers to any enhancement, or up-regulation, in a pathway related to the synthesis of the product such that the microorganism or plant produces an increased amount of the product (including any production of a product where there was none before) as compared to the wild-type microorganism or plant, cultured or grown, under the same conditions.

In the method of production of desired bioactive compounds of the present invention, a genetically modified microorganism is cultured or grown in a suitable medium, under conditions effective to produce the bioactive compound. An appropriate, or effective, medium refers to any medium in which a genetically modified microorganism of the present invention, when cultured, is capable of producing the desired product. Such a medium is typically an aqueous medium comprising assimilable carbon, nitrogen and phosphate sources. Such a medium can also include appropriate salts, minerals, metals and other nutrients. Microorganisms of the present invention can be cultured in conventional fermentation bioreactors. The microorganisms can be cultured by any fermentation process which includes, but is not limited to, batch, fed-batch, cell recycle, and continuous fermentation. Preferred growth conditions for potential host microorganisms according to the present invention are well known in the art.

The desired bioactive molecules produced by the genetically modified microorganism can be recovered from the fermentation medium using conventional separation and purification techniques. For example, the fermentation medium can be filtered or centrifuged to remove microorganisms, cell debris and other particulate matter, and the product can be recovered from the cell-free supernatant by conventional methods, such as, for example, ion exchange, chromatography, extraction, solvent extraction, membrane separation, electrodialysis, reverse osmosis, distillation, chemical derivatization and crystallization. Alternatively, microorganisms producing the desired compound, or extracts and various fractions thereof, can be used without removal of the microorganism components from the product.

In the method for production of desired bioactive compounds of the present invention, a genetically modified plant is cultured in a fermentation medium or grown in a suitable medium such as soil. An appropriate, or effective, fermentation medium has been discussed in detail above. A suitable growth medium for higher plants includes any growth medium for plants, including, but not limited to, soil, sand, any other particulate media that support root growth (e.g. vermiculite, perlite, etc.) or hydroponic culture, as well as suitable light, water and nutritional supplements which optimize the growth of the higher plant. The genetically modified plants of the present invention are engineered to produce significant quantities of the desired product through the activity of the PKS system that is genetically modified according to the present invention. The compounds can be recovered through purification processes which extract the compounds from the plant. In a preferred embodiment, the compound is recovered by harvesting the plant. In this embodiment, the plant can be consumed in its natural state or further processed into consumable products.

As described above, a genetically modified microorganism useful in the present invention can, in one aspect, endogenously contain and express a PUFA PKS system, and the genetic modification can be a genetic modification of one or more of the functional domains of the endogenous PUFA PKS system, whereby the modification has some effect on the activity of the PUFA PKS system. In another aspect, such an organism can endogenously contain and express a PUFA PKS system, and the genetic modification can be an introduction of at least one exogenous nucleic acid sequence (e.g., a recombinant nucleic acid molecule), wherein the exogenous nucleic acid sequence encodes at least one biologically active domain or protein from a second PKS system and/or a protein that affects the activity of said PUFA PKS system (e.g., a phosphopantetheinyl transferase (PPTase), discussed below). In yet another aspect, the organism does not necessarily endogenously (naturally) contain a PUFA PKS system, but is genetically modified to introduce at least one recombinant nucleic acid molecule encoding an amino acid sequence having the biological activity of at least one domain of a PUFA PKS system. In this aspect, PUFA PKS activity is affected by introducing or increasing PUFA PKS activity in the organism. Various embodiments associated with each of these aspects have been discussed in detail above.

In one embodiment of the method to produce bioactive compounds, the genetic modification changes at least one product produced by the endogenous PKS system, as compared to a wild-type organism.

In another embodiment, the organism endogenously expresses a PKS system comprising the at least one biologically active domain of the PUFA PKS system, and the genetic modification comprises transfection of the organism with a recombinant nucleic acid molecule selected from the group consisting of a recombinant nucleic acid molecule encoding at least one biologically active domain from a second PKS system and a recombinant nucleic acid molecule encoding a protein that affects the activity of the PUFA PKS system. In this embodiment, the genetic modification preferably changes at least one product produced by the endogenous PKS system, as compared to a wild-type organism. A second PKS system can include another PUFA PKS system (bacterial or non-bacterial), a type I PKS system, a type II PKS system, and/or a modular PKS system. Examples of proteins that affect the activity of a PKS system have been described above (e.g., PPTase).

In another embodiment, the organism is genetically modified by transfection with a recombinant nucleic acid molecule encoding the at least one domain of the polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system. Such recombinant nucleic acid molecules have been described in detail previously herein.

In another embodiment, the organism endogenously expresses a non-bacterial PUFA PKS system, and the genetic modification comprises substitution of a domain from a different PKS system for a nucleic acid sequence encoding at least one domain of the non-bacterial PUFA PKS system. In another embodiment, the organism endogenously expresses a non-bacterial PUFA PKS system that has been modified by transfecting the organism with a recombinant nucleic acid molecule encoding a protein that regulates the chain length of fatty acids produced by the PUFA PKS system. In one aspect, the recombinant nucleic acid molecule encoding a protein that regulates the chain length of fatty acids replaces a nucleic acid sequence encoding a chain length factor in the non-bacterial PUFA PKS system. In another aspect, the protein that regulates the chain length of fatty acids produced by the PUFA PKS system is a chain length factor. In another aspect, the protein that regulates the chain length of fatty acids produced by the PUFA PKS system is a chain length factor that directs the synthesis of C20 units.

In another embodiment, the organism expresses a non-bacterial PUFA PKS system comprising a genetic modification in a domain selected from the group consisting of a domain encoding β-hydroxyacyl-ACP dehydrase (DH) and a domain encoding β-ketoacyl-ACP synthase (KS), wherein the modification alters the ratio of long chain fatty acids produced by the PUFA PKS system as compared to in the absence of the modification. In one aspect of this embodiment, the modification is selected from the group consisting of a deletion of all or a part of the domain, a substitution of a homologous domain from a different organism for the domain, and a mutation of the domain.

In another embodiment, the organism expresses a non-bacterial PUFA PKS system comprising a modification in an enoyl-ACP reductase (ER) domain, wherein the modification results in the production of a different compound as compared to in the absence of the modification. In one aspect of this embodiment, the modification is selected from the group consisting of a deletion of all or a part of the ER domain, a substitution of an ER domain from a different organism for the ER domain, and a mutation of the ER domain.

In one embodiment of the method to produce a bioactive molecule, the organism produces a polyunsaturated fatty acid (PUFA) profile that differs from the naturally occurring organism without a genetic modification.

Many other genetic modifications useful for producing bioactive molecules will be apparent to those of skill in the art, given the present disclosure, and various other modifications have been discussed previously herein. The present invention contemplates any genetic modification related to a PUFA PKS system as described herein which results in the production of a desired bioactive molecule.

Bioactive molecules, according to the present invention, include any molecules (compounds, products, etc.) that have a biological activity, and that can be produced by a PKS system that comprises at least one amino acid sequence having a biological activity of at least one functional domain of a non-bacterial PUFA PKS system as described herein. Such bioactive molecules can include, but are not limited to: a polyunsaturated fatty acid (PUFA), an anti-inflammatory formulation, a chemotherapeutic agent, an active excipient, an osteoporosis drug, an anti-depressant, an anti-convulsant, an anti-Helicobacter pylori drug, a drug for treatment of neurodegenerative disease, a drug for treatment of degenerative liver disease, an antibiotic, and a cholesterol lowering formulation. One advantage of the non-bacterial PUFA PKS system of the present invention is the ability of such a system to introduce carbon-carbon double bonds in the cis configuration, and molecules including a double bond at every third carbon. This ability can be utilized to produce a variety of compounds.

Preferably, bioactive compounds of interest are produced by the genetically modified microorganism in an amount that is greater than about 0.05%, and preferably greater than about 0.1%, and more preferably greater than about 0.25%, and more preferably greater than about 0.5%, and more preferably greater than about 0.75%, and more preferably greater than about 1%, and more preferably greater than about 2.5%, and more preferably greater than about 5%, and more preferably greater than about 10%, and more preferably greater than about 15%, and even more preferably greater than about 20% of the dry weight of the microorganism. For lipid compounds, preferably, such compounds are produced in an amount that is greater than about 5% of the dry weight of the microorganism. For other bioactive compounds, such as antibiotics or compounds that are synthesized in smaller amounts, those strains possessing such compounds at of the dry weight of the microorganism are identified as predictably containing a novel PKS system of the type described above. In some embodiments, particular bioactive molecules (compounds) are secreted by the microorganism, rather than accumulating. Therefore, such bioactive molecules are generally recovered from the culture medium and the concentration of molecule produced will vary depending on the microorganism and the size of the culture.

One embodiment of the present invention relates to a method to modify an endproduct containing at least one fatty acid, comprising adding to said endproduct an oil produced by a recombinant host cell that expresses at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system. The PUFA PKS system is any non-bacterial PUFA PKS system, and preferably, is selected from the group of: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by the novel screening method disclosed herein; (c) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (d) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and, (f) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system. Variations of these nucleic acid sequences have been described in detail above.

Preferably, the endproduct is selected from the group consisting of a food, a dietary supplement, a pharmaceutical formulation, a humanized animal milk, and an infant formula. Suitable pharmaceutical formulations include, but are not limited to, an anti-inflammatory formulation, a chemotherapeutic agent, an active excipient, an osteoporosis drug, an anti-depressant, an anti-convulsant, an anti-Helicobacter pylori drug, a drug for treatment of neurodegenerative disease, a drug for treatment of degenerative liver disease, an antibiotic, and a cholesterol lowering formulation. In one embodiment, the endproduct is used to treat a condition selected from the group consisting of chronic inflammation, acute inflammation, gastrointestinal disorder, cancer, cachexia, cardiac restenosis, neurodegenerative disorder, degenerative disorder of the liver, blood lipid disorder, osteoporosis, osteoarthritis, autoimmune disease, preeclampsia, preterm birth, age related maculopathy, pulmonary disorder, and peroxisomal disorder.

Suitable food products include, but are not limited to, fine bakery wares, bread and rolls, breakfast cereals, processed and unprocessed cheese, condiments (ketchup, mayonnaise, etc.), dairy products (milk, yogurt), puddings and gelatine desserts, carbonated drinks, teas, powdered beverage mixes, processed fish products, fruit-based drinks, chewing gum, hard confectionery, frozen dairy products, processed meat products, nut and nut-based spreads, pasta, processed poultry products, gravies and sauces, potato chips and other chips or crisps, chocolate and other confectionery, soups and soup mixes, soya based products (milks, drinks, creams, whiteners), vegetable oil-based spreads, and vegetable-based drinks.

Yet another embodiment of the present invention relates to a method to produce a humanized animal milk. This method includes the steps of genetically modifying milk-producing cells of a milk-producing animal with at least one recombinant nucleic acid molecule comprising a nucleic acid sequence encoding at least one biologically active domain of a PUFA PKS system. The PUFA PKS system is a non-bacterial PUFA PKS system, and preferably, the at least one domain of the PUFA PKS system is encoded by a nucleic acid sequence selected from the group consisting of: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a Thraustochytrid microorganism; (b) a nucleic acid sequence encoding at least one domain of a PUFA PKS system from a microorganism identified by the novel screening method described previously herein; (c) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologically active fragments thereof; (d) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 consecutive amino acids of an amino acid sequence selected from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system; and/or (f) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to an amino acid sequence selected from the group consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the amino acid sequence has a biological activity of at least one domain of a PUFA PKS system.

Methods to genetically modify a host cell and to produce a genetically modified non-human, milk-producing animal, are known in the art. Examples of host animals to modify include cattle, sheep, pigs, goats, yaks, etc., which are amenable to genetic manipulation and cloning for rapid expansion of a transgene expressing population. For animals, PKS-like transgenes can be adapted for expression in target organelles, tissues and body fluids through modification of the gene regulatory regions. Of particular interest is the production of PUFAs in the breast milk of the host animal.

The following examples are provided for the purpose of illustration and are not intended to limit the scope of the present invention.

EXAMPLES

Example 1

protein's function. Profile hidden Markov models (profile HMMs) built from the Pfam alignments can be very useful for automatically recognizing that a new protein belongs to an existing protein family, even if the homology is weak. Unlike standard pairwise alignment methods (e.g. BLAST, FASTA), Pfam HMMs deal sensibly with multidomain proteins. The reference provided for the Pfam version used is: Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy S R, Griffiths-Jones S, Howe K L, Marshall M, Sonnhammer E L (2002) Nucleic Acids Research 30(1):276-280); and/or

(2) homology comparison to bacterial PUFA-PKS systems (e.g., Shewanella) using a BLAST 2.0 Basic BLAST homology search using blastp for amino acid searches with standard default parameters, wherein the query sequence is filtered for low complexity regions by default (described in Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402, incorporated herein by reference in its entirety).

Sequences provided for individual domains are believed to contain the full length of the sequence encoding a functional domain, and may contain additional flanking sequence within the Orf.

ORFA-KS

The first domain in OrfA is a KS domain, also referred to herein as ORFA-KS. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1 and 40 of SEQ ID NO:1 (OrfA) to an ending point of between about positions 1428 and 1500 of SEQ ID NO:1. The nucleotide sequence containing the sequence encoding the ORFA-KS domain is represented herein as SEQ ID NO:7 (positions 1-1500 of SEQ ID NO:1). The amino acid sequence containing the KS domain spans from a starting point of between about positions 1 and 14 of SEQ ID NO:2 (ORFA) to an ending point of between about positions 476 and 500 of SEQ ID NO:2.
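The correspondence between the nucleotide positions and the amino acid positions given above for ORFA-KS (nucleotides 1-1500 of SEQ ID NO:1 and amino acids 1-500 of SEQ ID NO:2) follows from reading the open reading frame in codons of three nucleotides. A minimal sketch of that arithmetic is given below. It assumes 1-based, inclusive positions and a reading frame that begins at nucleotide 1 of the Orf; the function name `nt_range_to_aa_range` is hypothetical and is used here for illustration only.

```python
from typing import Tuple


def nt_range_to_aa_range(nt_start: int, nt_end: int) -> Tuple[int, int]:
    """Map a 1-based, inclusive nucleotide range of an ORF to the 1-based,
    inclusive range of codons (amino acid positions) that it touches.

    Assumes the reading frame starts at nucleotide 1 of the ORF.
    """
    if nt_start < 1 or nt_end < nt_start:
        raise ValueError("expected 1 <= nt_start <= nt_end")
    # Nucleotides 1-3 belong to codon 1, nucleotides 4-6 to codon 2, and so on.
    aa_start = (nt_start - 1) // 3 + 1
    aa_end = (nt_end - 1) // 3 + 1
    return aa_start, aa_end


if __name__ == "__main__":
    # Outer ORFA-KS boundaries quoted in the text.
    print(nt_range_to_aa_range(1, 1500))
    # Inner ("between about") ORFA-KS boundaries quoted in the text.
    print(nt_range_to_aa_range(40, 1428))
```

Applied to the boundaries quoted above, the outer nucleotide range 1-1500 maps to amino acids 1-500, and the inner nucleotide positions 40 and 1428 map to amino acid positions 14 and 476.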
The following example describes the further analysis of PKS related sequences from Schizochytrium.

The present inventors have sequenced the genomic DNA including the entire length of all three open reading frames (Orfs) in the Schizochytrium PUFA PKS system using the general methods outlined in Examples 8 and 9 from PCT Publication No. WO 0042195 and U.S. application Ser. No. 09/231,899. The biologically active domains in the Schizochytrium PKS proteins are depicted graphically in FIG. 1. The domain structure of the Schizochytrium PUFA PKS system is described more particularly as follows.

Open Reading Frame A (OrfA):

The complete nucleotide sequence for OrfA is represented herein as SEQ ID NO:1. OrfA is an 8730 nucleotide sequence (not including the stop codon) which encodes a 2910 amino acid sequence, represented herein as SEQ ID NO:2. Within OrfA are twelve domains:

(a) one β-ketoacyl-ACP synthase (KS) domain;
(b) one malonyl-CoA:ACP acyltransferase (MAT) domain;
(c) nine acyl carrier protein (ACP) domains;
(d) one ketoreductase (KR) domain.

The domains contained within OrfA have been determined based on:

(1) results of an analysis with Pfam program (Pfam is a database of multiple alignments of protein domains or conserved protein regions. The alignments represent some evolutionary conserved structure that has implications for the

The amino acid sequence containing the ORFA-KS domain is represented herein as SEQ ID NO:8 (positions 1-500 of SEQ ID NO:2). It is noted that the ORFA-KS domain contains an active site motif: DXAC* (*acyl binding site C).

ORFA-MAT

The second domain in OrfA is a MAT domain, also referred to herein as ORFA-MAT. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1723 and 1798 of SEQ ID NO:1 (OrfA) to an ending point of between about positions 2805 and 3000 of SEQ ID NO:1. The nucleotide sequence containing the sequence encoding the ORFA-MAT domain is represented herein as SEQ ID NO:9 (positions 1723-3000 of SEQ ID NO:1). The amino acid sequence containing the MAT domain spans from a starting point of between about positions 575 and 600 of SEQ ID NO:2 (ORFA) to an ending point of between about positions 935 and 1000 of SEQ ID NO:2. The amino acid sequence containing the ORFA-MAT domain is represented herein as SEQ ID NO:10 (positions 575-1000 of SEQ ID NO:2). It is noted that the ORFA-MAT domain contains an active site motif: GHS*XG (*acyl binding site S), represented herein as SEQ ID NO:11.

ORFA-ACP#1-9

Domains 3-11 of OrfA are nine tandem ACP domains, also referred to herein as ORFA-ACP (the first domain in the sequence is ORFA-ACP1, the second domain is ORFA-ACP2, the third domain is ORFA-ACP3, etc.). The first ACP domain, ORFA-ACP1, is contained within the nucleotide sequence spanning from about position 3343 to about position 3600 of SEQ ID NO:1 (OrfA). The nucleotide sequence containing the sequence encoding the ORFA-ACP1 domain is represented herein as SEQ ID NO:12 (positions 3343-3600 of SEQ ID NO:1). The amino acid sequence containing the first ACP domain spans from about position 1115 to about position 1200 of SEQ ID NO:2. The amino acid sequence containing the ORFA-ACP1 domain is represented herein as SEQ ID NO:13 (positions 1115-1200 of SEQ ID NO:2). It is noted that the ORFA-ACP1 domain contains an active site motif: LGIDS* (*pantetheine binding motif S), represented herein by SEQ ID NO:14. The nucleotide and amino acid sequences of all nine ACP domains are highly conserved and therefore, the sequence for each domain is not represented herein by an individual sequence identifier. However, based on this information, one of skill in the art can readily determine the sequence for each of the other eight ACP domains. The repeat interval for the nine domains is approximately about 110 to about 330 nucleotides of SEQ ID NO:1.

All nine ACP domains together span a region of OrfA of from about position 3283 to about position 6288 of SEQ ID NO:1, which corresponds to amino acid positions of from about 1095 to about 2096 of SEQ ID NO:2. This region includes the linker segments between individual ACP domains. Each of the nine ACP domains contains a pantetheine binding motif LGIDS* (represented herein by SEQ ID NO:14), wherein * is the pantetheine binding site S. At each end of the ACP domain region and between each ACP domain is a region that is highly enriched for proline (P) and alanine (A), which is believed to be a linker region. For example, between ACP domains 1 and 2 is the sequence: APAPVKAAAPAAPVASAPAPA, represented herein as SEQ ID NO:15.

ORFA-KR

Domain 12 in OrfA is a KR domain, also referred to herein as ORFA-KR. This domain is contained within the nucleotide sequence spanning from a starting point of about position 6598 of SEQ ID NO:1 to an ending point of about position 8730 of SEQ ID NO:1. The nucleotide sequence containing the sequence encoding the ORFA-KR domain is represented herein as SEQ ID NO:17 (positions 6598-8730 of SEQ ID NO:1). The amino acid sequence containing the KR domain spans from a starting point of about position 2200 of SEQ ID NO:2 (ORFA) to an ending point of about position 2910 of SEQ ID NO:2. The amino acid sequence containing the ORFA-KR domain is represented herein as SEQ ID NO:18 (positions 2200-2910 of SEQ ID NO:2). Within the KR domain is a core region with homology to short chain aldehyde-dehydrogenases (KR is a member of this family). This core region spans from about position 7198 to about position 7500 of SEQ ID NO:1, which corresponds to amino acid positions 2400-2500 of SEQ ID NO:2.

Open Reading Frame B (OrfB):

The complete nucleotide sequence for OrfB is represented herein as SEQ ID NO:3. OrfB is a 6177 nucleotide sequence (not including the stop codon) which encodes a 2059 amino acid sequence, represented herein as SEQ ID NO:4. Within OrfB are four domains:

(a) one β-ketoacyl-ACP synthase (KS) domain;
(b) one chain length factor (CLF) domain;
(c) one acyl transferase (AT) domain;
(d) one enoyl ACP-reductase (ER) domain.

The domains contained within ORFB have been determined based on: (1) results of an analysis with Pfam program, described above; and/or (2) homology comparison to bacterial PUFA-PKS systems (e.g., Shewanella) using a BLAST 2.0 Basic BLAST homology search, also described above. Sequences provided for individual domains are believed to contain the full length of the sequence encoding a functional domain, and may contain additional flanking sequence within the Orf.

ORFB-KS

The first domain in OrfB is a KS domain, also referred to herein as ORFB-KS. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1 and 43 of SEQ ID NO:3 (OrfB) to an ending point of between about positions 1332 and 1350 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-KS domain is represented herein as SEQ ID NO:19 (positions 1-1350 of SEQ ID NO:3). The amino acid sequence containing the KS domain spans from a starting point of between about positions 1 and 15 of SEQ ID NO:4 (ORFB) to an ending point of between about positions 444 and 450 of SEQ ID NO:4. The amino acid sequence containing the ORFB-KS domain is represented herein as SEQ ID NO:20 (positions 1-450 of SEQ ID NO:4). It is noted that the ORFB-KS domain contains an active site motif: DXAC* (*acyl binding site C).

ORFB-CLF

The second domain in OrfB is a CLF domain, also referred to herein as ORFB-CLF. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1378 and 1402 of SEQ ID NO:3 (OrfB) to an ending point of between about positions 2682 and 2700 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-CLF domain is represented herein as SEQ ID NO:21 (positions 1378-2700 of SEQ ID NO:3). The amino acid sequence containing the CLF domain spans from a starting point of between about positions 460 and 468 of SEQ ID NO:4 (ORFB) to an ending point of between about positions 894 and 900 of SEQ ID NO:4. The amino acid sequence containing the ORFB-CLF domain is represented herein as SEQ ID NO:22 (positions 460-900 of SEQ ID NO:4). It is noted that the ORFB-CLF domain contains a KS active site motif without the acyl-binding cysteine.

ORFB-AT

The third domain in OrfB is an AT domain, also referred to herein as ORFB-AT. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 2701 and 3598 of SEQ ID NO:3 (OrfB) to an ending point of between about positions 3975 and 4200 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-AT domain is represented herein as SEQ ID NO:23 (positions 2701-4200 of SEQ ID NO:3). The amino acid sequence containing the AT domain spans from a starting point of between about positions 901 and 1200 of SEQ ID NO:4 (ORFB) to an ending point of between about positions 1325 and 1400 of SEQ ID NO:4. The amino acid sequence containing the ORFB-AT domain is represented herein as SEQ ID NO:24 (positions 901-1400 of SEQ ID NO:4). It is noted that the ORFB-AT domain contains an AT active site motif of GXS*XG (*acyl binding site S).

ORFB-ER

The fourth domain in OrfB is an ER domain, also referred to herein as ORFB-ER. This domain is contained within the nucleotide sequence spanning from a starting point of about position 4648 of SEQ ID NO:3 (OrfB) to an ending point of about position 6177 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-ER domain is represented herein as SEQ ID NO:25 (positions 4648-6177 of SEQ ID NO:3). The amino acid sequence containing the ER domain spans from a starting point of about position 1550 of SEQ ID NO:4 (ORFB) to an ending point of about position 2059 of SEQ ID NO:4. The amino acid sequence containing the ORFB-ER domain is represented herein as SEQ ID NO:26 (positions 1550-2059 of SEQ ID NO:4).

Open Reading Frame C (OrfC):

The complete nucleotide sequence for OrfC is represented herein as SEQ ID NO:5. OrfC is a 4509 nucleotide sequence (not including the stop codon) which encodes a 1503 amino acid sequence, represented herein as SEQ ID NO:6. Within OrfC are three domains:

(a) two FabA-like β-hydroxy acyl-ACP dehydrase (DH) domains;
(b) one enoyl ACP-reductase (ER) domain.

The domains contained within ORFC have been determined based on: (1) results of an analysis with Pfam program, described above; and/or (2) homology comparison to bacterial PUFA-PKS systems (e.g., Shewanella) using a BLAST 2.0 Basic BLAST homology search, also described above. Sequences provided for individual domains are believed to contain the full length of the sequence encoding a functional domain, and may contain additional flanking sequence within the Orf.

ORFC-DH1

The first domain in OrfC is a DH domain, also referred to herein as ORFC-DH1. This is one of two DH domains in OrfC, and therefore is designated DH1. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1 and 778 of SEQ ID NO:5 (OrfC) to an ending point of between about positions 1233 and 1350 of SEQ ID NO:5. The nucleotide sequence containing the sequence encoding the ORFC-DH1 domain is represented herein as SEQ ID NO:27 (positions 1-1350 of SEQ ID NO:5). The amino acid sequence containing the DH1 domain spans from a starting point of between about positions 1 and 260 of SEQ ID NO:6 (ORFC) to an ending point of between about positions 411 and 450 of SEQ ID NO:6. The amino acid sequence containing the ORFC-DH1 domain is represented herein as SEQ ID NO:28 (positions 1-450 of SEQ ID NO:6).

ORFC-DH2

The second domain in OrfC is a DH domain, also referred to herein as ORFC-DH2. This is the second of two DH domains in OrfC, and therefore is designated DH2. This domain is contained within the nucleotide sequence spanning from a starting point of between about positions 1351 and 2437 of SEQ ID NO:5 (OrfC) to an ending point of between about positions 2607 and 2850 of SEQ ID NO:5. The nucleotide sequence containing the sequence encoding the ORFC-DH2 domain is represented herein as SEQ ID NO:29 (positions 1351-2850 of SEQ ID NO:5). The amino acid sequence containing the DH2 domain spans from a starting point of between about positions 451 and 813 of SEQ ID NO:6 (ORFC) to an ending point of between about positions 869 and 950 of SEQ ID NO:6. The amino acid sequence containing the ORFC-DH2 domain is represented herein as SEQ ID NO:30 (positions 451-950 of SEQ ID NO:6).

ORFC-ER

The third domain in OrfC is an ER domain, also referred to herein as ORFC-ER. This domain is contained within the nucleotide sequence spanning from a starting point of about position 2998 of SEQ ID NO:5 (OrfC) to an ending point of about position 4509 of SEQ ID NO:5. The nucleotide sequence containing the sequence encoding the ORFC-ER domain is represented herein as SEQ ID NO:31 (positions 2998-4509 of SEQ ID NO:5). The amino acid sequence containing the ER domain spans from a starting point of about position 1000 of SEQ ID NO:6 (ORFC) to an ending point of about position 1502 of SEQ ID NO:6. The amino acid sequence containing the ORFC-ER domain is represented herein as SEQ ID NO:32 (positions 1000-1502 of SEQ ID NO:6).

Example 2

The following example describes the use of the screening process of the present invention to identify three other non-bacterial organisms comprising a PUFA PKS system according to the present invention.

Thraustochytrium sp. 23B (ATCC 20892) was cultured according to the screening method described in U.S. Provisional Application Ser. No. 60/298,796 and as described in detail herein.

The biorational screen (using shake flask cultures) developed for detecting microorganisms containing PUFA producing PKS systems is as follows:

Two mL of a culture of the strain/microorganism to be tested is placed in a 250 mL baffled shake flask with 50 mL culture media (aerobic treatment) and another 2 mL of culture of the same strain is placed in a 250 mL non-baffled shake flask with 200 mL culture medium (anoxic treatment). Both flasks are placed on a shaker table at 200 rpm. After 48-72 hr of culture time, the cultures are harvested by centrifugation and the cells analyzed for fatty acid methyl esters via gas chromatography to determine the following data for each culture: (1) fatty acid profile; (2) PUFA content; (3) fat content (estimated as amount total fatty acids (TFA)).

These data are then analyzed asking the following five questions:

Selection Criteria: Low O2/Anoxic Flask vs. Aerobic Flask (Yes/No)

(1) Did the DHA (or other PUFA content) (as % FAME) stay about the same or preferably increase in the low oxygen culture compared to the aerobic culture?

(2) Is C14:0+C16:0+C16:1 greater than about 40% TFA in the anoxic culture?

(3) Is there very little (<1% as FAME) or no precursors (C18:3n-3+C18:2n-6+C18:3n-6) to the conventional oxygen dependent elongase/desaturase pathway in the anoxic culture?

(4) Did fat content (as amount total fatty acids/cell dry weight) increase in the low oxygen culture compared to the aerobic culture?

(5) Did DHA (or other PUFA content) increase as % cell dry weight in the low oxygen culture compared to the aerobic culture?

If the first three questions are answered yes, there is a good indication that the strain contains a PKS genetic system for making long chain PUFAs. The more questions that are answered yes (preferably the first three questions must be answered yes), the stronger the indication that the strain contains such a PKS genetic system. If all five questions are answered yes, then there is a very strong indication that the strain contains a PKS genetic system for making long chain PUFAs.

Following the method outlined above, a frozen vial of Thraustochytrium sp. 23B (ATCC 20892) was used to inoculate a 250 mL shake flask containing 50 mL of RCA medium. The culture was shaken on a shaker table (200 rpm) for 72 hr at 25°C. RCA medium contains the following:

RCA Medium

Deionized water                  1000 mL
Reef Crystals® sea salts         40 g/L
Glucose                          20 g/L
Monosodium glutamate (MSG)       20 g/L
Yeast extract                    1 g/L
PII metals                       5 mL/L
Vitamin mix                      1 mL/L
pH                               7.0

PII metal mix and vitamin mix are same as those outlined in U.S. Pat. No.

Finally, a recombinant genomic library, consisting of DNA fragments from Thraustochytrium 23B genomic DNA inserted into vector lambda FIXII (Stratagene), was screened using digoxigenin labeled probes corresponding to the following segments of Schizochytrium 20888 PUFA-PKS genes: nucleotides 7385-7879 of OrfA (SEQ ID NO:1), nucleotides 5012-5511 of OrfB (SEQ ID NO:3), and nucleotides 76-549 of OrfC (SEQ ID NO:5). Each of these probes detected positive plaques from the Thraustochytrium 23B library, indicating extensive homology between the Schizochytrium PUFA-PKS genes and the genes of Thraustochytrium 23B.

In summary, these results demonstrate that Thraus
5,130,742, tochytrium 23B genomic DNA contains sequences that are incorporated herein by reference in its entirety, 15 homologous to PKS genes from Schizochytrium 20888. 25 mL of the 72 hr old culture was then used to inoculate This Thraustochytrid microorganism is encompassed another 250 mL shake flask containing 50 mL of low nitrogen herein as an additional sources of these genes for use in the RCA medium (10 g/L MSG instead of 20 g/L) and the other embodiments above. 25 mL of culture was used to inoculate a 250 mL shake flask Thraustochytrium 23B (ATCC 20892) is significantly dif containing 175 mL of low-nitrogen RCA medium. The two ferent from Schizochytrium sp. (ATCC 20888) in its fatty acid flasks were then placed on a shaker table (200 rpm) for 72 hr profile. Thraustochytrium 23B can have DHA:DPA(n-6) at 25°C. The cells were then harvested via centrifugation and ratios as high as 14:1 compared to only 2-3:1 in dried by lyophilization. The dried cells were analyzed for fat Schizochytrium (ATCC 20888). Thraustochytrium 23B can content and fatty acid profile and content using standard gas also have higher levels of C20:5(n-3). Analysis of the 25 domains in the PUFA PKS system of Thraustochytrium 23B chromatograph procedures (such as those outlined in U.S. in comparison to the known Schizochytrium PUFA PKS sys Pat. No. 5,130,742). tem should provideus with key information on how to modify The screening results for Thraustochytrium 23B were as these domains to influence the ratio and types of PUFA pro follows: duced using these systems. 30 The screening method described above has been utilized the identify other potential candidate strains containing a Did DHA as % FAME increase? Yes (38->44%) PUFA PKS system. Two additional strains that have been C14:0 + C16:0 + C16:1 greater than about 40% Yes (44%) identified by the present inventors to have PUFA PKS sys TFA tems are Schizochytrium limacium (SR21) Honda & Yokochi No C18:3(n-3) or C18:3(n-6)? 
Yes (0%) Did fat content increase? Yes (2-fold increase) 35 (IFO32693) and Ulkenia (BP-5601). Both were screened as Did DHA (or other HUFA content increase)? Yes (2.3-fold increase) above but in N2 media (glucose: 60 g/L. KHPO: 4.0 g/l; yeast extract: 1.0 g/L; corn steep liquor: 1 mL/L. NHNO: 1.0 g/L. artificial sea salts (Reef Crystals): 20 g/L.; all above The results, especially the significant increase in DHA concentrations mixed in deionized water). For both the content (as % FAME) under low oxygen conditions, condi 40 Schizochytrium and Ulkenia strains, the answers to the first tions, strongly indicates the presence of a PUFA producing three screen questions discussed above for Thraustochytrium PKS system in this strain of Thraustochytrium. 23B was yes (Schizochytrium DHA % FAME 32->41% In order to provide additional data confirming the presence aerobic vs anoxic, 58% 14:0/16:0/16:1, 0% precursors) and of a PUFA PKS system, southern blot of Thraustochytrium (Ulkenia DHA% FAME 28->44% aerobic vs anoxic, 63% 23B was conducted using PKS probes from Schizochytrium 45 14:0/16:0/16:1, 0% precursors), indicating that these strains strain 20888, a strain which has already been determined to are good candidates for containing a PUFA PKS system. contain a PUFA producing PKS system (i.e., SEQID Nos: 1 Negative answers were obtained for the final two questions 32 described above). Fragments of Thraustochytrium 23B for each strain: fat decreased from 61% dry wt to 22% dry genomic DNA which are homologous to hybridization weight, and DHA from 21-9% dry weight in S. limacium and probes from PKS PUFA synthesis genes were detected using 50 fat decreased from 59 to 21% dry weight in Ulkenia and DHA the Southern blot technique. Thraustochytrium 23B genomic from 16% to 9% dry weight. 
These Thraustochytrid microor DNA was digested with either ClaI or KpnI restriction endo ganisms are also claimed herein as additional Sources of the nucleases, separated by agarose gel electrophoresis (0.7% genes for use in the embodiments above. agarose, in standard Tris-Acetate-EDTA buffer), and blotted to a Schleicher &Schuell Nytran Supercharge membrane by 55 Example 3 capillary transfer. Two digoxigenin labeled hybridization probes were used—one specific for the Enoyl Reductase (ER) The following example demonstrates that DHA and DPA region of Schizochytrium PKS OrfB (nucleotides 5012-5511 synthesis in Schizochytrium does not involve membrane of OrfB: SEQIDNO:3), and the other specific for a conserved bound desaturases or fatty acid elongation enzymes like those region at the beginning of Schizochytrium PKS Orf (nucle 60 described for other eukaryotes (Parker-Barnes et al., 2000, otides 76-549 of Orf'; SEQID NO:5). supra; Shanklin et al., 1998, supra). The OrfB-ER probe detected an approximately 13 kb ClaI Schizochytrium accumulates large quantities of triacylg fragment and an approximately 3.6 kb KpnI fragment in the lycerols rich in DHA and docosapentaenoic acid (DPA; Thraustochytrium 23B genomic DNA. The Orf probe 22:5c)6); e.g., 30% DHA+DPA by dry weight. In eukaryotes detected an approximately 7.5 kb ClaI fragment and an 65 that synthesize 20- and 22-carbon PUFAs by an elongation/ approximately 4.6 kb KpnI fragment in the Thraustochytrium desaturation pathway, the pools of 18-, 20- and 22-carbon 23B genomic DNA. intermediates are relatively large so that in vivo labeling US 7,842,796 B2 69 70 experiments using 'C-acetate reveal clear precursor-prod by Vortexing with glass beads. The cell-free homogenate was uct kinetics for the predicted intermediates. Furthermore, centrifuged at 100,000 g for 1 hour. 
Equivalent aliquots of radiolabeled intermediates provided exogenously to Such total homogenate, pellet (H-S pellet), and Supernatant organisms are converted to the final PUFA products. (H-S Super) fractions were incubated in homogenization 1-Clacetate was supplied to a 2-day-old culture as a buffer supplemented with 20 uM acetyl-CoA, 100 uM single pulse at Zero time. Samples of cells were then harvested by centrifugation and the lipids were extracted. In addition, 1- Cmalonyl-CoA (0.9 Gbc/mol), 2 mM NADH, and 2 1-Clacetate uptake by the cells was estimated by measur mM NADPH for 60 min at 25°C. Assays were extracted and ing the radioactivity of the sample before and after centrifu fatty acid methyl esters were prepared and separated as gation. Fatty acid methyl esters derived from the total cell 10 described above before detection of radioactivity with an lipids were separated by AgNO-TLC (solvent, hexane:di- Instantimager (Packard Instruments, Meriden, Conn.). ethyl ether:acetic acid, 70:30:2 by volume). The identity of Results showed that a cell-free homogenate derived from the fatty acid bands was verified by gas chromatography, and Schizochytrium cultures incorporated 1-''C-malonyl-CoA the radioactivity in them was measured by Scintillation count- into DHA, DPA, and saturated fatty acids (data not shown). ing. Results showed that 1- C -acetate was rapidly taken up 15 The same biosynthetic activities were retained by a 100, by Schizochytrium cells and incorporated into fatty acids, but 000xg supernatant fraction but were not present in the mem at the shortest labeling time (1 min) DHA contained 31% of brane pellet. These data contrast with those obtained during the label recovered in fatty acids and this percentage assays of the bacterial enzymes (see Metz et al., 2001, Supra) rsyst essentially his the E.M. and may indicate use of a different (soluble) acyl acceptor ""C-acetate incorporation and t e subsequent 'S', ' molecule. 
Thus, DHA and DPA synthesis in Schizochytrium culture growth (data not shown). Similarly, DPA represented d 10%O of the label throughout the experiment. There is no oes not involve membrane-bound desaturases or fatty acid evidence for a precursor-product relationship between 16- or elongation enzymes like those described for other eukaryotes. 18-carbon fatty acids and the 22-carbon polyunsaturated fatty While various embodiments of the present invention have acids. These results are consistent with rapid synthesis of 25 been described in detail, it is apparent that modifications and DHA from '''C-acetate involving very small (possibly adaptations of those embodiments will occur to those skilled enzyme-bound) pools of intermediates. in the art. It is to be expressly understood, however, that such Next, cells were disrupted in 100 mM phosphate buffer (pH modifications and adaptations are within the scope of the 7.2), containing 2 mM DTT, 2 mM EDTA, and 10% glycerol, present invention, as set forth in the following claims.
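
The five screening questions of Example 2 can be written as a small decision routine. The following is a minimal Python sketch offered for illustration only and is not part of the disclosed method. The names `ScreenData`, `answer_questions` and `interpret` are hypothetical, and three thresholds are assumptions: "about the same" is taken as within 5% of the aerobic value, "greater than about 40%" as strictly above 40, and "very little" precursor as below 1% of FAME. The input values are those reported in Example 2 for the S. limacium and Ulkenia strains.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScreenData:
    """Shake-flask measurements for one strain (hypothetical container)."""
    pufa_fame_aerobic: float       # DHA (or other PUFA) as % FAME, aerobic flask
    pufa_fame_anoxic: float        # DHA (or other PUFA) as % FAME, anoxic flask
    c14_c16_tfa_anoxic: float      # C14:0 + C16:0 + C16:1 as % TFA, anoxic flask
    precursors_fame_anoxic: float  # C18:3n-3 + C18:2n-6 + C18:3n-6 as % FAME, anoxic
    fat_dw_aerobic: float          # total fatty acids as % cell dry weight, aerobic
    fat_dw_anoxic: float           # total fatty acids as % cell dry weight, anoxic
    pufa_dw_aerobic: float         # PUFA as % cell dry weight, aerobic
    pufa_dw_anoxic: float          # PUFA as % cell dry weight, anoxic


def answer_questions(d: ScreenData, same_tol: float = 0.05) -> List[bool]:
    """Return yes/no answers to questions (1)-(5), in order.

    same_tol is an assumed tolerance for "stay about the same" in question (1).
    """
    return [
        d.pufa_fame_anoxic >= d.pufa_fame_aerobic * (1.0 - same_tol),  # (1)
        d.c14_c16_tfa_anoxic > 40.0,                                   # (2)
        d.precursors_fame_anoxic < 1.0,                                # (3)
        d.fat_dw_anoxic > d.fat_dw_aerobic,                            # (4)
        d.pufa_dw_anoxic > d.pufa_dw_aerobic,                          # (5)
    ]


def interpret(answers: List[bool]) -> str:
    """Map the five answers to the strength of indication described in the text."""
    if all(answers):
        return "very strong indication"
    if all(answers[:3]):
        return "good indication"
    return "no clear indication"


# Values reported in Example 2 (aerobic vs. anoxic flasks).
s_limacium = ScreenData(32, 41, 58, 0, 61, 22, 21, 9)
ulkenia = ScreenData(28, 44, 63, 0, 59, 21, 16, 9)

for name, data in (("S. limacium", s_limacium), ("Ulkenia", ulkenia)):
    print(name, interpret(answer_questions(data)))  # both: good indication
```

Both strains answer yes to the first three questions and no to the last two, which matches the description of them as good candidates rather than very strong ones.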

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 37

<210> SEQ ID NO 1
<211> LENGTH: 8730
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(8730)

<4 OOs SEQUENCE: 1 atg gcg gcc ct ctg cag gag caa aag gga ggc gag atg gat acc cc 48 Met Ala Ala Arg Lieu. Glin Glu Glin Lys Gly Gly Glu Met Asp Thr Arg 1. 5 1O 15

att gcc atc at C ggc atgtcg gcc atc Ct c ccc tic ggc acg acc gtg 96 Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Cys Gly. Thir Thr Val 2O 25 3 O

cgc gag ticg tdg gag acc atc. c9c gcc ggc at C gac tec Citg tcg gat 144 Arg Glu Ser Trp Glu Thir Ile Arg Ala Gly Ile Asp Cys Lieu. Ser Asp 35 4 O 45

ctic ccc gag gaC cc gtC gac gitg acg gcg tac titt gaC CCC gtc. aag 192 Lieu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Val Lys SO 55 60

acc acc aag gaC aag at C tac to aag cqc ggt ggc titc att CCC gag 24 O Thir Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu

tac gaC titt gaC gcc cqc gag titc gga Ct c aac atgtt C Cag atg gag 288 Tyr Asp Phe Asp Ala Arg Glu Phe Gly Lieu. Asn Met Phe Gln Met Glu 85 90 95

gac tog gac goa aac Cag acc atc. tcg Ctt ct c aag gtc. aag gag gcc 336 Asp Ser Asp Ala Asn Glin Thir Ile Ser Lieu. Lieu Lys Wall Lys Glu Ala 1OO 105 110

2825 283 O 2835 cgc cgc acg Ctic ggc cag gCt gcg ct C ccc aac tog at C Cag cqc 8.559 Arg Arg Thr Lieu. Gly Glin Ala Ala Lieu Pro Asn. Ser Ile Glin Arg 284 O 284.5 285 O atc gtc. cag cac cqc cog gtc. cc.g. cag gac aag ccc titc tac att 86O4 Ile Val Gln His Arg Pro Val Pro Glin Asp Llys Pro Phe Tyr Ile 2855 286 O 2865 acc ct c cqc toc aac cag tog ggc ggit cac toc cag cac aag cac 8649 Thr Lieu. Arg Ser Asn Glin Ser Gly Gly His Ser Gln His Llys His 2870 2875 288O gcc citt cag titc. cac aac gag cag ggc gat citc titc att gat gtc 86.94 Ala Lieu. Glin Phe His Asn. Glu Glin Gly Asp Lieu. Phe Ile Asp Wall 2.885 289 O 2.895

Cag gct tcg gtc at C gcc acg gac agc ctit gcc titc 873 O Glin Ala Ser Val Ile Ala Thr Asp Ser Lieu Ala Phe 29 OO 29 OS 291. O

<210> SEQ ID NO 2
<211> LENGTH: 2910
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 2
Met Ala Ala Arg Leu Gln Glu Gln Lys Gly Gly Glu Met Asp Thr Arg 1 5 10 15
Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Cys Gly Thr Thr Val 20 25 30
Arg Glu Ser Trp Glu Thr Ile Arg Ala Gly Ile Asp Cys Leu Ser Asp 35 40 45
Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Val Lys 50 55 60
Thr Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75 80
Tyr Asp Phe Asp Ala Arg Glu Phe Gly Leu Asn Met Phe Gln Met Glu 85 90 95
Asp Ser Asp Ala Asn Gln Thr Ile Ser Leu Leu Lys Val Lys Glu Ala 100 105 110
Leu Gln Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn Ile 115 120 125
Gly Cys Val Leu Gly Ile Gly Gly Gly Gln Lys Ser Ser His Glu Phe 130 135 140
Tyr Ser Arg Leu Asn Tyr Val Val Val Glu Lys Val Leu Arg Lys Met 145 150 155 160
Gly Met Pro Glu Glu Asp Val Lys Val Ala Val Glu Lys Tyr Lys Ala 165 170 175
Asn Phe Pro Glu Trp Arg Leu Asp Ser Phe Pro Gly Phe Leu Gly Asn 180 185 190
Val Thr Ala Gly Arg Cys Thr Asn Thr Phe Asn Leu Asp Gly Met Asn 195 200 205
Cys Val Val Asp Ala Ala Cys Ala Ser Ser Leu Ile Ala Val Lys Val 210 215 220
Ala Ile Asp Glu Leu Leu Tyr Gly Asp Cys Asp Met Met Val Thr Gly 225 230 235 240
Ala Thr Cys Thr Asp Asn Ser Ile Gly Met Tyr Met Ala Phe Ser Lys 245 250 255
Thr Pro Val Phe Ser Thr Asp Pro Ser Val Arg Ala Tyr Asp Glu Lys 260 265 270

Thr Gly Met Leu Ile Gly Glu Gly Ser Ala Met Leu Val Leu 275 280 285
Arg Tyr Ala Asp Ala Val Arg Asp Gly Asp Glu Ile His Ala Val Ile 290 295 300
Arg Gly Ala Ser Ser Ser Asp Gly Ala Ala Gly Ile Thr 305 310 315
Pro Thr Ile Ser Gly Gln Glu Glu Ala Leu Arg Arg Ala Asn Arg 325 330 335
Ala Val Asp Pro Ala Thr Val Thr Leu Val Glu Gly His Gly Thr 340 345 350
Gly Thr Pro Val Gly Asp Arg Ile Glu Leu Thr Ala Leu Arg Asn Leu 355 360 365
Phe Asp Ala Tyr Gly Glu Gly Asn Thr Glu Lys Val Ala Val Gly 370 375
Ser Ile Ser Ser Ile Gly His Leu Ala Val Ala Gly Leu Ala 385 390 395 400
Gly Met Ile Val Ile Met Ala Leu Lys His Thr Leu Gly 405
Thr Ile Asn Val Asp Asn Pro Pro Asn Leu Tyr Asp Asn Thr Ile 425 430
Asn Glu Ser Ser Leu Tyr Ile Asn Thr Met Asn Arg Pro Trp Phe Pro 435 440 445
Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly 450 455 460
Gly Ala Asn Tyr His Ala Val Leu Glu Glu Ala Glu Pro Glu His Thr 465 470
Thr Ala Arg Leu Asn Arg Pro Gln Pro Val Leu Met Met Ala 485 490 495
Ala Thr Pro Ala Ala Leu Gln Ser Leu Glu Ala Gln Leu Glu 500 505
Phe Glu Ala Ala Ile Glu Asn Glu Thr Val Asn Thr Ala Tyr 515 525
Ile Lys Val Lys Phe Gly Glu Gln Phe Phe Pro Gly Ser Ile 530 535 540
Pro Ala Thr Asn Ala Arg Leu Gly Phe Leu Val Asp Ala Glu Asp 545 550 555 560
Ala Ser Thr Leu Arg Ala Ile Ala Gln Phe Ala Asp Val 565 570 575
Thr Glu Ala Trp Arg Leu Pro Arg Glu Gly Val Ser Phe Arg Ala 585 590
Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser Gly Gln 595 605
Gly Ala Gln Tyr Thr His Met Phe Ser Glu Val Ala Met Asn Trp Pro 610 615
Gln Phe Arg Gln Ser Ile Ala Ala Met Asp Ala Ala Gln Ser Val 625 630 635 640
Ala Gly Ser Asp Lys Asp Phe Glu Arg Val Ser Gln Val Leu Tyr Pro 645 650 655
Arg Pro Tyr Glu Arg Glu Pro Glu Gln Asp His Lys Ile Ser 660 665 670
Leu Thr Ala Ser Gln Pro Ser Thr Leu Ala Ala Leu Gly Ala 675 680 685

Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala Ala Gly 690 695 700
His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Cys Val Asp 705 710 715 720
Arg Asp Glu Leu Phe Glu Leu Val Cys Arg Arg Ala Arg Ile Met Gly 725 730 735
Gly Lys Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala Val Ile 740 745 750
Gly Pro Asn Ala Glu Asn Ile Lys Val Gln Ala Ala Asn Val Trp Leu 755 760 765
Gly Asn Ser Asn Ser Pro Ser Gln Thr Val Ile Thr Gly Ser Val Glu 770 775 780
Gly Ile Gln Ala Glu Ser Ala Arg Leu Gln Lys Glu Gly Phe Arg Val 785 790 795 800
Val Pro Leu Ala Cys Glu Ser Ala Phe His Ser Pro Gln Met Glu Asn 805 810 815
Ala Ser Ser Ala Phe Lys Asp Val Ile Ser Lys Val Ser Phe Arg Thr 820 825 830
Pro Lys Ala Glu Thr Lys Leu Phe Ser Asn Val Ser Gly Glu Thr Tyr 835 840 845
Pro Thr Asp Ala Arg Glu Met Leu Thr Gln His Met Thr Ser Ser Val 850 855 860
Lys Phe Leu Thr Gln Val Arg Asn Met His Gln Ala Gly Ala Arg Ile 865 870 875 880
Phe Val Glu Phe Gly Pro Lys Gln Val Leu Ser Lys Leu Val Ser Glu 885 890 895
Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn Pro Ala 900 905 910
Ser Gly Thr Asp Ser Asp Ile Gln Leu Arg Asp Ala Ala Val Gln Leu 915 920 925
Val Val Ala Gly Val Asn Leu Gln Gly Phe Asp Lys Trp Asp Ala Pro 930 935 940
Asp Ala Thr Arg Met Gln Ala Ile Lys Lys Lys Arg Thr Thr Leu Arg 945 950 955 960
Leu Ser Ala Ala Thr Tyr Val Ser Asp Lys Thr Lys Lys Val Arg Asp 965 970 975
Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Leu Lys Gly Ala Ala 980 985 990
Pro Leu Ile Lys Ala Pro Glu Pro Val Val Asp Glu Ala Ala Lys Arg 995 1000 1005
Glu Ala Glu Arg Leu Gln Lys Glu Leu Gln Asp Ala Gln Arg Gln 1010 1015 1020
Leu Asp Asp Ala Lys Arg Ala Ala Ala Glu Ala Asn Ser Lys Leu 1025 1030 1035
Ala Ala Ala Lys Glu Glu Ala Lys Thr Ala Ala Ala Ser Ala Lys 1040 1045 1050
Pro Ala Val Asp Thr Ala Val Val Glu Lys His Arg Ala Ile Leu 1055 1060 1065
Lys Ser Met Leu Ala Glu Leu Asp Gly Tyr Gly Ser Val Asp Ala 1070 1075 1080
Ser Ser Leu Gln Gln Gln Gln Gln Gln Gln Thr Ala Pro Ala Pro 1085 1090 1095

Val Ala Ala Ala Pro Ala Pro Val Ala Ser Ala Pro Ala 10
Pro Val Ser Asn Glu Leu Glu Ala Thr Val Val
Met Val Leu Ala Ala Thr Gly Glu Asp Met Ile
Glu Asp Met Glu Leu Thr Glu Leu Gly Asp Ser Ile
Val Glu Ile Leu Glu Val Gln Ala Leu Asn Val
Glu Asp Val Asp Leu Ser Arg Thr Thr Val Gly
Glu Val Asn Ala Met Ala Glu Ile Ala Ser Ser Ala
Pro Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala Pro
Ala Ala Ala Pro Ala Ser Asn Glu Leu Leu Glu Ala 1230
Glu Val Val Met Glu Leu Ala Ala Gly Glu 1245
Thr Met Ile Glu Ser Met Glu Leu Glu Glu Leu Gly 1260
Ile Ser Ile Glu Ile Leu Ser Glu Val Gln Ala 1275
Met Leu Asn Val Glu Ala Asp Val Asp Ala Leu Ser Arg Thr 1280 1290
Arg Val Gly Glu Val Asn Ala Met Ala Glu Ile Ala 1295 1305
Gly Ser Ala Pro Ala Ala Ala Ala Ala Pro Gly Pro Ala 1320
Ala Ala Pro Ala Pro Ala Ala Ala Pro Ala Val Ser Asn 1335
Glu Leu Glu Ala Glu Thr Val Val Met Val Leu Ala 1345
Ala Thr Gly Glu Asp Met Ile Glu Asp Met Glu 1360
Leu Glu Thr Glu Leu Gly Asp Ser Ile Val Glu Ile 1370
Leu Ser Glu Val Gln Ala Leu Asn Val Glu Ala Asp Val 1385 1395
Asp Leu Ser Arg Thr Thr Val Gly Glu Val Val Asp Ala 1410
Met Ala Glu Ile Ala Gly Ser Ala Pro Ala Pro Ala Ala 1425
Ala Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Pro 1440
Ala Pro Ala Val Ser Ser Leu Leu Glu Ala Glu Thr Val 1445 1455
Val Met Glu Val Leu Ala Lys Thr Gly Glu Thr Asp Met 1460 1470
Ile Glu Ser Asp Met Glu Leu Glu Thr Glu Leu Ile Asp Ser 1475 1480 1485
Ile Arg Val Glu Ile Leu Ser Glu Val Gln Ala Met Leu Asn 1490 1500

Val Glu Ala Asp Val Ala Leu Ser Arg Thr Arg Thr Val 1505 1515
Gly Glu Val Val Asp Ala Ala Glu Ile Ala Gly Gly Ser 1520 1530
Ala Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala 1535 1545
Pro Pro Ala Ala Pro Pro Ala Ala Pro Ala Pro Ala Val 1560
Ser Glu Leu Leu Glu Ala Glu Thr Val Val Met Glu Val 1575
Leu Ala Thr Gly Glu Thr Asp Met Ile Glu Ser Asp 1590
Met Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Arg Val 1600 1605
Glu Leu Ser Glu Val Gln Ala Met Leu Asn Val Glu Ala 1615
Asp Asp Ala Leu Ser Thr Arg Thr Val Glu Val Val 1635
Asp Met Ala Glu Ala Gly Ser Ser Ala Ser Ala Pro 1650
Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala 1660 1665
Ala Ala Pro Ala Val Ser Asn Glu Leu Leu Glu Ala Glu 1675
Thr Val Met Glu Val Leu Ala Ala Thr Glu Thr 1690 1695
Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr Leu Gly Ile 1700 1705
Asp Ser Ile Arg Val Ile Leu Ser Glu Val Gln Ala Met 1715 72
Leu Asn Val Glu Ala Lys Val Asp Ala Leu Arg Thr Arg 1730 1740
Thr Gly Glu Val Val Ala Met Ala Glu Ile Ala Gly 1755
Gly Ala Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala 1770
Ala Pro Ala Val Ser Glu Leu Leu Glu Ala Glu Thr
Val Met Glu Val Leu Ala Thr Gly Glu Thr Asp
Met Glu Ser Asp Met Leu Glu Thr Glu Leu Gly Ile Asp 1815
Ser Arg Val Glu Leu Ser Glu Val Gln Ala Met Leu 1830
Asn Glu Ala Asp Asp Ala Leu Ser Thr Arg Thr 1845
Val Glu Val Val Asp Met Ala Glu Ile Ala Gly Ser 1860
Ser Pro Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala 1875
Ala Pro Ala Pro Ala Ala Ala Pro Ala Val Ser Glu Leu 1880 1890

Leu Glu Ala Glu Thr Val Val Met Glu Val Leu Ala Ala 1895 1900 1905
Thr Glu Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu 1910 1915 1920
Thr Glu Leu Gly Ile Asp Ser Ile Arg Val Glu Ile Leu Ser 1925 1930 1935
Glu Gln Ala Met Leu Asn Val Glu Ala Val Asp Ala 1940 1945
Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Ala Met 1955
Ala Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala 1970 1980
Pro Pro Ala Ala Ala Ala Pro Ala Val Ser Asn Glu Leu Leu 1995
Glu Ala Glu Thr Val Val Met Glu Val Leu Ala Ala Thr 2010
Gly Glu Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr 2025
Glu Leu Gly Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser Glu 2030 2040
Val Gln Ala Met Leu Asn Val Glu Ala Asp Val Asp Ala Leu 2045 2050 2055
Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Met Ala 2060 2065
Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Pro 2075 2080
Ala Ser Ala Gly Ala Ala Pro Ala Val Ile Ser Val His 2090 2095
Gly Asp Asp Asp Leu Ser Leu Met His Val Val 2105
Asp Arg Arg Pro Asp Glu Leu Leu Glu Pro Glu Asn
Arg Val Leu Val Val Asp Asp Ser Glu Thr Leu Ala
Leu Arg Val Leu Gly Val Val Thr Phe Glu
Gly Gln Leu Ala Gln Arg Ala Ala Ala Ile Arg His
Val Ala Asp Leu Ser Ala Ser Ala Ala Ile
Ala Glu Gln Arg Phe Gly Leu Gly Phe Ile Ser
Gln Ala Glu Arg Phe Gul Pro Glu Ile Leu Gly Phe Thr 2220
Leu Met Ala Phe Ala Lys Ser Leu Cys Thr Ala Val 2225 2235
Ala Gly Gly Arg Pro Ala Phe Ile Val Ala Arg Leu Asp Gly 2240 2245 2250
Arg Leu Gly Phe Thr Ser Gln Gly Thr Ser Asp Ala Leu Arg 2255 2260 2265
Ala Gln Arg Gly Ala Ile Phe Gly Leu Thr Ile Gly Leu 2270 2275 2280
Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val Asp Ile Ala 2285 2290 2295

Gln Gly Met His Pro Glu Asp Ala Ala Val Ala Ile Val Arg Glu 2300 2305 2310
Met Ala Ala Asp Ile Arg Ile Arg Glu Val Gly Ile Gly Ala 2315 2320 2325
Asn Gln Gln Arg Thr Ile Arg Ala Ala Leu Glu Thr Gly 2330 2335 2340
Asn Pro Gln Arg Gln Ile Ala Lys Asp Asp Val Leu Leu Val Ser 2345 2350 2355
Gly Gly Ala Arg Gly Ile Thr Pro Leu Cys Ile Arg Glu Ile Thr 2360 2365 2370
Arg Gln Ile Ala Gly Gly Lys Tyr Ile Leu Leu Gly Arg Ser 2375 2380 2385
Val Ser Ala Ser Glu Pro Ala Trp Ala Gly Ile Thr Asp Glu 2390 2395 2400
Ala Val Gln Ala Ala Thr Gln Glu Leu Lys Arg Ala Phe 2405 2410 2415
Ser Ala Gly Glu Gly Pro Lys Pro Thr Pro Arg Ala Val Thr 2420 2425 2430
Leu Val Gly Ser Val Leu Gly Ala Arg Glu Val Arg Ser Ser Ile 2435 2440 2445
Ala Ala Ile Glu Ala Leu Gly Gly Ala Ile Tyr Ser Ser 2450 2455 2460
Asp Val Asn Ser Ala Ala Asp Val Ala Ala Val Arg Asp Ala 2465 2470 2475
Glu Ser Gln Leu Gly Ala Arg Val Ser Gly Ile Val His Ala Ser 2480 2485 2490
Gly Val Leu Arg Asp Arg Leu Ile Glu Lys Leu Pro Asp Glu 2495 2500 2505
Phe Asp Ala Val Phe Gly Thr Val Thr Gly Leu Glu Asn Leu 2510 2515 2520
Leu Ala Ala Val Asp Arg Ala Asn Leu Lys His Met Val Leu Phe 2525 2530 2535
Ser Ser Leu Ala Gly Phe His Gly Asn Val Gly Gln Ser Asp 2540 2545 2550
Ala Met Ala Asn Glu Ala Leu Asn Met Gly Leu Glu Leu Ala 2555 2560 2565
Asp Val Ser Val Ser Ile Phe Gly Pro Trp Asp Gly 2570 2575 2580
Gly Met Val Thr Pro Gln Leu Gln Phe Gln Glu Met Gly 2585 2590 2595
Val Gln Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg 2600 2605 2610
Ile Val Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp 2615 2620 2625
Arg Thr Pro Ser Val Gly Ser Asp Thr Ile Thr Leu His 2630 2635 2640
Arg Lys Ile Ser Ala Ser Asn Pro Phe Leu Glu Asp His Val 2645 2650 2655
Ile Gln Gly Arg Arg Val Leu Pro Met Thr Leu Ala Ile Gly Ser 2660 2665 2670

Leu Ala Glu Thr Leu Gly Leu Phe Pro Gly Ser Leu Trp 2675 2680 2685
Ala Ile Asp Asp Ala Gln Leu Phe Lys Gly Val Thr Val Asp Gly 2690 2695 2700
Asp Val Asn Cys Glu Val Thr Leu Thr Pro Ser Thr Ala Pro Ser 2705 2710 2715
Gly Arg Val Asn Val Gln Ala Thr Leu Lys Thr Phe Ser Ser Gly 2720 2725 2730
Lys Leu Val Pro Ala Tyr Arg Ala Val Ile Val Leu Ser Asn Gln 2735 2740 2745
Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro Pro Ser Leu Asp 2750 2755 2760
Ala Asp Pro Ala Leu Gln Gly Ser Val Tyr Asp Gly Lys Thr Leu 2765 2770 2775
Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Leu Ser Cys 2780 2785 2790
Thr Lys Ser Gln Leu Val Ala Lys Cys Ser Ala Val Pro Gly Ser 2795 2800 2805
Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 2810 2815 2820
Pro Phe Val Asn Asp Leu Ala Phe Gln Ala Met Leu Val Trp Val 2825 2830 2835
Arg Arg Thr Leu Gly Gln Ala Ala Leu Pro Asn Ser Ile Gln Arg 2840 2845 2850
Ile Val Gln His Arg Pro Val Pro Gln Asp Lys Pro Phe Tyr Ile 2855 2860 2865
Thr Leu Arg Ser Asn Gln Ser Gly Gly His Ser Gln His Lys His 2870 2875 2880
Ala Leu Gln Phe His Asn Glu Gln Gly Asp Leu Phe Ile Asp Val 2885 2890 2895
Gln Ala Ser Val Ile Ala Thr Asp Ser Leu Ala Phe 2900 2905 2910

<210> SEQ ID NO 3
<211> LENGTH: 6177
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(6177)

<4 OOs, SEQUENCE: 3 atg gcc gct cqg aat gtg agc gcc gcg cat gag atg cac gat gala aag 48 Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys 1. 5 1O 15 cgc at C gcc gtC gtc. g.gc atg gcc gt C cag tac gcc gga tigc aaa acc 96 Arg Ile Ala Val Val Gly Met Ala Val Glin Tyr Ala Gly Cys Llys Thr 2O 25 3O aag gac gag titc tig gag gtg Ct c atgaac ggc aag gtc gag to C aag 144 Lys Asp Glu Phe Trp Glu Val Lieu Met Asn Gly Llys Val Glu Ser Lys 35 4 O 45 gtg at C agc gac alaa C9a ctic ggc tic C aac tac cqc gcc gag cac tac 192 Val Ile Ser Asp Lys Arg Lieu. Gly Ser Asn Tyr Arg Ala Glu. His Tyr SO 55 6 O aaa gCa gag cqc agc aag tat gcc gac acc titt to aac gala acg tac 24 O Lys Ala Glu Arg Ser Llys Tyr Ala Asp Thir Phe Cys Asn. Glu Thir Tyr 65 70 7s 8O ggc acc Ctt gac gag aac gag at C gaC aac gag cac gaa ct c ct c ct c 288

Lell Glu Glin Ala Phe Pro Glin Ile Wall Glu. Thir Ile 335 tac Cala aac tac gac titt gt C gag gtt ggg ccc aac aac 4 OS 9 Glin Asn Asp Phe Wall Glu Wall Gly Pro Asn. Asn 350

CaC agc a CC gca gtg acc acg citt ggit c cc cag cqc aac 4104 His Ser Thir Ala Wall Thir Thir Luell Gly Pro Glin Arg Asn 365

CaC gct ggc gcc at C aag cag aac gag gat gct tdg acg 41.49 His Ala Gly Ala Ile Lys Glin ASn Glu Asp Ala Trp Thr 38O a CC gtc aag citt gtg tog citc. aag gcc cac citt gtt cot 41.94 Thir Wall Lys Luell Wall Ser Luell Lys Ala His Lieu. Wall Pro 395 ggc acg atc. tog cc.g tac CaC to c aag Ctt gtg gC9 gag 4239 Gly Thir Ile Ser Pro Tyr His Ser Lys Lieu Wall Ala Glu 41 O gct gct tgc tac gct citc. tgc aag ggit gala aag CCC aag 4284 Ala Ala Cys Ala Luell Cys Lys Gly Glu Lys Pro Llys 425 aag aag titt gtg cgc att cag citc. aac ggt cqc titc aac 4329 Lys Lys Phe Wall Arg Ile Glin Luell Asn Gly Arg Phe Asn 44 O agc gcg gac cc c at C tog gcc gat citt gcc agc titt CC9 4374 Ser Ala Asp Pro Ile Ser Ala Asp Lell Ala Ser Phe Pro 45.5 cott gac cott gcc att gcc gcc atc. tcg agc cgc atc atg 4 419 Pro Asp Pro Ala Ile Ala Ala Ile Ser Ser Arg Ile Met 47 O aag gtc gct cc c aag tac gcg cgt. citc. aac att gac gag 4 464 Lys Wall Ala Pro Lys Ala Arg Lell Asn. Ile Asp Glu 485

Cag gag a CC cga gat at C citc. aac aag gac aac gcg cc.g 4509 Glin Glu Thir Arg Asp Ile Luell ASn Lys Asp Asn Ala Pro SOO tot tot tot tot tot tot tot tot tot tot tot tot tot 45.54 Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser 515 cc.g cott gct cott tog cc c gtg Cala aag aag gct gct C cc 45.99 Pro Pro Ala Pro Ser Pro Wall Glin Lys Lys Ala Ala Pro 53 O gcc gag a CC aag gct gct tog gct gac gca Ctt CC agt 4644 Ala Glu Thir Lys Ala Ala Ser Ala Asp Ala Lieu. Arg Ser 545 gcc citc. gat citc. gac atg citt gC9 Ctg agc tict gcc agt 4689 Ala Lell Asp Luell Asp Met Luell Ala Lell Ser Ser Ala Ser 560 gcc ggc aac citt gtt act gcg cott agc gac goc ticg gtc 4734 Ala Gly Asn Luell Wall Thir Ala Pro Ser Asp Ala Ser Val sts att cc.g c cc tgc aac gcg gat citc. ggc agc cgc gcc titc 4779 Ile Pro Pro Cys Asn Ala Asp Luell Gly Ser Arg Ala Phe 590 atg acg tac ggt gtt gcg cott Ctg tac acg ggc gcc atg 4824 Met Thir Gly Wall Ala Pro Luell Thr Gly Ala Met 605 gcc ggc att gcc tot gac citc. gtc att gcc gcc ggc cc 486.9 Ala Gly Ile Ala Ser Asp Luell Wall Ile Ala Ala Gly Arg 62O

- Continued citc. gala gag gtc tig gac gag acc aaa aac ttt tac att aac C9t 5814 Lell Glu Glu Val Trp Asp Glu Thir Lys Asn. Phe Tyr Ile Asn Arg 925 93 O 935 citt aac Ccg gag aag at C cag cqc gcc gag cqc gaC ccc aag 5859 Lell Asn Pro Glu Lys Ile Glin Arg Ala Glu Arg Asp Pro Llys 945 950 citc. atgtcg ctg. tcc titt cqc tog tac Citg agc ctg gcg agc 5904 Lell Met Ser Lieu. Cys Phe Arg Trp Tyr Lieu Ser Leu Ala Ser 96.O 965 cgc gcc aac act gga gct tcc gat cqc gtc atg gac tac Cag 5949 Arg Ala Asn Thr Gly Ala Ser Asp Arg Val Met Asp Tyr Glin 97. 98 O gtc tgc ggit cot goc att ggit to c titc aac gat titc atc aag 5994 Wall Cys Gly Pro Ala Ile Gly Ser Phe Asn Asp Phe Ile Llys 990 995 gga tac Ctt gat cog gcc gtC goa aac gag tac Cog tec gtc Gly Tyr Lieu. Asp Pro Ala Val Ala Asn. Glu Tyr Pro Cys Val 2005 2010 gtt cag att aac aag cag at C Ctt cqt gga gcg to ttic titg cgc 6084 Wall Ile Asn Lys Glin Ile Lieu. Arg Gly Ala Cys Phe Lieu. Arg 2015 2O2O 2O25 cgt citc. gaa att Ctg cgc aac gca cqc ctt toc gat ggc gct gcc 6129 Arg Luell Glu Ile Lieu. Arg Asn Ala Arg Lieu. Ser Asp Gly Ala Ala 2O3O 2O35 2O4. O gct citt gtg gcc agc at c gat gac aca tac gtc. cc.g. gcc gag aag 6174 Ala Luell Val Ala Ser Ile Asp Asp Thr Tyr Val Pro Ala Glu Lys 2O45 2OSO 2O55

ctg 6177
Leu

<210> SEQ ID NO 4
<211> LENGTH: 2059
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 4

Met Ala Ala Arg Asn Val Ser Ala Ala His Glu Met His Asp Glu Lys
1 5 10 15

Arg Ile Ala Val Val Gly Met Ala Val Gln Tyr Ala Gly Cys Lys Thr
20 25 30

Asp Glu Phe Trp Glu Val Leu Met Asn Gly Lys Val Glu Ser Lys
35 40 45

Val Ile Ser Asp Lys Arg Leu Gly Ser Asn Tyr Arg Ala Glu His Tyr
50 55 60

Lys Ala Glu Arg Ser Lys Tyr Ala Asp Thr Phe Cys Asn Glu Thr Tyr
65 70 75 80

Gly Thr Leu Asp Glu Asn Glu Ile Asp Asn Glu His Glu Leu Leu Leu
85 90 95

Asn Leu Ala Lys Gln Ala Leu Ala Glu Thr Ser Val Lys Asp Ser Thr
100 105 110

Arg Gly Ile Val Ser Gly Cys Leu Ser Phe Pro Met Asp Asn Leu
115 120 125

Gln Gly Glu Leu Leu Asn Val Tyr Gln Asn His Val Glu Lys Lys Leu
130 135 140

Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Gln
145 150 155 160

Ser Asn Lys Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala
165 170 175

Ser Phe Val Ala Glu Glu Leu Asn Leu Gly Ala Leu His Tyr Ser Val
180 185 190

Asp Ala Ala Cys Ala Thr Ala Leu Val Leu Arg Leu Ala Gln Asp
195

His Leu Val Ser Gly Ala Ala Asp Val Met Leu Cys Gly Ala Thr
210 215 220

Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Gln Ala
225 230 235 240

Met Pro Val Gly Thr Gly Gln Asn Val Ser Met Pro Leu His Lys Asp
245 250 255

Ser Gln Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu
260 265 270

Arg Leu Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly Thr Leu
285

Leu Gly Ala Asn Val Ser Asn Ser Gly Thr Gly Leu Pro Leu Pro
290 295 300

Leu Leu Pro Ser Glu Lys Leu Met Asp Thr Thr Arg Ile
305 310 315

Asn Val His Pro His Ile Gln Val Glu His Ala Thr Gly
325 330 335

Thr Pro Gln Gly Asp Arg Val Glu Ile Asp Ala Val Ala Phe
340 345 350

Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Gly Asn Phe Gly His
355 360 365

Thr Leu Val Ala Ala Gly Phe Ala Gly Met Lys Val Leu Leu Ser
370 375

Met His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr
385 390 395 400

Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu
405 415

Thr Asn Gly Glu Pro Arg Ala Gly Leu Ser Ala Phe Gly Phe Gly
425 430

Gly Thr Asn Ala His Ala Val Phe Glu Glu His Asp Pro Ser Asn Ala
435 440 445

Ala Cys Thr Gly His Asp Ser Ile Ser Ala Leu Ser Ala Arg Gly
450 455 460

Gly Glu Ser Asn Met Arg Ile Ala Ile Thr Gly Met Asp Ala Thr Phe
465 470

Gly Ala Leu Gly Leu Asp Ala Phe Glu Arg Ala Ile Thr Gly
485 490 495

Ala His Gly Ala Ile Pro Leu Pro Glu Arg Trp Arg Phe Leu Gly
500 505 510

Asp Lys Asp Phe Leu Asp Leu Gly Val Ala Thr Pro His
515 525

Gly Cys Ile Glu Asp Val Glu Val Asp Phe Gln Arg Leu Arg Thr
530 535 540

Pro Met Thr Pro Glu Asp Met Leu Leu Pro Gln Gln Leu Leu Ala Val
545 550 555 560

Thr Thr Ile Asp Arg Ala Ile Leu Asp Ser Gly Met Gly Gly
565 570 575

Asn Val Ala Val Phe Val Gly Leu Gly Thr Asp Leu Glu Leu
580 585 590

His Arg Ala Arg Val Ala Leu Lys Glu Arg Val Arg Pro Glu Ala Ser
595 605

Lys Leu Asn Asp Met Met Gln Ile Asn Asp Cys Gly Thr Ser
610 615 620

Thr Ser Thr Ser Tyr Ile Gly Asn Leu Val Ala Thr Arg Val Ser
625 630 635 640

Ser Gln Trp Gly Phe Thr Gly Pro Ser Phe Thr Ile Thr Glu Gly Asn
645 650 655

Asn Ser Val Tyr Arg Ala Glu Leu Gly Leu Leu Glu Thr
660 665 670

Gly Glu Val Asp Gly Val Val Val Ala Gly Val Asp Leu Gly Ser
675 685

Ala Glu Asn Leu Tyr Val Lys Ser Arg Arg Phe Lys Val Ser Thr Ser
690 695 700

Asp Thr Pro Arg Ala Ser Phe Asp Ala Ala Ala Asp Gly Phe Val
705

Gly Glu Gly Gly Ala Phe Val Leu Lys Arg Glu Thr Ser Cys Thr
725 730 735

Asp Asp Arg Ile Ala Met Asp Ala Ile Val Pro Gly Asn
740 745 750

Val Pro Ser Ala Cys Leu Arg Glu Ala Leu Asp Gln Ala Arg Val
760 765

Pro Gly Asp Ile Glu Met Leu Glu Leu Ser Ala Asp Ser Ala Arg His
770 775

Leu Asp Pro Ser Val Leu Pro Glu Leu Thr Ala Glu Glu Glu
790 795 800

Ile Gly Gly Leu Gln Thr Ile Leu Arg Asp Asp Asp Leu Pro Arg
805 810 815

Asn Val Ala Thr Gly Ser Val Ala Thr Val Gly Asp Thr Gly
825 830

Ala Ser Gly Ala Ala Ser Leu Ile Ala Ala Leu Cys Ile Asn
835 840 845

Arg Tyr Leu Pro Ser Asn Gly Asp Asp Trp Asp Glu Pro Ala Pro Glu
850 855 860

Ala Pro Trp Asp Ser Thr Leu Phe Ala Gln Thr Ser Arg Ala Trp
865

Leu Asn Pro Gly Glu Arg Arg Ala Ala Val Ser Gly Val Ser
885 890 895

Glu Thr Arg Ser Cys Ser Val Leu Leu Ser Glu Glu Gly His
900 905 910

Glu Arg Glu Asn Arg Ile Ser Leu Asp Glu Glu Pro Leu
915 920

Ile Val Leu Arg Ala Asp Ser His Glu Glu Ile Leu Arg Leu Asp
930 935 940

Lys Ile Arg Glu Arg Phe Leu Gln Pro Thr Gly Ala Pro Arg Glu
945 950 955 960

Ser Glu Leu Ala Gln Ala Arg Arg Ile Phe Leu Leu Leu Gly
965 975

Glu Thr Leu Ala Gln Asp Ala Ala Ser Ser Gly Ser Lys Pro Leu
980 985 990

Ala Leu Ser Leu Val Ser Thr Pro Ser Lys Leu Gln Arg Glu Val Glu
995 1000 1005

Leu Ala Ala Gly Ile Pro Arg Leu Met Arg Arg Asp
1010 1015 1020

Trp Ser Ser Pro Ala Gly Ser Arg Ala Pro Glu Pro Leu Ala
1025 1030 1035

Ser Arg Val Ala Phe Met Gly Glu Gly Arg Ser Pro
1050

Ile Thr Gln Asp His Arg Ile Trp Pro Glu Leu His
1065

Glu Ile Asn Glu Asn Arg Leu Trp Glu Gly Asp

Arg Val Met Pro Arg Ser Phe Ser Leu Glu Ser

Gln Gln Glu Phe Asp Asn Met Ile Glu Phe Arg Leu

Gly Leu Thr Ser Ile Phe Thr Asn Leu Arg Asp Val

Leu Ile Thr Pro Ala Phe Gly Leu Leu Gly Glu

Ile Met Ile Phe Ala Ser Asn Leu Ile Ser

Asp Leu Thr Asp Arg Glu Ser Asp Trp Asn

Ala Ala Val Glu Phe Ala Leu Arg Trp Gly Ile

Pro Ser Val Pro Glu Phe Trp Ile Val

Arg Thr Gln Asp Glu Ala Ala Pro Asp Ser

Val Leu Thr Ile Asn Asp Asn Thr Ala Leu
1220 1230

Ile Ser Gly Pro Asp Cys Ala Ala Arg Leu
1235 1245

Gly Asn Ile Pro Ala Leu Pro Val Thr Met Gly
1255

His Pro Glu Val Gly Pro Thr Asp Ala Ile
1270

His Asn Leu Glu Phe Pro Val Val Asp Gly Leu Asp Leu Trp
1285 1290

Thr Ile Asn Gln Leu Val Pro Arg Ala Thr Gly Ala
1305

Glu Trp Ala Pro Ser Ser Phe Gly Glu Ala Gly Gln
1315

Leu Glu Gln Ala Asn Phe Pro Gln Ile Val Glu Thr Ile
1330 1335

Gln Asn Tyr Asp Phe Val Glu Val Pro Asn Asn

His Ser Thr Ala Val Thr Thr Leu Gln Arg Asn

His Ala Gly Ala Ile Lys Gln Asn Ala Trp Thr

Thr Val Leu Val Ser Leu Lys Ala Leu Val Pro

Gly Thr Ile Ser Pro His Ser Val Ala Glu
1400 1405 1410

Ala Glu Ala Tyr Ala Ala Leu Gly Glu Pro
1415 1420 1425

Asn Phe Val Arg Lys Ile Gln Leu Asn Gly Arg Phe Asn
1430 1435 1440

Ser Ala Asp Pro Ile Ser Ser Ala Asp Leu Ala Ser Phe Pro
1450 1455

Pro Asp Pro Ala Ile Glu Ala Ala Ile Ser Arg Ile Met
1465

Val Ala Pro Lys Phe Ala Arg Leu Ile Asp Glu
1480

Glu Thr Arg Asp Pro Ile Leu Asn Asn Ala Pro
1495

Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser
1505 1510

Pro Ser Pro Ala Pro Ser Ala Pro Val Gln Ala Ala Pro
1525

Ala Glu Thr Ala Val Ala Ser Ala Asp Leu Arg Ser
1535 1540

Ala Leu Leu Asp Leu Asp Ser Met Leu Ala Leu Ser Ala Ser
1550 1555

Ala Ser Gly Asn Leu Val Glu Thr Ala Pro Ser Ala Ser Val
1565 1570

Ile Pro Pro Asn Ile Ala Asp Leu Gly Arg Ala Phe
1585

Met Thr Gly Val Ser Ala Pro Leu Tyr Gly Ala Met
1600 1605

Ala Gly Ile Ala Ser Ala Asp Leu Val Ile Ala Ala Gly Arg
1615 1620

Gln Ile Leu Ala Ser Phe Gly Ala Gly Gly Leu Pro Met Gln
1630 1635

Val Arg Glu Ser Ile Glu Ile Gln Ala Ala Leu Pro Asn
1645 1650

Gly Pro Ala Val Asn Leu Ile His Ser Pro Phe Asp Ser Asn
1655 1660 1665

Leu Glu Gly Asn Val Asp Leu Phe Leu Glu Gly Val Thr
1670 1675

Phe Glu Ala Ser Ala Phe Met Thr Leu Thr Gln Val Val
1685 1690

Arg Tyr Arg Ala Ala Gly Leu Thr Arg Asn Ala Gly Ser Val
1700 1705

Asn Arg Asn Arg Ile Ile Gly Val Ser Thr Glu Leu
1720

Ala Met Phe Met Arg Pro Ala Pro Glu His Leu Leu Gln
73 1740

Leu Ala Ser Gly Glu Ile Asn Gln Glu Gln Ala Glu Leu Ala
1750 1755

Val Pro Val Ala Asp Asp Ile Ala Val Glu Ala Asp Ser
1765 1770

His Thr Asp Asn Arg Pro Ile His Val Ile Leu Pro Leu
1780 1785

Ile Asn Leu Arg Asp Arg Leu His Arg Glu Gly Pro
1795

Ala Asn Leu Arg Val Arg Val Gly Ala Gly Gly Gly Ile Gly Cys
1805 1810 1815

Pro Gln Ala Ala Leu Ala Thr Phe Asn Met Gly Ala Ser Phe Ile
1820 1825 1830

Val Thr Gly Thr Val Asn Gln Val Ala Lys Gln Ser Gly Thr Cys
1835 1840 1845

Asp Asn Val Arg Lys Gln Leu Ala Lys Ala Thr Tyr Ser Asp Val
1850 1855 1860

Cys Met Ala Pro Ala Ala Asp Met Phe Glu Glu Gly Val Lys Leu
1865 1870 1875

Gln Val Leu Lys Lys Gly Thr Met Phe Pro Ser Arg Ala Asn Lys
1880 1885 1890

Leu Tyr Glu Leu Phe Cys Lys Tyr Asp Ser Phe Glu Ser Met Pro
1895 1900 1905

Pro Ala Glu Leu Ala Arg Val Glu Lys Arg Ile Phe Ser Arg Ala
1910 1915 1920

Leu Glu Glu Val Trp Asp Glu Thr Lys Asn Phe Tyr Ile Asn Arg
1925 1930 1935

Leu His Asn Pro Glu Lys Ile Gln Arg Ala Glu Arg Asp Pro Lys
1940 1945 1950

Leu Lys Met Ser Leu Cys Phe Arg Trp Tyr Leu Ser Leu Ala Ser
1955 1960 1965

Arg Trp Ala Asn Thr Gly Ala Ser Asp Arg Val Met Asp Tyr Gln
1970 1975 1980

Val Trp Cys Gly Pro Ala Ile Gly Ser Phe Asn Asp Phe Ile Lys
1985 1990 1995

Gly Thr Tyr Leu Asp Pro Ala Val Ala Asn Glu Tyr Pro Cys Val
2000 2005 2010

Val Gln Ile Asn Lys Gln Ile Leu Arg Gly Ala Cys Phe Leu Arg
2015 2020 2025

Arg Leu Glu Ile Leu Arg Asn Ala Arg Leu Ser Asp Gly Ala Ala
2030 2035 2040

Ala Leu Val Ala Ser Ile Asp Asp Thr Tyr Val Pro Ala Glu Lys
2045 2050 2055

Leu

<210> SEQ ID NO 5
<211> LENGTH: 4506
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(4506)

<400> SEQUENCE: 5

atg gcg ctc cgt gtc aag acg aac aag aag cca to t gag atg acc 48
Met Ala Leu Arg Val Lys Thr Asn Lys Lys Pro Cys Trp Glu Met Thr
1 5 10 15

aag gag gag ctg acc agc ggc aag acc gag gtg ttc aac tat gag gaa 96
Lys Glu Glu Leu Thr Ser Gly Lys Thr Glu Val Phe Asn Tyr Glu Glu
20 25 30

ctc ctc gag ttc gca gag ggc gac atc gcc aag gtc ttc gga ccc gag 144
Leu Leu Glu Phe Ala Glu Gly Asp Ile Ala Lys Val Phe Gly Pro Glu
35 40 45

ttc gcc gtc atc gac aag tac ccg cgc cgc gtg cgc ctg ccc gcc CC 192
Phe Ala Val Ile Asp Lys Tyr Pro Arg Arg Val Arg Leu Pro Ala Arg


305 atc. atg gcc cc.g gcc gcc gac atg titc gag ggc gtc aag 3969 Ile Met Ala Pro Ala Ala Asp Met Phe Glu Gly Wall citc. gtc ctic aag aag giga acc atg titc c cc cgc gcc aac 4 O14 Lell Wall Lieu Lys Lys Gly Thr Met Phe Pro Arg Ala Asn aag tac gag ct c ttt togc aag tac gac t cc gac t cc atg 4 OS 9 Glu Lieu Phe Cys Lys Asp Ser Asp Ser Met cott gcc gag ct c at C gag aag cgt titc aag cgc 4104 Pro Ala Glu Lieu. Glu Arg Ile Glu Lys Arg Phe Lys Arg gca Cag gag gtC tgg gag gag acc aag gac tac att aac 41.49 Ala Glin Glu Wall Trp Glu Glu Thir Lys Asp Ile Asn ggit aag aac cc.g gag aag at C cag cgc gcc CaC gac c cc 41.94 Gly Lys Asn. Pro Glu Lys Ile Glin Arg Ala His Asp Pro aag aag atgtcg ctic togc ttic cgc tgg tac ggt citt gcc 4239 Lys Lys Met Ser Lieu. Cys Phe Arg Trp Gly Lell Ala agc tgg gcc aac atg ggc gcc cc.g gac cgc atg gac tac 4284 Ser Trp Ala Asn Met Gly Ala Pro Asp Arg Met Asp

Cag tgg ccg gcc att ggc gcc tto gac tto atc. 4329 Glin Trp CyS Gly Pro Ala Ile Gly Ala Phe Asp Phe Ile aag a CC tact citc. gac ccc gct gt C to c aac tac c cc tgt 4374 Lys Thir Tyr Lieu. Asp Pro Ala Wall Ser Asn Pro Cys gtc Cag atc. aac ct g caa at C citc. cgt. ggit tgc tac Ctg 4 419 Wall Glin Ile Asn Lieu. Glin Ile Luell Arg Gly Cys Lell cgc citc. aac goc aac gac cc.g cgc gac citc. gag 4 464 Arg Lell Asn Ala Lieu. Arg Asn Asp Pro Arg Asp Lell Glu a CC gat gct gcc ttt gtc tac gag coc a CC aac gC9 citc. 4506 Thir Asp Ala Ala Phe Wall Tyr Glu Pro Thir Asn Ala Lell SOO

<210> SEQ ID NO 6
<211> LENGTH: 1502
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 6

Met Ala Leu Arg Val Lys Thr Asn Lys Lys Pro Cys Trp Glu Met Thr
1 15

Lys Glu Glu Leu Thr Ser Gly Lys Thr Glu Val Phe Asn Tyr Glu Glu
25

Leu Leu Glu Phe Ala Glu Gly Asp Ile Ala Lys Val Phe Gly Pro Glu
35 40 45

Phe Ala Val Ile Asp Pro Arg Arg Val Arg Leu Pro Ala Arg
50 55 60

Glu Leu Leu Val Thr Arg Val Thr Le yet Asp Ala Glu Val Asn
65 70 75

Asn Arg Val Gly Ala Arg Met Val Thr Glu Tyr Asp Leu Pro Val
85 90 95

Asn Gly Glu Leu Ser Glu Gly Gly Asp Pro Trp Ala Val Leu Val
105 110

Glu Ser Gly Gln Cys Asp Leu Met Leu Ile Ser Tyr Met Gly Ile Asp
115 120 125

Phe Gln Asn Gln Gly Asp Arg Val Arg Leu Leu Asn Thr Thr Leu
130 135 140

Thr Phe Gly Val Ala His Glu Gly Glu Thr Leu Glu Asp Ile
145 150 155 160

Arg Val Thr Gly Phe Ala Arg Leu Asp Gly Gly Ile Ser Met Phe
165 170

Phe Phe Glu Tyr Asp Val Asn Gly Arg Leu Leu Ile Glu Met
180 185 190

Arg Asp Gly Ala Gly Phe Phe Thr Asn Glu Glu Leu Asp Ala Gly
195

Gly Val Val Phe Thr Arg Gly Asp Leu Ala Ala Arg Ala Ile
210 215

Pro Gln Asp Val Ser Pro Ala Val Ala Pro Leu His Lys
225 230 235 240

Thr Leu Asn Glu Glu Met Gln Thr Leu Val Asp Asp Trp
245 250 255

Ala Ser Val Phe Gly Ser Asn Gly Met Pro Glu Ile Asn
260 265 270

Leu Ala Arg Lys Met Leu Met Ile Asp Arg Val Thr Ser Ile Asp
285

His Lys Gly Gly Val Gly Leu Gly Gln Leu Val Gly Glu Ile
290 295 300

Leu Glu Arg Asp His Trp Phe Pro His Phe Val Asp Gln
305 310 315 320

Val Met Ala Gly Ser Leu Val Ser Asp Gly Ser Gln Met Leu
325 330 335

Met Met Ile Trp Leu Gly Leu His Leu Thr Thr Gly Pro Phe Asp
340 345 350

Phe Arg Pro Val Asn Gly His Pro Asn Val Arg Cys Arg Gly Gln
355 360 365

Ile Ser Pro His Lys Gly Lys Leu Val Val Met Glu Ile Glu
370 375

Met Gly Phe Asp Glu Asp Asn Asp Pro Ala Ile Ala Asp Val Asn
385 390 395 400

Ile Ile Asp Val Asp Phe Glu Gly Gln Asp Phe Ser Leu Asp Arg
405 410 415

Ile Ser Asp Tyr Gly Gly Asp Leu Asn Ile Val Val Asp
425 430

Phe Gly Ile Ala Leu Met Gln Arg Ser Thr Asn Asn
435 440 445

Pro Ser Val Gln Pro Val Phe Ala Asn Gly Ala Ala Thr Val Gly
450 455 460

Pro Glu Ala Ser Lys Ala Ser Ser Gly Ala Ser Ala Ser Ala Ser Ala
465 470

Ala Pro Ala Pro Ala Phe Ser Ala Asp Val Leu Ala Pro Lys Pro
485 490 495

Val Ala Leu Pro Glu His Ile Leu Lys Gly Asp Ala Leu Ala Pro Lys
500 505 510

Glu Met Ser Trp His Pro Met Ala Arg Ile Pro Gly Asn Pro Thr Pro
515 525

Ser Phe Ala Pro Ser Ala Tyr Pro Arg Asn Ile Ala Phe Thr Pro
530 535 540

Phe Pro Gly Asn Pro Asn Asp Asn Asp His Thr Pro Gly Met Pro
545 550 555 560

Leu Thr Trp Phe Asn Met Ala Glu Phe Met Ala Gly Val Ser Met
565 570 575

Leu Gly Pro Glu Phe Ala Phe Asp Asp Ser Asn Thr Ser Arg
585 590

Ser Pro Ala Trp Asp Leu Ala Leu Val Thr Arg Ala Val Ser Val Ser
595 605

Asp Leu His Val Asn Tyr Arg Asn Ile Asp Leu Asp Pro Ser
610 615

Gly Thr Met Val Gly Glu Phe Asp Pro Ala Asp Ala Trp Phe Tyr
625 630 635 640

Gly Ala Asn Asp Ala His Met Pro Ser Ile Leu Met Glu
645 650 655

Ile Ala Leu Gln Thr Ser Gly Val Leu Thr Ser Val Leu Lys Ala Pro
660 665 670

Leu Thr Met Glu Asp Asp Ile Leu Phe Arg Asn Leu Asp Ala Asn
675 685

Ala Glu Phe Val Arg Asp Leu Asp Arg Gly Lys Thr Ile Arg
690 695 700

Asn Val Thr Cys Gly Tyr Ser Met Leu Gly Glu Met Gly Val
705

His Arg Phe Thr Phe Leu Tyr Val Asp Asp Val Leu Phe Tyr
725 730 735

Gly Ser Thr Ser Phe Trp Phe Val Pro Glu Val Phe Ala Ala Gln
740 745 750

Ala Gly Leu Asp Asn Arg Lys Ser Glu Pro Trp Phe Ile Glu Asn
760 765

Val Pro Ala Ser Val Ser Ser Phe Asp Val Arg Pro Asn Gly
770 775

Ser Gly Arg Thr Ala Phe Ala Asn Ala Pro Ser Gly Ala Gln Leu
795

Asn Arg Arg Thr Asp Gly Gln Tyr Leu Asp Ala Val Asp Ile Val
805 810 815

Ser Gly Ser Gly Lys Ser Leu Gly Tyr Ala His Gly Ser Thr
825 830

Val Asn Pro Asn Asp Trp Phe Phe Ser His Phe Trp Phe Asp Ser
835 840 845

Val Met Pro Gly Ser Leu Gly Val Glu Ser Met Phe Gln Leu Val Glu
850 855 860

Ala Ile Ala Ala His Glu Asp Leu Ala Gly Lys His Gly Ile Ala Asn
865

Pro Thr Phe Val His Ala Pro Gly Ile Ser Trp Arg Gly
885 890 895

Gln Leu Thr Pro Ser Met Asp Ser Glu Val His Ile Val
900 905 910

Ser Val Asp Ala His Asp Gly Val Val Asp Leu Val Ala Asp Gly Phe
915 920 925

Leu Trp Ala Asp Ser Leu Arg Val Ser Val Ser Asn Ile Arg Val
930 935 940

Arg Ile Ala Ser Gly Glu Ala Pro Ala Ala Ala Ser Ser Ala Ala Ser
945 950 955 960

Val Gly Ser Ser Ala Ser Ser Val Glu Arg Thr Arg Ser Ser Pro Ala
965 975

Val Ala Ser Gly Pro Ala in Thr Ile Asp Leu Lys Gln Leu Lys Thr
985 990

Glu Leu Leu Glu Leu Asp a Pro Leu Tyr Leu Ser Gln Asp Pro Thr
995 1000 1005

Ser Gln Leu Lys Lys His Thr Asp Val Ala Ser Gly Gln Ala
1015 1020

Thr Val Gln Pro Cys Thr Leu Gly Asp Leu Gly Asp Arg Ser
1035

Phe Glu Thr Tyr Gly Val Ala Pro Leu Tyr Thr Gly Ala
1050

Met Gly Ile Ala Ser Ala Asp Leu Val Ala Ala Gly
1060

Ile Leu Gly Ser Phe Gly Ala Gly Leu Pro Met
O7

His Val Arg Ala Ala Leu Glu Ile Gln Ala Leu Pro
1090

Gln Pro Tyr Ala Val Asn Leu Ile His Ser Phe Asp Ser
1105

Asn Glu Asn Asp Leu Phe Leu Gly Val

Thr Val Glu Ala Ser Phe Met Thr Leu Pro Gln Val

Val Arg Ala Ala Leu Ser Arg Asn Asp Gly Ser

Val Ile Arg Asn Arg Ile Gly Val Arg Thr Glu

Leu Glu Met Phe Ile Pro Ala Pro Glu Leu Leu Glu

Ile Ala Ser Gly Ile Thr Gln Glu Ala Glu Leu
1200

Ala Arg Val Pro Val Asp Asp Ile Ala Val Glu Ala Asp
1215

Ser Gly His Thr Asp Asn Arg Pro Ile His Val Ile Leu Pro
1225 1230

Leu Ile Asn Leu Arg Asn Arg Leu His Arg Glu Gly
1240 1245

Pro His Leu Arg Val Val Gly Ala Gly Gly Val Gly

Gln Ala Ala Ala Ala Leu Thr Met Ala Ala Phe

Ile Thr Gly Thr Val Gln Val Ala Gln Ser Gly Thr
1290

Asn Val Arg Lys Leu Ser Gln Ala Ser Asp
1305

Ile Met Ala Pro Ala Asp Met Phe Glu Glu Gly Val
1320

Leu Gln Val Leu Lys Lys Thr Met Phe Pro Arg Ala Asn
1325 1335


Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly
450 455 460

ggc gcc aac tac cac gcc gtc ctc gag gag gcc gag ccc gag cac acg 1440
Gly Ala Asn Tyr His Ala Val Leu Glu Glu Ala Glu Pro Glu His Thr
465 470 475 480

acc gcg tac cgc ctc aac aag cgc ccg cag ccc gtg ctc atg atg gcc 1488
Thr Ala Tyr Arg Leu Asn Lys Arg Pro Gln Pro Val Leu Met Met Ala
485 490 495

gcc acg ccc gcg 1500
Ala Thr Pro Ala
500

<210> SEQ ID NO 8
<211> LENGTH: 500
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 8

Met Ala Ala Arg Leu Gln Glu Gln Gly Gly Glu Met Asp Thr Arg
1 15

Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Gly Thr Thr Val
25

Arg Glu Ser Trp Glu Thr Ile Arg Ala Gly Ile Asp Cys Leu Ser Asp
35 40 45

Leu Pro Glu Asp Arg Val Asp Val Thr Ala Phe Asp Pro Val Lys
50 55 60

Thr Thr Asp Ile Arg Gly Gly Phe Ile Pro Glu
65 70

Asp Phe Asp Ala Arg Glu Phe Gly Leu Asn Met Phe Gln Met Glu
85 90 95

Asp Ser Asp Ala Asn Gln Thr Ile Ser Leu Leu Val Lys Glu Ala
105 110

Leu Gln Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Asn Ile
115 120 125

Gly Cys Val Leu Gly Ile Gly Gly Gly Gln Lys Ser Ser His Glu Phe
130 135 140

Tyr Ser Arg Leu Asn Tyr Val Val Val Glu Lys Val Leu Arg Met
145 150 155 160

Gly Met Pro Glu Glu Asp Val Val Ala Val Glu Lys Ala
165 175

Asn Phe Pro Glu Trp Arg Leu Asp Ser Phe Pro Gly Phe Leu Gly Asn
180 185 190

Val Thr Ala Gly Arg Thr Asn Thr Phe Asn Leu Asp Gly Met Asn
195

Val Val Asp Ala Ala Cys Ala Ser Ser Leu Ile Ala Val Val
210 215

Ala Ile Asp Glu Leu Leu Tyr Gly Asp Asp Met Met Val Thr Gly
225 230 235 240

Ala Thr Thr Asp Asn Ser Ile Gly Met Met Ala Phe Ser
245 250 255

Thr Pro Val Phe Ser Thr Asp Pro Ser Val Arg Ala Asp Glu
260 265 270

Thr Gly Met Leu Ile Gly Glu Gly Ser Ala Met Leu Val Leu
275 280 285

Arg Tyr Ala Asp Ala Val Arg Asp Gly Asp Glu Ile His Ala Val Ile
290 295 300

Arg Gly Ala Ser Ser Ser Asp Gly Ala Ala Gly Ile Tyr Thr
305 310 315 320

Pro Thr Ile Ser Gly Gln Glu Glu Ala Leu Arg Arg Ala Tyr Asn Arg
325 330 335

Ala Val Asp Pro Ala Thr Val Thr Leu Val Glu Gly His Gly Thr
340 345 350

Gly Thr Pro Val Gly Asp Arg Ile Glu Leu Thr Ala Leu Arg Asn Leu
355 360 365

Phe Asp Ala Tyr Gly Glu Gly Asn Thr Glu Lys Val Ala Val Gly
370 375

Ser Ile Ser Ser Ile Gly His Leu Ala Val Ala Gly Leu Ala
385 390 395 400

Gly Met Ile Val Ile Met Ala Leu Lys His Thr Leu Gly
405 410

Thr Ile Asn Val Asp Asn Pro Pro Asn Leu Tyr Asp Asn Thr Ile
425 430

Asn Glu Ser Ser Leu Tyr Ile Asn Thr Met Asn Arg Pro Trp Phe Pro
435 440 445

Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly
450 455 460

Gly Ala Asn Tyr His Ala Val Leu Glu Glu Ala Glu Pro Glu His Thr
465 470 475 480

Thr Ala Arg Leu Asn Arg Pro Gln Pro Val Leu Met Met Ala
485 490 495

Ala Thr Pro Ala
500

<210> SEQ ID NO 9
<211> LENGTH: 1278
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(1278)

<400> SEQUENCE: 9

gat gtc acc aag gag gcc tgg cgc ctc ccc cgc gag ggc gtc agc ttc 48
Asp Val Thr Lys Glu Ala Trp Arg Leu Pro Arg Glu Gly Val Ser Phe
1 10 15

gcc aag ggc atc gcc acc aac ggc gct gtc gcc gcg ctc ttc tcc 96
Arg Ala Lys Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser
25 30

ggc cag ggc gcg cag tac acg cac atg ttt agc gag gtg gcc atg aac 144
Gly Gln Gly Ala Gln Tyr Thr His Met Phe Ser Glu Val Ala Met Asn
35 40 45

tgg ccc cag ttc cgc cag agc att gcc gcc atg gac gcc gcc cag tcc 192
Trp Pro Gln Phe Arg Gln Ser Ile Ala Ala Met Asp Ala Ala Gln Ser
50 55 60

aag gtc gct gga agc gac aag gac ttt gag cgc gtc tcc cag gtc ctc 240
Lys Val Ala Gly Ser Asp Lys Asp Phe Glu Arg Val Ser Gln Val Leu
65 75 80

tac ccg cgc aag ccg gag cgt gag ccc gag cag gac cac aag aag 288
Tyr Pro Arg Lys Pro Glu Arg Glu Pro Glu Gln Asp His Lys Lys
85 90 95

atc tcc ctc acc gcc tcg cag ccc tcg acc ctg gcc tgc gct ctc 336
Ile Ser Leu Thr Ala Ser Gln Pro Ser Thr Leu Ala Cys Ala Leu
100 105 110


<210> SEQ ID NO 10
<211> LENGTH: 426
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 10

Asp Val Thr Lys Glu Ala Trp Arg Leu Pro Arg Glu Gly Val Ser Phe
1 5 15

Arg Ala Lys Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser
20 25

Gly Gln Gly Ala Gln Tyr Thr His Met Phe Ser Glu Val Ala Met Asn
35 40 45

Trp Pro Gln Phe Gln Ser Ile Ala Ala Met Asp Ala Ala Gln Ser
50 55 60

Lys Val Ala Gly Ser Asp Asp Phe Glu Arg Val Ser Gln Val Leu
65 70

Tyr Pro Arg Pro Glu Arg Glu Pro Glu Gln Asp His Lys Lys
85 90 95

Ile Ser Leu Thr Ala Ser Gln Pro Ser Thr Leu Ala Cys Ala Leu
100 105 110

Gly Ala Phe Glu Ile Phe Glu Ala Gly Phe Thr Pro Asp Phe Ala
115 120 125

Ala Gly His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly
130 135 140

Val Asp Arg Asp Glu Leu Phe Glu Leu Val Cys Arg Arg Ala Arg Ile
145 150 155 160

Met Gly Gly Asp Ala Pro Ala Thr Pro Lys Gly Met Ala Ala
165 170 175

Val Ile Gly Pro Asn Ala Glu Asn Ile Val Gln Ala Ala Asn Val
180 185 190

Trp Leu Gly Asn Ser Asn Ser Pro Ser Gln Thr Val Ile Thr Gly Ser
195

Val Glu Gly Ile Gln Ala Glu Ser Ala Arg Leu Gln Glu Gly Phe
210 215

Arg Val Val Pro Leu Ala Glu Ser Ala Phe His Ser Pro Gln Met
225 230 235 240

Glu Asn Ala Ser Ser Ala Phe Asp Val Ile Ser Val Ser Phe
245 250 255

Arg Thr Pro Lys Ala Glu Thr Leu Phe Ser Asn Val Ser Gly Glu
260 265 270

Thr Pro Thr Asp Ala Arg Glu Met Leu Thr Gln His Met Thr Ser
275 280 285

Ser Val Phe Leu Thr Gln Val Arg Asn Met His Gln Ala Gly Ala
290 295 300

Arg Ile Phe Val Glu Phe Gly Pro Gln Val Leu Ser Leu Val
305 310 315

Ser Glu Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn
325 330 335

Pro Ala Ser Gly Thr Asp Ser Asp Ile Gln Leu Arg Asp Ala Ala Val
340 345 350

Gln Leu Val Val Ala Gly Val Asn Leu Gln Gly Phe Asp Trp Asp
355 360 365

Ala Pro Asp Ala Thr Arg Met Gln Ala Ile Lys Arg Thr Thr
370 375

Leu Arg Leu Ser Ala Ala Thr Tyr Val Ser Asp Lys Thr Lys Lys Val
385 390 395 400

Arg Asp Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Leu Lys Gly
405 415

Ala Ala Pro Leu Ile Ala Pro Glu Pro
425

<210> SEQ ID NO 11
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (4)..(4)
<223> OTHER INFORMATION: X = any amino acid

<400> SEQUENCE: 11

Gly His Ser Xaa Gly
1 5
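
As an illustration of how the five-residue motif Gly-His-Ser-Xaa-Gly of SEQ ID NO 11 might be located in a longer protein sequence, the following Python sketch converts three-letter residue codes to one-letter codes and scans for the pattern, treating Xaa as any residue. The function names and the short test fragment are hypothetical and are not part of the sequence listing.

```python
import re

# Three-letter to one-letter amino acid codes; Xaa is the "any residue" wildcard.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def to_one_letter(three_letter_seq):
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_seq.split())


def find_motif(protein, motif="Gly His Ser Xaa Gly"):
    """Return (1-based position, matched text) for every motif occurrence.

    `protein` is a one-letter sequence. Xaa in the motif matches any residue.
    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = to_one_letter(motif).replace("X", ".")
    regex = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


# Hypothetical test fragment, used only to exercise the functions.
fragment = to_one_letter("Ala Gly His Ser Leu Gly Glu Phe Ala")
print(find_motif(fragment))
```

Positions are reported 1-based to match the numbering convention of the listing.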

<210> SEQ ID NO 12
<211> LENGTH: 258
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(258)

<400> SEQUENCE: 12

gct gtc tcg aac gag ctt ctt gag aag gcc gag act gtc gtc atg gag 48
Ala Val Ser Asn Glu Leu Leu Glu Lys Ala Glu Thr Val Val Met Glu
1 10 15

gtc ctc gcc gcc aag acc ggc tac gag acc gac atg atc gag gct gac 96
Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile Glu Ala Asp
20 25 30

atg gag ctc gag acc gag ctc ggc att gac tcc atc aag cgt gtc gag 144
Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Lys Arg Val Glu
35 40 45

atc ctc tcc gag gtc cag gcc atg ctc aat gtc gag gcc aag gat gtc 192
Ile Leu Ser Glu Val Gln Ala Met Leu Asn Val Glu Ala Lys Asp Val
50 55 60

gat gcc ctc agc cgc act cgc act gtt ggt gag gtt gtc aac gcc atg 240
Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asn Ala Met
65 70 75 80

aag gcc gag atc gct ggc 258
Lys Ala Glu Ile Ala Gly
85
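
As a consistency check on a CDS entry such as SEQ ID NO 12, the nucleotide sequence can be translated with the standard genetic code and its length compared with the stated values (258 nucleotides, 86 codons). The following Python sketch is one way to do this. The nucleotide string is transcribed from the listing as read here, and the function and variable names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# CDS of SEQ ID NO 12 as transcribed from the listing (spaces separate codons).
CDS_12 = (
    "gct gtc tcg aac gag ctt ctt gag aag gcc gag act gtc gtc atg gag "
    "gtc ctc gcc gcc aag acc ggc tac gag acc gac atg atc gag gct gac "
    "atg gag ctc gag acc gag ctc ggc att gac tcc atc aag cgt gtc gag "
    "atc ctc tcc gag gtc cag gcc atg ctc aat gtc gag gcc aag gat gtc "
    "gat gcc ctc agc cgc act cgc act gtt ggt gag gtt gtc aac gcc atg "
    "aag gcc gag atc gct ggc"
).replace(" ", "")


def translate(cds):
    """Translate a lowercase DNA coding sequence into one-letter amino acids."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


protein_12 = translate(CDS_12)
print(len(CDS_12), len(protein_12))
print(protein_12)
```

The same length check applies to the other CDS entries in the listing, for example 4506 nucleotides for SEQ ID NO 5 against 1502 residues for SEQ ID NO 6.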

<210> SEQ ID NO 13
<211> LENGTH: 86
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 13

Ala Val Ser Asn Glu Leu Leu Glu Lys Ala Glu Thr Val Val Met Glu
1 5 10 15

Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile Glu Ala Asp
25 30

Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Lys Arg Val Glu
35 40 45

Ile Leu Ser Glu Val Gln Ala Met Leu Asn Val Glu Ala Asp Val


645 650 655

cag cac cgc ccg gtc ccg cag gac aag ccc ttc tac att acc ctc cgc 2016
Gln His Arg Pro Val Pro Gln Asp Lys Pro Phe Tyr Ile Thr Leu Arg
660 665 670

tcc aac cag tcg ggc ggt cac tcc cag cac aag cac gcc ctt cag ttc
Ser Asn Gln Ser Gly Gly His Ser Gln His Lys His Ala Leu Gln Phe
675 680 685

cac aac gag cag ggc gat ctc ttc att gat gtc cag gct tcg gtc atc 2112
His Asn Glu Gln Gly Asp Leu Phe Ile Asp Val Gln Ala Ser Val Ile
690 695 700

gcc acg gac agc ctt gcc ttc 2133
Ala Thr Asp Ser Leu Ala Phe
705 710

<210> SEQ ID NO 18
<211> LENGTH: 711
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 18

Phe Gly Ala Leu Gly Gly Phe Ile Ser Gln Gln Ala Glu Arg Phe Glu
1 5 15

Pro Ala Glu Ile Leu Gly Phe Thr Leu Met Cys Ala Phe Ala
20 25

Ala Ser Leu Thr Ala Val Ala Gly Gly Arg Pro Ala Phe Ile Gly
35 40 45

Val Ala Arg Leu Asp Gly Arg Leu Gly Phe Thr Ser Gln Gly Thr Ser
50 55 60

Asp Ala Leu Ala Gln Arg Gly Ala Ile Phe Gly Leu Lys
65 70

Thr Ile Gly Leu Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val
85 90 95

Asp Ile Ala Gln Gly Met His Pro Glu Asp Ala Ala Val Ala Ile Val
105 110

Arg Glu Met Ala Ala Asp Ile Arg Ile Arg Glu Val Gly Ile
115 120 125

Ala Asn Gln Gln Arg Thr Ile Arg Ala Ala Lys Leu Glu Thr
130 135 140

Asn Pro Gln Arg Gln Ile Ala Asp Asp Val Leu Leu Val Ser
145 150 155

Gly Ala Arg Gly Ile Thr Pro Leu Ile Arg Glu Ile Thr Arg
165 170

Ile Ala Gly Gly Lys Ile Leu Leu Gly Arg Ser Val Ser
180 185 190

Ser Glu Pro Ala Trp Ala Gly Ile Thr Asp Glu Lys Ala Val
195

Ala Ala Thr Gln Glu Leu Arg Ala Phe Ser Ala Gly Glu
210 215 220

Pro Pro Thr Pro Arg Ala Val Thr Leu Val Gly Ser Val Leu
225 230 235 240

Gly Ala Arg Glu Val Arg Ser Ser Ile Ala Ala Ile Glu Ala Leu Gly
245 250 255

Gly Ala Ile Tyr Ser Ser Asp Val Asn Ser Ala Ala Asp Val
260 265 270

Ala Ala Val Arg Asp Ala Glu Ser Gln Leu Gly Ala Arg Val Ser
275 280 285

Gly Ile Val His Ala Ser Gly Val Leu Arg Asp Arg Leu Ile Glu Lys
290 295 300

Lys Leu Pro Asp Glu Phe Asp Ala Val Phe Gly Thr Val Thr Gly
305 310 315

Leu Glu Asn Leu Leu Ala Ala Val Asp Arg Ala Asn Leu His Met
325 330 335

Val Leu Phe Ser Ser Leu Ala Gly Phe His Gly Asn Val Gly Gln Ser
340 345 350

Asp Ala Met Ala Asn Glu Ala Leu Asn Met Gly Leu Glu Leu
355 360 365

Ala Lys Asp Val Ser Val Lys Ser Ile Phe Gly Pro Trp Asp Gly
370 375

Gly Met Val Thr Pro Gln Leu Gln Phe Gln Glu Met Gly Val
385 390 395 400

Gln Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg Ile Val
405 415

Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp Arg Thr Pro
425 430

Ser Lys Val Gly Ser Asp Thr Ile Thr Leu His Arg Ile Ser
435 440 445

Ala Lys Ser Asn Pro Phe Leu Glu Asp His Val Ile Gln Gly Arg Arg
450 455 460

Val Leu Pro Met Thr Leu Ala Ile Gly Ser Leu Ala Glu Thr Leu
465 470

Gly Leu Phe Pro Gly Ser Leu Trp Ala Ile Asp Asp Ala Gln Leu
485 490 495

Phe Gly Val Thr Val Asp Gly Asp Val Asn Glu Val Thr Leu
500 505

Thr Ser Thr Ala Pro Ser Gly Arg Val Asn Val Gln Ala Thr Leu
515 525

Thr Phe Ser Ser Gly Lys Leu Val Pro Ala Tyr Arg Ala Val Ile
530 535 540

Val Leu Ser Asn Gln Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro
545 550 555 560

Pro Ser Leu Asp Ala Asp Pro Ala Leu Gln Gly Ser Val Asp Gly
565 570 575

Thr Leu Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Leu
585 590

Ser Thr Ser Gln Leu Val Ala Ser Ala Val Pro Gly
595 605

Ser Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp
610 615 620

Pro Phe Val Asn Asp Leu Ala Phe Gln Ala Met Leu Val Trp Val Arg
625 630 635 640

Arg Thr Leu Gly Gln Ala Ala Leu Pro Asn Ser Ile Gln Arg Ile Val
645 650 655

Gln His Arg Pro Val Pro Gln Asp Lys Pro Phe Ile Thr Leu Arg
660 665 670

Ser Asn Gln Ser Gly Gly His Ser Gln His His Ala Leu Gln Phe
675 685

His Asn Glu Gln Gly Asp Leu Phe Ile Asp Val Gln Ala Ser Val Ile
690 695 700


Ser Gln Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu Lys
260 265 270

cgt ctc gat gat gcc atc cgc gac ggc gac cac att tac ggc acc ctt 864
Arg Leu Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly Thr Leu
275 280 285

ctc ggc gcc aat gtc agc aac tcc ggc aca ggt ctg ccc ctc aag ccc 912
Leu Gly Ala Asn Val Ser Asn Ser Gly Thr Gly Leu Pro Leu Lys Pro
290 295 300

ctt ctc ccc agc gag aaa aag tgc ctc atg gac acc tac acg cgc att 960
Leu Leu Pro Ser Glu Lys Lys Cys Leu Met Asp Thr Tyr Thr Arg Ile
305 310 315 320

aac gtg cac ccg cac aag att cag tac gtc gag tgc cac gcc acc ggc 1008
Asn Val His Pro His Lys Ile Gln Tyr Val Glu Cys His Ala Thr Gly
325 330 335

acg ccc cag ggt gat cgt gtg gaa atc gac gcc gtc aag gcc tgc ttt
Thr Pro Gln Gly Asp Arg Val Glu Ile Asp Ala Val Lys Ala Cys Phe
340 345 350

gaa ggc aag gtc ccc cgt ttc ggt acc aca aag ggc aac ttt gga cac 1104
Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Lys Gly Asn Phe Gly His
355 360 365

acc cts gca gcc ggc ttt gcc ggt atg tgc aag gtc ctc ctc tcc 1152
Thr Xaa Ala Ala Gly Phe Ala Gly Met Cys Lys Val Leu Leu Ser
370 375 380

atg aag cat ggc atc atc ccg ccc acc ccg ggt atc gat gac gag acc 1200
Met Lys His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr
385 390 395 400

aag atg gac cct ctc gtc gtc tcc ggt gag gcc atc cca tgg cca gag 1248
Lys Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu
405 410 415

acc aac ggc gag ccc aag cgc gcc ggt ctc tcg gcc ttt ggc ttt ggt 1296
Thr Asn Gly Glu Pro Lys Arg Ala Gly Leu Ser Ala Phe Gly Phe Gly
420 425 430

ggc acc aac gcc cat gcc gtc ttt gag gag cat gac ccc tcc aac gcc 1344
Gly Thr Asn Ala His Ala Val Phe Glu Glu His Asp Pro Ser Asn Ala
435 440 445

gcc tgc 1350
Ala Cys
450

<210> SEQ ID NO 20
<211> LENGTH: 450
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (370)..(370)
<223> OTHER INFORMATION: The Xaa at location 370 stands for Leu.
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (371)..(371)
<223> OTHER INFORMATION: The Xaa at location 371 stands for Ala, or Val.

<4 OOs, SEQUENCE: Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys 1. 5 15

Arg Ile Ala Val Val Gly Met Ala Wall Glin Tyr Ala Gly Cys Lys Thr 25 3O

Lys Asp Glu Phe Trp Glu Wall Luell Met Asin Gly Lys Wall Glu Ser Lys 35 4 O 45

Val Ile Ser Asp Llys Arg Lell Gly Ser Asn Tyr Arg Ala Glu His Tyr SO 55 6 O US 7,842,796 B2 175 176


Lys Ala Glu Arg Ser Lys Ala Asp Thr Phe Cys Asn Glu Thr Tyr
65 70

Gly Thr Leu Asp Glu Asn Glu Ile Asp Asn Glu His Glu Leu Leu Leu
85 90 95

Asn Leu Ala Lys Gln Ala Leu Ala Glu Thr Ser Val Asp Ser Thr
105 110

Arg Gly Ile Val Ser Gly Cys Leu Ser Phe Pro Met Asp Asn Leu
115 120 125

Gln Gly Glu Leu Leu Asn Val Gln Asn His Val Glu Leu
130 135 140

Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Gln
145 150 155 160

Ser Asn Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala
165 175

Ser Phe Val Ala Glu Glu Leu Asn Leu Gly Ala Leu His Tyr Ser Val
180 185 190

Asp Ala Ala Ala Thr Ala Leu Val Leu Arg Leu Ala Gln Asp
195

His Leu Val Ser Gly Ala Ala Asp Val Met Leu Cys Gly Ala Thr
210 215 220

Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Gln Ala
225 230 235 240

Met Pro Val Gly Thr Gly Gln Asn Val Ser Met Pro Leu His Lys Asp
245 250 255

Ser Gln Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu Lys
260 265 270

Arg Leu Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly Thr Leu
275 280 285

Leu Gly Ala Asn Val Ser Asn Ser Gly Thr Gly Leu Pro Leu Lys Pro
290 295 300

Leu Leu Pro Ser Glu Lys Lys Cys Leu Met Asp Thr Tyr Thr Arg Ile
305 310 315 320

Asn Val His Pro His Lys Ile Gln Tyr Val Glu Cys His Ala Thr Gly
325 330 335

Thr Pro Gln Gly Asp Arg Val Glu Ile Asp Ala Val Lys Ala Cys Phe
340 345 350

Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Lys Gly Asn Phe Gly His
355 360 365

Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Cys Lys Val Leu Leu Ser
370 375 380

Met Lys His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr
385 390 395 400

Lys Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu
405 410 415

Thr Asn Gly Glu Pro Lys Arg Ala Gly Leu Ser Ala Phe Gly Phe Gly
420 425 430

Gly Thr Asn Ala His Ala Val Phe Glu Glu His Asp Pro Ser Asn Ala
435 440 445

Ala Cys
450

<210> SEQ ID NO 21
<211> LENGTH: 1323


atc gtc cct ggc aac gtc cct agc gcc tgc ttg cgc gag gcc ctc gac 912
Ile Val Pro Gly Asn Val Pro Ser Ala Cys Leu Arg Glu Ala Leu Asp
290 295 300

cag gcg cgc gtc aag ccg ggc gat atc gag atg ctc gag ctc agc gcc 960
Gln Ala Arg Val Lys Pro Gly Asp Ile Glu Met Leu Glu Leu Ser Ala
305 310 315 320

gac tcc gcc cgc cac ctc aag gac ccg tcc gtc ctg ccc aag gag ctc 1008
Asp Ser Ala Arg His Leu Lys Asp Pro Ser Val Leu Pro Lys Glu Leu
325 330 335

act gcc gag gag gaa atc ggc ggc ctt cag acg atc ctt cgt gac gat
Thr Ala Glu Glu Glu Ile Gly Gly Leu Gln Thr Ile Leu Arg Asp Asp
340 345 350

gac aag ctc cgc aac gtc gca acg ggc gtc aag gcc acc gtc 1104
Asp Lys Leu Pro Arg Asn Val Ala Thr Gly Ser Val Lys Ala Thr Val
355 360 365

ggt gac acc ggt tat gcc tct ggt gct gcc agc ctc atc aag gct gcg 1152
Gly Asp Thr Gly Tyr Ala Ser Gly Ala Ala Ser Leu Ile Lys Ala Ala
370 375 380

ctt atc tac aac cgc tac ctg ccc agc aac ggc gac gac tgg gat 1200
Leu Cys Ile Tyr Asn Arg Tyr Leu Pro Ser Asn Gly Asp Asp Trp Asp
385 390 395 400

gaa ccc gcc cct gag gcg ccc tgg gac agc acc ctc ttt gcg tgc cag 1248
Glu Pro Ala Pro Glu Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Gln
405 410 415

acc tcg cgc gct tgg ctc aag aac cct ggc gag cgt cgc tat gcg gcc 1296
Thr Ser Arg Ala Trp Leu Lys Asn Pro Gly Glu Arg Arg Tyr Ala Ala
420 425 430

gtc tcg ggc gtc tcc gag acg cgc tcg 1323
Val Ser Gly Val Ser Glu Thr Arg Ser
435 440

<210> SEQ ID NO 22
<211> LENGTH: 441
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 22

Ser Ala Arg Cys Gly Gly Glu Ser Asn Met Arg Ile Ala Ile Thr Gly
1 5 15

Met Asp Ala Thr Phe Gly Ala Leu Lys Gly Leu Asp Ala Phe Glu Arg
25 30

Ala Ile Tyr Thr Gly Ala His Gly Ala Ile Pro Leu Pro Glu Arg
35 40 45

Trp Arg Phe Leu Gly Asp Asp Phe Leu Asp Leu Gly Val
50 55 60

Lys Ala Thr Pro His Gly Ile Glu Asp Val Glu Val Asp Phe
65 70

Gln Arg Leu Arg Thr Pro Met Thr Pro Glu Asp Met Leu Leu Pro Gln
85 90 95

Gln Leu Leu Ala Val Thr Thr Ile Asp Arg Ala Ile Leu Asp Ser Gly
105 110

Met Lys Gly Gly Asn Val Ala Val Phe Val Gly Leu Gly Thr Asp
115 120 125

Leu Glu Leu His Arg Ala Arg Val Ala Leu Glu Arg Val
130 135 140

Arg Pro Glu Ala Ser Lys Leu Asn Asp Met Met Gln Ile Asn
145 150 155 160

Asp Gly Thr Ser Thr Ser Thr Ser Tyr Ile Gly Asn Leu Val
165 170 175

Ala Thr Arg Val Ser Ser Gln Trp Gly Phe Thr Gly Pro Ser Phe Thr
180 185 190

Ile Thr Glu Gly Asn Asn Ser Val Arg Ala Glu Leu Gly
195

Leu Leu Glu Thr Gly Glu Val Asp Gly Val Val Val Ala Gly Val
210 215 220

Asp Leu Gly Ser Ala Glu Asn Leu Val Ser Arg Arg Phe
225 230 235 240

Val Ser Thr Ser Asp Thr Pro Arg Ala Ser Phe Asp Ala Ala Ala
245 250 255

Asp Gly Phe Val Gly Glu Gly Cys Gly Ala Phe Val Leu Arg
260 265 270

Glu Thr Ser Thr Asp Asp Arg Ile Ala Cys Met Asp Ala
275 285

Ile Val Pro Gly Asn Val Pro Ser Ala Cys Leu Arg Glu Ala Leu Asp
290 295 300

Gln Ala Arg Val Lys Pro Gly Asp Ile Glu Met Leu Glu Leu Ser Ala
305 310 315 320

Asp Ser Ala Arg His Leu Lys Asp Pro Ser Val Leu Pro Lys Glu Leu
325 330 335

Thr Ala Glu Glu Glu Ile Gly Gly Leu Gln Thr Ile Leu Arg Asp Asp
340 345 350

Asp Lys Leu Pro Arg Asn Val Ala Thr Gly Ser Val Lys Ala Thr Val
355 360 365

Gly Asp Thr Gly Tyr Ala Ser Gly Ala Ala Ser Leu Ile Lys Ala Ala
370 375 380

Leu Cys Ile Tyr Asn Arg Tyr Leu Pro Ser Asn Gly Asp Asp Trp Asp
385 390 395 400

Glu Pro Ala Pro Glu Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Gln
405 410 415

Thr Ser Arg Ala Trp Leu Lys Asn Pro Gly Glu Arg Arg Tyr Ala Ala
420 425 430

Val Ser Gly Val Ser Glu Thr Arg Ser
435 440

<210> SEQ ID NO 23
<211> LENGTH: 1500
<212> TYPE: DNA
<213> ORGANISM: Schizochytrium sp.
<220> FEATURE:
<221> NAME/KEY: CDS
<222> LOCATION: (1)..(1500)

<400> SEQUENCE: 23

tgc tat tcc gtg ctc ctc tcc gaa gcc gag ggc cac tac gag cgc gag 48
Cys Tyr Ser Val Leu Leu Ser Glu Ala Glu Gly His Tyr Glu Arg Glu
1 10 15

aac cgc atc tcg ctc gac gag gag gcg ccc aag ctc att gtg ctt cgc 96
Asn Arg Ile Ser Leu Asp Glu Glu Ala Pro Lys Leu Ile Val Leu Arg
20 25 30

gcc gac tcc cac gag gag atc ctt ggt cgc ctc gac aag atc cgc gag 144
Ala Asp Ser His Glu Glu Ile Leu Gly Arg Leu Asp Lys Ile Arg Glu
35 40 45

cgc ttc ttg cag ccc acg ggc gcc gcc ccg cgc gag tcc gag ctc aag 192
Arg Phe Leu Gln Pro Thr Gly Ala Ala Pro Arg Glu Ser Glu Leu Lys


Gly Pro Thr Asp Ile Ala Ile His Ala Asn Leu Glu Phe
370 375 380

ccc gtt gtc gac ggc ctt gac ctc tgg acc aca atc aac cag aag cgc 1200
Pro Val Val Asp Gly Leu Asp Leu Trp Thr Thr Ile Asn Gln Lys Arg
385 390 395 400

ctc gtg cca cgc gcc acg ggc gcc aag gac gaa tgg gcc cct tct tcc 1248
Leu Val Pro Arg Ala Thr Gly Ala Lys Asp Glu Trp Ala Pro Ser Ser
405 410 415

ttt ggc gag tac gcc ggc cag ctc tac gag aag cag gct aac ttc ccc 1296
Phe Gly Glu Tyr Ala Gly Gln Leu Tyr Glu Lys Gln Ala Asn Phe Pro
420 425 430

caa atc gtc gag acc att tac aag caa aac tac gac gtc ttt gtc gag 1344
Gln Ile Val Glu Thr Ile Tyr Lys Gln Asn Tyr Asp Val Phe Val Glu
435 440 445

gtt ggg ccc aac aac cac cgt agc acc gca gtg cgc acc acg ctt ggt 1392
Val Gly Pro Asn Asn His Arg Ser Thr Ala Val Arg Thr Thr Leu Gly
450 455 460

ccc cag cgc aac cac ctt gct ggc gcc atc gac aag cag aac gag gat 1440
Pro Gln Arg Asn His Leu Ala Gly Ala Ile Asp Lys Gln Asn Glu Asp
465 470 475 480

gct tgg acg acc atc gtc aag ctt gtg gct tcg ctc aag gcc cac ctt 1488
Ala Trp Thr Thr Ile Val Lys Leu Val Ala Ser Leu Lys Ala His Leu
485 490 495

gtt cct ggc gtc 1500
Val Pro Gly Val
500

<210> SEQ ID NO 24
<211> LENGTH: 500
<212> TYPE: PRT
<213> ORGANISM: Schizochytrium sp.

<400> SEQUENCE: 24

Cys Tyr Ser Val Leu Leu Ser Glu Ala Glu Gly His Tyr Glu Arg Glu
1 5 10 15

Asn Arg Ile Ser Leu Asp Glu Glu Ala Pro Lys Leu Ile Val Leu Arg
20 25 30

Ala Asp Ser His Glu Glu Ile Leu Gly Arg Leu Asp Lys Ile Arg Glu
35 40 45

Arg Phe Leu Gln Pro Thr Gly Ala Ala Pro Arg Glu Ser Glu Leu Lys
50 55 60

Ala Gln Ala Arg Arg Ile Phe Leu Leu Leu Gly Glu Thr Leu Ala
65 70 80

Gln Asp Ala Ala Ser Ser Gly Ser Lys Pro Leu Ala Leu Ser Leu
85 90 95

Val Ser Thr Pro Ser Leu Gln Arg Glu Val Glu Leu Ala Ala
105 110

Gly Ile Pro Arg Leu Met Arg Arg Asp Trp Ser Ser Pro Ala
115 120 125

Gly Ser Arg Ala Pro Glu Pro Leu Ala Ser Asp Arg Val Ala Phe
130 135 140

Met Gly Glu Gly Arg Ser Pro Tyr Gly Ile Thr Gln Asp Ile
145 150 155 160

His Arg Ile Trp Pro Glu Leu His Glu Val Ile Asn Glu Thr Asn
165 170 175

Arg Leu Trp Ala Glu Gly Asp Arg Trp Val Met Pro Arg Ala Ser Phe
180 185 190