<<

US 2002O194641A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2002/0194641 A1 Metz et al. (43) Pub. Date: Dec. 19, 2002

(54) PUFA POLYKETIDE SYNTHASE SYSTEMS Publication Classification AND USES THEREOF (76) Inventors: James G. Metz, Longmont, CO (US); (51) Int. Cl." ...... A01H 5700; CO7H 21/04; James H. Flatt, Longmont, CO (US); C12P 19/62; C12N 9/10; C12P 21/02; Jerry M. Kuner, Longmont, CO (US); C12N 5/04; C12N 1/21 William R. Barclay, Boulder, CO (US) (52) U.S. Cl...... 800/281; 435/193; 435/252.3; Correspondence Address: 435/69.1; 435/320.1; 536/23.2; Angela K. Dallas 435/76 SHERIDAN ROSS PC. Suite 1200 1560 Broadway (57) ABSTRACT Denver, CO 80202-5141 (US) (21) Appl. No.: 10/124,800 The invention generally relates to polyunsaturated fatty acid (22) Filed: Apr. 16, 2002 (PUFA) polyketide synthase (PKS) systems isolated from or derived from non-bacterial organisms, to homologues Related U.S. Application Data thereof, to isolated nucleic acid molecules and recombinant nucleic acid molecules encoding biologically active (63) Continuation-in-part of application No. 09/231,899, domains of such a PUFA PKS system, to genetically modi filed on Jan. 14, 1999. fied organisms comprising PUFA PKS systems, to methods (60) Provisional application No. 60/284.066, filed on Apr. of making and using Such Systems for the production of 16, 2001. Provisional application No. 60/298,796, bioactive molecules of interest, and to novel methods for filed on Jun. 15, 2001. Provisional application No. identifying new bacterial and non-bacterial microorganisms 60/323,269, filed on Sep. 18, 2001. having such a PUFA PKS system. Patent Application Publication Dec. 19, 2002 Sheet 1 of 3 US 2002/0194641 A1 É N O er

i

t g s 2 2 Patent Application Publication Dec. 19, 2002 Sheet 2 of 3 US 2002/0194641 A1

HTIOSX! Z*OIH J.JO9JJO/ :Su?eUIOp/SJIOSYIAHJOUOS?.IeduI0O p113upº?tat/SSAunt/1/(?OOz??OS

Patent Application Publication Dec. 19, 2002 Sheet 3 of 3 US 2002/0194641 A1

US 2002/0194641 A1 Dec. 19, 2002

PUFA POLYKETIDE SYNTHASE SYSTEMS AND production of the end product. The domains are found in USES THEREOF very large proteins and the product of each reaction is passed on to another in the PKS protein. Additionally, in all CROSS-REFERENCE TO RELATED of the PKS systems described above, if a carbon-carbon APPLICATIONS double bond is incorporated into the end product, it is always 0001. This application claims the benefit of priority under in the trans configuration. 35 U.S.C. S 119(e) to: U.S. Provisional Application Serial 0006. In the Type I and Type II PKS systems described No. 60/284,066, filed Apr. 16, 2001, entitled “A Polyketide above, the same Set of reactions is carried out in each cycle Synthase System and Uses Thereof; U.S. Provisional until the end product is obtained. There is no allowance for Application Serial No. 60/298,796, filed Jun. 15, 2001, the introduction of unique reactions during the biosynthetic entitled “A Polyketide Synthase System and Uses Thereof; procedure. The modular PKS Systems require huge proteins and U.S. Provisional Application Serial No. 60/323,269, that do not utilize the economy of iterative reactions (i.e., a filed Sep. 18, 2001, entitled “Thraustochytrium PUFA PKS distinct domain is required for each reaction). Additionally, System and Uses Thereof'. This application is also a con as Stated above, carbon-carbon double bonds are intro tinuation-in-part of copending U.S. application Ser. No.09/ duced in the trans configuration in all of the previously 231,899, filed Jan. 14, 1999, entitled “Schizochytrium PKS described PKS systems. Genes”. Each of the above-identified patent applications is incorporated herein by reference in its entirety. 0007 Polyunsaturated fatty acids (PUFAs) are critical components of membrane lipids in most eukaryotes (Lau 0002 This application does not claim the benefit of ritzen et al., Prog. Lipid Res. 401 (2001); McConn et al., priority from U.S. application Ser. No. 09/090,793, filed Jun. Plant J. 15, 25 521 (1998)) and are precursors of certain 4, 1998, now U.S. Pat. No. 6,140,486, although U.S. appli hormones and signaling molecules (Heller et al., Drugs 55, cation Ser. No. 09/090,793 is incorporated herein by refer 487 (1998); Creelman et al., Annu. Rev. Plant Physiol. Plant ence in its entirety. Mol. Biol. 48, 355 (1997)). Known pathways of PUFA synthesis involve the processing of Saturated 16:0 or 18:0 FIELD OF THE INVENTION fatty acids (the abbreviation X:Y indicates an acyl group 0003. This invention relates to polyunsaturated fatty acid containing X carbon atoms and Y cis double bonds; double (PUFA) polyketide synthase (PKS) systems from microor bond positions of PUFAs are indicated relative to the methyl ganisms, including eukaryotic organisms, Such as ThrauS carbon of the fatty acid chain (c)3 or (O6) with systematic tochytrid microorganisms. More particularly, this invention methylene interruption of the double bonds) derived from relates to nucleic acids encoding non-bacterial PUFA PKS fatty acid Synthase (FAS) by elongation and aerobic desatu systems, to non-bacterial PUFAPKS systems, to genetically ration reactions (Sprecher, Curr. Opin. Clin. Nutr. Metab. modified organisms comprising non-bacterial PUFA PKS Care 2, 135 (1999); Parker-Barnes et al., Proc. Natl. Acad. Systems, and to methods of making and using the non Sci. USA97, 8284 (2000); Shanklin et al., Annu. Rev. Plant bacterial PUFA PKS systems disclosed herein. This inven Physiol. Plant Mol. Biol. 49, 611 (1998)). Starting from tion also relates to a method to identify bacterial and acetyl-CoA, the Synthesis of DHA requires approximately non-bacterial microorganisms comprising PUFA PKS sys 30 distinct enzyme activities and nearly 70 reactions includ temS. ing the four repetitive Steps of the fatty acid Synthesis cycle. Polyketide synthases (PKSs) carry out some of the same BACKGROUND OF THE INVENTION reactions as FAS (Hopwood et al., Annu. Rev. Genet. 24, 37 0004 Polyketide synthase (PKS) systems are generally (1990); Bentley et al., Annu. Rev. Microbial 53,411 (1999)) known in the art as enzyme complexes derived from fatty and use the same Small protein (or domain), acyl carrier acid synthase (FAS) systems, but which are often highly protein (ACP), as a covalent attachment site for the growing modified to produce Specialized products that typically Show carbon chain. However, in these enzyme Systems, the com little resemblance to fatty acids. Researchers have attempted plete cycle of reduction, dehydration and reduction Seen in to exploit polyketide synthase (PKS) systems that have been FAS is often abbreviated so that a highly derivatized carbon described in the literature as falling into one of three basic chain is produced, typically containing many keto- and types, typically referred to as: Type II, Type I and modular. hydroxy-groupS as well as carbon-carbon double bonds in The Type II System is characterized by Separable proteins, the trans configuration. The linear products of PKSs are each of which carries out a distinct enzymatic reaction. The often cyclized to form complex biochemicals that include enzymes work in concert to produce the end product and antibiotics and many other Secondary products (Hopwood et each individual enzyme of the System typically participates al., (1990) Supra; Bentley et al., (1999), Supra; Keating et al., Several times in the production of the end product. This type Curr. Opin. Chem. Biol. 3,598 (1999)). of System operates in a manner analogous to the fatty acid 0008 Very long chain PUFAs such as docosahexaenoic Synthase (FAS) Systems found in plants and bacteria. Type acid (DHA; 22:603) and eicosapentaenoic acid (EPA; I PKS systems are similar to the Type II system in that the 20:5c)3) have been reported from several of marine enzymes are used in an iterative fashion to produce the end bacteria, including Shewanella sp (Nichols et al., Curr. Op. product. The Type I differs from Type II in that enzymatic Biotechnol. 10, 240 (1999); Yazawa, Lipids 31, S (1996); activities, instead of being associated with Separable pro DeLong et al., Appl. Environ. Microbiol. 51, 730 (1986)). teins, occur as domains of larger proteins. This System is Analysis of a genomic fragment (cloned as plasmid pEPA) analogous to the Type I FAS Systems found in animals and from Shewanella sp. strain SCRC2738 led to the identifi fungi. cation of five open reading frames (Orfs), totaling 20 Kb, 0005. In contrast to the Type I and II systems, in modular that are necessary and sufficient for EPA production in E. PKS Systems, each enzyme domain is used only once in the coli (Yazawa, (1996), Supra). Several of the predicted pro

US 2002/0194641 A1 Dec. 19, 2002 least for EPA and DHA. Additionally, for production of about 60% identical to the amino acid sequence of (b), uSeable quantities of Such PUFAS, additional engineering wherein the amino acid Sequence has a biological activity of efforts may be required, for instance the down regulation of at least one domain of a polyunsaturated fatty acid (PUFA) enzymes competing for Substrate, engineering of higher polyketide synthase (PKS) system; and (e) a nucleic acid enzyme activities Such as by mutagenesis or targeting of Sequence that is fully complementary to the nucleic acid enzymes to plastid organelles. Therefore it is of interest to Sequence of (a), (b), (c), or (d). In alternate aspects, the obtain genetic material involved in PUFA biosynthesis from nucleic acid Sequence encodes an amino acid Sequence that Species that naturally produce these fatty acids and to is at least about 70% identical, or at least about 80% express the isolated material alone or in combination in a identical, or at least about 90% identical, or is identical to: heterologous System which can be manipulated to allow (1) at least 500 consecutive amino acids of an amino acid production of commercial quantities of PUFAs. Sequence Selected from the group consisting of: SEQ ID 0013 The discovery of a PUFA PKS system in marine NO:2, SEQ ID NO:4, and SEQ ID NO:6; and/or (2) a bacteria such as Shewanella and Vibrio marinus (see U.S. nucleic acid Sequence encoding an amino acid Sequence that Pat. No. 6,140,486, ibid.) provides a resource for new is at least about 70% identical to an amino acid Sequence methods of commercial PUFA production. However, these selected from the group consisting of: SEQ ID NO:8, SEQ marine bacteria have limitations which will ultimately ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, restrict their usefulness on a commercial level. First, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID although U.S. Pat. No. 6,140,486 discloses that the marine NO:28, SEQ ID NO:30, and SEQ ID NO:32. In a preferred bacteria PUFA PKS systems can be used to genetically embodiment, the nucleic acid Sequence encodes an amino modify plants, the marine bacteria naturally live and grow in acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, cold marine environments and the enzyme Systems of these SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID bacteria do not function well above 30° C. In contrast, many NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, crop plants, which are attractive targets for genetic manipu SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID lation using the PUFA PKS system, have normal growth NO:30, SEQID NO:32 and/or biologically active fragments conditions at temperatures above 30° C. and ranging to thereof. In one aspect, the nucleic acid Sequence is chosen higher than 40°C. Therefore, the marine bacteria PUFAPKS from: SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID System is not predicted to be readily adaptable to plant NO:7, SEQID NO:9, SEQ ID NO:12, SEQ ID NO:17, SEQ expression under normal growth conditions. Moreover, the ID NO:19, SEQID NO:21, SEQID NO:23, SEQID NO:25, marine bacteria PUFA PKS genes, being from a bacterial SEQ ID NO:27, SEQ ID NO:29, and SEQ ID NO:31. Source, may not be compatible with the genomes of eukary 0016. Another embodiment of the present invention otic host cells, or at least may require Significant adaptation relates to a recombinant nucleic acid molecule comprising to work in eukaryotic hosts. Additionally, the known marine the nucleic acid molecule as described above, operatively bacteria PUFA PKS systems do not directly produce trig linked to at least one transcription control Sequence. In lycerides, whereas direct production of triglycerides would another embodiment, the present invention relates to a be desirable because triglycerides are a lipid Storage product recombinant cell transfected with the recombinant nucleic in microorganisms and as a result can be accumulated at acid molecule described directly above. very high levels (e.g. up to 80-85% of cell weight) in 0017. Yet another embodiment of the present invention microbial/plant cells (as opposed to a "structural lipid relates to a genetically modified microorganism, wherein the product (e.g. phospholipids) which can generally only accu microorganism expresses a PKS System comprising at least mulate at low levels (e.g. less than 10-15% of cell weight at one biologically active domain of a polyunsaturated fatty maximum)). acid (PUFA) polyketide synthase (PKS) system. The at least one domain of the PUFA PKS system is encoded by a 0014) Therefore, there is a need in the art for other PUFA nucleic acid sequence chosen from: (a) a nucleic acid PKS systems having greater flexibility for commercial use. Sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a SUMMARY OF THE INVENTION Thraustochytrid microorganism; (b) a nucleic acid sequence 0.015. One embodiment of the present invention relates to encoding at least one domain of a PUFA PKS system from an isolated nucleic acid molecule comprising a nucleic acid a microorganism identified by the Screening method of the Sequence chosen from: (a) a nucleic acid sequence encoding present invention; (c) a nucleic acid sequence encoding an an amino acid Sequence Selected from the group consisting amino acid Sequence Selected from the group consisting of: of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologi biologically active fragments thereof; (b) a nucleic acid cally active fragments thereof, (d) a nucleic acid sequence Sequence encoding an amino acid Sequence Selected from encoding an amino acid Sequence Selected from the group the group consisting of: SEQ ID NO:8, SEQID NO:10, SEQ consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID ID NO:13, SEQID NO:18, SEQID NO:20, SEQID NO:22, NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments NO:30, SEQ ID NO:32, and biologically active fragments thereof; (c) a nucleic acid sequence encoding an amino acid thereof; (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 sequence that is at least about 60% identical to at least 500 consecutive amino acids of the amino acid sequence of (a), consecutive amino acids of an amino acid Sequence Selected wherein the amino acid Sequence has a biological activity of from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, at least one domain of a polyunsaturated fatty acid (PUFA) and SEQ ID NO:6; wherein the amino acid sequence has a polyketide Synthase (PKS) System; (d) a nucleic acid biological activity of at least one domain of a PUFA PKS Sequence encoding an amino acid Sequence that is at least System; and, (f) a nucleic acid sequence encoding an amino US 2002/0194641 A1 Dec. 19, 2002

acid Sequence that is at least about 60% identical to an amino bacterial PUFA PKS system, from a Type I PKS system, acid Sequence Selected from the group consisting of: SEQID from a Type II PKS system, and/or from a modular PKS NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, System. SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID 0020. In another aspect of this embodiment, the micro NO:32, wherein the amino acid Sequence has a biological organism endogenously expresses a PUFA PKS system activity of at least one domain of a PUFA PKS system. In comprising the at least one biologically active domain of a this embodiment, the microorganism is genetically modified PUFA PKS system, and wherein the genetic modification to affect the activity of the PKS system. The screening comprises expression of a recombinant nucleic acid mol method of the present invention referenced in (b) above ecule Selected from the group consisting of a recombinant comprises: (i) selecting a microorganism that produces at nucleic acid molecule encoding at least one biologically least one PUFA; and, (ii) identifying a microorganism from active domain from a Second PKS System and a recombinant (i) that has an ability to produce increased PUFAs under nucleic acid molecule encoding a protein that affects the dissolved oxygen conditions of less than about 5% of activity of the PUFA PKS system. Preferably, the recombi Saturation in the fermentation medium, as compared to nant nucleic acid molecule comprises any one of the nucleic production of PUFAs by the microorganism under dissolved acid Sequences described above. oxygen conditions of greater than 5% of Saturation, and 0021. In one aspect of this embodiment, the recombinant more preferably 10% of Saturation, and more preferably nucleic acid molecule encodes a phosphopantetheline trans greater than 15% of Saturation, and more preferably greater ferase. In another aspect, the recombinant nucleic acid than 20% of Saturation in the fermentation medium. molecule comprises a nucleic acid Sequence encoding at 0.018. In one aspect, the microorganism endogenously least one biologically active domain from a bacterial PUFA expresses a PKS System comprising the at least one domain PKS system, from a type I PKS system, from a type II PKS of the PUFA PKS system, and wherein the genetic modifi system, and/or from a modular PKS system. cation is in a nucleic acid Sequence encoding the at least one 0022. In another aspect of this embodiment, the micro domain of the PUFA PKS system. For example, the genetic organism is genetically modified by transfection with a modification can be in a nucleic acid Sequence that encodes recombinant nucleic acid molecule encoding the at least one a domain having a biological activity of at least one of the domain of a polyunsaturated fatty acid (PUFA) polyketide following proteins: malonyl-CoA:ACP acyltransferase synthase (PKS) system. Such a recombinant nucleic acid (MAT), B-keto acyl-ACP synthase (KS), ketoreductase molecule can include any recombinant nucleic acid mol (KR), acyltransferase (AT), FabA-like B-hydroxyacyl-ACP ecule comprising any of the nucleic acid sequences dehydrase (DH), phosphopantetheline transferase, chain described above. In one aspect, the microorganism has been length factor (CLF), acyl carrier protein (ACP), enoyl ACP further genetically modified to recombinantly express at reductase (ER), an enzyme that catalyzes the Synthesis of least one nucleic acid molecule encoding at least one bio trans-2-decenoyl-ACP, an enzyme that catalyzes the revers logically active domain from a bacterial PUFAPKS system, ible isomerization of trans-2-decenoyl-ACP to cis-3-de from a Type I PKS system, from a Type II PKS system, or cenoyl-ACP, and an enzyme that catalyzes the elongation of from a modular PKS system. cis-3-decenoyl-ACP to cis-Vaccenic acid. In one aspect, the genetic modification is in a nucleic acid Sequence that 0023 Yet another embodiment of the present invention encodes an amino acid Sequence Selected from the group relates to a genetically modified plant, wherein the plant has consisting of: (a) an amino acid sequence that is at least been genetically modified to recombinantly express a PKS about 70% identical, and preferably at least about 80% System comprising at least one biologically active domain of identical, and more preferably at least about 90% identical a polyunsaturated fatty acid (PUFA) polyketide synthase and more preferably identical to at least 500 consecutive (PKS) system. The domain can be encoded by any of the amino acids of an amino acid Sequence Selected from the nucleic acid Sequences described above. In one aspect, the group consisting of: SEQ ID NO:2, SEQID NO:4, and SEQ plant has been further genetically modified to recombinantly ID NO:6; wherein the amino acid sequence has a biological express at least one nucleic acid molecule encoding at least activity of at least one domain of a PUFAPKS system; and, one biologically active domain from a bacterial PUFA PKS (b) an amino acid sequence that is at least about 70% system, from a Type I PKS system, from a Type II PKS identical, and preferably at least about 80% identical, and system, and/from a modular PKS system. more preferably at least about 90% identical and more 0024. Another embodiment of the present invention preferably identical to an amino acid Sequence Selected from relates to a method to identify a microorganism that has a the group consisting of: SEQ ID NO:8, SEQID NO:10, SEQ polyunsaturated fatty acid (PUFA) polyketide synthase ID NO:13, SEQID NO:18, SEQID NO:20, SEQID NO:22, (PKS) system. The method includes the steps of: (a) select SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID ing a microorganism that produces at least one PUFA; and, NO:30, and SEQ ID NO:32; wherein the amino acid (b) identifying a microorganism from (a) that has an ability Sequence has a biological activity of at least one domain of to produce increased PUFAS under dissolved oxygen con a PUFA PKS system. ditions of less than about 5% of Saturation in the fermenta 0019. In one aspect, the genetically modified microor tion medium, as compared to production of PUFAS by the ganism is a ThrauStochytrid, which can include, but is not microorganism under dissolved oxygen conditions of greater limited to, a ThrauStochytrid from a chosen from than 5% of Saturation, more preferably 10% of Saturation, Schizochytrium and ThrauStochytrium. In another aspect, more preferably greater than 15% of Saturation and more the microorganism has been further genetically modified to preferably greater than 20% of saturation in the fermentation recombinantly express at least one nucleic acid molecule medium. A microorganism that produces at least one PUFA encoding at least one biologically active domain from a and has an ability to produce increased PUFAS under dis US 2002/0194641 A1 Dec. 19, 2002

solved oxygen conditions of less than about 5% of Saturation one biologically active domain of a polyunsaturated fatty is identified as a candidate for containing a PUFA PKS acid (PUFA) polyketide synthase (PKS) system. The domain System. of the PUFA PKS system is encoded by any of the nucleic 0025 In one aspect of this embodiment, step (b) com acid Sequences described above. prises identifying a microorganism from (a) that has an 0030. In one aspect of this embodiment, the organism ability to produce increased PUFAs under dissolved oxygen endogenously expresses a PKS System comprising the at conditions of less than about 2% of Saturation, and more least one domain of the PUFA PKS system, and the genetic preferably under dissolved oxygen conditions of less than modification is in a nucleic acid Sequence encoding the at about 1% of Saturation, and even more preferably under least one domain of the PUFAPKS system. For example, the dissolved conditions of about 0% of Saturation. genetic modification can change at least one product pro 0026. In another aspect ofthis embodiment, the microor duced by the endogenous PKS System, as compared to a ganism Selected in (a) has an ability to consume bacteria by wild-type organism. phagocytosis. In another aspect, the microorganism Selected 0031. In another aspect of this embodiment, the organism in (a) has a simple fatty acid profile. In another aspect, the endogenously expresses a PKS System comprising the at microorganism Selected in (a) is a non-bacterial microor least one biologically active domain of the PUFA PKS ganism. In another aspect, the microorganism Selected in (a) System, and the genetic modification comprises transfection is a eukaryote. In another aspect, the microorganism Selected of the organism with a recombinant nucleic acid molecule in (a) is a member of the order Thraustochytriales. In another Selected from the group consisting of a recombinant nucleic aspect, the microorganism Selected in (a) has an ability to acid molecule encoding at least one biologically active produce PUFAS at a temperature greater than about 15 C., domain from a second PKS system and a recombinant and preferably greater than about 20° C., and more prefer nucleic acid molecule encoding a protein that affects the ably greater than about 25 C., and even more preferably activity of the PUFA PKS system. For example, the genetic greater than about 30° C. In another aspect, the microor modification can change at least one product produced by ganism Selected in (a) has an ability to produce bioactive the endogenous PKS System, as compared to a wild-type compounds (e.g., lipids) of interest at greater than 5% of the organism. dry weight of the organism, and more preferably greater than 10% of the dry weight of the organism. In yet another aspect, 0032. In yet another aspect of this embodiment, the the microorganism Selected in (a) contains greater than 30% organism is genetically modified by transfection with a of its total fatty acids as C14:0, C16:0 and C16:1 while also recombinant nucleic acid molecule encoding the at least one producing at least one long chain fatty acid with three or domain of the polyunsaturated fatty acid (PUFA) polyketide more unsaturated bonds, and preferably, the microorganism Synthase (PKS) system. In another aspect, the organism Selected in (a) contains greater than 40% of its total fatty produces a polyunsaturated fatty acid (PUFA) profile that acids as C14:0, C16:0 and C16:1 while also producing at differs from the naturally occurring organism without a least one long chain fatty acid with three or more unsaturated genetic modification. In another aspect, the organism endog bonds. In another aspect, the microorganism Selected in (a) enously expresses a non-bacterial PUFA PKS system, and contains greater than 30% of its total fatty acids as C14:0, wherein the genetic modification comprises Substitution of a C16:0 and C16:1 while also producing at least one long domain from a different PKS system for a nucleic acid chain fatty acid with four or more unsaturated bonds, and Sequence encoding at least one domain of the non-bacterial more preferably while also producing at least one long chain PUFA PKS system. fatty acid with five or more unsaturated bonds. 0033. In yet another aspect, the organism endogenously expresses a non-bacterial PUFA PKS system that has been 0027. In another aspect of this embodiment, the method modified by transfecting the organism with a recombinant further comprises step (c) of detecting whether the organism nucleic acid molecule encoding a protein that regulates the comprises a PUFA PKS system. In this aspect, the step of chain length of fatty acids produced by the PUFA PKS detecting can include detecting a nucleic acid Sequence in System. For example, the recombinant nucleic acid molecule the microorganism that hybridizes under Stringent condi encoding a protein that regulates the chain length of fatty tions with a nucleic acid Sequence encoding an amino acid acids can replace a nucleic acid Sequence encoding a chain sequence from a Thraustochytrid PUFA PKS system. Alter length factor in the non-bacterial PUFA PKS system. In natively, the Step of detecting can include detecting a nucleic another aspect, the protein that regulates the chain length of acid Sequence in the organism that is amplified by oligo fatty acids produced by the PUFA PKS system is a chain nucleotide primers from a nucleic acid Sequence from a length factor. In another aspect, the protein that regulates the Thaustochytrid PUFA PKS system. chain length of fatty acids produced by the PUFA PKS 0028. Another embodiment of the present invention System is a chain length factor that directs the Synthesis of relates to a microorganism identified by the Screening C20 units. method described above, wherein the microorganism is 0034). In one aspect, the organism expresses a non-bac genetically modified to regulate the production of molecules terial PUFA PKS system comprising a genetic modification by the PUFA PKS system. in a domain chosen from: a domain encoding FabA-like 0029. Yet another embodiment of the present invention f3-hydroxyacyl-ACP dehydrase (DH) domain and a domain relates to a method to produce a bioactive molecule that is encoding f-ketoacyl-ACP synthase (KS), wherein the modi produced by a polyketide Synthase System. The method fication alters the ratio of long chain fatty acids produced by includes the Step of culturing under conditions effective to the PUFA PKS system as compared to in the absence of the produce the bioactive molecule a genetically modified modification. In one aspect, the modification comprises organism that expresses a PKS System comprising at least Substituting a DH domain that does not possess isomeriza US 2002/0194641 A1 Dec. 19, 2002 tion activity for a FabA-like B-hydroxyacyl-ACP dehydrase convulsant, an anti-Heliobactor pylori drug, a drug for (DH) in the non-bacterial PUFA PKS system. In another treatment of neurodegenerative disease, a drug for treatment aspect, the modification is Selected from the group consist of degenerative liver disease, an antibiotic, and a cholesterol ing of a deletion of all or a part of the domain, a Substitution lowering formulation. In one aspect, the endproduct is used of a homologous domain from a different organism for the to treat a condition Selected from the group consisting of: domain, and a mutation of the domain. chronic inflammation, acute inflammation, gastrointestinal disorder, cancer, cachexia, cardiac restenosis, neurodegen 0035) In another aspect, the organism expresses a PKS erative disorder, degenerative disorder of the liver, blood System and the genetic modification comprises Substituting lipid disorder, osteoporosis, osteoarthritis, autoimmune dis a FabA-like B-hydroxy acyl-ACP dehydrase (DH) domain ease, preeclampsia, preterm birth, age related maculopathy, from a PUFA PKS system for a DH domain that does not posses isomerization activity. pulmonary disorder, and peroxisomal disorder. 0041. Yet another embodiment of the present invention 0036). In another aspect, the organism expresses a non relates to a method to produce a humanized animal milk, bacterial PUFAPKS system comprising a modification in an comprising genetically modifying milk-producing cells of a enoyl-ACP reductase (ER) domain, wherein the modifica milk-producing animal with at least one recombinant nucleic tion results in the production of a different compound as acid molecule comprising a nucleic acid Sequence encoding compared to in the absence of the modification. For at least one biologically active domain of a PUFA PKS example, the modification can be selected from the group system. The domain of the PUFAPKS system is encoded by consisting of a deletion of all or a part of the ER domain, a any of the nucleic acid Sequences described above. substitution of an ER domain from a different organism for the ER domain, and a mutation of the ER domain. 0042. Yet another embodiment of the present invention relates to a method produce a recombinant microbe, com 0037. In one aspect, the bioactive molecule produced by prising genetically modifying microbial cells to express at the present method can include, but is not limited to, an least one recombinant nucleic acid molecule comprising a anti-inflammatory formulation, a chemotherapeutic agent, comprising a nucleic acid Sequence encoding at least one an active excipient, an osteoporosis drug, an anti-depressant, biologically active domain of a PUFA PKS system. The an anti-convulsant, an anti-Heliobactor pylori drug, a drug domain of the PUFA PKS system is encoded by any of the for treatment of neurodegenerative disease, a drug for treat nucleic acid Sequences described above. ment of degenerative liver disease, an antibiotic, and a cholesterol lowering formulation. In one aspect, the bioac 0043. Yet another embodiment of the present invention tive molecule is a polyunsaturated fatty acid (PUFA). In relates to a recombinant host cell which has been modified another aspect, the bioactive molecule is a molecule includ to express a polyunsaturated fatty acid (PUFA) polyketide ing carbon-carbon double bonds in the cis configuration. In synthase (PKS) system, wherein the PKS catalyzes both another aspect, the bioactive molecule is a molecule includ iterative and non-iterative enzymatic reactions. The PUFA ing a double bond at every third carbon. PKS system comprises: (a) at least two enoyl ACP-reductase (ER) domains; (b) at least six acyl carrier protein (ACP) 0.038. In one aspect of this embodiment, the organism is domains; (c) at least two B-keto acyl-ACP synthase (KS) a microorganism, and in another aspect, the organism is a domains; (d) at least one acyltransferase (AT) domain; (e) at plant. least one ketoreductase (KR) domain; (f) at least two FabA 0039. Another embodiment of the present invention like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at relates to a method to produce a plant that has a polyun least one chain length factor (CLF) domain; and (h) at least saturated fatty acid (PUFA) profile that differs from the one malonyl-CoA:ACP acyltransferase (MAT) domain. In naturally occurring plant, comprising genetically modifying one aspect, the PUFA PKS system is a eukaryotic PUFA cells of the plant to express a PKS System comprising at least PKS system. In another aspect, the PUFA PKS system is an one recombinant nucleic acid molecule comprising a nucleic algal PUFA PKS system, and preferably a Thraustochytri acid Sequence encoding at least one biologically active ales PUFAPKS system, which can include, but is not limited domain of a PUFA PKS system. The domain of the PUFA to, a Schizochytrium PUFA PKS system or a Thraus PKS System is encoded by any of the nucleic acid Sequences tochytrium PUFA PKS system. described above. 0044) In this embodiment, the PUFA PKS system can be expressed in a prokaryotic host cell or in a eukaryotic host 0040 Yet another embodiment of the present invention cell. In one aspect, the host cell is a plant cell. Accordingly, relates to a method to modify an endproduct containing at one embodiment of the invention is a method to produce a least one fatty acid, comprising adding to the endproduct an product containing at least one PUFA, comprising growing oil produced by a recombinant host cell that expresses at a plant comprising Such a plant cell under conditions effec least one recombinant nucleic acid molecule comprising a tive to produce the product. The host cell is a microbial cell nucleic acid Sequence encoding at least one biologically and in this case, one embodiment of the present invention is active domain of a PUFA PKS system. The domain of a a method to produce a product containing at least one PUFA, PUFA PKS system is encoded by any of the nucleic acid comprising culturing a culture containing Such a microbial Sequences described above. In one aspect, the endproduct is cell under conditions effective to produce the product. In one Selected from the group consisting of a dietary Supplement, aspect, the PKS System catalyzes the direct production of a food product, a pharmaceutical formulation, a humanized triglycerides. animal milk, and an infant formula. A pharmaceutical for mulation can include, but is not limited to: an anti-inflam 0045. Yet another embodiment of the present invention matory formulation, a chemotherapeutic agent, an active relates to a genetically modified microorganism comprising excipient, an Osteoporosis drug, an anti-depressant, an anti a polyunsaturated fatty acid (PUFA) polyketide synthase US 2002/0194641 A1 Dec. 19, 2002

(PKS) system, wherein the PKS catalyzes both iterative and length of at least 16 carbons, and more preferably at least 18 non-iterative enzymatic reactions. The PUFA PKS system carbons, and more preferably at least 20 carbons, and more comprises: (a) at least two enoyl ACP-reductase (ER) preferably 22 or more carbons, with at least 3 or more double domains; (b) at least six acyl carrier protein (ACP) domains, bonds, and preferably 4 or more, and more preferably 5 or (c) at least two B-ketoacyl-ACP synthase (KS) domains; (d) more, and even more preferably 6 or more double bonds, at least one acyltransferase (AT) domain; (e) at least one wherein all double bonds are in the cis configuration. It is an ketoreductase (KR) domain; (f) at least two FabA-like object of the present invention to find or create via genetic f3-hydroxy acyl-ACP dehydrase (DH) domains; (g) at least manipulation or manipulation of the endproduct, PKS Sys one chain length factor (CLF) domain; and (h) at least one tems which produce polyunsaturated fatty acids of desired malonyl-CoA:ACP acyltransferase (MAT) domain. The chain length and with desired numbers of double bonds. genetic modification affects the activity of the PUFA PKS Examples of PUFAs include, but are not limited to, DHA System. In one aspect of this embodiment, the microorgan (docosahexaenoic acid (C22:6, co-3)), DPA (docosapen ism is a eukaryotic microorganism. taenoic acid (C22:5, (D-6)), and EPA (eicosapentaenoic acid 0.046 Yet another embodiment of the present invention (C20:5, co-3)). relates to a recombinant host cell which has been modified 0.052 Second, the PUFA PKS system described herein to express a non-bacterial polyunsaturated fatty acid (PUFA) incorporates both iterative and non-iterative reactions, polyketide synthase (PKS) system, wherein the non-bacte which distinguish the System from previously described rial PUFA PKS catalyzes both iterative and non-iterative PKS systems (e.g., type I, type II or modular). More enzymatic reactions. The non-bacterial PUFA PKS system particularly, the PUFA PKS system described herein con comprises: (a) at least one enoyl ACP-reductase (ER) tains domains that appear to function during each cycle as domain; (b) multiple acyl carrier protein (ACP) domains; (c) well as those which appear to function during only Some of at least two f-ketoacyl-ACP synthase (KS) domains; (d) at the cycles. Akey aspect of this may be related to the domains least one acyltransferase (AT) domain; (e) at least one showing homology to the bacterial Fab A enzymes. For ketoreductase (KR) domain; (f) at least two FabA-like example, the Fab A enzyme of E. coli has been shown to f3-hydroxy acyl-ACP dehydrase (DH) domains; (g) at least possess two enzymatic activities. It possesses a dehydration one chain length factor (CLF) domain; and (h) at least one activity in which a water molecule (HO) is abstracted from malonyl-CoA:ACP acyltransferase (MAT) domain. a carbon chain containing a hydroxy group, leaving a trans double bond in that carbon chain. In addition, it has an BRIEF DESCRIPTION OF THE FIGURES isomerase activity in which the trans double bond is con verted to the cis configuration. This isomerization is accom 0047 FIG. 1 is a graphical representation of the domain plished in conjunction with a migration of the double bond structure of the Schizochytrium PUFA PKS system. position to adjacent carbons. In PKS (and FAS) systems, the 0048 FIG. 2 shows a comparison of PKS domains from main carbon chain is extended in 2 carbon increments. One Schizochytrium and Shewanella. can therefore predict the number of extension reactions required to produce the PUFA products of these PKS sys 0049 FIG.3 shows a comparison of PKS domains from tems. For example, to produce DHA (C22:6, all cis) requires Schizochytrium and a related PKS system from Nostoc 10 extension reactions. Since there are only 6 double bonds whose product is a long chain fatty acid that does not contain in the end product, it means that during Some of the reaction any double bonds. cycles, a double bond is retained (as a cis isomer), and in others, the double bond is reduced prior to the next exten DETAILED DESCRIPTION OF THE SO. INVENTION 0053 Before the discovery of a PUFA PKS system in 0050. The present invention generally relates to non marine bacteria (see U.S. Pat. No. 6,140,486), PKS systems bacterial derived polyunsaturated fatty acid (PUFA) were not known to possess this combination of iterative and polyketide Synthase (PKS) systems, to genetically modified Selective enzymatic reactions, and they were not thought of organisms comprising non-bacterial PUFA PKS systems, to as being able to produce carbon-carbon double bonds in methods of making and using Such Systems for the produc the cis configuration. However, the PUFA PKS system tion of products of interest, including bioactive molecules, described by the present invention has the capacity to and to novel methods for identifying new eukaryotic micro introduce cis double bonds and the capacity to vary the organisms having such a PUFAPKS system. As used herein, reaction Sequence in the cycle. a PUFAPKS system generally has the following identifying 0054 Therefore, the present inventors propose to use features: (1) it produces PUFAS as a natural product of the these features of the PUFA PKS system to produce a range System; and (2) it comprises several multifunctional proteins of bioactive molecules that could not be produced by the assembled into a complex that conducts both iterative pro previously described (Type II, Type I and modular) PKS cessing of the fatty acid chain as well non-iterative proceSS Systems. These bioactive molecules include, but are limited ing, including trans-cis isomerization and enoyl reduction to, polyunsaturated fatty acids (PUFAs), antibiotics or other reactions in selected cycles (See FIG. 1, for example). bioactive compounds, many of which will be discussed 0051 More specifically, first, a PUFA PKS system that below. For example, using the knowledge of the PUFAPKS forms the basis of this invention produces polyunsaturated gene Structures described herein, any of a number of meth fatty acids (PUFAS) as products (i.e., an organism that ods can be used to alter the PUFA PKS genes, or combine endogenously (naturally) contains Such a PKS System makes portions of these genes with other Synthesis Systems, includ PUFAs using this system). The PUFAs referred to herein are ing other PKS Systems, Such that new products are pro preferably polyunsaturated fatty acids with a carbon chain duced. The inherent ability ofthis particular type of system US 2002/0194641 A1 Dec. 19, 2002

to do both iterative and selective reactions will enable this well as PKS functional domains or proteins from other PKS system to yield products that would not be found if similar Systems (type I, type II, modular) or FAS Systems. methods were applied to other types of PKS systems. 0058 Schizochytrium is a Thraustochytrid marine micro 0055. In one embodiment, a PUFA PKS system accord organism that accumulates large quantities of triacylglycer ing to the present invention comprises at least the following ols rich in DHA and docosapentaenoic acid (DPA; 22:5 biologically active domains: (a) at least two enoyl ACP (O-6); e.g., 30% DHA+DPA by dry weight (Barclay et al., J. reductase (ER) domains; (b) at least six acyl carrier protein Appl. Phycol. 6, 123 (1994)). In eukaryotes that synthesize (ACP) domains; (c) at least two f-ketoacyl-ACP synthase 20- and 22-carbon PUFAS by an elongation/desaturation (KS) domains; (d) at least one acyltransferase (AT) domain; pathway, the pools of 18-, 20- and 22-carbon intermediates (e) at least one ketoreductase (KR) domain, (f) at least two are relatively large So that in Vivo labeling experiments using FabA-like B-hydroxy acyl-ACP dehydrase (DH) domains; 'C-acetate reveal clear precursor-product kinetics for the (g) at least one chain length factor (CLF) domain; and (h) at predicted intermediates (Gellerman et al., Biochim. BiophyS. least one malonyl-CoA:ACP acyltransferase (MAT) domain. Acta 573:23 (1979)). Furthermore, radiolabeled intermedi The functions of these domains are generally individually ates provided exogenously to Such organisms are converted known in the art and will be described in detail below with to the final PUFA products. The present inventors have regard to the PUFA PKS system of the present invention. shown that 1-"C-acetate was rapidly taken up by Schizochytrium cells and incorporated into fatty acids, but at 0056. In another embodiment, the PUFA PKS system the shortest labeling time (1 min), DHA contained 31% of comprises at least the following biologically active domains: the label recovered in fatty acids, and this percentage (a) at least one enoyl ACP-reductase (ER) domain; (b) remained essentially unchanged during the 10- 15 min of multiple acyl carrier protein (ACP) domains (at least four, 'C-acetate incorporation and the Subsequent 24 hours of and preferably at least five, and more preferably at least Six, culture growth (See Example 3). Similarly, DPA represented and even more preferably Seven, eight, nine, or more than 10% of the label throughout the experiment. There is no nine); (c) at least two B-keto acyl-ACP synthase (KS) evidence for a precursor-product relationship between 16- or domains; (d) at least one acyltransferase (AT) domain; (e) at 18-carbon fatty acids and the 22-carbon polyunsaturated least one ketoreductase (KR) domain; (f) at least two FabA fatty acids. These results are consistent with rapid synthesis like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at of DHA from "C-acetate involving very small (possibly least one chain length factor (CLF) domain; and (h) at least enzyme-bound) pools of intermediates. A cell-free homoge one malonyl-CoA:ACP acyltransferase (MAT) domain. nate derived from Schizochytrium cultures incorporated Preferably, such a PUFA PKS system is a non-bacterial 1-"C-malonyl-CoA into DHA, DPA, and saturated fatty PUFA-PKS system. acids. The Same biosynthetic activities were retained by a 0057. In one embodiment, a PUFA PKS system of the 100,000xg Supernatant fraction but were not present in the present invention is a non-bacterial PUFA PKS system. In membrane pellet. Thus, DHA and DPA synthesis in other words, in one embodiment, the PUFA PKS system of Schizochytrium does not involve membrane-bound desatu the present invention is isolated from an organism that is not rases or fatty acid elongation enzymes like those described a bacteria, or is a homologue of or derived from a PUFA for other eukaryotes (Parker-Barnes et al., 2000, Supra; PKS System from an organism that is not a bacteria, Such as Shanklin et al., 1998, Supra). These fractionation data con a eukaryote or an archaebacterium. Eukaryotes are separated trast with those obtained from the Shewanella enzymes (See from prokaryotes based on the degree of differentiation of Metz et al., 2001, Supra) and may indicate use of a different the cells. The higher group with more differentiation is (Soluble) acyl acceptor molecule, Such as CoA, by the called eukaryotic. The lower group with less differentiated Schizochytrium enzyme. cells is called prokaryotic. In general, prokaryotes do no 0059. In copending U.S. application Ser. No. 09/231,899, possess a nuclear membrane, do not exhibit mitosis during a cDNA library from Schizochytrium was constructed and cell division, have only one chromosome, their approximately 8,000 random clones (ESTs) were sequenced. contains 70S ribosomes, they do not possess any mitochon Within this dataset, only one moderately expressed gene dria, endoplasmic reticulum, chloroplasts, lySOSomes or (0.3% of all sequences) was identified as a fatty acid golgi apparatus, their flagella (if present) consists of a single desaturase, although a Second putative desaturase was rep fibril. In contrast eukaryotes have a nuclear membrane, they resented by a single clone (0.01%). By contrast, Sequences do exhibit mitosis during cell division, they have many that exhibited homology to 8 of the 11 domains of the chromosomes, their cytoplasm contains 80S ribosomes, they Shewanella PKS genes shown in FIG. 2 were all identified do possess mitochondria, endoplasmic reticulum, chloro at frequencies of 0.2-0.5%. In U.S. application Ser. No. plasts (in algae), lysosomes and golgi apparatus, and their 09/231,899, several cDNA clones showing homology to the flagella (if present) consists of many fibrils. In general, Shewanella PKS genes were Sequenced, and various clones bacteria are prokaryotes, while algae, fungi, , were assembled into nucleic acid Sequences representing and higher plants are eukaryotes. The PUFAPKS systems of two partial open reading frames and one complete open the marine bacteria (e.g., Shewanella and Vibrio marinus) reading frame. Nucleotides 390-4443 of the cDNA sequence are not the basis of the present invention, although the containing the first partial open reading frame described in present invention does contemplate the use of domains from U.S. application Ser. No.09/231,899 (denoted therein as these bacterial PUFA PKS systems in conjunction with SEQID NO:69) match nucleotides 4677-8730 (plus the stop domains from the non-bacterial PUFA PKS systems of the codon) of the sequence denoted herein as OrfA (SEQ ID present invention. For example, according to the present NO:1). Nucleotides 1-4876 of the cDNA sequence contain invention, genetically modified organisms can be produced ing the Second partial open reading frame described in U.S. which incorporate non-bacterial PUFA PKS functional application Ser. No. 09/231,899 (denoted therein as SEQ ID domains with bacteria PUFA PKS functional domains, as NO:71) matches nucleotides 1311-6177 (plus the stop US 2002/0194641 A1 Dec. 19, 2002 codon) of the sequence denoted herein as OrfB (SEQ ID represented herein as SEQID NO:2. Within OrfAare twelve NO:3). Nucleotides 145-4653 of the cDNA sequence con domains: (a) one B-keto acyl-ACP synthase (KS) domain; taining the complete open reading frame described in U.S. (b) one malonyl-CoA:ACP acyltransferase (MAT) domain; application Ser. No. 09/231,899 (denoted therein as SEQ ID (c) nine acyl carrier protein (ACP) domains; and (d) one NO:76 and incorrectly designated as a partial open reading ketoreductase (KR) domain. frame) match the entire sequence (plus the stop codon) of the 0064. The nucleotide sequence for OrfA has been depos sequence denoted herein as OrfQ (SEQ ID NO:5). ited with GenBank as Accession No. AF378327 (amino acid 0060) Further sequencing of cDNA and genomic clones sequence Accession No. AAK728879). OrfA was compared by the present inventors allowed the identification of the with known sequences in a standard BLAST search (BLAST full-length genomic sequence of each of OrfA, OrfB and 2.0 Basic BLAST homology search using blastp for amino OrfC and the complete identification of the domains with acid Searches, blastin for nucleic acid Searches, and blastX homology to those in Shewanella (see FIG. 2). It is noted for nucleic acid Searches and Searches of the translated that in Schizochytrium, the genomic DNA and cDNA are amino acid Sequence in all 6 open reading frames with identical, due to the lack of introns in the organism genome, Standard default parameters, wherein the query Sequence is to the best of the present inventors knowledge. Therefore, filtered for low complexity regions by default (described in reference to a nucleotide Sequence from Schizochytrium can Altschul, S. F., Madden, T. L., Schaaffer, A. A., Zhang, J., refer to genomic DNA or cDNA. Based on the comparison Zhang, Z., Miller, W. & Lipman, D. J. (1997) “Gapped of the Schizochytrium PKS domains to Shewanella, clearly, BLAST and PSI-BLAST: a new generation of protein data the Schizochytrium genome encodes proteins that are highly base search programs.” Nucleic Acids Res. 25:3389-3402, Similar to the proteins in Shewanella that are capable of incorporated herein by reference in its entirety)). At the catalyzing EPA synthesis. The proteins in Schizochytrium nucleic acid level, OrfA has no significant homology to any constitute a PUFAPKS system that catalyzes DHA and DPA known nucleotide Sequence. At the amino acid level, the Synthesis. AS discussed in detail herein, Simple modification Sequences with the greatest degree of homology to ORFA of the reaction scheme identified for Shewanella will allow were: Nostoc sp. 7120 heterocyst glycolipid synthase for DHA synthesis in Schizochytrium. The homology (Accession No. NC 003272), which was 42% identical to between the prokaryotic Shewanella and eukaryotic ORFA over 1001 amino acid residues; and Moritella mari Schizochytrium genes suggests that the PUFA PKS has nus (Vibrio marinus) ORF8 (Accession No. AB025342), undergone lateral gene transfer. which was 40% identical to ORFA over 993 amino acid 0061 FIG. 1 is a graphical representation of the three residues. open reading frames from the Schizochytrium PUFA PKS 0065. The first domain in OrfA is a KS domain, also system, and includes the domainstructure of this PUFAPKS referred to herein as ORFA-KS. This domain is contained System. AS described in Example 1 below, the domain within the nucleotide Sequence Spanning from a starting Structure of each open reading frame is as follows: point of between about positions 1 and 40 of SEQ ID NO:1 Open Reading Frame A (OrfA) (OrfA) to an ending point of between about positions 1428 0062) The complete nucleotide sequence for OrfA is and 1500 of SEQ ID NO:1. The nucleotide sequence con represented herein as SEQID NO:1. Nucleotides 4677-8730 taining the Sequence encoding the ORFA-KS domain is of SEQID NO:1 correspond to nucleotides 390-4443 of the represented herein as SEQ ID NO:7 (positions 1-1500 of sequence denoted as SEQ ID NO:69 in U.S. application Ser. SEQ ID NO:1). The amino acid sequence containing the KS No. 09/231,899. Therefore, nucleotides 1-4676 of SEQ ID domain spans from a starting point of between about posi NO:1 represent additional Sequence that was not disclosed in tions 1 and 14 of SEQ ID NO:2 (ORFA) to an ending point U.S. application Ser. No. 09/231,899. This novel region of of between about positions 476 and 500 of SEQ ID NO:2. SEQ ID NO:1 encodes the following domains in OrfA: (1) The amino acid Sequence containing the ORFA-KS domain the ORFA-KS domain; (2) the ORFA-MAT domain; and (3) is represented herein as SEQ ID NO:8 (positions 1-500 of at least a portion of the ACP domain region (e.g., at least SEQ ID NO:2). It is noted that the ORFA-KS domain ACP domains 1-4). It is noted that nucleotides 1-389 of SEQ contains an active site motif: DXAC* (*acyl binding site ID NO:69 in U.S. application Ser.No.09/231,899 do not C21s). match with the 389 nucleotides that are upstream of position 0066 According to the present invention, a domain or 4677 in SEQ ID NO:1 disclosed herein. Therefore, positions protein having 3-keto acyl-ACP synthase (KS) biological 1-389 of SEQ ID NO:69 in U.S. application Ser.No. 09/231, activity (function) is characterized as the enzyme that carries 899 appear to be incorrectly placed next to nucleotides out the initial step of the FAS (and PKS) elongation reaction 390-4443 of that sequence. Most of these first 389 nucle cycle. The acyl group destined for elongation is linked to a otides (about positions 60–389) are a match with an cysteine residue at the active site of the enzyme by a upstream portion of OrfA (SEQ ID NO: 1) of the present thioester bond. In the multi-step reaction, the acyl-enzyme invention and therefore, it is believed that an error occurred undergoes condensation with malonyl-ACP to form -keto in the effort to prepare the contig of the cDNA constructs in acyl-ACP, CO and free enzyme. The KS plays a key role in U.S. application Ser. No.09/231,899. The region in which the elongation cycle and in many Systems has been shown to the alignment error occurred in U.S. application Ser. No.09/ possess greater Substrate Specificity than other enzymes of 231,899 is within the region of highly repetitive sequence the reaction cycle. For example, E. coli has three distinct KS (i.e., the ACP region, discussed below), which probably enzymes-each with its own particular role in the physiol created Some confusion in the assembly of that Sequence ogy of the organism (Magnuson et al., Microbiol. Rev. 57, from various cDNA clones. 522 (1993)). The two KS domains of the PUFA-PKS sys 0063 OrfA is a 8730 nucleotide sequence (not including tems could have distinct roles in the PUFA biosynthetic the stop codon) which encodes a 2910 amino acid sequence, reaction Sequence. US 2002/0194641 A1 Dec. 19, 2002

0067. As a class of enzymes, KS's have been well 0072 All nine ACP domains together span a region of characterized. The Sequences of many verified KS genes are OrfA of from about position 3283 to about position 6288 of know, the active site motifs have been identified and the SEQ ID NO:1, which corresponds to amino acid positions of crystal structures of several have been determined. Proteins from about 1095 to about 2096 of SEO ID NO:2. The (or domains of proteins) can be readily identified as belong nucleotide Sequence for the entire ACP region containing all ing to the KS family of enzymes by homology to known KS nine domains is represented herein as SEQ ID NO:16. The Sequences. region represented by SEQ ID NO:16 includes the linker segments between individual ACP domains. The repeat 0068 The second domain in OrfA is a MAT domain, also interval for the nine domains is approximately every 330 referred to herein as ORFA-MAT. This domain is contained nucleotides of SEQ ID NO:16 (the actual number of amino within the nucleotide Sequence Spanning from a starting acids measured between adjacent active site Serines ranges point of between about positions 1723 and 1798 of SEQ ID from 104 to 116 amino acids). Each of the nine ACP NO:1 (OrfA) to an ending point of between about positions domains contains a pantetheline binding motif LGIDS* 2805 and 3000 of SEQ ID NO:1. The nucleotide sequence (represented herein by SEQ ID NO:14), wherein S* is the containing the Sequence encoding the ORFA-MAT domain pantetheline binding site Serine (S). The pantetheline binding is represented herein as SEQID NO:9 (positions 1723-3000 site serine (S) is located near the center of each ACP domain of SEQ ID NO: 1). The amino acid sequence containing the Sequence. At each end of the ACP domain region and MAT domain spans from a starting point of between about between each ACP domain is a region that is highly enriched positions 575 and 600 of SEQID NO:2 (ORFA) to an ending for proline (P) and alanine (A), which is believed to be a point of between about positions 935 and 1000 of SEQ ID linker region. For example, between ACP domains 1 and 2 NO:2. The amino acid sequence containing the ORFA-MAT is the sequence: APAPVKAAAPAAPVASAPAPA, repre domain is represented herein as SEQ ID NO:10 (positions sented herein as SEQ ID NO:15. The locations of the active 575-1000 of SEQ ID NO:2). It is noted that the ORFA-MAT Site Serine residues (i.e., the pantetheline binding site) for domain contains an active site motif: GHS*XG (* acyl each of the nine ACP domains, with respect to the amino binding site Soe), represented herein as SEQ ID NO:11. acid sequence of SEQ ID NO:2, are as follows: ACP1 = 0069. According to the present invention, a domain or S57, ACP2=See, ACP3=S77, ACP4=Sass, ACP5= protein having malonyl-CoA:ACP acyltransferase (MAT) So, ACP6=S7s, ACP7=Sso, ACP8=Soo, and ACP9= biological activity (function) is characterized as one that Sol. Given that the average size of an ACP domain is about transfers the malonyl moiety from malonyl-CoA to ACP. In 85 amino acids, excluding the linker, and about 110 amino addition to the active site motif (GXSXG), these enzymes acids including the linker, with the active site Serine being possess an extended motifF) and Q amino acids in key approximately in the center of the domain, one of skill in the positions) that identifies them as MAT enzymes (in contrast art can readily determine the positions of each of the nine to the AT domain of Schizochytrium Orf B). In some PKS ACP domains in OrfA. systems (but not the PUFAPKS domain) MAT domains will 0073. According to the present invention, a domain or preferentially load methyl- or ethyl-malonate on to the ACP protein having acyl carrier protein (ACP) biological activity group (from the corresponding CoA ester), thereby intro (function) is characterized as being Small polypeptides (typi ducing branches into the linear carbon chain. MAT domains cally, 80 to 100 amino acids long), that function as carriers can be recognized by their homology to known MAT for growing fatty acyl chains via a thioester linkage to a Sequences and by their extended motif Structure. covalently bound co-factor of the protein. They occur as Separate units or as domains within larger proteins. ACPS are 0070 Domains 3-11 of OrfA are nine tandem ACP converted from inactive apo-forms to functional holo-forms domains, also referred to herein as ORFA-ACP (the first by transfer of the phosphopantetheinyl moeity of CoA to a domain in the sequence is ORFA-ACP1, the second domain highly conserved Serine residue of the ACP. Acyl groups are is ORFA-ACP2, the third domain is ORFA-ACP3, etc.). The attached to ACP by a thioester linkage at the free terminus first ACP domain, ORFA-ACP1, is contained within the of the phosphopantetheinyl moiety. ACPs can be identified nucleotide Sequence Spanning from about position 3343 to by labeling with radioactive pantetheline and by Sequence about position3600 of SEQID NO:1 (OrfA). The nucleotide homology to known ACPs. The presence of variations of the Sequence containing the Sequence encoding the ORFA-ACP above mentioned motif (LGIDS) is also a signature of an 1 domain is represented herein as SEQ ID NO:12 (positions ACP. 3343-3600 of SEQ ID NO:1). The amino acid sequence containing the first ACP domain spans from about position 0074 Domain 12 in OrfA is a KR domain, also referred 1115 to about position 1200 of SEQ ID NO:2. The amino to herein as ORFA-KR. This domain is contained within the acid Sequence containing the ORFA-ACP1 domain is rep nucleotide Sequence Spanning from a starting point of about position 6598 of SEQ ID NO:1 to an ending point of about resented herein as SEQ ID NO:13 (positions 1115-1200 of position 8730 of SEQ ID NO:1. The nucleotide sequence SEQ ID NO:2). It is noted that the ORFA-ACP1 domain containing the Sequence encoding the ORFA-KR domain is contains an active site motif: LGIDS* (*pantetheine binding represented herein as SEQ ID NO:17 (positions 6598-8730 motif Ss), represented herein by SEQ ID NO:14. of SEQ ID NO:1). The amino acid sequence containing the 0071. The nucleotide and amino acid sequences of all KR domain spans from a Starting point of about position nine ACP domains are highly conserved and therefore, the 2200 of SEQ ID NO:2 (ORFA) to an ending point of about Sequence for each domain is not represented herein by an position 2910 of SEQ ID NO:2. The amino acid sequence individual sequence identifier. However, based on the infor containing the ORFA-KR domain is represented herein as mation disclosed herein, one of skill in the art can readily SEQ ID NO:18 (positions 2200-2910 of SEQ ID NO:2). determine the Sequence containing each of the other eight Within the KR domain is a core region with homology to ACP domains (see discussion below). Short chain aldehyde-dehydrogenases (KR is a member of US 2002/0194641 A1 Dec. 19, 2002 this family). This core region spans from about position within the nucleotide Sequence Spanning from a starting 7198 to about position 7500 of SEQ ID NO:1, which point of between about positions 1 and 43 of SEQ ID NO:3 corresponds to amino acid positions 2400-2500 of SEQ ID (OrfB) to an ending point of between about positions 1332 NO:2. and 1350 of SEQ ID NO:3. The nucleotide sequence con 0075 According to the present invention, a domain or taining the Sequence encoding the ORFB-KS domain is protein having ketoreductase activity, also referred to as represented herein as SEQ ID NO:19 (positions 1-1350 of 3-ketoacyl-ACP reductase (KR) biological activity (func SEQ ID NO:3). The amino acid sequence containing the KS tion), is characterized as one that catalyzes the pyridine domain spans from a starting point of between about posi nucleotide-dependent reduction of 3-keto acyl forms of tions 1 and 15 of SEQID NO:4 (ORFB) to an ending point ACP. It is the first reductive step in the de novo fatty acid of between about positions 444 and 450 of SEQ ID NO:4. biosynthesis elongation cycle and a reaction often performed The amino acid sequence containing the ORFB-KS domain in polyketide biosynthesis. Significant Sequence Similarity is is represented herein as SEQ ID NO:20 (positions 1-450 of observed with one family of enoyl ACP reductases (ER), the SEQ ID NO:4). It is noted that the ORFB-KS domain other reductase of FAS (but not the ER family present in the contains an active site motif: DXAC* (*acyl binding site PUFA PKS system), and the short-chain alcohol dehydro Co). KS biological activity and methods of identifying genase family. Pfam analysis of the PUFA PKS region proteins or domains having Such activity is described above. indicated above reveals the homology to the short-chain 0080. The second domain in OrfB is a CLF domain, also alcohol dehydrogenase family in the core region. Blast referred to herein as ORFB-CLF. This domain is contained analysis of the same region reveals matches in the core area within the nucleotide Sequence Spanning from a starting to known KR enzymes as well as an extended region of point of between about positions 1378 and 1402 of SEQ ID homology to domains from the other characterized PUFA NO:3 (OrfB) to an ending point of between about positions PKS systems. 2682 and 2700 of SEQ ID NO:3. The nucleotide sequence containing the Sequence encoding the ORFB-CLF domain is Open Reading Frame B (OrfB) represented herein as SEQ ID NO:21 (positions 1378-2700 of SEQ ID NO:3). The amino acid sequence containing the 0.076 The complete nucleotide sequence for OrfB is CLF domain spans from a starting point of between about represented herein as SEQID NO:3. Nucleotides 1311-6177 positions 460 and 468 of SEQID NO:4(ORFB) to an ending of SEQ ID NO:3 correspond to nucleotides 1-4867 of the point of between about positions 894 and 900 of SEQ ID sequence denoted as SEQ ID NO:71 in U.S. application Ser. NO:4. The amino acid sequence containing the ORFB-CLF No.09/231,899 (The cDNA sequence in U.S. application domain is represented herein as SEQ ID NO:22 (positions Ser. No. 09/231,899 contains about 345 additional nucle 460-900 of SEQ ID NO:4). It is noted that the ORFB-CLF otides beyond the Stop codon, including a polyA tail). domain contains a KS active site motif without the acyl Therefore, nucleotides 1-1310 of SEQ ID NO:1 represent binding cysteine. additional Sequence that was not disclosed in U.S. applica tion Ser. No. 09/231,899. This novel region of SEQID NO:3 0081. According to the present invention, a domain or contains most of the KS domain encoded by OrfB. protein is referred to as a chain length factor (CLF) based on the following rationale. The CLF was originally described as 0.077 OrfB is a 6177 nucleotide sequence (not including characteristic of Type II (dissociated enzymes) PKS systems the stop codon) which encodes a 2059 amino acid sequence, and was hypothesized to play a role in determining the represented herein as SEQ ID NO:4. Within OrfB are four number of elongation cycles, and hence the chain length, of domains: (a) one B-keto acyl-ACP synthase (KS) domain; the end product. CLF amino acid Sequences show homology (b) one chain length factor (CLF) domain; (c) one acyl to KS domains (and are thought to form heterodimers with transferase (AT) domain; and, (d) one enoyl ACP-reductase a KS protein), but they lack the active site cysteine. CLF's (ER) domain. role in PKS systems is currently controversial. New evi 0078. The nucleotide sequence for OrfB has been depos dence (C. Bisang et al., Nature 401, 502 (1999)) suggests a ited with GenBank as Accession No. AF378328 (amino acid role in priming (providing the initial acyl group to be sequence Accession No. AAK728880). OrfB was compared elongated) the PKS systems. In this role the CLF domain is with known Sequences in a Standard BLAST Search as thought to decarboxylate malonate (as malonyl-ACP), thus described above. At the nucleic acid level, OrfB has no forming an acetate group that can be transferred to the KS Significant homology to any known nucleotide Sequence. At active site. This acetate therefore acts as the priming the amino acid level, the Sequences with the greatest degree molecule that can undergo the initial elongation (condensa of homology to ORFB were: Shewanella sp. hypothetical tion) reaction. Homologues of the Type II CLF have been protein (Accession No. U73935), which was 53% identical identified as 'loading domains in some modular PKS sys to ORFB over 458 amino acid residues; Moritella marinus tems. A domain with the sequence features of the CLF is (Vibrio marinus) ORF11 (Accession No. AB025342), which found in all currently identified PUFA PKS systems and in was 53% identical to ORFB over 460 amino acid residues; each case is found as part of a multidomain protein. Photobacterium profundum omega-3 polyunsaturated fatty 0082) The third domain in OrfB is an AT domain, also acid synthase PfalD (Accession No. AF409100), which was referred to herein as ORFB-AT. This domain is contained 52% identical to ORFB over 457 amino acid residues; and within the nucleotide Sequence Spanning from a starting Nostoc sp. 7120 hypothetical protein (Accession No. point of between about positions 2701 and 3598 of SEQ ID NC 003272), which was 53% identical to ORFB over 430 NO:3 (OrfB) to an ending point of between about positions amino acid residues. 3975 and 4200 of SEQ ID NO:3. The nucleotide sequence 007.9 The first domain in OrfB is a KS domain, also containing the Sequence encoding the ORFB-AT domain is referred to herein as ORFB-KS. This domain is contained represented herein as SEQ ID NO:23 (positions 2701-4200 US 2002/0194641 A1 Dec. 19, 2002

of SEQ ID NO:3). The amino acid sequence containing the SEQ ID NO:5 (i.e., the entire open reading frame sequence, AT domain spans from a starting point of between about not including the Stop codon) correspond to nucleotides positions 901 and 1200 of SEQ ID NO:4 (ORFB) to an 145-4653of the sequence denoted as SEQID NO:76 in U.S. ending point of between about positions 1325 and 1400 of application Ser. No. 09/231,899 (The cDNA sequence in SEQ ID NO:4. The amino acid sequence containing the U.S. application Ser. No. 09/231,899 contains about 144 ORFB-AT domain is represented herein as SEQ ID NO:24 nucleotides upstream of the start codon for Orfo and about (positions 901-1400 of SEQ ID NO:4). It is noted that the 110 nucleotides beyond the Stop codon, including a polyA ORFB-AT domain contains an active site motif of GXS*XG tail). OrfQ is a 4509 nucleotide sequence (not including the (acyl binding site So) that is characteristic of acyltrans Stop codon) which encodes a 1503 amino acid sequence, ferse (AT) proteins. represented herein as SEQ ID NO:6. Within OrfQ are three 0.083. An “acyltransferase” or “AT” refers to a general domains: (a) two FabA-like B-hydroxyacyl-ACP dehydrase class of enzymes that can carry out a number of distinct acyl (DH) domains; and (b) one enoyl ACP-reductase (ER) transfer reactions. The Schizochytrium domain shows good domain. homology to a domain present in all of the other PUFA PKS 0087. The nucleotide sequence for Orfo has been depos Systems currently examined and very weak homology to ited with GenBank as Accession No. AF378329 (amino acid Some acyltransferases whose specific functions have been sequence Accession No. AAK728881). OrfQ was compared identified (e.g. to malonyl-CoA:ACP acyltransferase, MAT). with known Sequences in a Standard BLAST Search as In spite of the weak homology to MAT, this AT domain is not described above. At the nucleic acid level, Orfo has no believed to function as a MAT because it does not possess Significant homology to any known nucleotide Sequence. At an extended motif Structure characteristic of Such enzymes the amino acid level (Blastp), the Sequences with the greatest (see MAT domain description, above). For the purposes of degree of homology to ORFC were: Moritella marinus this disclosure, the functions of the AT domain in a PUFA (Vibrio marinus) ORF11 (Accession No. ABO25342), PKS system include, but are not limited to: transfer of the which is 45% identical to ORFC over 514 amino acid fatty acyl group from the ORFAACP domain(s) to water (i.e. residues, Shewanella sp. hypothetical protein 8 (Accession a thioesterase-releasing the fatty acyl group as a free fatty No. U73935), which is 49% identical to ORFC over 447 acid), transfer of a fatty acyl group to an acceptor Such as amino acid residues, Nostoc sp. hypothetical protein (Acces CoA, transfer of the acyl group among the various ACP sion No. NC 003272), which is 49% identical to ORFC domains, or transfer of the fatty acyl group to a lipophilic over 430 amino acid residues, and Shewanella sp. hypo acceptor molecule (e.g. to lysophosphadic acid). thetical protein 7 (Accession No. U73935), which is 37% 0084. The fourth domain in OrfB is an ER domain, also identical to ORFC over 930 amino acid residues. referred to herein as ORFB-ER. This domain is contained within the nucleotide Sequence Spanning from a starting 0088. The first domain in Orfo is a DH domain, also point of about position 4648 of SEQ ID NO:3 (OrfB) to an referred to herein as ORFC-DH1. This is one of two DH ending point of about position 6177 of SEQ ID NO:3. The domains in Orfo, and therefore is designated DH1. This nucleotide Sequence containing the Sequence encoding the domain is contained within the nucleotide Sequence Span ORFB-ER domain is represented herein as SEQ ID NO:25 ning from a starting point of between about positions 1 and (positions 4648-6177 of SEQ ID NO:3). The amino acid 778 of SEQ ID NO:5 (OrfQ) to an ending point of between Sequence containing the ER domain spans from a starting about positions 1233 and 1350 of SEQ ID NO:5. The point of about position 1550 of SEQ ID NO:4 (ORFB) to an nucleotide Sequence containing the Sequence encoding the ending point of about position 2059 of SEQ ID NO:4. The ORFC-DH1 domain is represented herein as SEQID NO:27 amino acid Sequence containing the ORFB-ER domain is (positions 1-1350 of SEQ ID NO:5). The amino acid represented herein as SEQ ID NO:26 (positions 1550-2059 Sequence containing the DH1 domain spans from a starting point of between about positions 1 and 260 of SEQ ID NO:6 of SEQ ID NO:4). (ORFC) to an ending point of between about positions 411 0085. According to the present invention, this domain has and 450 of SEQ ID NO:6. The amino acid sequence con enoyl reductase (ER) biological activity. The ER enzyme taining the ORFC-DH1 domain is represented herein as SEQ reduces the trans-double bond (introduced by the DH activ ity) in the fatty acyl-ACP, resulting in fully Saturating those ID NO:28 (positions 1-450 of SEQ ID NO:6). carbons. The ER domain in the PUFA-PKS shows homology 0089. The characteristics of both the DH domains (see to a newly characterized family of ER enzymes (Heath et al., below for DH 2) in the PUFA PKS systems have been Nature 406,145 (2000)). Heath and Rock identified this new described in the preceding Sections. This class of enzyme class of ER enzymes by cloning a gene of interest from removes HOH from a B-keto acyl-ACP and leaves a trans StreptococcuS pneumoniae, purifying a protein expressed double bond in the carbon chain. The DH domains of the from that gene, and showing that it had ER activity in an in PUFA PKS systems show homology to bacterial DH vitro assay. The sequence of the Schizochytrium ER domain enzymes associated with their FAS Systems (rather than to of OrfB shows homology to the S. pneumoniae ER protein. the DH domains of other PKS systems). A subset of bacterial All of the PUFAPKS systems currently examined contain at DH's, the Fab A-like DH's, possesses cis-trans isomerase least one domain with very high Sequence homology to the activity (Heath et al., J. Biol. Chem., 271, 27795 (1996)). It Schizochytrium ER domain. The Schizochytrium PUFA is the homologies to the Fab A-like DH's that indicate that PKS system contains two ER domains (one on Or fiB and one or both of the DH domains is responsible for insertion one on OrfQ). of the cis double bonds in the PUFA PKS products. Open Reading Frame C (OrfC) 0090 The second domain in Orfo is a DH domain, also 0.086 The complete nucleotide sequence for Orfo is referred to herein as ORFC-DH2. This is the second of two represented herein as SEQ ID NO:5. Nucleotides 1-4509 of DH domains in Orfo, and therefore is designated DH2. This US 2002/0194641 A1 Dec. 19, 2002

domain is contained within the nucleotide Sequence Span 0093. According to the present invention, an amino acid ning from a starting point of between about positions 1351 Sequence that has a biological activity of at least one domain and 2437 of SEQ ID NO:5 (OrfC) to an ending point of of a PUFA PKS system is an amino acid sequence that has between about positions 2607 and 2850 of SEQ ID NO:5. the biological activity of at least one domain of the PUFA The nucleotide Sequence containing the Sequence encoding PKS system described in detail herein, as exemplified by the the ORFC-DH2 domain is represented herein as SEQ ID Schizochytrium PUFAPKS system. The biological activities NO:29 (positions 1351-2850 of SEQ ID NO:5). The amino of the various domains within the Schizochytrium PUFA acid Sequence containing the DH2 domain spans from a PKS system have been described in detail above. Therefore, starting point of between about positions 451 and 813 of an isolated nucleic acid molecule of the present invention can encode the translation product of any PUFA PKS open SEQ ID NO:6 (ORFC) to an ending point of between about reading frame, PUFA PKS domain, biologically active frag positions 869 and 950 of SEQ ID NO:6. The amino acid ment thereof, or any homologue of a naturally occurring sequence containing the ORFC-DH2 domain is represented PUFA PKS open reading frame or domain which has bio herein as SEQ ID NO:30 (positions 451-950 of SEQ ID logical activity. A homologue of given protein or domain is NO:6). DH biological activity has been described above. a protein or polypeptide that has an amino acid Sequence 0091. The third domain in Orfo is an ER domain, also which differs from the naturally occurring reference amino referred to herein as ORFC-ER. This domain is contained acid sequence (i.e., of the reference protein or domain) in that at least one or a few, but not limited to one or a few, within the nucleotide Sequence Spanning from a starting amino acids have been deleted (e.g., a truncated version of point of about position 2998 of SEQ ID NO:5 (OrfQ) to an the protein, Such as a peptide or fragment), inserted, ending point of about position 4509 of SEQ ID NO:5. The inverted, Substituted and/or derivatized (e.g., by glycosyla nucleotide Sequence containing the Sequence encoding the tion, phosphorylation, acetylation, myristoylation, prenyla ORFC-ER domain is represented herein as SEQ ID NO:31 tion, palmitation, amidation and/or addition of glyco (positions 2998-4509 of SEQ ID NO:5). The amino acid Sylphosphatidyl inositol). Preferred homologues of a PUFA Sequence containing the ER domain spans from a starting PKS protein or domain are described in detail below. It is point of about position 1000 of SEQ ID NO:6 (ORFC) to an noted that homologues can include Synthetically produced ending point of about position 1502 of SEQ ID NO:6. The homologues, naturally occurring allelic variants of a given amino acid Sequence containing the ORFC-ER domain is protein or domain, or homologous Sequences from organ represented herein as SEQ ID NO:32 (positions 1000-1502 isms other than the organism from which the reference of SEQID NO:6). ER biological activity has been described Sequence was derived. above. 0094. In general, the biological activity or biological 0092. One embodiment of the present invention relates to action of a protein or domain refers to any function(s) an isolated nucleic acid molecule comprising a nucleic acid exhibited or performed by the protein or domain that is sequence from a non-bacterial PUFA PKS system, a homo ascribed to the naturally occurring form of the protein or logue thereof, a fragment thereof, and/or a nucleic acid domain as measured or observed in Vivo (i.e., in the natural Sequence that is complementary to any of Such nucleic acid physiological environment of the protein) or in vitro (i.e., Sequences. In one aspect, the present invention relates to an under laboratory conditions). Biological activities of PUFA isolated nucleic acid molecule comprising anucleic acid PKS systems and the individual proteins/domains that make Sequence Selected from the group consisting of: (a) a nucleic up a PUFA PKS system have been described in detail acid Sequence encoding an amino acid Sequence Selected elsewhere herein. Modifications of a protein or domain, Such from the group consisting of: SEQ ID NO:2, SEQID NO:4, as in a homologue or mimetic (discussed below), may result SEQID NO:6, and biologically active fragments thereof; (b) in proteins or domains having the same biological activity as a nucleic acid Sequence encoding an amino acid Sequence the naturally occurring protein or domain, or in proteins or selected from the group consisting of: SEQ ID NO:8, SEQ domains having decreased or increased biological activity as ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, compared to the naturally occurring protein or domain. SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID Modifications which result in a decrease in expression or a NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically decrease in the activity of the protein or domain, can be active fragments thereof; (c) a nucleic acid sequence encod referred to as inactivation (complete or partial), down ing an amino acid Sequence that is at least about 60% regulation, or decreased action of a protein or domain. identical to at least 500 consecutive amino acids of Said Similarly, modifications which result in an increase in amino acid Sequence of (a), wherein Said amino acid expression or an increase in the activity of the protein or Sequence has a biological activity of at least one domain of domain, can be referred to as amplification, overproduction, a polyunsaturated fatty acid (PUFA) polyketide synthase activation, enhancement, up-regulation or increased action (PKS) system; (d) a nucleic acid sequence encoding an of a protein or domain. A functional domain of a PUFAPKS amino acid Sequence that is at least about 60% identical to System is a domain (i.e., a domain can be a portion of a said amino acid sequence of (b), wherein said amino acid protein) that is capable of performing a biological function Sequence has a biological activity of at least one domain of (i.e., has biological activity). a polyunsaturated fatty acid (PUFA) polyketide synthase 0095. In accordance with the present invention, an iso (PKS) system; or (e) a nucleic acid sequence that is fully lated nucleic acid molecule is a nucleic acid molecule that complementary to the nucleic acid sequence of (a), (b), (c), has been removed from its natural milieu (i.e., that has been or (d). In a further embodiment, nucleic acid Sequences Subject to human manipulation), its natural milieu being the including a Sequence encoding the active site domains or genome or chromosome in which the nucleic acid molecule other functional motifs described above for several of the is found in nature. AS Such, "isolated” does not necessarily PUFA PKS domains are encompassed by the invention. reflect the extent to which the nucleic acid molecule has US 2002/0194641 A1 Dec. 19, 2002

been purified, but indicates that the molecule does not dependent on nucleic acid composition and percent homol include an entire genome or an entire chromosome in which ogy or identity between the nucleic acid molecule and the nucleic acid molecule is found in nature. An isolated complementary Sequence as well as upon hybridization nucleic acid molecule can include a gene. An isolated conditions per Se (e.g., temperature, Salt concentration, and nucleic acid molecule that includes a gene is not a fragment formamide concentration). The minimal size of a nucleic of a chromosome that includes Such gene, but rather includes acid molecule that is used as an oligonucleotide primer or as the coding region and regulatory regions associated with the a probe is typically at least about 12 to about 15 nucleotides gene, but no additional genes naturally found on the same in length if the nucleic acid molecules are GC-rich and at chromosome. An isolated nucleic acid molecule can also least about 15 to about 18 bases in length if they are AT-rich. include a specified nucleic acid Sequence flanked by (i.e., at There is no limit, other than a practical limit, on the maximal the 5' and/or the 3' end of the sequence) additional nucleic Size of a nucleic acid molecule of the present invention, in acids that do not normally flank the Specified nucleic acid that the nucleic acid molecule can include a sequence Sequence in nature (i.e., heterologous sequences). Isolated Sufficient to encode a biologically active fragment of a nucleic acid molecule can include DNA, RNA (e.g., domain of a PUFAPKS system, an entire domain of a PUFA mRNA), or derivatives of either DNA or RNA (e.g., cDNA). PKS System, Several domains within an open reading frame Although the phrase “nucleic acid molecule' primarily (Orf) of a PUFA PKS system, an entire Orf of a PUFAPKS refers to the physical nucleic acid molecule and the phrase system, or more than one Orf of a PUFA PKS system. "nucleic acid Sequence' primarily refers to the Sequence of 0099. In one embodiment of the present invention, an nucleotides on the nucleic acid molecule, the two phrases isolated nucleic acid molecule comprises or consists essen can be used interchangeably, especially with respect to a tially of a nucleic acid Sequence Selected from the group of: nucleic acid molecule, or a nucleic acid Sequence, being SEQ ID NO:2, SEQID NO:4, SEQ ID NO:6, SEQ ID NO:8, capable of encoding a protein or domain of a protein. SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, SEQ ID 0.096 Preferably, an isolated nucleic acid molecule of the NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, present invention is produced using recombinant DNA tech SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, or bio nology (e.g., polymerase chain reaction (PCR) amplifica logically active fragments thereof. In one aspect, the nucleic tion, cloning) or chemical Synthesis. Isolated nucleic acid acid sequence is selected from the group of: SEQ ID NO:1, molecules include natural nucleic acid molecules and homo SEQID NO:3, SEQID NO:5, SEQ ID NO:7, SEQ ID NO:9, logues thereof, including, but not limited to, natural allelic SEQ ID NO:12, SEQ ID NO:17, SEQ ID NO:19, SEQ ID variants and modified nucleic acid molecules in which NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, nucleotides have been inserted, deleted, Substituted, and/or SEQ ID NO:29, and SEQ ID NO:31. In one embodiment of inverted in Such a manner that Such modifications provide the present invention, any of the above-described PUFA the desired effect on PUFA PKS system biological activity PKS amino acid Sequences, as well as homologues of Such as described herein. Protein homologues (e.g., proteins Sequences, can be produced with from at least one, and up encoded by nucleic acid homologues) have been discussed to about 20, additional heterologous amino acids flanking in detail above. each of the C- and/or N-terminal end of the given amino acid Sequence. The resulting protein or polypeptide can be 0097. A nucleic acid molecule homologue can be pro referred to as "consisting essentially of a given amino acid duced using a number of methods known to those skilled in Sequence. According to the present invention, the heterolo the art (see, for example, Sambrook et al., Molecular Clon gous amino acids are a Sequence of amino acids that are not ing. A Laboratory Manual, Cold Spring Harbor Labs Press, naturally found (i.e., not found in nature, in vivo) flanking 1989). For example, nucleic acid molecules can be modified the given amino acid Sequence or which would not be using a variety of techniques including, but not limited to, encoded by the nucleotides that flank the naturally occurring classic mutagenesis techniques and recombinant DNA tech nucleic acid Sequence encoding the given amino acid niques, Such as site-directed mutagenesis, chemical treat Sequence as it occurs in the gene, if Such nucleotides in the ment of a nucleic acid molecule to induce mutations, restric naturally occurring Sequence were translated using Standard tion enzyme cleavage of a nucleic acid fragment, ligation of codon usage for the organism from which the given amino nucleic acid fragments, PCR amplification and/or mutagen acid Sequence is derived. Similarly, the phrase “consisting esis of Selected regions of a nucleic acid Sequence, Synthesis essentially of, when used with reference to a nucleic acid of oligonucleotide mixtures and ligation of mixture groups Sequence herein, refers to a nucleic acid Sequence encoding to “build” a mixture of nucleic acid molecules and combi a given amino acid Sequence that can be flanked by from at nations thereof. Nucleic acid molecule homologues can be least one, and up to as many as about 60, additional Selected from a mixture of modified nucleic acids by Screen heterologous nucleotides at each of the 5' and/or the 3' end ing for the function of the protein encoded by the nucleic of the nucleic acid Sequence encoding the given amino acid acid and/or by hybridization with a wild-type gene. Sequence. The heterologous nucleotides are not naturally 0098. The minimum size of a nucleic acid molecule of found (i.e., not found in nature, in vivo) flanking the nucleic the present invention is a size Sufficient to form a probe or acid Sequence encoding the given amino acid Sequence as it oligonucleotide primer that is capable of forming a stable occurs in the natural gene. hybrid (e.g., under moderate, high or very high Stringency 0100. The present invention also includes an isolated conditions) with the complementary Sequence of a nucleic nucleic acid molecule comprising a nucleic acid Sequence acid molecule useful in the present invention, or of a size encoding an amino acid Sequence having a biological activ Sufficient to encode an amino acid Sequence having a bio ity of at least one domain of a PUFA PKS system. In one logical activity of at least one domain of a PUFA PKS aspect, Such a nucleic acid Sequence encodes a homologue System according to the present invention. AS Such, the size of any of the Schizochytrium PUFA PKS ORFs or domains, of the nucleic acid molecule encoding Such a protein can be including: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, US 2002/0194641 A1 Dec. 19, 2002

SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID wherein the amino acid Sequence has a biological activity of NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, at least one domain of a PUFA PKS system. SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQ ID 0103) In one aspect of the invention, a homologue of a NO:32, wherein the homologue has a biological activity of Schizochytrium PUFAPKS protein or domain encompassed at least one domain of a PUFA PKS system as described by the present invention comprises an amino acid Sequence previously herein. that is at least about 60% identical to an amino acid Sequence 0101. In one aspect of the invention, a homologue of a chosen from: SEQ ID NO:8, SEQ ID NO:10, SEQ ID Schizochytrium PUFAPKS protein or domain encompassed NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, by the present invention comprises an amino acid Sequence SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID that is at least about 60% identical to at least 500 consecutive NO:30, or SEQ ID NO:32, wherein said amino acid amino acids of an amino acid Sequence chosen from: SEQ Sequence has a biological activity of at least one domain of ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein said a PUFA PKS system. In a further aspect, the amino acid amino acid Sequence has a biological activity of at least one Sequence of the homologue is at least about 65% identical, domain of a PUFA PKS system. In a further aspect, the and more preferably at least about 70% identical, and more amino acid Sequence of the homologue is at least about 60% preferably at least about 75% identical, and more preferably identical to at least about 600 consecutive amino acids, and at least about 80% identical, and more preferably at least more preferably to at least about 700 consecutive amino about 85% identical, and more preferably at least about 90% acids, and more preferably to at least about 800 consecutive identical, and more preferably at least about 95% identical, amino acids, and more preferably to at least about 900 and more preferably at least about 96% identical, and more consecutive amino acids, and more preferably to at least preferably at least about 97% identical, and more preferably about 1000 consecutive amino acids, and more preferably to at least about 98% identical, and more preferably at least at least about 1100 consecutive amino acids, and more about 99% identical to an amino acid Sequence chosen from: preferably to at least about 1200 consecutive amino acids, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID and more preferably to at least about 1300 consecutive NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, amino acids, and more preferably to at least about 1400 SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID consecutive amino acids, and more preferably to at least NO:32, wherein the amino acid Sequence has a biological about 1500 consecutive amino acids of any of SEQ ID activity of at least one domain of a PUFA PKS system. NO:2, SEQID NO:4 and SEQID NO:6, or to the full length of SEQ ID NO:6. In a further aspect, the amino acid 0104. According to the present invention, the term “con Sequence of the homologue is at least about 60% identical to tiguous” or “consecutive' , with regard to nucleic acid or at least about 1600 consecutive amino acids, and more amino acid sequences described herein, means to be con preferably to at least about 1700 consecutive amino acids, nected in an unbroken Sequence. For example, for a first and more preferably to at least about 1800 consecutive Sequence to comprise 30 contiguous (or consecutive) amino amino acids, and more preferably to at least about 1900 acids of a Second Sequence, means that the first Sequence consecutive amino acids, and more preferably to at least includes an unbroken Sequence of 30 amino acid residues about 2000 consecutive amino acids of any of SEQID NO:2 that is 100% identical to an unbroken sequence of 30 amino or SEQ ID NO:4, or to the full length of SEQ ID NO:4. In acid residues in the Second Sequence. Similarly, for a first a further aspect, the amino acid Sequence of the homologue sequence to have “100% identity” with a second sequence is at least about 60% identical to at least about 2100 means that the first Sequence exactly matches the Second consecutive amino acids, and more preferably to at least Sequence with no gaps between nucleotides or amino acids. about 2200 consecutive amino acids, and more preferably to 0105. As used herein, unless otherwise specified, refer at least about 2300 consecutive amino acids, and more ence to a percent (%) identity refers to an evaluation of preferably to at least about 2400 consecutive amino acids, homology which is performed using: (1) a BLAST 2.0 Basic and more preferably to at least about 2500 consecutive BLAST homology Search using blastp for amino acid amino acids, and more preferably to at least about 2600 Searches, blastin for nucleic acid Searches, and blastX for consecutive amino acids, and more preferably to at least nucleic acid Searches and Searches of translated amino acids about 2700 consecutive amino acids, and more preferably to in all 6 open reading frames, all with Standard default at least about 2800 consecutive amino acids, and even more parameters, wherein the query Sequence is filtered for low preferably, to the full length of SEQ ID NO:2. complexity regions by default (described in Altschul, S. F., 0102) In another aspect, a homologue of a Madden, T. L., Schaaffer, A. A., Zhang, J., Zhang, Z., Miller, Schizochytrium PUFAPKS protein or domain encompassed W. & Lipman, D. J. (1997) “Gapped BLAST and PSI by the present invention comprises an amino acid Sequence BLAST: a new generation of protein database Search pro that is at least about 65% identical, and more preferably at grams.” Nucleic Acids Res. 25:3389-3402, incorporated least about 70% identical, and more preferably at least about herein by reference in its entirety); (2) a BLAST 2 alignment 75% identical, and more preferably at least about 80% (using the parameters described below); (3) and/or PSI identical, and more preferably at least about 85% identical, BLAST with the standard default parameters (Position and more preferably at least about 90% identical, and more Specific Iterated BLAST). It is noted that due to some preferably at least about 95% identical, and more preferably differences in the standard parameters between BLAST 2.0 at least about 96% identical, and more preferably at least Basic BLAST and BLAST 2, two specific sequences might about 97% identical, and more preferably at least about 98% be recognized as having Significant homology using the identical, and more preferably at least about 99% identical BLAST 2 program, whereas a search performed in BLAST to an amino acid sequence chosen from: SEQID NO:2, SEQ 2.0 Basic BLAST using one of the Sequences as the query ID NO:4, or SEQ ID NO:6, over any of the consecutive Sequence may not identify the Second Sequence in the top amino acid lengths described in the paragraph above, matches. In addition, PSI-BLAST provides an automated, US 2002/0194641 A1 Dec. 19, 2002 easy-to-use version of a “profile' Search, which is a Sensitive 0116. As used herein, hybridization conditions refer to way to look for Sequence homologues. The program first Standard hybridization conditions under which nucleic acid performs a gapped BLAST database search. The PSI molecules are used to identify similar nucleic acid mol BLAST program uses the information from any significant ecules. Such Standard conditions are disclosed, for example, alignments returned to construct a position-specific Score in Sambrook et al., Molecular Cloning. A Laboratory matrix, which replaces the query Sequence for the next round Manual, Cold Spring Harbor Labs Press, 1989. Sambrook et of database Searching. Therefore, it is to be understood that al., ibid., is incorporated by reference herein in its entirety percent identity can be determined by using any one of these (see Specifically, pages 9.31-9.62). In addition, formulae to programs. calculate the appropriate hybridization and wash conditions 0106 Two specific sequences can be aligned to one to achieve hybridization permitting varying degrees of mis another using BLAST 2 Sequence as described in Tatusova match of nucleotides are disclosed, for example, in and Madden, (1999), “Blast 2 sequences-a new tool for Meinkoth et al., 1984, Anal. Biochem. 138, 267-284; comparing protein and nucleotide Sequences”, FEMS Micro Meinkoth et al., ibid., is incorporated by reference herein in biol Lett. 174:247-250, incorporated herein by reference in its entirety. its entirety. BLAST 2 Sequence alignment is performed in 0117 More particularly, moderate stringency hybridiza blastp or blastn using the BLAST 2.0 algorithm to perform tion and washing conditions, as referred to herein, refer to a Gapped BLAST search (BLAST 2.0) between the two conditions which permit isolation of nucleic acid molecules Sequences allowing for the introduction of gaps (deletions having at least about 70% nucleic acid Sequence identity and insertions) in the resulting alignment. For purposes of with the nucleic acid molecule being used to probe in the clarity herein, a BLAST 2 Sequence alignment is performed hybridization reaction (i.e., conditions permitting about 30% using the Standard default parameters as follows. or less mismatch of nucleotides). High Stringency hybrid ization and washing conditions, as referred to herein, refer to 0107 For blastin, using 0 BLOSUM62 matrix: conditions which permit isolation of nucleic acid molecules 0108 Reward for match=1 having at least about 80% nucleic acid Sequence identity with the nucleic acid molecule being used to probe in the 0109 Penalty for mismatch=-2 hybridization reaction (i.e., conditions permitting about 20% 0110 Open gap (5) and extension gap (2) penalties or less mismatch of nucleotides). Very high Stringency hybridization and washing conditions, as referred to herein, 0111 gap x dropoff (50) expect (10) word size (11) refer to conditions which permit isolation of nucleic acid filter (on) molecules having at least about 90% nucleic acid sequence 0112 For blastp, using 0 BLOSUM62 matrix: identity with the nucleic acid molecule being used to probe in the hybridization reaction (i.e., conditions permitting 0113 Open gap (11) and extension gap (1) penalties about 10% or less mismatch of nucleotides). As discussed above, one of skill in the art can use the formulae in 0114 gap x dropoff (50) expect (10) word size (3) Meinkoth et al., ibid. to calculate the appropriate hybridiza filter (on). tion and wash conditions to achieve these particular levels of 0115) In another embodiment of the invention, an amino nucleotide mismatch. Such conditions will vary, depending acid Sequence having the biological activity of at least one on whether DNA:RNA or DNA:DNA hybrids are being domain of a PUFA PKS system of the present invention formed. Calculated melting temperatures for DNA:DNA includes an amino acid Sequence that is Sufficiently similar hybrids are 10° C. less than for DNA:RNA hybrids. In to a naturally occurring PUFA PKS protein or polypeptide particular embodiments, Stringent hybridization conditions that a nucleic acid Sequence encoding the amino acid for DNA:DNA hybrids include hybridization at an ionic Sequence is capable of hybridizing under moderate, high, or strength of 6xSSC (0.9 M. Na") at a temperature of between very high Stringency conditions (described below) to (i.e., about 20° C. and about 35° C. (lower stringency), more with) a nucleic acid molecule encoding the naturally occur preferably, between about 28° C. and about 40° C. (more ring PUFA PKS protein or polypeptide (i.e., to the comple stringent), and even more preferably, between about 35 C. ment of the nucleic acid Strand encoding the naturally and about 45 C. (even more Stringent), with appropriate occurring PUFAPKS protein or polypeptide). Preferably, an wash conditions. In particular embodiments, Stringent amino acid Sequence having the biological activity of at least hybridization conditions for DNA:RNA hybrids include one domain of a PUFAPKS system of the present invention hybridization at an ionic strength of 6xSSC (0.9 M. Na") at is encoded by a nucleic acid Sequence that hybridizes under a temperature of between about 30° C. and about 45 C., moderate, high or very high Stringency conditions to the more preferably, between about 38 C. and about 50 C., and complement of a nucleic acid Sequence that encodes a even more preferably, between about 45 C. and about 55 protein comprising an amino acid Sequence represented by C., with Similarly Stringent wash conditions. These values any of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ are based on calculations of a melting temperature for ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, molecules larger than about 100 nucleotides, 0% formamide SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID and a G+C content of about 40%. Alternatively, T can be NO:26, SEQ ID NO:28, SEQ ID NO:30, or SEQID NO:32. calculated empirically as Set forth in Sambrook et al., Supra, Methods to deduce a complementary Sequence are known to pages 9.31 to 9.62. In general, the wash conditions should be those skilled in the art. It should be noted that since amino as Stringent as possible, and should be appropriate for the acid Sequencing and nucleic acid Sequencing technologies chosen hybridization conditions. For example, hybridization are not entirely error-free, the Sequences presented herein, at conditions can include a combination of Salt and temperature best, represent apparent sequences of PUFA PKS domains conditions that are approximately 20-25 C. below the and proteins of the present invention. calculated T of a particular hybrid, and wash conditions US 2002/0194641 A1 Dec. 19, 2002 typically include a combination of Salt and temperature 0120 In another embodiment, a recombinant vector used conditions that are approximately 12-20° C. below the in a recombinant nucleic acid molecule of the present calculated T of the particular hybrid. One example of invention is a targeting vector. AS used herein, the phrase hybridization conditions suitable for use with DNA:DNA “targeting vector” is used to refer to a vector that is used to hybrids includes a 2-24 hour hybridization in 6xSSC (50% deliver a particular nucleic acid molecule into a recombinant formamide) at about 42°C., followed by washing steps that host cell, wherein the nucleic acid molecule is used to delete include one or more washes at room temperature in about or inactivate an endogenous gene within the host cell or 2xSSC, followed by additional washes at higher tempera microorganism (i.e., used for targeted gene disruption or tures and lower ionic strength (e.g., at least one wash as knock-out technology). Such a vector may also be known in about 37° C. in about 0.1x-0.5xSSC, followed by at least the art as a “knock-Out' Vector. In one aspect of this one wash at about 68° C. in about 0.1x-0.5xSSC). embodiment, a portion of the vector, but more typically, the 0118. Another embodiment of the present invention nucleic acid molecule inserted into the vector (i.e., the includes a recombinant nucleic acid molecule comprising a insert), has a nucleic acid sequence that is homologous to a recombinant vector and a nucleic acid molecule comprising nucleic acid Sequence of a target gene in the host cell (i.e., a nucleic acid Sequence encoding an amino acid Sequence a gene which is targeted to be deleted or inactivated). The having a biological activity of at least one domain of a PUFA nucleic acid Sequence of the vector insert is designed to bind PKS system as described herein. Such nucleic acid to the target gene Such that the target gene and the insert Sequences are described in detail above. According to the undergo homologous recombination, whereby the endog present invention, a recombinant vector is an engineered enous target gene is deleted, inactivated or attenuated (i.e., (i.e., artificially produced) nucleic acid molecule that is used by at least a portion of the endogenous target gene being as a tool for manipulating a nucleic acid Sequence of choice mutated or deleted). and for introducing Such a nucleic acid Sequence into a host 0121 Typically, a recombinant nucleic acid molecule cell. The recombinant vector is therefore Suitable for use in includes at least one nucleic acid molecule of the present cloning, Sequencing, and/or otherwise manipulating the invention operatively linked to one or more transcription nucleic acid Sequence of choice, Such as by expressing control Sequences. AS used herein, the phrase “recombinant and/or delivering the nucleic acid Sequence of choice into a molecule' or “recombinant nucleic acid molecule' primarily host cell to form a recombinant cell. Such a vector typically refers to a nucleic acid molecule or nucleic acid Sequence contains heterologous nucleic acid Sequences, that is nucleic operatively linked to a transcription control Sequence, but acid Sequences that are not naturally found adjacent to can be used interchangeably with the phrase “nucleic acid nucleic acid Sequence to be cloned or delivered, although the molecule', when Such nucleic acid molecule is a recombi vector can also contain regulatory nucleic acid Sequences nant molecule as discussed herein. According to the present (e.g., promoters, untranslated regions) which are naturally invention, the phrase “operatively linked’ refers to linking a found adjacent to nucleic acid molecules of the present nucleic acid molecule to a transcription control Sequence in invention or which are useful for expression of the nucleic a manner Such that the molecule is able to be expressed acid molecules of the present invention (discussed in detail when transfected (i.e., transformed, transduced, transfected, below). The vector can be either RNA or DNA, either conjugated or conduced) into a host cell. Transcription prokaryotic or eukaryotic, and typically is a plasmid. The control Sequences are Sequences which control the initiation, vector can be maintained as an extrachromosomal element elongation, or termination of transcription. Particularly (e.g., a plasmid) or it can be integrated into the chromosome important transcription control Sequences are those which of a recombinant organism (e.g., a microbe or a plant). The control transcription initiation, Such as promoter, enhancer, entire vector can remain in place within a host cell, or under operator and repressor Sequences. Suitable transcription certain conditions, the plasmid DNA can be deleted, leaving control Sequences include any transcription control behind the nucleic acid molecule of the present invention. Sequence that can function in a host cell or organism into The integrated nucleic acid molecule can be under chromo which the recombinant nucleic acid molecule is to be Somal promoter control, under native or plasmid promoter introduced. control, or under a combination of Several promoter con 0.122 Recombinant nucleic acid molecules of the present trols. Single or multiple copies of the nucleic acid molecule invention can also contain additional regulatory Sequences, can be integrated into the chromosome. A recombinant Such as translation regulatory Sequences, origins of replica vector of the present invention can contain at least one tion, and other regulatory Sequences that are compatible Selectable marker. with the recombinant cell. In one embodiment, a recombi 0119). In one embodiment, a recombinant vector used in nant molecule of the present invention, including those a recombinant nucleic acid molecule of the present invention which are integrated into the host cell chromosome, also is an expression vector. AS used herein, the phrase “expres contains Secretory signals (i.e., signal Segment nucleic acid Sion vector' is used to refer to a vector that is Suitable for Sequences) to enable an expressed protein to be secreted production of an encoded product (e.g., a protein of interest). from the cell that produces the protein. Suitable Signal In this embodiment, a nucleic acid Sequence encoding the Segments include a Signal Segment that is naturally associ product to be produced (e.g., a PUFA PKS domain) is ated with the protein to be expressed or any heterologous inserted into the recombinant vector to produce a recombi Signal Segment capable of directing the Secretion of the nant nucleic acid molecule. The nucleic acid Sequence protein according to the present invention. In another encoding the protein to be produced is inserted into the embodiment, a recombinant molecule of the present inven vector in a manner that operatively links the nucleic acid tion comprises a leader Sequence to enable an expressed Sequence to regulatory Sequences in the vector which enable protein to be delivered to and inserted into the membrane of the transcription and translation of the nucleic acid Sequence a host cell. Suitable leader Sequences include a leader within the recombinant host cell. Sequence that is naturally associated with the protein, or any US 2002/0194641 A1 Dec. 19, 2002 heterologous leader Sequence capable of directing the deliv enous nucleic acid molecule (i.e., a recombinant nucleic acid ery and insertion of the protein to the membrane of a cell. molecule) can be inserted into a cell. The term “transfor 0123 The present inventors have found that the mation' can be used interchangeably with the term “trans Schizochytrium PUFAPKS Orfs A and B are closely linked fection' when Such term is used to refer to the introduction in the genome and region between the Orfs has been of nucleic acid molecules into microbial cells, Such as algae, Sequenced. The Orfs are oriented in opposite directions and bacteria and yeast. In microbial Systems, the term “trans 4244 base pairs separate the start (ATG) codons (i.e. they are formation' is used to describe an inherited change due to the arranged as follows: 3'OrfA5'-4244 bp-5'OrfB3'). Examina acquisition of exogenous nucleic acids by the microorgan tion of the 4244 bp intergenic region did not reveal any ism and is essentially Synonymous with the term “transfec obvious Orfs (no significant matches were found on a tion.” However, in animal cells, transformation has acquired BlastX search). Both Orfs A and B are highly expressed in a Second meaning which can refer to changes in the growth Schizochytrium, at least during the time of oil production, properties of cells in culture after they become cancerous, implying that active promoter elements are embedded in this for example. Therefore, to avoid confusion, the term “trans intergenic region. These genetic elements are believed to fection” is preferably used with regard to the introduction of have utility as a bi-directional promoter Sequence for trans exogenous nucleic acids into animal cells, and the term genic applications. For example, in a preferred embodiment, “transfection' will be used herein to generally encompass one could clone this region, place any genes of interest at transfection of animal cells, is plant cells and transformation each end and introduce the construct into Schizochytrium of microbial cells, to the extent that the terms pertain to the (or Some other host in which the promoters can be shown to introduction of exogenous nucleic acids into a cell. There function). It is predicted that the regulatory elements, under fore, transfection techniques include, but are not limited to, the appropriate conditions, would provide for coordinated, transformation, particle bombardment, electroporation, high level expression of the two introduced genes. The microinjection, lipofection, adsorption, infection and proto complete nucleotide Sequence for the regulatory region plast fusion. containing Schizochytrium PUFA PKS regulatory elements 0128. It will be appreciated by one skilled in the art that (e.g., a promoter) is represented herein as SEQ ID NO:36. use of recombinant DNA technologies can improve control 0.124. In a similar manner, Orfo is highly expressed in of expression of transfected nucleic acid molecules by Schizochytrium during the time of oil production and regu manipulating, for example, the number of copies of the latory elements are expected to reside in the region upstream nucleic acid molecules within the host cell, the efficiency of its start codon. A region of genomic DNA upstream of with which those nucleic acid molecules are transcribed, the OrfC has been cloned and Sequenced and is represented efficiency with which the resultant transcripts are translated, hereinas (SEQID NO:37). This sequence contains the 3886 and the efficiency of post-translational modifications. Addi nt immediately upstream of the Orfo start codon. Exami tionally, the promoter Sequence might be genetically engi nation of this region did not reveal any obvious Orfs (i.e., no neered to improve the level of expression as compared to the Significant matches were found on a BlastX Search). It is native promoter. Recombinant techniques useful for control believed that regulatory elements contained in this region, ling the expression of nucleic acid molecules include, but are under the appropriate conditions, will provide for high-level not limited to, integration of the nucleic acid molecules into expression of a gene placed behind them. Additionally, one or more host cell chromosomes, addition of Vector under the appropriate conditions, the level of expression Stability Sequences to plasmids, Substitutions or modifica may be coordinated with genes under control of the A-B tions of transcription control signals (e.g., promoters, opera intergenic region (SEQ ID NO:36). tors, enhancers), Substitutions or modifications of transla 0.125 Therefore, in one embodiment, a recombinant tional control signals (e.g., ribosome binding sites, Shine nucleic acid molecule useful in the present invention, as Dalgarno Sequences), modification of nucleic acid disclosed herein, can include a PUFAPKS regulatory region molecules to correspond to the codon usage of the host cell, contained within SEO ID NO:36 and/or SEO ID NO:37. and deletion of Sequences that destabilize transcripts. Such a regulatory region can include any portion (fragment) 0.129 General discussion above with regard to recombi of SEO ID NO:36 and/or SEO ID NO:37 that has at least nant nucleic acid molecules and transfection of host cells is basal PUFA PKS transcriptional activity. intended to be applied to any recombinant nucleic acid 0.126 One or more recombinant molecules of the present molecule discussed herein, including those encoding any invention can be used to produce an encoded product (e.g., amino acid Sequence having a biological activity of at least a PUFA PKS domain, protein, or system) of the present one domain from a PUFA PKS, those encoding amino acid invention. In one embodiment, an encoded product is pro Sequences from other PKS Systems, and those encoding duced by expressing a nucleic acid molecule as described other proteins or domains. herein under conditions effective to produce the protein. A 0.130. This invention also relates to the use of a novel preferred method to produce an encoded protein is by method to identify a microorganism that has a PUFA PKS transfecting a host cell with one or more recombinant System that is homologous in Structure, domain organization molecules to form a recombinant cell. Suitable host cells to and/or function to a Schizochytrium PUFA PKS system. In transfect include, but are not limited to, any bacterial, ful one embodiment, the microorganism is a non-bacterial ingal (e.g., yeast), insect, plant or animal cell that can be microorganism, and preferably, the microorganism identi transfected. Host cells can be either untransfected cells or fied by this method is a eukaryotic microorganism. In cells that are already transfected with at least one other addition, this invention relates to the microorganisms iden recombinant nucleic acid molecule. tified by Such method and to the use of these microorganisms 0127. According to the present invention, the term “trans and the PUFA PKS systems from these microorganisms in fection' is used to refer to any method by which an exog the various applications for a PUFA PKS system (e.g., US 2002/0194641 A1 Dec. 19, 2002 genetically modified organisms and methods ofproducing more preferably greater than about 25 C., and even more bioactive molecules) according to the present invention. The preferably greater than 30° C. The low or anoxic culture unique Screening method described and demonstrated herein environment can be easily maintained in culture chambers enables the rapid identification of new microbial Strains capable of inducing this type of atmospheric environment in containing a PUFA PKS system homologous to the the chamber (and thus in the cultures) or by culturing the Schizochytrium PUFAPKS system of the present invention. cells in a manner that induces the low oxygen environment Applicants have used this method to discover and disclose directly in the culture flask/vessel itself. herein that a ThrauStochytrium microorganism contains a 0135) In a preferred culturing method, the microbes can PUFA PKS system that is homologous to that found in be cultured in Shake flaskS which, instead of normally Schizochytrium. This discovery is described in detail in containing a Small amount of culture medium-leSS than Example 2 below. about 50% of total capacity and usually less than about 25% 0131 Microbial organisms with a PUFA PKS system of total capacity-to keep the medium aerated as it is Shaken similar to that found in Schizochytrium, such as the Thraus on a shaker table, are instead filled to greater than about 50% tochytrium microorganism discovered by the present inven of their capacity, and more preferably greater than about tors and described in Example 2, can be readily identified/ 60%, and most preferably greater than about 75% of their isolated/screened by the following methods used Separately capacity with culture medium. High loading of the Shake or in any combination of these methods. flask with culture medium prevents it from mixing very well in the flask when it is placed on a Shaker table, preventing 0.132. In general, the method to identify a non-bacterial oxygen diffusion into the culture. Therefore as the microbes microorganism that has a polyunsaturated fatty acid (PUFA) grow, they use up the existing oxygen in the medium and polyketide synthase (PKS) system includes a first step of (a) naturally create a low or no oxygen environment in the Shake Selecting a microorganism that produces at least one PUFA, flask. and a second step of (b) identifying a microorganism from (a) that has an ability to produce increased PUFAs under 0.136. After the culture period, the cells are harvested and dissolved oxygen conditions of less than about 5% of analyzed for content of bioactive compounds of interest Saturation in the fermentation medium, as compared to (e.g., lipids), but most particularly, for compounds contain production of PUFAS by said microorganism under dis ing two or more unsaturated bonds, and more preferably Solved oxygen conditions of greater than 5% of Saturation, three or more double bonds, and even more preferably four more preferably 10% of Saturation, more preferably greater or more double bonds. For lipids, those Strains possessing than 15% of Saturation and more preferably greater than Such compounds at greater than about 5%, and more pref 20% of Saturation in the fermentation medium. A microor erably greater than about 10%, and more preferably greater ganism that produces at least one PUFA and has an ability than about 15%, and even more preferably greater than to produce increased PUFAS under dissolved oxygen con about 20% of the dry weight of the microorganism are ditions of less than about 5% of Saturation is identified as a identified as predictably containing a novel PKS system of candidate for containing a PUFAPKS system. Subsequent to the type described above. For other bioactive compounds, identifying a microorganism that is a Strong candidate for Such as antibiotics or compounds that are Synthesized in containing a PUFA PKS system, the method can include an Smaller amounts, those Strains possessing Such compounds additional step (c) of detecting whether the organism iden at greater than about 0.5%, and more preferably greater than tified in step (b) comprises a PUFA PKS system. about 0.1%, and more preferably greater than about 0.25%, and more preferably greater than about 0.5%, and more 0133. In one embodiment of the present invention, step preferably greater than about 0.75%, and more preferably (b) is performed by culturing the microorganism Selected for greater than about 1%, and more preferably greater than the Screening process in low oxygen/anoxic conditions and about 2.5%, and more preferably greater than about 5% of aerobic conditions, and, in addition to measuring PUFA the dry weight of the microorganism are identified as pre content in the organism, the fatty acid profile is determined, dictably containing a novel PKS system of the type as well as fat content. By comparing the results under low described above. oxygen/anoxic conditions with the results under aerobic conditions, the method provides a strong indication of 0.137 Alternatively, or in conjunction with this method, whether the microorganism contains a PUFA PKS prospective microbial strains containing novel PUFA PKS system of the present invention. This preferred embodiment Systems as described herein can be identified by examining is described in detail below. the fatty acid profile of the strain (obtained by culturing the organism or through published or other readily available 0134) Initially, microbial strains to be examined for the sources). If the microbe contains greater than about 30%, presence of a PUFA PKS system are cultured under aerobic and more preferably greater than about 40%, and more conditions to induce production of a large number of cells preferably greater than about 45%, and even more preferably (microbial biomass). As one element of the identification greater than about 50% of its total fatty acids as C14:0, process, these cells are then placed under low oxygen or C16:0 and/or C16:1, while also producing at least one long anoxic culture conditions (e.g., dissolved oxygen less than chain fatty acid with three or more unsaturated bonds, and about 5% of saturation, more preferably less than about 2%, more preferably 4 or more double bonds, and more prefer even more preferably less than about 1%, and most prefer ably 5 or more double bonds, and even more preferably 6 or ably dissolved oxygen of about 0% of saturation in the more double bonds, then this microbial strain is identified as culture medium) and allowed to grow for approximately a likely candidate to possess a novel PUFA PKS system of another 24-72 hours. In this process, the microorganisms the type described in this invention. Screening this organism should be cultured at a temperature greater than about 15 under the low oxygen conditions described above, and C., and more preferably greater than about 20 C., and even confirming production of bioactive molecules containing US 2002/0194641 A1 Dec. 19, 2002 20 two or more unsaturated bonds would Suggest the existence 0149 If the first three questions are answered yes, this is of a novel PUFA PKS system in the organism, which could a good indication that the Strain contains a PKS genetic be further confirmed by analysis of the microbes genome. System for making long chain PUFAS. The more questions that are answered yes (preferably the first three questions 0.138. The success of this method can also be enhanced must be answered yes), the stronger the indication that the by Screening eukaryotic Strains that are known to contain strain contains such a PKS genetic system. If all five C17:0 and or C17:1 fatty acids (in conjunction with the large questions are answered yes, then there is a very Strong percentages of C14:0, C16:0 and C16:1 fatty acids described indication that the Strain contains a PKS genetic System for above) because the C17:0 and C17:1 fatty acids are poten making long chain PUFAs. The lack of 18:3n-3/18:2n-6/ tial markers for a bacterial (prokaryotic) based or influenced 18:3n-6 would indicate that the low oxygen conditions fatty acid production System. Another marker for identifying would have turned off or inhibited the conventional pathway strains containing novel PUFA PKS systems is the produc for PUFA synthesis. A high 14:0/16:0/16:1 fatty is an tion of Simple fatty acid profiles by the organism. According preliminary indicator of a bacterially influenced fatty acid to the present invention, a “simple fatty acid profile' is synthesis profile (the presence of C17:0 and 17:1 is also and defined as 8 or fewer fatty acids being produced by the Strain indicator of this) and of a simple fatty acid profile. The at levels greater than 10% of total fatty acids. increased PUFA synthesis and PUFA containing fat synthe 0139 Use of any of these methods or markers (singly or sis under the low oxygen conditions is directly indicative of preferably in combination) would enable one of skill in the a PUFA PKS system, since this system does not require art to readily identify microbial Strains that are highly oxygen to make highly unsaturated fatty acids. predicted to contain a novel PUFA PKS system of the type 0150 Finally, in the identification method of the present described in this invention. invention, once a strong candidate is identified, the microbe 0140. In a preferred embodiment combining many of the is preferably screened to detect whether or not the microbe methods and markers described above, a novel biorational contains a PUFA PKS system. For example, the genome of Screen (using shake flask cultures) has been developed for the microbe can be Screened to detect the presence of one or detecting microorganisms containing PUFA producing PKS more nucleic acid Sequences that encode a domain of a Systems. This Screening System is conducted as follows: PUFA PKS system as described herein. Preferably, this step of detection includes a Suitable nucleic acid detection 0141 A portion of a culture of the strain/microorganism method, Such as hybridization, amplification and or Sequenc to be tested is placed in 250 mL baffled shake flask with 50 ing of one or more nucleic acid Sequences in the microbe of mL culture media (aerobic treatment), and another portion of interest. The probes and/or primers used in the detection culture of the same strain is placed in a 250 mL non-baffled methods can be derived from any known PUFAPKS system, shake flask with 200 mL culture medium (anoxic/low oxy including the marine bacteria PUFAPKS systems described gen treatment). Various culture media can be employed in U.S. Pat. No. 6,140,486, or the Thraustochytrid PUFA depending on the type and Strain of microorganism being PKS systems described in U.S. application Ser. No. 09/231, evaluated. Both flasks are placed on a shaker table at 200 899 and herein. Once novel PUFA PKS systems are iden rpm. After 48-72 hr of culture time, the cultures are har tified, the genetic material from these Systems can also be Vested by centrifugation and the cells are analyzed for fatty used to detect additional novel PUFAPKS systems. Methods acid methyl ester content via gas chromatography to deter of hybridization, amplification and Sequencing of nucleic mine the following data for each culture: (1) fatty acid acids for the purpose of identification and detection of a profile; (2) PUFA content; and (3) fat content (approximated Sequence are well known in the art. Using these detection as amount total fatty acids/cell dry weight). methods, Sequence homology and domainstructure (e.g., the 0142. These data are then analyzed asking the following presence, number and/or arrangement of various PUFAPKS five questions (Yes/No): functional domains) can be evaluated and compared to the known PUFA PKS systems described herein. 0143 Comparing the Data from the Low O/Anoxic 0151. In some embodiments, a PUFAPKS system can be Flask with the Data from the Aerobic Flask: identified using biological assayS. For example, in U.S. 0144) (1) Did the DHA (or other PUFA content) (as % application Ser. No. 09/231,899, Example 7, the results of a FAME (fatty acid methyl esters)) stay about the same or key experiment using a well-known inhibitor of Some types preferably increased in the low oxygen culture compared to of fatty acid Synthesis Systems, i.e., thiolactomycin, is the aerobic culture? described. The inventors showed that the synthesis of 0145 (2) Is C14:0+C16:0+C16:1 greater than about 40% PUFAs in whole cells of Schizochytrium could be specifi TFA in the anoxic culture? cally blocked without blocking the synthesis of short chain Saturated fatty acids. The Significance of this result is as 0146 (3) Are there very little (<1% as FAME) or no follows: the inventors knew from analysis of cDNA precursors (C18:3n-3+C18:2n-6+C18:3n-6) to the conven Sequences from Schizochytrium that a Type I fatty acid tional oxygen dependent elongase/desaturase pathway in the Synthase System is present in Schizochytrium. It was known anoxic culture? that thiolactomycin does not inhibit Type I FAS systems, and 0147 (4) Did fat content (as amount total fatty acids/cell this is consistent with the inventors data-i.e., production dry weight) increase in the low oxygen culture compared to of the saturated fatty acids (primarily C14:0 and C16:0 in the aerobic culture? Schizochytrium) was not inhibited by the thiolactomycin treatment. There are no indications in the literature or in the 0148 (5) Did DHA (or other PUFA content) increase as inventors’ own data that thiolactomycin has any inhibitory % cell dry weight in the low oxygen culture compared to the effect on the elongation of C14:0 or C16:0 fatty acids or their aerobic culture? desaturation (i.e. the conversion of Short chain Saturated US 2002/0194641 A1 Dec. 19, 2002 fatty acids to PUFAs by the classical pathway). Therefore, produce at least one PUFA would initially identify candidate the fact that the PUFA production in Schizochytrium was bacterial strains with a PUFA PKS system that is operative blocked by thiolactomycin Strongly indicates that the clas at elevated temperatures (as opposed to those bacterial sical PUFA synthesis pathway does not produce the PUFAS strains in the prior art which only exhibit PUFA production in Schizochytrium, but rather that a different pathway of at temperatures less than about 20° C. and more preferably synthesis is involved. Further, it had previously been deter below about 5° C). mined that the Shewanella PUFAPKS system is inhibited by O155 Locations for collection of the preferred types of thiolactomycin (note that the PUFA PKS system of the microbes for screening for a PUFAPKS system according to present invention has elements of both Type I and Type II the present invention include any of the following: low Systems), and it was known that thiolactomycin is an inhibi oxygen environments (or locations near these types of low tor of Type II FAS systems (such as that found in E. coli). oxygen environments including in the guts of animals Therefore, this experiment indicated that Schizochytrium including invertebrates that consume microbes or microbe produced PUFAS as a result of a pathway not involving the containing foods (including types of filter feeding organ Type I FAS. A similar rationale and detection step could be isms), low or non-oxygen containing aquatic habitats used to detect a PUFA PKS system in a microbe identified (including freshwater, Saline and marine), and especially using the novel Screening method disclosed herein. at-or near-low oxygen environments (regions) in the . 0152. In addition, Example 3 shows additional biochemi The microbial strains would preferably not be obligate cal data which provides evidence that PUFAs in anaerobes but be adapted to live in both aerobic and low or Schizochytrium are not produced by the classical pathway anoxic environments. Soil environments containing both (i.e., precursor product kinetics between C16:0 and DHA are aerobic and low oxygen or anoxic environments would also not observed in whole cells and, in vitro PUFAsynthesis can excellent environments to find these organisms in and espe be separated from the membrane fraction-all of the fatty cially in these types of Soil in aquatic habitats or temporary acid desaturases of the classical PUFA Synthesis pathway, aquatic habitats. with the exception of the delta 9 desaturase which inserts the 0156 A particularly preferred microbial strain would be first double bond of the series, are associated with cellular a strain (selected from the group consisting of algae, fungi membranes). This type of biochemical data could be used to (including yeast), protozoa or ) that, during a portion detect PUFAPKS activity in microbe identified by the novel of its life cycle, is capable of consuming whole bacterial Screening method described above. cells (bacterivory) by mechanisms Such as phagocytosis, 0153. Preferred microbial strains to screen using the phagotrophic or endocytic capability and/or has a stage of its Screening/identification method of the present invention are life cycle in which it exists as an amoeboid stage or naked chosen from the group consisting of bacteria, algae, fungi, protoplast. This method of nutrition would greatly increase protozoa or protists, but most preferably from the eukaryotic the potential for transfer of a bacterial PKS system into a microbes consisting of algae, fungi, protozoa and protists. eukaryotic cell if a mistake occurred and the bacterial cell These microbes are preferably capable of growth and pro (or its DNA) did not get digested and instead are function duction of the bioactive compounds containing two or more ally incorporated into the eukaryotic cell. unsaturated bonds at temperatures greater than about 15 C., 0157 Strains of microbes (other than the members of the more preferably greater than about 20° C., even more Thraustochytrids) capable of bacterivory (especially by preferably greater than about 25 C. and most preferably phagocytosis or endocytosis) can be found in the following greater than about 30° C. microbial classes (including but not limited to example 0154) In some embodiments of this method of the present genera): invention, novel bacterial PUFA PKS systems can be iden 0158. In the algae and algae-like microbes (including tified in bacteria that produce PUFAs at temperatures Stramenopiles): of the class Euglenophyceae (for example exceeding about 20° C. preferably exceeding about 25 C. genera Euglena, and Peranema), the class Chrysophyceae and even more preferably exceeding about 30° C. As (for example the genus Ochromonas), the class Dinobry described previously herein, the marine bacteria, aceae (for example the genera Dinobryon, Platychrysis, and Shewanella and Vibrio marinus, described in U.S. Pat. No. Chrysochromulina), the Dinophyceae (including the genera 6,140,486, do not produce PUFAs at higher temperatures, Crypthecodinium, Gymnodinium, Peridinium, Ceratium, which limits the usefulness of PUFA PKS systems derived Gyrodinium, and Oxyrrhis), the class Cryptophyceae (for from these bacteria, particularly in plant applications under example the genera Cryptomonas, and Rhodomonas), the field conditions. Therefore, in one embodiment, the Screen class Xanthophyceae (for example the genus Olisthodiscus) ing method of the present invention can be used to identify (and including forms of algae in which an amoeboid stage bacteria that have a PUFAPKS system which are capable of occurs as in the Rhizochloridaceae, and growth and PUFA production at higher temperatures (e.g., Zoospores/gametes of Aphanochaete pascheri, Bumilleria above about 20.25, or 30° C.). In this embodiment, inhibitors of eukaryotic growth Such as nyStatin (antifungal) or cyclo Stigeoclonium and Vaucheria geminata), the class Eustig heximide (inhibitor of eukaryotic protein Synthesis) can be matophyceae, and the class Prymnesiopyceae (including the added to agar plates used to culture/Select initial Strains from genera Prymnesium and Diacronema). water Samples/Soil Samples collected from the types of 0159. In the Stramenopiles including the: Protero habitats/niches described below. This process would help monads, Opalines, Developayella, Diplophorys, Larbrin Select for enrichment of bacterial Strains without (or mini thulids, Thraustochytrids, BicoSecids, Oomycetes, mal) contamination of eukaryotic Strains. This selection Hypochytridiomycetes, Commation, Reticulosphaera, Pel process, in combination with culturing the plates at elevated agomonas, Pelapococcus, Ollicola, Aureococcus, Parmales, temperatures (e.g. 30 C.), and then selecting Strains that Raphidiophytes, Synurids, Rhizochromulinaales, Pedinella US 2002/0194641 A1 Dec. 19, 2002 22 les, Dictyochales, ChrySomeridales, Sarcinochrysidales, tion encompasses organisms identified by the Screening Hydrurales, Hibberdiales, and Chromulinales. method of the present invention which are then genetically modified to regulate the production of bioactive molecules 0160 In the Fungi: Class Myxomycetes (form myxam by said PUFA PKS system. oebae)-Slime molds, class Acrasieae including the orders Acrasiceae (for example the genus Sappinia), class Guttuli 0164. Yet another embodiment of the present invention naceae (for example the genera Guttulinopsis, and Gut relates to an isolated nucleic acid molecule comprising a tulina), class Dictysteliaceae (for example the generaAcra nucleic acid Sequence encoding at least one biologically sis, Dictyostelium, Polysphondylium, and Coenonia), and active domain or biologically active fragment thereof of a class Phycomyceae including the orders Chytridiales, polyunsaturated fatty acid (PUFA) polyketide synthase Ancylistales, Blastocladiales, Monoblepharidales, Saproleg (PKS) system from a Thraustochytrid microorganism. As niales, Peronosporales, Mucorales, and Entomophthorales. discussed above, the present inventors have Successfully used the method to identify a non-bacterial microorganism 0.161 In the Protozoa: Protozoa strains with life stages that has a PUFAPKS system to identify additional members capable of bacterivory (including by phageocytosis) can be of the order Thraustochytriales which contain a PUFA PKS Selected from the types classified as ciliates, flagellates or System. The identification of three Such microorganisms is amoebae. Protozoan ciliates include the groups: Chonot described in Example 2. Specifically, the present inventors richs, Colpodids, Cyrtophores, Haptorids, Karyorelicts, Oli have used the Screening method of the present invention to gohymenophora, Polyhymenophora (Spirotrichs), Prostomes identify Thraustochytrium sp. 23B (ATCC 20892) as being and Suctoria. Protozoan flagellates include the BioSoecids, highly predicted to contain a PUFA PKS system, followed Bodonids, Cercomonads, Chrysophytes (for example the by detection of sequences in the Thraustochytrium sp. 23B genera Anthophysa, Chrysamoemba, Chrysosphaerella, genome that hybridize to the Schizochytrium PUFA PKS Dendromonas, Dinobryon, Mallomonas, Ochromonas, Para genes disclosed herein. Schizochytrium limacium (IFO phySomonas, Poteriodchromonas, Spumella, Syncrypta, 32693) and Ulkenia (BP-5601) have also been identified as Synura, and Uroglena), Collar flagellates, Cryptophytes (for good candidates for containing PUFA PKS systems. Based example the genera Chilomonas, Cryptomonas, Cyanomo on these data and on the Similarities among members of the nas, and Goniomonas), Dinoflagellates, Diplomonads, order Thraustochytriales, it is believed that many other Euglenoids, Heterolobosea, Pedinellids, Pelobionts, Phalan Thraustochytriales PUFA PKS systems can now be readily Steriids, Pseudodendromonads, Spongomonads and Volvo identified using the methods and tools provided by the cales (and other flagellates including the unassigned flagel present invention. Therefore, Thraustochytriales PUFA PKS late genera of Artodiscus, Clautriavia, Helkesimastix, Systems and portions and/or homologues thereof (e.g., pro Kathablepharis and Multicilia). Amoeboid protozoans teins, domains and fragments thereof), genetically modified include the groups: Actinophryids, Centrohelids, Desmo organisms comprising Such Systems and portions and/or thoricids, Diplophryids, Eumamoebae, Heterolobosea, Lep homologues thereof, and methods of using Such microor tomyxids, Nucleariid filose amoebae, Pelebionts, Testate ganisms and PUFA PKS systems, are encompassed by the amoebae and Vampyrellids (and including the unassigned present invention. amoebid genera Gymnophrys, Biomyxa, Microcometes, , Belonocystis, Elaeorhanis, Allelogromia, 0.165 Developments have resulted in revision of the or Lieberkuhnia). The protozoan orders include the of the Thraustochytrids. Taxonomic theorists following: Percolomonadeae, HeteroloboSea, Lyromonadea, place Thraustochytrids with the algae or algae-like protists. Pseudociliata, Trichomonadea, Hypermastigea, Heter However, because of taxonomic uncertainty, it would be best omiteae, Telonemea, Cyathobodonea, Ebridea, Pyytomyxea, for the purposes of the present invention to consider the Opalinea, Kinetomonadea, Hemimastigea, Protostelea, Strains described in the present invention as Thraus Myxagastrea, Dictyostelea, Choanomonadea, Apicomon tochytrids (Order: Thraustochytriales; Family: Thraustochy adea, Eogregarinea, Neogregarinea, Coelotrolphea, Eucoc triaceae, Genus: Thraustochytrium, Schizochytrium, Laby cidea, Haemosporea, Piroplasmea, Spirotrichea, Prosto rinthuloides, or Japonochytrium). For the present invention, matea, Litostomatea, Phyllopharyngea, Nassophorea, members of the labrinthulids are considered to be included Oligohymenophorea, Colpodea, Karyorelicta, Nucleohelea, in the Thraustochytrids. Taxonomic changes are Summarized Centrohelea, , Sticholonchea, Polycystinea, below. Strains of certain unicellular microorganisms dis , Lobosea, FiloSea, Athalamea, , closed herein are members of the order Thraustochytriales. Polythalamea, , Schizocladea, HoloSea, ThrauStochytrids are marine eukaryotes with a evolving Entamoebea, Myxosporea, Actinomyxea, Halosporea, taxonomic history. Problems with the taxonomic placement Paramyxea, Rhombozoa and Orthonectea. of the Thraustochytrids have been reviewed by Moss (1986), Bahnweb and Jackle (1986) and Chamberlain and Moss 0162 A preferred embodiment of the present invention (1988). According to the present invention, the phrases includes Strains of the microorganisms listed above that have “Thraustochytrid”, “Thraustochytriales microorganism” and been collected from one of the preferred habitats listed “microorganism of the order Thraustochytriales' can be above. used interchangeably. 0163) One embodiment of the present invention relates to 0166 For convenience purposes, the Thraustochytrids any microorganisms identified using the novel PUFA PKS were first placed by taxonomists with other colorless screening method described above, to the PUFA PKS genes Zoosporic eukaryotes in the Phycomycetes (algae-like and proteins encoded thereby, and to the use of Such fungi). The name Phycomycetes, however, was eventually microorganisms and/or PUFA PKS genes and proteins dropped from taxonomic Status, and the Thraustochytrids (including homologues and fragments thereof) in any of the were retained in the Oomycetes (the biflagellate Zoosporic methods described herein. In particular, the present inven fungi). It was initially assumed that the Oomycetes were US 2002/0194641 A1 Dec. 19, 2002 related to the heterokont algae, and eventually a wide range Eufungi. The taxonomic placement of the Thraustochytrids of ultrastructural and biochemical Studies, Summarized by is therefore Summarized below: Barr (Barr, 1981, Biosystems 14:359-370) supported this 0170 : Chromophyta (Stramenopiles) assumption. The Oomycetes were in fact accepted by Leedale (Leedale, 1974, Taxon 23:261-270) and other phy 0171 Phylum: Heterokonta cologists as part of the heterokont algae. However, as a 0172) Order: Thraustochytriales matter of convenience resulting from their heterotrophic 0173 Family: Thraustochytriaceae nature, the Oomycetes and Thraustochytrids have been 0.174 Genus: Thraustochytrium, Schizochytrium, largely studied by mycologists (Scientists who study fungi) Labyrinthuloides, or Japonochytrium rather than phycologists (Scientists who study algae). 0.175. Some early taxonomists separated a few original 0167 From another taxonomic perspective, evolutionary members of the genus Thraustochytrium (those with an biologists have developed two general Schools of thought as amoeboid life stage) into a separate genus called Ulkenia. to how eukaryotes evolved. One theory proposes an exog However it is now known that most, if not all, Thraus enous origin of membrane-bound organelles through a Series tochytrids (including Thraustochytrium and of endosymbioses (Margulis, 1970, Origin of Eukaryotic Schizochytrium), exhibit amoeboid stages and as Such, Cells. Yale University Press, New Haven); e.g., mitochon Ulkenia is not considered by Some to be a valid genus. AS dria were derived from bacterial endosymbionts, chloro used herein, the genus Thraustochytrium will include Ulk plasts from cyanophytes, and flagella from Spirochaetes. The C. other theory Suggests a gradual evolution of the membrane 0176) Despite the uncertainty of taxonomic placement bound organelles from the non-membrane-bounded Systems of the prokaryote ancestor via an autogenous process (Cava within higher classifications of Phylum and Kingdom, the lier-Smith, 1975, Nature (Lond.) 256:462-468). Both groups ThrauStochytrids remain a distinctive and characteristic of evolutionary biologists however, have removed the grouping whose members remain classifiable within the Oomycetes and Thraustochytrids from the fungi and place order Thraustochytriales. them either with the chromophyte algae in the kingdom 0177 Polyunsaturated fatty acids (PUFAs) are essential Chromophyta (Cavalier-Smith, 1981, BioSystems 14:461 membrane components in higher eukaryotes and the precur 481) (this kingdom has been more recently expanded to sors of many lipid-derived signaling molecules. The PUFA include other protists and members of this kingdom are now PKS system of the present invention uses pathways for called Stramenopiles) or with all algae in the kingdom PUFA synthesis that do not require desaturation and elon Protoctista (Margulis and Sagen, 1985, Biosystems 18:141 gation of Saturated fatty acids. The pathways catalyzed by 147). PUFA PKSs that are distinct from previously recognized PKSS in both structure and mechanism. Generation of cis 0168 With the development of electron microscopy, double bonds is Suggested to involve position-specific Studies on the ultrastructure of the Zoospores of two genera isomerases, these enzymes are believed to be useful in the of Thraustochytrids, Thraustochytrium and Schizochytrium, production of new families of antibiotics. (Perkins, 1976, pp. 279-312 in “Recent Advances in Aquatic Mycology” (ed. E. B. G. Jones), John Wiley & Sons, New 0.178 To produce significantly high yields of various York; Kazama, 1980, Can. J. Bot. 58:2434-2446; Barr, 1981, bioactive molecules using the PUFA PKS system of the BioSystems 14:359-370) have provided good evidence that present invention, an organism, preferably a microorganism the ThrauStochytriaceae are only distantly related to the or a plant, can be genetically modified to affect the activity Oomycetes. Additionally, genetic data representing a corre of a PUFAPKS system. In one aspect, such an organism can spondence analysis (a form of multivariate statistics) of 5 S endogenously contain and express a PUFAPKS system, and ribosomal RNA sequences indicate that Thraustochytriales the genetic modification can be a genetic modification of one are clearly a unique group of eukaryotes, completely Sepa or more of the functional domains of the endogenous PUFA rate from the fungi, and most closely related to the red and PKS system, whereby the modification has some effect on brown algae, and to members of the Oomycetes (Mannella, the activity of the PUFAPKS system. In another aspect, such et al., 1987, Mol. Evol. 24:228-235). Most taxonomists have an organism can endogenously contain and express a PUFA agreed to remove the Thraustochytrids from the Oomycetes PKS System, and the genetic modification can be an intro (Bartnicki-Garcia, 1987, pp. 389-403 in “Evolutionary Biol duction of at least one exogenous nucleic acid Sequence ogy of the Fungi” (eds. Rayner, A. D. M., Brasier, C. M. & (e.g., a recombinant nucleic acid molecule), wherein the Moore, D.), Cambridge University Press, Cambridge). exogenous nucleic acid Sequence encodes at least one bio logically active domain or protein from a Second PKS 0169. In Summary, employing the taxonomic system of system and/or a protein that affects the activity of said PUFA Cavalier-Smith (Cavalier-Smith, 1981, BioSystems 14:461 PKS System (e.g., a phosphopantetheinyl transferases 481, 1983; Cavalier-Smith, 1993, Microbiol Rey: 57.953 (PPTase), discussed below). In yet another aspect, the organ 994), the Thraustochytrids are classified with the chro ism does not necessarily endogenously (naturally) contain a mophyte algae in the kingdom Chromophyta PUFA PKS system, but is genetically modified to introduce (Stramenopiles). This taxonomic placement has been more at least one recombinant nucleic acid molecule encoding an recently reaffirmed by Cavalier-Smith et al. using the 18s amino acid Sequence having the biological activity of at least rRNA signatures of the Heterokonta to demonstrate that one domain of a PUFA PKS system. In this aspect, PUFA Thraustochytrids are chromists not Fungi (Cavalier-Smith et PKS activity is affected by introducing or increasing PUFA al., 1994, Phil. Tran. Roy. Soc. London Series BioSciences PKS activity in the organism. Various embodiments associ 346:387-397). This places them in a completely different ated with each of these aspects will be discussed in greater kingdom from the fungi, which are all placed in the kingdom detail below. US 2002/0194641 A1 Dec. 19, 2002 24

0179 Therefore, according to the present invention, one e.g., by insertion, deletion, Substitution, and/or inversion of embodiment relates to a genetically modified microorgan nucleotides), in Such a manner that Such modifications ism, wherein the microorganism expresses a PKS System provide the desired effect within the microorganism. comprising at least one biologically active domain of a 0181 Preferred microorganism host cells to modify polyunsaturated fatty acid (PUFA) polyketide synthase according to the present invention include, but are not (PKS) system. The at least one domain of the PUFA PKS limited to, any bacteria, protist, microalga, , or pro System is encoded by a nucleic acid Sequence chosen from: toZoa. In one aspect, preferred microorganisms to geneti (a) a nucleic acid Sequence encoding at least one domain of cally modify include, but are not limited to, any microor a polyunsaturated fatty acid (PUFA) polyketide synthase ganism of the order Thraustochytriales. Particularly (PKS) system from a Thraustochytrid microorganism; (b) a preferred host cells for use in the present invention could nucleic acid Sequence encoding at least one domain of a include microorganisms from a genus including, but not PUFA PKS system from a microorganism identified by a limited to: Thraustochytrium, Labyrinthuloides, Screening method of the present invention; (c) a nucleic acid Japonochytrium, and Schizochytrium. Preferred Species Sequence encoding an amino acid Sequence that is at least within these genera include, but are not limited to: any about 60% identical to at least 500 consecutive amino acids Schizochytrium species, including Schizochytrium aggrega of an amino acid Sequence Selected from the group consist tum, Schizochytrium limacinum, Schizochytrium minutum; ing of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; any Thraustochytrium species (including former Ulkenia wherein the amino acid Sequence has a biological activity of Species Such as U. visurgensis, U. amoeboida, U. Sarkari at least one domain of a PUFA PKS system; and, (d) a ana, U. profunda, U. radiata, U. minuta and Ulkenia sp. nucleic acid Sequence encoding an amino acid Sequence that BP-5601), and including Thraustochytrium striatum, is at least about 60% identical to an amino acid Sequence Thraustochytrium aureum, Thraustochytrium roseum; and selected from the group consisting of: SEQ ID NO:8, SEQ any Japonochytrium species. Particularly preferred Strains of ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, Thraustochytriales include, but are not limited to: SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID Schizochytrium sp. (S31)(ATCC 20888); Schizochytrium NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the sp. (S8)(ATCC 20889); Schizochytrium sp. (LC amino acid Sequence has a biological activity of at least one RM)(ATCC 18915); Schizochytrium sp. (SR21); domain of a PUFA PKS system. The genetic modification Schizochytrium aggregatum (Goldstein et Belsky)(ATCC affects the activity of the PKS system in the organism. The 28209); Schizochytrium limacinum (Honda et Yokochi)(IFO Screening process referenced in part (b) has been described 32693); Thraustochytrium sp. (23B)(ATCC 20891); Thraus in detail above and includes the steps of: (a) Selecting a tochytrium striatum (Schneider)(ATCC 24473); Thraus microorganism that produces at least one PUFA; and, (b) tochytrium aureum (Goldstein)(ATCC 34304); Thraus identifying a microorganism from (a) that has an ability to tochytrium roseum (Goldstein)(ATCC 28210); and produce increased PUFAS under dissolved oxygen condi Japonochytrium sp. (L1)(ATCC 28207). Other examples of tions of less than about 5% of Saturation in the fermentation Suitable host microorganisms for genetic modification medium, as compared to production of PUFAS by the include, but are not limited to, yeast including Saccharomy microorganism under dissolved oxygen conditions of greater ceS cerevisiae, Saccharomyces Carlsbergensis, or other yeast than about 5% of Saturation, and preferably about 10%, and Such as Candida, Kluyveromyces, or other fungi, for more preferably about 15%, and more preferably about 20% example, filamentous fungi Such as Aspergillus, Neurospora, of Saturation in the fermentation medium. The genetically Penicillium, etc. Bacterial cells also may be used as hosts. modified microorganism can include any one or more of the This includes Escherichia coli, which can be useful in above-identified nucleic acid Sequences, and/or any of the fermentation processes. Alternatively, a host Such as a other homologues of any of the Schizochytrium PUFA PKS Lactobacillus Species or Bacillus species can be used as a ORFs or domains as described in detail above. host. 0180. As used herein, a genetically modified microorgan 0182 Another embodiment of the present invention ism can include a genetically modified bacterium, protist, relates to a genetically modified plant, wherein the plant has microalgae, fungus, or other microbe, and particularly, any been genetically modified to recombinantly express a PKS of the genera of the order Thraustochytriales (e.g., a Thraus System comprising at least one biologically active domain of tochytrid) described herein (e.g., Schizochytrium, Thraus a polyunsaturated fatty acid (PUFA) polyketide synthase tochytrium, Japonochytrium, Labyrinthuloides). Such a (PKS) system. The domain is encoded by a nucleic acid genetically modified microorganism has a genome which is Sequence chosen from: (a) a nucleic acid sequence encoding modified (i.e., mutated or changed) from its normal (i.e., at least one domain of a polyunsaturated fatty acid (PUFA) wild-type or naturally occurring) form Such that the desired polyketide synthase (PKS) system from a Thraustochytrid result is achieved (i.e., increased or modified PUFA PKS microorganism; (b) a nucleic acid sequence encoding at least activity and/or production of a desired product using the one domain of a PUFA PKS system from a microorganism PKS System). Genetic modification of a microorganism can identified by the Screening and Selection method described be accomplished using classical Strain development and/or herein (see brief Summary of method in discussion of molecular genetic techniques. Such techniques known in the genetically modified microorganism above); (c) a nucleic art and are generally disclosed for microorganisms, for acid Sequence encoding an amino acid Sequence Selected example, in Sambrook et al., 1989, Molecular Cloning. A from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, Laboratory Manual, Cold Spring Harbor Labs Press. The SEQID NO:6, and biologically active fragments thereof; (d) reference Sambrook et al., ibid., is incorporated by reference a nucleic acid Sequence encoding an amino acid Sequence herein in its entirety. A genetically modified microorganism selected from the group consisting of: SEQ ID NO:8, SEQ can include a microorganism in which nucleic acid mol ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, ecules have been inserted, deleted or modified (i.e., mutated; SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID US 2002/0194641 A1 Dec. 19, 2002

NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically which results in incomplete or no translation of the protein active fragments thereof, (e) a nucleic acid sequence encod (e.g., the protein is not expressed), or a mutation in the gene ing an amino acid Sequence that is at least about 60% which decreases or abolishes the natural function of the identical to at least 500 consecutive amino acids of an amino protein (e.g., a protein is expressed which has decreased or acid Sequence Selected from the group consisting of: SEQID no enzymatic activity or action). Genetic modifications that NO:2, SEQ ID NO:4, and SEQID NO:6; wherein the amino result in an increase in gene expression or function can be acid Sequence has a biological activity of at least one domain referred to as amplification, Overproduction, overexpression, of a PUFA PKS system; and/or (f) a nucleic acid sequence activation, enhancement, addition, or up-regulation of a encoding an amino acid Sequence that is at least about 60% identical to an amino acid Sequence Selected from the group gene. consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID 0186 The genetic modification of a microorganism or NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, plant according to the present invention preferably affects SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID the activity of the PKS system expressed by the plant, NO:30, and SEQ ID NO:32; wherein the amino acid whether the PKS System is endogenous and genetically Sequence has a biological activity of at least one domain of modified, endogenous with the introduction of recombinant a PUFA PKS system. The genetically modified plant can nucleic acid molecules into the organism, or provided com include any one or more of the above-identified nucleic acid Sequences, and/or any of the other homologues of any of the pletely by recombinant technology. According to the present Schizochytrium PUFA PKS ORFs or domains as described invention, to “affect the activity of a PKS system” includes in detail above. any genetic modification that causes any detectable or measurable change or modification in the PKS system 0183 AS used herein, a genetically modified plant can expressed by the organism as compared to in the absence of include any genetically modified plant including higher the genetic modification. A detectable change or modifica plants and particularly, any consumable plants or plants tion in the PKS system can include, but is not limited to: the useful for producing a desired bioactive molecule of the introduction of PKS System activity into an organism Such present invention. Such a genetically modified plant has a that the organism now has measurable/detectable PKS sys genome which is modified (i.e., mutated or changed) from tem activity (i.e., the organism did not contain a PKS System its normal (i.e., wild-type or naturally occurring) form Such prior to the genetic modification), the introduction into the that the desired result is achieved (i.e., increased or modified organism of a functional domain from a different PKS PUFA PKS activity and/or production of a desired product System than a PKS System endogenously expressed by the using the PKS system). Genetic modification of a plant can organism such that the PKS system activity is modified (e.g., be accomplished using classical Strain development and/or a bacterial PUFA PKS domain or a type I PKS domain is molecular genetic techniques. Methods for producing a introduced into an organism that endogenously expresses a transgenic plant, wherein a recombinant nucleic acid mol non-bacterial PUFAPKS system), a change in the amount of ecule encoding a desired amino acid Sequence is incorpo a bioactive molecule produced by the PKS system (e.g., the rated into the genome of the plant, are known in the art. A System produces more (increased amount) or less (decreased preferred plant to genetically modify according to the amount) of a given product as compared to in the absence of present invention is preferably a plant Suitable for consump the genetic modification), a change in the type of a bioactive tion by animals, including humans. molecule produced by the PKS System (e.g., the System 0184 Preferred plants to genetically modify according to produces a new or different product, or a variant of a product the present invention (i.e., plant host cells) include, but are that is naturally produced by the System), and/or a change in not limited to any higher plants, and particularly consumable the ratio of multiple bioactive molecules produced by the plants, including crop plants and especially plants used for PKS System (e.g., the System produces a different ratio of their oils. Such plants can include, for example: canola, one PUFA to another PUFA, produces a completely different Soybeans, rapeseed, linseed, corn, Safflowers, Sunflowers lipid profile as compared to in the absence of the genetic and tobacco. Other preferred plants include those plants that modification, or places various PUFAs in different positions are known to produce compounds used as pharmaceutical in a triacylglycerol as compared to the natural configura agents, flavoring agents, neutraceutical agents, functional tion). Such a genetic modification includes any type of food ingredients or cosmetically active agents or plants that genetic modification and Specifically includes modifications are genetically engineered to produce these compounds/ made by recombinant technology and by classical mutagen agents. CSS. 0185. According to the present invention, a genetically 0187. It should be noted that reference to increasing the modified microorganism or plant includes a microorganism activity of a functional domain or protein in a PUFA PKS or plant that has been modified using recombinant technol System refers to any genetic modification in the organism ogy. AS used herein, genetic modifications which result in a containing the domain or protein (or into which the domain decrease in gene expression, in the function of the gene, or or protein is to be introduced) which results in increased in the function of the gene product (i.e., the protein encoded functionality of the domain or protein System and can by the gene) can be referred to as inactivation (complete or include higher activity of the domain or protein (e.g., partial), deletion, interruption, blockage or down-regulation Specific activity or in Vivo enzymatic activity), reduced of a gene. For example, a genetic modification in a gene inhibition or degradation of the domain or protein System, which results in a decrease in the function of the protein and overexpression of the domain or protein. For example, encoded by Such gene, can be the result of a complete gene copy number can be increased, expression levels can be deletion of the gene (i.e., the gene does not exist, and increased by use of a promoter that gives higher levels of therefore the protein does not exist), a mutation in the gene expression than that of the native promoter, or a gene can be US 2002/0194641 A1 Dec. 19, 2002 26 altered by genetic engineering or classical mutagenesis to 0190. In another aspect of this embodiment of the inven increase the activity of the domain or protein encoded by the tion, the genetic modification can include: (1) the introduc gene. tion of a recombinant nucleic acid molecule encoding an amino acid Sequence having a biological activity of at least 0188 Similarly, reference to decreasing the activity of a one domain of a non-bacterial PUFAPKS system; and/or (2) functional domain or protein in a PUFA PKS system refers the introduction of a recombinant nucleic acid molecule to any genetic modification in the organism containing Such encoding a protein or functional domain that affects the domain or protein (or into which the domain or protein is to activity of a PUFA PKS system, into a host. The host can be introduced) which results in decreased functionality of include: (1) a host cell that does not express any PKS the domain or protein and includes decreased activity of the system, wherein all functional domains of a PKS system are domain or protein, increased inhibition or degradation of the introduced into the host cell, and wherein at least one domain or protein and a reduction or elimination of expres functional domain is from a non-bacterial PUFA PKS sys Sion of the domain or protein. For example, the action of tem; (2) a host cell that expresses a PKS System (endogenous domain or protein of the present invention can be decreased or recombinant) having at least one functional domain of a by blocking or reducing the production of the domain or non-bacterial PUFA PKS system, wherein the introduced protein, “knocking out the gene or portion thereof encoding recombinant nucleic acid molecule can encode at least one the domain or protein, reducing domain or protein activity, additional non-bacterial PUFA PKS domain function or or inhibiting the activity of the domain or protein. Blocking another protein or domain that affects the activity of the host or reducing the production of an domain or protein can PKS system; and (3) a host cell that expresses a PKS system include placing the gene encoding the domain or protein (endogenous or recombinant) which does not necessarily under the control of a promoter that requires the presence of include a domain function from a non-bacterial PUFAPKS, an inducing compound in the growth medium. By establish and wherein the introduced recombinant nucleic acid mol ing conditions Such that the inducer becomes depleted from ecule includes a nucleic acid Sequence encoding at least one the medium, the expression of the gene encoding the domain functional domain of a non-bacterial PUFA PKS system. In or protein (and therefore, of protein Synthesis) could be other words, the present invention intends to encompass any turned off. Blocking or reducing the activity of domain or genetically modified organism (e.g., microorganism or protein could also include using an excision technology plant), wherein the organism comprises at least one non approach similar to that described in U.S. Pat. No. 4,743, bacterial PUFA PKS domain function (either endogenously 546, incorporated herein by reference. To use this approach, or by recombinant modification), and wherein the genetic the gene encoding the protein of interest is cloned between modification has a measurable effect on the non-bacterial specific genetic sequences that allow specific, controlled PUFAPKS domain function or on the PKS system when the excision of the gene from the genome. Excision could be organism comprises a functional PKS System. prompted by, for example, a shift in the cultivation tem 0191). Therefore, using the non-bacterial PUFA PKS sys perature of the culture, as in U.S. Pat. No. 4,743,546, or by tems of the present invention, which, for example, makes Some other physical or nutritional Signal. use of genes from Thraustochytrid PUFAPKS systems, gene 0189 In one embodiment of the present invention, a mixing can be used to extend the range of PUFA products to genetic modification includes a modification of a nucleic include EPA, DHA, ARA, GLA, SDA and others, as well as acid Sequence encoding an amino acid Sequence that has a to produce a wide variety of bioactive molecules, including biological activity of at least one domain of a non-bacterial antibiotics, other pharmaceutical compounds, and other PUFA PKS system as described herein. Such a modification desirable products. The method to obtain these bioactive can be to an amino acid Sequence within an endogenously molecules includes not only the mixing of genes from (naturally) expressed non-bacterial PUFA PKS system, various organisms but also various methods of genetically whereby a microorganism that naturally contains Such a modifying the non-bacterial PUFA PKS genes disclosed System is genetically modified by, for example, classical herein. Knowledge of the genetic basis and domain Structure mutagenesis and Selection techniques and/or molecular of the non-bacterial PUFA PKS system of the present genetic techniques, include genetic engineering techniques. invention provides a basis for designing novel genetically Genetic engineering techniques can include, for example, modified organisms which produce a variety of bioactive using a targeting recombinant vector to delete a portion of an molecules. Although mixing and modification of any PKS endogenous gene, or to replace a portion of an endogenous domains and related genes are contemplated by the present gene with a heterologous Sequence. Examples of heterolo inventors, by way of example, various possible manipula gous Sequences that could be introduced into a host genome tions of the PUFA-PKS system are discussed below with include Sequences encoding at least one functional domain regard to genetic modification and bioactive molecule pro from another PKS system, such as a different non-bacterial duction. PUFA PKS system, a bacterial PUFA PKS system, a type I 0.192 For example, in one embodiment, non-bacterial PKS system, a type II PKS system, or a modular PKS PUFA-PKS system products, such as those produced by System. Other heterologous Sequences to introduce into the Thraustochytrids, are altered by modifying the CLF (chain genome of a host includes a Sequence encoding a protein or length factor) domain. This domain is characteristic of Type functional domain that is not a domain of a PKS system, but II (dissociated enzymes) PKS systems. Its amino acid which will affect the activity of the endogenous PKS system. Sequence shows homology to KS (keto Synthase pairs) For example, one could introduce into the host genome a domains, but it lacks the active site cysteine. CLF may nucleic acid molecule encoding a phosphopantetheinyl function to determine the number of elongation cycles, and transferase (discussed below). Specific modifications that hence the chain length, of the end product. In this embodi could be made to an endogenous PUFA PKS system are ment of the invention, using the current State of knowledge discussed in detail below. of FAS and PKS synthesis, a rational strategy for production US 2002/0194641 A1 Dec. 19, 2002 27 of ARA by directed modification of the non-bacterial PUFA two carbons. It is believed that only this KS can extend a PKS system is provided. There is controversy in the litera carbon chain that contains a double bond. This extension ture concerning the function of the CLF in PKS systems (C. occurs only when the double bond is in the cis configuration; Bisang et al., Nature 401,502 (1999)) and it is realized that if it is in the trans configuration, the double bond is reduced other domains may be involved in determination of the chain by enoyl-ACP reductase (ER) prior to elongation (Heath et length of the end product. However, it is significant that al., 1996, Supra). All of the PUFA-PKS systems character Schizochytrium produces both DHA (C22:6, ()-3) and DPA ized so far have two KS domains, one of which shows (C22:5, co-6). In the PUFA-PKS system the cis double bonds greater homology to the FabB-like KS of E. coli than the are introduced during Synthesis of the growing carbon chain. other. Again, without being bound by theory, the present Since placement of the co-3 and ()-6 double bonds occurs inventors believe that in PUFA-PKS systems, the specifici early in the Synthesis of the molecules, one would not expect ties and interactions of the DH (FabA-like) and KS (FabB that they would affect Subsequent end-product chain length like) enzymatic domains determine the number and place determination. Thus, without being bound by theory, the ment of cis double bonds in the end products. Because the present inventors believe that introduction of a factor (e.g. number of 2-carbon elongation reactions is greater than the CLF) that directs synthesis of C20 units (instead of C22 number of double bonds present in the PUFA-PKS end units) into the Schizochytrium PUFA-PKS system will result products, it can be determined that in Some extension cycles in the production of EPA (C20:5, (O-3) and ARA (C20:4, complete reduction occurs. Thus the DH and KS domains co-6). For example, in heterologous systems, one could can be used as targets for alteration of the DHA/DPA ratio exploit the CLF by directly substituting a CLF from an EPA or ratioS of other long chain fatty acids. These can be producing System (Such as one from Photobacterium) into modified and/or evaluated by introduction of homologous the Schizochytrium gene Set. The fatty acids of the resulting domains from other Systems or by mutagenesis of these gene transformants can then be analyzed for alterations in profiles fragments. to identify the transformants producing EPA and/or ARA. 0196) In another embodiment, the ER (enoyl-ACP reduc 0193 In addition to dependence on development of a tase-an enzyme which reduces the trans-double bond in the heterologous System (recombinant System, Such as could be fatty acyl-ACP resulting in fully Saturated carbons) domains introduced into plants), the CLF concept can be exploited in can be modified or Substituted to change the type of product Schizochytrium (i.e., by modification of a Schizochytrium made by the PKS system. For example, the present inventors genome). Transformation and homologous recombination know that Schizochytrium PUFA-PKS system differs from has been demonstrated in Schizochytrium. One can exploit the previously described bacterial Systems in that it has two this by constructing a clone with the CLF of OrfB replaced (rather than one) ER domains. Without being bound by with a CLF from a C20 PUFA-PKS system. A marker gene theory, the present inventors believe these ER domains can will be inserted downstream of the coding region. One can Strongly influence the resulting PKS production product. then transform the wild type cells, Select for the marker The resulting PKS product could be changed by separately phenotype and then Screen for those that had incorporated knocking out the individual domains or by modifying their the new CLF. Again, one would analyze these for any effects nucleotide Sequence or by Substitution of ER domains from on fatty acid profiles to identify transformants producing other organisms. EPA and/or ARA. If some factor other than those associated 0197). In another embodiment, nucleic acid molecules with the CLF are found to influence the chain length of the encoding proteins or domains that are not part of a PKS end product, a similar Strategy could be employed to alter system, but which affect a PKS system, can be introduced those factors. into an organism. For example, all of the PUFAPKS systems 0194 Another preferred embodiment involving alter described above contain multiple, tandem, ACP domains. ation of the PUFA-PKS products involves modification or ACP (as a separate protein or as a domain of a larger protein) substitution of the B-hydroxyacyl-ACP dehydrase/ketosyn requires attachment of a phosphopantetheline cofactor to thase pairs. During cis-vaccenic acid (C 18:1, A11) synthesis produce the active, holo-ACP. Attachment of phosphopan in E. coli, creation of the cis double bond is believed to tetheine to the apo-ACP is carried out by members of the depend on a specific DH enzyme, B-hydroxy acyl-ACP Superfamily of enzymes-the phosphopantetheinyl trans dehydrase, the product of the FabA gene. This enzyme ferases (PPTase) (Lambalot R. H., et al., Chemistry and removes HOH from a B-keto acyl-ACP and leaves a trans Biology, 3,923 (1996)). double bond in the carbon chain. A Subset of DH's, FabA 0198 By analogy to other PKS and FAS systems, the like, possess cis-trans isomerase activity (Heath et al., 1996, present inventors presume that activation of the multiple supra). A novel aspect of bacterial and non-bacterial PUFA ACP domains present in the Schizochytrium ORFA protein PKS systems is the presence of two FabA-like DH domains. is carried out by a Specific, endogenous, PPTase. The gene Without being bound by theory, the present inventors encoding this presumed PPTase has not yet been identified believe that one or both of these DH domains will possess in Schizochytrium. If Such a gene is present in cis-trans isomerase activity (manipulation of the DH Schizochytrium, one can envision Several approaches that domains is discussed in greater detail below). could be used in an attempt to identify and clone it. These 0.195 Another aspect of the unsaturated fatty acid syn could include (but would not be limited to): generation and thesis in E. coli is the requirement for a particular KS partial Sequencing of a cDNA library prepared from actively enzyme, B-ketoacyl-ACP synthase, the product of the FabB growing Schizochytrium cells (note, one sequence was gene. This is the enzyme that carries out condensation of a identified in the currently available Schizochytrium cDNA fatty acid, linked to a cysteine residue at the active site (by library set which showed homology to PPTases; however, a thio-ester bond), with a malonyl-ACP. In the multi-step it appears to be part of a multidomain FAS protein, and as reaction, CO is released and the linear chain is extended by such may not encode the desired OrfA specific PPTase); use US 2002/0194641 A1 Dec. 19, 2002 28 of degenerate oligonucleotide primerS designed using amino Schizochytrium OrfA (or the homologous Shewanella Orf5 acid motifs present in many PPTase's in PCR reactions (to protein). However, the complete genome of Nostoc has obtain a nucleic acid probe molecule to Screen genomic or recently been Sequenced and as a result, the Sequence of the cDNA libraries); genetic approaches based on protein-pro region just upstream of the PKS gene cluster is now avail tein interactions (e.g. a yeast two-hybrid System) in which able. In this region are three Orfs that show homology to the the ORFA-ACP domains would be used as a “bait' to find domains (KS, MAT, ACP and KR) of OrfA (see FIG. 3). a “target” (i.e. the PPTase); and purification and partial Included in this set are two ACP domains, both of which Sequencing of the enzyme itself as a means to generate a show high homology to the ORFAACP domains. At the end nucleic acid probe for Screening of genomic or cDNA of the Nostoc PKS cluster is the gene that encodes the Het libraries. I PPTase. Previously, it was not obvious what the substrate 0199. It is also conceivable that a heterologous PPTase of the Het I enzyme could be, however the presence of may be capable of activating the Schizochytrium ORFA tandem ACP domains in the newly identified Orf (Hgl E) of ACP domains. It has been shown that Some PPTases, for the cluster Strongly Suggests to the present inventors that it example the s?p enzyme of Bacillus Subtilis (Lambalot et al., is those ACPs. The homology of the ACP domains of Supra) and the SVp enzyme of Streptomyces verticillus Schizochytrium and NoStoc, as well as the tandem arrange (Sanchez et al., 2001, Chemistry & Biology 8:725-738), ment of the domains in both proteins, makes Het I a likely have a broad Substrate tolerance. These enzymes can be candidate for heterologous activation of the Schizochytrium tested to see if they will activate the Schizochytrium ACP ORFAACPs. The present inventors are believed to be the domains. Also, a recent publication described the expression first to recognize and contemplate this use for NoStoc Het I of a fungal PKS protein in tobacco (Yalpaniet al., 2001, The PPTase. Plant Cell 13:1401-1409). Products of the introduced PKS 0201 AS indicated in Metz et al., 2001, Supra, one novel System (encoded by the 6-methylsalicyclic acid synthase feature of the PUFA PKS systems is the presence of two gene of Penicillium patulum) were detected in the transgenic dehydratase domains, both of which show homology to the plant, even though the corresponding fungal PPTase was not FabA proteins of E. coli. With the availability of the new present in those plants. This Suggested that an endogenous NoStoc PKS gene Sequences mentioned above, one can now plant PPTase(s) recognized and activated the fingal PKS compare the two Systems and their products. The Sequence ACP domain. Of relevance to this observation, the present of domains in the Nostoc cluster (from Hg1E to Het I) as the inventors have identified two Sequences (genes) in the present inventors have defined them is (see FIG. 3): Arabidopsis whole genome database that are likely to encode PPTases. These sequences (GenBank Accession 0202) KS-MAT-2XACP, KR, KS, CLF-AT, ER numbers; AAG51443 and AAC05345) are currently listed as (HetM, HetN) HetI encoding “Unknown Proteins”. They can be identified as putative PPTases based on the presence in the translated 0203) In the Schizochytrium PUFA-PKS Orfs A,B&C the protein Sequences of Several Signature motifs including; sequence (OrfA-B-C) is: G(I/V)D and WXXKE(A/S)xxK (SEQ ID NO:33), (listed in 0204) KS-MAT-9xACP-KR KS-CLF-ATER Lambalot et al., 1996 as characteristic of all PPTases). In DH-DH-ER addition, these two putative proteins contain two additional motifs typically found in PPTases typically associated with 0205 One can see the correspondence of the domains PKS and non-ribosomal peptide Synthesis Systems, i.e., Sequence (there is also a high amino acid Sequence homol FN(I/L/V)SHS (SEQ ID NO:34) and (I/V/L)G(I/L/V)D(I/ ogy). The product of the Nostoc PKS system is a long chain L/V) (SEQ ID NO:35). Furthermore, these motifs occur in hydroxy fatty acid (C26 or C28 with one or two hydroxy the expected relative positions in the protein Sequences. It is groups) that contains no double bonds (cis or trans). The likely that homologues of the Arabidopsis genes are present product of the Schizochytrium PKS system is a long chain in other plants, Such as tobacco. Again, these genes can be polyunsaturated fatty acid (C22, with 5 or 6 double bonds cloned and expressed to see if the enzymes they encode can all cis). An obvious difference between the two domain sets activate the Schizochytrium ORFAACP domains, or alter is the presence of the two DH domains in the natively, OrfA could be expressed directly in the transgenic Schizochytrium proteins just the domains implicated in plant (either targeted to the plastid or the cytoplasm). the formation of the cis double bonds of DHA and DPA 0200 Another heterologous PPTase which may recog (presumably HetM and HetN in the Nostoc system are nize the ORFA ACP domains as Substrates is the Het I involved in inclusion of the hydroxyl groups and also protein of Nostoc sp. PCC 7120 (formerly called Anabaena contain a DH domain whose origin differs from the those sp. PCC 7120). As noted in U.S. Pat. No. 6,140,486, several found in the PUFA). Also, the role of the duplicated ER of the PUFA-PKS genes of Shewanella showed a high domain in the Schizochytrium Orfs B and C is not known degree of homology to protein domains present in a PKS (the Second ER domain in is not present other characterized cluster found in Nostoc (FIG. 2 of that patent). This Nostoc PUFA PKS systems). The amino acid sequence homology PKS system is associated with the synthesis of long chain between the two Sets of domains implies an evolutionary (C26 or C28) hydroxy fatty acids that become esterified to relationship. One can conceive of the PUFA PKS gene set Sugar moieties and form a part of the heterocyst cell wall. being derived from (in an evolutionary Sense) an ancestral These Nostoc PKS domains are also highly homologous to Nostoc-like PKS gene set by incorporation of the DH the domains found in Orfs B and C of the Schizochytrium (FabA-like) domains. The addition of the DH domains PKS proteins (i.e. the same ones that correspond to those would result in the introduction of cis double bonds in the found in the Shewanella PKS proteins). Until very recently, new PKS end product structure. none of the Nostoc PKS domains present in the GenBank 0206. The comparisons of the Schizochytrium and Nos databases showed high homology to any of the domains of toc PKS domain structures as well as the comparison of the US 2002/0194641 A1 Dec. 19, 2002 29 domain organization between the Schizochytrium and one malonyl-CoA:ACP acyltransferase (MAT) domain. In Shewanella PUFA-PKS proteins demonstrate nature's abil one embodiment, the PUFA PKS system is a eukaryotic ity to alter domain order as well as incorporate new domains PUFA PKS system. In a preferred embodiment, the PUFA to create novel end products. In addition, the genes can now PKS system is an algal PUFA PKS system. In a more be manipulated in the laboratory to create new products. The preferred embodiment, the PUFA PKS system is a Thraus implication from these observations is that it should be tochytriales PUFA PKS system. Such PUFA PKS systems possible to continue to manipulate the Systems in either a can include, but are not limited to, a Schizochytrium PUFA directed or random way to influence the end products. For PKS system, and a Thraustochytrium PUFA PKS system. In example, in a preferred embodiment, one could envision one embodiment, the PUFAPKS system can be expressed in substituting one of the DH (FabA-like) domains of the a prokaryotic host cell. In another embodiment, the PUFA PUFA-PKS system for a DH domain that did not posses PKS System can be expressed in a eukaryotic host cell. isomerization activity, potentially creating a molecule with a mix of cis- and tras- double bonds. The current products of 0210 Another embodiment of the present invention the Schizochytrium PUFA PKS system are DHA and DPA relates to a recombinant host cell which has been modified (C22:5 (06). If one manipulated the system to produce C20 to express a non-bacterial PUFA PKS system, wherein the fatty acids, one would expect the products to be EPA and PKS system catalyzes both iterative and non-iterative enzy ARA (C20:4 (O6). This could provide a new source for ARA. matic reactions, and wherein the non-bacterial PUFA PKS One could also Substitute domains from related PUFA-PKS System comprises at least the following biologically active systems that produced a different DHA to DPA ratio-for domains: (a) at least one enoyl ACP-reductase (ER) domain; example by using genes from Thraustochytrium 23B (the (b) multiple acyl carrier protein (ACP) domains (at least four); (c) at least two f-keto acyl-ACP synthase (KS) PUFA PKS system of which is identified for the first time domains; (d) at least one acyltransferase (AT) domain; (e) at herein). least one ketoreductase (KR) domain; (f) at least two FabA 0207 Additionally, one could envision specifically alter like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at ing one of the ER domains (e.g. removing, or inactivating) least one chain length factor (CLF) domain; and (h) at least in the Schizochytrium PUFAPKS system (other PUFAPKS one malonyl-CoA:ACP acyltransferase (MAT) domain. systems described so far do not have two ER domains) to determine its effect on the end product profile. Similar 0211 One aspect of this embodiment of the invention Strategies could be attempted in a directed manner for each relates to a method to produce a product containing at least of the distinct domains of the PUFA-PKS proteins using one PUFA, comprising growing a plant comprising any of more or less Sophisticated approaches. Of course one would the recombinant host cells described above, wherein the not be limited to the manipulation of Single domains. recombinant host cell is a plant cell, under conditions Finally, one could extend the approach by mixing domains effective to produce the product. Another aspect of this from the PUFA-PKS system and other PKS or FAS systems embodiment of the invention relates to a method to produce (e.g., type I, type II, modular) to create an entire range of a product containing at least one PUFA, comprising cultur new end products. For example, one could introduce the ing a culture containing any of the recombinant host cells PUFA-PKS DH domains into systems that do not normally described above, wherein the host cell is a microbial cell, incorporate cis double bonds into their end products. under conditions effective to produce the product. In a preferred embodiment, the PKS system in the host cell 0208 Accordingly, encompassed by the present inven catalyzes the direct production of triglycerides. tion are methods to genetically modify microbial or plant cells by: genetically modifying at least one nucleic acid 0212 Another embodiment of the present invention Sequence in the organism that encodes an amino acid relates to a microorganism comprising a non-bacterial, poly Sequence having the biological activity of at least one unsaturated fatty acid (PUFA) polyketide synthase (PKS) functional domain of a non-bacterial PUFA PKS system system, wherein the PKS catalyzes both iterative and non according to the present invention, and/or expressing at least iterative enzymatic reactions, and wherein the PUFA PKS one recombinant nucleic acid molecule comprising a nucleic System comprises: (a) at least two enoyl ACP-reductase acid Sequence encoding Such amino acid Sequence. Various (ER) domains; (b) at least six acyl carrier protein (ACP) embodiments of Such Sequences, methods to genetically domains; (c) at least two B-keto acyl-ACP synthase (KS) modify an organism, and Specific modifications have been domains; (d) at least one acyltransferase (AT) domain; (e) at described in detail above. Typically, the method is used to least one ketoreductase (KR) domain; (f) at least two FabA produce a particular genetically modified organism that like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at produces a particular bioactive molecule or molecules. least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain. 0209. One embodiment of the present invention relates to Preferably, the microorganism is a non-bacterial microor a recombinant host cell which has been modified to express ganism and more preferably, a eukaryotic microorganism. a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, wherein the PKS catalyzes both iterative and 0213 Yet another embodiment of the present invention non-iterative enzymatic reactions, and wherein the PUFA relates to a microorganism comprising a non-bacterial, poly PKS system comprises: (a) at least two enoyl ACP-reductase unsaturated fatty acid (PUFA) polyketide synthase (PKS) (ER) domains; (b) at least six acyl carrier protein (ACP) system, wherein the PKS catalyzes both iterative and non domains; (c) at least two B-keto acyl-ACP synthase (KS) iterative enzymatic reactions, and wherein the PUFA PKS domains; (d) at least one acyltransferase (AT) domain; (e) at System comprises: (a) at least one enoyl ACP-reductase (ER) least one ketoreductase (KR) domain; (f) at least two FabA domain; (b) multiple acyl carrier protein (ACP) domains (at like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at least four); (c) at least two B-ketoacyl-ACP synthase (KS) least one chain length factor (CLF) domain; and (h) at least domains; (d) at least one acyltransferase (AT) domain; (e) at US 2002/0194641 A1 Dec. 19, 2002 30 least one ketoreductase (KR) domain; (f) at least two FabA large number of mutations in one or more of the PUFAPKS like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at genes disclosed herein, and then transform the appropriately least one chain length factor (CLF) domain; and (h) at least modified FabB-Strain (e.g. create mutations in an expression one malonyl-CoA:ACP acyltransferase (MAT) domain. construct containing an ER domain and transform a FabB 0214. In one embodiment of the present invention, it is Strain Strain having the other essential domains on a Separate contemplated that a mutagenesis program could be com plasmid-or integrated into the chromosome) and Select bined with a Selective Screening process to obtain bioactive only for those transformants that grow without Supplemen molecules of interest. This would include methods to search tation of the medium (i.e., that still possessed an ability to for a range of bioactive compounds. This Search would not produce a molecule that could complement the FabB-de be restricted to production of those molecules with cis fect). Additional screens could be developed to look for double bonds. The mutagenesis methods could include, but particular compounds (e.g. use of GC for fatty acids) being are not limited to: chemical mutagenesis, gene Shuffling, produced in this selective Subset of an active PKS system. Switching regions of the genes encoding Specific enzymatic One could envision a number of Similar Selective Screens for domains, or mutagenesis restricted to specific regions of bioactive molecules of interest. those genes, as well as other methods. 0217. As described above, in one embodiment of the present invention, a genetically modified microorganism or 0215 For example, high throughput mutagenesis meth plant includes a microorganism or plant which has an ods could be used to influence or optimize production of the enhanced ability to Synthesize desired bioactive molecules desired bioactive molecule. Once an effective model System (products) or which has a newly introduced ability to has been developed, one could modify these genes in a high Synthesize specific products (e.g., to Synthesize a specific throughput manner. Utilization of these technologies can be antibiotic). According to the present invention, “an enhanced envisioned on two levels. First, if a sufficiently selective ability to Synthesize' a product refers to any enhancement, Screen for production of a product of interest (e.g., ARA) can or up-regulation, in a pathway related to the Synthesis of the be devised, it could be used to attempt to alter the System to product Such that the microorganism or plant produces an produce this product (e.g., in lieu of, or in concert with, other increased amount of the product (including any production Strategies Such as those discussed above). Additionally, if the of a product where there was none before) as compared to Strategies outlined above resulted in a Set of genes that did the wild-type microorganism or plant, cultured or grown, produce the product of interest, the high throughput tech under the same conditions. Methods to produce Such geneti nologies could then be used to optimize the System. For cally modified organisms have been described in detail example, if the introduced domain only functioned at rela above. tively low temperatures, selection methods could be devised to permit removing that limitation. In one embodiment of the 0218. One embodiment of the present invention is a invention, Screening methods are used to identify additional method to produce desired bioactive molecules (also non-bacterial organisms having novel PKS Systems similar referred to as products or compounds) by growing or cul to the PUFA PKS system of Schizochytrium, as described turing a genetically modified microorganism or plant of the herein (see above). Homologous PKS systems identified in present invention (described in detail above). Such a method Such organisms can be used in methods similar to those includes the Step of culturing in a fermentation medium or described herein for the Schizochytrium, as well as for an growing in a Suitable environment, Such as Soil, a microor additional Source of genetic material from which to create, ganism or plant, respectively, that has a genetic modification further modify and/or mutate a PKS system for expression as described previously herein and in accordance with the in that microorganism, in another microorganism, or in a present invention. In a preferred embodiment, method to higher plant, to produce a variety of compounds. produce bioactive molecules of the present invention includes the Step of culturing under conditions effective to 0216. It is recognized that many genetic alterations, produce the bioactive molecule a genetically modified either random or directed, which one may introduce into a organism that expresses a PKS System comprising at least native (endogenous, natural) PKS System, will result in an one biologically active domain of a polyunsaturated fatty inactivation of enzymatic functions. A preferred embodi acid (PUFA) polyketide synthase (PKS) system. In this ment of the invention includes a System to Select for only preferred aspect, at least one domain of the PUFA PKS those modifications that do not block the ability of the PKS System is encoded by a nucleic acid Sequence Selected from System to produce a product. For example, the FabB-Strain the group consisting of: (a) a nucleic acid Sequence encoding of E. coli is incapable of Synthesizing unsaturated fatty acids at least one domain of a polyunsaturated fatty acid (PUFA) and requires Supplementation of the medium with fatty acids polyketide synthase (PKS) system from a Thraustochytrid that can Substitute for its normal unsaturated fatty acids in microorganism; (b) a nucleic acid sequence encoding at least order to grow (see Metz et al., 2001, Supra). However, this one domain of a PUFA PKS system from a microorganism requirement (for Supplementation of the medium) can be identified by the novel Screening method of the present removed when the strain is transformed with a functional invention (described above in detail); (c) a nucleic acid PUFA-PKS system (i.e. one that produces a PUFA product Sequence encoding an amino acid Sequence Selected from in the E. coli host-see (Metz et al., 2001, Supra, FIG. 2A). the group consisting of: SEQ ID NO:2, SEQ ID NO:4, SEQ The transformed FabB-strain now requires a functional ID NO:6, and biologically active fragments thereof; (d) a PUFA-PKS system (to produce the unsaturated fatty acids) nucleic acid Sequence encoding an amino acid Sequence for growth without Supplementation. The key element in this selected from the group consisting of: SEQ ID NO:8, SEQ example is that production of a wide range of unsaturated ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, fatty acid will Suffice (even unsaturated fatty acid Substitutes SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID Such as branched chain fatty acids). Therefore, in another NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically preferred embodiment of the invention, one could create a active fragments thereof; (e) a nucleic acid sequence encod US 2002/0194641 A1 Dec. 19, 2002

ing an amino acid Sequence that is at least about 60% according to the present invention. The compounds can be identical to at least 500 consecutive amino acids of an amino recovered through purification processes which extract the acid Sequence Selected from the group consisting of: SEQID compounds from the plant. In a preferred embodiment, the NO:2, SEQ ID NO:4, and SEQID NO:6; wherein the amino compound is recovered by harvesting the plant. In this acid Sequence has a biological activity of at least one domain embodiment, the plant can be consumed in its natural State of a PUFA PKS system; and, (f) a nucleic acid sequence or further processed into consumable products. encoding an amino acid Sequence that is at least about 60% 0221 AS described above, a genetically modified micro identical to an amino acid Sequence Selected from the group organism useful in the present invention can, in one aspect, consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID endogenously contain and express a PUFAPKS system, and NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, the genetic modification can be a genetic modification of one SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID or more of the functional domains of the endogenous PUFA NO:30, and SEQ ID NO:32; wherein the amino acid PKS system, whereby the modification has some effect on Sequence has a biological activity of at least one domain of the activity of the PUFAPKS system. In another aspect, such a PUFAPKS system. In this preferred aspect of the method, an organism can endogenously contain and express a PUFA the organism is genetically modified to affect the activity of PKS System, and the genetic modification can be an intro the PKS system (described in detail above). Preferred host duction of at least one exogenous nucleic acid Sequence cells for genetic modification related to the PUFA PKS (e.g., a recombinant nucleic acid molecule), wherein the system of the invention are described above. exogenous nucleic acid Sequence encodes at least one bio 0219. In the method of production of desired bioactive logically active domain or protein from a Second PKS compounds of the present invention, a genetically modified system and/or a protein that affects the activity of said PUFA microorganism is cultured or grown in a Suitable medium, PKS System (e.g., a phosphopantetheinyl transferases under conditions effective to produce the bioactive com (PPTase), discussed below). In yet another aspect, the organ pound. An appropriate, or effective, medium refers to any ism does not necessarily endogenously (naturally) contain a medium in which a genetically modified microorganism of PUFA PKS system, but is genetically modified to introduce the present invention, when cultured, is capable of produc at least one recombinant nucleic acid molecule encoding an ing the desired product. Such a medium is typically an amino acid Sequence having the biological activity of at least aqueous medium comprising assimilable carbon, nitrogen one domain of a PUFA PKS system. In this aspect, PUFA and phosphate Sources. Such a medium can also include PKS activity is affected by introducing or increasing PUFA appropriate Salts, minerals, metals and other nutrients. PKS activity in the organism. Various embodiments associ Microorganisms of the present invention can be cultured in ated with each of these aspects have been discussed in detail conventional fermentation bioreactors. The microorganisms above. can be cultured by any fermentation proceSS which includes, but is not limited to, batch, fed-batch, cell recycle, and 0222. In one embodiment of the method to produce continuous fermentation. Preferred growth conditions for bioactive compounds, the genetic modification changes at potential host microorganisms according to the present least one product produced by the endogenous PKS System, invention are well known in the art. The desired bioactive as compared to a wild-type organism. molecules produced by the genetically modified microor 0223) In another embodiment, the organism endog ganism can be recovered from the fermentation medium enously expresses a PKS System comprising the at least one using conventional Separation and purification techniques. biologically active domain of the PUFAPKS system, and the For example, the fermentation medium can be filtered or genetic modification comprises transfection of the organism centrifuged to remove microorganisms, cell debris and other with a recombinant nucleic acid molecule Selected from the particulate matter, and the product can be recovered from the group consisting of: a recombinant nucleic acid molecule cell-free Supernatant by conventional methods, Such as, for encoding at least one biologically active domain from a example, ion exchange, chromatography, extraction, Solvent Second PKS System and a recombinant nucleic acid mol extraction, membrane Separation, electrodialysis, reverse ecule encoding a protein that affects the activity of the PUFA oSmosis, distillation, chemical derivatization and crystalli PKS system. In this embodiment, the genetic modification Zation. Alternatively, microorganisms producing the desired preferably changes at least one product produced by the compound, or extracts and various fractions thereof, can be endogenous PKS System, as compared to a wild-type organ used without removal of the microorganism components ism. A second PKS system can include another PUFA PKS from the product. System (bacterial or non-bacterial), a type I PKS System, a 0220. In the method for production of desired bioactive type II PKS system, and/or a modular PKS system. compounds of the present invention, a genetically modified Examples of proteins that affect the activity of a PKS system plant is cultured in a fermentation medium or grown in a have been described above (e.g., PPTase). Suitable medium Such as Soil. An appropriate, or effective, fermentation medium has been discussed in detail above. A 0224. In another embodiment, the organism is genetically Suitable growth medium for higher plants includes any modified by transfection with a recombinant nucleic acid growth medium for plants, including, but not limited to, Soil, molecule encoding the at least one domain of the polyun Sand, any other particulate media that Support root growth saturated fatty acid (PUFA) polyketide synthase (PKS) (e.g. Vermiculite, perlite, etc.) or Hydroponic culture, as well System. Such recombinant nucleic acid molecules have been as Suitable light, water and nutritional Supplements which described in detail previously herein. optimize the growth of the higher plant. The genetically 0225. In another embodiment, the organism endog modified plants of the present invention are engineered to enously expresses a non-bacterial PUFA PKS system, and produce Significant quantities of the desired product through the genetic modification comprises Substitution of a domain the activity of the PKS system that is genetically modified from a different PKS system for a nucleic acid sequence US 2002/0194641 A1 Dec. 19, 2002 32 encoding at least one domain of the non-bacterial PUFA cholesterol lowering formulation. One advantage of the PKS System. In another embodiment, the organism endog non-bacterial APUFA PKS system of the present invention enously expresses a non-bacterial PUFA PKS system that is the ability of Such a System to introduce carbon-carbon has been modified by transfecting the organism with a double bonds in the cis configuration, and molecules includ recombinant nucleic acid molecule encoding a protein that ing a double bond at every third carbon. This ability can be regulates the chain length of fatty acids produced by the utilized to produce a variety of compounds. PUFA PKS system. In one aspect, the recombinant nucleic 0231 Preferably, bioactive compounds of interest are acid molecule encoding a protein that regulates the chain produced by the genetically modified microorganism in an length of fatty acids replaces a nucleic acid Sequence encod amount that is greater than about 0.05%, and preferably ing a chain length factor in the non-bacterial PUFA PKS greater than about 0.1%, and more preferably greater than System. In another aspect, the protein that regulates the chain about 0.25%, and more preferably greater than about 0.5%, length of fatty acids produced by the PUFA PKS system is and more preferably greater than about 0.75%, and more a chain length factor. In another aspect, the protein that preferably greater than about 1%, and more preferably regulates the chain length of fatty acids produced by the greater than about 2.5%, and more preferably greater than PUFA PKS system is a chain length factor that directs the about 5%, and more preferably greater than about 10%, and synthesis of C20 units. more preferably greater than about 15%, and even more 0226. In another embodiment, the organism expresses a preferably greater than about 20% of the dry weight of the non-bacterial PUFAPKS system comprising a genetic modi microorganism. For lipid compounds, preferably, Such com fication in a domain Selected from the group consisting of a pounds are produced in an amount that is greater than about domain encoding B-hydroxyacyl-ACP dehydrase (DH) and 5% of the dry weight of the microorganism. For other a domain encoding fB-ketoacyl-ACP synthase (KS), wherein bioactive compounds, Such as antibiotics or compounds that the modification alters the ratio of long chain fatty acids are Synthesized in Smaller amounts, those Strains possessing produced by the PUFA PKS system as compared to in the Such compounds at of the dry weight of the microorganism absence of the modification. In one aspect of this embodi are identified as predictably containing a novel PKS System ment, the modification is Selected from the group consisting of the type described above. In Some embodiments, particu of a deletion of all or a part of the domain, a Substitution of lar bioactive molecules (compounds) are Secreted by the a homologous domain from a different organism for the microorganism, rather than accumulating. Therefore, Such domain, and a mutation of the domain. bioactive molecules are generally recovered from the culture medium and the concentration of molecule produced will 0227. In another embodiment, the organism expresses a vary depending on the microorganism and the Size of the non-bacterial PUFA PKS system comprising a modification culture. in an enoyl-ACP reductase (ER) domain, wherein the modi fication results in the production of a different compound as 0232. One embodiment of the present invention relates to compared to in the absence of the modification. In one a method to modify an endproduct containing at least one aspect of this embodiment, the modification is Selected from fatty acid, comprising adding to Said endproduct an oil the group consisting of a deletion of all or a part of the ER produced by a recombinant host cell that expresses at least one recombinant nucleic acid molecule comprising a nucleic domain, a Substitution of an ER domain from a different acid Sequence encoding at least one biologically active organism for the ER domain, and a mutation of the ER domain of a PUFA PKS system. The PUFA PKS system is domain. any non-bacterial PUFA PKS system, and preferably, is 0228. In one embodiment of the method to produce a Selected from the group of: (a) a nucleic acid sequence bioactive molecule, the organism produces a polyunsatu encoding at least one domain of a polyunsaturated fatty acid rated fatty acid (PUFA) profile that differs from the naturally (PUFA) polyketide synthase (PKS) system from a Thraus occurring organism without a genetic modification. tochytrid microorganism; (b) a nucleic acid Sequence encod ing at least one domain of a PUFA PKS system from a 0229 Many other genetic modifications useful for pro microorganism identified by the novel Screening method ducing bioactive molecules will be apparent to those of Skill disclosed herein; (c) a nucleic acid sequence encoding an in the art, given the present disclosure, and various other amino acid Sequence Selected from the group consisting of: modifications have been discussed previously herein. The SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and biologi present invention contemplates any genetic modification cally active fragments thereof; (d) a nucleic acid sequence related to a PUFA PKS system as described herein which encoding an amino acid Sequence Selected from the group results in the production of a desired bioactive molecule. consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID 0230 Bioactive molecules, according to the present NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, invention, include any molecules (compounds, products, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID etc.) that have a biological activity, and that can be produced NO:30, SEQ ID NO:32, and biologically active fragments by a PKS System that comprises at least one amino acid thereof; (e) a nucleic acid sequence encoding an amino acid Sequence having a biological activity of at least one func sequence that is at least about 60% identical to at least 500 tional domain of a non-bacterial PUFA PKS system as consecutive amino acids of an amino acid Sequence Selected described herein. Such bioactive molecules can include, but from the group consisting of: SEQ ID NO:2, SEQ ID NO:4, are not limited to: a polyunsaturated fatty acid (PUFA), an and SEQ ID NO:6; wherein the amino acid sequence has a anti-inflammatory formulation, a chemotherapeutic agent, biological activity of at least one domain of a PUFA PKS an active excipient, an osteoporosis drug, an anti-depressant, System; and, (f) a nucleic acid sequence encoding an amino an anti-convulsant, an anti-Heliobactor pylori drug, a drug acid Sequence that is at least about 60% identical to an amino for treatment of neurodegenerative disease, a drug for treat acid Sequence Selected from the group consisting of: SEQID ment of degenerative liver disease, an antibiotic, and a NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, US 2002/0194641 A1 Dec. 19, 2002

SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID is at least about 60% identical to at least 500 consecutive NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID amino acids of an amino acid Sequence Selected from the NO:32, wherein the amino acid Sequence has a biological group consisting of: SEQ ID NO:2, SEQ ID NO:4, and SEQ activity of at least one domain of a PUFA PKS system. ID NO:6; wherein the amino acid sequence has a biological Variations of these nucleic acid Sequences have been activity of at least one domain of a PUFA PKS system; described in detail above. and/or (f) a nucleic acid Sequence encoding an amino acid 0233 Preferably, the endproduct is selected from the Sequence that is at least about 60% identical to an amino acid group consisting of a food, a dietary Supplement, a pharma Sequence Selected from the group consisting of: SEQ ID ceutical formulation, a humanized animal milk, and an NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, infant formula. Suitable pharmaceutical formulations SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID include, but are not limited to, an anti-inflammatory formu NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID lation, a chemotherapeutic agent, an active eXcipient, an NO:32, wherein the amino acid Sequence has a biological Osteoporosis drug, an anti-depressant, an anti-convulsant, an activity of at least one domain of a PUFA PKS system. anti-Heliobactor pylori drug, a drug for treatment of neu 0236 Methods to genetically modify a host cell and to rodegenerative disease, a drug for treatment of degenerative produce a genetically modified non-human, milk-producing liver disease, an antibiotic, and a cholesterol lowering for animal, are known in the art. Examples of host animals to mulation. In one embodiment, the endproduct is used to treat modify include cattle, sheep, pigs, goats, yaks, etc., which a condition Selected from the group consisting of chronic are amenable to genetic manipulation and cloning for rapid inflammation, acute inflammation, gastrointestinal disorder, expansion of a transgene expressing population. For ani cancer, cachexia, cardiac restenosis, neurodegenerative dis mals, PKS-like transgenes can be adapted for expression in order, degenerative disorder of the liver, blood lipid disorder, target organelles, tissues and body fluids through modifica Osteoporosis, osteoarthritis, autoimmune disease, preec tion of the gene regulatory regions. Of particular interest is lampsia, preterm birth, age related maculopathy, pulmonary the production of PUFAs in the breast milk of the host disorder, and peroxisomal disorder. animal. 0234 Suitable food products include, but are not limited 0237) The following examples are provided for the pur to, fine bakery wares, bread and rolls, breakfast cereals, pose of illustration and are not intended to limit the Scope of processed and unprocessed cheese, condiments (ketchup, the present invention. mayonnaise, etc.), dairy products (milk, yogurt), puddings and gelatine desserts, carbonated drinks, teas, powdered EXAMPLES beverage mixes, processed fish products, fruit-based drinks, chewing gum, hard confectionery, frozen dairy products, Example 1 processed meat products, nut and nut-based spreads, pasta, 0238. The following example describes the further analy processed poultry products, gravies and Sauces, potato chips sis of PKS related sequences from Schizochytrium. and other chips or crisps, chocolate and other confectionery, Soups and Soup mixes, Soya based products (milks, drinks, 0239). The present inventors have sequenced the genomic creams, whiteners), vegetable oil-based spreads, and veg DNA including the entire length of all three open reading etable-based drinks. frames (Orfs) in the Schizochytrium PUFA PKS system using the general methods outlined in Examples 8 and 9 0235 Yet another embodiment of the present invention from PCT Publication No. WO 0042195 and U.S. applica relates to a method to produce a humanized animal milk. tion Ser. No. 09/231,899. The biologically active domains in This method includes the Steps of genetically modifying the Schizochytrium PKS proteins are depicted graphically in milk-producing cells of a milk-producing animal with at FIG. 1. The domain structure of the Schizochytrium PUFA least one recombinant nucleic acid molecule comprising a PKS system is described more particularly as follows. nucleic acid Sequence encoding at least one biologically active domain of a PUFA PKS system. The PUFA PKS Open Reading Frame A (OrfA) system is a non-bacterial PUFAPKS system, and preferably, the at least one domain of the PUFAPKS system is encoded 0240 The complete nucleotide sequence for OrfA is by a nucleic acid Sequence Selected from the group consist represented herein as SEQ ID NO:1. OrfA is a 8730 nucle ing of: (a) a nucleic acid sequence encoding at least one otide Sequence (not including the stop codon) which encodes domain of a polyunsaturated fatty acid (PUFA) polyketide a 2910 amino acid sequence, represented herein as SEQ ID synthase (PKS) system from a Thraustochytrid microorgan NO:2. Within OrfA are twelve domains: ism; (b) a nucleic acid sequence encoding at least one 0241 (a) one f-keto acyl-ACP synthase (KS) domain of a PUFA PKS system from a microorganism domain; identified by the novel screening method described previ ously herein; (c) a nucleic acid sequence encoding an amino 0242 (b) one malonyl-CoA:ACP acyltransferase acid Sequence Selected from the group consisting of: SEQID (MAT) domain; NO:2, SEQID NO:4, SEQID NO:6, and biologically active 0243 (c) nine acyl carrier protein (ACP) domains; fragments thereof; (d) a nucleic acid sequence encoding an amino acid Sequence Selected from the group consisting of: 0244 (d) one ketoreductase (KR) domain. SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID 0245. The domains contained within OrfA have been NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, determined based on: SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments thereof; (e) a 0246 (1) results of an analysis with Pfam program nucleic acid Sequence encoding an amino acid Sequence that (Pfam is a database of multiple alignments of protein US 2002/0194641 A1 Dec. 19, 2002 34

domains or conserved protein regions. The align positions 575 and 600 of SEQID NO:2 (ORFA) to an ending ments represent Some evolutionary conserved Struc point of between about positions 935 and 1000 of SEQ ID ture that has implications for the protein's function. NO:2. The amino acid sequence containing the ORFA-MAT Profile hidden Markov models (profile HMMs) built domain is represented herein as SEQ ID NO:10 (positions from the Pfam alignments can be very useful for 575-1000 of SEQ ID NO:2). It is noted that the ORFA-MAT automatically recognizing that a new protein belongs domain contains an active site motif: GHS*XG (* acyl to an existing protein family, even if the homology is binding site Soe), represented herein as SEQ ID NO:11. weak. Unlike Standard pairwise alignment methods (e.g. BLAST, FASTA), Pfam HMMs deal sensibly ORFA-ACPit 1-9 with multidomain proteins. The reference provided 0251 Domains 3-11 of OrfA are nine tandem ACP for the Pfam version used is: Bateman A, Birney E, domains, also referred to herein as ORFA-ACP (the first Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths domain in the sequence is ORFA-ACP1, the second domain Jones S. Howe KL, Marshall M, Sonnhammer EL is ORFA-ACP2, the third domain is ORFA-ACP3, etc.). The (2002) Nucleic Acids Research 30(1):276-280); and/ first ACP domain, ORFA-ACP1, is contained within the O nucleotide Sequence Spanning from about position 3343 to 0247 (2) homology comparison to bacterial PUFA about position3600 of SEQID NO:1 (OrfA). The nucleotide PKS systems (e.g., Shewanella) using a BLAST 2.0 Sequence containing the Sequence encoding the ORFA Basic BLAST homology search using blastp for ACP1 domain is represented herein as SEQ ID NO:12 amino acid Searches with Standard default param (positions 3343-3600 of SEQ ID NO:1). The amino acid eters, wherein the query Sequence is filtered for low Sequence containing the first ACP domain spans from about complexity regions by default (described in Altschul, position 1115 to about position 1200 of SEQ ID NO:2. The S. F., Madden, T. L., Schaaffer, A. A., Zhang, J., amino acid Sequence containing the ORFA-ACP1 domain is Zhang, Z., Miller, W. & Lipman, D. J. (1997) represented herein as SEQ ID NO:13 (positions 1115-1200 “Gapped BLAST and PSI-BLAST: a new generation of SEQ ID NO:2). It is noted that the ORFA-ACP1 domain of protein database Search programs. Nucleic Acids contains an active site motif: LGIDS* (*pantetheine binding Res. 25:3389-3402, incorporated herein by reference motif Ss), represented herein by SEQ ID NO:14. The in its entirety). nucleotide and amino acid Sequences of all nine ACP domains are highly conserved and therefore, the Sequence 0248 Sequences provided for individual domains are for each domain is not represented herein by an individual believed to contain the full length of the Sequence encoding Sequence identifier. However, based on this information, one a functional domain, and may contain additional flanking of skill in the art can readily determine the Sequence for each sequence within the Orf. of the other eight ACP domains. The repeat interval for the nine domains is approximately about 110 to about 330 ORFA-KS nucleotides of SEO ID NO:1. 0249. The first domain in OrfA is a KS domain, also 0252 All nine ACP domains together span a region of referred to herein as ORFA-KS. This domain is contained OrfA of from about position 3283 to about position 6288 of within the nucleotide Sequence Spanning from a starting SEQ ID NO:1, which corresponds to amino acid positions of point of between about positions 1 and 40 of SEQ ID NO:1 from about 1095 to about 2096 of SEQID NO:2. This region (OrfA) to an ending point of between about positions 1428 includes the linker segments between individual ACP and 1500 of SEQ ID NO:1. The nucleotide sequence con domains. Each of the nine ACP domains contains a panteth taining the Sequence encoding the ORFA-KS domain is eine binding motif LGIDS (represented herein by SEQ ID represented herein as SEQ ID NO:7 (positions 1-1500 of NO:14), wherein * is the pantetheine binding site S. At each SEQ ID NO:1). The amino acid sequence containing the KS end of the ACP domain region and between each ACP domain spans from a starting point of between about posi domain is a region that is highly enriched for proline (P) and tions 1 and 14 of SEQ ID NO:2 (ORFA) to an ending point alanine (A), which is believed to be a linker region. For of between about positions 476 and 500 of SEQ ID NO:2. example, between ACP domains 1 and 2 is the Sequence: The amino acid Sequence containing the ORFA-KS domain APAPVKAAAPAAPVASAPAPA, represented herein as is represented herein as SEQ ID NO:8 (positions 1-500 of SEO ID NO:15. SEQ ID NO:2). It is noted that the ORFA-KS domain contains an active site motif: DXAC* (*acyl binding site ORFA-KR C21s). 0253 Domain 12 in OrfA is a KR domain, also referred to herein as ORFA-KR. This domain is contained within the OREA-MAT nucleotide Sequence Spanning from a starting point of about 0250) The second domain in OrfA is a MAT domain, also position 6598 of SEQ ID NO:1 to an ending point of about referred to herein as ORFA-MAT. This domain is contained position 8730 of SEQ ID NO:1. The nucleotide sequence within the nucleotide Sequence Spanning from a starting containing the Sequence encoding the ORFA-KR domain is point of between about positions 1723 and 1798 of SEQ ID represented herein as SEQ ID NO:17 (positions 6598-8730 NO:1 (OrfA) to an ending point of between about positions of SEQ ID NO:1). The amino acid sequence containing the 2805 and 3000 of SEQ ID NO:1. The nucleotide sequence KR domain spans from a Starting point of about position containing the Sequence encoding the ORFA-MAT domain 2200 of SEQ ID NO:2 (ORFA) to an ending point of about is represented herein as SEQID NO:9 (positions 1723-3000 position 2910 of SEQ ID NO:2. The amino acid sequence of SEQ ID NO:1). The amino acid sequence containing the containing the ORFA-KR domain is represented herein as MAT domain spans from a starting point of between about SEQ ID NO:18 (positions 2200-2910 of SEQ ID NO:2). US 2002/0194641 A1 Dec. 19, 2002

Within the KR domain is a core region with homology to domain is represented herein as SEQ ID NO:22 (positions Short chain aldehyde-dehydrogenases (KR is a member of 460-900 of SEQ ID NO:4). It is noted that the ORFB-CLF this family). This core region spans from about position domain contains a KS active site motif without the acyl 7198 to about position 7500 of SEQ ID NO:1, which binding cysteine. corresponds to amino acid positions 2400-2500 of SEQ ID NO:2. ORFB-AT 0262 The third domain in OrfB is an AT domain, also Open Reading Frame B (OrfB) referred to herein as ORFB-AT. This domain is contained 0254 The complete nucleotide sequence for OrfB is within the nucleotide Sequence Spanning from a starting represented herein as SEQ ID NO:3. OrfB is a 6177 nucle point of between about positions 2701 and 3598 of SEQ ID otide Sequence (not including the stop codon) which encodes NO:3 (OrfB) to an ending point of between about positions a 2059 amino acid sequence, represented herein as SEQ ID 3975 and 4200 of SEQ ID NO:3. The nucleotide sequence NO:4. Within OrfB are four domains: containing the Sequence encoding the ORFB-AT domain is represented herein as SEQ ID NO:23 (positions 2701-4200 0255 (a) B-ketoacyl-ACP synthase (KS) domain; of SEQ ID NO:3). The amino acid sequence containing the 0256 (b) one chain length factor (CLF) domain; AT domain spans from a starting point of between about positions 901 and 1200 of SEQ ID NO:4 (ORFB) to an 0257 (c) one acyl transferase (AT) domain; ending point of between about positions 1325 and 1400 of SEQ ID NO:4. The amino acid sequence containing the 0258 (d) one enoyl ACP-reductase (ER) domain. ORFB-AT domain is represented herein as SEQ ID NO:24 0259. The domains contained within ORFB have been (positions 901-1400 of SEQ ID NO:4). It is noted that the determined based on: (1) results of an analysis with Pfam ORFB-AT domain contains an AT active site motif of program, described above; and/or (2) homology comparison GXS*XG (*acyl binding site So). to bacterial PUFA-PKS systems (e.g., Shewanella) using a BLAST 2.0 Basic BLAST homology search, also described ORFB-ER above. Sequences provided for individual domains are 0263. The fourth domain in OrfB is an ER domain, also believed to contain the full length of the Sequence encoding referred to herein as ORFB-ER. This domain is contained a functional domain, and may contain additional flanking within the nucleotide Sequence Spanning from a starting sequence within the Orf. point of about position 4648 of SEQ ID NO:3 (OrfB) to an ending point of about position 6177 of SEQ ID NO:3. The ORFB-KS nucleotide Sequence containing the Sequence encoding the 0260 The first domain in OrfB is a KS domain, also ORFB-ER domain is represented herein as SEQ ID NO:25 referred to herein as ORFB-KS. This domain is contained (positions 4648-6177 of SEQ ID NO:3). The amino acid within the nucleotide Sequence Spanning from a starting Sequence containing the ER domain spans from a starting point of between about positions 1 and 43 of SEQ ID NO:3 point of about position 1550 of SEQ ID NO:4 (ORFB) to an (OrfB) to an ending point of between about positions 1332 ending point of about position 2059 of SEQ ID NO:4. The and 1350 of SEQ ID NO:3. The nucleotide sequence con amino acid Sequence containing the ORFB-ER domain is taining the Sequence encoding the ORFB-KS domain is represented herein as SEQ ID NO:26 (positions 1550-2059 represented herein as SEQ ID NO:19 (positions 1-1350 of of SEQ ID NO:4). SEQ ID NO:3). The amino acid sequence containing the KS domain spans from a starting point of between about posi Open Reading Frame C(OrfC) tions 1 and 15 of SEQID NO:4 (ORFB) to an ending point 0264. The complete nucleotide sequence for Orfo is of between about positions 444 and 450 of SEQ ID NO:4. represented herein as SEQ ID NO:5. is Orfo is a 4509 The amino acid sequence containing the ORFB-KS domain nucleotide Sequence (not including the stop codon) which is represented herein as SEQ ID NO:20 (positions 1-450 of encodes a 1503 amino acid Sequence, represented herein as SEQ ID NO:4). It is noted that the ORFB-KS domain SEO ID NO:6. Within Orf are three domains: contains an active site motif: DXAC* (*acyl binding site 0265 (a) two FabA-like B-hydroxyacyl-ACP dehy Coo). drase (DH) domains; ORFB-CLF 0266 (b) one enoyl ACP-reductase (ER) domain. 0261) The second domain in OrfB is a CLF domain, also 0267. The domains contained within ORFC have been referred to herein as ORFB-CLF. This domain is contained determined based on: (1) results of an analysis with Pfam within the nucleotide Sequence Spanning from a starting program, described above; and/or (2) homology comparison point of between about positions 1378 and 1402 of SEQ ID to bacterial PUFA-PKS systems (e.g., Shewanella) using a NO:3 (OrfB) to an ending point of between about positions BLAST 2.0 Basic BLAST homology search, also described 2682 and 2700 of SEQ ID NO:3. The nucleotide sequence above. Sequences provided for individual domains are containing the Sequence encoding the ORFB-CLF domain is believed to contain the full length of the Sequence encoding represented herein as SEQ ID NO:21 (positions 1378-2700 a functional domain, and may contain additional flanking of SEQ ID NO:3). The amino acid sequence containing the sequence within the Orf. CLF domain spans from a starting point of between about positions 460 and 468 of SEQID NO:4 (ORFB) to an ending ORFC-DH1 point of between about positions 894 and 900 of SEQ ID 0268. The first domain in Orfo is a DH domain, also NO:4. The amino acid sequence containing the ORFB-CLF referred to herein as ORFC-DH1. This is one of two DH US 2002/0194641 A1 Dec. 19, 2002 36 domains in Orfo, and therefore is designated DH1. This 0274. Two mL of a culture of the strain/microorganism to domain is contained within the nucleotide Sequence Span be tested is placed in 250 mL baffled shake flask with 50 mL ning from a starting point of between about positions 1 and culture media (aerobic treatment) and another 2 mL of 778 of SEQ ID NO:5 (OrfQ) to an ending point of between culture of the same strain is placed in a 250 mL non-baffled about positions 1233 and 1350 of SEQ ID NO:5. The shake flask with 200 mL culture medium (anoxic treatment). nucleotide Sequence containing the Sequence encoding the Both flasks are placed on a shaker table at 200 rpm. After ORFC-DH1 domain is represented herein as SEQID NO:27 48-72 hr of culture time, the cultures are harvested by (positions 1-1350 of SEQ ID NO:5). The amino acid centrifugation and the cells analyzed for fatty acid methyl Sequence containing the DH1 domain spans from a starting esters via gas chromatography to determine the following point of between about positions 1 and 260 of SEQID NO:6 data for each culture: (1) fatty acid profile; (2) PUFA (ORFC) to an ending point of between about positions 411 content; (3) fat content (estimated as amount total fatty acids and 450 of SEQ ID NO:6. The amino acid sequence con (TFA)). taining the ORFC-DH1 domain is represented herein as SEQ 0275. These data are then analyzed asking the following ID NO:28 (positions 1-450 of SEQ ID NO:6). five questions: ORFC-DH2 0276 Selection Criteria: Low O/Anoxic Flask vs. Aero 0269. The second domain in Orfo is a DH domain, also bic Flask (Yes/No) referred to herein as ORFC-DH2. This is the second of two 0277 (1) Did the DHA (or other PUFA content) (as % DH domains in Orfo, and therefore is designated DH2. This FAME) stay about the same or preferably increase in the low domain is contained within the nucleotide Sequence Span oxygen culture compared to the aerobic culture? ning from a starting point of between about positions 1351 and 2437 of SEQ ID NO:5 (OrfC) to an ending point of 0278 (2) Is C14:0+C16:0+C16:1 greater than about 40% between about positions 2607 and 2850 of SEQ ID NO:5. TFA in the anoxic culture? The nucleotide Sequence containing the Sequence encoding the ORFC-DH2 domain is represented herein as SEQ ID 0279 (3) Is there very little (>1% as FAME) or no NO:29 (positions 1351-2850 of SEQ ID NO:5). The amino precursors (C18:3n-3+C18:2n-6+Cc18:3n-6) to the conven acid Sequence containing the DH2 domain spans from a tional oxygen dependent elongase/desaturase pathway in the starting point of between about positions 451 and 813 of anoxic culture? SEQ ID NO:6 (ORFC) to an ending point of between about 0280 (4) Did fat content (as amount total fatty acids/cell positions 869 and 950 of SEQ ID NO:6. The amino acid dry Weight) increase in the low oxygen culture compared to sequence containing the ORFC-DH2 domain is represented the aerobic culture? herein as SEQ ID NO:30 (positions 451-950 of SEQ ID 0281 (5) Did DHA (or other PUFA content) increase as NO:6). % cell dry weight in the low oxygen culture compared to the ORFC-ER aerobic culture? 0270. The third domain in Orfo is an ER domain, also 0282) If first three questions are answered yes, there is a referred to herein as ORFC-ER. This domain is contained good indication that the Strain contains a PKS genetic within the nucleotide Sequence Spanning from a starting System for making long chain PUFAS. The more questions point of about position 2998 of SEQ ID NO:5 (OrfQ) to an that are answered yes (preferably the first three questions ending point of about position 4509 of SEQ ID NO:5. The must be answered yes), the stronger the indication that the nucleotide Sequence containing the Sequence encoding the strain contains such a PKS genetic system. If all five ORFC-ER domain is represented herein as SEQ ID NO:31 questions are answered yes, then there is a very Strong (positions 2998-4509 of SEQ ID NO:5). The amino acid indication that the Strain contains a PKS genetic System for Sequence containing the ER domain spans from a starting making long chain PUFAS. point of about position 1000 of SEQ ID NO:6 (ORFC) to an 0283 Following the method outlined above, a frozen vial ending point of about position 1502 of SEQ ID NO:6. The of Thraustochytrium sp. 23B (ATCC 20892) was used to amino acid Sequence containing the ORFC-ER domain is inoculate a 250 mL shake flask containing 50 mL of RCA represented herein as SEQ ID NO:32 (positions 1000-1502 medium. The culture was shaken on a shaker table (200 rpm) of SEQ ID NO:6). for 72 hr at 25 C. RCA medium contains the following: Example 2 0271 The following example describes the use of the Screening process of the present invention to identify three RCA Medium other non-bacterial organisms comprising a PUFA PKS Deionized water 1000 mL. System according to the present invention. Reef Crystals (R) sea salts 40 g/L Glucose 20 g/L 0272. Thraustochytrium sp. 23B (ATCC 20892) was cul Monosodium glutamate (MSG) 20 g/L tured according to the Screening method described in U.S. Yeast extract 1 g/L PII metals* 5 mL/L Provisional Application Serial No. 60/298,796 and as Vitamin mix 1 mLL described in detail herein. pH 7.0

0273 The biorational screen (using shake flask cultures) PII metal mix and vitamin mix are same as tose outlined in U.S. Pat. No. developed for detecting microorganisms containing PUFA 5,130,742, incorporated herein by reference in its entirety. producing PKS systems is as follows: US 2002/0194641 A1 Dec. 19, 2002 37

0284. 25 mL of the 72 hr old culture was then used to 23B library, indicating extensive homology between the inoculate another 250 mL shake flask containing 50 mL of Schizochytrium PUFA-PKS genes and the genes of Thraus low nitrogen RCA medium (10 g/L MSG instead of 20 g/L) tochytrium 23B. and the other 25 mL of culture was used to inoculate a 250 0290. In summary, these results demonstrate that Thraus mL shake flask containing 175 mL of low-nitrogen RCA tochytrium 23B genomic DNA contains Sequences that are medium. The two flasks were then placed on a shaker table homologous to PKS genes from Schizochytrium 20888. (200 rpm) for 72 hr at 25°C. The cells were then harvested via centrifugation and dried by lyophilization. The dried 0291. This Thraustochytrid microorganism is encom cells were analyzed for fat content and fatty acid profile and passed herein as an additional Sources of these genes for use content using Standard gas chromatograph procedures (Such in the embodiments above. as those outlined in U.S. Pat. No. 5,130,742). 0292. Thraustochytrium 23B (ATCC 20892) is signifi 0285) The screening results for Thraustochytrium 23B cantly different from Schizochytrium sp. (ATCC 20888) in were as follows: its fatty acid profile. Thraustochytrium 23B can have DHA: DPA(n-6) ratios as high as 14:1 compared to only 2-3:1 in Schizochytrium (ATCC 20888). Thraustochytrium 23B can also have higher levels of C20:5(n-3). Analysis of Did DHA as % FAME increase Yes (38->44%) the domains in the PUFA PKS system of Thraustochytrium C14:0 + C16:0 + C16:1 greater than about Yes (44%) 40% TEA 23B in comparison to the known Schizochytrium PUFA No C18:3(n-3) or C18:3(n - 6)? Yes (0%) PKS system should provide us with key information on how Did fat content increase? Yes (2-fold increase) to modify these domains to influence the ratio and types of Did DHA (or other HUFA content increase)? Yes (2.3-fold increase) PUFA produced using these systems. 0293. The screening method described above has been utilized the identify other potential candidate Strains con 0286 The results, especially the significant increase in taining a PUFA PKS system. Two additional strains that DHA content (as % FAME) under low oxygen conditions, have been identified by the present inventors to have PUFA conditions, Strongly indicates the presence of a PUFA pro PKS systems are Schizochytrium limacium (SR21) Honda & ducing PKS system in this strain of Thraustochytrium. Yokochi (IFO32693)and Ulkenia (BP-5601). Both were 0287. In order to provide additional data confirming the Screened as above but in N2 media (glucose: 60 g/L, presence of a PUFA PKS system, Southern blot of Thraus KHPO: 4.0 g/l, yeast extract: 1.0 g/L, corn Steep liquor: 1 tochytrium 23B was conducted using PKS probes from mL/L, NHNO: 1.0 g/L, artificial sea salts (Reef Crystals): Schizochytrium strain 20888, a strain which has already 20 g/L, all above concentrations mixed in deionized water). been determined to contain a PUFA producing PKS system For both the Schizochytrium and Ulkenia strains, the (i.e., SEQ ID Nos: 1-32 described above). Fragments of answers to the first three Screen questions discussed above Thraustochytrium 23B genomic DNA which are homolo for Thraustochytrium 23B was yes (Schizochytrium-DHA gous to hybridization probes from PKS PUFA synthesis % FAME 32->41% aerobic vs anoxic, 58%. 14:0/16:0/16:1, genes were detected using the Southern blot technique. 0% precursors) and (Ulkenia-DHA% FAME 28->44% Thraustochytrium 23B genomic DNA was digested with aerobic vs anoxic, 63%. 14:0/16:0/16: 1, 0% precursors), either Clal or KpnI restriction endonucleases, Separated by indicating that these Strains are good candidates for contain agarose gel electrophoresis (0.7% agarose, in Standard Tris ing a PUFA PKS system. Negative answers were obtained Acetate-EDTA buffer), and blotted to a Schleicher & Schuell for the final two questions for each Strain: fat decreased from Nytran Supercharge membrane by capillary transfer. Two 61% dry wt to 22% dry weight, and DHA from 21-9% dry digoxigenin labeled hybridization probes were used-one weight in S. limacium and fat decreased from 59 to 21% dry specific for the Enoyl Reductase (ER) region of weight in Ulkenia and DHA from 16% to 9% dry weight. Schizochytrium PKS Orf B (nucleotides 5012-5511 of Orf These Thraustochytrid microorganisms are also claimed B; SEQ ID NO:3), and the other specific for a conserved herein as additional Sources of the genes for use in the region at the beginning of Schizochytrium PKS Orf C embodiments above. (nucleotides 76-549 of Orfo: SEQ ID NO:5). Example 3 0288 The OrfB-ER probe detected an approximately 0294 The following example demonstrates that DHA 13kb Clal fragment and an approximately 3.6 kb KpnI and DPA synthesis in Schizochytrium does not involve fragment in the Thraustochytrium 23B genomic DNA. The membrane-bound desaturases or fatty acid elongation OrfC probe detected an approximately 7.5 kb Clal fragment enzymes like those described for other eukaryotes (Parker and an approximately 4.6 kb KpnI fragment in the Thraus Barnes et al., 2000, Supra; Shanklin et al., 1998, Supra). tochytrium 23B genomic DNA. 0295 Schizochytrium accumulates large quantities of 0289 Finally, a recombinant genomic library, consisting triacylglycerols rich in DHA and docosapentaenoic acid of DNA fragments from Thraustochytrium 23B genomic (DPA; 22:5c)6); e.g.,30% DHA+DPA by dry weight. In DNA inserted into vector lambda FIX II (Stratagene), was eukaryotes that synthesize 20- and 22-carbon PUFAS by an Screened using digoxigenin labeled probes corresponding to elongation/desaturation pathway, the pools of 18-, 20- and the following segments of Schizochytrium 20888 PUFA 22-carbon intermediates are relatively large So that in vivo PKS genes: nucleotides 7385-7879 of OrfA(SEQID NO:1), labeling experiments using "C-acetate reveal clear pre nucleotides 5012-5511 of Orf B (SEQ ID NO:3), and cursor-product kinetics for the predicted intermediates. Fur nucleotides 76-549 of Orf C (SEQ ID NO:5). Each of these thermore, radiolabeled intermediateS provided exogenously probes detected positive plaques from the Thraustochytrium to such organisms are converted to the final PUFA products. US 2002/0194641 A1 Dec. 19, 2002 38

0296) 1-"Clacetate was supplied to a 2-day-old culture Equivalent aliquots of total homogenate, pellet (H-Spellet), as a Single pulse at Zero time. Samples of cells were then and Supernatant (H-S Super) fractions were incubated in harvested by centrifugation and the lipids were extracted. In homogenization buffer supplemented with 20 uM acetyl addition, 1-'Clacetate uptake by the cells was estimated by CoA, 100 uM 1-'Cmalonyl-CoA (0.9 Gbd/mol), 2mM measuring the radioactivity of the Sample before and after NADH, and 2 mM NADPH for 60 min at 25° C. Assays centrifugation. Fatty acid methyl esters derived from the were extracted and fatty acid methyl esters were prepared total cell lipids were separatedby AgNO-TLC (solvent, and Separated as described above before detection of radio hexane:diethyl ether:acetic acid, 70:30:2 by volume). The activity with an Instantimager (Packard Instruments, Meri identity of the fatty acid bands was verified by gas chroma tography, and the radioactivity in them was measured by den, Conn.). Results showed that a cell-free homogenate scintillation counting. Results showed that 1-'C-acetate derived from Schizochytrium cultures incorporated 1-'C- was rapidly taken up by Schizochytrium cells and incorpo malonyl-CoA into DHA, DPA, and saturated fatty acids rated into fatty acids, but at the shortest labeling time (1 min) (data not shown). The same biosynthetic activities were DHA contained 31% of the label recovered in fatty is acids retained by a 100,000xg Supernatant fraction but were not and this percentage remained essentially unchanged during present in the membrane pellet. These data contrast with the 10-15 min of 'C-acetate incorporation and the sub those obtained during assays of the bacterial enzymes (see Sequent 24 hours of culture growth (data not shown). Simi Metz et al., 2001, Supra) and may indicate use of a different larly, DPA represented 10% of the label throughout the (soluble) acyl acceptor molecule. Thus, DHA and DPA experiment. There is no evidence for a precursor-product Synthesis in Schizochytrium does not involve membrane relationship between 16- or 18-carbon fatty acids and the bound desaturases or fatty acid elongation enzymes like 22-carbon polyunsaturated fatty acids. These results are those described for other eukaryotes. consistent with rapid synthesis of DHA from "C-acetate 0298 While various embodiments of the present inven involving very Small (possibly enzyme-bound) pools of tion have been described in detail, it is apparent that modi intermediates. fications and adaptations of those embodiments will occur to 0297 Next, cells were disrupted in 100 mM phosphate those skilled in the art. It is to be expressly understood, buffer (pH 7.2), containing 2 mM DTT, 2 mM EDTA, and however, that Such modifications and adaptations are within 10% glycerol, by vortexing with glass beads. The cell-free the Scope of the present invention, as Set forth in the homogenate was centrifuged at 100,000 g for 1 hour. following claims.

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS : 37

<21 Oc SEQ ID NO 1 <211 LENGTH 8730 <212> TYPE DNA <213> ORGANISM: Schizochytrium sp. <22O > FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (87.30) OTHER INFORMATION:

<400 SEQUENCE: 1

atg gcg gcc cgt citg cag gag Cala aag gga ggC gag atg gat acc cgc 48 Met Ala Ala Arg Lieu Glin Glu Glin Lys Gly Gly Glu Met Asp Thr Arg 1 5 10 15 att gcc atc atc ggc atg tog gcc atc citc. cc c tgc gg C acg acc gtg 96 Ile Ala Ile Ile Gly Met Ser Ala Ile Teu Pro Cys Gly Thr Thr Wall 2O 25 30 cgc gag tog tgg gag acc atc cgc gcc ggC atc gac tgc citg tog gat 144 Arg Glu Ser Trp Glu Thr Ile Arg Ala Gly Ile Asp Cys Teu Ser Asp 35 40 45 citc. ccc. gag gac cgc gtc gac gtg acg gCg tac titt gac ccc. gto aag 192 Teu Pro Glu Asp Arg Wall Asp Wall Thr Ala Tyr Phe Asp Pro Wall Lys 5 O 55 60 acc acc aag gac aag atc tac tgc aag cgc ggit ggC titc. att ccc. gag 240 Thr Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75 8O tac gac titt gac gcc cgc gag titc. gga citc. a.a. C. atg titc. cag atg gag 288 Asp Phe Asp Ala Arg Glu Phe Gly Teu Asn Met Phe Glin Met Glu 85 90 95 gac tog gac gca a.a. C. cag acc atc tog citt citc. aag gtc aag gag gcc 336 Asp Ser Asp Ala Asn Glin Thr Ile Ser Teu Telu Lys Wall Lys Glu Ala 100 105 110 citc. cag gac gcc ggC atc gac gcc citc. ggC aag gaa aag aag aac atc 384 Teu Glin Asp Ala Gly Ile Asp Ala Teu Gly Lys Glu Lys Lys Asn Ile

US 2002/0194641 A1 Dec. 19, 2002 46

-continued cgc cqc acg citc ggc cag gCt gcg ctic coc aac tog atc cag cqc 8.559 Arg Arg Thr Lieu Gly Glin Ala Ala Lieu Pro Asn. Ser Ile Glin Arg 284 O 284.5 285 O atc gtc. cag cac cqc cog gtc. cc g cag gac aag ccc titc tac att 86O4 Ile Val Gln His Arg Pro Val Pro Glin Asp Llys Pro Phe Tyr Ile 2855 2.860 2865 acc ctic cqc toc aac cag to g g g c ggit cac tocc cag cac aag cac 8649 Thr Lieu Arg Ser Asn Glin Ser Gly Gly His Ser Glin His Lys His 2870 2875 2880 gcc citt cag titc. cac aac gag cag ggc gat citc titc att gat gtc 86.94 Ala Leu Glin Phe His Asn. Glu Glin Gly Asp Leu Phe Ile Asp Wall 2.885 2890 2.895 cag gCt to g g to atc goc acg gac agc citt gcc titc 873 O Glin Ala Ser Val Ile Ala Thr Asp Ser Leu Ala Phe 29 OO 2905 2.910

SEQ ID NO 2 LENGTH 2.910 TYPE PRT ORGANISM: Schizochytrium sp. SEQUENCE: 2 Met Ala Ala Arg Lieu Glin Glu Gln Lys Gly Gly Glu Met Asp Thr Arg 1 5 10 15 Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Cys Gly Thr Thr Val 2O 25 30 Arg Glu Ser Trp Glu Thir Ile Arg Ala Gly Ile Asp Cys Lieu Ser Asp 35 40 45 Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Wall Lys 50 55 60 Thir Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75 8O Tyr Asp Phe Asp Ala Arg Glu Phe Gly Lieu. Asn Met Phe Glin Met Glu 85 90 95 Asp Ser Asp Ala Asn Glin Thr Ile Ser Lieu Lleu Lys Wall Lys Glu Ala 100 105 110 Leu Glin Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn. Ile 115 120 125 Gly Cys Val Lieu Gly Ile Gly Gly Gly Glin Lys Ser Ser His Glu Phe 130 135 1 4 0 Tyr Ser Arg Lieu. Asn Tyr Val Val Val Glu Lys Val Lieu Arg Lys Met 145 15 O 155 160 Gly Met Pro Glu Glu Asp Wall Lys Val Ala Val Glu Lys Tyr Lys Ala 1.65 170 175 Asn Phe Pro Glu Trp Arg Lieu. Asp Ser Phe Pro Gly Phe Leu Gly Asn 18O 185 190 Val Thr Ala Gly Arg Cys Thr Asn. Thir Phe Asn Lieu. Asp Gly Met Asn 195 200 2O5 Cys Val Val Asp Ala Ala Cys Ala Ser Ser Lieu. Ile Ala Wall Lys Wal 210 215 220 Ala Ile Asp Glu Lieu Lleu Tyr Gly Asp Cys Asp Met Met Val Thr Gly 225 230 235 240 Ala Thr Cys Thr Asp Asn Ser Ile Gly Met Tyr Met Ala Phe Ser Lys 245 250 255 Thr Pro Val Phe Ser Thr Asp Pro Ser Val Arg Ala Tyr Asp Glu Lys 260 265 27 O Thr Lys Gly Met Lieu. Ile Gly Glu Gly Ser Ala Met Lieu Val Lieu Lys US 2002/0194641 A1 Dec. 19, 2002 47

-continued

275 280 285 Arg Tyr Ala Asp Ala Val Arg Asp Gly Asp Glu Ile His Ala Val Ile 29 O 295 3OO Arg Gly Cys Ala Ser Ser Ser Asp Gly Lys Ala Ala Gly Ile Tyr Thr 305 310 315 320 Pro Thir Ile Ser Gly Glin Glu Glu Ala Lieu Arg Arg Ala Tyr Asn Arg 325 330 335 Ala Cys Val Asp Pro Ala Thr Val Thr Leu Val Glu Gly His Gly Thr 340 345 350 Gly Thr Pro Val Gly Asp Arg Ile Glu Lieu. Thr Ala Lieu Arg Asn Lieu 355 360 365 Phe Asp Lys Ala Tyr Gly Glu Gly Asn Thr Glu Lys Val Ala Val Gly 370 375 38O Ser Ile Lys Ser Ser Ile Gly His Lieu Lys Ala Val Ala Gly Lieu Ala 385 390 395 400 Gly Met Ile Llys Val Ile Met Ala Lieu Lys His Lys Thr Lieu Pro Gly 405 410 415 Thir Ile Asin Val Asp Asn Pro Pro Asn Leu Tyr Asp Asn Thr Pro Ile 420 425 430 Asn Glu Ser Ser Leu Tyr Ile Asn Thr Met Asn Arg Pro Trp Phe Pro 435 4 40 4 45 Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly 450 455 460 Gly Ala Asn Tyr His Ala Wall Leu Glu Glu Ala Glu Pro Glu His Thr 465 470 475 480 Thr Ala Tyr Arg Leu Asn Lys Arg Pro Glin Pro Val Leu Met Met Ala 485 490 495 Ala Thr Pro Ala Ala Lieu Glin Ser Lieu. Cys Glu Ala Glin Leu Lys Glu 5 OO 505 510 Phe Glu Ala Ala Ile Lys Glu Asn. Glu Thr Val Lys Asn. Thir Ala Tyr 515 52O 525 Ile Lys Cys Val Lys Phe Gly Glu Glin Phe Lys Phe Pro Gly Ser Ile 530 535 540 Pro Ala Thr Asn Ala Arg Lieu Gly Phe Lieu Val Lys Asp Ala Glu Asp 545 550 555 560 Ala Cys Ser Thr Lieu Arg Ala Ile Cys Ala Glin Phe Ala Lys Asp Val 565 570 575 Thr Lys Glu Ala Trp Arg Lieu Pro Arg Glu Gly Val Ser Phe Arg Ala 58O 585 59 O Lys Gly Ile Ala Thr Asn Gly Ala Val Ala Ala Leu Phe Ser Gly Glin 595 600 605 Gly Ala Glin Tyr Thr His Met Phe Ser Glu Val Ala Met Asn Trp Pro 610 615 62O Glin Phe Arg Glin Ser Ile Ala Ala Met Asp Ala Ala Glin Ser Lys Wal 625 630 635 640 Ala Gly Ser Asp Lys Asp Phe Glu Arg Val Ser Glin Val Lieu. Tyr Pro 645 650 655 Arg Llys Pro Tyr Glu Arg Glu Pro Glu Glin Asn Pro Lys Lys Ile Ser 660 665 670 Lieu. Thir Ala Tyr Ser Glin Pro Ser Thr Lieu Ala Cys Ala Leu Gly Ala 675 680 685 US 2002/0194641 A1 Dec. 19, 2002 48

-continued

Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala Ala Gly 69 O. 695 7 OO His Ser Lieu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Cys Val Asp 705 710 715 720 Arg Asp Glu Lieu Phe Glu Lieu Val Cys Arg Arg Ala Arg Ile Met Gly 725 730 735 Gly Lys Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala Val Ile 740 745 750 Gly Pro Asn Ala Glu Asn. Ile Llys Val Glin Ala Ala Asn Val Trp Lieu 755 760 765 Gly Asin Ser Asn Ser Pro Ser Glin Thr Val Ile Thr Gly Ser Val Glu 770 775 78O Gly Ile Glin Ala Glu Ser Ala Arg Lieu Gln Lys Glu Gly Phe Arg Val 785 790 795 8OO Val Pro Leu Ala Cys Glu Ser Ala Phe His Ser Pro Gln Met Glu Asn 805 810 815 Ala Ser Ser Ala Phe Lys Asp Val Ile Ser Lys Val Ser Phe Arg Thr 820 825 830 Pro Lys Ala Glu Thir Lys Leu Phe Ser Asn Val Ser Gly Glu Thr Tyr 835 840 845 Pro Thr Asp Ala Arg Glu Met Leu Thr Glin His Met Thr Ser Ser Val 85 O 855 860 Lys Phe Leu Thr Glin Val Arg Asn Met His Glin Ala Gly Ala Arg Ile 865 870 875 88O Phe Val Glu Phe Gly Pro Lys Glin Val Leu Ser Lys Leu Val Ser Glu 885 890 895 Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn Pro Ala 9 OO 905 910 Ser Gly Thr Asp Ser Asp Ile Glin Leu Arg Asp Ala Ala Val Glin Lieu 915 920 925 Val Val Ala Gly Val Asn Lieu Glin Gly Phe Asp Lys Trp Asp Ala Pro 930 935 940 Asp Ala Thr Arg Met Glin Ala Ile Lys Lys Lys Arg Thr Thr Lieu Arg 945 950 955 96.O Leu Ser Ala Ala Thr Tyr Val Ser Asp Llys Thr Lys Lys Val Arg Asp 965 970 975 Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Lieu Lys Gly Ala Ala 98O 985 99 O Pro Lieu. Ile Lys Ala Pro Glu Pro Val Val Asp Glu Ala Ala Lys Arg 995 10 OO 1005 Glu Ala Glu Arg Lieu Gln Lys Glu Lieu Glin Asp Ala Glin Arg Glin 1010 O15 O20 Leu Asp Asp Ala Lys Arg Ala Ala Ala Glu Ala Asn. Ser Lys Lieu 1025 O3O O35 Ala Ala Ala Lys Glu Glu Ala Lys Thr Ala Ala Ala Ser Ala Lys 1040 O45 O5 O Pro Ala Val Asp Thr Ala Val Val Glu Lys His Arg Ala Ile Lieu 1055 O60 O65 Lys Ser Met Leu Ala Glu Lieu. Asp Gly Tyr Gly Ser Val Asp Ala 1070 O75 O8O US 2002/0194641 A1 Dec. 19, 2002 49

-continued

Ser Ser Leu Glin Glin Glin Glin Glin Glin Glin Thr Ala Pro Ala Pro O85 O9 O O95 Wall Lys Ala Ala Ala Pro Ala Ala Pro Wall Ala Ser Ala Pro Ala OO O5 10 Pro Ala Val Ser Asn Glu Leu Leu Glu Lys Ala Glu Thr Val Val

Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile

Glu Ala Asp Met Glu Lieu Glu Thr Glu Lieu Gly e Asp Ser Ile

Lys Arg Val Glu Ile Leu Ser Glu Val Glin Ala Met Lieu. Asn. Wall

Glu Ala Lys Asp Wall Asp Ala Leu Ser Arg Thr Arg Thr Val Gly

Glu Val Val Asn Ala Met Lys Ala Glu Ile Ala Gly Ser Ser Ala

Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Lys Ala Ala Pro

Ala Ala Ala Ala Pro Ala Val Ser Asn. Glu Lieu Lleu Glu Lys Ala

Glu Thr Val Val Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu 235 240 245 Thr Asp Met Ile Glu Ser Asp Met Glu Lieu Glu Thr Glu Lieu Gly 250 255 260 Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser Glu Val Glin Ala

Met Lieu. Asn Val Glu Ala Lys Asp Wall Asp Ala Leu Ser Arg Thr 280 285 29 O Arg Thr Val Gly Glu Val Val Asn Ala Met Lys Ala Glu Ile Ala 295 3OO 305 Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala Pro Gly Pro Ala

Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Wal Ser Asn

Glu Lieu Lleu Glu Lys Ala Glu Thr Val Val Met Glu Wall Leu Ala

Ala Lys Thr Gly Tyr Glu h Asp Met Ile Glu Ser Asp Met Glu

Leu Glu Thr Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val Glu Ile

Leu Ser Glu Val Glin Ala Met Lieu. Asn Val Glu Ala Lys Asp Wall

Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Ala

Met Lys Ala Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala

Ala Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Ala Pro

Ala Pro Ala Val Ser Ser Glu Lieu Lieu Glu Lys Ala Glu Thr Val 4 45 450 455 Val Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met US 2002/0194641 A1 Dec. 19, 2002 50

-continued

465 470

Ile Ser Asp Met Glu Telu Glu Thr Glu Teu Asp Ser 480

Ile Arg Wall Glu Ile Telu Ser Glu Wall Glin Met Teu Asn 495

Wall Ala Asp Wall Ala Telu Ser Arg Arg Thr Wall

Gly Wall Wall Asp Ala Ala Glu Ile Gly Gly Ser

Ala Ala Pro Ala Ala Ala Pro Ala Pro Ala Ala Ala

Pro Pro Ala Ala Pro Pro Ala Ala Pro Pro Ala Wall

Ser Glu Teu Telu Glu Ala Glu Thr Wall Met Glu Wall

Teu Ala Thr Gly Glu Thr Asp Met Glu Ser Asp

Met Teu Glu Thr Glu Gly Ile Asp Ser Arg Wall

Glu Teu Ser Glu Wall Ala Met Teu Asn Glu Ala

Asp Asp Ala Telu Ser Thr Arg Thr Wall Glu Wall Wall

Asp Met Ala Glu Ala Gly Ser Ser Ser Ala Pro

Ala Ala Ala Pro Ala Ala Ala Ala Ala Ala Pro Ala

Ala Ala Pro Ala Wall Ser Asn Glu Teu Teu Ala Glu 675

Thr Wall Met Glu Wall Telu Ala Ala Thr Tyr Glu Thr 69 O.

Asp Ile Glu Ser Asp Met Glu Telu Glu Thr Teu Gly Ile 705

Asp Ile Arg Wall Ile Telu Ser Glu Glin Ala Met

Teu Wall Glu Ala Wall Asp Ala Teu Arg Thr Arg

Thr Gly Glu Wall Wall Ala Met Ala Ile Ala Gly

Gly Ala Pro Ala Pro Ala Ala Ala Pro Pro Ala Ala

Ala Pro Ala Wall Ser Glu Telu Teu Glu Ala Glu Thr

Wall Met Glu Wall Telu Ala Thr Gly Glu Thr Asp

Met Glu Ser Asp Met Telu Glu Thr Glu Gly Ile Asp

Ser Arg Wall Glu Telu Ser Glu Wall Ala Met Teu

Asn Glu Ala Asp Asp Ala Teu Ser Thr Arg Thr US 2002/0194641 A1 Dec. 19, 2002 51

-continued

Val Gly Glu Val Val Asp Ala Met Lys Ala Glu Ile Ala Gly Ser 850 855 860

Ser Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Ala Ala 865 870 875

Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Wal Ser Ser Glu Lieu 88O 885 89 O Leu Glu Lys Ala Glu Thr Val Val Met Glu Val Lieu Ala Ala Lys 895 9 OO 905 Thr Gly Tyr Glu Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu 910 915 920 Thr Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser

Glu Val Glin Ala Met Lieu. Asn Val Glu Ala Lys Asp Wall Asp Ala

Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Ala Met Lys

Ala Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala

Pro Ala Pro Ala Ala Ala Ala Pro Ala Wal Ser Asn. Glu Lieu Lieu 985 99 O 995 Glu Lys Ala Glu Thr Val Val Met Glu Val Leu Ala Ala Lys Thr 2OOO 2005 2010 Gly Tyr Glu Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr 2015 2020 2025 Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser Glu 2030 2O35 20 40 Val Glin Ala Met Lieu. Asn Val Glu Ala Lys Asp Val Asp Ala Lieu 2O45 2O5 O 2O55 Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Ala Met Lys Ala 2060 2O65 2070 Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala Pro 2O75 2080 2O85 Ala Ser Ala Gly Ala Ala Pro Ala Wall Lys Ile Asp Ser Val His 2O 90 2095 2100 Gly Ala Asp Cys Asp Asp Leu Ser Lieu Met His Ala Lys Val Val 2105 2110 2115 Asp Ile Arg Arg Pro Asp Glu Lieu. Ile Leu Glu Arg Pro Glu Asn 2 Arg Pro Val Lieu Val Val Asp Asp Gly Ser Glu Lieu. Thir Lieu Ala

Leu Val Arg Val Leu Gly Ala Cys Ala Val Val Leu Thr Phe Glu

Gly Lieu Gln Leu Ala Glin Arg Ala Gly Ala Ala Ala Ile Arg His 21 65 217 O 21.75 Val Lieu Ala Lys Asp Leu Ser Ala Glu Ser Ala Glu Lys Ala Ile 218O 21.85 21.90 Lys Glu Ala Glu Glin Arg Phe Gly Ala Leu Gly Gly Phe Ile Ser 21.95 2200 2205 Gln Glin Ala Glu Arg Phe Glu Pro Ala Glu Ile Leu Gly Phe Thr 2210 2215 2220 US 2002/0194641 A1 Dec. 19, 2002 52

-continued Leu Met Cys Ala Lys Phe Ala Lys Ala Ser Lieu. Cys Thr Ala Wall 2225 22.30 2235 Ala Gly Gly Arg Pro Ala Phe Ile Gly Val Ala Arg Lieu. Asp Gly 2240 22 45 225 O Arg Lieu Gly Phe Thir Ser Glin Gly Thr Ser Asp Ala Leu Lys Arg 2255 2260 2265 Ala Glin Arg Gly Ala Ile Phe Gly Lieu. Cys Lys Thr Ile Gly Lieu 2270 2275 228O Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val Asp Ile Ala 2285 229 O 2295 Glin Gly Met His Pro Glu Asp Ala Ala Val Ala Ile Val Arg Glu 2300 2305 2310 Met Ala Cys Ala Asp Ile Arg Ile Arg Glu Val Gly Ile Gly Ala 2315 2320 2325 Asn Glin Glin Arg Cys Thir Ile Arg Ala Ala Lys Lieu Glu Thr Gly 2330 2335 234. O Asn Pro Glin Arg Glin Ile Ala Lys Asp Asp Wall Leu Lleu Val Ser 2345 235 O 2355 Gly Gly Ala Arg Gly Ile Thr Pro Lieu. Cys Ile Arg Glu Ile Thr 2360 2365 2370 Arg Glin Ile Ala Gly Gly Lys Tyr Ile Leu Lleu Gly Arg Ser Lys 2375 2380 2385 Val Ser Ala Ser Glu Pro Ala Trp Cys Ala Gly Ile Thr Asp Glu 2390 2395 2400 Lys Ala Val Glin Lys Ala Ala Thr Glin Glu Lieu Lys Arg Ala Phe 2405 2410 24.15 Ser Ala Gly Glu Gly Pro Llys Pro Thr Pro Arg Ala Val Thr Lys 2420 24.25 24.30 Leu Val Gly Ser Val Lieu Gly Ala Arg Glu Val Arg Ser Ser Ile 2435 24 40 2445 Ala Ala Ile Glu Ala Leu Gly Gly Lys Ala Ile Tyr Ser Ser Cys 2450 2455 2460 Asp Wall Asn. Ser Ala Ala Asp Val Ala Lys Ala Val Arg Asp Ala 2465 2470 24.75 Glu Ser Glin Leu Gly Ala Arg Val Ser Gly Ile Val His Ala Ser 24.80 2485 24.90 Gly Val Lieu Arg Asp Arg Lieu. Ile Glu Lys Lys Lieu Pro Asp Glu 2495 25 OO 25 O5 Phe Asp Ala Val Phe Gly. Thir Lys Val Thr Gly Leu Glu Asn Leu 2510 2515 252O Leu Ala Ala Val Asp Arg Ala Asn Lieu Lys His Met Wall Leu Phe 2525 25.30 2535 Ser Ser Lieu Ala Gly Phe His Gly Asn Val Gly Glin Ser Asp Tyr 2540 25.45 255 O Ala Met Ala Asn. Glu Ala Lieu. Asn Lys Met Gly Lieu Glu Lieu Ala 2555 2560 2565 Lys Asp Wal Ser Wall Lys Ser Ile Cys Phe Gly Pro Trp Asp Gly 2570 2575 258O Gly Met Val Thr Pro Glin Leu Lys Lys Glin Phe Glin Glu Met Gly 2585 2590 2595 Val Glin Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg US 2002/0194641 A1 Dec. 19, 2002 53

-continued

26 OO 2605 26.10

Ile Val Leu Gly Ser Ser Pro Ala Glu Ile Teu Wall Gly Asn Trp 2615 262O 2625

Arg Thr Pro Ser Lys Lys Val Gly Ser Asp Thr Ile Thr Teu His 2630 2635 264 O

Arg Lys Ile Ser Ala Lys Ser Asn Pro Phe Teu Glu Asp His Wall 2645 26.50 2655

Ile Glin Gly Arg Arg Val Lieu Pro Met Thr Teu Ala Ile Gly Ser 2660 2665 2670

Leu Ala Glu Thr Cys Lieu Gly Lieu Phe Pro Gly Tyr Ser Teu Trp 2675 268O 2685

Ala Ile Asp Asp Ala Glin Leu Phe Gly Wall Thr Wall Asp Gly 2690 2695 27 OO

Asp Wall Asn. Cys Glu Val Thr Lieu Thr Pro Ser Thr Ala Pro Ser 27 O5 210 2715

Gly Arg Val Asn Val Glin Ala Thr Telu Thr Phe Ser Ser Gly 2720 2725 273 O

Lys Lieu Val Pro Ala Tyr Arg Ala Wall Ile Wall Teu Ser Asn Glin 2735 2740 2745

Gly Ala Pro Pro Ala Asn Ala Thr Met Glin Pro Pro Ser Teu Asp 2750 2755 276 O.

Ala Asp Pro Ala Leu Glin Gly Ser Wall Asp Gly Thr Teu 2765 2770 2775

Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Wall Teu Ser 2780 2785 279 O

Thr Lys Ser Glin Lieu Val Ala Lys Ser Ala Wall Pro Gly Ser 2795 2800 2805

Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 2810 2815 282O

Pro Phe Val Asn Asp Leu Ala Phe Glin Ala Met Teu Wall Trp Wall 2825 2830 2835

Arg Arg Thr Lieu Gly Glin Ala Ala Telu Pro Asn Ser Ile Glin Arg 284 O 284.5 285 O

Ile Val Gln His Arg Pro Val Pro Glin Asp Pro Phe Ile 2855 2.860 2865

Thr Lieu Arg Ser Asn Glin Ser Gly Gly His Ser Glin His His 2870 2875 2880

Ala Leu Glin Phe His Asn. Glu Glin Gly Asp Teu Phe Ile Asp Wall 2.885 2890 2.895

Glin Ala Ser Val Ile Ala Thr Asp Ser Teu Ala Phe 29 OO 2905 2.910

<210> SEQ ID NO 3 &2 11s LENGTH 6.177 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (6.177) &223> OTHER INFORMATION:

<400 SEQUENCE: 3 atg gCC got cqg aat gtg agc goc gog cat gag atg cac gat gala aag Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys

US 2002/0194641 A1 Dec. 19, 2002 61

-continued

<210> SEQ ID NO 4 &2 11s LENGTH 2059 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (370) ... (370) <223> OTHER INFORMATION: The 'Xaa' at location 370 stands for Leu. &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (371) . . (371) <223> OTHER INFORMATION: The 'Xaa' at location 371 stands for Ala or Wall.

<400 SEQUENCE: 4 Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys 1 5 10 15 Arg Ile Ala Val Val Gly Met Ala Val Glin Tyr Ala Gly Cys Lys Thr 2O 25 30 Lys Asp Glu Phe Trp Glu Val Lieu Met Asn Gly Lys Val Glu Ser Lys 35 40 45 Val Ile Ser Asp Lys Arg Lieu Gly Ser Asn Tyr Arg Ala Glu His Tyr 50 55 60 Lys Ala Glu Arg Ser Lys Tyr Ala Asp Thr Phe Cys Asn. Glu Thir Tyr 65 70 75 8O Gly Thr Lieu. Asp Glu Asn. Glu Ile Asp Asn. Glu His Glu Lieu Lleu Lieu 85 90 95 Asn Lieu Ala Lys Glin Ala Leu Ala Glu Thir Ser Wall Lys Asp Ser Thr 100 105 110 Arg Cys Gly Ile Val Ser Gly Cys Lieu Ser Phe Pro Met Asp Asn Lieu 115 120 125 Glin Gly Glu Lieu Lleu. Asn Val Tyr Glin Asn His Val Glu Lys Lys Lieu 130 135 1 4 0 Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Glin 145 15 O 155 160 Ser Asn Lys Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala 1.65 170 175 Ser Phe Val Ala Glu Glu Lieu. Asn Lieu Gly Ala Lieu. His Tyr Ser Val 18O 185 190 Asp Ala Ala Cys Ala Thr Ala Leu Tyr Val Lieu Arg Lieu Ala Glin Asp 195 200 2O5 His Lieu Val Ser Gly Ala Ala Asp Wal Met Lieu. Cys Gly Ala Thr Cys 210 215 220 Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Glin Ala 225 230 235 240 Met Pro Val Gly Thr Gly Glin Asn Val Ser Met Pro Leu. His Lys Asp 245 250 255 Ser Glin Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu Lys 260 265 27 O Arg Lieu. Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly. Thir Lieu 275 280 285 Leu Gly Ala Asn. Wal Ser Asn. Ser Gly Thr Gly Lieu Pro Leu Lys Pro 29 O 295 3OO Leu Lleu Pro Ser Glu Lys Lys Cys Lieu Met Asp Thr Tyr Thr Arg Ile US 2002/0194641 A1 Dec. 19, 2002 62

-continued

305 310 315 320

Asn Wall His Pro His Ile Glin Wall Glu His Ala Thr Gly 325 330 335

Thr Pro Glin Gly Asp Arg Wall Glu Ile Asp Ala Wall Ala Cys Phe 340 345 350

Glu Gly Lys Wall Pro Arg Phe Gly Thr Thr Gly Asn Phe Gly His 355 360 365

Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Lys Wall Teu Telu Ser 370 375

Met His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Glu Thr 385 390 395 400

Met Asp Pro Teu Wall Wall Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 410 415

Thr Asn Gly Glu Pro Arg Ala Gly Telu Ser Ala Phe Gly Phe Gly 420 425 430

Gly Thr Asn Ala His Ala Wall Phe Glu Glu His Asp Pro Ser Asn Ala 435 4 40 4 45

Ala Cys Thr Gly His Asp Ser Ile Ser Ala Teu Ser Ala Cys Gly 450 455 460

Gly Glu Ser Asn Met Arg Ile Ala Ile Thr Gly Met Asp Ala Thr Phe 465 470 475 480

Gly Ala Telu Gly Teu Asp Ala Phe Glu Arg Ala Ile Tyr Thr Gly 485 490 495

Ala His Gly Ala Ile Pro Teu Pro Glu Arg Trp Arg Phe Telu Gly 5 OO 505 510

Asp Lys Asp Phe Teu Asp Telu Gly Wall Ala Thr Pro His 515 525

Gly Cys Ile Glu Asp Wall Glu Wall Asp Phe Glin Arg Teu Thr 530 535 540

Pro Met Thr Pro Glu Asp Met Telu Telu Pro Glin Glin Teu Teu Ala Wall 545 550 555 560

Thr Thr Ile Asp Arg Ala Ile Telu Asp Ser Gly Met Gly Gly 565 570 575

Asn Wall Ala Wall Phe Wall Gly Telu Gly Thr Asp Teu Glu Teu Tyr 585 59 O

His Arg Ala Arg Wall Ala Teu Lys Glu Arg Wall Arg Pro Glu Ala Ser 595 600 605

Lys Telu Asn Asp Met Met Glin Ile Asn Asp Thr Ser 610 615 62O

Thr Ser Thr Ser Tyr Ile Gly Asn Telu Wall Ala Thr Wall Ser 625 630 635 640

Ser Glin Trp Gly Phe Thr Gly Pro Ser Phe Thr Ile Thr Glu Gly Asn 645 650 655

Asn Ser Wall Tyr Arg Ala Glu Telu Gly Lys Teu Teu Glu Thr 660 665 670

Gly Glu Wall Asp Gly Wall Wall Wall Ala Gly Wall Asp Teu Gly Ser 675 680 685

Ala Glu Asn Telu Wall Lys Ser Arg Arg Phe Lys Wall Ser Thr Ser 69 O. 695 7 OO

Asp Thr Pro Arg Ala Ser Phe Asp Ala Ala Ala Asp Gly Tyr Phe Wall 705 710 715 720 US 2002/0194641 A1 Dec. 19, 2002 63

-continued

Gly Glu Gly Cys Gly Ala Phe Val Lieu Lys Arg Glu Thir Ser Cys Thr 725 730 735 Lys Asp Asp Arg Ile Tyr Ala Cys Met Asp Ala Ile Val Pro Gly Asn 740 745 750 Val Pro Ser Ala Cys Lieu Arg Glu Ala Lieu. Asp Glin Ala Arg Val Lys 755 760 765 Pro Gly Asp Ile Glu Met Leu Glu Lieu Ser Ala Asp Ser Ala Arg His 770 775 78O Leu Lys Asp Pro Ser Val Lieu Pro Lys Glu Lieu. Thr Ala Glu Glu Glu 785 790 795 8OO Ile Gly Gly Lieu Glin Thr Ile Leu Arg Asp Asp Asp Lys Lieu Pro Arg 805 810 815 Asn Val Ala Thr Gly Ser Val Lys Ala Thr Val Gly Asp Thr Gly Tyr 820 825 830 Ala Ser Gly Ala Ala Ser Lieu. Ile Lys Ala Ala Lieu. Cys Ile Tyr Asn 835 840 845 Arg Tyr Lieu Pro Ser Asn Gly Asp Asp Trp Asp Glu Pro Ala Pro Glu 85 O 855 860 Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Gln Thr Ser Arg Ala Trp 865 870 875 88O Leu Lys Asn Pro Gly Glu Arg Arg Tyr Ala Ala Val Ser Gly Val Ser 885 890 895 Glu Thr Arg Ser Cys Tyr Ser Val Leu Leu Ser Glu Ala Glu Gly. His 9 OO 905 910 Tyr Glu Arg Glu Asn Arg Ile Ser Lieu. Asp Glu Glu Ala Pro Llys Lieu 915 920 925 Ile Val Lieu Arg Ala Asp Ser His Glu Glu Ile Leu Gly Arg Lieu. Asp 930 935 940 Lys Ile Arg Glu Arg Phe Leu Glin Pro Thr Gly Ala Ala Pro Arg Glu 945 950 955 96.O Ser Glu Lieu Lys Ala Glin Ala Arg Arg Ile Phe Leu Glu Lieu Lieu Gly 965 970 975 Glu Thir Lieu Ala Glin Asp Ala Ala Ser Ser Gly Ser Glin Lys Pro Leu 98O 985 99 O Ala Leu Ser Lieu Val Ser Thr Pro Ser Lys Lieu Glin Arg Glu Val Glu 995 10 OO OO 5 Leu Ala Ala Lys Gly Ile Pro Arg Cys Lieu Lys Met Arg Arg Asp O 10 O15 O20 Trp Ser Ser Pro Ala Gly Ser Arg Tyr Ala Pro Glu Pro Leu Ala O25 O3O O35 Ser Asp Arg Val Ala Phe Met Tyr Gly Glu Gly Arg Ser Pro Tyr O40 O45 O5 O Tyr Gly Ile Thr Glin Asp Ile His Arg Ile Trp Pro Glu Lieu. His O55 O60 O65 Glu Val Ile Asn. Glu Lys Thr Asn Arg Lieu Trp Ala Glu Gly Asp OFO O75 O8O Arg Trp Val Met Pro Arg Ala Ser Phe Lys Ser Glu Leu Glu Ser

Glin Glin Glin Glu Phe Asp Arg Asn Met Ile Glu Met Phe Arg Lieu 1 OO 105 110 US 2002/0194641 A1 Dec. 19, 2002 64

-continued Gly e Lieu. Thir Ser Ile Ala Phe Thr Asn Lieu Ala Arg Asp Wal 15 20 25 Lieu. Asn. Ile Thr Pro Lys Ala Ala Phe Gly Lieu Ser Leu Gly Glu 30 35 40 Ile Ser Met Ile Phe Ala Phe Ser Lys Lys Asn Gly Lieu. Ile Ser

Asp Gln Leu Thir Lys Asp Leu Arg Glu Ser Asp Val Trp Asn Lys

Ala Lieu Ala Val Glu Phe Asn Ala Lieu Arg Glu Ala Trp Gly Ile

Pro Glin Ser Val Pro Lys Asp Glu Phe Trp Gln Gly Tyr Ile Val

Arg Gly Thr Lys Glin Asp e Glu Ala Ala Ile Ala Pro Asp Ser

Lys Tyr Val Arg Leu Thr e Ile Asin Asp Ala Asn. Thir Ala Lieu

Ile Ser Gly Lys Pro Asp Ala Cys Lys Ala Ala Ile Ala Arg Lieu

Gly Gly Asn Ile Pro Ala Leu Pro Val Thr Glin Gly Met Cys Gly

His Cys Pro Glu Val Gly Pro Tyr Thr Lys Asp Ile Ala Lys Ile

His Ala Asn Lieu Glu Phe Pro Val Val Asp Gly Lieu. Asp Leu Trp 28O 285 29 O

Thr h Ile Asn. Glin Lys Arg Lieu Val Pro Arg Ala Thr Gly Ala 295 3OO 305 Lys Asp Glu Trp Ala Pro Ser Ser Phe Gly Glu Tyr Ala Gly Glin

Leu Tyr Glu Lys Glin Ala Asn Phe Pro Glin Ile Val Glu Thir Ile

Tyr Lys Glin Asn Tyr Asp Val Phe Val Glu Val Gly Pro Asn Asn

His Arg Ser Thr Ala Val Arg Thr Thr Leu Gly Pro Glin Arg Asn

His Leu Ala Gly Ala Ile Asp Lys Glin Asn. Glu Asp Ala Trp Thr

Thir Ile Val Lys Lieu Val Ala Ser Lieu Lys Ala His Leu Val Pro

Gly Val Thr Ile Ser Pro Leu Tyr His Ser Lys Leu Val Ala Glu

Ala Glin Ala Cys Tyr Ala Ala Lieu. Cys Lys Gly Glu Lys Pro Lys

Lys Asn Lys Phe Val Arg Lys Ile Glin Lieu. Asn Gly Arg Phe Asn

Ser Lys Ala Asp Pro Ile Ser Ser Ala Asp Leu Ala Ser Phe Pro

Pro Ala Asp Pro Ala Ile Glu Ala Ala Ile Ser Ser Arg Ile Met

Lys Pro Val Ala Pro Llys Phe Tyr Ala Arg Lieu. Asn. Ile Asp Glu 475 480 485 Glin Asp Glu Thr Arg Asp Pro Ile Lieu. Asn Lys Asp Asn Ala Pro US 2002/0194641 A1 Dec. 19, 2002 65

-continued

490 495 5 OO

Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser

Pro Ser Pro Ala Pro Ser Ala Pro Val Gln Lys Lys Ala Ala Pro

Ala Ala Glu Thir Lys Ala Wall Ala Ser Ala Asp Ala Leu Arg Ser

Ala Lieu Lleu. Asp Lieu. Asp Ser Met Leu Ala Leu Ser Ser Ala Ser

Ala Ser Gly Asn Lieu Val Glu Thir Ala Pro Ser Asp Ala Ser Val

Ile Val Pro Pro Cys Asn. Ile Ala Asp Leu Gly Ser Arg Ala Phe

Met Lys Thr Tyr Gly Val Ser Ala Pro Leu Tyr Thr Gly Ala Met

Ala Lys Gly Ile Ala Ser Ala Asp Leu Val Ile Ala Ala Gly Arg

Glin Gly Ile Leu Ala Ser Phe Gly Ala Gly Gly Lieu Pro Met Glin

Val Val Arg Glu Ser Ile Glu Lys Ile Glin Ala Ala Leu Pro Asn

Gly Pro Tyr Ala Val Asn Leu Ile His Ser Pro Phe Asp Ser Asn

Leu Glu Lys Gly Asn Val Asp Leu Phe Leu Glu Lys Gly Val Thr

Phe Wall Glu Ala Ser Ala Phe Met Thr Lieu. Thr Pro Glin Wal Wall

Arg Tyr Arg Ala Ala Gly Lieu. Thir Arg Asn Ala Asp Gly Ser Val

Asn. Ile Arg Asn Arg Ile Ile Gly Lys Val Ser Arg Thr Glu Lieu

Ala Glu Met Phe Met Arg Pro Ala Pro Glu His Leu Leu Gln Lys

Lieu. Ile Ala Ser Gly Glu Ile Asin Glin Glu Glin Ala Glu Lieu Ala

Arg Arg Val Pro Wall Ala Asp Asp Ile Ala Val Glu Ala Asp Ser

Gly Gly. His Thr Asp Asn Arg Pro Ile His Val Ile Leu Pro Leu

Ile Ile Asn Lieu Arg Asp Arg Lieu. His Arg Glu Cys Gly Tyr Pro

Ala Asn Lieu Arg Val Arg Val Gly Ala Gly Gly Gly Ile Gly Cys

Pro Glin Ala Ala Leu Ala Thr Phe Asn Met Gly Ala Ser Phe Ile 820 825 830 Val Thr Gly Thr Val Asin Glin Val Ala Lys Glin Ser Gly Thr Cys 835 840 845 Asp Asn Val Arg Lys Glin Leu Ala Lys Ala Thr Tyr Ser Asp Wall 850 855 860 Cys Met Ala Pro Ala Ala Asp Met Phe Glu Glu Gly Wall Lys Lieu 865 870 875 US 2002/0194641 A1 Dec. 19, 2002 66

-continued

Glin Val Lieu Lys Lys Gly Thr Met Phe Pro Ser Arg Ala Asn Lys 88O 885 89 O Leu Tyr Glu Leu Phe Cys Lys Tyr Asp Ser Phe Glu Ser Met Pro 895 9 OO 905 Pro Ala Glu Lieu Ala Arg Val Glu Lys Arg Ile Phe Ser Arg Ala 910 915 920 Leu Glu Glu Val Trp Asp Glu Thir Lys Asn. Phe Tyr Ile Asn Arg

Lieu. His Asn. Pro Glu Lys Ile Glin Arg Ala Glu Arg Asp Pro Lys

Leu Lys Met Ser Lieu. Cys Phe Arg Trp Tyr Lieu Ser Lieu Ala Ser

Arg Trp Ala Asn Thr Gly Ala Ser Asp Arg Wal Met Asp Tyr Glin

Val Trp Cys Gly Pro Ala Ile Gly Ser Phe Asn Asp Phe Ile Lys 985 99 O 995 Gly Thr Tyr Leu Asp Pro Ala Val Ala Asn Glu Tyr Pro Cys Val 2OOO 2005 2010 Val Glin Ile Asn Lys Glin Ile Leu Arg Gly Ala Cys Phe Leu Arg 2015 2020 2025 Arg Lieu Glu Ile Leu Arg Asn Ala Arg Lieu Ser Asp Gly Ala Ala 2030 2O35 20 40 Ala Leu Val Ala Ser Ile Asp Asp Thr Tyr Val Pro Ala Glu Lys 2O45 2O5 O 2O55

Teu

<210 SEQ ID NO 5 &2 11s LENGTH 4509 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) ... (4509) &223> OTHER INFORMATION:

<400 SEQUENCE: 5 atg gC g citc cqt gtc. aag acg aac aag aag coa toc togg gag at g acc 48 Met Ala Lieu Arg Wall Lys Thr Asn Lys Llys Pro Cys Trp Glu Met Thr 1 5 10 15 aag gag gag citg acc agc ggc aag acc gag gtg titc aac tat gag gala 96 Lys Glu Glu Leu Thir Ser Gly Lys Thr Glu Val Phe Asn Tyr Glu Glu 2O 25 30 citc ctic gag titc gca gag ggc gac atc goc aag gtc titc gga ccc gag 144 Leu Lieu Glu Phe Ala Glu Gly Asp Ile Ala Lys Val Phe Gly Pro Glu 35 40 45 titc gcc gtc atc gac aag tac cog cqc cqc gtg cqc ctd coc goc cqc 192 Phe Ala Val Ile Asp Lys Tyr Pro Arg Arg Val Arg Lieu Pro Ala Arg 50 55 60 gag tac citg citc gtg acc cqc gtc. acc citc atg gac goc gag gtc. aac 240 Glu Tyr Leu Leu Val Thr Arg Val Thr Leu Met Asp Ala Glu Val Asn 65 70 75 8O aac tac cqc gtc. g.gc gcc cqc at g g to acc gag tac gat citc ccc gtc 288 Asn Tyr Arg Val Gly Ala Arg Met Val Thr Glu Tyr Asp Leu Pro Val 85 90 95 aac gga gag citc. tcc gag ggc gga gac toc coc togg gcc gtc. citg gtc 336

US 2002/0194641 A1 Dec. 19, 2002 72

-continued

65 70 75

Asn Arg Wall Gly Ala Arg Met Wall Thr Glu Asp Teu Pro Wall 85 90 95

Asn Gly Glu Telu Ser Glu Gly Gly Asp Pro Ala Wall Telu Wall 100 105 110

Glu Ser Gly Glin Asp Teu Met Telu Ile Ser Met Ile Asp 115 120 125

Phe Glin Asn Glin Gly Asp Arg Wall Arg Teu Teu Asn Thr Telu 130 135 1 4 0

Thr Phe Gly Wall Ala His Glu Gly Glu Thr Teu Glu Ile 145 15 O 155 160

Arg Wall Thr Gly Phe Ala Arg Telu Asp Gly Gly Ile Ser Met Phe 1.65 170 175

Phe Phe Glu Tyr Asp Wall Asn Gly Arg Teu Teu Ile Glu Met 18O 185 190

Arg Asp Gly Ala Gly Phe Phe Thr Asn Glu Glu Teu Asp Ala Gly 195 200

Gly Wall Wall Phe Thr Arg Gly Asp Telu Ala Ala Arg Ala Lys Ile 210 215 220

Pro Glin Asp Wall Ser Pro Ala Wall Ala Pro Teu His Lys 225 230 235 240

Thr Telu Asn Glu Glu Met Glin Thr Teu Wall Asp Asp Trp 245 250 255

Ala Ser Wall Phe Gly Ser Asn Gly Met Pro Glu Ile Asn Tyr Lys 260 265 27 O

Teu Ala Arg Met Teu Met Ile Asp Arg Wall Thr Ser Ile Asp 275 280 285

His Lys Gly Gly Wall Gly Telu Gly Glin Teu Wall Gly Glu Lys Ile 29 O 295

Teu Glu Arg Asp His Trp Phe Pro His Phe Wall Glin 305 310 315 320

Wall Met Ala Gly Ser Teu Wall Ser Asp Gly Ser Glin Met Telu Lys 325 330 335

Met Met Ile Trp Teu Gly Telu His Telu Thr Thr Gly Pro Phe 340 345 350

Phe Arg Pro Wall Asn Gly His Pro Asn Wall Arg Cys Arg Gly Glin 355 360 365

Ile Ser Pro His Gly Lys Telu Wall Wall Met Glu Ile Lys Glu 370 375

Met Gly Phe Asp Glu Asp Asn Asp Pro Ala Ile Ala Asp Wall Asn 385 390 395 400

Ile Ile Asp Wall Asp Phe Glu Gly Glin Asp Phe Ser Teu Asp Arg 405 410 415

Ile Ser Asp Tyr Gly Gly Asp Telu Asn Ile Wall Wall Asp 420 425 430

Phe Gly Ile Ala Teu Met Glin Arg Ser Thr Asn Lys Asn 435 4 40 4 45

Pro Ser Wall Glin Pro Wall Phe Ala Asn Gly Ala Ala Thr Wall Gly 450 455 460

Pro Glu Ala Ser Ala Ser Ser Gly Ala Ser Ala Ser Ala Ser Ala 465 470 475 480 US 2002/0194641 A1 Dec. 19, 2002 73

-continued

Ala Pro Ala Lys Pro Ala Phe Ser Ala Asp Wall Leu Ala Pro Llys Pro 485 490 495 Val Ala Lieu Pro Glu His Ile Leu Lys Gly Asp Ala Leu Ala Pro Lys 5 OO 505 510 Glu Met Ser Trp His Pro Met Ala Arg Ile Pro Gly Asn Pro Thr Pro 515 52O 525 Ser Phe Ala Pro Ser Ala Tyr Lys Pro Arg Asn Ile Ala Phe Thr Pro 530 535 540 Phe Pro Gly Asn Pro Asn Asp Asn Asp His Thr Pro Gly Lys Met Pro 545 550 555 560 Leu Thir Trp Phe Asn Met Ala Glu Phe Met Ala Gly Lys Val Ser Met 565 570 575 Cys Lieu Gly Pro Glu Phe Ala Lys Phe Asp Asp Ser Asn. Thir Ser Arg 58O 585 59 O Ser Pro Ala Trp Asp Leu Ala Leu Val Thr Arg Ala Val Ser Val Ser 595 600 605 Asp Leu Lys His Val Asn Tyr Arg Asn. Ile Asp Lieu. Asp Pro Ser Lys 610 615 62O Gly Thr Met Val Gly Glu Phe Asp Cys Pro Ala Asp Ala Trp Phe Tyr 625 630 635 640 Lys Gly Ala Cys Asn Asp Ala His Met Pro Tyr Ser Ile Leu Met Glu 645 650 655 Ile Ala Leu Gln Thr Ser Gly Val Leu Thir Ser Val Leu Lys Ala Pro 660 665 670 Lieu. Thir Met Glu Lys Asp Asp Ile Leu Phe Arg Asn Lieu. Asp Ala Asn 675 680 685 Ala Glu Phe Val Arg Ala Asp Lieu. Asp Tyr Arg Gly Lys. Thir Ile Arg 69 O. 695 7 OO Asn Val Thr Lys Cys Thr Gly Tyr Ser Met Leu Gly Glu Met Gly Val 705 710 715 720 His Arg Phe Thr Phe Glu Leu Tyr Val Asp Asp Val Leu Phe Tyr Lys 725 730 735 Gly Ser Thr Ser Phe Gly Trp Phe Val Pro Glu Val Phe Ala Ala Glin 740 745 750 Ala Gly Lieu. Asp Asn Gly Arg Lys Ser Glu Pro Trp Phe Ile Glu Asn 755 760 765 Lys Val Pro Ala Ser Glin Val Ser Ser Phe Asp Val Arg Pro Asn Gly 770 775 78O Ser Gly Arg Thr Ala Ile Phe Ala Asn Ala Pro Ser Gly Ala Glin Lieu 785 790 795 8OO Asn Arg Arg Thr Asp Glin Gly Glin Tyr Lieu. Asp Ala Wall Asp Ile Val 805 810 815 Ser Gly Ser Gly Lys Lys Ser Lieu Gly Tyr Ala His Gly Ser Lys Thr 820 825 830 Val Asn Pro Asn Asp Trp Phe Phe Ser Cys His Phe Trp Phe Asp Ser 835 840 845 Val Met Pro Gly Ser Leu Gly Val Glu Ser Met Phe Gln Leu Val Glu 85 O 855 860 Ala Ile Ala Ala His Glu Asp Leu Ala Gly Lys Ala Arg His Cys Glin 865 870 875 88O US 2002/0194641 A1 Dec. 19, 2002 74

-continued Pro His Lieu. Cys Ala Arg Pro Arg Ala Arg Ser Ser Trp Llys Tyr Arg 885 890 895 Gly Glin Lieu. Thr Pro Lys Ser Lys Lys Met Asp Ser Glu Wal His Ile 9 OO 905 910 Val Ser Val Asp Ala His Asp Gly Val Val Asp Lieu Val Ala Asp Gly 915 920 925 Phe Leu Trp Ala Asp Ser Leu Arg Val Tyr Ser Val Ser Asn. Ile Arg 930 935 940 Val Arg Ile Ala Ser Gly Glu Ala Pro Ala Ala Ala Ser Ser Ala Ala 945 950 955 96.O Ser Val Gly Ser Ser Ala Ser Ser Val Glu Arg Thr Arg Ser Ser Pro 965 970 975 Ala Val Ala Ser Gly Pro Ala Glin Thir Ile Asp Leu Lys Glin Lieu Lys 98O 985 99 O Thr Glu Lieu Lieu Glu Lieu. Asp Ala Pro Leu Tyr Lieu Ser Glin Asp Pro 995 10 OO 1005 Thir Ser Gly Glin Lieu Lys Lys His Thr Asp Wall Ala Ser Gly Glin O 10 O15 O20 Ala Thir Ile Val Glin Pro Cys Thr Lieu Gly Asp Leu Gly Asp Arg

Ser Phe Met Glu Thr Tyr Gly Val Val Ala Pro Leu Tyr Thr Gly O40 O45 O5 O Ala Met Ala Lys Gly Ile Ala Ser Ala Asp Leu Val Ile Ala Ala

Gly Lys Arg Lys Ile Leu Gly Ser Phe Gly Ala Gly Gly Lieu Pro

Met His His Val Arg Ala Ala Leu Glu Lys Ile Glin Ala Ala Lieu

Pro Glin Gly Pro Tyr Ala Val Asn Leu Ile His Ser Pro Phe Asp

Ser Asn Lieu Glu Lys Gly Asn. Wall Asp Leu Phe Leu Glu Lys Gly

Wall h Wal Wall Glu Ala Ser Ala Phe Met Thr Lieu. Thr Pro Glin

Val Val Arg Tyr Arg Ala Ala Gly Lieu Ser Arg Asn Ala Asp Gly

Ser Val Asn. Ile Arg Asn Arg Ile Ile Gly Lys Wal Ser Arg Thr

Glu Leu Ala Glu Met Phe e Arg Pro Ala Pro Glu. His Leu Lieu

Glu Lys Lieu. Ile Ala Ser Gly Glu Ile Thr Glin Glu Glin Ala Glu

Leu Ala Arg Arg Val Pro Val Ala Asp Asp Ile Ala Val Glu Ala

Asp Ser Gly Gly. His Thr Asp Asn Arg Pro Ile His Val Ile Lieu

Pro Lieu. Ile Ile Asn Lieu Arg Asn Arg Lieu. His Arg Glu Cys Gly 235 240 245 Tyr Pro Ala His Lieu Arg Val Arg Val Gly Ala Gly Gly Gly Val 25 O 255 260 Gly Cys Pro Glin Ala Ala Ala Ala Ala Lieu. Thir Met Gly Ala Ala US 2002/0194641 A1 Dec. 19, 2002 75

-continued

265 27 O 275 Phe Ile Val Thr Gly Thr Val Asin Glin Val Ala Lys Glin Ser Gly

Thr Cys Asp Asn Val Arg Lys Glin Leu Ser Glin Ala Thr Tyr Ser

Asp Ile Cys Met Ala Pro Ala Ala Asp Met Phe Glu Glu Gly Val

Lys Lieu Glin Val Lieu Lys Lys Gly Thr Met Phe Pro Ser Arg Ala

Asn Lys Lieu. Tyr Glu Lieu Phe Cys Lys Tyr Asp Ser Phe Asp Ser

Met Pro Pro Ala Glu Lieu Glu Arg Ile Glu Lys Arg Ile Phe Lys

Arg Ala Leu Glin Glu Val Trp Glu Glu Thir Lys Asp Phe Tyr Ile

Asn Gly Lieu Lys Asn Pro Glu Lys Ile Glin Arg Ala Glu His Asp

Pro Llys Lieu Lys Met Ser Lieu. Cys Phe Arg Trp Tyr Lieu Gly Lieu 400 405 410 Ala Ser Arg Trp Ala Asn Met Gly Ala Pro Asp Arg Val Met Asp 415 420 425 Tyr Glin Val Trp Cys Gly Pro Ala Ile Gly Ala Phe Asn Asp Phe

Ile Lys Gly Thr Tyr Leu Asp Pro Ala Val Ser Asn Glu Tyr Pro

Cys Val Val Glin Ile Asn Lieu Glin Ile Leu Arg Gly Ala Cys Tyr 460 465 470 Leu Arg Arg Lieu. Asn Ala Lieu Arg Asn Asp Pro Arg Ile Asp Lieu 475 480 485 Glu Thr Glu Asp Ala Ala Phe Val Tyr Glu Pro Thr Asn Ala Leu 490 495 5 OO

<210 SEQ ID NO 7 &2 11s LENGTH 600 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (600) &223> OTHER INFORMATION:

<400 SEQUENCE: 7 atg gC g gCC cqt Ctg Cag gag caa aag gga ggC gag atg gat acc cqc 48 Met Ala Ala Arg Lieu Glin Glu Gln Lys Gly Gly Glu Met Asp Thr Arg 1 5 10 15 att goc atc atc ggc atg tog gCC atc ctic coc toc ggc acg acc gtg 96 Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Cys Gly Thr Thr Val 2O 25 30 cgc gag to g tog gag acc atc cqc gcc ggc atc gac toc ct g tog gat 144 Arg Glu Ser Trp Glu Thir Ile Arg Ala Gly Ile Asp Cys Lieu Ser Asp 35 40 45 citc ccc gag gac cqc gtc gac gitg acg gC g tac titt gac coc gtc. aag 192 Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Wall Lys 50 55 60 acc acc aag gac aag atc tac toc aag cqc ggit ggc titc att coc gag 240 US 2002/0194641 A1 Dec. 19, 2002 76

-continued Thir Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75 8O tac gac titt gac goc cqc gag titc gga citc aac atg titc cag at g gag 288 Tyr Asp Phe Asp Ala Arg Glu Phe Gly Lieu. Asn Met Phe Glin Met Glu 85 90 95 gac to g gac goa aac cag acc atc. tcg citt citc aag gtc. aag gag gCC 336 Asp Ser Asp Ala Asn Glin Thr Ile Ser Lieu Lleu Lys Wall Lys Glu Ala 100 105 110 citc cag gac goc ggc atc gac goc citc ggc aag gala aag aag aac atc 384 Leu Glin Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn. Ile 115 120 125 ggc toc gtg citc ggc att goc ggc ggc caa aag toc agc cac gag titc 432 Gly Cys Val Lieu Gly Ile Gly Gly Gly Glin Lys Ser Ser His Glu Phe 130 135 1 4 0 tac to g cqc citt aat tat gtt gtc gtg gag aag gtc. citc cqc aag atg 480 Tyr Ser Arg Lieu. Asn Tyr Val Val Val Glu Lys Val Lieu Arg Lys Met 145 15 O 155 160 ggc at g ccc gag gag gac gito aag gtc goc gttc gala aag tac aag gCC 528 Gly Met Pro Glu Glu Asp Wall Lys Val Ala Val Glu Lys Tyr Lys Ala 1.65 170 175 aac titc ccc gag togg cqc citc gac toc titc cct ggc titc ctic ggc aac 576 Asn Phe Pro Glu Trp Arg Lieu. Asp Ser Phe Pro Gly Phe Leu Gly Asn 18O 185 190 gto acc gcc ggit cqc toc acc aac 600 Val Thr Ala Gly Arg Cys Thr Asn 195 200

<210 SEQ ID NO 8 &2 11s LENGTH 200 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 8 Met Ala Ala Arg Lieu Glin Glu Gln Lys Gly Gly Glu Met Asp Thr Arg 1 5 10 15 Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Cys Gly Thr Thr Val 2O 25 30 Arg Glu Ser Trp Glu Thir Ile Arg Ala Gly Ile Asp Cys Lieu Ser Asp 35 40 45 Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Wall Lys 50 55 60 Thir Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75 8O Tyr Asp Phe Asp Ala Arg Glu Phe Gly Lieu. Asn Met Phe Glin Met Glu 85 90 95 Asp Ser Asp Ala Asn Glin Thr Ile Ser Lieu Lleu Lys Wall Lys Glu Ala 100 105 110 Leu Glin Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn. Ile 115 120 125 Gly Cys Val Lieu Gly Ile Gly Gly Gly Glin Lys Ser Ser His Glu Phe 130 135 1 4 0 Tyr Ser Arg Lieu. Asn Tyr Val Val Val Glu Lys Val Lieu Arg Lys Met 145 15 O 155 160 Gly Met Pro Glu Glu Asp Wall Lys Val Ala Val Glu Lys Tyr Lys Ala 1.65 170 175

US 2002/0194641 A1 Dec. 19, 2002 78

-continued

225 230 235 240 gag aac goc to g tog gcc titc aag gac gtc atc. tcc aag gtc. tcc titc 768 Glu Asn Ala Ser Ser Ala Phe Lys Asp Wal Ile Ser Lys Wal Ser Phe 245 250 255 cgc acc ccc aag goc gag acc aag citc titc agc aac gtc. tct ggc gag 816 Arg Thr Pro Lys Ala Glu Thir Lys Leu Phe Ser Asn Val Ser Gly Glu 260 265 27 O acc tac coc acg gac goc cqc gag at g citt acg cag cac at g acc agc 864 Thr Tyr Pro Thr Asp Ala Arg Glu Met Leu Thr Gln His Met Thr Ser 275 280 285 agc gtc. aag titc citc acc cag gtc. c.gc aac atg cac cag gCC ggit gcg 912 Ser Val Lys Phe Leu Thr Glin Val Arg Asn Met His Glin Ala Gly Ala 29 O 295 3OO cgc atc titt gtc gag titc gga ccc aag cag gtg citc. tcc aag citt gtc 96.O Arg Ile Phe Val Glu Phe Gly Pro Lys Glin Val Leu Ser Lys Leu Val 305 310 315 320 to c gag acc ctic aag gat gac coc to g gtt gtc. acc gito tct gtc. aac OO 8 Ser Glu Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn 325 330 335 ccg gCC to g g g c acg gat tog gac atc cag citc cqc gac gog gCC gtc O56 Pro Ala Ser Gly Thr Asp Ser Asp Ile Glin Leu Arg Asp Ala Ala Wal 340 345 350 cag citc gtt gtc gct ggc gito aac citt cag ggc titt gac aag togg gac 104 Glin Leu Val Val Ala Gly Val Asn Lieu Glin Gly Phe Asp Llys Trp Asp 355 360 365 gcc ccc gat gcc acc cgc atg cag gCC atc aag aag aag cqc act acc 152 Ala Pro Asp Ala Thr Arg Met Glin Ala Ile Lys Lys Lys Arg Thir Thr 370 375 38O citc cqc citt to g gcc gcc acc tac gtc. tcg gac aag acc aag aag gtc 200 Leu Arg Lieu Ser Ala Ala Thr Tyr Val Ser Asp Lys Thr Lys Llys Val 385 390 395 400 cgc gac goc goc atgaac gat ggc cqc toc gttc acc tac citc aag ggc 248 Arg Asp Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Lieu Lys Gly 405 410 415 gcc go a cog citc atc aag gCd cog gag coc 278 Ala Ala Pro Lieu. Ile Lys Ala Pro Glu Pro 420 425

<210> SEQ ID NO 10 <211& LENGTH 426 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 10 Asp Val Thir Lys Glu Ala Trp Arg Lieu Pro Arg Glu Gly Wal Ser Phe 1 5 10 15 Arg Ala Lys Gly Ile Ala Thr Asn Gly Ala Wall Ala Ala Lieu Phe Ser 2O 25 30 Gly Glin Gly Ala Glin Tyr Thr His Met Phe Ser Glu Val Ala Met Asn 35 40 45 Trp Pro Glin Phe Arg Glin Ser Ile Ala Ala Met Asp Ala Ala Glin Ser 50 55 60 Lys Wall Ala Gly Ser Asp Lys Asp Phe Glu Arg Val Ser Glin Val Lieu 65 70 75 8O Tyr Pro Arg Llys Pro Tyr Glu Arg Glu Pro Glu Glin Asn Pro Llys Lys 85 90 95 US 2002/0194641 A1 Dec. 19, 2002 79

-continued Ile Ser Leu Thr Ala Tyr Ser Glin Pro Ser Thr Leu Ala Cys Ala Leu 100 105 110 Gly Ala Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala 115 120 125 Ala Gly His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Cys 130 135 1 4 0 Val Asp Arg Asp Glu Lieu Phe Glu Lieu Val Cys Arg Arg Ala Arg Ile 145 15 O 155 160 Met Gly Gly Lys Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala 1.65 170 175 Val Ile Gly Pro Asn Ala Glu Asn. Ile Llys Val Glin Ala Ala Asn Val 18O 185 190 Trp Leu Gly Asn Ser Asn Ser Pro Ser Glin Thr Val Ile Thr Gly Ser 195 200 2O5 Val Glu Gly Ile Glin Ala Glu Ser Ala Arg Lieu Gln Lys Glu Gly Phe 210 215 220 Arg Val Val Pro Leu Ala Cys Glu Ser Ala Phe His Ser Pro Glin Met 225 230 235 240 Glu Asn Ala Ser Ser Ala Phe Lys Asp Wal Ile Ser Lys Wal Ser Phe 245 250 255 Arg Thr Pro Lys Ala Glu Thir Lys Leu Phe Ser Asn Val Ser Gly Glu 260 265 27 O Thr Tyr Pro Thr Asp Ala Arg Glu Met Leu Thr Gln His Met Thr Ser 275 28O 285 Ser Val Lys Phe Leu Thr Glin Val Arg Asn Met His Glin Ala Gly Ala 29 O 295 3OO Arg Ile Phe Val Glu Phe Gly Pro Lys Glin Val Leu Ser Lys Leu Val 305 310 315 320 Ser Glu Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn 325 330 335 Pro Ala Ser Gly Thr Asp Ser Asp Ile Glin Leu Arg Asp Ala Ala Wal 340 345 350 Glin Leu Val Val Ala Gly Val Asn Lieu Glin Gly Phe Asp Llys Trp Asp 355 360 365 Ala Pro Asp Ala Thr Arg Met Glin Ala Ile Lys Lys Lys Arg Thir Thr 370 375 38O Leu Arg Lieu Ser Ala Ala Thr Tyr Val Ser Asp Lys Thr Lys Llys Val 385 390 395 400 Arg Asp Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Lieu Lys Gly 405 410 415 Ala Ala Pro Lieu. Ile Lys Ala Pro Glu Pro 420 425

<210> SEQ ID NO 11 &2 11s LENGTH 5 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: MISC FEATURE <222> LOCATION: (4) ... (4) <223> OTHER INFORMATION: X = any amino acid <400 SEQUENCE: 11 Gly His Ser Xaa Gly Dec. 19, 2002 80

-continued

1 5

<210> SEQ ID NO 12 &2 11s LENGTH 258 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (258) &223> OTHER INFORMATION:

<400 SEQUENCE: 12 gct gtc. tcg aac gag citt citt gag aag gcc gag act gto gtc atg gag 48 Ala Val Ser Asn. Glu Lieu Lleu Glu Lys Ala Glu Thr Wall Wall Met Glu 1 5 10 15 gto: ctic goc goc aag acc ggc tac gag acc gac atg atc gag gct gac 96 Val Lieu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile Glu Ala Asp 2O 25 30 atg gag ctic gag acc gag citc ggc att gac to c atc aag cgt. gtc gag 144 Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Lys Arg Wall Glu 35 40 45 atc ctic toc gag gtc. cag gCo at g citc aat gto gag gcc aag gat gtc 192 Ile Leu Ser Glu Wall Glin Ala Met Leu Asn Wall Glu Ala Tys Asp Wall 50 55 60 gat goc ctic agc cqc act c go act gtt ggit gag gtt gto a.a. C. gcc atg 240 Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Wall Wall Asn Ala Met 65 70 75 8O aag gCC gag atc gct ggc 258 Lys Ala Glu Ile Ala Gly 85

<210> SEQ ID NO 13 &2 11s LENGTH: 86 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 13

Ala Val Ser Asn. Glu Lieu Lleu Glu Lys Ala Glu Thr Wall Wall Met Glu 1 5 10 15

Val Lieu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile Glu Ala Asp 2O 25 30

Met Glu Leu Glu Thr Glu Leu Gly Ile Asp Ser Ile Lys Arg Wall Glu 35 40 45

Ile Leu Ser Glu Wall Glin Ala Met Leu Asn Wall Glu Ala Tys Wall 50 55 60

Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Wall Wall Asn Ala Met 65 70 75 8O Lys Ala Glu Ile Ala Gly 85

<210> SEQ ID NO 14 &2 11s LENGTH 5 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 14 Leu Gly Ile Asp Ser 1 5

US 2002/0194641 A1 Dec. 19, 2002 85

-continued Ser Asn Glin Ser Gly Gly His Ser Glin His Lys His Ala Leu Glin Phe 675 680 685 cac aac gag cag g g c gat citc titc att gat gttc cag gCt to g g to atc 2112 His Asn. Glu Glin Gly Asp Leu Phe Ile Asp Val Glin Ala Ser Val Ile 69 O. 695 7 OO gcc acg gac agc citt gcc titc 21.33 Ala Thr Asp Ser Leu Ala Phe 705 710

<210> SEQ ID NO 18 &2 11s LENGTH 711 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 18 Phe Gly Ala Leu Gly Gly Phe Ile Ser Glin Glin Ala Glu Arg Phe Glu 1 5 10 15 Pro Ala Glu Ile Leu Gly Phe Thr Lieu Met Cys Ala Lys Phe Ala Lys 2O 25 30 Ala Ser Lieu. Cys Thr Ala Val Ala Gly Gly Arg Pro Ala Phe Ile Gly 35 40 45 Val Ala Arg Lieu. Asp Gly Arg Lieu Gly Phe Thr Ser Glin Gly. Thir Ser 50 55 60 Asp Ala Lieu Lys Arg Ala Glin Arg Gly Ala Ile Phe Gly Lieu. Cys Lys 65 70 75 8O Thir Ile Gly Leu Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val 85 90 95 Asp Ile Ala Glin Gly Met His Pro Glu Asp Ala Ala Wall Ala Ile Val 100 105 110 Arg Glu Met Ala Cys Ala Asp Ile Arg Ile Arg Glu Val Gly Ile Gly 115 120 125 Ala Asn Glin Glin Arg Cys Thr Ile Arg Ala Ala Lys Lieu Glu Thr Gly 130 135 1 4 0 Asn Pro Glin Arg Glin Ile Ala Lys Asp Asp Wall Leu Lleu Val Ser Gly 145 15 O 155 160 Gly Ala Arg Gly Ile Thr Pro Lieu. Cys Ile Arg Glu Ile Thr Arg Glin 1.65 170 175 Ile Ala Gly Gly Lys Tyr Ile Leu Lieu Gly Arg Ser Lys Wal Ser Ala 18O 185 190 Ser Glu Pro Ala Trp Cys Ala Gly Ile Thr Asp Glu Lys Ala Val Glin 195 200 2O5 Lys Ala Ala Thr Glin Glu Lieu Lys Arg Ala Phe Ser Ala Gly Glu Gly 210 215 220 Pro Llys Pro Thr Pro Arg Ala Val Thr Lys Leu Val Gly Ser Val Leu 225 230 235 240 Gly Ala Arg Glu Val Arg Ser Ser Ile Ala Ala Ile Glu Ala Leu Gly 245 250 255 Gly Lys Ala Ile Tyr Ser Ser Cys Asp Wall Asn. Ser Ala Ala Asp Wal 260 265 27 O Ala Lys Ala Val Arg Asp Ala Glu Ser Glin Leu Gly Ala Arg Val Ser 275 280 285 Gly Ile Val His Ala Ser Gly Val Lieu Arg Asp Arg Lieu. Ile Glu Lys 29 O 295 3OO US 2002/0194641 A1 Dec. 19, 2002 86

-continued Lys Leu Pro Asp Glu Phe Asp Ala Val Phe Gly Thr Lys Val Thr Gly 305 310 315 320 Leu Glu Asn Lieu Lleu Ala Ala Val Asp Arg Ala Asn Lieu Lys His Met 325 330 335 Val Leu Phe Ser Ser Leu Ala Gly Phe His Gly Asn Val Gly Glin Ser 340 345 350 Asp Tyr Ala Met Ala Asn. Glu Ala Lieu. Asn Lys Met Gly Lieu Glu Lieu 355 360 365 Ala Lys Asp Val Ser Wall Lys Ser Ile Cys Phe Gly Pro Trp Asp Gly 370 375 38O Gly Met Val Thr Pro Gln Leu Lys Lys Glin Phe Glin Glu Met Gly Val 385 390 395 400 Glin Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg Ile Val 405 410 415 Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp Arg Thr Pro 420 425 430 Ser Lys Llys Val Gly Ser Asp Thir Ile Thr Lieu. His Arg Lys Ile Ser 435 4 40 4 45 Ala Lys Ser Asn Pro Phe Leu Glu Asp His Val Ile Glin Gly Arg Arg 450 455 460 Val Leu Pro Met Thr Leu Ala Ile Gly Ser Leu Ala Glu Thr Cys Leu 465 470 475 480 Gly Lieu Phe Pro Gly Tyr Ser Leu Trp Ala Ile Asp Asp Ala Glin Lieu 485 490 495 Phe Lys Gly Val Thr Val Asp Gly Asp Val Asn Cys Glu Val Thr Leu 5 OO 505 510 Thr Pro Ser Thr Ala Pro Ser Gly Arg Val Asn Val Glin Ala Thr Leu 515 52O 525 Lys Thr Phe Ser Ser Gly Lys Leu Val Pro Ala Tyr Arg Ala Val Ile 530 535 540 Val Leu Ser Asn Gln Gly Ala Pro Pro Ala Asn Ala Thr Met Glin Pro 545 550 555 560 Pro Ser Lieu. Asp Ala Asp Pro Ala Leu Glin Gly Ser Val Tyr Asp Gly 565 570 575 Lys Thr Lieu Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Lieu 58O 585 59 O Ser Cys Thr Lys Ser Glin Lieu Val Ala Lys Cys Ser Ala Val Pro Gly 595 600 605 Ser Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 610 615 62O Pro Phe Val Asn Asp Leu Ala Phe Glin Ala Met Leu Val Trp Val Arg 625 630 635 640 Arg Thr Lieu Gly Glin Ala Ala Lieu Pro Asn. Ser Ile Glin Arg Ile Val 645 650 655 Gln His Arg Pro Val Pro Gln Asp Llys Pro Phe Tyr Ile Thr Leu Arg 660 665 670 Ser Asn Glin Ser Gly Gly His Ser Glin His Lys His Ala Leu Glin Phe 675 680 685 His Asn. Glu Glin Gly Asp Leu Phe Ile Asp Val Glin Ala Ser Val Ile 69 O. 695 7 OO Ala Thr Asp Ser Leu Ala Phe

US 2002/0194641 A1 Dec. 19, 2002 88

-continued

245 250 255 agc cag ggc ctic acc cc g g gt gag ggc ggc toc atc atg gtc. citc aag 816 Ser Glin Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu Lys 260 265 27 O cgt citc gat gait gcc atc cqc gac ggc gac cac att tac ggc acc citt 864 Arg Lieu. Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly. Thir Lieu 275 280 285 citc ggc gcc aat gtc. agc aac tocc ggc aca ggit citg ccc ctic aag coc 912 Leu Gly Ala Asn. Wal Ser Asn. Ser Gly Thr Gly Lieu Pro Leu Lys Pro 29 O 295 3OO citt citc ccc agc gag aaa aag togc citc at g gac acc tac acg cqc att 96.O Leu Lleu Pro Ser Glu Lys Lys Cys Lieu Met Asp Thr Tyr Thr Arg Ile 305 310 315 320 aac gtg cac cog cac aag att cag tac gtc gag togc cac goc acc ggc OO 8 Asn Val His Pro His Lys Ile Glin Tyr Val Glu Cys His Ala Thr Gly 325 330 335 acg ccc cag ggit gat cqt gtg gala atc gac goc gtc. aag gCC togc titt O56 Thr Pro Glin Gly Asp Arg Val Glu Ile Asp Ala Wall Lys Ala Cys Phe 340 345 350 gala ggc aag gtc. ccc cqt titc ggit acc aca aag ggc aac titt gga cac 104 Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Lys Gly Asn Phe Gly. His 355 360 365 acc cts gyc goa gcc ggc titt goc ggt at g toc aag gtc ctic ctic to c 152 Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Cys Lys Val Lieu Lleu Ser 370 375 38O atg aag cat ggc atc atc ccg ccc acc cc.g. g.gt atc gat gac gag acc 2OO Met Lys His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400 aag at g g ac colt citc gtc gito tcc ggit gag gCC atc cca togg cca gag 248 Lys Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 410 415 acc aac ggc gag ccc aag cqc goc ggit citc. tcg gcc titt ggc titt ggit 296 Thr Asn Gly Glu Pro Lys Arg Ala Gly Lieu Ser Ala Phe Gly Phe Gly 420 425 430 ggc acc aac goc cat gcc gito titt gag gag cat gac ccc toc aac goc 344 Gly Thr Asn Ala His Ala Val Phe Glu Glu. His Asp Pro Ser Asn Ala 435 4 40 4 45 gcc to c 350 Ala Cys 450

<210> SEQ ID NO 20 &2 11s LENGTH 450 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (370) ... (370) <223> OTHER INFORMATION: The 'Xaa' at location 370 stands for Leu. &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (371) . . (371) <223> OTHER INFORMATION: The 'Xaa' at location 371 stands for Ala or Wall.

<400 SEQUENCE: 20 Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys 1 5 10 15 Arg Ile Ala Val Val Gly Met Ala Val Glin Tyr Ala Gly Cys Lys Thr 2O 25 30 US 2002/0194641 A1 Dec. 19, 2002 89

-continued

Lys Asp Glu Phe Trp Glu Val Lieu Met Asn Gly Lys Val Glu Ser Lys 35 40 45 Val Ile Ser Asp Lys Arg Lieu Gly Ser Asn Tyr Arg Ala Glu His Tyr 50 55 60 Lys Ala Glu Arg Ser Lys Tyr Ala Asp Thr Phe Cys Asn. Glu Thir Tyr 65 70 75 8O Gly Thr Lieu. Asp Glu Asn. Glu Ile Asp Asn. Glu His Glu Lieu Lleu Lieu 85 90 95 Asn Lieu Ala Lys Glin Ala Leu Ala Glu Thir Ser Wall Lys Asp Ser Thr 100 105 110 Arg Cys Gly Ile Val Ser Gly Cys Lieu Ser Phe Pro Met Asp Asn Lieu 115 120 125 Glin Gly Glu Lieu Lleu. Asn Val Tyr Glin Asn His Val Glu Lys Lys Lieu 130 135 1 4 0 Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Glin 145 15 O 155 160 Ser Asn Lys Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala 1.65 170 175 Ser Phe Val Ala Glu Glu Lieu. Asn Lieu Gly Ala Lieu. His Tyr Ser Val 18O 185 190 Asp Ala Ala Cys Ala Thr Ala Leu Tyr Val Lieu Arg Lieu Ala Glin Asp 195 200 2O5 His Leu Val Ser Gly Ala Ala Asp Val Met Leu Cys Gly Ala Thr Cys 210 215 220 Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Glin Ala 225 230 235 240 Met Pro Val Gly Thr Gly Glin Asn Val Ser Met Pro Leu. His Lys Asp 245 250 255 Ser Glin Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu Lys 260 265 27 O Arg Lieu. Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly. Thir Lieu 275 280 285 Leu Gly Ala Asn. Wal Ser Asn. Ser Gly Thr Gly Lieu Pro Leu Lys Pro 29 O 295 3OO Leu Lleu Pro Ser Glu Lys Lys Cys Lieu Met Asp Thr Tyr Thr Arg Ile 305 310 315 320 Asn Val His Pro His Lys Ile Glin Tyr Val Glu Cys His Ala Thr Gly 325 330 335 Thr Pro Glin Gly Asp Arg Val Glu Ile Asp Ala Wall Lys Ala Cys Phe 340 345 350 Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Lys Gly Asn Phe Gly. His 355 360 365 Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Cys Lys Val Lieu Lleu Ser 370 375 38O Met Lys His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400 Lys Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 410 415 Thr Asn Gly Glu Pro Lys Arg Ala Gly Lieu Ser Ala Phe Gly Phe Gly 420 425 430

US 2002/0194641 A1 Dec. 19, 2002 92

-continued

Glin Arg Leu Arg Thr Pro Met Thr Pro Glu Asp Met Leu Leu Pro Glin 85 90 95 Glin Leu Lieu Ala Val Thr Thr Ile Asp Arg Ala Ile Lieu. Asp Ser Gly 100 105 110 Met Lys Lys Gly Gly Asn Val Ala Val Phe Val Gly Lieu Gly. Thir Asp 115 120 125 Leu Glu Lieu. Tyr Arg His Arg Ala Arg Val Ala Leu Lys Glu Arg Val 130 135 1 4 0 Arg Pro Glu Ala Ser Lys Lys Lieu. Asn Asp Met Met Glin Tyr Ile Asn 145 15 O 155 160 Asp Cys Gly. Thir Ser Thr Ser Tyr Thr Ser Tyr Ile Gly Asn Leu Val 1.65 170 175 Ala Thr Arg Val Ser Ser Gln Trp Gly Phe Thr Gly Pro Ser Phe Thr 18O 185 190 Ile Thr Glu Gly Asn. Asn. Ser Val Tyr Arg Cys Ala Glu Lieu Gly Lys 195 200 2O5 Tyr Leu Leu Glu Thr Gly Glu Val Asp Gly Val Val Val Ala Gly Val 210 215 220 Asp Lieu. Cys Gly Ser Ala Glu Asn Lieu. Tyr Val Lys Ser Arg Arg Phe 225 230 235 240 Lys Val Ser Thr Ser Asp Thr Pro Arg Ala Ser Phe Asp Ala Ala Ala 245 250 255 Asp Gly Tyr Phe Val Gly Glu Gly Cys Gly Ala Phe Val Leu Lys Arg 260 265 27 O Glu Thir Ser Cys Thr Lys Asp Asp Arg Ile Tyr Ala Cys Met Asp Ala 275 280 285 Ile Val Pro Gly Asn Val Pro Ser Ala Cys Lieu Arg Glu Ala Lieu. Asp 29 O 295 3OO Glin Ala Arg Val Lys Pro Gly Asp Ile Glu Met Leu Glu Lieu Ser Ala 305 310 315 320 Asp Ser Ala Arg His Lieu Lys Asp Pro Ser Val Lieu Pro Lys Glu Lieu 325 330 335 Thr Ala Glu Glu Glu Ile Gly Gly Lieu Glin Thr Ile Lieu Arg Asp Asp 340 345 350 Asp Llys Lieu Pro Arg Asn Val Ala Thr Gly Ser Wall Lys Ala Thr Val 355 360 365 Gly Asp Thr Gly Tyr Ala Ser Gly Ala Ala Ser Lieu. Ile Lys Ala Ala 370 375 38O Lieu. Cys Ile Tyr Asn Arg Tyr Lieu Pro Ser Asn Gly Asp Asp Trp Asp 385 390 395 400 Glu Pro Ala Pro Glu Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Glin 405 410 415 Thir Ser Arg Ala Trp Lieu Lys Asn Pro Gly Glu Arg Arg Tyr Ala Ala 420 425 430 Val Ser Gly Val Ser Glu Thir Arg Ser 435 4 40

<210> SEQ ID NO 23 &2 11s LENGTH 1500 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE

US 2002/0194641 A1 Dec. 19, 2002 95

-continued

50 55 60 Ala Glin Ala Arg Arg Ile Phe Leu Glu Lieu Lieu Gly Glu Thir Lieu Ala 65 70 75 8O Glin Asp Ala Ala Ser Ser Gly Ser Glin Lys Pro Leu Ala Leu Ser Lieu 85 90 95 Val Ser Thr Pro Ser Lys Lieu Glin Arg Glu Val Glu Lieu Ala Ala Lys 100 105 110 Gly Ile Pro Arg Cys Lieu Lys Met Arg Arg Asp Trp Ser Ser Pro Ala 115 120 125 Gly Ser Arg Tyr Ala Pro Glu Pro Leu Ala Ser Asp Arg Val Ala Phe 130 135 1 4 0 Met Tyr Gly Glu Gly Arg Ser Pro Tyr Tyr Gly Ile Thr Glin Asp Ile 145 15 O 155 160 His Arg Ile Trp Pro Glu Lieu. His Glu Val Ile Asn. Glu Lys Thr Asn 1.65 170 175 Arg Lieu Trp Ala Glu Gly Asp Arg Trp Val Met Pro Arg Ala Ser Phe 18O 185 190 Lys Ser Glu Lieu Glu Ser Glin Glin Glin Glu Phe Asp Arg Asn Met Ile 195 200 2O5 Glu Met Phe Arg Leu Gly Ile Leu Thir Ser Ile Ala Phe Thr Asn Leu 210 215 220 Ala Arg Asp Val Lieu. Asn. Ile Thr Pro Lys Ala Ala Phe Gly Lieu Ser 225 230 235 240 Leu Gly Glu Ile Ser Met Ile Phe Ala Phe Ser Lys Lys Asn Gly Lieu 245 250 255 Ile Ser Asp Gln Leu Thir Lys Asp Leu Arg Glu Ser Asp Val Trp Asn 260 265 27 O Lys Ala Lieu Ala Val Glu Phe Asn Ala Lieu Arg Glu Ala Trp Gly Ile 275 280 285 Pro Glin Ser Val Pro Lys Asp Glu Phe Trp Gln Gly Tyr Ile Val Arg 29 O 295 3OO Gly Thr Lys Glin Asp Ile Glu Ala Ala Ile Ala Pro Asp Ser Lys Tyr 305 310 315 320 Val Arg Lieu. Thir Ile Ile Asn Asp Ala Asn. Thr Ala Lieu. Ile Ser Gly 325 330 335 Lys Pro Asp Ala Cys Lys Ala Ala Ile Ala Arg Lieu Gly Gly Asn. Ile 340 345 350 Pro Ala Leu Pro Val Thr Glin Gly Met Cys Gly His Cys Pro Glu Val 355 360 365 Gly Pro Tyr Thr Lys Asp Ile Ala Lys Ile His Ala Asn Lieu Glu Phe 370 375 38O Pro Val Val Asp Gly Lieu. Asp Leu Trp Thir Thr Ile Asn Glin Lys Arg 385 390 395 400 Leu Val Pro Arg Ala Thr Gly Ala Lys Asp Glu Trp Ala Pro Ser Ser 405 410 415 Phe Gly Glu Tyr Ala Gly Glin Leu Tyr Glu Lys Glin Ala Asn. Phe Pro 420 425 430 Glin Ile Val Glu Thir Ile Tyr Lys Glin Asn Tyr Asp Val Phe Val Glu 435 4 40 4 45 Val Gly Pro Asn Asn His Arg Ser Thr Ala Val Arg Thr Thr Leu Gly 450 455 460 US 2002/0194641 A1 Dec. 19, 2002 96

-continued

Pro Glin Arg Asn His Leu Ala Gly Ala Ile Asp Lys Glin Asn. Glu Asp 465 470 475 480

Ala Trp Thir Thr Ile Wall Lys Lieu Val Ala Ser Lieu Lys Ala His Telu 485 490 495 Val Pro Gly Val 5 OO

SEQ ID NO 25 LENGTH 1530 TYPE DNA ORGANISM: Schizochytrium sp. FEATURE: NAME/KEY: CDS LOCATION: (1) . . (1530) OTHER INFORMATION:

<400 SEQUENCE: 25 citg citc gat citc gac agt atg citt gcig citg agc ticit gcc agit gcc to c 48 Leu Lieu. Asp Lieu. Asp Ser Met Leu Ala Leu Ser Ser Ala Ser Ala Ser 1 5 10 15 ggc aac citt gtt gag act gcg cct agc gac goc to g g to att gtg cc.g 96 Gly Asn Leu Val Glu Thr Ala Pro Ser Asp Ala Ser Val Ile Val Pro 2O 25 30 ccc toc aac att gcg gat citc ggc agc cqc goc titc atg aaa acg tac 144 Pro Cys Asn. Ile Ala Asp Leu Gly Ser Arg Ala Phe Met Lys Thr 35 40 45 ggit gtt tog gCg cct citg tac acg ggc gcc atg gCC aag ggc att gcc 192 Gly Val Ser Ala Pro Leu Tyr Thr Gly Ala Met Ala Lys Gly Ile Ala 50 55 60 tot gcg gac ctic gtc att gcc goc ggc cqc cag ggc atc citt gcg to c 240 Ser Ala Asp Leu Val Ile Ala Ala Gly Arg Glin Gly Ile Leu Ala Ser 65 70 75 8O titt ggc goc ggc gga citt coc at g cag gtt gtg cqt gag to c atc gaa 288 Phe Gly Ala Gly Gly Leu Pro Met Glin Val Val Arg Glu Ser Ile Glu 85 90 95 aag att cag gCC goc citg ccc aat ggc ccg tac got gtc. aac citt atc 336 Lys Ile Glin Ala Ala Lieu Pro Asn Gly Pro Tyr Ala Val Asn Lieu Ile 100 105 110 cat tot coc titt gac agc aac ctic gala aag ggc aat gtc gat citc titc. 384 His Ser Pro Phe Asp Ser Asn Lieu Glu Lys Gly Asn. Wall Asp Lieu Phe 115 120 125 citc gag aag ggt gtc. acc titt gtc. gag gCC to g g cc titt at g acg citc. 432 Leu Glu Lys Gly Val Thr Phe Val Glu Ala Ser Ala Phe Met Thr Telu 130 135 1 4 0 acc cc g cag gtc gtg cqg tac cqc gcg gCt ggc citc acg cqc aac gcc 480 Thr Pro Glin Val Val Arg Tyr Arg Ala Ala Gly Leu Thr Arg Asn Ala 145 15 O 155 160 gac ggc to g g to aac atc cqc aac cqt atc att goc aag gtc. tcg 528 Asp Gly Ser Val Asn. Ile Arg Asn Arg Ile Ile Gly Lys Wal Ser Arg 1.65 170 175 acc gag ctic goc gag atg titc at g c git cot gog ccc gag cac citt citt 576 Thr Glu Leu Ala Glu Met Phe Met Arg Pro Ala Pro Glu His Leu Telu 18O 185 190 cag aag citc att gct tcc ggc gag atc aac cag gag cag gCC gag citc. 624 Glin Lys Lieu. Ile Ala Ser Gly Glu Ile Asin Glin Glu Glin Ala Glu Telu 195 200 2O5 gcc cqc cqt gtt coc gtc gct gac gac atc gog gtc gaa got gac tog 672 Ala Arg Arg Val Pro Wall Ala Asp Asp Ile Ala Val Glu Ala Asp Ser