Quick viewing(Text Mode)

Schizochytrium Vs Shewanella

Schizochytrium Vs Shewanella

US 2008.0026440A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2008/0026440 A1 Metz et al. (43) Pub. Date: Jan. 31, 2008

(54) PUFA POLYKETIDE SYNTHASE SYSTEMS (60) Provisional application No. 60/284.066, filed on Apr. AND USES THEREOF 16, 2001. Provisional application No. 60/298,796, filed on Jun. 15, 2001. Provisional application No. (75) Inventors: James G. Metz, Longmont, CO (US); 60/323.269, filed on Sep. 18, 2001. James H. Flatt, Colorado Springs, CO (US); Jerry M. Kuner, Longmont, CO Publication Classification (US); William R. Barclay, Boulder, (51) Int. Cl. CO (US) CI2P 7/64 (2006.01) C7H 2L/00 (2006.01) Correspondence Address: CI2N I/00 (2006.01) SHERIDAN ROSS PC CI2N I/2 (2006.01) 1560 BROADWAY CI2N 5/04 (2006.01) SUTE 12OO (52) U.S. Cl...... 435/134: 435/243; 435/252.3: DENVER, CO 80202 435/419; 536/23.2 (73) Assignee: MARTEK BIOSCIENCES CORPO (57) ABSTRACT RATION, Columbia, MD (US) The invention generally relates to polyunsaturated fatty acid (21) Appl. No.: 11/689,608 (PUFA) polyketide synthase (PKS) systems isolated from or derived from non-bacterial organisms, to homologues (22) Filed: Mar. 22, 2007 thereof, to isolated nucleic acid molecules and recombinant nucleic acid molecules encoding biologically active Related U.S. Application Data domains of such a PUFA PKS system, to genetically modi fied organisms comprising PUFA PKS systems, to methods (63) Continuation of application No. 10/124,800, filed on of making and using Such systems for the production of Apr. 16, 2002, now Pat. No. 7,247.461, and which is bioactive molecules of interest, and to novel methods for a continuation-in-part of application No. 09/231,899, identifying new bacterial and non-bacterial microorganisms filed on Jan. 14, 1999, now Pat. No. 6,566,583. having such a PUFA PKS system. Comparison of PKS Orfs/domains: Schizochytrium vs Shewanella

Orf6 Orf 7 Of 8

AI-S-S- DH OH ER

- 2 2 V Of B DH DH ER Of C Amino acid sequence homology = 20-35% = 35-50% > 50% Patent Application Publication Jan. 31, 2008 Sheet 1 of 3 US 2008/0026440 A1 É N O er

S g 2 CN (N f \d \d want

g

2 Patent Application Publication Jan. 31, 2008 Sheet 2 of 3 US 2008/0026440 A1

Z'OIH

Patent Application Publication Jan. 31, 2008 Sheet 3 of 3 US 2008/0026440 A1

unuifi?oo2|qøSSA(OZIL)oorsow SHRIOSXAVIQ?umpul??oo2??oç (SJ)IO9,8.Iedas)

JOVJIO8!4

~~~~.No. US 2008/0026440 A1 Jan. 31, 2008

PUFA POLYKETDE SYNTHASE SYSTEMIS AND 0005. In contrast to the Type I and II systems, in modular USES THEREOF PKS systems, each enzyme is used only once in the production of the end product. The domains are found in CROSS-REFERENCE TO RELATED very large proteins and the product of each reaction is passed APPLICATIONS on to another domain in the PKS protein. Additionally, in all 0001. This application is a continuation of U.S. applica of the PKS systems described above, if a carbon-carbon tion Ser. No. 10/124,800, filed Apr. 16, 2002, entitled “PUFA double bond is incorporated into the end product, it is always Polyketide Synthase Systems and Uses Thereof.” which in the trans configuration. claims the benefit of priority under 35 U.S.C. S 119(e) to: 0006. In the Type I and Type II PKS systems described U.S. Provisional Application Ser. No. 60/284,066, filed Apr. above, the same set of reactions is carried out in each cycle 16, 2001, entitled “A Polyketide Synthase System and Uses until the end product is obtained. There is no allowance for Thereof: U.S. Provisional Application Ser. No. 60/298.796, the introduction of unique reactions during the biosynthetic filed Jun. 15, 2001, entitled “A Polyketide Synthase System procedure. The modular PKS systems require huge proteins and Uses Thereof; and U.S. Provisional Application Ser. that do not utilize the economy of iterative reactions (i.e., a No. 60/323,269, filed Sep. 18, 2001, entitled “Thraus distinct domain is required for each reaction). Additionally, tochytrium PUFA PKS System and Uses Thereof. This as stated above, carbon-carbon double bonds are introduced application is also a continuation-in-part of copending U.S. in the trans configuration in all of the previously described application Ser. No. 09/231,899, filed Jan. 14, 1999, entitled PKS systems. “Schizochytrium PKS Genes”. Each of the above-identified patent applications is incorporated herein by reference in its 0007 Polyunsaturated fatty acids (PUFAs) are critical entirety. components of membrane lipids in most (Lau ritzen et al., Prog. Lipid Res. 401 (2001); McConn et al., 0002 This application does not claim the benefit of Plant J. 15, 521 (1998)) and are precursors of certain priority from U.S. application Ser. No. 09/090,793, filed Jun. hormones and signaling molecules (Heller et al., Drugs 55. 4, 1998, now U.S. Pat. No. 6,140,486, although U.S. appli 487 (1998); Creelman et al., Annu. Rev. Plant Physiol. Plant cation Ser. No. 09/090,793 is incorporated herein by refer Mol. Biol. 48, 355 (1997)). Known pathways of PUFA ence in its entirety. synthesis involve the processing of saturated 16:0 or 18:0 FIELD OF THE INVENTION fatty acids (the abbreviation X;Y indicates an acyl group containing X carbon atoms and Y cis double bonds; double 0003. This invention relates to polyunsaturated fatty acid bond positions of PUFAs are indicated relative to the methyl (PUFA) polyketide synthase (PKS) systems from microor carbon of the fatty acid chain (c)3 or co6) with systematic ganisms, including eukaryotic organisms, such as Thraus methylene interruption of the double bonds) derived from tochytrid microorganisms. More particularly, this invention fatty acid synthase (FAS) by elongation and aerobic desatu relates to nucleic acids encoding non-bacterial PUFA PKS ration reactions (Sprecher, Curr. Opin. Clin. Nutr. Metab. systems, to non-bacterial PUFA PKS systems, to genetically Care 2, 135 (1999); Parker-Barnes et al., Proc. Natl. Acad. modified organisms comprising non-bacterial PUFA PKS Sci. USA 97,8284 (2000); Shanklin et al., Annu. Rev. Plant systems, and to methods of making and using the non Physiol. Plant Nol. Biol. 49, 611 (1998)). Starting from bacterial PUFA PKS systems disclosed herein. This inven acetyl-CoA, the synthesis of DHA requires approximately tion also relates to a method to identify bacterial and 30 distinct enzyme activities and nearly 70 reactions includ non-bacterial microorganisms comprising PUFA PKS sys ing the four repetitive steps of the fatty acid synthesis cycle. temS. Polyketide synthases (PKSs) carry out some of the same BACKGROUND OF THE INVENTION reactions as FAS (Hopwood et al., Annu. Rev. Genet. 24, 37 (1990); Bentley et al., Annu. Rev. Microbiol. 53,411 (1999)) 0004 Polyketide synthase (PKS) systems are generally and use the same Small protein (or domain), acyl carrier known in the art as enzyme complexes derived from fatty protein (ACP), as a covalent attachment site for the growing acid synthase (FAS) systems, but which are often highly carbon chain. However, in these enzyme systems, the com modified to produce specialized products that typically show plete cycle of reduction, dehydration and reduction seen in little resemblance to fatty acids. Researchers have attempted FAS is often abbreviated so that a highly derivatized carbon to exploit polyketide synthase (PKS) systems that have been chain is produced, typically containing many keto- and described in the literature as falling into one of three basic types, typically referred to as: Type II, Type I and modular. hydroxy-groups as well as carbon-carbon double bonds in The Type II system is characterized by separable proteins, the trans configuration. The linear products of PKSs are each of which carries out a distinct enzymatic reaction. The often cyclized to form complex biochemicals that include enzymes work in concert to produce the end product and antibiotics and many other secondary products (Hopwood et each individual enzyme of the system typically participates al., (1990) supra; Bentley et al., (1999), supra; Keating et al., several times in the production of the end product. This type Curr. Opin. Chem. Biol. 3, 598 (1999)). of system operates in a manner analogous to the fatty acid 0008 Very long chain PUFAs such as docosahexaenoic synthase (FAS) systems found in plants and bacteria. Type acid (DHA; 22:6c)3) and eicosapentaenoic acid (EPA: I PKS systems are similar to the Type II system in that the 20:5c)3) have been reported from several of marine enzymes are used in an iterative fashion to produce the end bacteria, including Shewanella sp (Nichols et al., Curr. Op. product. The Type I differs from Type II in that enzymatic Biotechnol. 10, 240 (1999); Yazawa, Lipids 31, S (1996): activities, instead of being associated with separable pro DeLong et al., Appl. Environ. Microbiol. 51, 730 (1986)). teins, occur as domains of larger proteins. This system is Analysis of a genomic fragment (cloned as plasmid pEPA) analogous to the Type I FAS systems found in animals and from Shewanella sp. strain SCRC2738 led to the identifi fungi. cation of five open reading frames (Orfs), totaling 20 Kb. US 2008/0026440 A1 Jan. 31, 2008 that are necessary and sufficient for EPA production in E. Synthesis of EPA was found only in the high-speed pellet coli (Yazawa, (1996), supra). Several of the predicted pro fraction, indicating that EPA synthesis can occur without tein domains were homologues of FAS enzymes, while other reliance on enzymes of the E. coli FAS or on soluble regions showed no homology to proteins of known function. intermediates (such as 16:0-ACP) from the cytoplasmic On the basis of these observations and biochemical studies, fraction. Since the proteins encoded by the Shewanella EPA it was suggested that PUFA synthesis in Shewanella genes are not particularly hydrophobic, restriction of EPA involved the elongation of 16- or 18-carbon fatty acids synthesis activity to this fraction may reflect a requirement produced by FAS and the insertion of double bonds by for a membrane-associated acyl acceptor molecule. Addi undefined aerobic desaturases (Watanabe et al., J. Biochem. tionally, in contrast to the E. coli FAS, EPA synthesis is 122, 467 (1997)). The recognition that this hypothesis was specifically NADPH-dependent and does not require incorrect began with a reexamination of the protein NADH. All these results are consistent with the pl. PA genes sequences encoded by the five Shewanella Orfs. At least 11 encoding a multifunctional PKS that acts independently of regions within the five Orfs were identifiable as putative FAS, elongase, and desaturase activities to synthesize EPA enzyme domains (See Metz et al., Science 293:290–293 directly. It is likely that the PKS pathway for PUFA synthesis (2001)). When compared with sequences in the gene data that has been identified in Shewanella is widespread in bases, seven of these were more strongly related to PKS marine bacteria. Genes with high homology to the proteins than to FAS proteins. Included in this group were Shewanella gene cluster have been identified in Photobac domains putatively encoding malonyl-CoA:ACP acyltrans terium profiindum (Allen et al., Appli. Environ. Microbiol. ferase (MAT), 3-ketoacyl-ACP synthase (KS).3-ketoacyl 65:1710 (1999)) and in Moritella marina (Vibrio marinus) ACP reductase (KR), acyltransferase (AT), phosphopanteth (Tanaka et al., Biotechnol. Lett. 21:939 (1999)). eine transferase, chain length (or chain initiation) factor 0010. The biochemical and molecular-genetic analyses (CLF) and a highly unusual cluster of six ACP domains (i.e., performed with Shewanella provide compelling evidence the presence of more than two clustered ACP domains has for polyketide synthases that are capable of synthesizing not previously been reported in PKS or FAS sequences). PUFAs from malonyl-CoA. A complete scheme for synthe However, three regions were more highly homologous to sis of EPA by the Shewanella PKS has been proposed. The bacterial FAS proteins. One of these was similar to the identification of protein domains homologous to the E. coli newly-described Triclosan-resistant enoyl reductase (ER) FabA protein, and the observation that bacterial EPA syn from Streptococcus pneumoniae (Heath et al., Nature 406, thesis occurs anaerobically, provide evidence for one 145 (2000)); comparison of ORF8 peptide with the S. mechanism wherein the insertion of cis double bonds occurs pneumoniae enoyl reductase using the LALIGN program through the action of a bifunctional dehydratase/2-trans, (matrix, BLOSUM50; gap opening penalty, -10; elongation 3-cis isomerase (DH/2.3I). In E. coli, condensation of the penalty -1) indicated 49% similarity over a 386aa overlap). 3-cis acyl intermediate with malonyl-ACP requires a par Two regions were homologues of the E. coli FAS protein ticular ketoacyl-ACP synthase and this may provide a ratio encoded by fabA, which catalyzes the synthesis of trans-2- male for the presence of two KS in the Shewanella gene decenoyl-ACP and the reversible isomerization of this prod cluster (in Orf5 and Orf7). However, the PKS cycle extends uct to cis-3-decenoyl-ACP (Heath et al., J. Biol. Chem., 271, the chain in two-carbon increments while the double bonds 27795 (1996)). On this basis, it seemed likely that at least in the EPA product occur at every third carbon. This dis some of the double bonds in EPA from Shewanella are junction can be solved if the double bonds at C-14 and C-8 introduced by a dehydrase-isomerase mechanism catalyzed of EPA are generated by 2-trans, 2-cis isomerization (DH/ by the FabA-like domains in Orf7. 2.2T) followed by incorporation of the cis double bond into 0009 Anaerobically-grown E. coli cells harboring the the elongating fatty acid chain. The enzymatic conversion of pEPA plasmid accumulated EPA to the same levels as a trans double bond to the cis configuration without bond aerobic cultures (Metz et al., 2001, Supra), indicating that an migration is known to occur, for example, in the synthesis of oxygen-dependent desaturase is not involved in EPA syn 11-cis-retinal in the retinoid cycle (Jang et al., J. Biol. Chem. thesis. When pRPA was introduced into a fabB mutant of E. 275, 28.128 (2000)). Although such an enzyme function has coli, which is unable to synthesize monounsaturated fatty not yet been identified in the Shewanella PKS, it may reside acids and requires unsaturated fatty acids for growth, the in one of the unassigned protein domains. resulting cells lost their fatty acid auxotrophy. They also accumulated much higher levels of EPA than other pEPA 0011. The PKS pathways for PUFA synthesis in containing strains, suggesting that EPA competes with Shewanella and another marine bacteria, Vibrio marinus, are endogenously produced monounsaturated fatty acids for described in detail in U.S. Pat. No. 6,140,486 (issued from transfer to glycerolipids. When pRPA-containing E. coli U.S. application Ser. No. 09/090,793, filed Jun. 4, 1998, cells were grown in the presence of "C-acetate, the data entitled “Production of Polyunsaturated Fatty Acids by from C-NMR analysis of purified EPA from the cells Expression of Polyketide-like Synthesis Genes in Plants', confirmed the identity of EPA and provided evidence that which is incorporated herein by reference in its entirety). this fatty acid was synthesized from acetyl-CoA and malo 0012 Polyunsaturated fatty acids (PUFAs) are consid nyl-CoA (See Metz et al., 2001, supra). A cell-free homo ered to be useful for nutritional, pharmaceutical, industrial, genate from pEPA-containing fabB cells synthesized both and other purposes. An expansive supply of PUFAs from EPA and saturated fatty acids from C-malonyl-CoA. natural sources and from chemical synthesis are not Sufi When the homogenate was separated into a 200,000xg cient for commercial needs. Because a number of separate high-speed pellet and a membrane-free Supernatant fraction, desaturase and elongase enzymes are required for fatty acid saturated fatty acid synthesis was confined to the Superna synthesis from linoleic acid (LA, 18:2 A 9, 12), common in tant, consistent with the soluble nature of the Type II FAS most plant species, to the more Saturated and longer chain enzymes (Magnuson et al., Microbiol. Rev. 57, 522 (1993)). PUFAs, engineering plant host cells for the expression of US 2008/0026440 A1 Jan. 31, 2008

PUFAs such as EPA and DHA may require expression of five polyketide synthase (PKS) system; (d) a nucleic acid or six separate enzyme activities to achieve expression, at sequence encoding an amino acid sequence that is at least least for EPA and DHA. Additionally, for production of about 60% identical to the amino acid sequence of (b). useable quantities of Such PUFAs, additional engineering wherein the amino acid sequence has a biological activity of efforts may be required, for instance the down regulation of at least one domain of a polyunsaturated fatty acid (PUFA) enzymes competing for Substrate, engineering of higher polyketide synthase (PKS) system; and (e) a nucleic acid enzyme activities such as by mutagenesis or targeting of sequence that is fully complementary to the nucleic acid enzymes to plastid organelles. Therefore it is of interest to sequence of (a), (b), (c), or (d). In alternate aspects, the obtain genetic material involved in PUFA biosynthesis from nucleic acid sequence encodes an amino acid sequence that species that naturally produce these fatty acids and to is at least about 70% identical, or at least about 80% express the isolated material alone or in combination in a identical, or at least about 90% identical, or is identical to: heterologous system which can be manipulated to allow (1) at least 500 consecutive amino acids of an amino acid production of commercial quantities of PUFAs. sequence selected from the group consisting of SEQ ID 0013 The discovery of a PUFA PKS system in marine NO:2, SEQ ID NO:4, and SEQ ID NO:6; and/or (2) a bacteria such as Shewanella and Vibrio marinus (see U.S. nucleic acid sequence encoding an amino acid sequence that Pat. No. 6,140,486, ibid.) provides a resource for new is at least about 70% identical to an amino acid sequence methods of commercial PUFA production. However, these selected from the group consisting of: SEQ ID NO:8, SEQ marine bacteria have limitations which will ultimately ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, restrict their usefulness on a commercial level. First, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID although U.S. Pat. No. 6,140.486 discloses that the marine NO:28, SEQ ID NO:30, and SEQID NO:32. In a preferred bacteria PUFA PKS systems can be used to genetically embodiment, the nucleic acid sequence encodes an amino modify plants, the marine bacteria naturally live and grow in acid sequence chosen from: SEQ ID NO:2, SEQ ID NO:4, cold marine environments and the enzyme systems of these SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID bacteria do not function well above 30°C. In contrast, many NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, crop plants, which are attractive targets for genetic manipu SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID lation using the PUFA PKS system, have normal growth NO:30, SEQID NO:32 and/or biologically active fragments conditions at temperatures above 30° C. and ranging to thereof. In one aspect, the nucleic acid sequence is chosen higher than 40°C. Therefore, the marine bacteria PUFAPKS from: SEQID NO:1, SEQID NO:3, SEQID NO:5, SEQID system is not predicted to be readily adaptable to plant NO:7, SEQID NO:9, SEQID NO:12, SEQID NO:17, SEQ expression under normal growth conditions. Moreover, the ID NO:19, SEQID NO:21, SEQID NO:23, SEQID NO:25, marine bacteria PUFA PKS genes, being from a bacterial SEQ ID NO:27, SEQ ID NO:29, and SEQ ID NO:31. Source, may not be compatible with the genomes of eukary 0016. Another embodiment of the present invention otic host cells, or at least may require significant adaptation relates to a recombinant nucleic acid molecule comprising to work in eukaryotic hosts. Additionally, the known marine the nucleic acid molecule as described above, operatively bacteria PUFA PKS systems do not directly produce trig linked to at least one transcription control sequence. In lycerides, whereas direct production of triglycerides would another embodiment, the present invention relates to a be desirable because triglycerides are a lipid storage product recombinant cell transfected with the recombinant nucleic in microorganisms and as a result can be accumulated at acid molecule described directly above. very high levels (e.g. up to 80-85% of cell weight) in 0017. Yet another embodiment of the present invention microbial/plant cells (as opposed to a “structural lipid relates to a genetically modified microorganism, wherein the product (e.g. phospholipids) which can generally only accu microorganism expresses a PKS system comprising at least mulate at low levels (e.g. less than 10-15% of cell weight at one biologically active domain of a polyunsaturated fatty maximum)). acid (PUFA) polyketide synthase (PKS) system. The at least one domain of the PUFA PKS system is encoded by a 0014) Therefore, there is a need in the art for other PUFA nucleic acid sequence chosen from: (a) a nucleic acid PKS systems having greater flexibility for commercial use. sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system from a SUMMARY OF THE INVENTION Thraustochytrid microorganism; (b) a nucleic acid sequence 00.15 One embodiment of the present invention relates to encoding at least one domain of a PUFA PKS system from an isolated nucleic acid molecule comprising a nucleic acid a microorganism identified by the screening method of the sequence chosen from: (a) a nucleic acid sequence encoding present invention; (c) a nucleic acid sequence encoding an an amino acid sequence selected from the group consisting amino acid sequence selected from the group consisting of of: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, and SEQID NO:2, SEQID NO:4, SEQID NO:6, and biologi biologically active fragments thereof; (b) a nucleic acid cally active fragments thereof. (d) a nucleic acid sequence sequence encoding an amino acid sequence selected from encoding an amino acid sequence selected from the group the group consisting of: SEQID NO:8, SEQID NO:10, SEQ consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID ID NO:13, SEQID NO:18, SEQID NO:20, SEQID NO:22, NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically active fragments NO:30, SEQ ID NO:32, and biologically active fragments thereof (c) a nucleic acid sequence encoding an amino acid thereof: (e) a nucleic acid sequence encoding an amino acid sequence that is at least about 60% identical to at least 500 sequence that is at least about 60% identical to at least 500 consecutive amino acids of the amino acid sequence of (a), consecutive amino acids of an amino acid sequence selected wherein the amino acid sequence has a biological activity of from the group consisting of: SEQID NO:2, SEQID NO:4, at least one domain of a polyunsaturated fatty acid (PUFA) and SEQ ID NO:6; wherein the amino acid sequence has a US 2008/0026440 A1 Jan. 31, 2008 biological activity of at least one domain of a PUFA PKS recombinantly express at least one nucleic acid molecule system; and, (f) a nucleic acid sequence encoding an amino encoding at least one biologically active domain from a acid sequence that is at least about 60% identical to an amino bacterial PUFA PKS system, from a Type I PKS system, acid sequence selected from the group consisting of: SEQID from a Type II PKS system, and/or from a modular PKS NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, system. SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID 0020. In another aspect of this embodiment, the micro NO:32; wherein the amino acid sequence has a biological organism endogenously expresses a PUFA PKS system activity of at least one domain of a PUFA PKS system. In comprising the at least one biologically active domain of a this embodiment, the microorganism is genetically modified PUFA PKS system, and wherein the genetic modification to affect the activity of the PKS system. The screening comprises expression of a recombinant nucleic acid mol method of the present invention referenced in (b) above ecule selected from the group consisting of a recombinant comprises: (i) selecting a microorganism that produces at nucleic acid molecule encoding at least one biologically least one PUFA; and, (ii) identifying a microorganism from active domain from a second PKS system and a recombinant (i) that has an ability to produce increased PUFAs under nucleic acid molecule encoding a protein that affects the dissolved oxygen conditions of less than about 5% of activity of the PUFA PKS system. Preferably, the recombi saturation in the fermentation medium, as compared to nant nucleic acid molecule comprises any one of the nucleic production of PUFAs by the microorganism under dissolved acid sequences described above. oxygen conditions of greater than 5% of Saturation, and 0021. In one aspect of this embodiment, the recombinant more preferably 10% of saturation, and more preferably nucleic acid molecule encodes a phosphopantetheline trans greater than 15% of Saturation, and more preferably greater ferase. In another aspect, the recombinant nucleic acid than 20% of saturation in the fermentation medium. molecule comprises a nucleic acid sequence encoding at 0018. In one aspect, the microorganism endogenously least one biologically active domain from a bacterial PUFA expresses a PKS system comprising the at least one domain PKS system, from a type I PKS system, from a type II PKS of the PUFA PKS system, and wherein the genetic modifi system, and/or from a modular PKS system. cation is in a nucleic acid sequence encoding the at least one 0022. In another aspect of this embodiment, the micro domain of the PUFA PKS system. For example, the genetic organism is genetically modified by transfection with a modification can be in a nucleic acid sequence that encodes recombinant nucleic acid molecule encoding the at least one a domain having a biological activity of at least one of the domain of a polyunsaturated fatty acid (PUFA) polyketide following proteins: malonyl-CoA:ACP acyltransferase synthase (PKS) system. Such a recombinant nucleic acid (MAT). B-keto acyl-ACP synthase (KS), ketoreductase molecule can include any recombinant nucleic acid mol (KR), acyltransferase (AT), FabA-like B-hydroxyacyl-ACP ecule comprising any of the nucleic acid sequences dehydrase (DH), phosphopantetheline transferase, chain described above. In one aspect, the microorganism has been length factor (CLF), acyl carrier protein (ACP), enoyl ACP further genetically modified to recombinantly express at reductase (ER), an enzyme that catalyzes the synthesis of least one nucleic acid molecule encoding at least one bio trans-2-decenoyl-ACP, an enzyme that catalyzes the revers logically active domain from a bacterial PUFA PKS system, ible isomerization of trans-2-decenoyl-ACP to cis-3-de from a Type I PKS system, from a Type II PKS system, or cenoyl-ACP, and an enzyme that catalyzes the elongation of from a modular PKS system. cis-3-decenoyl-ACP to cis-vaccenic acid. In one aspect, the genetic modification is in a nucleic acid sequence that 0023 Yet another embodiment of the present invention encodes an amino acid sequence selected from the group relates to a genetically modified plant, wherein the plant has consisting of: (a) an amino acid sequence that is at least been genetically modified to recombinantly express a PKS about 70% identical, and preferably at least about 80% system comprising at least one biologically active domain of identical, and more preferably at least about 90% identical a polyunsaturated fatty acid (PUFA) polyketide synthase and more preferably identical to at least 500 consecutive (PKS) system. The domain can be encoded by any of the amino acids of an amino acid sequence selected from the nucleic acid sequences described above. In one aspect, the group consisting of: SEQID NO:2, SEQID NO:4, and SEQ plant has been further genetically modified to recombinantly ID NO:6; wherein the amino acid sequence has a biological express at least one nucleic acid molecule encoding at least activity of at least one domain of a PUFA PKS system; and, one biologically active domain from a bacterial PUFA PKS (b) an amino acid sequence that is at least about 70% system, from a Type I PKS system, from a Type II PKS identical, and preferably at least about 80% identical, and system, and/from a modular PKS system. more preferably at least about 90% identical and more 0024. Another embodiment of the present invention preferably identical to an amino acid sequence selected from relates to a method to identify a microorganism that has a the group consisting of: SEQID NO:8, SEQID NO:10, SEQ polyunsaturated fatty acid (PUFA) polyketide synthase ID NO:13, SEQID NO:18, SEQID NO:20, SEQID NO:22, (PKS) system. The method includes the steps of: (a) select SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID ing a microorganism that produces at least one PUFA; and, NO:30, and SEQ ID NO:32; wherein the amino acid (b) identifying a microorganism from (a) that has an ability sequence has a biological activity of at least one domain of to produce increased PUFAs under dissolved oxygen con a PUFA PKS system. ditions of less than about 5% of saturation in the fermenta 0019. In one aspect, the genetically modified microor tion medium, as compared to production of PUFAs by the ganism is a Thraustochytrid, which can include, but is not microorganism under dissolved oxygen conditions of greater limited to, a Thraustochytrid from a chosen from than 5% of saturation, more preferably 10% of saturation, Schizochytrium and Thraustochytrium. In another aspect, more preferably greater than 15% of saturation and more the microorganism has been further genetically modified to preferably greater than 20% of saturation in the fermentation US 2008/0026440 A1 Jan. 31, 2008

medium. A microorganism that produces at least one PUFA includes the step of culturing under conditions effective to and has an ability to produce increased PUFAs under dis produce the bioactive molecule a genetically modified solved oxygen conditions of less than about 5% of saturation organism that expresses a PKS system comprising at least is identified as a candidate for containing a PUFA PKS one biologically active domain of a polyunsaturated fatty system. acid (PUFA) polyketide synthase (PKS) system. The domain 0025. In one aspect of this embodiment, step (b) com of the PUFA PKS system is encoded by any of the nucleic prises identifying a microorganism from (a) that has an acid sequences described above. ability to produce increased PUFAs under dissolved oxygen 0030. In one aspect of this embodiment, the organism conditions of less than about 2% of Saturation, and more endogenously expresses a PKS system comprising the at preferably under dissolved oxygen conditions of less than least one domain of the PUFA PKS system, and the genetic about 1% of saturation, and even more preferably under modification is in a nucleic acid sequence encoding the at dissolved conditions of about 0% of saturation. least one domain of the PUFAPKS system. For example, the 0026. In another aspect of this embodiment, the micro genetic modification can change at least one product pro organism selected in (a) has an ability to consume bacteria duced by the endogenous PKS system, as compared to a by phagocytosis. In another aspect, the microorganism wild-type organism. selected in (a) has a simple fatty acid profile. In another 0031. In another aspect of this embodiment, the organism aspect, the microorganism selected in (a) is a non-bacterial endogenously expresses a PKS system comprising the at microorganism. In another aspect, the microorganism least one biologically active domain of the PUFA PKS selected in (a) is a . In another aspect, the micro system, and the genetic modification comprises transfection organism selected in (a) is a member of the order Thraus of the organism with a recombinant nucleic acid molecule tochytriales. In another aspect, the microorganism selected selected from the group consisting of a recombinant nucleic in (a) has an ability to produce PUFAs at a temperature acid molecule encoding at least one biologically active greater than about 15° C., and preferably greater than about domain from a second PKS system and a recombinant 20°C., and more preferably greater than about 25°C., and nucleic acid molecule encoding a protein that affects the even more preferably greater than about 30° C. In another activity of the PUFA PKS system. For example, the genetic aspect, the microorganism selected in (a) has an ability to modification can change at least one product produced by produce bioactive compounds (e.g., lipids) of interest at the endogenous PKS system, as compared to a wild-type greater than 5% of the dry weight of the organism, and more organism. preferably greater than 10% of the dry weight of the organ ism. In yet another aspect, the microorganism selected in (a) 0032. In yet another aspect of this embodiment, the contains greater than 30% of its total fatty acids as C14:0. organism is genetically modified by transfection with a C16:0 and C16:1 while also producing at least one long recombinant nucleic acid molecule encoding the at least one chain fatty acid with three or more unsaturated bonds, and domain of the polyunsaturated fatty acid (PUFA) polyketide preferably, the microorganism selected in (a) contains synthase (PKS) system. In another aspect, the organism greater than 40% of its total fatty acids as C14:0, C16:0 and produces a polyunsaturated fatty acid (PUFA) profile that C16:1 while also producing at least one long chain fatty acid differs from the naturally occurring organism without a with three or more unsaturated bonds. In another aspect, the genetic modification. In another aspect, the organism endog microorganism selected in (a) contains greater than 30% of enously expresses a non-bacterial PUFA PKS system, and its total fatty acids as C14:0, C16:0 and C16:1 while also wherein the genetic modification comprises Substitution of a producing at least one long chain fatty acid with four or domain from a different PKS system for a nucleic acid more unsaturated bonds, and more preferably while also sequence encoding at least one domain of the non-bacterial producing at least one long chain fatty acid with five or more PUFA PKS system. unsaturated bonds. 0033. In yet another aspect, the organism endogenously expresses a non-bacterial PUFA PKS system that has been 0027. In another aspect of this embodiment, the method modified by transfecting the organism with a recombinant further comprises step (c) of detecting whether the organism nucleic acid molecule encoding a protein that regulates the comprises a PUFA PKS system. In this aspect, the step of chain length of fatty acids produced by the PUFA PKS detecting can include detecting a nucleic acid sequence in system. For example, the recombinant nucleic acid molecule the microorganism that hybridizes under Stringent condi encoding a protein that regulates the chain length of fatty tions with a nucleic acid sequence encoding an amino acid acids can replace a nucleic acid sequence encoding a chain sequence from a Thraustochytrid PUFA PKS system. Alter length factor in the non-bacterial PUFA PKS system. In natively, the step of detecting can include detecting a nucleic another aspect, the protein that regulates the chain length of acid sequence in the organism that is amplified by oligo fatty acids produced by the PUFA PKS system is a chain nucleotide primers from a nucleic acid sequence from a length factor. In another aspect, the protein that regulates the Thaustochytrid PUFA PKS system. chain length of fatty acids produced by the PUFA PKS 0028. Another embodiment of the present invention system is a chain length factor that directs the synthesis of relates to a microorganism identified by the screening C20 units. method described above, wherein the microorganism is 0034. In one aspect, the organism expresses a non-bac genetically modified to regulate the production of molecules terial PUFA PKS system comprising a genetic modification by the PUFA PKS system. in a domain chosen from: a domain encoding FabA-like 0029. Yet another embodiment of the present invention B-hydroxyacyl-ACP dehydrase (DH) domain and a domain relates to a method to produce a bioactive molecule that is encoding B-ketoacyl-ACP synthase (KS), wherein the modi produced by a polyketide synthase system. The method fication alters the ratio of long chain fatty acids produced by US 2008/0026440 A1 Jan. 31, 2008

the PUFA PKS system as compared to in the absence of the mulation can include, but is not limited to: an anti-inflam modification. In one aspect, the modification comprises matory formulation, a chemotherapeutic agent, an active Substituting a DH domain that does not possess isomeriza excipient, an osteoporosis drug, an anti-depressant, an anti tion activity for a FabA-like B-hydroxyacyl-ACP dehydrase convulsant, an anti-Heliobactor pylori drug, a drug for (DH) in the non-bacterial PUFA PKS system. In another treatment of neurodegenerative disease, a drug for treatment aspect, the modification is selected from the group consist of degenerative liver disease, an antibiotic, and a cholesterol ing of a deletion of all or a part of the domain, a Substitution lowering formulation. In one aspect, the endproduct is used of a homologous domain from a different organism for the to treat a condition selected from the group consisting of domain, and a mutation of the domain. chronic inflammation, acute inflammation, gastrointestinal disorder, cancer, cachexia, cardiac restenosis, neurodegen 0035) In another aspect, the organism expresses a PKS erative disorder, degenerative disorder of the liver, blood system and the genetic modification comprises Substituting lipid disorder, osteoporosis, osteoarthritis, autoimmune dis a Fab A-like B-hydroxy acyl-ACP dehydrase (DH) domain ease, preeclampsia, preterm birth, age related maculopathy, from a PUFA PKS system for a DH domain that does not posses isomerization activity. pulmonary disorder, and peroxisomal disorder. 0041. Yet another embodiment of the present invention 0036). In another aspect, the organism expresses a non relates to a method to produce a humanized animal milk, bacterial PUFA PKS system comprising a modification in an comprising genetically modifying milk-producing cells of a enoyl-ACP reductase (ER) domain, wherein the modifica tion results in the production of a different compound as milk-producing animal with at least one recombinant nucleic compared to in the absence of the modification. For acid molecule comprising a nucleic acid sequence encoding example, the modification can be selected from the group at least one biologically active domain of a PUFA PKS consisting of a deletion of all or a part of the ER domain, a system. The domain of the PUFAPKS system is encoded by substitution of an ER domain from a different organism for any of the nucleic acid sequences described above. the ER domain, and a mutation of the ER domain. 0042. Yet another embodiment of the present invention 0037. In one aspect, the bioactive molecule produced by relates to a method produce a recombinant microbe, com the present method can include, but is not limited to, an prising genetically modifying microbial cells to express at anti-inflammatory formulation, a chemotherapeutic agent, least one recombinant nucleic acid molecule comprising a an active excipient, an osteoporosis drug, an anti-depressant, comprising a nucleic acid sequence encoding at least one an anti-convulsant, an anti-Heliobactor pylori drug, a drug biologically active domain of a PUFA PKS system. The for treatment of neurodegenerative disease, a drug for treat domain of the PUFA PKS system is encoded by any of the ment of degenerative liver disease, an antibiotic, and a nucleic acid sequences described above. cholesterol lowering formulation. In one aspect, the bioac 0043. Yet another embodiment of the present invention tive molecule is a polyunsaturated fatty acid (PUFA). In relates to a recombinant host cell which has been modified another aspect, the bioactive molecule is a molecule includ to express a polyunsaturated fatty acid (PUFA) polyketide ing carbon-carbon double bonds in the cis configuration. In synthase (PKS) system, wherein the PKS catalyzes both another aspect, the bioactive molecule is a molecule includ iterative and non-iterative enzymatic reactions. The PUFA ing a double bond at every third carbon. PKS system comprises: (a) at least two enoyl ACP-reductase 0038. In one aspect of this embodiment, the organism is (ER) domains; (b) at least six acyl carrier protein (ACP) a microorganism, and in another aspect, the organism is a domains; (c) at least two B-keto acyl-ACP synthase (KS) plant. domains; (d) at least one acyltransferase (AT) domain; (e) at least one ketoreductase (KR) domain; (f) at least two FabA 0039. Another embodiment of the present invention like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at relates to a method to produce a plant that has a polyun least one chain length factor (CLF) domain; and (h) at least saturated fatty acid (PUFA) profile that differs from the one malonyl-CoA:ACP acyltransferase (MAT) domain. In naturally occurring plant, comprising genetically modifying one aspect, the PUFA PKS system is a eukaryotic PUFA cells of the plant to express a PKS system comprising at least PKS system. In another aspect, the PUFA PKS system is an one recombinant nucleic acid molecule comprising a nucleic algal PUFA PKS system, and preferably a Thraustochytri acid sequence encoding at least one biologically active ales PUFAPKS system, which can include, but is not limited domain of a PUFA PKS system. The domain of the PUFA to, a Schizochytrium PUFA PKS system or a Thraus PKS system is encoded by any of the nucleic acid sequences tochytrium PUFA PKS system. described above. 0044) In this embodiment, the PUFA PKS system can be 0040 Yet another embodiment of the present invention expressed in a prokaryotic host cell or in a eukaryotic host relates to a method to modify an endproduct containing at cell. In one aspect, the host cell is a plant cell. Accordingly, least one fatty acid, comprising adding to the endproduct an one embodiment of the invention is a method to produce a oil produced by a recombinant host cell that expresses at product containing at least one PUFA, comprising growing least one recombinant nucleic acid molecule comprising a a plant comprising Such a plant cell under conditions effec nucleic acid sequence encoding at least one biologically tive to produce the product. The host cell is a microbial cell active domain of a PUFA PKS system. The domain of a and in this case, one embodiment of the present invention is PUFA PKS system is encoded by any of the nucleic acid a method to produce a product containing at least one PUFA, sequences described above. In one aspect, the endproduct is comprising culturing a culture containing Such a microbial selected from the group consisting of a dietary Supplement, cell under conditions effective to produce the product. In one a food product, a pharmaceutical formulation, a humanized aspect, the PKS system catalyzes the direct production of animal milk, and an infant formula. A pharmaceutical for triglycerides. US 2008/0026440 A1 Jan. 31, 2008

0045. Yet another embodiment of the present invention endogenously (naturally) contains such a PKS system makes relates to a genetically modified microorganism comprising PUFAs using this system). The PUFAs referred to herein are a polyunsaturated fatty acid (PUFA) polyketide synthase preferably polyunsaturated fatty acids with a carbon chain (PKS) system, wherein the PKS catalyzes both iterative and length of at least 16 carbons, and more preferably at least 18 non-iterative enzymatic reactions. The PUFA PKS system carbons, and more preferably at least 20 carbons, and more comprises: (a) at least two enoyl ACP-reductase (ER) preferably 22 or more carbons, with at least 3 or more double domains; (b) at least six acyl carrier protein (ACP) domains; bonds, and preferably 4 or more, and more preferably 5 or (c) at least two B-ketoacyl-ACP synthase (KS) domains; (d) more, and even more preferably 6 or more double bonds, at least one acyltransferase (AT) domain; (e) at least one wherein all double bonds are in the cis configuration. It is an ketoreductase (KR) domain; (f) at least two Fab A-like object of the present invention to find or create via genetic B-hydroxyacyl-ACP dehydrase (DH) domains; (g) at least manipulation or manipulation of the endproduct, PKS sys one chain length factor (CLF) domain; and (h) at least one tems which produce polyunsaturated fatty acids of desired malonyl-CoA:ACP acyltransferase (MAT) domain. The chain length and with desired numbers of double bonds. genetic modification affects the activity of the PUFA PKS Examples of PUFAs include, but are not limited to, DHA system. In one aspect of this embodiment, the microorgan (docosahexaenoic acid (C22:6, ()-3)), DPA (docosapen ism is a eukaryotic microorganism. taenoic acid (C22:5, (D-6)), and EPA (eicosapentaenoic acid 0046 Yet another embodiment of the present invention (C20:5, co-3)). relates to a recombinant host cell which has been modified 0.052 Second, the PUFA PKS system described herein to express a non-bacterial polyunsaturated fatty acid (PUFA) incorporates both iterative and non-iterative reactions, polyketide synthase (PKS) system, wherein the non-bacte which distinguish the system from previously described rial PUFA PKS catalyzes both iterative and non-iterative PKS systems (e.g., type I, type II or modular). More enzymatic reactions. The non-bacterial PUFA PKS system particularly, the PUFA PKS system described herein con comprises: (a) at least one enoyl ACP-reductase (ER) tains domains that appear to function during each cycle as domain; (b) multiple acyl carrier protein (ACP) domains; (c) well as those which appear to function during only some of at least two B-ketoacyl-ACP synthase (KS) domains; (d) at the cycles. A key aspect of this may be related to the domains least one acyltransferase (AT) domain; (e) at least one showing homology to the bacterial Fab. A enzymes. For ketoreductase (KR) domain; (f) at least two Fab A-like example, the Fab A enzyme of E. coli has been shown to B-hydroxyacyl-ACP dehydrase (DH) domains; (g) at least possess two enzymatic activities. It possesses a dehydration one chain length factor (CLF) domain; and (h) at least one activity in which a water molecule (HO) is abstracted from malonyl-CoA:ACP acyltransferase (MAT) domain. a carbon chain containing a hydroxy group, leaving a trans double bond in that carbon chain. In addition, it has an BRIEF DESCRIPTION OF THE FIGURES isomerase activity in which the trans double bond is con verted to the cis configuration. This isomerization is accom 0047 FIG. 1 is a graphical representation of the domain plished in conjunction with a migration of the double bond structure of the Schizochytrium PUFA PKS system. position to adjacent carbons. In PKS (and FAS) systems, the 0.048 FIG. 2 shows a comparison of PKS domains from main carbon chain is extended in 2 carbon increments. One Schizochytrium and Shewanella. can therefore predict the number of extension reactions required to produce the PUFA products of these PKS sys 0049 FIG. 3 shows a comparison of PKS domains from tems. For example, to produce DHA (C22:6, all cis) requires Schizochytrium and a related PKS system from Nostoc 10 extension reactions. Since there are only 6 double bonds whose product is a long chain fatty acid that does not contain in the end product, it means that during some of the reaction any double bonds. cycles, a double bond is retained (as a cis isomer), and in others, the double bond is reduced prior to the next exten DETAILED DESCRIPTION OF THE S1O. INVENTION 0053 Before the discovery of a PUFA PKS system in 0050. The present invention generally relates to non marine bacteria (see U.S. Pat. No. 6,140.486), PKS systems bacterial derived polyunsaturated fatty acid (PUFA) were not known to possess this combination of iterative and polyketide synthase (PKS) systems, to genetically modified selective enzymatic reactions, and they were not thought of organisms comprising non-bacterial PUFA PKS systems, to as being able to produce carbon-carbon double bonds in the methods of making and using Such systems for the produc cis configuration. However, the PUFA PKS system tion of products of interest, including bioactive molecules, described by the present invention has the capacity to and to novel methods for identifying new eukaryotic micro introduce cis double bonds and the capacity to vary the organisms having such a PUFAPKS system. As used herein, reaction sequence in the cycle. a PUFA PKS system generally has the following identifying 0054 Therefore, the present inventors propose to use features: (1) it produces PUFAs as a natural product of the these features of the PUFA PKS system to produce a range system; and (2) it comprises several multifunctional proteins of bioactive molecules that could not be produced by the assembled into a complex that conducts both iterative pro previously described (Type II, Type I and modular) PKS cessing of the fatty acid chain as well non-iterative process systems. These bioactive molecules include, but are limited ing, including trans-cis isomerization and enoyl reduction to, polyunsaturated fatty acids (PUFAs), antibiotics or other reactions in selected cycles (See FIG. 1, for example). bioactive compounds, many of which will be discussed 0051 More specifically, first, a PUFA PKS system that below. For example, using the knowledge of the PUFA PKS forms the basis of this invention produces polyunsaturated gene structures described herein, any of a number of meth fatty acids (PUFAs) as products (i.e., an organism that ods can be used to alter the PUFA PKS genes, or combine US 2008/0026440 A1 Jan. 31, 2008 portions of these genes with other synthesis systems, includ invention, genetically modified organisms can be produced ing other PKS systems, such that new products are pro which incorporate non-bacterial PUFA PKS functional duced. The inherent ability of this particular type of system domains with bacteria PUFA PKS functional domains, as to do both iterative and selective reactions will enable this well as PKS functional domains or proteins from other PKS system to yield products that would not be found if similar systems (type I, type II, modular) or FAS systems. methods were applied to other types of PKS systems. 0058 Schizochytrium is a Thraustochytrid marine micro 0055. In one embodiment, a PUFA PKS system accord organism that accumulates large quantities of triacylglycer ing to the present invention comprises at least the following ols rich in DHA and docosapentaenoic acid (DPA; 22:5 biologically active domains: (a) at least two enoyl ACP (O-6); e.g., 30% DHA+DPA by dry weight (Barclay et al., J. reductase (ER) domains; (b) at least six acyl carrier protein Appl. Phycol. 6, 123 (1994)). In eukaryotes that synthesize (ACP) domains; (c) at least two f-ketoacyl-ACP synthase 20- and 22-carbon PUFAs by an elongation/desaturation (KS) domains; (d) at least one acyltransferase (AT) domain; pathway, the pools of 18-, 20- and 22-carbon intermediates (e) at least one ketoreductase (KR) domain; (f) at least two are relatively large so that in Vivo labeling experiments using FabA-like B-hydroxy acyl-ACP dehydrase (DH) domains: '''C-acetate reveal clear precursor-product kinetics for the (g) at least one chain length factor (CLF) domain; and (h) at predicted intermediates (Gellerman et al., Biochim. Biophys. least one malonyl-CoA:ACP acyltransferase (MAT) domain. Acta 573:23 (1979)). Furthermore, radiolabeled intermedi The functions of these domains are generally individually ates provided exogenously to Such organisms are converted known in the art and will be described in detail below with to the final PUFA products. The present inventors have regard to the PUFA PKS system of the present invention. shown that 1-''C-acetate was rapidly taken up by Schizochytrium cells and incorporated into fatty acids, but at 0056. In another embodiment, the PUFA PKS system the shortest labeling time (1 min), DHA contained 31% of comprises at least the following biologically active domains: the label recovered in fatty acids, and this percentage (a) at least one enoyl ACP-reductase (ER) domain; (b) remained essentially unchanged during the 10-15 min of multiple acyl carrier protein (ACP) domains (at least four, '''C-acetate incorporation and the subsequent 24 hours of and preferably at least five, and more preferably at least six, culture growth (See Example 3). Similarly, DPA represented and even more preferably seven, eight, nine, or more than 10% of the label throughout the experiment. There is no nine); (c) at least two B-keto acyl-ACP synthase (KS) evidence for a precursor-product relationship between 16- or domains; (d) at least one acyltransferase (AT) domain; (e) at 18-carbon fatty acids and the 22-carbon polyunsaturated least one ketoreductase (KR) domain; (f) at least two FabA fatty acids. These results are consistent with rapid synthesis like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at of DHA from '''C-acetate involving very small (possibly least one chain length factor (CLF) domain; and (h) at least enzyme-bound) pools of intermediates. A cell-free homoge one malonyl-CoA:ACP acyltransferase (MAT) domain. nate derived from Schizochytrium cultures incorporated Preferably, such a PUFA PKS system is a non-bacterial 1-''C-malonyl-CoA into DHA, DPA, and saturated fatty PUFA-PKS system. acids. The same biosynthetic activities were retained by a 0057. In one embodiment, a PUFA PKS system of the 100,000xg supernatant fraction but were not present in the present invention is a non-bacterial PUFA PKS system. In membrane pellet. Thus, DHA and DPA synthesis in other words, in one embodiment, the PUFA PKS system of Schizochytrium does not involve membrane-bound desatu the present invention is isolated from an organism that is not rases or fatty acid elongation enzymes like those described a bacteria, or is a homologue of or derived from a PUFA for other eukaryotes (Parker-Barnes et al., 2000, supra; PKS system from an organism that is not a bacteria, Such as Shanklin et al., 1998, supra). These fractionation data con a eukaryote oran archaebacterium. Eukaryotes are separated trast with those obtained from the Shewanella enzymes (See from prokaryotes based on the degree of differentiation of Metz et al., 2001, Supra) and may indicate use of a different the cells. The higher group with more differentiation is (soluble) acyl acceptor molecule. Such as CoA, by the called eukaryotic. The lower group with less differentiated Schizochytrium enzyme. cells is called prokaryotic. In general, prokaryotes do no 0059. In copending U.S. application Ser. No. 09/231,899, possess a nuclear membrane, do not exhibit mitosis during a cDNA library from Schizochytrium was constructed and cell division, have only one chromosome, their approximately 8,000 random clones (ESTs) were sequenced. contains 70S ribosomes, they do not possess any mitochon Within this dataset, only one moderately expressed gene dria, endoplasmic reticulum, , lysosomes or (0.3% of all sequences) was identified as a fatty acid golgi apparatus, their flagella (if present) consists of a single desaturase, although a second putative desaturase was rep fibril. In contrast eukaryotes have a nuclear membrane, they resented by a single clone (0.01%). By contrast, sequences do exhibit mitosis during cell division, they have many that exhibited homology to 8 of the 11 domains of the chromosomes, their cytoplasm contains 80S ribosomes, they Shewanella PKS genes shown in FIG. 2 were all identified do possess mitochondria, endoplasmic reticulum, chloro at frequencies of 0.2-0.5%. In U.S. application Ser. No. plasts (in ), lysosomes and golgi apparatus, and their 09/231,899, several cDNA clones showing homology to the flagella (if present) consists of many fibrils. In general, Shewanella PKS genes were sequenced, and various clones bacteria are prokaryotes, while algae, fungi, , were assembled into nucleic acid sequences representing and higher plants are eukaryotes. The PUFAPKS systems of two partial open reading frames and one complete open the marine bacteria (e.g., Shewanella and Vibrio marinus) reading frame. Nucleotides 390-4443 of the cDNA sequence are not the basis of the present invention, although the containing the first partial open reading frame described in present invention does contemplate the use of domains from U.S. application Ser. No. 09/231,899 (denoted therein as these bacterial PUFA PKS systems in conjunction with SEQID NO:69) match nucleotides 4677-8730 (plus the stop domains from the non-bacterial PUFA PKS systems of the codon) of the sequence denoted herein as OrfA (SEQ ID present invention. For example, according to the present NO:1). Nucleotides 1-4876 of the cDNA sequence contain US 2008/0026440 A1 Jan. 31, 2008

ing the second partial open reading frame described in U.S. (i.e., the ACP region, discussed below), which probably application Ser. No. 09/231,899 (denoted therein as SEQ ID created Some confusion in the assembly of that sequence NO:71) matches nucleotides 1311-6177 (plus the stop from various cDNA clones. codon) of the sequence denoted herein as OrfB (SEQ ID 0063 OrfA is a 8730 nucleotide sequence (not including NO:3). Nucleotides 145-4653 of the cDNA sequence con the stop codon) which encodes a 2910 amino acid sequence, taining the complete open reading frame described in U.S. represented herein as SEQID NO:2. Within OrfA are twelve application Ser. No. 09/231,899 (denoted therein as SEQ ID domains: (a) one B-keto acyl-ACP synthase (KS) domain; NO:76 and incorrectly designated as a partial open reading (b) one malonyl-CoA:ACP acyltransferase (MAT) domain; frame) match the entire sequence (plus the stop codon) of the (c) nine acyl carrier protein (ACP) domains; and (d) one sequence denoted herein as Orf (SEQ ID NO:5). ketoreductase (KR) domain. 0060) Further sequencing of cDNA and genomic clones 0064. The nucleotide sequence for OrfA has been depos by the present inventors allowed the identification of the ited with GenBank as Accession No. AF378327 (amino acid full-length genomic sequence of each of OrfA, OrfB and sequence Accession No. AAK728879). OrfA was compared OrfQ and the complete identification of the domains with with known sequences in a standard BLAST search (BLAST homology to those in Shewanella (see FIG. 2). It is noted 2.0 Basic BLAST homology search using blastp for amino that in Schizochytrium, the genomic DNA and cDNA are acid searches, blastin for nucleic acid searches, and blastX identical, due to the lack of introns in the organism genome, for nucleic acid searches and searches of the translated to the best of the present inventors knowledge. Therefore, amino acid sequence in all 6 open reading frames with reference to a nucleotide sequence from Schizochytrium can standard default parameters, wherein the query sequence is refer to genomic DNA or cDNA. Based on the comparison filtered for low complexity regions by default (described in of the Schizochytrium PKS domains to Shewanella, clearly, Altschul, S. F., Madden, T. L., Schaaffer, A. A., Zhang, J., the Schizochytrium genome encodes proteins that are highly Zhang, Z. Miller, W. & Lipman, D. J. (1997) “Gapped similar to the proteins in Shewanella that are capable of BLAST and PSI-BLAST: a new generation of protein data catalyzing EPA synthesis. The proteins in Schizochytrium base search programs. Nucleic Acids Res. 25:3389-3402, constitute a PUFA PKS system that catalyzes DHA and DPA incorporated herein by reference in its entirety)). At the synthesis. As discussed in detail herein, simple modification nucleic acid level, OrfA has no significant homology to any of the reaction scheme identified for Shewanella will allow known nucleotide sequence. At the amino acid level, the for DHA synthesis in Schizochytrium. The homology sequences with the greatest degree of homology to ORFA between the prokaryotic Shewanella and eukaryotic were: Nostoc sp. 7120 heterocyst glycolipid synthase Schizochytrium genes suggests that the PUFA PKS has (Accession No. NC 003272), which was 42% identical to undergone lateral gene transfer. ORFA over 1001 amino acid residues; and Moritella mari nus (Vibrio marinus) ORF8 (Accession No. AB025342), 0061 FIG. 1 is a graphical representation of the three which was 40% identical to ORFA over 993 amino acid open reading frames from the Schizochytrium PUFA PKS residues. system, and includes the domain structure of this PUFAPKS system. As described in Example 1 below, the domain 0065. The first domain in OrfA is a KS domain, also structure of each open reading frame is as follows: referred to herein as ORFA-KS. This domain is contained within the nucleotide sequence spanning from a starting Open Reading Frame A (OrfA): point of between about positions 1 and 40 of SEQID NO:1 (OrfA) to an ending point of between about positions 1428 0062) The complete nucleotide sequence for OrfA is and 1500 of SEQ ID NO:1. The nucleotide sequence con represented herein as SEQID NO:1. Nucleotides 4677-8730 taining the sequence encoding the ORFA-KS domain is of SEQID NO:1 correspond to nucleotides 390-4443 of the represented herein as SEQ ID NO:7 (positions 1-1500 of sequence denoted as SEQID NO:69 in U.S. application Ser. SEQID NO:1). The amino acid sequence containing the KS No. 09/231,899. Therefore, nucleotides 1-4676 of SEQ ID domain spans from a starting point of between about posi NO:1 represent additional sequence that was not disclosed in tions 1 and 14 of SEQID NO:2 (ORFA) to an ending point U.S. application Ser. No. 09/231,899. This novel region of of between about positions 476 and 500 of SEQ ID NO:2. SEQ ID NO: 1 encodes the following domains in OrfA: (1) The amino acid sequence containing the ORFA-KS domain the ORFA-KS domain; (2) the ORFA-MAT domain; and (3) is represented herein as SEQ ID NO:8 (positions 1-500 of at least a portion of the ACP domain region (e.g., at least SEQ ID NO:2). It is noted that the ORFA-KS domain ACP domains 1-4). It is noted that nucleotides 1-389 of SEQ contains an active site motif: DXAC* (*acyl binding site ID NO:69 in U.S. application Ser. No. 09/231,899 do not match with the 389 nucleotides that are upstream of position C2s). 4677 in SEQID NO:1 disclosed herein. Therefore, positions 0066. According to the present invention, a domain or 1-389 of SEQ ID NO:69 in U.S. Application Serial No. protein having 3-keto acyl-ACP synthase (KS) biological 09/231,899 appear to be incorrectly placed next to nucle activity (function) is characterized as the enzyme that carries otides 390-4443 of that sequence. Most of these first 389 out the initial step of the FAS (and PKS) elongation reaction nucleotides (about positions 60–389) are a match with an cycle. The acyl group destined for elongation is linked to a upstream portion of OrfA (SEQ ID NO: 1) of the present cysteine residue at the active site of the enzyme by a invention and therefore, it is believed that an error occurred thioester bond. In the multi-step reaction, the acyl-enzyme in the effort to prepare the contig of the cDNA constructs in undergoes condensation with malonyl-ACP to form -keto U.S. application Ser. No. 09/231,899. The region in which acyl-ACP, CO and free enzyme. The KS plays a key role in the alignment error occurred in U.S. application Ser. No. the elongation cycle and in many systems has been shown to 09/231,899 is within the region of highly repetitive sequence possess greater Substrate specificity than other enzymes of US 2008/0026440 A1 Jan. 31, 2008

the reaction cycle. For example, E. coli has three distinct KS sequence for each domain is not represented herein by an enzymes—each with its own particular role in the physiol individual sequence identifier. However, based on the infor ogy of the organism (Magnuson et al., Microbiol. Rev. 57. mation disclosed herein, one of skill in the art can readily 522 (1993)). The two KS domains of the PUFA-PKS sys determine the sequence containing each of the other eight tems could have distinct roles in the PUFA biosynthetic ACP domains (see discussion below). reaction sequence. 0072 All nine ACP domains together span a region of 0067. As a class of enzymes, KS's have been well OrfA of from about position 3283 to about position 6288 of characterized. The sequences of many verified KS genes are SEQID NO:1, which corresponds to amino acid positions of know, the active site motifs have been identified and the from about 1095 to about 2096 of SEQ ID NO:2. The crystal structures of several have been determined. Proteins nucleotide sequence for the entire ACP region containing all (or domains of proteins) can be readily identified as belong nine domains is represented herein as SEQ ID NO:16. The ing to the KS family of enzymes by homology to known KS region represented by SEQ ID NO:16 includes the linker Sequences. segments between individual ACP domains. The repeat 0068 The second domain in OrfA is a MAT domain, also interval for the nine domains is approximately every 330 referred to herein as ORFA-MAT. This domain is contained nucleotides of SEQ ID NO:16 (the actual number of amino within the nucleotide sequence spanning from a starting acids measured between adjacent active site serines ranges point of between about positions 1723 and 1798 of SEQ ID from 104 to 116 amino acids). Each of the nine ACP NO: 1 (OrfA) to an ending point of between about positions domains contains a pantetheline binding motif LGIDS* 2805 and 3000 of SEQ ID NO:1. The nucleotide sequence (represented herein by SEQ ID NO:14), wherein S* is the containing the sequence encoding the ORFA-MAT domain pantetheline binding site serine (S). The pantetheline binding is represented herein as SEQID NO:9 (positions 1723-3000 site serine (S) is located near the center of each ACP domain of SEQ ID NO:1). The amino acid sequence containing the sequence. At each end of the ACP domain region and MAT domain spans from a starting point of between about between each ACP domain is a region that is highly enriched positions 575 and 600 of SEQID NO:2 (ORFA) to an ending for proline (P) and alanine (A), which is believed to be a point of between about positions 935 and 1000 of SEQ ID linker region. For example, between ACP domains 1 and 2 NO:2. The amino acid sequence containing the ORFA-MAT is the sequence: APAPVKAAAPAAPVASAPAPA, repre domain is represented herein as SEQ ID NO:10 (positions sented herein as SEQID NO:15. The locations of the active 575-1000 of SEQID NO:2). It is noted that the ORFA-MAT site serine residues (i.e., the pantetheline binding site) for domain contains an active site motif: GHS*XG (acyl each of the nine ACP domains, with respect to the amino binding site So), represented herein as SEQ ID NO:11. acid sequence of SEQ ID NO:2, are as follows: ACP1 = 0069. According to the present invention, a domain or Sls; ACP2=See: ACP3=S77; ACP4=Sass, ACP5= protein having malonyl-CoA:ACP acyltransferase (MAT) So, ACP6=Szs: ACP7=Ss: ACP8=So; and ACP9= biological activity (function) is characterized as one that S2034. Given that the average size of an ACP domain is transfers the malonyl moiety from malonyl-CoA to ACP. In about 85 amino acids, excluding the linker, and about 110 addition to the active site motif (GXSXG), these enzymes amino acids including the linker, with the active site serine possess an extended Motif R and Q amino acids in key being approximately in the center of the domain, one of skill positions) that identifies them as MAT enzymes (in contrast in the art can readily determine the positions of each of the to the AT domain of Schizochytrium Orf B). In some PKS nine ACP domains in OrfA. systems (but not the PUFA PKS domain) MAT domains will 0073. According to the present invention, a domain or preferentially load methyl- or ethyl-malonate on to the ACP protein having acyl carrier protein (ACP) biological activity group (from the corresponding CoA ester), thereby intro (function) is characterized as being Small polypeptides (typi ducing branches into the linear carbon chain. MAT domains cally, 80 to 100 amino acids long), that function as carriers can be recognized by their homology to known MAT for growing fatty acyl chains via a thioester linkage to a sequences and by their extended motif structure. covalently bound co-factor of the protein. They occur as 0070 Domains 3-11 of OrfA are nine tandem ACP separate units or as domains within larger proteins. ACPs are domains, also referred to herein as ORFA-ACP (the first converted from inactive apo-forms to functional holo-forms domain in the sequence is ORFA-ACP1, the second domain by transfer of the phosphopantetheinyl moeity of CoA to a is ORFA-ACP2, the third domain is ORFA-ACP3, etc.). The highly conserved serine residue of the ACP. Acyl groups are first ACP domain, ORFA-ACP1, is contained within the attached to ACP by a thioester linkage at the free terminus nucleotide sequence spanning from about position 3343 to of the phosphopantetheinyl moiety. ACPs can be identified about position3600 of SEQID NO:1 (OrfA). The nucleotide by labeling with radioactive pantetheline and by sequence sequence containing the sequence encoding the ORFA homology to known ACPs. The presence of variations of the ACP1 domain is represented herein as SEQ ID NO:12 above mentioned motif (LGIDS) is also a signature of an (positions 3343-3600 of SEQ ID NO:1). The amino acid ACP. sequence containing the first ACP domain spans from about 0074 Domain 12 in OrfA is a KR domain, also referred position 1115 to about position 1200 of SEQID NO:2. The to herein as ORFA-KR. This domain is contained within the amino acid sequence containing the ORFA-ACP1 domain is nucleotide sequence spanning from a starting point of about represented herein as SEQ ID NO:13 (positions 1115-1200 position 6598 of SEQ ID NO:1 to an ending point of about of SEQID NO:2). It is noted that the ORFA-ACP1 domain position 8730 of SEQ ID NO:1. The nucleotide sequence contains an active site motif: LGIDS* (pantetheline binding containing the sequence encoding the ORFA-KR domain is motif Ss), represented herein by SEQ ID NO:14. represented herein as SEQ ID NO:17 (positions 6598-8730 0071. The nucleotide and amino acid sequences of all of SEQ ID NO:1). The amino acid sequence containing the nine ACP domains are highly conserved and therefore, the KR domain spans from a starting point of about position US 2008/0026440 A1 Jan. 31, 2008

2200 of SEQ ID NO:2 (ORFA) to an ending point of about Nostoc sp. 7120 hypothetical protein (Accession No. position 2910 of SEQ ID NO:2. The amino acid sequence NC 003272), which was 53% identical to ORFB over 430 containing the ORFA-KR domain is represented herein as amino acid residues. SEQ ID NO:18 (positions 2200-2910 of SEQ ID NO:2). Within the KR domain is a core region with homology to 0079. The first domain in OrfB is a KS domain, also short chain aldehyde-dehydrogenases (KR is a member of referred to herein as ORFB-KS. This domain is contained this family). This core region spans from about position within the nucleotide sequence spanning from a starting 7198 to about position 7500 of SEQ ID NO:1, which point of between about positions 1 and 43 of SEQID NO:3 corresponds to amino acid positions 2400-2500 of SEQ ID (OrfB) to an ending point of between about positions 1332 NO:2. and 1350 of SEQ ID NO:3. The nucleotide sequence con taining the sequence encoding the ORFB-KS domain is 0075 According to the present invention, a domain or represented herein as SEQ ID NO:19 (positions 1-1350 of protein having ketoreductase activity, also referred to as SEQID NO:3). The amino acid sequence containing the KS 3-ketoacyl-ACP reductase (KR) biological activity (func domain spans from a starting point of between about posi tion), is characterized as one that catalyzes the pyridine tions 1 and 15 of SEQID NO:4 (ORFB) to an ending point nucleotide-dependent reduction of 3-keto acyl forms of of between about positions 444 and 450 of SEQ ID NO:4. ACP. It is the first reductive step in the de novo fatty acid The amino acid sequence containing the ORFB-KS domain biosynthesis elongation cycle and a reaction often performed is represented herein as SEQID NO:20 (positions 1-450 of in polyketide biosynthesis. Significant sequence similarity is SEQ ID NO:4). It is noted that the ORFB-KS domain observed with one family of enoyl ACP reductases (ER), the contains an active site motif: DXAC* (*acyl binding site other reductase of FAS (but not the ER family present in the Co.). KS biological activity and methods of identifying PUFA PKS system), and the short-chain alcohol dehydro proteins or domains having Such activity is described above. genase family. Pfam analysis of the PUFA PKS region indicated above reveals the homology to the short-chain 0080. The second domain in OrfB is a CLF domain, also alcohol dehydrogenase family in the core region. Blast referred to herein as ORFB-CLF. This domain is contained analysis of the same region reveals matches in the core area within the nucleotide sequence spanning from a starting to known KR enzymes as well as an extended region of point of between about positions 1378 and 1402 of SEQ ID homology to domains from the other characterized PUFA NO:3 (OrfB) to an ending point of between about positions PKS systems. 2682 and 2700 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-CLF domain is Open Reading Frame B (OrfB): represented herein as SEQ ID NO:21 (positions 1378-2700 0.076 The complete nucleotide sequence for OrfB is of SEQ ID NO:3). The amino acid sequence containing the represented herein as SEQID NO:3. Nucleotides 1311-6177 CLF domain spans from a starting point of between about of SEQ ID NO:3 correspond to nucleotides 1-4867 of the positions 460 and 468 of SEQID NO:4 (ORFB) to an ending sequence denoted as SEQID NO:71 in U.S. application Ser. point of between about positions 894 and 900 of SEQ ID No. 09/231,899 (The cDNA sequence in U.S. application NO:4. The amino acid sequence containing the ORFB-CLF Ser. No. 09/231,899 contains about 345 additional nucle domain is represented herein as SEQ ID NO:22 (positions otides beyond the stop codon, including a polyA tail). 460-900 of SEQID NO:4). It is noted that the ORFB-CLF Therefore, nucleotides 1-1310 of SEQ ID NO:1 represent domain contains a KS active site motif without the acyl additional sequence that was not disclosed in U.S. applica binding cysteine. tion Ser. No. 09/231,899. This novel region of SEQID NO:3 0081. According to the present invention, a domain or contains most of the KS domain encoded by OrfB. protein is referred to as a chain length factor (CLF) based on 0.077 OrfB is a 6177 nucleotide sequence (not including the following rationale. The CLF was originally described as the stop codon) which encodes a 2059 amino acid sequence, characteristic of Type II (dissociated enzymes) PKS systems represented herein as SEQ ID NO:4. Within OrfB are four and was hypothesized to play a role in determining the domains: (a) one B-keto acyl-ACP synthase (KS) domain; number of elongation cycles, and hence the chain length, of (b) one chain length factor (CLF) domain; (c) one acyl the end product. CLF amino acid sequences show homology transferase (AT) domain; and, (d) one enoyl ACP-reductase to KS domains (and are thought to form heterodimers with (ER) domain. a KS protein), but they lack the active site cysteine. CLF's role in PKS systems is currently controversial. New evi 0078. The nucleotide sequence for OrfB has been depos dence (C. Bisang et al., Nature 401, 502 (1999)) suggests a ited with GenBank as Accession No. AF378328 (amino acid role in priming (providing the initial acyl group to be sequence Accession No. AAK728880). OrfB was compared elongated) the PKS systems. In this role the CLF domain is with known sequences in a standard BLAST search as thought to decarboxylate malonate (as malonyl-ACP), thus described above. At the nucleic acid level, OrfB has no forming an acetate group that can be transferred to the KS significant homology to any known nucleotide sequence. At active site. This acetate therefore acts as the priming the amino acid level, the sequences with the greatest degree molecule that can undergo the initial elongation (condensa of homology to ORFB were: Shewanella sp. hypothetical tion) reaction. Homologues of the Type II CLF have been protein (Accession No. U73935), which was 53% identical identified as loading domains in some modular PKS sys to ORFB over 458 amino acid residues; Moritella marinus tems. A domain with the sequence features of the CLF is (Vibrio marinus) ORF11 (Accession No. AB025342), which found in all currently identified PUFA PKS systems and in was 53% identical to ORFB over 460 amino acid residues; Photobacterium profindum omega-3 polyunsaturated fatty each case is found as part of a multidomain protein. acid synthase Pfal) (Accession No. AF409 100), which was 0082 The third domain in OrfB is an AT domain, also 52% identical to ORFB over 457 amino acid residues; and referred to herein as ORFB-AT. This domain is contained US 2008/0026440 A1 Jan. 31, 2008 within the nucleotide sequence spanning from a starting Schizochytrium ER domain. The Schizochytrium PUFA PKS point of between about positions 2701 and 3598 of SEQ ID system contains two ER domains (one on OrfB and one on NO:3 (OrfB) to an ending point of between about positions Orf(). 3975 and 4200 of SEQ ID NO:3. The nucleotide sequence containing the sequence encoding the ORFB-AT domain is Open Reading Frame C (Orf(r): represented herein as SEQ ID NO:23 (positions 2701-4200 0086) The complete nucleotide sequence for Orfc is of SEQ ID NO:3). The amino acid sequence containing the represented herein as SEQID NO:5. Nucleotides 1-4509 of AT domain spans from a starting point of between about SEQID NO:5 (i.e., the entire open reading frame sequence, positions 901 and 1200 of SEQ ID NO:4 (ORFB) to an not including the stop codon) correspond to nucleotides ending point of between about positions 1325 and 1400 of 145-4653 of the sequence denoted as SEQID NO:76 in U.S. SEQ ID NO:4. The amino acid sequence containing the application Ser. No. 09/231,899 (The cDNA sequence in ORFB-AT domain is represented herein as SEQ ID NO:24 U.S. application Ser. No. 09/231,899 contains about 144 (positions 901-1400 of SEQ ID NO:4). It is noted that the nucleotides upstream of the start codon for Orfor and about ORFB-AT domain contains an active site motif of GXS*XG 110 nucleotides beyond the stop codon, including a polyA (acyl binding site Sao) that is characteristic of acyltrans tail). Orf is a 4509 nucleotide sequence (not including the ferse (AT) proteins. stop codon) which encodes a 1503 amino acid sequence, 0083) An “acyltransferase” or “AT” refers to a general represented herein as SEQ ID NO:6. Within Orf are three class of enzymes that can carry out a number of distinct acyl domains: (a) two FabA-like B-hydroxyacyl-ACP dehydrase transfer reactions. The Schizochytrium domain shows good (DH) domains; and (b) one enoyl ACP-reductase (ER) homology to a domain present in all of the other PUFA PKS domain. systems currently examined and very weak homology to 0087. The nucleotide sequence for Orfc has been depos Some acyltransferases whose specific functions have been ited with GenBank as Accession No. AF3783.29 (amino acid identified (e.g. to malonyl-CoA:ACP acyltransferase, MAT). sequence Accession No. AAK728881). Orf was compared In spite of the weak homology to MAT, this AT domain is not with known sequences in a standard BLAST search as believed to function as a MAT because it does not possess described above. At the nucleic acid level, Orf has no an extended motif structure characteristic of Such enzymes significant homology to any known nucleotide sequence. At (see MAT domain description, above). For the purposes of the amino acid level (Blastp), the sequences with the greatest this disclosure, the functions of the AT domain in a PUFA degree of homology to ORFC were: Moritella marinus PKS system include, but are not limited to: transfer of the (Vibrio marinus) ORF11 (Accession No. ABO25342), fatty acyl group from the ORFAACP domain(s) to water (i.e. which is 45% identical to ORFC over 514 amino acid a thioesterase—releasing the fatty acyl group as a free fatty residues. Shewanella sp. hypothetical protein 8 (Accession acid), transfer of a fatty acyl group to an acceptor Such as No. U73935), which is 49% identical to ORFC over 447 CoA, transfer of the acyl group among the various ACP amino acid residues, Nostoc sp. hypothetical protein (Acces domains, or transfer of the fatty acyl group to a lipophilic sion No. NC 003272), which is 49% identical to ORFC acceptor molecule (e.g. to lysophosphadic acid). over 430 amino acid residues, and Shewanella sp. hypo thetical protein 7 (Accession No. U73935), which is 37% 0084. The fourth domain in OrfB is an ER domain, also identical to ORFC over 930 amino acid residues. referred to herein as ORFB-ER. This domain is contained within the nucleotide sequence spanning from a starting 0088. The first domain in Orf is a DH domain, also point of about position 4648 of SEQ ID NO:3 (OrfB) to an referred to herein as ORFC-DH1. This is one of two DH ending point of about position 6177 of SEQ ID NO:3. The domains in Orfc, and therefore is designated DH1. This nucleotide sequence containing the sequence encoding the domain is contained within the nucleotide sequence Span ORFB-ER domain is represented herein as SEQ ID NO:25 ning from a starting point of between about positions 1 and (positions 4648-6177 of SEQ ID NO:3). The amino acid 778 of SEQ ID NO:5 (Orf) to an ending point of between sequence containing the ER domain spans from a starting about positions 1233 and 1350 of SEQ ID NO:5. The point of about position 1550 of SEQID NO:4 (ORFB) to an nucleotide sequence containing the sequence encoding the ending point of about position 2059 of SEQ ID NO:4. The ORFC-DH1 domain is represented herein as SEQID NO:27 amino acid sequence containing the ORFB-ER domain is (positions 1-1350 of SEQ ID NO:5). The amino acid represented herein as SEQ ID NO:26 (positions 1550-2059 sequence containing the DH1 domain spans from a starting of SEQ ID NO:4). point of between about positions 1 and 260 of SEQID NO:6 0085. According to the present invention, this domain has (ORFC) to an ending point of between about positions 411 enoyl reductase (ER) biological activity. The ER enzyme and 450 of SEQ ID NO:6. The amino acid sequence con reduces the trans-double bond (introduced by the DH activ taining the ORFC-DH1 domain is represented herein as SEQ ity) in the fatty acyl-ACP, resulting in fully saturating those ID NO:28 (positions 1-450 of SEQ ID NO:6). carbons. The ER domain in the PUFA-PKS shows homology 0089. The characteristics of both the DH domains (see to a newly characterized family of ER enzymes (Heath et al., below for DH 2) in the PUFA PKS systems have been Nature 406, 145 (2000)). Heath and Rock identified this new described in the preceding sections. This class of enzyme class of ER enzymes by cloning a gene of interest from removes HOH from a B-keto acyl-ACP and leaves a trans Streptococcus pneumoniae, purifying a protein expressed double bond in the carbon chain. The DH domains of the from that gene, and showing that it had ER activity in an in PUFA PKS systems show homology to bacterial DH vitro assay. The sequence of the Schizochytrium ER domain enzymes associated with their FAS systems (rather than to of OrfB shows homology to the S. pneumoniae ER protein. the DH domains of other PKS systems). A subset of bacterial All of the PUFAPKS systems currently examined contain at DH's, the Fab A-like DH's, possesses cis-trans isomerase least one domain with very high sequence homology to the activity (Heath et al., J. Biol. Chem., 271, 27795 (1996)). It US 2008/0026440 A1 Jan. 31, 2008

is the homologies to the Fab A-like DH's that indicate that complementary to the nucleic acid sequence of (a), (b), (c), one or both of the DH domains is responsible for insertion or (d). In a further embodiment, nucleic acid sequences of the cis double bonds in the PUFA PKS products. including a sequence encoding the active site domains or 0090 The second domain in Orf is a DH domain, also other functional motifs described above for several of the referred to herein as ORFC-DH2. This is the Second of two PUFA PKS domains are encompassed by the invention. DH domains in Orf , and therefore is designated DH2. This domain is contained within the nucleotide sequence Span 0093. According to the present invention, an amino acid ning from a starting point of between about positions 1351 sequence that has a biological activity of at least one domain and 2437 of SEQ ID NO:5 (Orf() to an ending point of of a PUFA PKS system is an amino acid sequence that has between about positions 2607 and 2850 of SEQ ID NO:5. the biological activity of at least one domain of the PUFA The nucleotide sequence containing the sequence encoding PKS system described in detail herein, as exemplified by the the ORFC-DH2 domain is represented herein as SEQ ID Schizochytrium PUFA PKS system. The biological activities NO:29 (positions 1351-2850 of SEQ ID NO:5). The amino of the various domains within the Schizochytrium PUFA acid sequence containing the DH2 domain spans from a PKS system have been described in detail above. Therefore, starting point of between about positions 451 and 813 of an isolated nucleic acid molecule of the present invention SEQ ID NO:6 (ORFC) to an ending point of between about can encode the translation product of any PUFA PKS open positions 869 and 950 of SEQ ID NO:6. The amino acid reading frame, PUFA PKS domain, biologically active frag sequence containing the ORFC-DH2 domain is represented ment thereof, or any homologue of a naturally occurring herein as SEQ ID NO:30 (positions 451-950 of SEQ ID PUFA PKS open reading frame or domain which has bio NO:6). DH biological activity has been described above. logical activity. A homologue of given protein or domain is 0091. The third domain in Orf is an ER domain, also a protein or polypeptide that has an amino acid sequence referred to herein as ORFC-ER. This domain is contained which differs from the naturally occurring reference amino within the nucleotide sequence spanning from a starting acid sequence (i.e., of the reference protein or domain) in point of about position 2998 of SEQ ID NO:5 (Orf) to an that at least one or a few, but not limited to one or a few, ending point of about position 4509 of SEQ ID NO:5. The amino acids have been deleted (e.g., a truncated version of nucleotide sequence containing the sequence encoding the the protein, Such as a peptide or fragment), inserted, ORFC-ER domain is represented herein as SEQ ID NO:31 inverted. Substituted and/or derivatized (e.g., by glycosyla (positions 2998-4509 of SEQ ID NO:5). The amino acid tion, phosphorylation, acetylation, myristoylation, prenyla sequence containing the ER domain spans from a starting tion, palmitation, amidation and/or addition of glyco point of about position 1000 of SEQID NO:6 (ORFC) to an Sylphosphatidyl inositol). Preferred homologues of a PUFA ending point of about position 1502 of SEQ ID NO:6. The PKS protein or domain are described in detail below. It is amino acid sequence containing the ORFC-ER domain is noted that homologues can include synthetically produced represented herein as SEQ ID NO:32 (positions 1000-1502 homologues, naturally occurring allelic variants of a given of SEQID NO:6). ER biological activity has been described protein or domain, or homologous sequences from organ above. isms other than the organism from which the reference sequence was derived. 0092. One embodiment of the present invention relates to an isolated nucleic acid molecule comprising a nucleic acid 0094. In general, the biological activity or biological sequence from a non-bacterial PUFA PKS system, a homo action of a protein or domain refers to any function(s) logue thereof, a fragment thereof, and/or a nucleic acid exhibited or performed by the protein or domain that is sequence that is complementary to any of Such nucleic acid ascribed to the naturally occurring form of the protein or sequences. In one aspect, the present invention relates to an domain as measured or observed in Vivo (i.e., in the natural isolated nucleic acid molecule comprising a nucleic acid physiological environment of the protein) or in vitro (i.e., sequence selected from the group consisting of: (a) a nucleic under laboratory conditions). Biological activities of PUFA acid sequence encoding an amino acid sequence selected PKS systems and the individual proteins/domains that make from the group consisting of: SEQID NO:2, SEQID NO:4, up a PUFA PKS system have been described in detail SEQ ID NO: 6, and biologically active fragments thereof, elsewhere herein. Modifications of a protein or domain, such (b) a nucleic acid sequence encoding an amino acid as in a homologue or mimetic (discussed below), may result sequence selected from the group consisting of SEQ ID in proteins or domains having the same biological activity as NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, the naturally occurring protein or domain, or in proteins or SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID domains having decreased or increased biological activity as NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, compared to the naturally occurring protein or domain. and biologically active fragments thereof (c) a nucleic acid Modifications which result in a decrease in expression or a sequence encoding an amino acid sequence that is at least decrease in the activity of the protein or domain, can be about 60% identical to at least 500 consecutive amino acids referred to as inactivation (complete or partial), down of said amino acid sequence of (a), wherein said amino acid regulation, or decreased action of a protein or domain. sequence has a biological activity of at least one domain of Similarly, modifications which result in an increase in a polyunsaturated fatty acid (PUFA) polyketide synthase expression or an increase in the activity of the protein or (PKS) system; (d) a nucleic acid sequence encoding an domain, can be referred to as amplification, overproduction, amino acid sequence that is at least about 60% identical to activation, enhancement, up-regulation or increased action said amino acid sequence of (b), wherein said amino acid of a protein or domain. A functional domain of a PUFAPKS sequence has a biological activity of at least one domain of system is a domain (i.e., a domain can be a portion of a a polyunsaturated fatty acid (PUFA) polyketide synthase protein) that is capable of performing a biological function (PKS) system; or (e) a nucleic acid sequence that is fully (i.e., has biological activity). US 2008/0026440 A1 Jan. 31, 2008

0.095. In accordance with the present invention, an iso hybrid (e.g., under moderate, high or very high Stringency lated nucleic acid molecule is a nucleic acid molecule that conditions) with the complementary sequence of a nucleic has been removed from its natural milieu (i.e., that has been acid molecule useful in the present invention, or of a size Subject to human manipulation), its natural milieu being the Sufficient to encode an amino acid sequence having a bio genome or chromosome in which the nucleic acid molecule logical activity of at least one domain of a PUFA PKS is found in nature. As such, "isolated does not necessarily system according to the present invention. As such, the size reflect the extent to which the nucleic acid molecule has of the nucleic acid molecule encoding such a protein can be been purified, but indicates that the molecule does not dependent on nucleic acid composition and percent homol include an entire genome or an entire chromosome in which ogy or identity between the nucleic acid molecule and the nucleic acid molecule is found in nature. An isolated complementary sequence as well as upon hybridization nucleic acid molecule can include a gene. An isolated conditions per se (e.g., temperature, salt concentration, and nucleic acid molecule that includes a gene is not a fragment formamide concentration). The minimal size of a nucleic of a chromosome that includes Such gene, but rather includes acid molecule that is used as an oligonucleotide primer or as the coding region and regulatory regions associated with the a probe is typically at least about 12 to about 15 nucleotides gene, but no additional genes naturally found on the same in length if the nucleic acid molecules are GC-rich and at chromosome. An isolated nucleic acid molecule can also least about 15 to about 18 bases in length if they are AT-rich. include a specified nucleic acid sequence flanked by (i.e., at There is no limit, other than a practical limit, on the maximal the 5' and/or the 3' end of the sequence) additional nucleic size of a nucleic acid molecule of the present invention, in acids that do not normally flank the specified nucleic acid that the nucleic acid molecule can include a sequence sequence in nature (i.e., heterologous sequences). Isolated Sufficient to encode a biologically active fragment of a nucleic acid molecule can include DNA, RNA (e.g., domain of a PUFA PKS system, an entire domain of a PUFA mRNA), or derivatives of either DNA or RNA (e.g., cDNA). PKS system, several domains within an open reading frame Although the phrase “nucleic acid molecule' primarily (Orf) of a PUFA PKS system, an entire Orf of a PUFA PKS refers to the physical nucleic acid molecule and the phrase system, or more than one Orf of a PUFA PKS system. “nucleic acid sequence' primarily refers to the sequence of nucleotides on the nucleic acid molecule, the two phrases 0099. In one embodiment of the present invention, an can be used interchangeably, especially with respect to a isolated nucleic acid molecule comprises or consists essen nucleic acid molecule, or a nucleic acid sequence, being tially of a nucleic acid sequence selected from the group of capable of encoding a protein or domain of a protein. SEQID NO:2, SEQID NO:4, SEQID NO:6, SEQID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQID NO:18, SEQ ID 0.096 Preferably, an isolated nucleic acid molecule of the NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, present invention is produced using recombinant DNA tech SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, or bio nology (e.g., polymerase chain reaction (PCR) amplifica logically active fragments thereof. In one aspect, the nucleic tion, cloning) or chemical synthesis. Isolated nucleic acid acid sequence is selected from the group of: SEQID NO:1, molecules include natural nucleic acid molecules and homo SEQID NO:3, SEQID NO:5, SEQID NO:7, SEQID NO:9, logues thereof, including, but not limited to, natural allelic SEQ ID NO:12, SEQ ID NO:17, SEQ ID NO:19, SEQ ID variants and modified nucleic acid molecules in which NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, nucleotides have been inserted, deleted, substituted, and/or SEQ ID NO:29, and SEQID NO:31. In one embodiment of inverted in Such a manner that such modifications provide the present invention, any of the above-described PUFA the desired effect on PUFA PKS system biological activity PKS amino acid sequences, as well as homologues of Such as described herein. Protein homologues (e.g., proteins sequences, can be produced with from at least one, and up encoded by nucleic acid homologues) have been discussed to about 20, additional heterologous amino acids flanking in detail above. each of the C- and/or N-terminal end of the given amino acid sequence. The resulting protein or polypeptide can be 0097. A nucleic acid molecule homologue can be pro referred to as "consisting essentially of a given amino acid duced using a number of methods known to those skilled in sequence. According to the present invention, the heterolo the art (see, for example, Sambrook et al., Molecular Clon gous amino acids are a sequence of amino acids that are not ing: A Laboratory Manual, Cold Spring Harbor Labs Press, naturally found (i.e., not found in nature, in vivo) flanking 1989). For example, nucleic acid molecules can be modified the given amino acid sequence or which would not be using a variety of techniques including, but not limited to, encoded by the nucleotides that flank the naturally occurring classic mutagenesis techniques and recombinant DNA tech nucleic acid sequence encoding the given amino acid niques. Such as site-directed mutagenesis, chemical treat sequence as it occurs in the gene, if Such nucleotides in the ment of a nucleic acid molecule to induce mutations, restric naturally occurring sequence were translated using standard tion enzyme cleavage of a nucleic acid fragment, ligation of codon usage for the organism from which the given amino nucleic acid fragments, PCR amplification and/or mutagen acid sequence is derived. Similarly, the phrase “consisting esis of selected regions of a nucleic acid sequence, synthesis essentially of, when used with reference to a nucleic acid of oligonucleotide mixtures and ligation of mixture groups sequence herein, refers to a nucleic acid sequence encoding to “build a mixture of nucleic acid molecules and combi a given amino acid sequence that can be flanked by from at nations thereof. Nucleic acid molecule homologues can be least one, and up to as many as about 60, additional selected from a mixture of modified nucleic acids by screen heterologous nucleotides at each of the 5' and/or the 3' end ing for the function of the protein encoded by the nucleic of the nucleic acid sequence encoding the given amino acid acid and/or by hybridization with a wild-type gene. sequence. The heterologous nucleotides are not naturally 0098. The minimum size of a nucleic acid molecule of found (i.e., not found in nature, in vivo) flanking the nucleic the present invention is a size sufficient to form a probe or acid sequence encoding the given amino acid sequence as it oligonucleotide primer that is capable of forming a stable occurs in the natural gene. US 2008/0026440 A1 Jan. 31, 2008

0100. The present invention also includes an isolated preferably at least about 95% identical, and more preferably nucleic acid molecule comprising a nucleic acid sequence at least about 96% identical, and more preferably at least encoding an amino acid sequence having a biological activ about 97% identical, and more preferably at least about 98% ity of at least one domain of a PUFA PKS system. In one identical, and more preferably at least about 99% identical aspect, Such a nucleic acid sequence encodes a homologue to an amino acid sequence chosen from: SEQID NO:2, SEQ of any of the Schizochytrium PUFA PKS ORFs or domains, ID NO:4, or SEQ ID NO:6, over any of the consecutive including: SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, amino acid lengths described in the paragraph above, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID wherein the amino acid sequence has a biological activity of NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, at least one domain of a PUFA PKS system. SEQID NO:26, SEQID NO:28, SEQID NO:30, or SEQID 0103) In one aspect of the invention, a homologue of a NO:32, wherein the homologue has a biological activity of Schizochytrium PUFA PKS protein or domain encompassed at least one domain of a PUFA PKS system as described by the present invention comprises an amino acid sequence previously herein. that is at least about 60% identical to an amino acid sequence 0101. In one aspect of the invention, a homologue of a chosen from: SEQ ID NO:8, SEQ ID NO:10, SEQ ID Schizochytrium PUFA PKS protein or domain encompassed NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, by the present invention comprises an amino acid sequence SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID that is at least about 60% identical to at least 500 consecutive NO:30, or SEQ ID NO:32, wherein said amino acid amino acids of an amino acid sequence chosen from: SEQ sequence has a biological activity of at least one domain of ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; wherein said a PUFA PKS system. In a further aspect, the amino acid amino acid sequence has a biological activity of at least one sequence of the homologue is at least about 65% identical, domain of a PUFA PKS system. In a further aspect, the and more preferably at least about 70% identical, and more amino acid sequence of the homologue is at least about 60% preferably at least about 75% identical, and more preferably identical to at least about 600 consecutive amino acids, and at least about 80% identical, and more preferably at least more preferably to at least about 700 consecutive amino about 85% identical, and more preferably at least about 90% acids, and more preferably to at least about 800 consecutive identical, and more preferably at least about 95% identical, amino acids, and more preferably to at least about 900 and more preferably at least about 96% identical, and more consecutive amino acids, and more preferably to at least preferably at least about 97% identical, and more preferably about 1000 consecutive amino acids, and more preferably to at least about 98% identical, and more preferably at least at least about 1100 consecutive amino acids, and more about 99% identical to an amino acid sequence chosen from: preferably to at least about 1200 consecutive amino acids, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID and more preferably to at least about 1300 consecutive NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, amino acids, and more preferably to at least about 1400 SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID consecutive amino acids, and more preferably to at least NO:32, wherein the amino acid sequence has a biological about 1500 consecutive amino acids of any of SEQ ID activity of at least one domain of a PUFA PKS system. NO:2, SEQID NO.4 and SEQID NO:6, or to the full length of SEQ ID NO:6. In a further aspect, the amino acid 0104. According to the present invention, the term “con sequence of the homologue is at least about 60% identical to tiguous” or “consecutive', with regard to nucleic acid or at least about 1600 consecutive amino acids, and more amino acid sequences described herein, means to be con preferably to at least about 1700 consecutive amino acids, nected in an unbroken sequence. For example, for a first and more preferably to at least about 1800 consecutive sequence to comprise 30 contiguous (or consecutive) amino amino acids, and more preferably to at least about 1900 acids of a second sequence, means that the first sequence consecutive amino acids, and more preferably to at least includes an unbroken sequence of 30 amino acid residues about 2000 consecutive amino acids of any of SEQID NO:2 that is 100% identical to an unbroken sequence of 30 amino or SEQ ID NO:4, or to the full length of SEQ ID NO:4. In acid residues in the second sequence. Similarly, for a first a further aspect, the amino acid sequence of the homologue sequence to have “100% identity” with a second sequence is at least about 60% identical to at least about 2100 means that the first sequence exactly matches the second consecutive amino acids, and more preferably to at least sequence with no gaps between nucleotides or amino acids. about 2200 consecutive amino acids, and more preferably to 0105. As used herein, unless otherwise specified, refer at least about 2300 consecutive amino acids, and more ence to a percent (%) identity refers to an evaluation of preferably to at least about 2400 consecutive amino acids, homology which is performed using: (1) a BLAST 2.0 Basic and more preferably to at least about 2500 consecutive BLAST homology search using blastp for amino acid amino acids, and more preferably to at least about 2600 searches, blastin for nucleic acid searches, and blastX for consecutive amino acids, and more preferably to at least nucleic acid searches and searches of translated amino acids about 2700 consecutive amino acids, and more preferably to in all 6 open reading frames, all with standard default at least about 2800 consecutive amino acids, and even more parameters, wherein the query sequence is filtered for low preferably, to the full length of SEQ ID NO:2. complexity regions by default (described in Altschul, S. F., 0102) In another aspect, a homologue of a Madden, T. L., Schaaffer, A. A., Zhang, J., Zhang, Z. Miller, Schizochytrium PUFA PKS protein or domain encompassed W. & Lipman, D. J. (1997) “Gapped BLAST and PSI by the present invention comprises an amino acid sequence BLAST: a new generation of protein database search pro that is at least about 65% identical, and more preferably at grams. Nucleic Acids Res. 25:3389-3402, incorporated least about 70% identical, and more preferably at least about herein by reference in its entirety); (2) a BLAST 2 alignment 75% identical, and more preferably at least about 80% (using the parameters described below); (3) and/or PSI identical, and more preferably at least about 85% identical, BLAST with the standard default parameters (Position and more preferably at least about 90% identical, and more Specific Iterated BLAST). It is noted that due to some US 2008/0026440 A1 Jan. 31, 2008

differences in the standard parameters between BLAST 2.0 NO:26, SEQID NO:28, SEQID NO:30, or SEQID NO:32. Basic BLAST and BLAST 2, two specific sequences might Methods to deduce a complementary sequence are known to be recognized as having significant homology using the those skilled in the art. It should be noted that since amino BLAST 2 program, whereas a search performed in BLAST acid sequencing and nucleic acid sequencing technologies 2.0 Basic BLAST using one of the sequences as the query are not entirely error-free, the sequences presented herein, at sequence may not identify the second sequence in the top best, represent apparent sequences of PUFA PKS domains matches. In addition, PSI-BLAST provides an automated, and proteins of the present invention. easy-to-use version of a “profile' search, which is a sensitive 0114. As used herein, hybridization conditions refer to way to look for sequence homologues. The program first standard hybridization conditions under which nucleic acid performs a gapped BLAST database search. The PSI molecules are used to identify similar nucleic acid mol BLAST program uses the information from any significant ecules. Such standard conditions are disclosed, for example, alignments returned to construct a position-specific score in Sambrook et al., Molecular Cloning: A Laboratory matrix, which replaces the query sequence for the next round Manual, Cold Spring Harbor Labs Press, 1989. Sambrook et of database searching. Therefore, it is to be understood that al., ibid., is incorporated by reference herein in its entirety percent identity can be determined by using any one of these (see specifically, pages 9.31-9.62). In addition, formulae to programs. calculate the appropriate hybridization and wash conditions 0106 Two specific sequences can be aligned to one to achieve hybridization permitting varying degrees of mis another using BLAST 2 sequence as described in Tatusova match of nucleotides are disclosed, for example, in and Madden, (1999), “Blast 2 sequences—a new tool for Meinkoth et al., 1984, Anal Biochem. 138, 267-284; comparing protein and nucleotide sequences. FEMS Micro Meinkoth et al., ibid., is incorporated by reference herein in biol Lett. 174:247-250, incorporated herein by reference in its entirety. its entirety. BLAST 2 sequence alignment is performed in 0115 More particularly, moderate stringency hybridiza blastp or blastn using the BLAST 2.0 algorithm to perform tion and washing conditions, as referred to herein, refer to a Gapped BLAST search (BLAST 2.0) between the two conditions which permit isolation of nucleic acid molecules sequences allowing for the introduction of gaps (deletions having at least about 70% nucleic acid sequence identity and insertions) in the resulting alignment. For purposes of with the nucleic acid molecule being used to probe in the clarity herein, a BLAST 2 sequence alignment is performed hybridization reaction (i.e., conditions permitting about 30% using the standard default parameters as follows. or less mismatch of nucleotides). High Stringency hybrid ization and washing conditions, as referred to herein, refer to For blastin, using 0 BLOSUM62 matrix: conditions which permit isolation of nucleic acid molecules 0107 Reward for match=1 having at least about 80% nucleic acid sequence identity with the nucleic acid molecule being used to probe in the 0108 Penalty for mismatch=-2 hybridization reaction (i.e., conditions permitting about 20% 0109 Open gap (5) and extension gap (2) penalties or less mismatch of nucleotides). Very high Stringency hybridization and washing conditions, as referred to herein, 0110 gap x dropoff (50) expect (10) word size (11) filter refer to conditions which permit isolation of nucleic acid (on) molecules having at least about 90% nucleic acid sequence For blastp, using 0 BLOSUM62 matrix: identity with the nucleic acid molecule being used to probe in the hybridization reaction (i.e., conditions permitting 0111 Open gap (11) and extension gap (1) penalties about 10% or less mismatch of nucleotides). As discussed above, one of skill in the art can use the formulae in 0112 gap X dropoff (50) expect (10) word size (3) filter Meinkoth et al., ibid. to calculate the appropriate hybridiza (on). tion and wash conditions to achieve these particular levels of 0113. In another embodiment of the invention, an amino nucleotide mismatch. Such conditions will vary, depending acid sequence having the biological activity of at least one on whether DNA:RNA or DNA:DNA hybrids are being domain of a PUFA PKS system of the present invention formed. Calculated melting temperatures for DNA:DNA includes an amino acid sequence that is sufficiently similar hybrids are 10° C. less than for DNA:RNA hybrids. In to a naturally occurring PUFA PKS protein or polypeptide particular embodiments, stringent hybridization conditions that a nucleic acid sequence encoding the amino acid for DNA:DNA hybrids include hybridization at an ionic sequence is capable of hybridizing under moderate, high, or strength of 6xSSC (0.9 M. Na") at a temperature of between very high stringency conditions (described below) to (i.e., about 20° C. and about 35° C. (lower stringency), more with) a nucleic acid molecule encoding the naturally occur preferably, between about 28° C. and about 40° C. (more ring PUFA PKS protein or polypeptide (i.e., to the comple stringent), and even more preferably, between about 35°C. ment of the nucleic acid strand encoding the naturally and about 45° C. (even more stringent), with appropriate occurring PUFA PKS protein or polypeptide). Preferably, an wash conditions. In particular embodiments, stringent amino acid sequence having the biological activity of at least hybridization conditions for DNA:RNA hybrids include one domain of a PUFA PKS system of the present invention hybridization at an ionic strength of 6xSSC (0.9 M. Na") at is encoded by a nucleic acid sequence that hybridizes under a temperature of between about 30° C. and about 45° C. moderate, high or very high Stringency conditions to the more preferably, between about 38°C. and about 50° C., and complement of a nucleic acid sequence that encodes a even more preferably, between about 45° C. and about 55° protein comprising an amino acid sequence represented by C., with similarly stringent wash conditions. These values any of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ are based on calculations of a melting temperature for ID NO:8, SEQID NO:10, SEQID NO:13, SEQID NO:18, molecules larger than about 100 nucleotides, 0% formamide SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID and a G+C content of about 40%. Alternatively, T. can be US 2008/0026440 A1 Jan. 31, 2008

calculated empirically as set forth in Sambrook et al., Supra, inserted into the recombinant vector to produce a recombi pages 9.31 to 9.62. In general, the wash conditions should be nant nucleic acid molecule. The nucleic acid sequence as stringent as possible, and should be appropriate for the encoding the protein to be produced is inserted into the chosen hybridization conditions. For example, hybridization vector in a manner that operatively links the nucleic acid conditions can include a combination of salt and temperature sequence to regulatory sequences in the vector which enable conditions that are approximately 20-25° C. below the the transcription and translation of the nucleic acid sequence calculated T of a particular hybrid, and wash conditions within the recombinant host cell. typically include a combination of salt and temperature 0118. In another embodiment, a recombinant vector used conditions that are approximately 12-20° C. below the in a recombinant nucleic acid molecule of the present calculated T of the particular hybrid. One example of invention is a targeting vector. As used herein, the phrase hybridization conditions suitable for use with DNA:DNA “targeting vector is used to refer to a vector that is used to hybrids includes a 2-24 hour hybridization in 6xSSC (50% deliver a particular nucleic acid molecule into a recombinant formamide) at about 42°C., followed by washing steps that host cell, wherein the nucleic acid molecule is used to delete include one or more washes at room temperature in about or inactivate an endogenous gene within the host cell or 2xSSC, followed by additional washes at higher tempera microorganism (i.e., used for targeted gene disruption or tures and lower ionic strength (e.g., at least one wash as knock-out technology). Such a vector may also be known in about 37° C. in about 0.1x-0.5xSSC, followed by at least the art as a “knock-out vector. In one aspect of this one wash at about 68° C. in about 0.1x-0.5xSSC). embodiment, a portion of the vector, but more typically, the 0116. Another embodiment of the present invention nucleic acid molecule inserted into the vector (i.e., the includes a recombinant nucleic acid molecule comprising a insert), has a nucleic acid sequence that is homologous to a recombinant vector and a nucleic acid molecule comprising nucleic acid sequence of a target gene in the host cell (i.e., a nucleic acid sequence encoding an amino acid sequence a gene which is targeted to be deleted or inactivated). The having a biological activity of at least one domain of a PUFA nucleic acid sequence of the vector insert is designed to bind PKS system as described herein. Such nucleic acid to the target gene Such that the target gene and the insert sequences are described in detail above. According to the undergo homologous recombination, whereby the endog present invention, a recombinant vector is an engineered enous target gene is deleted, inactivated or attenuated (i.e., (i.e., artificially produced) nucleic acid molecule that is used by at least a portion of the endogenous target gene being as a tool for manipulating a nucleic acid sequence of choice mutated or deleted). and for introducing Such a nucleic acid sequence into a host cell. The recombinant vector is therefore suitable for use in 0119 Typically, a recombinant nucleic acid molecule cloning, sequencing, and/or otherwise manipulating the includes at least one nucleic acid molecule of the present nucleic acid sequence of choice. Such as by expressing invention operatively linked to one or more transcription and/or delivering the nucleic acid sequence of choice into a control sequences. As used herein, the phrase “recombinant host cell to form a recombinant cell. Such a vector typically molecule' or “recombinant nucleic acid molecule' primarily contains heterologous nucleic acid sequences, that is nucleic refers to a nucleic acid molecule or nucleic acid sequence acid sequences that are not naturally found adjacent to operatively linked to a transcription control sequence, but nucleic acid sequence to be cloned or delivered, although the can be used interchangeably with the phrase “nucleic acid vector can also contain regulatory nucleic acid sequences molecule', when such nucleic acid molecule is a recombi (e.g., promoters, untranslated regions) which are naturally nant molecule as discussed herein. According to the present found adjacent to nucleic acid molecules of the present invention, the phrase “operatively linked’ refers to linking a invention or which are useful for expression of the nucleic nucleic acid molecule to a transcription control sequence in acid molecules of the present invention (discussed in detail a manner such that the molecule is able to be expressed below). The vector can be either RNA or DNA, either when transfected (i.e., transformed, transduced, transfected, prokaryotic or eukaryotic, and typically is a plasmid. The conjugated or conduced) into a host cell. Transcription vector can be maintained as an extrachromosomal element control sequences are sequences which control the initiation, (e.g., a plasmid) or it can be integrated into the chromosome elongation, or termination of transcription. Particularly of a recombinant organism (e.g., a microbe or a plant). The important transcription control sequences are those which entire vector can remain in place within a host cell, or under control transcription initiation, such as promoter, enhancer, certain conditions, the plasmid DNA can be deleted, leaving operator and repressor sequences. Suitable transcription behind the nucleic acid molecule of the present invention. control sequences include any transcription control The integrated nucleic acid molecule can be under chromo sequence that can function in a host cell or organism into Somal promoter control, under native or plasmid promoter which the recombinant nucleic acid molecule is to be control, or under a combination of several promoter con introduced. trols. Single or multiple copies of the nucleic acid molecule 0120 Recombinant nucleic acid molecules of the present can be integrated into the chromosome. A recombinant invention can also contain additional regulatory sequences, vector of the present invention can contain at least one Such as translation regulatory sequences, origins of replica selectable marker. tion, and other regulatory sequences that are compatible 0117. In one embodiment, a recombinant vector used in with the recombinant cell. In one embodiment, a recombi a recombinant nucleic acid molecule of the present invention nant molecule of the present invention, including those is an expression vector. As used herein, the phrase “expres which are integrated into the host cell chromosome, also sion vector” is used to refer to a vector that is suitable for contains secretory signals (i.e., signal segment nucleic acid production of an encoded product (e.g., a protein of interest). sequences) to enable an expressed protein to be secreted In this embodiment, a nucleic acid sequence encoding the from the cell that produces the protein. Suitable signal product to be produced (e.g., a PUFA PKS domain) is segments include a signal segment that is naturally associ US 2008/0026440 A1 Jan. 31, 2008

ated with the protein to be expressed or any heterologous transfecting a host cell with one or more recombinant signal segment capable of directing the secretion of the molecules to form a recombinant cell. Suitable host cells to protein according to the present invention. In another transfect include, but are not limited to, any bacterial, fungal embodiment, a recombinant molecule of the present inven (e.g., yeast), insect, plant or animal cell that can be trans tion comprises a leader sequence to enable an expressed fected. Host cells can be either untransfected cells or cells protein to be delivered to and inserted into the membrane of that are already transfected with at least one other recom a host cell. Suitable leader sequences include a leader binant nucleic acid molecule. sequence that is naturally associated with the protein, or any 0.125. According to the present invention, the term “trans heterologous leader sequence capable of directing the deliv fection' is used to refer to any method by which an exog ery and insertion of the protein to the membrane of a cell. enous nucleic acid molecule (i.e., a recombinant nucleic acid 0121 The present inventors have found that the molecule) can be inserted into a cell. The term “transfor Schizochytrium PUFA PKS Orfs A and B are closely linked mation' can be used interchangeably with the term “trans in the genome and region between the Orfs has been fection' when such term is used to refer to the introduction sequenced. The Orfs are oriented in opposite directions and of nucleic acid molecules into microbial cells. Such as algae, 4244 base pairs separate the start (ATG) codons (i.e. they are bacteria and yeast. In microbial systems, the term “trans arranged as follows: 3'OrfA5'-4244 bp-5'OrfB3'). Exami formation' is used to describe an inherited change due to the nation of the 4244 bp intergenic region did not reveal any acquisition of exogenous nucleic acids by the microorgan obvious Orfs (no significant matches were found on a ism and is essentially synonymous with the term “transfec BlastX search). Both Orfs A and B are highly expressed in tion.” However, in animal cells, transformation has acquired Schizochytrium, at least during the time of oil production, a second meaning which can refer to changes in the growth implying that active promoter elements are embedded in this properties of cells in culture after they become cancerous, intergenic region. These genetic elements are believed to for example. Therefore, to avoid confusion, the term “trans have utility as a bi-directional promoter sequence for trans fection' is preferably used with regard to the introduction of genic applications. For example, in a preferred embodiment, exogenous nucleic acids into animal cells, and the term one could clone this region, place any genes of interest at “transfection' will be used herein to generally encompass each end and introduce the construct into Schizochytrium (or transfection of animal cells, plant cells and transformation of some other host in which the promoters can be shown to microbial cells, to the extent that the terms pertain to the function). It is predicted that the regulatory elements, under introduction of exogenous nucleic acids into a cell. There the appropriate conditions, would provide for coordinated, fore, transfection techniques include, but are not limited to, high level expression of the two introduced genes. The transformation, particle bombardment, electroporation, complete nucleotide sequence for the regulatory region microinjection, lipofection, adsorption, infection and proto containing Schizochytrium PUFA PKS regulatory elements plast fusion. (e.g., a promoter) is represented herein as SEQ ID NO:36. 0.126. It will be appreciated by one skilled in the art that use of recombinant DNA technologies can improve control 0122) In a similar manner, Orf is highly expressed in of expression of transfected nucleic acid molecules by Schizochytrium during the time of oil production and regu manipulating, for example, the number of copies of the latory elements are expected to reside in the region upstream nucleic acid molecules within the host cell, the efficiency of its start codon. A region of genomic DNA upstream of with which those nucleic acid molecules are transcribed, the OrfQ has been cloned and sequenced and is represented efficiency with which the resultant transcripts are translated, hereinas (SEQ ID NO:37). This sequence contains the 3886 and the efficiency of post-translational modifications. Addi nt immediately upstream of the Orf start codon. Exami tionally, the promoter sequence might be genetically engi nation of this region did not reveal any obvious Orfs (i.e., no neered to improve the level of expression as compared to the significant matches were found on a BlastX search). It is native promoter. Recombinant techniques useful for control believed that regulatory elements contained in this region, ling the expression of nucleic acid molecules include, but are under the appropriate conditions, will provide for high-level not limited to, integration of the nucleic acid molecules into expression of a gene placed behind them. Additionally, one or more host cell chromosomes, addition of vector under the appropriate conditions, the level of expression stability sequences to plasmids, Substitutions or modifica may be coordinated with genes under control of the A-B tions of transcription control signals (e.g., promoters, opera intergenic region (SEQ ID NO:36). tors, enhancers), Substitutions or modifications of transla 0123 Therefore, in one embodiment, a recombinant tional control signals (e.g., ribosome binding sites, Shine nucleic acid molecule useful in the present invention, as Dalgarno sequences), modification of nucleic acid disclosed herein, can include a PUFAPKS regulatory region molecules to correspond to the codon usage of the host cell, contained within SEQ ID NO:36 and/or SEQ ID NO:37. and deletion of sequences that destabilize transcripts. Such a regulatory region can include any portion (fragment) 0.127 General discussion above with regard to recombi of SEQ ID NO:36 and/or SEQ ID NO:37 that has at least nant nucleic acid molecules and transfection of host cells is basal PUFA PKS transcriptional activity. intended to be applied to any recombinant nucleic acid 0124 One or more recombinant molecules of the present molecule discussed herein, including those encoding any invention can be used to produce an encoded product (e.g., amino acid sequence having a biological activity of at least a PUFA PKS domain, protein, or system) of the present one domain from a PUFA PKS, those encoding amino acid invention. In one embodiment, an encoded product is pro sequences from other PKS systems, and those encoding duced by expressing a nucleic acid molecule as described other proteins or domains. herein under conditions effective to produce the protein. A 0128. This invention also relates to the use of a novel preferred method to produce an encoded protein is by method to identify a microorganism that has a PUFA PKS US 2008/0026440 A1 Jan. 31, 2008 system that is homologous in structure, domain organization process, these cells are then placed under low oxygen or and/or function to a Schizochytrium PUFA PKS system. In anoxic culture conditions (e.g., dissolved oxygen less than one embodiment, the microorganism is a non-bacterial about 5% of saturation, more preferably less than about 2%, microorganism, and preferably, the microorganism identi even more preferably less than about 1%, and most prefer fied by this method is a eukaryotic microorganism. In ably dissolved oxygen of about 0% of saturation in the addition, this invention relates to the microorganisms iden culture medium) and allowed to grow for approximately tified by Such method and to the use of these microorganisms another 24-72 hours. In this process, the microorganisms and the PUFA PKS systems from these microorganisms in should be cultured at a temperature greater than about 15° the various applications for a PUFA PKS system (e.g., C., and more preferably greater than about 20°C., and even genetically modified organisms and methods of producing more preferably greater than about 25°C., and even more bioactive molecules) according to the present invention. The preferably greater than 30° C. The low or anoxic culture unique screening method described and demonstrated herein environment can be easily maintained in culture chambers enables the rapid identification of new microbial strains capable of inducing this type of atmospheric environment in containing a PUFA PKS system homologous to the the chamber (and thus in the cultures) or by culturing the Schizochytrium PUFA PKS system of the present invention. cells in a manner that induces the low oxygen environment Applicants have used this method to discover and disclose directly in the culture flask/vessel itself. herein that a Thraustochytrium microorganism contains a 0133. In a preferred culturing method, the microbes can PUFA PKS system that is homologous to that found in be cultured in shake flasks which, instead of normally Schizochytrium. This discovery is described in detail in containing a small amount of culture medium-less than Example 2 below. about 50% of total capacity and usually less than about 25% 0129 Microbial organisms with a PUFA PKS system of total capacity—to keep the medium aerated as it is shaken similar to that found in Schizochytrium, such as the Thraus on a shaker table, are instead filled to greater than about 50% tochytrium microorganism discovered by the present inven of their capacity, and more preferably greater than about tors and described in Example 2, can be readily identified/ 60%, and most preferably greater than about 75% of their isolated/screened by the following methods used separately capacity with culture medium. High loading of the shake or in any combination of these methods. flask with culture medium prevents it from mixing very well in the flask when it is placed on a shaker table, preventing 0130. In general, the method to identify a non-bacterial oxygen diffusion into the culture. Therefore as the microbes microorganism that has a polyunsaturated fatty acid (PUFA) grow, they use up the existing oxygen in the medium and polyketide synthase (PKS) system includes a first step of (a) naturally create a low or no oxygen environment in the shake selecting a microorganism that produces at least one PUFA; flask. and a second step of (b) identifying a microorganism from (a) that has an ability to produce increased PUFAs under 0.134. After the culture period, the cells are harvested and dissolved oxygen conditions of less than about 5% of analyzed for content of bioactive compounds of interest saturation in the fermentation medium, as compared to (e.g., lipids), but most particularly, for compounds contain production of PUFAs by said microorganism under dis ing two or more unsaturated bonds, and more preferably Solved oxygen conditions of greater than 5% of Saturation, three or more double bonds, and even more preferably four more preferably 10% of saturation, more preferably greater or more double bonds. For lipids, those strains possessing than 15% of saturation and more preferably greater than Such compounds at greater than about 5%, and more pref 20% of saturation in the fermentation medium. A microor erably greater than about 10%, and more preferably greater ganism that produces at least one PUFA and has an ability than about 15%, and even more preferably greater than to produce increased PUFAs under dissolved oxygen con about 20% of the dry weight of the microorganism are ditions of less than about 5% of saturation is identified as a identified as predictably containing a novel PKS system of candidate for containing a PUFAPKS system. Subsequent to the type described above. For other bioactive compounds, identifying a microorganism that is a strong candidate for Such as antibiotics or compounds that are synthesized in containing a PUFA PKS system, the method can include an Smaller amounts, those strains possessing Such compounds additional step (c) of detecting whether the organism iden at greater than about 0.5%, and more preferably greater than tified in step (b) comprises a PUFA PKS system. about 0.1%, and more preferably greater than about 0.25%, and more preferably greater than about 0.5%, and more 0131. In one embodiment of the present invention, step preferably greater than about 0.75%, and more preferably (b) is performed by culturing the microorganism selected for greater than about 1%, and more preferably greater than the screening process in low oxygen/anoxic conditions and about 2.5%, and more preferably greater than about 5% of aerobic conditions, and, in addition to measuring PUFA the dry weight of the microorganism are identified as pre content in the organism, the fatty acid profile is determined, dictably containing a novel PKS system of the type as well as fat content. By comparing the results under low described above. oxygen/anoxic conditions with the results under aerobic conditions, the method provides a strong indication of 0.135 Alternatively, or in conjunction with this method, whether the microorganism contains a PUFA PKS prospective microbial strains containing novel PUFA PKS system of the present invention. This preferred embodiment systems as described herein can be identified by examining is described in detail below. the fatty acid profile of the strain (obtained by culturing the organism or through published or other readily available 0132) Initially, microbial strains to be examined for the sources). If the microbe contains greater than about 30%, presence of a PUFA PKS system are cultured under aerobic and more preferably greater than about 40%, and more conditions to induce production of a large number of cells preferably greater than about 45%, and even more preferably (microbial biomass). As one element of the identification greater than about 50% of its total fatty acids as C14:0. US 2008/0026440 A1 Jan. 31, 2008 20

C16:0 and/or C16:1, while also producing at least one long 0.144 (4) Did fat content (as amount total fatty acids/cell chain fatty acid with three or more unsaturated bonds, and dry weight) increase in the low oxygen culture compared to more preferably 4 or more double bonds, and more prefer the aerobic culture? ably 5 or more double bonds, and even more preferably 6 or more double bonds, then this microbial strain is identified as 0145 (5) Did DHA (or other PUFA content) increase as a likely candidate to possess a novel PUFA PKS system of % cell dry weight in the low oxygen culture compared to the the type described in this invention. Screening this organism aerobic culture? under the low oxygen conditions described above, and 0146 If the first three questions are answered yes, this is confirming production of bioactive molecules containing a good indication that the strain contains a PKS genetic two or more unsaturated bonds would suggest the existence system for making long chain PUFAs. The more questions of a novel PUFA PKS system in the organism, which could that are answered yes (preferably the first three questions be further confirmed by analysis of the microbes genome. must be answered yes), the stronger the indication that the 0136. The success of this method can also be enhanced strain contains such a PKS genetic system. If all five by screening eukaryotic strains that are known to contain questions are answered yes, then there is a very strong C17:0 and or C17:1 fatty acids (in conjunction with the large indication that the strain contains a PKS genetic system for percentages of C14:0, C16:0 and C16:1 fatty acids described making long chain PUFAs. The lack of 18:3n-3/18:2n-6/ above) because the C17:0 and C17:1 fatty acids are poten 18:3n-6 would indicate that the low oxygen conditions tial markers for a bacterial (prokaryotic) based or influenced would have turned off or inhibited the conventional pathway fatty acid production system. Another marker for identifying for PUFA synthesis. A high 14:0/16:0/16:1 fatty is an strains containing novel PUFA PKS systems is the produc preliminary indicator of a bacterially influenced fatty acid tion of simple fatty acid profiles by the organism. According synthesis profile (the presence of C17:0 and 17:1 is also and to the present invention, a “simple fatty acid profile' is indicator of this) and of a simple fatty acid profile. The defined as 8 or fewer fatty acids being produced by the strain increased PUFA synthesis and PUFA containing fat synthe at levels greater than 10% of total fatty acids. sis under the low oxygen conditions is directly indicative of 0137 Use of any of these methods or markers (singly or a PUFA PKS system, since this system does not require preferably in combination) would enable one of skill in the oxygen to make highly unsaturated fatty acids. art to readily identify microbial strains that are highly 0147 Finally, in the identification method of the present predicted to contain a novel PUFA PKS system of the type invention, once a strong candidate is identified, the microbe described in this invention. is preferably screened to detect whether or not the microbe 0138. In a preferred embodiment combining many of the contains a PUFA PKS system. For example, the genome of methods and markers described above, a novel biorational the microbe can be screened to detect the presence of one or screen (using shake flask cultures) has been developed for more nucleic acid sequences that encode a domain of a detecting microorganisms containing PUFA producing PKS PUFA PKS system as described herein. Preferably, this step systems. This screening system is conducted as follows: of detection includes a suitable nucleic acid detection 0.139. A portion of a culture of the strain/microorganism method, such as hybridization, amplification and or sequenc to be tested is placed in 250 mL baffled shake flask with 50 ing of one or more nucleic acid sequences in the microbe of mL culture media (aerobic treatment), and another portion of interest. The probes and/or primers used in the detection culture of the same strain is placed in a 250 mL non-baffled methods can be derived from any known PUFAPKS system, shake flask with 200 mL culture medium (anoxic/low oxy including the marine bacteria PUFA PKS systems described gen treatment). Various culture media can be employed in U.S. Pat. No. 6,140,486, or the Thraustochytrid PUFA depending on the type and strain of microorganism being PKS systems described in U.S. application Ser. No. 09/231, evaluated. Both flasks are placed on a shaker table at 200 899 and herein. Once novel PUFA PKS systems are iden rpm. After 48-72 hr of culture time, the cultures are har tified, the genetic material from these systems can also be vested by centrifugation and the cells are analyzed for fatty used to detect additional novel PUFAPKS systems. Methods acid methyl ester content via gas chromatography to deter of hybridization, amplification and sequencing of nucleic mine the following data for each culture: (1) fatty acid acids for the purpose of identification and detection of a profile; (2) PUFA content; and (3) fat content (approximated sequence are well known in the art. Using these detection as amount total fatty acids/cell dry weight). methods, sequence homology and domain structure (e.g., the presence, number and/or arrangement of various PUFAPKS 0140. These data are then analyzed asking the following functional domains) can be evaluated and compared to the five questions (Yes/No): known PUFA PKS systems described herein. Comparing the Data from the Low O/Anoxic Flask with the Data from the Aerobic Flask: 0.148. In some embodiments, a PUFA PKS system can be identified using biological assays. For example, in U.S. 0141 (1) Did the DHA (or other PUFA content) (as % application Ser. No. 09/231,899, Example 7, the results of a FAME (fatty acid methyl esters)) stay about the same or key experiment using a well-known inhibitor of some types preferably increased in the low oxygen culture compared to of fatty acid synthesis systems, i.e., thiolactomycin, is the aerobic culture? described. The inventors showed that the synthesis of 0142 (2) Is C14:0+C16:0+C16:1 greater than about 40% PUFAs in whole cells of Schizochytrium could be specifi TFA in the anoxic culture? cally blocked without blocking the synthesis of short chain 0143 (3) Are there very little (<1% as FAME) or no saturated fatty acids. The significance of this result is as precursors (C18:3n-3+C18:2n-6+C 18:3n-6) to the conven follows: the inventors knew from analysis of cDNA tional oxygen dependent elongase? desaturase pathway in the sequences from Schizochytrium that a Type I fatty acid anoxic culture? synthase system is present in Schizochytrium. It was known US 2008/0026440 A1 Jan. 31, 2008 that thiolactomycin does not inhibit Type I FAS systems, and cycloheximide (inhibitor of eukaryotic protein synthesis) this is consistent with the inventors data—i.e., production can be added to agar plates used to culture/select initial of the saturated fatty acids (primarily C14:0 and C16:0 in strains from water samples/soil samples collected from the Schizochytrium) was not inhibited by the thiolactomycin types of habitats/niches described below. This process would treatment. There are no indications in the literature or in the help select for enrichment of bacterial strains without (or inventors’ own data that thiolactomycin has any inhibitory minimal) contamination of eukaryotic strains. This selection effect on the elongation of C14:0 or C16:0 fatty acids or their process, in combination with culturing the plates at elevated desaturation (i.e. the conversion of short chain Saturated temperatures (e.g. 30° C.), and then selecting strains that fatty acids to PUFAs by the classical pathway). Therefore, produce at least one PUFA would initially identify candidate the fact that the PUFA production in Schizochytrium was bacterial strains with a PUFA PKS system that is operative blocked by thiolactomycin strongly indicates that the clas at elevated temperatures (as opposed to those bacterial sical PUFA synthesis pathway does not produce the PUFAs strains in the prior art which only exhibit PUFA production in Schizochytrium, but rather that a different pathway of at temperatures less than about 20° C. and more preferably synthesis is involved. Further, it had previously been deter below about 5° C.). mined that the Shewanella PUFAPKS system is inhibited by thiolactomycin (note that the PUFA PKS system of the 0152 Locations for collection of the preferred types of present invention has elements of both Type I and Type II microbes for screening for a PUFAPKS system according to systems), and it was known that thiolactomycin is an inhibi the present invention include any of the following: low tor of Type II FAS systems (such as that found in E. coli). oxygen environments (or locations near these types of low Therefore, this experiment indicated that Schizochytrium oxygen environments including in the guts of animals produced PUFAs as a result of a pathway not involving the including invertebrates that consume microbes or microbe Type I FAS. A similar rationale and detection step could be containing foods (including types of filter feeding organ used to detect a PUFA PKS system in a microbe identified isms), low or non-oxygen containing aquatic habitats using the novel screening method disclosed herein. (including freshwater, saline and marine), and especially at or near-low oxygen environments (regions) in the . 0149. In addition, Example 3 shows additional biochemi The microbial strains would preferably not be obligate cal data which provides evidence that PUFAs in anaerobes but be adapted to live in both aerobic and low or Schizochytrium are not produced by the classical pathway anoxic environments. Soil environments containing both (i.e., precursor product kinetics between C16:0 and DHA are aerobic and low oxygen or anoxic environments would also not observed in whole cells and, in vitro PUFA synthesis can excellent environments to find these organisms in and espe be separated from the membrane fraction—all of the fatty cially in these types of soil in aquatic habitats or temporary acid desaturases of the classical PUFA synthesis pathway, aquatic habitats. with the exception of the delta 9 desaturase which inserts the first double bond of the series, are associated with cellular 0153. A particularly preferred microbial strain would be membranes). This type of biochemical data could be used to a strain (selected from the group consisting of algae, fungi detect PUFA PKS activity in microbe identified by the novel (including yeast), protozoa or ) that, during a portion screening method described above. of its life cycle, is capable of consuming whole bacterial cells (bacterivory) by mechanisms such as phagocytosis, 0150 Preferred microbial strains to screen using the phagotrophic or endocytic capability and/or has a stage of its screening/identification method of the present invention are life cycle in which it exists as an amoeboid stage or naked chosen from the group consisting of bacteria, algae, fungi, protoplast. This method of nutrition would greatly increase protozoa or protists, but most preferably from the eukaryotic the potential for transfer of a bacterial PKS system into a microbes consisting of algae, fungi, protozoa and protists. eukaryotic cell if a mistake occurred and the bacterial cell These microbes are preferably capable of growth and pro (or its DNA) did not get digested and instead are function duction of the bioactive compounds containing two or more ally incorporated into the eukaryotic cell. unsaturated bonds attemperatures greater than about 15°C., more preferably greater than about 20° C., even more 0154 Strains of microbes (other than the members of the preferably greater than about 25° C. and most preferably Thraustochytrids) capable of bacterivory (especially by greater than about 30° C. phagocytosis or endocytosis) can be found in the following microbial classes (including but not limited to example 0151. In some embodiments of this method of the present genera): invention, novel bacterial PUFA PKS systems can be iden tified in bacteria that produce PUFAs at temperatures 0.155. In the algae and algae-like microbes (including exceeding about 20° C., preferably exceeding about 25°C. ): of the class Euglenophyceae (for example and even more preferably exceeding about 30° C. As genera Euglena, and Peranema), the class Chrysophyceae described previously herein, the marine bacteria, (for example the genus Ochromonas), the class Dinobry Shewanella and Vibrio marinus, described in U.S. Pat. No. aceae (for example the genera Dinobryon, Platychrysis, and 6,140,486, do not produce PUFAs at higher temperatures, Chrysochromulina), the Dinophyceae (including the genera which limits the usefulness of PUFA PKS systems derived Crypthecodinium, Gymnodinium, Peridinium, Ceratium, from these bacteria, particularly in plant applications under Gyrodinium, and Oxyrrhis), the class (for field conditions. Therefore, in one embodiment, the screen example the genera Cryptomonas, and Rhodomonas), the ing method of the present invention can be used to identify class Xanthophyceae (for example the genus Olisthodiscus) bacteria that have a PUFA PKS system which are capable of (and including forms of algae in which an amoeboid stage growth and PUFA production at higher temperatures (e.g., occurs as in the Rhizochloridaceae, and above about 20, 25, or 30° C.). In this embodiment, inhibi Zoospores/gametes of Aphanochaete pascheri, Bumilleria tors of eukaryotic growth Such as nyStatin (antifungal) or stigeoclonium and Vaucheria geminata), the class Eustig US 2008/0026440 A1 Jan. 31, 2008 22 matophyceae, and the class Prymnesiopyceae (including the 0.160 One embodiment of the present invention relates to genera Prymnesium and Diacronema). any microorganisms identified using the novel PUFA PKS 0156. In the Stramenopiles including the: Protero screening method described above, to the PUFA PKS genes monads, Opalines, Developayella, Diplophorys, Larbrin and proteins encoded thereby, and to the use of Such thulids, Thraustochytrids, Bicosecids, , microorganisms and/or PUFA PKS genes and proteins Hypochytridiomycetes, Commation, Reticulosphaera, Pel (including homologues and fragments thereof) in any of the agomonas, Pelapococcus, Ollicola, Aureococcus, , methods described herein. In particular, the present inven Raphidiophytes, Synurids, Rhizochromulinaales, Pedinella tion encompasses organisms identified by the screening les, , Chrysomeridales, Sarcinochrysidales, method of the present invention which are then genetically , , and . modified to regulate the production of bioactive molecules 0157. In the Fungi. Class Myxomycetes (form myxam by said PUFA PKS system. oebae)—slime molds, class Acrasieae including the orders 0.161 Yet another embodiment of the present invention Acrasiceae (for example the genus Sappinia), class Guttuli relates to an isolated nucleic acid molecule comprising a naceae (for example the genera Guttulinopsis, and Gut nucleic acid sequence encoding at least one biologically tulina), class Dicty Steliaceae (for example the genera Acra active domain or biologically active fragment thereof of a sis, Dictyostelium, Polysphondylium, and Coenonia), and polyunsaturated fatty acid (PUFA) polyketide synthase class Phycomyceae including the orders Chytridiales, (PKS) system from a Thraustochytrid microorganism. As Ancylistales, Blastocladiales, Monoblepharidales, Saproleg discussed above, the present inventors have successfully niales, . Mucorales, and Entomophthorales. used the method to identify a non-bacterial microorganism 0158. In the Protozoa: Protozoa strains with life stages that has a PUFA PKS system to identify additional members capable of bacterivory (including byphageocytosis) can be of the order Thraustochytriales which contain a PUFA PKS selected from the types classified as ciliates, flagellates or system. The identification of three such microorganisms is amoebae. Protozoan ciliates include the groups: Chonot described in Example 2. Specifically, the present inventors richs, Colpodids, Cyrtophores, Haptorids, Karyorelicts, Oli have used the screening method of the present invention to gohymenophora, Polyhymenophora (spirotrichs), Prostomes identify Thraustochytrium sp. 23B (ATCC 20892) as being and Suctoria. Protozoan flagellates include the Biosoecids, highly predicted to contain a PUFA PKS system, followed Bodonids, Cercomonads, Chrysophytes (for example the by detection of sequences in the Thraustochytrium sp. 23B genera Anthophysa, Chrysanoemba, Chrysosphaerella, genome that hybridize to the Schizochytrium PUFA PKS Dendromonas, Dinobryon, Mallomonas, Ochromonas, genes disclosed herein. Schizochytrium limacium (IFO Paraphysomonas, Poteriodchromonas, Spumella, Syn 32693) and Ulkenia (BP-5601) have also been identified as crypta, Synura, and Uroglena), Collar flagellates, Crypto good candidates for containing PUFA PKS systems. Based phytes (for example the genera Chilomonas, Cryptomonas, on these data and on the similarities among members of the Cyanomonas, and ), Dinoflagellates, order Thraustochytriales, it is believed that many other Diplomonads, Euglenoids, Heterolobosea, Pedinellids, Thraustochytriales PUFA PKS systems can now be readily Pelobionts, Phalansteriids, Pseudodendromonads, Spon identified using the methods and tools provided by the gomonads and Volvocales (and other flagellates including present invention. Therefore, Thraustochytriales PUFA PKS the unassigned genera of Artodiscus, Clautriavia, systems and portions and/or homologues thereof (e.g., pro Helkesimastix, Kathablepharis and Multicilia). Amoeboid teins, domains and fragments thereof), genetically modified protozoans include the groups: Actinophryids, , organisms comprising Such systems and portions and/or Desmothoricids, Diplophryids, Eumamoebae, Heterolobo homologues thereof, and methods of using Such microor sea, Leptomyxids, Nucleariid filose amoebae, Pelebionts, ganisms and PUFA PKS systems, are encompassed by the and Vampyrellids (and including the unas present invention. signed amoebid genera Gymnophrys, Biomyxa, Microcom 0162 Developments have resulted in revision of the etes, , Belonocystis, Elaeorhanis, Allelogro of the Thraustochytrids. Taxonomic theorists mia, or Lieberkuhnia). The protozoan orders include place Thraustochytrids with the algae or algae-like protists. the following: Percolomonadeae, Heterolobosea, Lyromon However, because of taxonomic uncertainty, it would be best adea, Pseudociliata, Trichomonadea, Hypermastigea, Heter for the purposes of the present invention to consider the omiteae, Telonemea, Cyathobodonea, Ebridea, Pyytomyxea, strains described in the present invention as Thraus Opalinea, Kinetomonadea, Hemimastigea, Protostelea, tochytrids (Order: Thraustochytriales; Family: Thraustochy Myxagastrea, Dictyostelea, Choanomonadea, Apicomon triaceae; Genus: Thraustochytrium, Schizochytrium, Laby adea, Eogregarinea, Neogregarinea, Coelotrolphea, Eucoc rinthuloides, or Japonochytrium). For the present invention, cidea, Haemosporea, Piroplasmea, Spirotrichea, Prosto members of the labrinthulids are considered to be included matea, Litostomatea, Phyllopharyngea, Nassophorea, in the Thraustochytrids. Taxonomic changes are summarized Oligohymenophorea, Colpodea, Karyorelicta, Nucleohelea, below. Strains of certain unicellular microorganisms dis Centrohelea, , Sticholonchea, Polycystinea, closed herein are members of the order Thraustochytriales. , Lobosea, Filosea, Athalamea, , Thraustochytrids are marine eukaryotes with a evolving Polythalamea, , Schizocladea, Holosea, taxonomic history. Problems with the taxonomic placement Entamoebea, Myxosporea, Actinomyxea, Halosporea, of the Thraustochytrids have been reviewed by Moss (1986), Paramyxea, Rhombozoa and Orthonectea. Bahnweb and Jackle (1986) and Chamberlain and Moss 0159. A preferred embodiment of the present invention (1988). According to the present invention, the phrases includes strains of the microorganisms listed above that have “Thraustochytrid, “Thraustochytriales microorganism” and been collected from one of the preferred habitats listed “microorganism of the order Thraustochytriales' can be above. used interchangeably. US 2008/0026440 A1 Jan. 31, 2008

0163 For convenience purposes, the Thraustochytrids recently reaffirmed by Cavalier-Smith et al. using the 18s were first placed by taxonomists with other colorless rRNA signatures of the Heterokonta to demonstrate that Zoosporic eukaryotes in the Phycomycetes (algae-like Thraustochytrids are chromists not Fungi (Cavalier-Smith et fungi). The name Phycomycetes, however, was eventually al., 1994, Phil. Tran. Roy. Soc. London Series BioSciences dropped from taxonomic status, and the Thraustochytrids 346:387-397). This places them in a completely different were retained in the Oomycetes (the biflagellate Zoosporic from the fungi, which are all placed in the kingdom fungi). It was initially assumed that the Oomycetes were Eufungi. The taxonomic placement of the Thraustochytrids related to the algae, and eventually a wide range is therefore summarized below: of ultrastructural and biochemical studies, Summarized by Kingdom: Chromophyta (Stramenopiles) Barr (Barr, 1981, Biosystems 14:359-370) supported this assumption. The Oomycetes were in fact accepted by Phylum: Heterokonta Leedale (Leedale, 1974, Taxon 23:261-270) and other phy cologists as part of the heterokont algae. However, as a Order: Thraustochytriales matter of convenience resulting from their heterotrophic Family: Thraustochytriaceae nature, the Oomycetes and Thraustochytrids have been largely studied by mycologists (scientists who study fungi) Genus: Thraustochytrium, Schizochytrium, Labyrinthu rather than phycologists (scientists who study algae). loides, or Japonochytrium 0164. From another taxonomic perspective, evolutionary 0.167 Some early taxonomists separated a few original biologists have developed two general Schools of thought as members of the genus Thraustochytrium (those with an to how eukaryotes evolved. One theory proposes an exog amoeboid life stage) into a separate genus called Ulkenia. enous origin of membrane-bound organelles through a series However it is now known that most, if not all, Thraus of endosymbioses (Margulis, 1970, Origin of Eukaryotic tochytrids (including Thraustochytrium and Cells. Yale University Press, New Haven); e.g., mitochon Schizochytrium), exhibit amoeboid stages and as such, Ulk dria were derived from bacterial endosymbionts, chloro enia is not considered by some to be a valid genus. As used plasts from cyanophytes, and flagella from Spirochaetes. The herein, the genus Thraustochytrium will include Ulkenia. other theory suggests a gradual evolution of the membrane 0.168. Despite the uncertainty of taxonomic placement bound organelles from the non-membrane-bounded systems within higher classifications of Phylum and Kingdom, the of the prokaryote ancestor via an autogenous process (Cava Thraustochytrids remain a distinctive and characteristic lier-Smith, 1975, Nature (Lond.) 256:462-468). Both groups grouping whose members remain classifiable within the of evolutionary biologists however, have removed the order Thraustochytriales. Oomycetes and Thraustochytrids from the fungi and place 0.169 Polyunsaturated fatty acids (PUFAs) are essential them either with the chromophyte algae in the kingdom membrane components in higher eukaryotes and the precur Chromophyta (Cavalier-Smith, 1981, BioSystems 14:461 sors of many lipid-derived signaling molecules. The PUFA 481) (this kingdom has been more recently expanded to PKS system of the present invention uses pathways for include other protists and members of this kingdom are now PUFA synthesis that do not require desaturation and elon called Stramenopiles) or with all algae in the kingdom gation of Saturated fatty acids. The pathways catalyzed by Protoctista (Margulis and Sagen, 1985, Biosystems 18:141 PUFA PKSs that are distinct from previously recognized 147). PKSs in both structure and mechanism. Generation of cis 0165 With the development of electron microscopy, double bonds is suggested to involve position-specific studies on the ultrastructure of the Zoospores of two genera isomerases; these enzymes are believed to be useful in the of Thraustochytrids, Thraustochytrium and Schizochytrium, production of new families of antibiotics. (Perkins, 1976, pp. 279-312 in “Recent Advances in Aquatic Mycology’ (ed. E. B. G. Jones), John Wiley & Sons, New 0170 To produce significantly high yields of various York; Kazama, 1980, Can. J. Bot. 58:2434-2446; Barr, 1981, bioactive molecules using the PUFA PKS system of the Biosystems 14:359-370) have provided good evidence that present invention, an organism, preferably a microorganism the Thraustochytriaceae are only distantly related to the or a plant, can be genetically modified to affect the activity Oomycetes. Additionally, genetic data representing a corre of a PUFAPKS system. In one aspect, such an organism can spondence analysis (a form of multivariate statistics) of 5 S endogenously contain and express a PUFAPKS system, and ribosomal RNA sequences indicate that Thraustochytriales the genetic modification can be a genetic modification of one are clearly a unique group of eukaryotes, completely sepa or more of the functional domains of the endogenous PUFA rate from the fungi, and most closely related to the red and PKS system, whereby the modification has some effect on , and to members of the Oomycetes (Mannella, the activity of the PUFAPKS system. In another aspect, such et al., 1987, Mol. Evol. 24:228-235). Most taxonomists have an organism can endogenously contain and express a PUFA agreed to remove the Thraustochytrids from the Oomycetes PKS system, and the genetic modification can be an intro (Bartnicki-Garcia, 1987, pp. 389-403 in “Evolutionary Biol duction of at least one exogenous nucleic acid sequence ogy of the Fungi' (eds. Rayner, A. D. M., Brasier, C. M. & (e.g., a recombinant nucleic acid molecule), wherein the Moore, D.), Cambridge University Press, Cambridge). exogenous nucleic acid sequence encodes at least one bio logically active domain or protein from a second PKS 0166 In summary, employing the taxonomic system of system and/or a protein that affects the activity of said PUFA Cavalier-Smith (Cavalier-Smith, 1981, BioSystems 14:461 PKS system (e.g., a phosphopantetheinyl transferases 481, 1983; Cavalier-Smith, 1993, MicrobiolRev. 57:953 (PPTase), discussed below). In yet another aspect, the organ 994), the Thraustochytrids are classified with the chro ism does not necessarily endogenously (naturally) contain a mophyte algae in the kingdom Chromophyta PUFA PKS system, but is genetically modified to introduce (Stramenopiles). This taxonomic placement has been more at least one recombinant nucleic acid molecule encoding an US 2008/0026440 A1 Jan. 31, 2008 24 amino acid sequence having the biological activity of at least example, in Sambrook et al., 1989, Molecular Cloning. A one domain of a PUFA PKS system. In this aspect, PUFA Laboratory Manual, Cold Spring Harbor Labs Press. The PKS activity is affected by introducing or increasing PUFA reference Sambrook et al., ibid., is incorporated by reference PKS activity in the organism. Various embodiments associ herein in its entirety. A genetically modified microorganism ated with each of these aspects will be discussed in greater can include a microorganism in which nucleic acid mol detail below. ecules have been inserted, deleted or modified (i.e., mutated: 0171 Therefore, according to the present invention, one e.g., by insertion, deletion, Substitution, and/or inversion of embodiment relates to a genetically modified microorgan nucleotides), in Such a manner that such modifications ism, wherein the microorganism expresses a PKS system provide the desired effect within the microorganism. comprising at least one biologically active domain of a 0173 Preferred microorganism host cells to modify polyunsaturated fatty acid (PUFA) polyketide synthase according to the present invention include, but are not (PKS) system. The at least one domain of the PUFA PKS limited to, any bacteria, protist, microalga, , or pro system is encoded by a nucleic acid sequence chosen from: tozoa. In one aspect, preferred microorganisms to geneti (a) a nucleic acid sequence encoding at least one domain of cally modify include, but are not limited to, any microor a polyunsaturated fatty acid (PUFA) polyketide synthase ganism of the order Thraustochytriales. Particularly (PKS) system from a Thraustochytrid microorganism; (b) a preferred host cells for use in the present invention could nucleic acid sequence encoding at least one domain of a include microorganisms from a genus including, but not PUFA PKS system from a microorganism identified by a limited tO: Thraustochytrium, Labyrinthuloides, screening method of the present invention; (c) a nucleic acid Japonochytrium, and Schizochytrium. Preferred species sequence encoding an amino acid sequence that is at least within these genera include, but are not limited to: any about 60% identical to at least 500 consecutive amino acids Schizochytrium species, including Schizochytrium aggrega of an amino acid sequence selected from the group consist tum, Schizochytrium limacinum, Schizochytrium minutum; ing of: SEQ ID NO:2, SEQ ID NO:4, and SEQ ID NO:6; any Thraustochytrium species (including former Ulkenia wherein the amino acid sequence has a biological activity of species such as U. visurgensis, U. amoeboida, U. Sarkari at least one domain of a PUFA PKS system; and, (d) a ana, U. profunda, U. radiata, U. minuta and Ulkenia sp. nucleic acid sequence encoding an amino acid sequence that BP-5601), and including Thraustochytrium striatum, is at least about 60% identical to an amino acid sequence Thraustochytrium aureum, Thraustochytrium roseum; and selected from the group consisting of: SEQID NO:8, SEQ any Japonochytrium species. Particularly preferred strains ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, of Thraustochytriales include, but are not limited to: SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID Schizochytrium sp. (S31)(ATCC 20888); Schizochytrium sp. NO:28, SEQ ID NO:30, and SEQ ID NO:32; wherein the (S8)(ATCC 20889); Schizochytrium sp. (LC-RM)(ATCC amino acid sequence has a biological activity of at least one 18915); Schizochytrium sp. (SR21); Schizochytrium aggre domain of a PUFA PKS system. The genetic modification gatum (Goldstein et Belsky)(ATCC 28209); Schizochytrium affects the activity of the PKS system in the organism. The limacinum (Honda et Yokochi) (IFO 32693); Thraus screening process referenced in part (b) has been described tochytrium sp. (23B)(ATCC 20891); Thraustochytrium in detail above and includes the steps of: (a) selecting a striatum (Schneider)(ATCC 24473); Thraustochytrium microorganism that produces at least one PUFA; and, (b) aureum (Goldstein)(ATCC 34304); Thraustochytrium identifying a microorganism from (a) that has an ability to roseum (Goldstein)(ATCC 28210); and Japonochytrium sp. produce increased PUFAs under dissolved oxygen condi (L1)(ATCC 28207). Other examples of suitable host micro tions of less than about 5% of saturation in the fermentation organisms for genetic modification include, but are not medium, as compared to production of PUFAs by the limited to, yeast including Saccharomyces cerevisiae, Sac microorganism under dissolved oxygen conditions of greater charomyces Carlsbergensis, or other yeast Such as Candida, than about 5% of saturation, and preferably about 10%, and Kluyveromyces, or other fungi, for example, filamentous more preferably about 15%, and more preferably about 20% fungi such as Aspergillus, Neurospora, Penicillium, etc. of Saturation in the fermentation medium. The genetically Bacterial cells also may be used as hosts. This includes modified microorganism can include anyone or more of the Escherichia coli, which can be useful in fermentation pro above-identified nucleic acid sequences, and/or any of the cesses. Alternatively, a host such as a Lactobacillus species other homologues of any of the Schizochytrium PUFA PKS or Bacillus species can be used as a host. ORFs or domains as described in detail above. 0.174 Another embodiment of the present invention 0172. As used herein, a genetically modified microorgan relates to a genetically modified plant, wherein the plant has ism can include a genetically modified bacterium, protist, been genetically modified to recombinantly express a PKS microalgae, fungus, or other microbe, and particularly, any system comprising at least one biologically active domain of of the genera of the order Thraustochytriales (e.g., a Thraus a polyunsaturated fatty acid (PUFA) polyketide synthase tochytrid) described herein (e.g., Schizochytrium, Thraus (PKS) system. The domain is encoded by a nucleic acid tochytrium, Japonochytrium, Labyrinthuloides). Such a sequence chosen from: (a) a nucleic acid sequence encoding genetically modified microorganism has a genome which is at least one domain of a polyunsaturated fatty acid (PUFA) modified (i.e., mutated or changed) from its normal (i.e., polyketide synthase (PKS) system from a Thraustochytrid wild-type or naturally occurring) form such that the desired microorganism; (b) a nucleic acid sequence encoding at least result is achieved (i.e., increased or modified PUFA PKS one domain of a PUFA PKS system from a microorganism activity and/or production of a desired product using the identified by the screening and selection method described PKS system). Genetic modification of a microorganism can herein (see brief summary of method in discussion of be accomplished using classical strain development and/or genetically modified microorganism above); (c) a nucleic molecular genetic techniques. Such techniques known in the acid sequence encoding an amino acid sequence selected art and are generally disclosed for microorganisms, for from the group consisting of: SEQID NO:2, SEQID NO:4, US 2008/0026440 A1 Jan. 31, 2008

SEQID NO:6, and biologically active fragments thereof; (d) of a gene. For example, a genetic modification in a gene a nucleic acid sequence encoding an amino acid sequence which results in a decrease in the function of the protein selected from the group consisting of: SEQID NO:8, SEQ encoded by Such gene, can be the result of a complete ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, deletion of the gene (i.e., the gene does not exist, and SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID therefore the protein does not exist), a mutation in the gene NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically which results in incomplete or no translation of the protein active fragments thereof, (e) a nucleic acid sequence encod (e.g., the protein is not expressed), or a mutation in the gene ing an amino acid sequence that is at least about 60% which decreases or abolishes the natural function of the identical to at least 500 consecutive amino acids of an amino protein (e.g., a protein is expressed which has decreased or acid sequence selected from the group consisting of: SEQID no enzymatic activity or action). Genetic modifications that NO:2, SEQID NO:4, and SEQID NO:6; wherein the amino result in an increase in gene expression or function can be acid sequence has a biological activity of at least one domain referred to as amplification, overproduction, overexpression, of a PUFA PKS system; and/or (f) a nucleic acid sequence activation, enhancement, addition, or up-regulation of a encoding an amino acid sequence that is at least about 60% gene. identical to an amino acid sequence selected from the group 0.178 The genetic modification of a microorganism or consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID plant according to the present invention preferably affects NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, the activity of the PKS system expressed by the plant, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID whether the PKS system is endogenous and genetically NO:30, and SEQ ID NO:32; wherein the amino acid modified, endogenous with the introduction of recombinant sequence has a biological activity of at least one domain of nucleic acid molecules into the organism, or provided com a PUFA PKS system. The genetically modified plant can pletely by recombinant technology. According to the present include any one or more of the above-identified nucleic acid invention, to “affect the activity of a PKS system” includes sequences, and/or any of the other homologues of any of the any genetic modification that causes any detectable or Schizochytrium PUFA PKS ORFs or domains as described measurable change or modification in the PKS system in detail above. expressed by the organism as compared to in the absence of 0175. As used herein, a genetically modified plant can the genetic modification. A detectable change or modifica include any genetically modified plant including higher tion in the PKS system can include, but is not limited to: the plants and particularly, any consumable plants or plants introduction of PKS system activity into an organism Such useful for producing a desired bioactive molecule of the that the organism now has measurable/detectable PKS sys present invention. Such a genetically modified plant has a tem activity (i.e., the organism did not contain a PKS system genome which is modified (i.e., mutated or changed) from prior to the genetic modification), the introduction into the its normal (i.e., wild-type or naturally occurring) form Such organism of a functional domain from a different PKS that the desired result is achieved (i.e., increased or modified system than a PKS system endogenously expressed by the PUFA PKS activity and/or production of a desired product organism such that the PKS system activity is modified (e.g., using the PKS system). Genetic modification of a plant can a bacterial PUFA PKS domain or a type I PKS domain is be accomplished using classical strain development and/or introduced into an organism that endogenously expresses a molecular genetic techniques. Methods for producing a non-bacterial PUFAPKS system), a change in the amount of transgenic plant, wherein a recombinant nucleic acid mol a bioactive molecule produced by the PKS system (e.g., the ecule encoding a desired amino acid sequence is incorpo system produces more (increased amount) or less (decreased rated into the genome of the plant, are known in the art. A amount) of a given product as compared to in the absence of preferred plant to genetically modify according to the the genetic modification), a change in the type of a bioactive present invention is preferably a plant Suitable for consump molecule produced by the PKS system (e.g., the system tion by animals, including humans. produces a new or different product, or a variant of a product that is naturally produced by the system), and/or a change in 0176 Preferred plants to genetically modify according to the ratio of multiple bioactive molecules produced by the the present invention (i.e., plant host cells) include, but are PKS system (e.g., the system produces a different ratio of not limited to any higher plants, and particularly consumable one PUFA to another PUFA, produces a completely different plants, including crop plants and especially plants used for lipid profile as compared to in the absence of the genetic their oils. Such plants can include, for example: canola, modification, or places various PUFAs in different positions Soybeans, rapeseed, linseed, corn, Safflowers, Sunflowers in a triacylglycerol as compared to the natural configura and tobacco. Other preferred plants include those plants that tion). Such a genetic modification includes any type of are known to produce compounds used as pharmaceutical genetic modification and specifically includes modifications agents, flavoring agents, neutraceutical agents, functional made by recombinant technology and by classical mutagen food ingredients or cosmetically active agents or plants that are genetically engineered to produce these compounds/ CS1S. agents. 0179. It should be noted that reference to increasing the activity of a functional domain or protein in a PUFA PKS 0177 According to the present invention, a genetically system refers to any genetic modification in the organism modified microorganism or plant includes a microorganism containing the domain or protein (or into which the domain or plant that has been modified using recombinant technol or protein is to be introduced) which results in increased ogy. As used herein, genetic modifications which result in a functionality of the domain or protein system and can decrease in gene expression, in the function of the gene, or include higher activity of the domain or protein (e.g., in the function of the gene product (i.e., the protein encoded specific activity or in vivo enzymatic activity), reduced by the gene) can be referred to as inactivation (complete or inhibition or degradation of the domain or protein system, partial), deletion, interruption, blockage or down-regulation and overexpression of the domain or protein. For example, US 2008/0026440 A1 Jan. 31, 2008 26 gene copy number can be increased, expression levels can be transferase (discussed below). Specific modifications that increased by use of a promoter that gives higher levels of could be made to an endogenous PUFA PKS system are expression than that of the native promoter, or a gene can be discussed in detail below. altered by genetic engineering or classical mutagenesis to 0182. In another aspect of this embodiment of the inven increase the activity of the domain or protein encoded by the tion, the genetic modification can include: (1) the introduc gene. tion of a recombinant nucleic acid molecule encoding an 0180. Similarly, reference to decreasing the activity of a amino acid sequence having a biological activity of at least functional domain or protein in a PUFA PKS system refers one domain of a non-bacterial PUFAPKS system; and/or (2) to any genetic modification in the organism containing Such the introduction of a recombinant nucleic acid molecule domain or protein (or into which the domain or protein is to encoding a protein or functional domain that affects the be introduced) which results in decreased functionality of activity of a PUFA PKS system, into a host. The host can the domain or protein and includes decreased activity of the include: (1) a host cell that does not express any PKS domain or protein, increased inhibition or degradation of the system, wherein all functional domains of a PKS system are domain or protein and a reduction or elimination of expres introduced into the host cell, and wherein at least one sion of the domain or protein. For example, the action of functional domain is from a non-bacterial PUFA PKS sys domain or protein of the present invention can be decreased tem; (2) a host cell that expresses a PKS system (endogenous by blocking or reducing the production of the domain or or recombinant) having at least one functional domain of a protein, "knocking out the gene or portion thereof encoding non-bacterial PUFA PKS system, wherein the introduced the domain or protein, reducing domain or protein activity, recombinant nucleic acid molecule can encode at least one or inhibiting the activity of the domain or protein. Blocking additional non-bacterial PUFA PKS domain function or or reducing the production of an domain or protein can another protein or domain that affects the activity of the host include placing the gene encoding the domain or protein PKS system; and (3) a host cell that expresses a PKS system under the control of a promoter that requires the presence of (endogenous or recombinant) which does not necessarily an inducing compound in the growth medium. By establish include a domain function from a non-bacterial PUFA PKS, ing conditions such that the inducer becomes depleted from and wherein the introduced recombinant nucleic acid mol the medium, the expression of the gene encoding the domain ecule includes a nucleic acid sequence encoding at least one or protein (and therefore, of protein synthesis) could be functional domain of a non-bacterial PUFA PKS system. In turned off. Blocking or reducing the activity of domain or other words, the present invention intends to encompass any protein could also include using an excision technology genetically modified organism (e.g., microorganism or approach similar to that described in U.S. Pat. No. 4,743, plant), wherein the organism comprises at least one non 546, incorporated herein by reference. To use this approach, bacterial PUFA PKS domain function (either endogenously the gene encoding the protein of interest is cloned between or by recombinant modification), and wherein the genetic specific genetic sequences that allow specific, controlled modification has a measurable effect on the non-bacterial excision of the gene from the genome. Excision could be PUFA PKS domain function or on the PKS system when the prompted by, for example, a shift in the cultivation tem organism comprises a functional PKS system. perature of the culture, as in U.S. Pat. No. 4,743,546, or by 0183) Therefore, using the non-bacterial PUFA PKS sys Some other physical or nutritional signal. tems of the present invention, which, for example, makes use of genes from Thraustochytrid PUFAPKS systems, gene 0181. In one embodiment of the present invention, a mixing can be used to extend the range of PUFA products to genetic modification includes a modification of a nucleic include EPA, DHA, ARA, GLA, SDA and others, as well as acid sequence encoding an amino acid sequence that has a to produce a wide variety of bioactive molecules, including biological activity of at least one domain of a non-bacterial antibiotics, other pharmaceutical compounds, and other PUFA PKS system as described herein. Such a modification can be to an amino acid sequence within an endogenously desirable products. The method to obtain these bioactive (naturally) expressed non-bacterial PUFA PKS system, molecules includes not only the mixing of genes from whereby a microorganism that naturally contains such a various organisms but also various methods of genetically system is genetically modified by, for example, classical modifying the non-bacterial PUFA PKS genes disclosed mutagenesis and selection techniques and/or molecular herein. Knowledge of the genetic basis and domain structure genetic techniques, include genetic engineering techniques. of the non-bacterial PUFA PKS system of the present Genetic engineering techniques can include, for example, invention provides a basis for designing novel genetically using a targeting recombinant vector to delete a portion of an modified organisms which produce a variety of bioactive endogenous gene, or to replace a portion of an endogenous molecules. Although mixing and modification of any PKS gene with a heterologous sequence. Examples of heterolo domains and related genes are contemplated by the present gous sequences that could be introduced into a host genome inventors, by way of example, various possible manipula include sequences encoding at least one functional domain tions of the PUFA-PKS system are discussed below with from another PKS system, such as a different non-bacterial regard to genetic modification and bioactive molecule pro PUFA PKS system, a bacterial PUFA PKS system, a type I duction. PKS system, a type II PKS system, or a modular PKS 0.184 For example, in one embodiment, non-bacterial system. Other heterologous sequences to introduce into the PUFA-PKS system products, such as those produced by genome of a host includes a sequence encoding a protein or Thraustochytrids, are altered by modifying the CLF (chain functional domain that is not a domain of a PKS system, but length factor) domain. This domain is characteristic of Type which will affect the activity of the endogenous PKS system. II (dissociated enzymes) PKS systems. Its amino acid For example, one could introduce into the host genome a sequence shows homology to KS (keto synthase pairs) nucleic acid molecule encoding a phosphopantetheinyl domains, but it lacks the active site cysteine. CLF may US 2008/0026440 A1 Jan. 31, 2008 27 function to determine the number of elongation cycles, and gene. This is the enzyme that carries out condensation of a hence the chain length, of the end product. In this embodi fatty acid, linked to a cysteine residue at the active site (by ment of the invention, using the current state of knowledge a thio-ester bond), with a malonyl-ACP. In the multi-step of FAS and PKS synthesis, a rational strategy for production reaction, CO, is released and the linear chain is extended by of ARA by directed modification of the non-bacterial PUFA two carbons. It is believed that only this KS can extend a PKS system is provided. There is controversy in the litera carbon chain that contains a double bond. This extension ture concerning the function of the CLF in PKS systems (C. occurs only when the double bond is in the cis configuration; Bisang et al., Nature 401, 502 (1999) and it is realized that if it is in the trans configuration, the double bond is reduced other domains may be involved in determination of the chain by enoyl-ACP reductase (ER) prior to elongation (Heath et length of the end product. However, it is significant that al., 1996, supra). All of the PUFA-PKS systems character Schizochytrium produces both DHA (C22:6, ()-3) and DPA ized so far have two KS domains, one of which shows (C22:5, co-6). In the PUFA-PKS system the cis double bonds greater homology to the FabB-like KS of E. coli than the are introduced during synthesis of the growing carbon chain. other. Again, without being bound by theory, the present Since placement of the co-3 and co-6 double bonds occurs inventors believe that in PUFA-PKS systems, the specifici early in the synthesis of the molecules, one would not expect ties and interactions of the DH (FabA-like) and KS (FabB that they would affect Subsequent end-product chain length like) enzymatic domains determine the number and place determination. Thus, without being bound by theory, the ment of cis double bonds in the end products. Because the present inventors believe that introduction of a factor (e.g. number of 2-carbon elongation reactions is greater than the CLF) that directs synthesis of C20 units (instead of C22 number of double bonds present in the PUFA-PKS end units) into the Schizochytrium PUFA-PKS system will result products, it can be determined that in some extension cycles in the production of EPA (C20:5, co-3) and ARA (C20:4, complete reduction occurs. Thus the DH and KS domains (O-6). For example, in heterologous systems, one could can be used as targets for alteration of the DHA/DPA ratio exploit the CLF by directly substituting a CLF from an EPA or ratios of other long chain fatty acids. These can be producing system (such as one from Photobacterium) into modified and/or evaluated by introduction of homologous the Schizochytrium gene set. The fatty acids of the resulting domains from other systems or by mutagenesis of these gene transformants can then be analyzed for alterations in profiles fragments. to identify the transformants producing EPA and/or ARA. 0188 In another embodiment, the ER (enoyl-ACP reduc 0185. In addition to dependence on development of a tase—an enzyme which reduces the trans-double bond in the heterologous system (recombinant system, such as could be fatty acyl-ACP resulting in fully saturated carbons) domains introduced into plants), the CLF concept can be exploited in can be modified or substituted to change the type of product Schizochytrium (i.e., by modification of a Schizochytrium made by the PKS system. For example, the present inventors genome). Transformation and homologous recombination know that Schizochytrium PUFA-PKS system differs from has been demonstrated in Schizochytrium. One can exploit the previously described bacterial systems in that it has two this by constructing a clone with the CLF of OrfB replaced (rather than one) ER domains. Without being bound by with a CLF from a C20 PUFA-PKS system. A marker gene theory, the present inventors believe these ER domains can will be inserted downstream of the coding region. One can strongly influence the resulting PKS production product. then transform the wild type cells, select for the marker The resulting PKS product could be changed by separately phenotype and then screen for those that had incorporated knocking out the individual domains or by modifying their the new CLF. Again, one would analyze these for any effects nucleotide sequence or by substitution of ER domains from on fatty acid profiles to identify transformants producing other organisms. EPA and/or ARA. If some factor other than those associated 0189 In another embodiment, nucleic acid molecules with the CLF are found to influence the chain length of the encoding proteins or domains that are not part of a PKS end product, a similar strategy could be employed to alter system, but which affect a PKS system, can be introduced those factors. into an organism. For example, all of the PUFAPKS systems 0186. Another preferred embodiment involving alter described above contain multiple, tandem, ACP domains. ation of the PUFA-PKS products involves modification or ACP (as a separate protein or as a domain of a larger protein) substitution of the B-hydroxyacyl-ACP dehydrase/keto syn requires attachment of a phosphopantetheline cofactor to thase pairs. During cis-vaccenic acid (C18:1, A11) synthesis produce the active, holo-ACP. Attachment of phosphopan in E. coli, creation of the cis double bond is believed to tetheline to the apo-ACP is carried out by members of the depend on a specific DH enzyme, B-hydroxy acyl-ACP Superfamily of enzymes—the phosphopantetheinyl trans dehydrase, the product of the FabA gene. This enzyme ferases (PPTase) (Lambalot R. H., et al., Chemistry and removes HOH from a B-keto acyl-ACP and leaves a trans Biology, 3,923 (1996)). double bond in the carbon chain. A subset of DH's, FabA 0190. By analogy to other PKS and FAS systems, the like, possess cis-trans isomerase activity (Heath et al., 1996, present inventors presume that activation of the multiple supra). A novel aspect of bacterial and non-bacterial PUFA ACP domains present in the Schizochytrium ORFA protein PKS systems is the presence of two FabA-like DH domains. is carried out by a specific, endogenous, PPTase. The gene Without being bound by theory, the present inventors encoding this presumed PPTase has not yet been identified believe that one or both of these DH domains will possess in Schizochytrium. If such a gene is present in cis-trans isomerase activity (manipulation of the DH Schizochytrium, one can envision several approaches that domains is discussed in greater detail below). could be used in an attempt to identify and clone it. These 0187 Another aspect of the unsaturated fatty acid syn could include (but would not be limited to): generation and thesis in E. coli is the requirement for a particular KS partial sequencing of a cDNA library prepared from actively enzyme. B-ketoacyl-ACP synthase, the product of the FabB growing Schizochytrium cells (note, one sequence was iden US 2008/0026440 A1 Jan. 31, 2008 28 tified in the currently available Schizochytrium cDNA PKS proteins (i.e. the same ones that correspond to those library set which showed homology to PPTases; however, found in the Shewanella PKS proteins). Until very recently, it appears to be part of a multidomain FAS protein, and as none of the Nostoc PKS domains present in the GenBank such may not encode the desired OrfA specific PPTase); use databases showed high homology to any of the domains of of degenerate oligonucleotide primers designed using amino Schizochytrium OrfA (or the homologous Shewanella Orf5 acid motifs present in many PPTases in PCR reactions (to protein). However, the complete genome of Nostoc has obtain a nucleic acid probe molecule to screen genomic or recently been sequenced and as a result, the sequence of the cDNA libraries); genetic approaches based on protein-pro region just upstream of the PKS gene cluster is now avail tein interactions (e.g. a yeast two-hybrid system) in which able. In this region are three Orfs that show homology to the the ORFA-ACP domains would be used as a “bait' to find domains (KS, MAT, ACP and KR) of OrfA (see FIG. 3). a “target' (i.e. the PPTase); and purification and partial Included in this set are two ACP domains, both of which sequencing of the enzyme itself as a means to generate a show high homology to the ORFAACP domains. At the end nucleic acid probe for Screening of genomic or cDNA of the Nostoc PKS cluster is the gene that encodes the Het libraries. I PPTase. Previously, it was not obvious what the substrate 0191 It is also conceivable that a heterologous PPTase of the Het I enzyme could be, however the presence of may be capable of activating the Schizochytrium ORFAACP tandem ACP domains in the newly identified Orf (Hgl E) of domains. It has been shown that some PPTases, for example the cluster strongly suggests to the present inventors that it the sfp enzyme of Bacillus subtilis (Lambalot et al., Supra) is those ACPs. The homology of the ACP domains of and the SVp enzyme of Streptomyces verticillus (Sanchez et Schizochytrium and Nostoc, as well as the tandem arrange al., 2001, Chemistry & Biology 8:725-738), have a broad ment of the domains in both proteins, makes Het I a likely substrate tolerance. These enzymes can be tested to see if candidate for heterologous activation of the Schizochytrium they will activate the Schizochytrium ACP domains. Also, a ORFAACPs. The present inventors are believed to be the recent publication described the expression of a fungal PKS first to recognize and contemplate this use for Nostoc Het I protein in tobacco (Yalpani et al., 2001, The Plant Cell PPTase. 13:1401-1409). Products of the introduced PKS system 0193 As indicated in Metz et al., 2001, supra, one novel (encoded by the 6-methylsalicyclic acid synthase gene of feature of the PUFA PKS systems is the presence of two Penicillium patulum) were detected in the transgenic plant, dehydratase domains, both of which show homology to the even though the corresponding fungal PPTase was not FabA proteins of E. coli. With the availability of the new present in those plants. This suggested that an endogenous Nostoc PKS gene sequences mentioned above, one can now plant PPTase(s) recognized and activated the fungal PKS compare the two systems and their products. The sequence ACP domain. Of relevance to this observation, the present of domains in the Nostoc cluster (from HglE to Het I) as the inventors have identified two sequences (genes) in the present inventors have defined them is (see FIG. 3): Arabidopsis whole genome database that are likely to encode PPTases. These sequences (GenBank Accession 0194) KS-MAT-2XACP, KR, KS, CLF-AT, ER (HetM, numbers: AAG51443 and AAC05345) are currently listed as HetN) HetI encoding “Unknown Proteins’. They can be identified as In the Schizochytrium PUFA-PKS Orfs A,B&C the sequence putative PPTases based on the presence in the translated (OrfA-B-C) is: protein sequences of several signature motifs including: G(I/V)D and WXXKE(A/S)XXK (SEQ ID NO:33), (listed in 0.195 KS-MAT-9xACP-KR KS-CLF-AT-ER DH-DH Lambalot et al., 1996 as characteristic of all PPTases). In ER addition, these two putative proteins contain two additional 0196. One can see the correspondence of the domains motifs typically found in PPTases typically associated with sequence (there is also a high amino acid sequence homol PKS and non-ribosomal peptide synthesis systems; i.e., ogy). The product of the Nostoc PKS system is a long chain FN(IL/V)SHS (SEQ ID NO:34) and (I/V/L)G(IL/V)D(IL/ hydroxy fatty acid (C26 or C28 with one or two hydroxy V) (SEQ ID NO:35). Furthermore, these motifs occur in the groups) that contains no double bonds (cis or trans). The expected relative positions in the protein sequences. It is product of the Schizochytrium PKS system is a long chain likely that homologues of the Arabidopsis genes are present polyunsaturated fatty acid (C22, with 5 or 6 double bonds— in other plants. Such as tobacco. Again, these genes can be all cis). An obvious difference between the two domain sets cloned and expressed to see if the enzymes they encode can is the presence of the two DH domains in the Schizochytrium activate the Schizochytrium ORFAACP domains, or alter proteins just the domains implicated in the formation of natively, OrfA could be expressed directly in the transgenic the cis double bonds of DHA and DPA (presumably HetM plant (either targeted to the plastid or the cytoplasm). and HetN in the Nostoc system are involved in inclusion of 0192 Another heterologous PPTase which may recog the hydroxyl groups and also contain a DH domain whose nize the ORFA ACP domains as substrates is the Het I origin differs from the those found in the PUFA). Also, the protein of Nostoc sp. PCC 7120 (formerly called Anabaena role of the duplicated ER domain in the Schizochytrium Orfs sp. PCC 7120). As noted in U.S. Pat. No. 6,140,486, several B and C is not known (the second ER domain in is not of the PUFA-PKS genes of Shewanella showed a high present other characterized PUFA PKS systems). The amino degree of homology to protein domains present in a PKS acid sequence homology between the two sets of domains cluster found in Nostoc (FIG. 2 of that patent). This Nostoc implies an evolutionary relationship. One can conceive of PKS system is associated with the synthesis of long chain the PUFA PKS gene set being derived from (in an evolu (C26 or C28) hydroxy fatty acids that become esterified to tionary sense) an ancestral Nostoc-like PKS gene set by Sugar moieties and form a part of the heterocyst . incorporation of the DH (FabA-like) domains. The addition These Nostoc PKS domains are also highly homologous to of the DH domains would result in the introduction of cis the domains found in Orfs B and C of the Schizochytrium double bonds in the new PKS end product structure. US 2008/0026440 A1 Jan. 31, 2008 29

0197) The comparisons of the Schizochytrium and Nostoc like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at PKS domain structures as well as the comparison of the least one chain length factor (CLF) domain; and (h) at least domain organization between the Schizochytrium and one malonyl-CoA:ACP acyltransferase (MAT) domain. In Shewanella PUFA-PKS proteins demonstrate nature’s abil one embodiment, the PUFA PKS system is a eukaryotic ity to alter domain order as well as incorporate new domains PUFA PKS system. In a preferred embodiment, the PUFA to create novel end products. In addition, the genes can now PKS system is an algal PUFA PKS system. In a more be manipulated in the laboratory to create new products. The preferred embodiment, the PUFA PKS system is a Thraus implication from these observations is that it should be tochytriales PUFA PKS system. Such PUFA PKS systems possible to continue to manipulate the systems in either a can include, but are not limited to, a Schizochytrium PUFA directed or random way to influence the end products. For PKS system, and a Thraustochytrium PUFA PKS system. In example, in a preferred embodiment, one could envision one embodiment, the PUFAPKS system can be expressed in substituting one of the DH (FabA-like) domains of the a prokaryotic host cell. In another embodiment, the PUFA PUFA-PKS system for a DH domain that did not posses PKS system can be expressed in a eukaryotic host cell. isomerization activity, potentially creating a molecule with a mix of cis- and trans-double bonds. The current products 0201 Another embodiment of the present invention of the Schizochytrium PUFA PKS system are DHA and DPA relates to a recombinant host cell which has been modified (C22:5 (O6). If one manipulated the system to produce C20 to express a non-bacterial PUFA PKS system, wherein the fatty acids, one would expect the products to be EPA and PKS system catalyzes both iterative and non-iterative enzy ARA (C20:4 (O6). This could provide a new source for ARA. matic reactions, and wherein the non-bacterial PUFA PKS One could also substitute domains from related PUFA-PKS system comprises at least the following biologically active systems that produced a different DHA to DPA ratio—for domains: (a) at least one enoyl ACP-reductase (ER) domain; example by using genes from Thraustochytrium 23B (the (b) multiple acyl carrier protein (ACP) domains (at least PUFA PKS system of which is identified for the first time four); (c) at least two B-keto acyl-ACP synthase (KS) herein). domains; (d) at least one acyltransferase (AT) domain; (e) at least one ketoreductase (KR) domain; (f) at least two FabA 0198 Additionally, one could envision specifically alter like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at ing one of the ER domains (e.g. removing, or inactivating) least one chain length factor (CLF) domain; and (h) at least in the Schizochytrium PUFA PKS system (other PUFA PKS one malonyl-CoA:ACP acyltransferase (MAT) domain. systems described so far do not have two ER domains) to determine its effect on the end product profile. Similar 0202 One aspect of this embodiment of the invention strategies could be attempted in a directed manner for each relates to a method to produce a product containing at least of the distinct domains of the PUFA-PKS proteins using one PUFA, comprising growing a plant comprising any of more or less Sophisticated approaches. Of course one would the recombinant host cells described above, wherein the not be limited to the manipulation of single domains. recombinant host cell is a plant cell, under conditions Finally, one could extend the approach by mixing domains effective to produce the product. Another aspect of this from the PUFA-PKS system and other PKS or FAS systems embodiment of the invention relates to a method to produce (e.g., type I, type II, modular) to create an entire range of a product containing at least one PUFA, comprising cultur new end products. For example, one could introduce the ing a culture containing any of the recombinant host cells PUFA-PKS DH domains into systems that do not normally described above, wherein the host cell is a microbial cell, incorporate cis double bonds into their end products. under conditions effective to produce the product. In a preferred embodiment, the PKS system in the host cell 0199 Accordingly, encompassed by the present inven catalyzes the direct production of triglycerides. tion are methods to genetically modify microbial or plant cells by: genetically modifying at least one nucleic acid 0203 Another embodiment of the present invention sequence in the organism that encodes an amino acid relates to a microorganism comprising a non-bacterial, poly sequence having the biological activity of at least one unsaturated fatty acid (PUFA) polyketide synthase (PKS) functional domain of a non-bacterial PUFA PKS system system, wherein the PKS catalyzes both iterative and non according to the present invention, and/or expressing at least iterative enzymatic reactions, and wherein the PUFA PKS one recombinant nucleic acid molecule comprising a nucleic system comprises: (a) at least two enoyl ACP-reductase acid sequence encoding Such amino acid sequence. Various (ER) domains; (b) at least six acyl carrier protein (ACP) embodiments of Such sequences, methods to genetically domains; (c) at least two B-keto acyl-ACP synthase (KS) modify an organism, and specific modifications have been domains; (d) at least one acyltransferase (AT) domain; (e) at described in detail above. Typically, the method is used to least one ketoreductase (KR) domain; (f) at least two FabA produce a particular genetically modified organism that like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at produces a particular bioactive molecule or molecules. least one chain length factor (CLF) domain; and (h) at least one malonyl-CoA:ACP acyltransferase (MAT) domain. 0200. One embodiment of the present invention relates to Preferably, the microorganism is a non-bacterial microor a recombinant host cell which has been modified to express ganism and more preferably, a eukaryotic microorganism. a polyunsaturated fatty acid (PUFA) polyketide synthase (PKS) system, wherein the PKS catalyzes both iterative and 0204 Yet another embodiment of the present invention non-iterative enzymatic reactions, and wherein the PUFA relates to a microorganism comprising a non-bacterial, poly PKS system comprises: (a) at least two enoyl ACP-reductase unsaturated fatty acid (PUFA) polyketide synthase (PKS) (ER) domains; (b) at least six acyl carrier protein (ACP) system, wherein the PKS catalyzes both iterative and non domains; (c) at least two B-keto acyl-ACP synthase (KS) iterative enzymatic reactions, and wherein the PUFA PKS domains; (d) at least one acyltransferase (AT) domain; (e) at system comprises: (a) at least one enoyl ACP-reductase (ER) least one ketoreductase (KR) domain; (f) at least two FabA domain; (b) multiple acyl carrier protein (ACP) domains (at US 2008/0026440 A1 Jan. 31, 2008 30 least four); (c) at least two f-ketoacyl-ACP synthase (KS) such as branched chain fatty acids). Therefore, in another domains; (d) at least one acyltransferase (AT) domain; (e) at preferred embodiment of the invention, one could create a least one ketoreductase (KR) domain; (f) at least two FabA large number of mutations in one or more of the PUFAPKS like B-hydroxy acyl-ACP dehydrase (DH) domains; (g) at genes disclosed herein, and then transform the appropriately least one chain length factor (CLF) domain; and (h) at least modified FabB- strain (e.g. create mutations in an expres one malonyl-CoA:ACP acyltransferase (MAT) domain. sion construct containing an ER domain and transform a 0205. In one embodiment of the present invention, it is FabB- strain having the other essential domains on a sepa contemplated that a mutagenesis program could be com rate plasmid—or integrated into the chromosome) and select bined with a selective screening process to obtain bioactive only for those transformants that grow without Supplemen molecules of interest. This would include methods to search tation of the medium (i.e., that still possessed an ability to for a range of bioactive compounds. This search would not produce a molecule that could complement the FabB be restricted to production of those molecules with cis defect). Additional screens could be developed to look for double bonds. The mutagenesis methods could include, but particular compounds (e.g. use of GC for fatty acids) being are not limited to: chemical mutagenesis, gene shuffling, produced in this selective subset of an active PKS system. Switching regions of the genes encoding specific enzymatic One could envision a number of similar selective screens for domains, or mutagenesis restricted to specific regions of bioactive molecules of interest. those genes, as well as other methods. 0208. As described above, in one embodiment of the present invention, a genetically modified microorganism or 0206 For example, high throughput mutagenesis meth plant includes a microorganism or plant which has an ods could be used to influence or optimize production of the enhanced ability to synthesize desired bioactive molecules desired bioactive molecule. Once an effective model system (products) or which has a newly introduced ability to has been developed, one could modify these genes in a high synthesize specific products (e.g., to synthesize a specific throughput manner. Utilization of these technologies can be antibiotic). According to the present invention, “an enhanced envisioned on two levels. First, if a sufficiently selective ability to synthesize' a product refers to any enhancement, screen for production of a product of interest (e.g., ARA) can or up-regulation, in a pathway related to the synthesis of the be devised, it could be used to attempt to alter the system to product such that the microorganism or plant produces an produce this product (e.g., in lieu of or in concert with, other increased amount of the product (including any production strategies such as those discussed above). Additionally, if the of a product where there was none before) as compared to strategies outlined above resulted in a set of genes that did the wild-type microorganism or plant, cultured or grown, produce the product of interest, the high throughput tech under the same conditions. Methods to produce Such geneti nologies could then be used to optimize the system. For cally modified organisms have been described in detail example, if the introduced domain only functioned at rela above. tively low temperatures, selection methods could be devised to permit removing that limitation. In one embodiment of the 0209. One embodiment of the present invention is a invention, Screening methods are used to identify additional method to produce desired bioactive molecules (also non-bacterial organisms having novel PKS systems similar referred to as products or compounds) by growing or cul to the PUFA PKS system of Schizochytrium, as described turing a genetically modified microorganism or plant of the herein (see above). Homologous PKS systems identified in present invention (described in detail above). Such a method Such organisms can be used in methods similar to those includes the step of culturing in a fermentation medium or described herein for the Schizochytrium, as well as for an growing in a Suitable environment, such as soil, a microor additional source of genetic material from which to create, ganism or plant, respectively, that has a genetic modification further modify and/or mutate a PKS system for expression as described previously herein and in accordance with the in that microorganism, in another microorganism, or in a present invention. In a preferred embodiment, method to higher plant, to produce a variety of compounds. produce bioactive molecules of the present invention includes the step of culturing under conditions effective to 0207. It is recognized that many genetic alterations, produce the bioactive molecule a genetically modified either random or directed, which one may introduce into a organism that expresses a PKS system comprising at least native (endogenous, natural) PKS system, will result in an one biologically active domain of a polyunsaturated fatty inactivation of enzymatic functions. A preferred embodi acid (PUFA) polyketide synthase (PKS) system. In this ment of the invention includes a system to select for only preferred aspect, at least one domain of the PUFA PKS those modifications that do not block the ability of the PKS system is encoded by a nucleic acid sequence selected from system to produce a product. For example, the FabB- strain the group consisting of: (a) a nucleic acid sequence encoding of E. coli is incapable of synthesizing unsaturated fatty acids at least one domain of a polyunsaturated fatty acid (PUFA) and requires Supplementation of the medium with fatty acids polyketide synthase (PKS) system from a Thraustochytrid that can Substitute for its normal unsaturated fatty acids in microorganism; (b) a nucleic acid sequence encoding at least order to grow (see Metz et al., 2001, Supra). However, this one domain of a PUFA PKS system from a microorganism requirement (for Supplementation of the medium) can be identified by the novel screening method of the present removed when the strain is transformed with a functional invention (described above in detail); (c) a nucleic acid PUFA-PKS system (i.e. one that produces a PUFA product sequence encoding an amino acid sequence selected from in the E. coli host see (Metz et al., 2001, supra, FIG. 2A). the group consisting of: SEQID NO:2, SEQID NO:4, SEQ The transformed FabB- strain now requires a functional ID NO:6, and biologically active fragments thereof, (d) a PUFA-PKS system (to produce the unsaturated fatty acids) nucleic acid sequence encoding an amino acid sequence for growth without supplementation. The key element in this selected from the group consisting of: SEQ ID NO:8, SEQ example is that production of a wide range of unsaturated ID NO:10, SEQID NO:13, SEQID NO:18, SEQID NO:20, fatty acid will suffice (even unsaturated fatty acid substitutes SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID US 2008/0026440 A1 Jan. 31, 2008

NO:28, SEQ ID NO:30, SEQ ID NO:32, and biologically produce significant quantities of the desired product through active fragments thereof, (e) a nucleic acid sequence encod the activity of the PKS system that is genetically modified ing an amino acid sequence that is at least about 60% according to the present invention. The compounds can be identical to at least 500 consecutive amino acids of an amino recovered through purification processes which extract the acid sequence selected from the group consisting of: SEQID compounds from the plant. In a preferred embodiment, the NO:2, SEQID NO:4, and SEQID NO:6; wherein the amino compound is recovered by harvesting the plant. In this acid sequence has a biological activity of at least one domain embodiment, the plant can be consumed in its natural state of a PUFA PKS system; and, (f) a nucleic acid sequence or further processed into consumable products. encoding an amino acid sequence that is at least about 60% 0212. As described above, a genetically modified micro identical to an amino acid sequence selected from the group organism useful in the present invention can, in one aspect, consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID endogenously contain and express a PUFAPKS system, and NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, the genetic modification can be a genetic modification of one SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID or more of the functional domains of the endogenous PUFA NO:30, and SEQ ID NO:32; wherein the amino acid PKS system, whereby the modification has some effect on sequence has a biological activity of at least one domain of the activity of the PUFAPKS system. In another aspect, such a PUFA PKS system. In this preferred aspect of the method, an organism can endogenously contain and express a PUFA the organism is genetically modified to affect the activity of PKS system, and the genetic modification can be an intro the PKS system (described in detail above). Preferred host duction of at least one exogenous nucleic acid sequence cells for genetic modification related to the PUFA PKS (e.g., a recombinant nucleic acid molecule), wherein the system of the invention are described above. exogenous nucleic acid sequence encodes at least one bio 0210. In the method of production of desired bioactive logically active domain or protein from a second PKS compounds of the present invention, a genetically modified system and/or a protein that affects the activity of said PUFA microorganism is cultured or grown in a Suitable medium, PKS system (e.g., a phosphopantetheinyl transferases under conditions effective to produce the bioactive com (PPTase), discussed below). In yet another aspect, the organ pound. An appropriate, or effective, medium refers to any ism does not necessarily endogenously (naturally) contain a medium in which a genetically modified microorganism of PUFA PKS system, but is genetically modified to introduce the present invention, when cultured, is capable of produc at least one recombinant nucleic acid molecule encoding an ing the desired product. Such a medium is typically an amino acid sequence having the biological activity of at least aqueous medium comprising assimilable carbon, nitrogen one domain of a PUFA PKS system. In this aspect, PUFA and phosphate sources. Such a medium can also include PKS activity is affected by introducing or increasing PUFA appropriate salts, minerals, metals and other nutrients. PKS activity in the organism. Various embodiments associ Microorganisms of the present invention can be cultured in ated with each of these aspects have been discussed in detail conventional fermentation bioreactors. The microorganisms above. can be cultured by any fermentation process which includes, but is not limited to, batch, fed-batch, cell recycle, and 0213. In one embodiment of the method to produce continuous fermentation. Preferred growth conditions for bioactive compounds, the genetic modification changes at potential host microorganisms according to the present least one product produced by the endogenous PKS system, invention are well known in the art. The desired bioactive as compared to a wild-type organism. molecules produced by the genetically modified microor 0214. In another embodiment, the organism endog ganism can be recovered from the fermentation medium enously expresses a PKS system comprising the at least one using conventional separation and purification techniques. biologically active domain of the PUFAPKS system, and the For example, the fermentation medium can be filtered or genetic modification comprises transfection of the organism centrifuged to remove microorganisms, cell debris and other with a recombinant nucleic acid molecule selected from the particulate matter, and the product can be recovered from the group consisting of a recombinant nucleic acid molecule cell-free Supernatant by conventional methods, such as, for encoding at least one biologically active domain from a example, ion exchange, chromatography, extraction, Solvent second PKS system and a recombinant nucleic acid mol extraction, membrane separation, electrodialysis, reverse ecule encoding a protein that affects the activity of the PUFA osmosis, distillation, chemical derivatization and crystalli PKS system. In this embodiment, the genetic modification Zation. Alternatively, microorganisms producing the desired preferably changes at least one product produced by the compound, or extracts and various fractions thereof, can be endogenous PKS system, as compared to a wild-type organ used without removal of the microorganism components ism. A second PKS system can include another PUFA PKS from the product. system (bacterial or non-bacterial), a type I PKS system, a 0211. In the method for production of desired bioactive type II PKS system, and/or a modular PKS system. compounds of the present invention, a genetically modified Examples of proteins that affect the activity of a PKS system plant is cultured in a fermentation medium or grown in a have been described above (e.g., PPTase). Suitable medium Such as soil. An appropriate, or effective, fermentation medium has been discussed in detail above. A 0215. In another embodiment, the organism is genetically Suitable growth medium for higher plants includes any modified by transfection with a recombinant nucleic acid growth medium for plants, including, but not limited to, Soil, molecule encoding the at least one domain of the polyun sand, any other particulate media that Support root growth saturated fatty acid (PUFA) polyketide synthase (PKS) (e.g. Vermiculite, perlite, etc.) or Hydroponic culture, as well system. Such recombinant nucleic acid molecules have been as Suitable light, water and nutritional Supplements which described in detail previously herein. optimize the growth of the higher plant. The genetically 0216) In another embodiment, the organism endog modified plants of the present invention are engineered to enously expresses a non-bacterial PUFA PKS system, and US 2008/0026440 A1 Jan. 31, 2008 32 the genetic modification comprises Substitution of a domain for treatment of neurodegenerative disease, a drug for treat from a different PKS system for a nucleic acid sequence ment of degenerative liver disease, an antibiotic, and a encoding at least one domain of the non-bacterial PUFA cholesterol lowering formulation. One advantage of the PKS system. In another embodiment, the organism endog non-bacterial PUFA PKS system of the present invention is enously expresses a non-bacterial PUFA PKS system that the ability of such a system to introduce carbon-carbon has been modified by transfecting the organism with a double bonds in the cis configuration, and molecules includ recombinant nucleic acid molecule encoding a protein that ing a double bond at every third carbon. This ability can be regulates the chain length of fatty acids produced by the utilized to produce a variety of compounds. PUFA PKS system. In one aspect, the recombinant nucleic 0222 Preferably, bioactive compounds of interest are acid molecule encoding a protein that regulates the chain produced by the genetically modified microorganism in an length of fatty acids replaces a nucleic acid sequence encod amount that is greater than about 0.05%, and preferably ing a chain length factor in the non-bacterial PUFA PKS greater than about 0.1%, and more preferably greater than system. In another aspect, the protein that regulates the chain about 0.25%, and more preferably greater than about 0.5%, length of fatty acids produced by the PUFA PKS system is and more preferably greater than about 0.75%, and more a chain length factor. In another aspect, the protein that preferably greater than about 1%, and more preferably regulates the chain length of fatty acids produced by the greater than about 2.5%, and more preferably greater than PUFA PKS system is a chain length factor that directs the about 5%, and more preferably greater than about 10%, and synthesis of C20 units. more preferably greater than about 15%, and even more 0217. In another embodiment, the organism expresses a preferably greater than about 20% of the dry weight of the non-bacterial PUFAPKS system comprising a genetic modi microorganism. For lipid compounds, preferably, Such com fication in a domain selected from the group consisting of a pounds are produced in an amount that is greater than about domain encoding B-hydroxyacyl-ACP dehydrase (DH) and 5% of the dry weight of the microorganism. For other a domain encoding B-ketoacyl-ACP synthase (KS), wherein bioactive compounds, such as antibiotics or compounds that the modification alters the ratio of long chain fatty acids are synthesized in Smaller amounts, those strains possessing produced by the PUFA PKS system as compared to in the Such compounds at of the dry weight of the microorganism absence of the modification. In one aspect of this embodi are identified as predictably containing a novel PKS system ment, the modification is selected from the group consisting of the type described above. In some embodiments, particu of a deletion of all or a part of the domain, a substitution of lar bioactive molecules (compounds) are secreted by the a homologous domain from a different organism for the microorganism, rather than accumulating. Therefore, Such domain, and a mutation of the domain. bioactive molecules are generally recovered from the culture medium and the concentration of molecule produced will 0218. In another embodiment, the organism expresses a vary depending on the microorganism and the size of the non-bacterial PUFA PKS system comprising a modification culture. in an enoyl-ACP reductase (ER) domain, wherein the modi fication results in the production of a different compound as 0223) One embodiment of the present invention relates to compared to in the absence of the modification. In one a method to modify an endproduct containing at least one aspect of this embodiment, the modification is selected from fatty acid, comprising adding to said endproduct an oil the group consisting of a deletion of all or a part of the ER produced by a recombinant host cell that expresses at least one recombinant nucleic acid molecule comprising a nucleic domain, a substitution of an ER domain from a different acid sequence encoding at least one biologically active organism for the ER domain, and a mutation of the ER domain of a PUFA PKS system. The PUFA PKS system is domain. any non-bacterial PUFA PKS system, and preferably, is 0219. In one embodiment of the method to produce a selected from the group of: (a) a nucleic acid sequence bioactive molecule, the organism produces a polyunsatu encoding at least one domain of a polyunsaturated fatty acid rated fatty acid (PUFA) profile that differs from the naturally (PUFA) polyketide synthase (PKS) system from a Thraus occurring organism without a genetic modification. tochytrid microorganism; (b) a nucleic acid sequence encod ing at least one domain of a PUFA PKS system from a 0220 Many other genetic modifications useful for pro microorganism identified by the novel screening method ducing bioactive molecules will be apparent to those of skill disclosed herein; (c) a nucleic acid sequence encoding an in the art, given the present disclosure, and various other amino acid sequence selected from the group consisting of modifications have been discussed previously herein. The SEQID NO:2, SEQID NO:4, SEQID NO:6, and biologi present invention contemplates any genetic modification cally active fragments thereof. (d) a nucleic acid sequence related to a PUFA PKS system as described herein which encoding an amino acid sequence selected from the group results in the production of a desired bioactive molecule. consisting of: SEQ ID NO:8, SEQ ID NO:10, SEQ ID 0221 Bioactive molecules, according to the present NO:13, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, invention, include any molecules (compounds, products, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID etc.) that have a biological activity, and that can be produced NO:30, SEQ ID NO:32, and biologically active fragments by a PKS system that comprises at least one amino acid thereof: (e) a nucleic acid sequence encoding an amino acid sequence having a biological activity of at least one func sequence that is at least about 60% identical to at least 500 tional domain of a non-bacterial PUFA PKS system as consecutive amino acids of an amino acid sequence selected described herein. Such bioactive molecules can include, but from the group consisting of: SEQID NO:2, SEQID NO:4, are not limited to: a polyunsaturated fatty acid (PUFA), an and SEQ ID NO:6; wherein the amino acid sequence has a anti-inflammatory formulation, a chemotherapeutic agent, biological activity of at least one domain of a PUFA PKS an active excipient, an osteoporosis drug, an anti-depressant, system; and, (f) a nucleic acid sequence encoding an amino an anti-convulsant, an anti-Heliobactor pylori drug, a drug acid sequence that is at least about 60% identical to an amino US 2008/0026440 A1 Jan. 31, 2008 acid sequence selected from the group consisting of: SEQID NO:32, and biologically active fragments thereof; (e) a NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, nucleic acid sequence encoding an amino acid sequence that SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID is at least about 60% identical to at least 500 consecutive NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID amino acids of an amino acid sequence selected from the NO:32; wherein the amino acid sequence has a biological group consisting of: SEQID NO:2, SEQID NO:4, and SEQ activity of at least one domain of a PUFA PKS system. ID NO:6; wherein the amino acid sequence has a biological Variations of these nucleic acid sequences have been activity of at least one domain of a PUFA PKS system; described in detail above. and/or (f) a nucleic acid sequence encoding an amino acid 0224 Preferably, the endproduct is selected from the sequence that is at least about 60% identical to an amino acid group consisting of a food, a dietary Supplement, a pharma sequence selected from the group consisting of SEQ ID ceutical formulation, a humanized animal milk, and an NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:18, infant formula. Suitable pharmaceutical formulations SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID include, but are not limited to, an anti-inflammatory formu NO:26, SEQ ID NO:28, SEQ ID NO:30, and SEQ ID lation, a chemotherapeutic agent, an active excipient, an NO:32; wherein the amino acid sequence has a biological osteoporosis drug, an anti-depressant, an anti-convulsant, an activity of at least one domain of a PUFA PKS system. anti-Heliobactor pylori drug, a drug for treatment of neu rodegenerative disease, a drug for treatment of degenerative 0227 Methods to genetically modify a host cell and to liver disease, an antibiotic, and a cholesterol lowering for produce a genetically modified non-human, milk-producing mulation. In one embodiment, the endproduct is used to treat animal, are known in the art. Examples of host animals to a condition selected from the group consisting of chronic modify include cattle, sheep, pigs, goats, yaks, etc., which inflammation, acute inflammation, gastrointestinal disorder, are amenable to genetic manipulation and cloning for rapid cancer, cachexia, cardiac restenosis, neurodegenerative dis expansion of a transgene expressing population. For ani order, degenerative disorder of the liver, blood lipid disorder, mals, PKS-like transgenes can be adapted for expression in osteoporosis, osteoarthritis, autoimmune disease, preec target organelles, tissues and body fluids through modifica lampsia, preterm birth, age related maculopathy, pulmonary tion of the gene regulatory regions. Of particular interest is disorder, and peroxisomal disorder. the production of PUFAs in the breast milk of the host 0225 Suitable food products include, but are not limited animal. to, fine bakery wares, bread and rolls, breakfast cereals, processed and unprocessed cheese, condiments (ketchup, 0228. The following examples are provided for the pur mayonnaise, etc.), dairy products (milk, yogurt), puddings pose of illustration and are not intended to limit the scope of and gelatine desserts, carbonated drinks, teas, powdered the present invention. beverage mixes, processed fish products, fruit-based drinks, chewing gum, hard confectionery, frozen dairy products, EXAMPLES processed meat products, nut and nut-based spreads, pasta, processed poultry products, gravies and sauces, potato chips Example 1 and other chips or crisps, chocolate and other confectionery, Soups and Soup mixes, soya based products (milks, drinks, 0229. The following example describes the further analy creams, whiteners), vegetable oil-based spreads, and veg sis of PKS related sequences from Schizochytrium. etable-based drinks. 0230. The present inventors have sequenced the genomic 0226 Yet another embodiment of the present invention DNA including the entire length of all three open reading relates to a method to produce a humanized animal milk. frames (Orfs) in the Schizochytrium PUFA PKS system This method includes the steps of genetically modifying using the general methods outlined in Examples 8 and 9 milk-producing cells of a milk-producing animal with at from PCT Publication No. WO 0042195 and U.S. applica least one recombinant nucleic acid molecule comprising a tion Ser. No. 09/231,899. The biologically active domains in nucleic acid sequence encoding at least one biologically the Schizochytrium PKS proteins are depicted graphically in active domain of a PUFA PKS system. The PUFA PKS system is a non-bacterial PUFAPKS system, and preferably, FIG. 1. The domain structure of the Schizochytrium PUFA the at least one domain of the PUFA PKS system is encoded PKS system is described more particularly as follows. by a nucleic acid sequence selected from the group consist Open Reading Frame A (OrfA): ing of: (a) a nucleic acid sequence encoding at least one domain of a polyunsaturated fatty acid (PUFA) polyketide 0231. The complete nucleotide sequence for OrfA is synthase (PKS) system from a Thraustochytrid microorgan represented herein as SEQ ID NO:1. OrfA is a 8730 nucle ism; (b) a nucleic acid sequence encoding at least one otide sequence (not including the stop codon) which encodes domain of a PUFA PKS system from a microorganism a 2910 amino acid sequence, represented herein as SEQ ID identified by the novel screening method described previ NO:2. Within OrfA are twelve domains: ously herein; (c) a nucleic acid sequence encoding an amino acid sequence selected from the group consisting of: SEQID 0232 (a) one f-ketoacyl-ACP synthase (KS) domain; NO:2, SEQID NO:4, SEQID NO:6, and biologically active fragments thereof. (d) a nucleic acid sequence encoding an 0233 (b) one malonyl-CoA:ACP acyltransferase (MAT) amino acid sequence selected from the group consisting of domain; SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:13, SEQ ID 0234 (c) nine acyl carrier protein (ACP) domains; NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID 0235 (d) one ketoreductase (KR) domain. US 2008/0026440 A1 Jan. 31, 2008 34

0236. The domains contained within OrfA have been positions 575 and 600 of SEQID NO:2 (ORFA) to an ending determined based on: point of between about positions 935 and 1000 of SEQ ID 0237 (1) results of an analysis with Pfam program (Pfam NO:2. The amino acid sequence containing the ORFA-MAT is a database of multiple alignments of protein domains or domain is represented herein as SEQ ID NO:10 (positions conserved protein regions. The alignments represent some 575-1000 of SEQID NO:2). It is noted that the ORFA-MAT evolutionary conserved structure that has implications for domain contains an active site motif: GHS*XG (acyl the protein's function. Profile hidden Markov models (pro binding site S7), represented herein as SEQ ID NO:11. file HMMs) built from the Pfam alignments can be very ORFA-ACPH1-9 useful for automatically recognizing that a new protein 0242 Domains 3-11 of OrfA are nine tandem ACP belongs to an existing protein family, even if the homology domains, also referred to herein as ORFA-ACP (the first is weak. Unlike standard pairwise alignment methods (e.g. domain in the sequence is ORFA-ACP1, the second domain BLAST, FASTA), Pfam HMMs deal sensibly with multido is ORFA-ACP2, the third domain is ORFA-ACP3, etc.). The main proteins. The reference provided for the Pfam version first ACP domain, ORFA-ACP1, is contained within the used is: Bateman A, Birney E. Cerruti L. Durbin R, Etwiller nucleotide sequence spanning from about position 3343 to L., Eddy S R, Griffiths-Jones S. Howe K L, Marshall M, about position3600 of SEQID NO:1 (OrfA). The nucleotide Sonnhammer E L (2002) Nucleic Acids Research 30(1):276 sequence containing the sequence encoding the ORFA 280); and/or ACP1 domain is represented herein as SEQ ID NO:12 (positions 3343-3600 of SEQ ID NO:1). The amino acid 0238 (2) homology comparison to bacterial PUFA-PKS sequence containing the first ACP domain spans from about systems (e.g., Shewanella) using a BLAST 2.0 Basic position 1115 to about position 1200 of SEQID NO:2. The BLAST homology search using blastp for amino acid amino acid sequence containing the ORFA-ACP1 domain is searches with standard default parameters, wherein the represented herein as SEQ ID NO:13 (positions 1115-1200 query sequence is filtered for low complexity regions by of SEQ ID NO:2). It is noted that the ORFA-ACP1 domain default (described in Altschul, S. F., Madden, T. L., Schaaf contains an active site motif: LGIDS* (pantetheline binding fer, A. A., Zhang, J., Zhang, Z. Miller, W. & Lipman, D. J. motif Ss.), represented herein by SEQ ID NO:14. The (1997) “Gapped BLAST and PSI-BLAST: a new generation nucleotide and amino acid sequences of all nine ACP of protein database search programs. Nucleic Acids Res. domains are highly conserved and therefore, the sequence 25:3389-3402, incorporated herein by reference in its for each domain is not represented herein by an individual entirety). sequence identifier. However, based on this information, one of skill in the art can readily determine the sequence for each 0239 Sequences provided for individual domains are of the other eight ACP domains. The repeat interval for the believed to contain the full length of the sequence encoding nine domains is approximately about 110 to about 330 a functional domain, and may contain additional flanking nucleotides of SEQ ID NO:1. sequence within the Orf. 0243 All nine ACP domains together span a region of ORFA-KS OrfA of from about position 3283 to about position 6288 of SEQID NO:1, which corresponds to amino acid positions of 0240 The first domain in OrfA is a KS domain, also from about 1095 to about 2096 of SEQID NO:2. This region referred to herein as ORFA-KS. This domain is contained includes the linker segments between individual ACP within the nucleotide sequence spanning from a starting domains. Each of the nine ACP domains contains a panteth point of between about positions 1 and 40 of SEQID NO:1 eine binding motif LGIDS* (represented herein by SEQ ID (OrfA) to an ending point of between about positions 1428 NO:14), wherein * is the pantetheline binding site S. At each and 1500 of SEQ ID NO:1. The nucleotide sequence con end of the ACP domain region and between each ACP taining the sequence encoding the ORFA-KS domain is domain is a region that is highly enriched for proline (P) and represented herein as SEQ ID NO:7 (positions 1-1500 of alanine (A), which is believed to be a linker region. For SEQID NO:1). The amino acid sequence containing the KS example, between ACP domains 1 and 2 is the sequence: domain spans from a starting point of between about posi APAPVKAAAPAAPVASAPAPA, represented herein as tions 1 and 14 of SEQID NO:2 (ORFA) to an ending point SEQ ID NO:15. of between about positions 476 and 500 of SEQ ID NO:2. The amino acid sequence containing the ORFA-KS domain ORFA-KR is represented herein as SEQ ID NO:8 (positions 1-500 of 0244 Domain 12 in OrfA is a KR domain, also referred SEQ ID NO:2). It is noted that the ORFA-KS domain to herein as ORFA-KR. This domain is contained within the contains an active site motif: DXAC* (*acyl binding site nucleotide sequence spanning from a starting point of about C2 15 ) position 6598 of SEQ ID NO:1 to an ending point of about position 8730 of SEQ ID NO:1. The nucleotide sequence ORFA-MAT containing the sequence encoding the ORFA-KR domain is 0241 The second domain in OrfA is a MAT domain, also represented herein as SEQ ID NO:17 (positions 6598-8730 referred to herein as ORFA-MAT. This domain is contained of SEQ ID NO:1). The amino acid sequence containing the within the nucleotide sequence spanning from a starting KR domain spans from a starting point of about position point of between about positions 1723 and 1798 of SEQ ID 2200 of SEQ ID NO:2 (ORFA) to an ending point of about NO: 1 (OrfA) to an ending point of between about positions position 2910 of SEQ ID NO:2. The amino acid sequence 2805 and 3000 of SEQ ID NO:1. The nucleotide sequence containing the ORFA-KR domain is represented herein as containing the sequence encoding the ORFA-MAT domain SEQ ID NO:18 (positions 2200-2910 of SEQ ID NO:2). is represented herein as SEQID NO:9 (positions 1723-3000 Within the KR domain is a core region with homology to of SEQ ID NO:1). The amino acid sequence containing the short chain aldehyde-dehydrogenases (KR is a member of MAT domain spans from a starting point of between about this family). This core region spans from about position US 2008/0026440 A1 Jan. 31, 2008

7198 to about position 7500 of SEQ ID NO:1, which ORFB-AT corresponds to amino acid positions 2400-2500 of SEQ ID NO:2. 0253) The third domain in OrfB is an AT domain, also referred to herein as ORFB-AT. This domain is contained Open Reading Frame B (OrfB): within the nucleotide sequence spanning from a starting point of between about positions 2701 and 3598 of SEQ ID 0245. The complete nucleotide sequence for OrfB is NO:3 (OrfB) to an ending point of between about positions represented herein as SEQ ID NO:3. OrfB is a 6177 nucle 3975 and 4200 of SEQ ID NO:3. The nucleotide sequence otide sequence (not including the stop codon) which encodes containing the sequence encoding the ORFB-AT domain is a 2059 amino acid sequence, represented herein as SEQ ID represented herein as SEQ ID NO:23 (positions 2701-4200 NO:4. Within OrfB are four domains: of SEQ ID NO:3). The amino acid sequence containing the 0246 (a) B-ketoacyl-ACP synthase (KS) domain; AT domain spans from a starting point of between about positions 901 and 1200 of SEQ ID NO:4 (ORFB) to an 0247 (b) one chain length factor (CLF) domain; ending point of between about positions 1325 and 1400 of SEQ ID NO:4. The amino acid sequence containing the 0248 (c) one acyl transferase (AT) domain; ORFB-AT domain is represented herein as SEQ ID NO:24 0249 (d) one enoyl ACP-reductase (ER) domain. (positions 901-1400 of SEQ ID NO:4). It is noted that the ORFB-AT domain contains an AT active site motif of 0250) The domains contained within ORFB have been GxS*XG (*acyl binding site S1140). determined based on: (1) results of an analysis with Pfam program, described above; and/or (2) homology comparison ORFB-ER to bacterial PUFA-PKS systems (e.g., Shewanella) using a 0254 The fourth domain in OrfB is an ER domain, also BLAST 2.0 Basic BLAST homology search, also described referred to herein as ORFB-ER. This domain is contained above. Sequences provided for individual domains are within the nucleotide sequence spanning from a starting believed to contain the full length of the sequence encoding point of about position 4648 of SEQ ID NO:3 (OrfB) to an a functional domain, and may contain additional flanking ending point of about position 6177 of SEQ ID NO:3. The sequence within the Orf. nucleotide sequence containing the sequence encoding the ORFB-KS ORFB-ER domain is represented herein as SEQ ID NO:25 (positions 4648-6177 of SEQ ID NO:3). The amino acid 0251 The first domain in OrfB is a KS domain, also sequence containing the ER domain spans from a starting referred to herein as ORFB-KS. This domain is contained point of about position 1550 of SEQID NO:4 (ORFB) to an within the nucleotide sequence spanning from a starting ending point of about position 2059 of SEQ ID NO:4. The point of between about positions 1 and 43 of SEQID NO:3 amino acid sequence containing the ORFB-ER domain is (OrfB) to an ending point of between about positions 1332 represented herein as SEQ ID NO:26 (positions 1550-2059 and 1350 of SEQ ID NO:3. The nucleotide sequence con of SEQ ID NO:4). taining the sequence encoding the ORFB-KS domain is represented herein as SEQ ID NO:19 (positions 1-1350 of Open Reading Frame C (Orf(r): SEQID NO:3). The amino acid sequence containing the KS 0255 The complete nucleotide sequence for Orfc is domain spans from a starting point of between about posi represented herein as SEQ ID NO:5. Orf is a 4509 nucle tions 1 and 15 of SEQID NO:4 (ORFB) to an ending point otide sequence (not including the stop codon) which encodes of between about positions 444 and 450 of SEQ ID NO:4. a 1503 amino acid sequence, represented herein as SEQ ID The amino acid sequence containing the ORFB-KS domain NO:6. Within Orf are three domains: is represented herein as SEQID NO:20 (positions 1-450 of SEQ ID NO:4). It is noted that the ORFB-KS domain 0256 (a) two FabA-like B-hydroxy acyl-ACP dehydrase contains an active site motif: DXAC* (*acyl binding site (DH) domains; C196). 0257 (b) one enoyl ACP-reductase (ER) domain. ORFB-CLF 0258. The domains contained within ORFC have been 0252) The second domain in OrfB is a CLF domain, also determined based on: (1) results of an analysis with Pfam referred to herein as ORFB-CLF. This domain is contained program, described above; and/or (2) homology comparison within the nucleotide sequence spanning from a starting to bacterial PUFA-PKS systems (e.g., Shewanella) using a point of between about positions 1378 and 1402 of SEQID BLAST 2.0 Basic BLAST homology search, also described NO:3 (OrfB) to an ending point of between about positions above. Sequences provided for individual domains are 2682 and 2700 of SEQ ID NO:3. The nucleotide sequence believed to contain the full length of the sequence encoding containing the sequence encoding the ORFB-CLF domain is a functional domain, and may contain additional flanking represented herein as SEQ ID NO:21 (positions 1378-2700 sequence within the Orf. of SEQ ID NO:3). The amino acid sequence containing the ORFC-DH1 CLF domain spans from a starting point of between about positions 460 and 468 of SEQID NO:4 (ORFB) to an ending 0259. The first domain in Orf is a DH domain, also point of between about positions 894 and 900 of SEQ ID referred to herein as ORFC-DH1. This is one of two DH NO:4. The amino acid sequence containing the ORFB-CLF domains in Orfc, and therefore is designated DH1. This domain is represented herein as SEQ ID NO:22 (positions domain is contained within the nucleotide sequence Span 460-900 of SEQID NO:4). It is noted that the ORFB-CLF ning from a starting point of between about positions 1 and domain contains a KS active site motif without the acyl 778 of SEQ ID NO:5 (Orf) to an ending point of between binding cysteine. about positions 1233 and 1350 of SEQ ID NO:5. The US 2008/0026440 A1 Jan. 31, 2008 36 nucleotide sequence containing the sequence encoding the Both flasks are placed on a shaker table at 200 rpm. After ORFC-DH1 domain is represented herein as SEQID NO:27 48-72 hr of culture time, the cultures are harvested by (positions 1-1350 of SEQ ID NO:5). The amino acid centrifugation and the cells analyzed for fatty acid methyl sequence containing the DH1 domain spans from a starting esters via gas chromatography to determine the following point of between about positions 1 and 260 of SEQID NO:6 data for each culture: (1) fatty acid profile; (2) PUFA (ORFC) to an ending point of between about positions 411 content; (3) fat content (estimated as amount total fatty acids and 450 of SEQ ID NO:6. The amino acid sequence con (TFA)). taining the ORFC-DH1 domain is represented herein as SEQ 0266 These data are then analyzed asking the following ID NO:28 (positions 1-450 of SEQ ID NO:6). five questions: ORFC-DH2 Selection Criteria Low O/Anoxic Flask Vs. Aerobic Flask 0260 The second domain in Orf is a DH domain, also (Yes/No) referred to herein as ORFC-DH2. This is the Second of two DH domains in Orf , and therefore is designated DH2. This 0267 (1) Did the DHA (or other PUFA content) (as % domain is contained within the nucleotide sequence Span FAME) stay about the same or preferably increase in the low ning from a starting point of between about positions 1351 oxygen culture compared to the aerobic culture? and 2437 of SEQ ID NO:5 (Orf() to an ending point of 0268 (2) Is C14:0+C16:0+C16:1 greater than about 40% between about positions 2607 and 2850 of SEQ ID NO:5. TFA in the anoxic culture? The nucleotide sequence containing the sequence encoding the ORFC-DH2 domain is represented herein as SEQ ID 0269 (3) Is there very little (>1% as FAME) or no NO:29 (positions 1351-2850 of SEQ ID NO:5). The amino precursors (C18:3n-3+C18:2n-6+C 18:3n-6) to the conven acid sequence containing the DH2 domain spans from a tional oxygen dependent elongase? desaturase pathway in the starting point of between about positions 451 and 813 of anoxic culture? SEQ ID NO:6 (ORFC) to an ending point of between about positions 869 and 950 of SEQ ID NO:6. The amino acid 0270 (4) Did fat content (as amount total fatty acids/cell sequence containing the ORFC-DH2 domain is represented dry weight) increase in the low oxygen culture compared to herein as SEQ ID NO:30 (positions 451-950 of SEQ ID the aerobic culture? NO:6). 0271 (5) Did DHA (or other PUFA content) increase as % cell dry weight in the low oxygen culture compared to the ORFC-ER aerobic culture? 0261) The third domain in Orf is an ER domain, also referred to herein as ORFC-ER. This domain is contained 0272. If first three questions are answered yes, there is a within the nucleotide sequence spanning from a starting good indication that the Strain contains a PKS genetic point of about position 2998 of SEQ ID NO:5 (Orf) to an system for making long chain PUFAs. The more questions ending point of about position 4509 of SEQ ID NO:5. The that are answered yes (preferably the first three questions nucleotide sequence containing the sequence encoding the must be answered yes), the stronger the indication that the ORFC-ER domain is represented herein as SEQ ID NO:31 strain contains such a PKS genetic system. If all five (positions 2998-4509 of SEQ ID NO:5). The amino acid questions are answered yes, then there is a very strong sequence containing the ER domain spans from a starting indication that the strain contains a PKS genetic system for point of about position 1000 of SEQID NO:6 (ORFC) to an making long chain PUFAs. ending point of about position 1502 of SEQ ID NO:6. The 0273 Following the method outlined above, a frozen vial amino acid sequence containing the ORFC-ER domain is of Thraustochytrium sp. 23B (ATCC 20892) was used to represented herein as SEQ ID NO:32 (positions 1000-1502 inoculate a 250 mL shake flask containing 50 mL of RCA of SEQ ID NO:6). medium. The culture was shaken on a shaker table (200 rpm) for 72 hr at 25° C. RCA medium contains the following: Example 2 0262 The following example describes the use of the screening process of the present invention to identify three RCA Medium other non-bacterial organisms comprising a PUFA PKS Deionized water 1000 mL. system according to the present invention. Reef Crystals (R) sea salts 40 g/L Glucose 20 g/L 0263. Thraustochytrium sp. 23B (ATCC 20892) was cul Monosodium glutamate (MSG) 20 g/L tured according to the screening method described in U.S. Yeast extract 1 g/L Provisional Application Ser. No. 60/298.796 and as PII metals 5 mLL described in detail herein. Vitamin mix 1 mLL pH 7.0 0264. The biorational screen (using shake flask cultures) developed for detecting microorganisms containing PUFA *PII metal mix and vitamin mix are same as those outlined in U.S. Pat. producing PKS systems is as follows: No. 5,130,742, incorporated herein by reference in its entirety. 0265 Two mL of a culture of the strain/microorganism to 0274) 25 mL of the 72 hr old culture was then used to be tested is placed in 250 mL baffled shake flask with 50 mL inoculate another 250 mL shake flask containing 50 mL of culture media (aerobic treatment) and another 2 mL of low nitrogen RCA medium (10 g/L MSG instead of 20 g/L) culture of the same strain is placed in a 250 mL non-baffled and the other 25 mL of culture was used to inoculate a 250 shake flask with 200 mL culture medium (anoxic treatment). mL shake flask containing 175 mL of low-nitrogen RCA US 2008/0026440 A1 Jan. 31, 2008 37 medium. The two flasks were then placed on a shaker table 0281. This Thraustochytrid microorganism is encom (200 rpm) for 72 hr at 25°C. The cells were then harvested passed herein as an additional sources of these genes for use via centrifugation and dried by lyophilization. The dried in the embodiments above. cells were analyzed for fat content and fatty acid profile and 0282) Thraustochytrium 23B (ATCC 20892) is signifi content using standard gas chromatograph procedures (such cantly different from Schizochytrium sp. (ATCC 20888) in as those outlined in U.S. Pat. No. 5,130,742). its fatty acid profile. Thraustochytrium 23B can have 0275. The screening results for Thraustochytrium 23B DHA: DPA(n-6) ratios as high as 14:1 compared to only were as follows: 2-3:1 in Schizochytrium (ATCC 20888). Thraustochytrium 23B can also have higher levels of C20:5(n-3). Analysis of the domains in the PUFA PKS system of Thraustochytrium 23B in comparison to the known Schizochytrium PUFAPKS Did DHA as % FAME increase? Yes (38->44%) C14:0 + C16:0 + C16:1 greater than about 40% Yes (44%) system should provide us with key information on how to TFA modify these domains to influence the ratio and types of No C18:3(n-3) or C18:3(n-6)? Yes (0%) PUFA produced using these systems. Did fat content increase? Yes (2-fold increase) Did DHA (or other HUFA content increase)? Yes (2.3-fold increase) 0283 The screening method described above has been utilized the identify other potential candidate strains con taining a PUFA PKS system. Two additional strains that 0276. The results, especially the significant increase in have been identified by the present inventors to have PUFA DHA content (as % FAME) under low oxygen conditions, PKS systems are Schizochytrium limacium (SR21) Honda & conditions, strongly indicates the presence of a PUFA pro Yokochi (IFO32693) and Ulkenia (BP-5601). Both were ducing PKS system in this strain of Thraustochytrium. screened as above but in N2 media (glucose: 60 g/L.; 0277. In order to provide additional data confirming the KHPO: 4.0 g/l; yeast extract: 1.0 g/L; corn steep liquor: 1 presence of a PUFA PKS system, southern blot of Thraus mL/L. NHNO: 1.0 g/L. artificial sea salts (Reef Crystals): tochytrium 23B was conducted using PKS probes from 20 g/L, all above concentrations mixed in deionized water). Schizochytrium strain 20888, a strain which has already For both the Schizochytrium and Ulkenia strains, the been determined to contain a PUFA producing PKS system answers to the first three screen questions discussed above (i.e., SEQ ID Nos: 1-32 described above). Fragments of for Thraustochytrium 23B was yes (Schizochytrium—DHA Thraustochytrium 23B genomic DNA which are homolo % FAME 32->41% aerobic vs anoxic, 58%. 14:0/16:0/16:1, gous to hybridization probes from PKS PUFA synthesis 0% precursors) and (Ulkenia–DHA % FAME 28->44% genes were detected using the Southern blot technique. aerobic vs anoxic, 63%. 14:0/16:0/16:1, 0% precursors), Thraustochytrium 23B genomic DNA was digested with indicating that these strains are good candidates for contain either Clal or KpnI restriction endonucleases, separated by ing a PUFA PKS system. Negative answers were obtained agarose gel electrophoresis (0.7% agarose, in standard Tris for the final two questions for each strain: fat decreased from Acetate-EDTA buffer), and blotted to a Schleicher &Schuell 61% dry wt to 22% dry weight, and DHA from 21-9% dry Nytran Supercharge membrane by capillary transfer. Two weight in S. limacium and fat decreased from 59 to 21% dry digoxigenin labeled hybridization probes were used—one weight in Ulkenia and DHA from 16% to 9% dry weight. specific for the Enoyl Reductase (ER) region of These Thraustochytrid microorganisms are also claimed Schizochytrium PKS OrfB (nucleotides 5012-5511 of OrfB: herein as additional sources of the genes for use in the SEQID NO:3), and the other specific for a conserved region embodiments above. at the beginning of Schizochytrium PKS Orf C (nucleotides 76-549 of Orf'; SEQ ID NO:5). Example 3 0278. The OrfB-ER probe detected an approximately 13 0284. The following example demonstrates that DHA kb Clal fragment and an approximately 3.6 kb KpnI frag and DPA synthesis in Schizochytrium does not involve ment in the Thraustochytrium 23B genomic DNA. The Orf membrane-bound desaturases or fatty acid elongation probe detected an approximately 7.5 kb ClaI fragment and enzymes like those described for other eukaryotes (Parker an approximately 4.6 kb KpnI fragment in the Thraus Barnes et al., 2000, supra; Shanklin et al., 1998, supra). tochytrium 23B genomic DNA. 0285 Schizochytrium accumulates large quantities of 0279 Finally, a recombinant genomic library, consisting triacylglycerols rich in DHA and docosapentaenoic acid of DNA fragments from Thraustochytrium 23B genomic (DPA; 22:506); e.g., 30% DHA+DPA by dry weight. In DNA inserted into vector lambda FIX II (Stratagene), was eukaryotes that synthesize 20- and 22-carbon PUFAs by an screened using digoxigenin labeled probes corresponding to elongation/desaturation pathway, the pools of 18-, 20- and the following segments of Schizochytrium 20888 PUFA 22-carbon intermediates are relatively large so that in vivo PKS genes: nucleotides 7385-7879 of OrfA (SEQID NO:1), labeling experiments using ''C-acetate reveal clear pre nucleotides 5012-5511 of Orf B (SEQ ID NO:3), and cursor-product kinetics for the predicted intermediates. Fur nucleotides 76-549 of Orf C (SEQ ID NO:5). Each of these thermore, radiolabeled intermediates provided exogenously probes detected positive plaques from the Thraustochytrium to such organisms are converted to the final PUFA products. 23B library, indicating extensive homology between the 0286) 1-'Clacetate was supplied to a 2-day-old culture Schizochytrium PUFA-PKS genes and the genes of Thraus as a single pulse at Zero time. Samples of cells were then tochytrium 23B. harvested by centrifugation and the lipids were extracted. In 0280. In summary, these results demonstrate that Thraus addition, 1-'Clacetate uptake by the cells was estimated by tochytrium 23B genomic DNA contains sequences that are measuring the radioactivity of the sample before and after homologous to PKS genes from Schizochytrium 20888. centrifugation. Fatty acid methyl esters derived from the US 2008/0026440 A1 Jan. 31, 2008 38 total cell lipids were separated by AgNO-TLC (solvent, CoA, 100 uM 1- Cmalonyl-CoA (0.9 Gbc/mol), 2 mM hexane:diethyl ether:acetic acid, 70:30:2 by volume). The NADH, and 2 mM NADPH for 60 min at 25° C. Assays identity of the fatty acid bands was verified by gas chroma were extracted and fatty acid methyl esters were prepared tography, and the radioactivity in them was measured by and separated as described above before detection of radio scintillation counting. Results showed that 1-C-acetate activity with an Instantimager (Packard Instruments, Meri was rapidly taken up by Schizochytrium cells and incorpo den, Conn.). Results showed that a cell-free homogenate rated into fatty acids, but at the shortest labeling time (1 min) derived from Schizochytrium cultures incorporated 1-''C- DHA contained 31% of the label recovered in fatty acids and malonyl-CoA into DHA, DPA, and saturated fatty acids this percentage remained essentially unchanged during the (data not shown). The same biosynthetic activities were 10-15 min of ''C-acetate incorporation and the subsequent retained by a 100,000xg supernatant fraction but were not 24 hours of culture growth (data not shown). Similarly, DPA present in the membrane pellet. These data contrast with represented 10% of the label throughout the experiment. those obtained during assays of the bacterial enzymes (see There is no evidence for a precursor-product relationship Metz et al., 2001, Supra) and may indicate use of a different between 16- or 18-carbon fatty acids and the 22-carbon (soluble) acyl acceptor molecule. Thus, DHA and DPA polyunsaturated fatty acids. These results are consistent with synthesis in Schizochytrium does not involve membrane rapid synthesis of DHA from '''C-acetate involving very bound desaturases or fatty acid elongation enzymes like Small (possibly enzyme-bound) pools of intermediates. those described for other eukaryotes. 0287 Next, cells were disrupted in 100 mM phosphate 0288 While various embodiments of the present inven buffer (pH 7.2), containing 2 mM DTT, 2 mM EDTA, and tion have been described in detail, it is apparent that modi 10% glycerol, by vortexing with glass beads. The cell-free fications and adaptations of those embodiments will occur to homogenate was centrifuged at 100,000 g for 1 hour. those skilled in the art. It is to be expressly understood, Equivalent aliquots of total homogenate, pellet (H-S pel however, that such modifications and adaptations are within let), and Supernatant (H-S Super) fractions were incubated the scope of the present invention, as set forth in the in homogenization buffer supplemented with 20 uMacetyl following claims.

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS : 37 <21 Oc SEQ ID NO 1 <211 LENGTH 8730 <212> TYPE DNA <213> ORGANISM: Schizochytrium sp. <22O > FEATURE <221 NAME/KEY: CDS LOCATION: (1) . . (87.30) <400 SEQUENCE: 1

at g g c g gcc cgt citg Cag gag Cala aag gga ggC gag atg gat acc cqc 48 Met Ala Ala Arg Lieu Glin Glu Glin Lys Gly Gly Glu Met Asp Thr Arg 1 5 10 15

att gcc atc atc ggC atg tog gcc atc citc. coc tgc ggC acg acc gtg 96 Ile Ala Ile Ile Gly Met Ser Ala Ile Teu Pro Cys Gly Thir Thr Wall 2O 25 30

cgc gag tog tgg gag acc atc. cgc gcc ggC atc gac tgc citg tog gat 144 Arg Glu Ser Trp Glu Thr Ile Arg Ala Gly Ile Asp Cys Leu Ser Asp 35 40 45

citc. cco gag gac cgc gtc gac gtg acg gCg tac titt gac ccc gtc. aag 192 Teu Pro Glu Asp Arg Wall Asp Wall Thr Ala Tyr Phe Asp Pro Val Lys 5 O 55 60

acc acc aag gac aag atc tac tgc aag cgc ggit ggC titc. att coc gag 240 Thr Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75 8O

tac gac titt gac gcc cgc gag titc. gga citc. aac atg titc. Cag atg gag 288 Asp Phe Asp Ala Arg Glu Phe Gly Teu Asn Met Phe Gln Met Glu 85 90 95

gac tog gac gCa aac Cag acc atc tog citt citc. aag gtc aag gag gCC 336 Asp Ser Asp Ala Asn Glin Thr Ile Ser Teu Telu Lys Wall Lys Glu Ala 100 105 110

citc cag gac goc ggc atc gac goc citc ggc aag gala aag aag aac atc 384

US 2008/0026440 A1 Jan. 31, 2008 48

-continued Lys Lieu Val Pro Ala Tyr Arg Ala Val Ile Val Lieu Ser Asn. Glin 2735 2740 2745 ggc gog ccc ccg goc aac goc acc at g cag ccg ccc tog citc gat 8289 Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro Pro Ser Leu Asp 2750 2755 276 O. gcc gat cog gog ctic cag ggc toc gttc tac gac ggc aag acc citc 8334 Ala Asp Pro Ala Leu Glin Gly Ser Val Tyr Asp Gly Lys Thr Lieu 2765 2770 2775 titc. cac ggc ccg gC c ttic cqc ggc atc gat gac gtg citc. tcg togc 8379 Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Lieu Ser Cys 2780 2785 279 O acc aag agc cag citt gtg goc aag toc agc gct g to coc ggc ticc 84.24 Thr Lys Ser Glin Lieu Val Ala Lys Cys Ser Ala Val Pro Gly Ser 2795 2800 2805 gac goc got cqc ggc gag titt goc acg gac act gac goc cat gac 8 469 Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 2810 2815 282O ccc titc gtgaac gac citg goc titt cag goc atg citc gtc. togg gtg 85.14 Pro Phe Val Asn Asp Leu Ala Phe Glin Ala Met Leu Val Trp Val 2825 2830 2835 cgc cqc acg citc ggc cag got gog ctic coc aac tog atc cag cqc 8.559 Arg Arg Thr Lieu Gly Glin Ala Ala Lieu Pro Asn. Ser Ile Glin Arg 284 O 284.5 285 O atc gtc. cag cac cqc cog gtc. cc g cag gac aag ccc titc tac att 86O4 Ile Val Gln His Arg Pro Val Pro Glin Asp Lys Pro Phe Tyr Ile 2855 2.860 2865 acc ctic cqc toc aac cag to g g g c ggit cac tocc cag cac aag cac 8649 Thr Lieu Arg Ser Asn Glin Ser Gly Gly His Ser Glin His Lys His 2870 2875 2880 gcc citt cag titc. cac aac gag cag ggc gat citc titc att gat gtc 86.94 Ala Leu Glin Phe His Asn. Glu Glin Gly Asp Leu Phe Ile Asp Wall 2.885 2890 2.895 cag got to g g to atc goc acg gac agc citt gcc titc 873 O Glin Ala Ser Val Ile Ala Thr Asp Ser Leu Ala Phe 29 OO 2905 2.910

<210> SEQ ID NO 2 &2 11s LENGTH 2.910 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 2 Met Ala Ala Arg Lieu Glin Glu Gln Lys Gly Gly Glu Met Asp Thr Arg 1 5 10 15 Ile Ala Ile Ile Gly Met Ser Ala Ile Leu Pro Cys Gly. Thir Thr Val 2O 25 30 Arg Glu Ser Trp Glu Thir Ile Arg Ala Gly Ile Asp Cys Lieu Ser Asp 35 40 45 Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Wall Lys 50 55 60 Thir Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75 8O Tyr Asp Phe Asp Ala Arg Glu Phe Gly Lieu. Asn Met Phe Glin Met Glu 85 90 95 Asp Ser Asp Ala Asn Glin Thr Ile Ser Lieu Lleu Lys Wall Lys Glu Ala 100 105 110 US 2008/0026440 A1 Jan. 31, 2008 49

-continued Leu Glin Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn. Ile 115 120 125 Gly Cys Val Lieu Gly Ile Gly Gly Gly Glin Lys Ser Ser His Glu Phe 130 135 1 4 0 Tyr Ser Arg Lieu. Asn Tyr Val Val Val Glu Lys Val Lieu Arg Lys Met 145 15 O 155 160 Gly Met Pro Glu Glu Asp Wall Lys Val Ala Val Glu Lys Tyr Lys Ala 1.65 170 175 Asn Phe Pro Glu Trp Arg Lieu. Asp Ser Phe Pro Gly Phe Leu Gly Asn 18O 185 19 O Val Thr Ala Gly Arg Cys Thr Asn Thr Phe Asn Leu Asp Gly Met Asn 195 200 2O5 Cys Val Val Asp Ala Ala Cys Ala Ser Ser Lieu. Ile Ala Wall Lys Wal 210 215 220 Ala Ile Asp Glu Lieu Lleu Tyr Gly Asp Cys Asp Met Met Val Thr Gly 225 230 235 240 Ala Thr Cys Thr Asp Asn Ser Ile Gly Met Tyr Met Ala Phe Ser Lys 245 250 255 Thr Pro Val Phe Ser Thr Asp Pro Ser Val Arg Ala Tyr Asp Glu Lys 260 265 27 O Thr Lys Gly Met Lieu. Ile Gly Glu Gly Ser Ala Met Lieu Val Lieu Lys 275 280 285 Arg Tyr Ala Asp Ala Val Arg Asp Gly Asp Glu Ile His Ala Val Ile 29 O 295 3OO Arg Gly Cys Ala Ser Ser Ser Asp Gly Lys Ala Ala Gly Ile Tyr Thr 305 310 315 320 Pro Thir Ile Ser Gly Glin Glu Glu Ala Lieu Arg Arg Ala Tyr Asn Arg 325 330 335 Ala Cys Val Asp Pro Ala Thr Val Thr Leu Val Glu Gly His Gly Thr 340 345 35 O Gly Thr Pro Val Gly Asp Arg Ile Glu Lieu. Thr Ala Lieu Arg Asn Lieu 355 360 365 Phe Asp Lys Ala Tyr Gly Glu Gly Asn Thr Glu Lys Val Ala Val Gly 370 375 38O Ser Ile Lys Ser Ser Ile Gly His Lieu Lys Ala Val Ala Gly Lieu Ala 385 390 395 400 Gly Met Ile Llys Val Ile Met Ala Lieu Lys His Lys Thr Lieu Pro Gly 405 410 415 Thir Ile Asin Val Asp Asn Pro Pro Asn Leu Tyr Asp Asn Thr Pro Ile 420 425 43 O Asn Glu Ser Ser Leu Tyr Ile Asn Thr Met Asn Arg Pro Trp Phe Pro 435 4 40 4 45 Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly 450 455 460 Gly Ala Asn Tyr His Ala Wall Leu Glu Glu Ala Glu Pro Glu His Thr 465 470 475 480 Thr Ala Tyr Arg Leu Asn Lys Arg Pro Glin Pro Val Leu Met Met Ala 485 490 495 Ala Thr Pro Ala Ala Lieu Glin Ser Lieu. Cys Glu Ala Glin Leu Lys Glu 5 OO 505 51O. Phe Glu Ala Ala Ile Lys Glu Asn. Glu Thr Val Lys Asn. Thir Ala Tyr US 2008/0026440 A1 Jan. 31, 2008 50

-continued

515 525

Ile Lys Wall Lys Phe Gly Glu Glin Phe Phe Pro Gly Ser Ile 530 535 540

Pro Ala Thr Asn Ala Arg Teu Gly Phe Telu Wall Asp Ala Glu Asp 545 550 555 560

Ala Ser Thr Teu Arg Ala Ile Cys Ala Glin Phe Ala Asp Wall 565 570 575

Thr Glu Ala Trp Teu Pro Arg Glu Gly Wall Ser Phe Arg Ala 585 59 O

Gly Ile Ala Thr Asn Gly Ala Wall Ala Ala Teu Phe Ser Gly Glin 595 600 605

Gly Ala Glin Tyr Thr His Met Phe Ser Glu Wall Ala Met Asn Trp Pro 610 615

Glin Phe Arg Glin Ser Ile Ala Ala Met Asp Ala Ala Glin Ser Wall 625 630 635 640

Ala Gly Ser Asp Lys Asp Phe Glu Arg Wall Ser Glin Wall Telu Tyr Pro 645 650 655

Arg Pro Tyr Glu Glu Pro Glu Glin Asn Pro Lys Ile Ser 660 665 67 O

Teu Thr Ala Tyr Ser Glin Pro Ser Thr Telu Ala Ala Telu Gly Ala 675 680 685

Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala Ala Gly 69 O. 695 7 OO

His Ser Telu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Cys Wall Asp 705 710 715 720

Arg Asp Glu Telu Phe Glu Teu Val Cys Arg Arg Ala Arg Ile Met Gly 725 730 735

Gly Lys Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala Wall Ile 740 745 750

Gly Pro Asn Ala Glu Asn Ile Lys Wall Glin Ala Ala Asn Wall Trp Telu 755 760 765

Gly Asn Ser Asn Ser Pro Ser Glin Thr Wall Ile Thr Gly Ser Wall Glu 770 775

Gly Ile Glin Ala Glu Ser Ala Arg Lieu Glin Lys Glu Gly Phe Arg Wall 785 790 795

Wall Pro Telu Ala Cys Glu Ser Ala Phe His Ser Pro Glin Met Glu Asn 805 810 815

Ala Ser Ser Ala Phe Lys Wall Ile Ser Wall Ser Phe Arg Thr 820 825

Pro Ala Glu Thr Lys Teu Phe Ser Asn Wall Ser Gly Glu Thr 835 840 845

Pro Thr Asp Ala Glu Met Leu. Thr Glin His Met Thr Ser Ser Wall 85 O 855 860

Lys Phe Telu Thr Glin Wall Arg Asn Met His Glin Ala Gly Ala Arg Ile 865 870 875 88O

Phe Wall Glu Phe Gly Pro Lys Glin Wall Telu Ser Teu Wall Ser Glu 885 890 895

Thr Telu Asp Asp Pro Ser Wal Wall Thr Wall Ser Wall Asn Pro Ala 9 OO 905 910

Ser Gly Thr Asp Ser Asp Ile Gln Leu Asp Ala Ala Wall Glin Telu 915 920 925 US 2008/0026440 A1 Jan. 31, 2008 51

-continued

Val Val Ala Gly Val Asn Lieu Glin Gly Phe Asp Lys Trp Asp Ala Pro 930 935 940 Asp Ala Thr Arg Met Glin Ala Ile Lys Lys Lys Arg Thr Thr Lieu Arg 945 950 955 96.O Leu Ser Ala Ala Thr Tyr Val Ser Asp Llys Thr Lys Lys Val Arg Asp 965 970 975 Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Lieu Lys Gly Ala Ala 98O 985 99 O Pro Lieu. Ile Lys Ala Pro Glu Pro Val Val Asp Glu Ala Ala Lys Arg 995 10 OO 1005 Glu Ala Glu Arg Lieu Gln Lys Glu Lieu Glin Asp Ala Glin Arg Glin O 10 O15 O20 Leu Asp Asp Ala Lys Arg Ala Ala Ala Glu Ala Asn. Ser Lys Lieu O25 O3O O35 Ala Ala Ala Lys Glu Glu Ala Lys Thr Ala Ala Ala Ser Ala Lys O40 O45 O5 O Pro Ala Val Asp Thr Ala Val Val Glu Lys His Arg Ala Ile Lieu O55 O60 O65 Lys Ser Met Leu Ala Glu Lieu. Asp Gly Tyr Gly Ser Val Asp Ala OFO O75 O8O

Ser Ser Leu Glin Glin Glin Glin Glin Glin Glin Thr Ala Pro Ala Pro

Wall Lys Ala Ala Ala Pro Ala Ala Pro Wall Ala Ser Ala Pro Ala

Pro Ala Val Ser Asn. Glu Lieu Lieu Glu Lys Ala Glu Thr Val Val

Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile

Glu Ala Asp Met Glu Lieu Glu Thr Glu Lieu Gly e Asp Ser Ile

Lys Arg Val Glu Ile Leu Ser Glu Val Glin Ala Met Lieu. Asn. Wall

Glu Ala Lys Asp Wall Asp Ala Leu Ser Arg Thr Arg Thr Val Gly

Glu Val Val Asn Ala Met Lys Ala Glu Ile Ala Gly Ser Ser Ala

Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Lys Ala Ala Pro

Ala Ala Ala Ala Pro Ala Val Ser Asn. Glu Lieu Lleu Glu Lys Ala

Glu Thr Val Val Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu

Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr Glu Leu Gly

Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser Glu Val Glin Ala

Met Lieu. Asn Val Glu Ala Lys Asp Wall Asp Ala Leu Ser Arg Thr

Arg Thr Val Gly Glu Val Val Asn Ala Met Lys Ala Glu Ile Ala US 2008/0026440 A1 Jan. 31, 2008 52

-continued Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala Pro Gly Pro Ala 310 315 320

Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Wal Ser Asn

Glu Lieu Lleu Glu Lys Ala Glu Thr Val Val Met Glu Wall Leu Ala

Ala Lys Thr Gly Tyr Glu Thr Asp Met Ile Glu Ser Asp Met Glu

Leu Glu Thr Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val Glu Ile

Leu Ser Glu Val Glin Ala Met Lieu. Asn Val Glu Ala Lys Asp Wall

Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Ala

Met Lys Ala Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala

Ala Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Ala Pro

Ala Pro Ala Val Ser Ser Glu Lieu Lieu Glu Lys Ala Glu Thr Val

Val Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp Met

Ile Glu Ser Asp Met Glu Lieu Glu Thr Glu Lieu Gly Ile Asp Ser 475 480 485 Ile Lys Arg Val Glu Ile Leu Ser Glu Val Glin Ala Met Lieu. Asn 490 495 5 OO Val Glu Ala Lys Asp Val Asp Ala Leu Ser Arg Thr Arg Thr Val

Gly Glu Val Val Asp Ala Met Lys Ala Glu Ile Ala Gly Gly Ser

Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala

Pro Ala Pro Ala Ala Pro Ala Pro Ala Ala Pro Ala Pro Ala Wall

Ser Ser Glu Leu Leu Glu Lys Ala Glu Thr Val Val Met Glu Val

Leu Ala Ala Lys Thr Gly Tyr Glu Thir Asp Met Ile Glu Ser Asp

Met Glu Lieu Glu Thr Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val

Glu Ile Leu Ser Glu Val Glin Ala Met Lieu. Asn Val Glu Ala Lys

Asp Wall Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val

Asp Ala Met Lys Ala Glu Ile Ala Gly Ser Ser Ala Ser Ala Pro

Ala Ala Ala Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala

Ala Ala Ala Pro Ala Val Ser Asn. Glu Lieu Lleu Glu Lys Ala Glu 670 675 680 Thr Val Val Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr US 2008/0026440 A1 Jan. 31, 2008 53

-continued

685 69 O. 695 Asp Met Ile Glu Ser Asp Met Glu Lieu Glu Thr Glu Lieu Gly Ile

Asp Ser Ile Lys Arg Val Glu Ile Leu Ser Glu Val Glin Ala Met

Lieu. Asn Val Glu Ala Lys Asp Wall Asp Ala Lieu Ser Arg Thr Arg

Thr Val Gly Glu Val Val Asp Ala Met Lys Ala Glu Ile Ala Gly

Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Ala

Ala Ala Pro Ala Wal Ser Asn. Glu Lieu Lieu Glu Lys Ala Glu Thr

Val Val Met Glu Val Leu Ala Ala Lys Thr Gly Tyr Glu Thr Asp

Met Ile Glu Ser Asp Met Glu Lieu Glu Thr Glu Lieu Gly Ile Asp

Ser Ile Lys Arg Val Glu Ile Leu Ser Glu Val Glin Ala Met Leu

Asn Val Glu Ala Lys Asp Wall Asp Ala Lieu Ser Arg Thr Arg Thr

Val Gly Glu Val Val Asp Ala Met Lys Ala Glu Ile Ala Gly Ser

Ser Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Pro Ala Ala Ala

Ala Pro Ala Pro Ala Ala Ala Ala Pro Ala Wal Ser Ser Glu Lieu

Leu Glu Lys Ala Glu Thr Val Val Met Glu Val Lieu Ala Ala Lys 895 9 OO 905 Thr Gly Tyr Glu Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu 910 915 920 Thr Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser

Glu Val Glin Ala Met Lieu. Asn Val Glu Ala Lys Asp Wall Asp Ala

Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Ala Met Lys

Ala Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala

Pro Ala Pro Ala Ala Ala Ala Pro Ala Wal Ser Asn. Glu Lieu Lieu 985 99 O 995 Glu Lys Ala Glu Thr Val Val Met Glu Val Leu Ala Ala Lys Thr 2OOO 2005 2010 Gly Tyr Glu Thr Asp Met Ile Glu Ser Asp Met Glu Leu Glu Thr 2015 2020 2025 Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val Glu Ile Leu Ser Glu 2030 2O35 20 40 Val Glin Ala Met Lieu. Asn Val Glu Ala Lys Asp Val Asp Ala Lieu 2O45 2O5 O 2O55 Ser Arg Thr Arg Thr Val Gly Glu Val Val Asp Ala Met Lys Ala 2060 2O65 2070 US 2008/0026440 A1 Jan. 31, 2008 54

-continued

Glu Ile Ala Gly Gly Ser Ala Pro Ala Pro Ala Ala Ala Ala Pro

Ala Ser Ala Gly Ala Ala Pro Ala Wall Lys Ile Asp Ser Val His 2O 90 2095 2100 Gly Ala Asp Cys Asp Asp Leu Ser Lieu Met His Ala Lys Val Val

Asp Ile Arg Arg Pro Asp Glu Lieu. Ile Leu Glu Arg Pro Glu Asn

Arg Pro Val Lieu Val Val Asp Asp Gly Ser Glu Lieu. Thir Lieu Ala

Leu Val Arg Val Leu Gly Ala Cys Ala Val Val Leu Thr Phe Glu 2 Gly Lieu Gln Leu Ala Glin Arg Ala Gly Ala Ala Ala Ile Arg His 2 Val Lieu Ala Lys Asp Leu Ser Ala Glu Ser Ala Glu Lys Ala Ile

Lys Glu Ala Glu Glin Arg Phe Gly Ala Leu Gly Gly Phe Ile Ser 2 Gln Glin Ala Glu Arg Phe Glu Pro Ala Glu Ile Leu Gly Phe Thr 2210 2215 2220 Leu Met Cys Ala Lys Phe Ala Lys Ala Ser Lieu. Cys Thr Ala Wall 2225 22.30 2235 Ala Gly Gly Arg Pro Ala Phe Ile Gly Val Ala Arg Lieu. Asp Gly 2240 22 45 225 O Arg Lieu Gly Phe Thir Ser Glin Gly Thr Ser Asp Ala Lieu Lys Arg 2255 2260 2265 Ala Glin Arg Gly Ala Ile Phe Gly Lieu. Cys Lys Thr Ile Gly Lieu 2270 2275 228O Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val Asp Ile Ala 2285 229 O 2295 Glin Gly Met His Pro Glu Asp Ala Ala Val Ala Ile Val Arg Glu 2300 2305 2310 Met Ala Cys Ala Asp Ile Arg Ile Arg Glu Val Gly Ile Gly Ala 2315 2320 2325 Asn Glin Glin Arg Cys Thir Ile Arg Ala Ala Lys Lieu Glu Thr Gly 2330 2335 234. O Asn Pro Glin Arg Glin Ile Ala Lys Asp Asp Wall Leu Lieu Val Ser 2345 235 O 2355 Gly Gly Ala Arg Gly Ile Thr Pro Lieu. Cys Ile Arg Glu Ile Thr 2360 2365 2370 Arg Glin Ile Ala Gly Gly Lys Tyr Ile Leu Lleu Gly Arg Ser Lys 2375 2380 2385 Val Ser Ala Ser Glu Pro Ala Trp Cys Ala Gly Ile Thr Asp Glu 2390 2395 24 OO Lys Ala Val Glin Lys Ala Ala Thr Glin Glu Lieu Lys Arg Ala Phe 2405 2410 24.15 Ser Ala Gly Glu Gly Pro Llys Pro Thr Pro Arg Ala Val Thr Lys 2420 24.25 24.30 Leu Val Gly Ser Val Lieu Gly Ala Arg Glu Val Arg Ser Ser Ile 2435 24 40 2445 US 2008/0026440 A1 Jan. 31, 2008 55

-continued Ala Ala Ile Glu Ala Leu Gly Gly Lys Ala Ile Tyr Ser Ser Cys 2450 2455 2460 Asp Wall Asn. Ser Ala Ala Asp Val Ala Lys Ala Val Arg Asp Ala 2465 2470 24.75 Glu Ser Glin Leu Gly Ala Arg Val Ser Gly Ile Val His Ala Ser 24.80 2485 24.90 Gly Val Lieu Arg Asp Arg Lieu. Ile Glu Lys Lys Lieu Pro Asp Glu 2495 25 OO 25 O5 Phe Asp Ala Val Phe Gly. Thir Lys Val Thr Gly Leu Glu Asn Leu 2510 2515 252O Leu Ala Ala Val Asp Arg Ala Asn Lieu Lys His Met Wall Leu Phe 2525 25.30 2535 Ser Ser Lieu Ala Gly Phe His Gly Asn Val Gly Glin Ser Asp Tyr 2540 25.45 255 O Ala Met Ala Asn. Glu Ala Lieu. Asn Lys Met Gly Lieu Glu Lieu Ala 2555 2560 2565 Lys Asp Wal Ser Wall Lys Ser Ile Cys Phe Gly Pro Trp Asp Gly 2570 2575 258O Gly Met Val Thr Pro Glin Leu Lys Lys Glin Phe Glin Glu Met Gly 2585 2590 2595 Val Glin Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg 26 OO 2605 26.10 Ile Val Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp 2615 262O 2625 Arg Thr Pro Ser Lys Llys Val Gly Ser Asp Thir Ile Thr Leu. His 2630 2635 264 O Arg Lys Ile Ser Ala Lys Ser Asn. Pro Phe Leu Glu Asp His Val 2645 26.50 2655 Ile Glin Gly Arg Arg Val Leu Pro Met Thr Leu Ala Ile Gly Ser 2660 2665 2670 Leu Ala Glu Thr Cys Leu Gly Leu Phe Pro Gly Tyr Ser Leu Trp 2675 268O 2685 Ala Ile Asp Asp Ala Glin Leu Phe Lys Gly Val Thr Val Asp Gly 2690 2695 27 OO Asp Val Asn Cys Glu Val Thr Leu Thr Pro Ser Thr Ala Pro Ser 27 O5 210 2715 Gly Arg Val Asn Val Glin Ala Thr Leu Lys Thr Phe Ser Ser Gly 2720 2725 273 O Lys Lieu Val Pro Ala Tyr Arg Ala Val Ile Val Lieu Ser Asn. Glin 2735 2740 2745 Gly Ala Pro Pro Ala Asn Ala Thr Met Gln Pro Pro Ser Leu Asp 2750 2755 276 O. Ala Asp Pro Ala Leu Glin Gly Ser Val Tyr Asp Gly Lys Thr Lieu 2765 2770 2775 Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Lieu Ser Cys 2780 2785 279 O Thr Lys Ser Glin Lieu Val Ala Lys Cys Ser Ala Val Pro Gly Ser 2795 2800 2805 Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 2810 2815 282O Pro Phe Val Asn Asp Leu Ala Phe Glin Ala Met Leu Val Trp Val US 2008/0026440 A1 Jan. 31, 2008 56

-continued

2825 2830 2835 Arg Arg Thr Lieu Gly Glin Ala Ala Lieu Pro Asn. Ser Ile Glin Arg 284 O 284.5 285 O Ile Val Gln His Arg Pro Val Pro Glin Asp Llys Pro Phe Tyr Ile 2855 2.860 2865 Thr Lieu Arg Ser Asn Glin Ser Gly Gly His Ser Glin His Lys His 2870 2875 2880 Ala Leu Glin Phe His Asn. Glu Glin Gly Asp Leu Phe Ile Asp Wall 2.885 2890 2.895 Glin Ala Ser Val Ile Ala Thr Asp Ser Leu Ala Phe 29 OO 2905 2.910

<210> SEQ ID NO 3 &2 11s LENGTH 6.177 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (6.177) <400 SEQUENCE: 3 atg goc got cqg aat gtg agc goc gog cat gag atg cac gat gala aag 48 Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys 1 5 10 15 cgc atc gcc gtc gtc. g.gc atg gCC gtc. cag tac goc gga togc aaa acc 96 Arg Ile Ala Val Val Gly Met Ala Val Glin Tyr Ala Gly Cys Lys Thr 2O 25 30 aag gaC gag titC togg gag gtg Ctc atg aac ggC aag gtC gag to C aag 144 Lys Asp Glu Phe Trp Glu Val Lieu Met Asn Gly Lys Val Glu Ser Lys 35 40 45 gtg atc agc gac aaa ciga citc ggc toc aac tac cqc gcc gag cac tac 192 Val Ile Ser Asp Lys Arg Lieu Gly Ser Asn Tyr Arg Ala Glu His Tyr 50 55 60 aaa goa gag cqc agc aag tat gcc gac acc titt toc aac gala acg tac 240 Lys Ala Glu Arg Ser Lys Tyr Ala Asp Thr Phe Cys Asn. Glu Thir Tyr 65 70 75 8O ggc acc citt gac gag aac gag atc gac aac gag cac gaa citc ctic citc 288 Gly Thr Lieu. Asp Glu Asn. Glu Ile Asp Asn. Glu His Glu Lieu Lleu Lieu 85 90 95 aac citc gcc aag cag goa citc goa gag aca toc gtc. aaa gac to g aca 336 Asn Lieu Ala Lys Glin Ala Leu Ala Glu Thir Ser Wall Lys Asp Ser Thr 100 105 110 cgc toc ggc atc gtc. agc ggc toc ctic to g titc ccc atg gac aac citc 384 Arg Cys Gly Ile Val Ser Gly Cys Lieu Ser Phe Pro Met Asp Asn Lieu 115 120 125 cag ggit gaa citc. citc aac gtg tac caa aac cat gtc gag aaa aag citc 432 Glin Gly Glu Lieu Lleu. Asn Val Tyr Glin Asn His Val Glu Lys Lys Lieu 130 135 1 4 0 ggg gCC cqc gtc titc aag gac goc toc cat togg toc gaa cqc gag cag 480 Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Glin 145 15 O 155 160 to c aac aaa coc gag goc ggit gac cqc cqc atc titc atg gac cog goc 528 Ser Asn Lys Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala 1.65 170 175 to C titc gtc goc gaa gaa citc aac citc ggc goc citt cac tac toc gtc 576 Ser Phe Val Ala Glu Glu Lieu. Asn Lieu Gly Ala Lieu. His Tyr Ser Val 18O 185 19 O

US 2008/0026440 A1 Jan. 31, 2008 61

-continued acc atc gttcaag citt gtg got tog citc aag goc cac citt gtt cott 4.194 Thir Ile Val Lys Lieu Val Ala Ser Lieu Lys Ala His Lieu Val Pro 385 39 O. 395 ggc gtc acg atc. tcg cc g citg tac cac toc aag citt gtg gog gag 4239 Gly Val Thr Ile Ser Pro Leu Tyr His Ser Lys Leu Val Ala Glu 400 405 410 gct cag got toc tac got gog citc toc aag ggt gala aag coc aag 4284 Ala Glin Ala Cys Tyr Ala Ala Lieu. Cys Lys Gly Glu Lys Pro Lys 415 420 425 aag aac aag titt gtg cqc aag att cag citc aac ggit cqc titc aac 4329 Lys Asn Lys Phe Val Arg Lys Ile Glin Lieu. Asn Gly Arg Phe Asn 430 435 4 40 agc aag gog gac ccc atc. tcc to g goc gat citt gcc agc titt cog 4374 Ser Lys Ala Asp Pro Ile Ser Ser Ala Asp Leu Ala Ser Phe Pro 4 45 450 455 cct gcg gac cct gcc att gala gCo goc atc. tcg agc cqc atc atg 4 419 Pro Ala Asp Pro Ala Ile Glu Ala Ala Ile Ser Ser Arg Ile Met 460 465 470 aag cott gttc gct coc aag titc tac gog cqt citc aac att gac gag 4 464 Lys Pro Val Ala Pro Llys Phe Tyr Ala Arg Lieu. Asn. Ile Asp Glu 475 480 485 cag gac gag acc cga gat cog atc ctic aac aag gac aac gog cc.g 4509 Glin Asp Glu Thr Arg Asp Pro Ile Lieu. Asn Lys Asp Asn Ala Pro 490 495 5 OO tot tot tot tot tot tot tot tot tot tot tot tot tot tot tot 4554 Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser 505 510 515 cc.g. tcg cct gct cott tog goc coc gtg caa aag aag got gct coc 45.99 Pro Ser Pro Ala Pro Ser Ala Pro Val Gln Lys Lys Ala Ala Pro 52O 525 530 gcc gog gag acc aag got gtt got tog got gac goa citt cqc agt 4 64 4 Ala Ala Glu Thir Lys Ala Wall Ala Ser Ala Asp Ala Lieu Arg Ser 535 540 545 gcc ct g c to gat citc gac agt at g citt gcg citg agc tot gcc agt 4689 Ala Lieu Lleu. Asp Lieu. Asp Ser Met Leu Ala Leu Ser Ser Ala Ser 550 555 560 gcc toc ggc aac citt gtt gag act gcg cct agc gac goc to g g to 4734 Ala Ser Gly Asn Lieu Val Glu Thir Ala Pro Ser Asp Ala Ser Val 565 570 575 att gtg ccg ccc toc aac att gog gat citc ggc agc cqc gcc titc 4779 Ile Val Pro Pro Cys Asn. Ile Ala Asp Leu Gly Ser Arg Ala Phe 58O 585 590 atg aaa acg tac ggt gtt tog gog cct citg tac acg ggc gcc atg 4824 Met Lys Thr Tyr Gly Val Ser Ala Pro Leu Tyr Thr Gly Ala Met 595 600 605 gcc aag ggc att gcc tot gog gac ctic gtc att gcc gcc ggc cqc 486.9 Ala Lys Gly Ile Ala Ser Ala Asp Leu Val Ile Ala Ala Gly Arg 610 615 62O cag ggc atc. citt gcg to C titt ggc goc ggc gga citt coc atg cag 4.914 Glin Gly Ile Leu Ala Ser Phe Gly Ala Gly Gly Lieu Pro Met Glin 625 630 635 gtt gt g c gt gag to c atc gala aag att cag goc goc ctd coc aat 4.959 Val Val Arg Glu Ser Ile Glu Lys Ile Glin Ala Ala Lieu Pro Asn 640 645 650 ggc cog tac got gtc. aac citt atc cat tot coc titt gac agc aac 5 OO 4 Gly Pro Tyr Ala Val Asn Leu Ile His Ser Pro Phe Asp Ser Asn 655 660 665

US 2008/0026440 A1 Jan. 31, 2008

-continued citc aag atg tog citg toc titt cqc togg tac citg agc ctd gc g agc 5904 Leu Lys Met Ser Lieu. Cys Phe Arg Trp Tyr Lieu Ser Lieu Ala Ser 1955 1960 1965 cgc togg gcc aac act gga got to c gat cqc gtc atg gac tac cag 59.49 Arg Trp Ala Asn Thr Gly Ala Ser Asp Arg Wal Met Asp Tyr Glin 1970 1975 1980 gto togg togc ggit cott goc att ggit to C titc aac gat titc atc aag 5994 Val Trp Cys Gly Pro Ala Ile Gly Ser Phe Asn Asp Phe Ile Lys 1985 1990 1995 gga act tac citt gat cog goc gtc goa aac gag tac cog togc gtc 6 O39 Gly Thr Tyr Leu Asp Pro Ala Val Ala Asn Glu Tyr Pro Cys Val 2OOO 2005 2010 gtt cag att aac aag cag atc citt cqt gga gcg togc titc ttg cqc 6084 Val Glin Ile Asn Lys Glin Ile Leu Arg Gly Ala Cys Phe Leu Arg 2015 2020 2025 cgt citc gaa att citg cqc aac goa cqc citt toc gat ggc gct gcc 6129 Arg Lieu Glu Ile Leu Arg Asn Ala Arg Lieu Ser Asp Gly Ala Ala 2030 2O35 20 40 gct citt gtg gcc agc atc gat gac aca tac gtc. cc.g. gcc gag aag 6174 Ala Lieu Val Ala Ser Ile Asp Asp Thr Tyr Val Pro Ala Glu Lys 2O45 2O5 O 2O55 citg 6.177 Teu

<210> SEQ ID NO 4 &2 11s LENGTH 2059 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (370) ... (370) <223> OTHER INFORMATION: The 'Xaa' at location 370 stands for Leu. &220s FEATURE <221 NAME/KEY: misc feature <222> LOCATION: (371) . . (371) <223> OTHER INFORMATION: The 'Xaa' at location 371 stands for Ala or Wall.

<400 SEQUENCE: 4 Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys 1 5 10 15 Arg Ile Ala Val Val Gly Met Ala Val Glin Tyr Ala Gly Cys Lys Thr 2O 25 30 Lys Asp Glu Phe Trp Glu Val Lieu Met Asn Gly Lys Val Glu Ser Lys 35 40 45 Val Ile Ser Asp Lys Arg Lieu Gly Ser Asn Tyr Arg Ala Glu His Tyr 50 55 60 Lys Ala Glu Arg Ser Lys Tyr Ala Asp Thr Phe Cys Asn. Glu Thir Tyr 65 70 75 8O Gly Thr Lieu. Asp Glu Asn. Glu Ile Asp Asn. Glu His Glu Lieu Lleu Lieu 85 90 95 Asn Lieu Ala Lys Glin Ala Leu Ala Glu Thir Ser Wall Lys Asp Ser Thr 100 105 110 Arg Cys Gly Ile Val Ser Gly Cys Lieu Ser Phe Pro Met Asp Asn Lieu 115 120 125 Glin Gly Glu Lieu Lleu. Asn Val Tyr Glin Asn His Val Glu Lys Lys Lieu 130 135 1 4 0 Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Glin US 2008/0026440 A1 Jan. 31, 2008 64

-continued

145 15 O 155 160 Ser Asn Lys Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala 1.65 170 175 Ser Phe Val Ala Glu Glu Lieu. Asn Lieu Gly Ala Lieu. His Tyr Ser Val 18O 185 19 O Asp Ala Ala Cys Ala Thr Ala Leu Tyr Val Lieu Arg Lieu Ala Glin Asp 195 200 2O5 His Lieu Val Ser Gly Ala Ala Asp Wal Met Lieu. Cys Gly Ala Thr Cys 210 215 220 Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Glin Ala 225 230 235 240 Met Pro Val Gly Thr Gly Glin Asn Val Ser Met Pro Leu. His Lys Asp 245 250 255 Ser Glin Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu Lys 260 265 27 O Arg Lieu. Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly. Thir Lieu 275 280 285 Leu Gly Ala Asn. Wal Ser Asn. Ser Gly Thr Gly Lieu Pro Leu Lys Pro 29 O 295 3OO Leu Lleu Pro Ser Glu Lys Lys Cys Lieu Met Asp Thr Tyr Thr Arg Ile 305 310 315 320 Asn Val His Pro His Lys Ile Glin Tyr Val Glu Cys His Ala Thr Gly 325 330 335 Thr Pro Glin Gly Asp Arg Val Glu Ile Asp Ala Wall Lys Ala Cys Phe 340 345 35 O Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Lys Gly Asn Phe Gly. His 355 360 365 Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Cys Lys Val Lieu Lleu Ser 370 375 38O Met Lys His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400 Lys Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu 405 410 415 Thr Asn Gly Glu Pro Lys Arg Ala Gly Lieu Ser Ala Phe Gly Phe Gly 420 425 43 O Gly Thr Asn Ala His Ala Val Phe Glu Glu. His Asp Pro Ser Asn Ala 435 4 40 4 45 Ala Cys Thr Gly His Asp Ser Ile Ser Ala Leu Ser Ala Arg Cys Gly 450 455 460 Gly Glu Ser Asn Met Arg Ile Ala Ile Thr Gly Met Asp Ala Thr Phe 465 470 475 480 Gly Ala Lieu Lys Gly Lieu. Asp Ala Phe Glu Arg Ala Ile Tyr Thr Gly 485 490 495 Ala His Gly Ala Ile Pro Leu Pro Glu Lys Arg Trp Arg Phe Leu Gly 5 OO 505 51O. Lys Asp Lys Asp Phe Lieu. Asp Lieu. Cys Gly Val Lys Ala Thr Pro His 515 52O 525 Gly Cys Tyr Ile Glu Asp Val Glu Val Asp Phe Glin Arg Lieu Arg Thr 530 535 540 Pro Met Thr Pro Glu Asp Met Leu Leu Pro Gln Gln Leu Leu Ala Val 545 550 555 560 US 2008/0026440 A1 Jan. 31, 2008 65

-continued

Thir Thr Ile Asp Arg Ala Ile Leu Asp Ser Gly Met Lys Lys Gly Gly 565 570 575 Asn Val Ala Val Phe Val Gly Lieu Gly Thr Asp Leu Glu Lieu. Tyr Arg 58O 585 59 O His Arg Ala Arg Val Ala Leu Lys Glu Arg Val Arg Pro Glu Ala Ser 595 600 605 Lys Lys Lieu. Asn Asp Met Met Glin Tyr Ile Asn Asp Cys Gly Thr Ser 610 615 62O Thr Ser Tyr Thr Ser Tyr Ile Gly Asn Leu Val Ala Thr Arg Val Ser 625 630 635 640 Ser Gln Trp Gly Phe Thr Gly Pro Ser Phe Thr Ile Thr Glu Gly Asn 645 650 655 Asn Ser Val Tyr Arg Cys Ala Glu Lieu Gly Lys Tyr Lieu Lieu Glu Thr 660 665 67 O Gly Glu Val Asp Gly Val Val Val Ala Gly Val Asp Lieu. Cys Gly Ser 675 680 685 Ala Glu Asn Lieu. Tyr Val Lys Ser Arg Arg Phe Lys Wal Ser Thr Ser 69 O. 695 7 OO Asp Thr Pro Arg Ala Ser Phe Asp Ala Ala Ala Asp Gly Tyr Phe Val 705 710 715 720 Gly Glu Gly Cys Gly Ala Phe Val Lieu Lys Arg Glu Thir Ser Cys Thr 725 730 735 Lys Asp Asp Arg Ile Tyr Ala Cys Met Asp Ala Ile Val Pro Gly Asn 740 745 750 Val Pro Ser Ala Cys Lieu Arg Glu Ala Lieu. Asp Glin Ala Arg Val Lys 755 760 765 Pro Gly Asp Ile Glu Met Leu Glu Lieu Ser Ala Asp Ser Ala Arg His 770 775 78O Leu Lys Asp Pro Ser Val Lieu Pro Lys Glu Lieu. Thr Ala Glu Glu Glu 785 790 795 8OO Ile Gly Gly Lieu Glin Thr Ile Leu Arg Asp Asp Asp Lys Lieu Pro Arg 805 810 815 Asn Val Ala Thr Gly Ser Val Lys Ala Thr Val Gly Asp Thr Gly Tyr 820 825 83O Ala Ser Gly Ala Ala Ser Lieu. Ile Lys Ala Ala Lieu. Cys Ile Tyr Asn 835 840 845 Arg Tyr Lieu Pro Ser Asn Gly Asp Asp Trp Asp Glu Pro Ala Pro Glu 85 O 855 860 Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Gln Thr Ser Arg Ala Trp 865 870 875 88O Leu Lys Asn Pro Gly Glu Arg Arg Tyr Ala Ala Val Ser Gly Val Ser 885 890 895 Glu Thir Arg Ser Cys Tyr Ser Val Lieu Lleu Ser Glu Ala Glu Gly. His 9 OO 905 910 Tyr Glu Arg Glu Asn Arg Ile Ser Lieu. Asp Glu Glu Ala Pro Llys Lieu 915 920 925 Ile Val Lieu Arg Ala Asp Ser His Glu Glu Ile Leu Gly Arg Lieu. Asp 930 935 940 Lys Ile Arg Glu Arg Phe Leu Glin Pro Thr Gly Ala Ala Pro Arg Glu 945 950 955 96.O US 2008/0026440 A1 Jan. 31, 2008 66

-continued Ser Glu Lieu Lys Ala Glin Ala Arg Arg Ile Phe Leu Glu Lieu Lieu Gly 965 970 975 Glu Thir Lieu Ala Glin Asp Ala Ala Ser Ser Gly Ser Glin Lys Pro Leu 98O 985 99 O Ala Leu Ser Lieu Val Ser Thr Pro Ser Lys Lieu Glin Arg Glu Val Glu 995 10 OO 1005 Leu Ala Ala Lys Gly Ile Pro Arg Cys Lieu Lys Met Arg Arg Asp O 10 O15 O20 Trp Ser Ser Pro Ala Gly Ser Arg Tyr Ala Pro Glu Pro Leu Ala O25 O3O O35 Ser Asp Arg Val Ala Phe Met Tyr Gly Glu Gly Arg Ser Pro Tyr

Tyr Gly Ile Thr Glin Asp Ile His Arg Ile Trp Pro Glu Lieu. His

Glu Val Ile Asn. Glu Lys Thr Asn Arg Lieu Trp Ala Glu Gly Asp

Arg Trp Val Met Pro Arg Ala Ser Phe Lys Ser Glu Leu Glu Ser

Glin Glin Glin Glu Phe Asp Arg Asn Met Ile Glu Met Phe Arg Lieu

Gly e Lieu. Thir Ser Ile Ala Phe Thr Asn Lieu Ala Arg Asp Wal

Lieu. Asn. Ile Thr Pro Lys Ala Ala Phe Gly Lieu Ser Lieu Gly Glu

Ile Ser Met Ile Phe Ala Phe Ser Lys Lys Asn Gly Lieu. Ile Ser

Asp Gln Leu Thir Lys Asp Leu Arg Glu Ser Asp Val Trp Asn Lys

Ala Lieu Ala Val Glu Phe Asn Ala Lieu Arg Glu Ala Trp Gly Ile

Pro Glin Ser Val Pro Lys Asp Glu Phe Trp Gln Gly Tyr Ile Val

Arg Gly Thr Lys Glin Asp e Glu Ala Ala Ile Ala Pro Asp Ser

Lys Tyr Val Arg Leu Thr e Ile Asin Asp Ala Asn Thr Ala Lieu

Ile Ser Gly Lys Pro Asp Ala Cys Lys Ala Ala Ile Ala Arg Lieu

Gly Gly Asn Ile Pro Ala Leu Pro Val Thr Glin Gly Met Cys Gly

His Cys Pro Glu Val Gly Pro Tyr Thr Lys Asp Ile Ala Lys Ile

His Ala Asn Lieu Glu Phe Pro Val Val Asp Gly Lieu. Asp Leu Trp

Thir Thr Ile Asn Gln Lys Arg Lieu Val Pro Arg Ala Thr Gly Ala

Lys Asp Glu Trp Ala Pro Ser Ser Phe Gly Glu Tyr Ala Gly Glin

Leu Tyr Glu Lys Glin Ala Asn Phe Pro Glin Ile Val Glu Thir Ile 325 330 335 Tyr Lys Glin Asn Tyr Asp Val Phe Val Glu Val Gly Pro Asn Asn US 2008/0026440 A1 Jan. 31, 2008 67

-continued

340 345 350 His Arg Ser Thr Ala Val Arg Thr Thr Leu Gly Pro Glin Arg Asn

His Leu Ala Gly Ala Ile Asp Lys Glin Asn. Glu Asp Ala Trp Thr

Thir Ile Val Lys Lieu Val Ala Ser Lieu Lys Ala His Lieu Val Pro

Gly Val Thr Ile Ser Pro Leu Tyr His Ser Lys Leu Val Ala Glu

Ala Glin Ala Cys Tyr Ala Ala Lieu. Cys Lys Gly Glu Lys Pro Lys

Lys Asn Lys Phe Val Arg Lys Ile Glin Lieu. Asn Gly Arg Phe Asn

Ser Lys Ala Asp Pro Ile Ser Ser Ala Asp Leu Ala Ser Phe Pro

Pro Ala Asp Pro Ala Ile Glu Ala Ala Ile Ser Ser Arg Ile Met

Lys Pro Val Ala Pro Llys Phe Tyr Ala Arg Lieu. Asn. Ile Asp Glu

Glin Asp Glu Thr Arg Asp Pro Ile Lieu. Asn Lys Asp Asn Ala Pro

Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser Ser

Pro Ser Pro Ala Pro Ser Ala Pro Val Gln Lys Lys Ala Ala Pro

Ala Ala Glu Thir Lys Ala Wall Ala Ser Ala Asp Ala Lieu Arg Ser

Ala Lieu Lleu. Asp Lieu. Asp Ser Met Leu Ala Leu Ser Ser Ala Ser

Ala Ser Gly Asn Lieu Val Glu Thir Ala Pro Ser Asp Ala Ser Val

Ile Val Pro Pro Cys Asn. Ile Ala Asp Leu Gly Ser Arg Ala Phe

Met Lys Thr Tyr Gly Val Ser Ala Pro Leu Tyr Thr Gly Ala Met

Ala Lys Gly Ile Ala Ser Ala Asp Leu Val Ile Ala Ala Gly Arg

Glin Gly Ile Leu Ala Ser Phe Gly Ala Gly Gly Lieu Pro Met Glin

Val Val Arg Glu Ser Ile Glu Lys Ile Glin Ala Ala Lieu Pro Asn

Gly Pro Tyr Ala Val Asn Leu Ile His Ser Pro Phe Asp Ser Asn

Leu Glu Lys Gly Asn Val Asp Leu Phe Leu Glu Lys Gly Val Thr

Phe Wall Glu Ala Ser Ala Phe Met Thr Lieu. Thr Pro Glin Wal Wall 685 69 O. 695 Arg Tyr Arg Ala Ala Gly Lieu. Thir Arg Asn Ala Asp Gly Ser Val 7 OO 705 710 Asn. Ile Arg Asn Arg Ile Ile Gly Lys Val Ser Arg Thr Glu Lieu 715 720 725 US 2008/0026440 A1 Jan. 31, 2008 68

-continued

Ala Glu Met Phe Met Arg Pro Ala Pro Glu His Leu Lieu Gln Lys 730 735 740 Lieu. Ile Ala Ser Gly Glu Ile Asin Glin Glu Glin Ala Glu Lieu Ala 745 750 755 Arg Arg Val Pro Wall Ala Asp Asp Ile Ala Val Glu Ala Asp Ser 760 765 770 Gly Gly. His Thr Asp Asn Arg Pro Ile His Val Ile Leu Pro Leu

Ile Ile Asn Lieu Arg Asp Arg Lieu. His Arg Glu Cys Gly Tyr Pro

Ala Asn Lieu Arg Val Arg Val Gly Ala Gly Gly Gly Ile Gly Cys

Pro Glin Ala Ala Leu Ala Thr Phe Asn Met Gly Ala Ser Phe Ile

Val Thr Gly Thr Val Asin Glin Val Ala Lys Glin Ser Gly Thr Cys 835 840 845 Asp Asn Val Arg Lys Glin Leu Ala Lys Ala Thr Tyr Ser Asp Wall 850 855 860 Cys Met Ala Pro Ala Ala Asp Met Phe Glu Glu Gly Wall Lys Lieu

Glin Val Lieu Lys Lys Gly Thr Met Phe Pro Ser Arg Ala Asn Lys

Leu Tyr Glu Leu Phe Cys Lys Tyr Asp Ser Phe Glu Ser Met Pro

Pro Ala Glu Lieu Ala Arg Val Glu Lys Arg Ile Phe Ser Arg Ala

Leu Glu Glu Val Trp Asp Glu Thir Lys Asn. Phe Tyr Ile Asn Arg

Lieu. His Asn. Pro Glu Lys Ile Glin Arg Ala Glu Arg Asp Pro Lys

Leu Lys Met Ser Lieu. Cys Phe Arg Trp Tyr Lieu Ser Lieu Ala Ser

Arg Trp Ala Asn Thr Gly Ala Ser Asp Arg Wal Met Asp Tyr Glin

Val Trp Cys Gly Pro Ala Ile Gly Ser Phe Asn Asp Phe Ile Lys

Gly Thr Tyr Leu Asp Pro Ala Val Ala Asn Glu Tyr Pro Cys Val 2OOO 2005 2010 Val Glin Ile Asn Lys Glin Ile Leu Arg Gly Ala Cys Phe Leu Arg 2015 2020 2025 Arg Lieu Glu Ile Leu Arg Asn Ala Arg Lieu Ser Asp Gly Ala Ala 2030 2O35 20 40 Ala Lieu Val Ala Ser Ile Asp Asp Thr Tyr Val Pro Ala Glu Lys 2O45 2O5 O 2O55

Teu

<210 SEQ ID NO 5 &2 11s LENGTH 4509 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: CDS

US 2008/0026440 A1 Jan. 31, 2008

-continued citg cqc cqt citc aac goc citg cqc aac gac cc g c go att gac citc 4 464 Leu Arg Arg Lieu. Asn Ala Lieu Arg Asn Asp Pro Arg Ile Asp Lieu 1475 1480 1485 gag acc gag gat gct gcc titt gtc tac gag ccc acc aac gog citc 4509 Glu Thr Glu Asp Ala Ala Phe Val Tyr Glu Pro Thr Asn Ala Leu 1490 1495 15 OO

<210> SEQ ID NO 6 &2 11s LENGTH 1503 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 6 Met Ala Leu Arg Val Lys Thr Asn Lys Llys Pro Cys Trp Glu Met Thr 1 5 10 15 Lys Glu Glu Leu Thir Ser Gly Lys Thr Glu Val Phe Asn Tyr Glu Glu 2O 25 30 Leu Lieu Glu Phe Ala Glu Gly Asp Ile Ala Lys Val Phe Gly Pro Glu 35 40 45 Phe Ala Val Ile Asp Lys Tyr Pro Arg Arg Val Arg Lieu Pro Ala Arg 50 55 60 Glu Tyr Leu Leu Val Thr Arg Val Thr Leu Met Asp Ala Glu Val Asn 65 70 75 8O Asn Tyr Arg Val Gly Ala Arg Met Val Thr Glu Tyr Asp Leu Pro Val 85 90 95 Asn Gly Glu Lieu Ser Glu Gly Gly Asp Cys Pro Trp Ala Val Lieu Val 100 105 110 Glu Ser Gly Glin Cys Asp Leu Met Lieu. Ile Ser Tyr Met Gly Ile Asp 115 120 125 Phe Glin Asn Glin Gly Asp Arg Val Tyr Arg Lieu Lleu. Asn. Thir Thr Lieu 130 135 1 4 0 Thr Phe Tyr Gly Val Ala His Glu Gly Glu Thir Leu Glu Tyr Asp Ile 145 15 O 155 160 Arg Val Thr Gly Phe Ala Lys Arg Lieu. Asp Gly Gly Ile Ser Met Phe 1.65 170 175 Phe Phe Glu Tyr Asp Cys Tyr Val Asin Gly Arg Leu Leu Ile Glu Met 18O 185 19 O Arg Asp Gly Cys Ala Gly Phe Phe Thr Asn. Glu Glu Lieu. Asp Ala Gly 195 200 2O5 Lys Gly Val Val Phe Thr Arg Gly Asp Leu Ala Ala Arg Ala Lys Ile 210 215 220 Pro Lys Glin Asp Val Ser Pro Tyr Ala Wall Ala Pro Cys Lieu. His Lys 225 230 235 240 Thr Lys Lieu. Asn. Glu Lys Glu Met Glin Thr Lieu Val Asp Lys Asp Trp 245 250 255 Ala Ser Val Phe Gly Ser Lys Asn Gly Met Pro Glu Ile Asn Tyr Lys 260 265 27 O Lieu. Cys Ala Arg Lys Met Leu Met Ile Asp Arg Val Thir Ser Ile Asp 275 280 285 His Lys Gly Gly Val Tyr Gly Lieu Gly Glin Leu Val Gly Glu Lys Ile 29 O 295 3OO Leu Glu Arg Asp His Trp Tyr Phe Pro Cys His Phe Wall Lys Asp Glin 305 310 315 320 US 2008/0026440 A1 Jan. 31, 2008 75

-continued Wal Met Ala Gly Ser Lieu Val Ser Asp Gly Cys Ser Gln Met Lieu Lys 325 330 335 Met Tyr Met Ile Trp Leu Gly Leu. His Leu Thir Thr Gly Pro Phe Asp 340 345 35 O Phe Arg Pro Val Asn Gly. His Pro Asn Lys Val Arg Cys Arg Gly Glin 355 360 365 Ile Ser Pro His Lys Gly Lys Leu Val Tyr Val Met Glu Ile Lys Glu 370 375 38O Met Gly Phe Asp Glu Asp Asn Asp Pro Tyr Ala Ile Ala Asp Wall Asn 385 390 395 400 Ile Ile Asp Wall Asp Phe Glu Lys Gly Glin Asp Phe Ser Lieu. Asp Arg 405 410 415 Ile Ser Asp Tyr Gly Lys Gly Asp Lieu. Asn Lys Lys Ile Val Val Asp 420 425 43 O Phe Lys Gly Ile Ala Lieu Lys Met Gln Lys Arg Ser Thr Asn Lys Asn 435 4 40 4 45 Pro Ser Lys Val Glin Pro Val Phe Ala Asn Gly Ala Ala Thr Val Gly 450 455 460 Pro Glu Ala Ser Lys Ala Ser Ser Gly Ala Ser Ala Ser Ala Ser Ala 465 470 475 480 Ala Pro Ala Lys Pro Ala Phe Ser Ala Asp Wall Leu Ala Pro Llys Pro 485 490 495 Val Ala Lieu Pro Glu His Ile Leu Lys Gly Asp Ala Leu Ala Pro Lys 5 OO 505 51O. Glu Met Ser Trp His Pro Met Ala Arg Ile Pro Gly Asn Pro Thr Pro 515 52O 525 Ser Phe Ala Pro Ser Ala Tyr Lys Pro Arg Asn Ile Ala Phe Thr Pro 530 535 540 Phe Pro Gly Asn Pro Asn Asp Asn Asp His Thr Pro Gly Lys Met Pro 545 550 555 560 Leu Thir Trp Phe Asn Met Ala Glu Phe Met Ala Gly Lys Val Ser Met 565 570 575 Cys Lieu Gly Pro Glu Phe Ala Lys Phe Asp Asp Ser Asn. Thir Ser Arg 58O 585 59 O Ser Pro Ala Trp Asp Leu Ala Leu Val Thr Arg Ala Val Ser Val Ser 595 600 605 Asp Leu Lys His Val Asn Tyr Arg Asn. Ile Asp Lieu. Asp Pro Ser Lys 610 615 62O Gly Thr Met Val Gly Glu Phe Asp Cys Pro Ala Asp Ala Trp Phe Tyr 625 630 635 640 Lys Gly Ala Cys Asn Asp Ala His Met Pro Tyr Ser Ile Leu Met Glu 645 650 655 Ile Ala Leu Gln Thr Ser Gly Val Leu Thir Ser Val Leu Lys Ala Pro 660 665 67 O Lieu. Thir Met Glu Lys Asp Asp Ile Leu Phe Arg Asn Lieu. Asp Ala Asn 675 680 685 Ala Glu Phe Val Arg Ala Asp Lieu. Asp Tyr Arg Gly Lys. Thir Ile Arg 69 O. 695 7 OO Asn Val Thr Lys Cys Thr Gly Tyr Ser Met Leu Gly Glu Met Gly Val 705 710 715 720 His Arg Phe Thr Phe Glu Leu Tyr Val Asp Asp Val Leu Phe Tyr Lys US 2008/0026440 A1 Jan. 31, 2008 76

-continued

725 730 735 Gly Ser Thr Ser Phe Gly Trp Phe Val Pro Glu Val Phe Ala Ala Glin 740 745 750 Ala Gly Lieu. Asp Asn Gly Arg Lys Ser Glu Pro Trp Phe Ile Glu Asn 755 760 765 Lys Val Pro Ala Ser Glin Val Ser Ser Phe Asp Val Arg Pro Asn Gly 770 775 78O Ser Gly Arg Thr Ala Ile Phe Ala Asn Ala Pro Ser Gly Ala Glin Lieu 785 790 795 8OO Asn Arg Arg Thr Asp Glin Gly Glin Tyr Lieu. Asp Ala Wall Asp Ile Val 805 810 815 Ser Gly Ser Gly Lys Lys Ser Lieu Gly Tyr Ala His Gly Ser Lys Thr 820 825 83O Val Asn Pro Asn Asp Trp Phe Phe Ser Cys His Phe Trp Phe Asp Ser 835 840 845 Val Met Pro Gly Ser Leu Gly Val Glu Ser Met Phe Gln Leu Val Glu 85 O 855 860 Ala Ile Ala Ala His Glu Asp Leu Ala Gly Lys Ala Arg His Cys Glin 865 870 875 88O Pro His Lieu. Cys Ala Arg Pro Arg Ala Arg Ser Ser Trp Llys Tyr Arg 885 890 895 Gly Gln Leu Thr Pro Lys Ser Lys Lys Met Asp Ser Glu Val His Ile 9 OO 905 910 Val Ser Val Asp Ala His Asp Gly Val Val Asp Lieu Val Ala Asp Gly 915 920 925 Phe Leu Trp Ala Asp Ser Leu Arg Val Tyr Ser Val Ser Asn. Ile Arg 930 935 940 Val Arg Ile Ala Ser Gly Glu Ala Pro Ala Ala Ala Ser Ser Ala Ala 945 950 955 96.O Ser Val Gly Ser Ser Ala Ser Ser Val Glu Arg Thr Arg Ser Ser Pro 965 970 975 Ala Val Ala Ser Gly Pro Ala Glin Thir Ile Asp Leu Lys Glin Lieu Lys 98O 985 99 O Thr Glu Lieu Lieu Glu Lieu. Asp Ala Pro Leu Tyr Lieu Ser Glin Asp Pro 995 10 OO 1005 Thir Ser Gly Glin Lieu Lys Lys His Thr Asp Wall Ala Ser Gly Glin O 10 O15 O20 Ala Thir Ile Val Glin Pro Cys Thr Lieu Gly Asp Leu Gly Asp Arg O25 O3O O35 Ser Phe Met Glu Thr Tyr Gly Val Val Ala Pro Leu Tyr Thr Gly

Ala Met Ala Lys Gly Ile Ala Ser Ala Asp Leu Val Ile Ala Ala

Gly Lys Arg Lys Ile Leu Gly Ser Phe Gly Ala Gly Gly Lieu Pro

Met His His Val Arg Ala Ala Leu Glu Lys Ile Glin Ala Ala Lieu

Pro Glin Gly Pro Tyr Ala Val Asn Leu Ile His Ser Pro Phe Asp

Ser Asn Lieu Glu Lys Gly Asn. Wall Asp Leu Phe Leu Glu Lys Gly US 2008/0026440 A1 Jan. 31, 2008 77

-continued

Wall Thr Wal Wall Glu Ala Ser Ala Phe Met Thr Lieu. Thr Pro Glin 130 135 14 O Val Val Arg Tyr Arg Ala Ala Gly Lieu Ser Arg Asn Ala Asp Gly 145 15 O 155 Ser Val Asn. Ile Arg Asn Arg Ile Ile Gly Lys Wal Ser Arg Thr 160 1.65 17 O Glu Lieu Ala Glu Met Phe Ile Arg Pro Ala Pro Glu His Leu Lieu

Glu Lys Lieu. Ile Ala Ser Gly Glu Ile Thr Glin Glu Glin Ala Glu

Leu Ala Arg Arg Val Pro Val Ala Asp Asp Ile Ala Val Glu Ala

Asp Ser Gly Gly. His Thr Asp Asn Arg Pro Ile His Val Ile Lieu

Pro Lieu. Ile Ile Asn Lieu Arg Asn Arg Lieu. His Arg Glu Cys Gly

Tyr Pro Ala His Lieu Arg Val Arg Val Gly Ala Gly Gly Gly Val

Gly Cys Pro Glin Ala Ala Ala Ala Ala Lieu. Thir Met Gly Ala Ala

Phe Ile Val Thr Gly Thr Val Asin Glin Val Ala Lys Glin Ser Gly

Thr Cys Asp Asn Val Arg Lys Glin Leu Ser Glin Ala Thr Tyr Ser

Asp Ile Cys Met Ala Pro Ala Ala Asp Met Phe Glu Glu Gly Val

Lys Lieu Glin Val Lieu Lys Lys Gly Thr Met Phe Pro Ser Arg Ala

Asn Lys Lieu. Tyr Glu Lieu Phe Cys Lys Tyr Asp Ser Phe Asp Ser

Met Pro Pro Ala Glu Lieu Glu Arg Ile Glu Lys Arg Ile Phe Lys

Arg Ala Leu Glin Glu Val Trp Glu Glu Thir Lys Asp Phe Tyr Ile

Asn Gly Lieu Lys Asn Pro Glu Lys Ile Glin Arg Ala Glu. His Asp

Pro Llys Lieu Lys Met Ser Lieu. Cys Phe Arg Trp Tyr Lieu Gly Lieu 400 405 410 Ala Ser Arg Trp Ala Asn Met Gly Ala Pro Asp Arg Val Met Asp 415 420 425 Tyr Glin Val Trp Cys Gly Pro Ala Ile Gly Ala Phe Asn Asp Phe 430 435 4 40 Ile Lys Gly Thr Tyr Leu Asp Pro Ala Val Ser Asn Glu Tyr Pro 4 45 450 455 Cys Val Val Glin Ile Asn Lieu Glin Ile Leu Arg Gly Ala Cys Tyr 460 465 470 Leu Arg Arg Lieu. Asn Ala Lieu Arg Asn Asp Pro Arg Ile Asp Lieu 475 480 485 Glu Thr Glu Asp Ala Ala Phe Val Tyr Glu Pro Thr Asn Ala Leu 490 495 5 OO

US 2008/0026440 A1 Jan. 31, 2008 80

-continued

Arg Glu Ser Trp Glu Thir Ile Arg Ala Gly Ile Asp Cys Lieu Ser Asp 35 40 45 Leu Pro Glu Asp Arg Val Asp Val Thr Ala Tyr Phe Asp Pro Wall Lys 50 55 60 Thir Thr Lys Asp Lys Ile Tyr Cys Lys Arg Gly Gly Phe Ile Pro Glu 65 70 75 8O Tyr Asp Phe Asp Ala Arg Glu Phe Gly Lieu. Asn Met Phe Glin Met Glu 85 90 95 Asp Ser Asp Ala Asn Glin Thr Ile Ser Lieu Lleu Lys Wall Lys Glu Ala 100 105 110 Leu Glin Asp Ala Gly Ile Asp Ala Leu Gly Lys Glu Lys Lys Asn. Ile 115 120 125 Gly Cys Val Lieu Gly Ile Gly Gly Gly Glin Lys Ser Ser His Glu Phe 130 135 1 4 0 Tyr Ser Arg Lieu. Asn Tyr Val Val Val Glu Lys Val Lieu Arg Lys Met 145 15 O 55 160 Gly Met Pro Glu Glu Asp Wall Lys Val Ala Val Glu Lys Tyr Lys Ala 1.65 170 175 Asn Phe Pro Glu Trp Arg Lieu. Asp Ser Phe Pro Gly Phe Leu Gly Asn 18O 185 19 O Val Thr Ala Gly Arg Cys Thr Asn Thr Phe Asn Leu Asp Gly Met Asn 195 200 2O5 Cys Val Val Asp Ala Ala Cys Ala Ser Ser Lieu. Ile Ala Wall Lys Wal 210 215 220 Ala Ile Asp Glu Lieu Lleu Tyr Gly Asp Cys Asp Met Met Val Thr Gly 225 230 235 240 Ala Thr Cys Thr Asp Asn Ser Ile Gly Met Tyr Met Ala Phe Ser Lys 245 250 255 Thr Pro Val Phe Ser Thr Asp Pro Ser Val Arg Ala Tyr Asp Glu Lys 260 265 27 O Thr Lys Gly Met Lieu. Ile Gly Glu Gly Ser Ala Met Lieu Val Lieu Lys 275 280 285 Arg Tyr Ala Asp Ala Val Arg Asp Gly Asp Glu Ile His Ala Val Ile 29 O 295 3OO Arg Gly Cys Ala Ser Ser Ser Asp Gly Lys Ala Ala Gly Ile Tyr Thr 305 310 315 320 Pro Thir Ile Ser Gly Glin Glu Glu Ala Lieu Arg Arg Ala Tyr Asn Arg 325 330 335 Ala Cys Val Asp Pro Ala Thr Val Thr Leu Val Glu Gly His Gly Thr 340 345 35 O Gly Thr Pro Val Gly Asp Arg Ile Glu Lieu. Thr Ala Lieu Arg Asn Lieu 355 360 365 Phe Asp Lys Ala Tyr Gly Glu Gly Asn Thr Glu Lys Val Ala Val Gly 370 375 38O Ser Ile Lys Ser Ser Ile Gly His Lieu Lys Ala Val Ala Gly Lieu Ala 385 390 395 400 Gly Met Ile Llys Val Ile Met Ala Lieu Lys His Lys Thr Lieu Pro Gly 405 410 415 Thir Ile Asin Val Asp Asn Pro Pro Asn Leu Tyr Asp Asn Thr Pro Ile 420 425 43 O US 2008/0026440 A1 Jan. 31, 2008 81

-continued Asn Glu Ser Ser Leu Tyr Ile Asn Thr Met Asn Arg Pro Trp Phe Pro 435 4 40 4 45 Pro Pro Gly Val Pro Arg Arg Ala Gly Ile Ser Ser Phe Gly Phe Gly 450 455 460 Gly Ala Asn Tyr His Ala Wall Leu Glu Glu Ala Glu Pro Glu His Thr 465 470 475 480 Thr Ala Tyr Arg Leu Asn Lys Arg Pro Glin Pro Val Leu Met Met Ala 485 490 495

Ala Thr Pro Ala 5 OO

<210 SEQ ID NO 9 &2 11s LENGTH 1278 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (1278) <400 SEQUENCE: 9 gat gtc acc aag gag goc tog cqc ctic coc cqc gag ggc gtc. agc titc 48 Asp Val Thir Lys Glu Ala Trp Arg Lieu Pro Arg Glu Gly Val Ser Phe 1 5 10 15 cgc goc aag ggc atc gcc acc aac ggc got gito goc gog ctic titc. tcc 96 Arg Ala Lys Gly Ile Ala Thr Asn Gly Ala Wall Ala Ala Lieu Phe Ser 2O 25 30 ggc cag ggc gog cag tac acg cac atg titt agc gag gtg goc atgaac 144 Gly Glin Gly Ala Glin Tyr Thr His Met Phe Ser Glu Val Ala Met Asn 35 40 45 tgg ccc cag titc cqC cag agc att goc goc atg gac goc goc cag to c 192 Trp Pro Glin Phe Arg Glin Ser Ile Ala Ala Met Asp Ala Ala Glin Ser 50 55 60 aag gtc gct gga agc gac aag gac titt gag cqc gtc. tcc cag gtc. citc 240 Lys Wall Ala Gly Ser Asp Lys Asp Phe Glu Arg Val Ser Glin Val Lieu 65 70 75 8O tac cog cqc aag ccg tac gag cqt gag coc gag cag aac coc aag aag 288 Tyr Pro Arg Llys Pro Tyr Glu Arg Glu Pro Glu Glin Asn Pro Llys Lys 85 90 95 atc toc citc acc gcc tac tog cag coc tog acc citg gcc toc got citc 336 Ile Ser Leu Thr Ala Tyr Ser Glin Pro Ser Thr Leu Ala Cys Ala Leu 100 105 110 ggit gcc titt gag atc titc aag gag goc ggc titc acc ccg gac titt go c 384 Gly Ala Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala 115 120 125 gcc ggc cat tog citc ggit gag titc gcc goc citc tac goc gog ggc toc 432 Ala Gly His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Cys 130 135 1 4 0 gto gac cqc gac gag citc titt gag citt gtc toc cqc cqc goc cqc atc 480 Val Asp Arg Asp Glu Lieu Phe Glu Lieu Val Cys Arg Arg Ala Arg Ile 145 15 O 155 160 atg ggc ggC aag gaC gca CC g gcc acc CCC aag gga togC at g gCC gCC 528 Met Gly Gly Lys Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala 1.65 170 175 gto att ggc ccc aac goc gag aac atc aag gito cag goc goc aac gtc 576 Val Ile Gly Pro Asn Ala Glu Asn. Ile Llys Val Glin Ala Ala Asn Val 18O 185 19 O tgg citc ggc aac toc aac tog cott tog cag acc gtc atc acc ggc to c 624 Trp Leu Gly Asn Ser Asn Ser Pro Ser Glin Thr Val Ile Thr Gly Ser US 2008/0026440 A1 Jan. 31, 2008 82

-continued

195 200 2O5 gto: gaa ggt atc cag goc gag agc gcc cgc citc cag aag gag ggc titc 672 Val Glu Gly Ile Glin Ala Glu Ser Ala Arg Lieu Gln Lys Glu Gly Phe 210 215 220 cgc gtc gtg cct citt gcc toc gag agc goc titc. cac tog ccc cag atg 720 Arg Val Val Pro Leu Ala Cys Glu Ser Ala Phe His Ser Pro Glin Met 225 230 235 240 gag aac goc to g tog goc titc aag gac gtc atc. tcc aag gtc. tcc titc 768 Glu Asn Ala Ser Ser Ala Phe Lys Asp Wal Ile Ser Lys Val Ser Phe 245 250 255 cgc acc ccc aag goc gag acc aag citc titc agc aac gtc. tct ggc gag 816 Arg Thr Pro Lys Ala Glu Thir Lys Leu Phe Ser Asn Val Ser Gly Glu 260 265 27 O acc tac coc acg gac goc cqc gag at g citt acg cag cac at g acc agc 864 Thr Tyr Pro Thr Asp Ala Arg Glu Met Leu Thr Gln His Met Thr Ser 275 280 285 agc gtc. aag titc. citc acc cag gtc. c.gc aac atg cac cag goc ggit gcg 912 Ser Val Lys Phe Leu Thr Glin Val Arg Asn Met His Glin Ala Gly Ala 29 O 295 3OO cgc atc titt gtc. gag titc gga ccc aag cag gtg citc. tcc aag citt gtc 96.O Arg Ile Phe Val Glu Phe Gly Pro Lys Glin Val Leu Ser Lys Leu Val 305 310 315 320 to C gag acc ctic aag gat gac coc to g gtt gtc. acc gito tct gtc. aac OO 8 Ser Glu Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn 325 330 335 ccg goc to g g g c acg gat tog gac atc cag citc cqc gac gog goc gtc O56 Pro Ala Ser Gly Thr Asp Ser Asp Ile Glin Leu Arg Asp Ala Ala Wal 340 345 35 O cag citc gtt gtc. got goc gto: aac citt cag ggc titt gac aag togg gac 104 Glin Leu Val Val Ala Gly Val Asn Lieu Glin Gly Phe Asp Llys Trp Asp 355 360 365 gcc ccc gat gcc acc cqc atg cag goc atc aag aag aag cqc act acc 152 Ala Pro Asp Ala Thr Arg Met Glin Ala Ile Lys Lys Lys Arg Thir Thr 370 375 38O citc cqc citt to g g cc gcc acc tac gtc. tcg gac aag acc aag aag gtc 200 Leu Arg Lieu Ser Ala Ala Thr Tyr Val Ser Asp Lys Thr Lys Llys Val 385 390 395 400 cgc gac goc goc atgaac gat ggc cqc toc gito acc tac citc aag ggc 248 Arg Asp Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Lieu Lys Gly 405 410 415 gcc goa cog citc atc aag goc cog gag coc 278 Ala Ala Pro Lieu. Ile Lys Ala Pro Glu Pro 420 425

<210> SEQ ID NO 10 <211& LENGTH 426 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 10 Asp Val Thir Lys Glu Ala Trp Arg Lieu Pro Arg Glu Gly Val Ser Phe 1 5 10 15 Arg Ala Lys Gly Ile Ala Thr Asn Gly Ala Wall Ala Ala Lieu Phe Ser 2O 25 30 Gly Glin Gly Ala Glin Tyr Thr His Met Phe Ser Glu Val Ala Met Asn 35 40 45 Trp Pro Glin Phe Arg Glin Ser Ile Ala Ala Met Asp Ala Ala Glin Ser US 2008/0026440 A1 Jan. 31, 2008 83

-continued

50 55 60 Lys Wall Ala Gly Ser Asp Lys Asp Phe Glu Arg Val Ser Glin Val Lieu 65 70 75 8O Tyr Pro Arg Llys Pro Tyr Glu Arg Glu Pro Glu Glin Asn Pro Llys Lys 85 90 95 Ile Ser Leu Thr Ala Tyr Ser Glin Pro Ser Thr Leu Ala Cys Ala Leu 100 105 110 Gly Ala Phe Glu Ile Phe Lys Glu Ala Gly Phe Thr Pro Asp Phe Ala 115 120 125 Ala Gly His Ser Leu Gly Glu Phe Ala Ala Leu Tyr Ala Ala Gly Cys 130 135 1 4 0 Val Asp Arg Asp Glu Lieu Phe Glu Lieu Val Cys Arg Arg Ala Arg Ile 145 15 O 155 160 Met Gly Gly Lys Asp Ala Pro Ala Thr Pro Lys Gly Cys Met Ala Ala 1.65 170 175 Val Ile Gly Pro Asn Ala Glu Asn. Ile Llys Val Glin Ala Ala Asn Val 18O 185 19 O Trp Leu Gly Asn Ser Asn Ser Pro Ser Glin Thr Val Ile Thr Gly Ser 195 200 2O5 Val Glu Gly Ile Glin Ala Glu Ser Ala Arg Lieu Gln Lys Glu Gly Phe 210 215 220 Arg Val Val Pro Leu Ala Cys Glu Ser Ala Phe His Ser Pro Gln Met 225 230 235 240 Glu Asn Ala Ser Ser Ala Phe Lys Asp Wal Ile Ser Lys Val Ser Phe 245 250 255 Arg Thr Pro Lys Ala Glu Thir Lys Leu Phe Ser Asn Val Ser Gly Glu 260 265 27 O Thr Tyr Pro Thr Asp Ala Arg Glu Met Leu Thr Gln His Met Thr Ser 275 280 285 Ser Val Lys Phe Leu Thr Glin Val Arg Asn Met His Glin Ala Gly Ala 29 O 295 3OO Arg Ile Phe Val Glu Phe Gly Pro Lys Glin Val Leu Ser Lys Leu Val 305 310 315 320 Ser Glu Thr Leu Lys Asp Asp Pro Ser Val Val Thr Val Ser Val Asn 325 330 335 Pro Ala Ser Gly Thr Asp Ser Asp Ile Glin Leu Arg Asp Ala Ala Wal 340 345 35 O Glin Leu Val Val Ala Gly Val Asn Lieu Glin Gly Phe Asp Llys Trp Asp 355 360 365 Ala Pro Asp Ala Thr Arg Met Glin Ala Ile Lys Lys Lys Arg Thir Thr 370 375 38O Leu Arg Lieu Ser Ala Ala Thr Tyr Val Ser Asp Lys Thr Lys Llys Val 385 390 395 400 Arg Asp Ala Ala Met Asn Asp Gly Arg Cys Val Thr Tyr Lieu Lys Gly 405 410 415 Ala Ala Pro Lieu. Ile Lys Ala Pro Glu Pro 420 425

<210> SEQ ID NO 11 &2 11s LENGTH 5 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. US 2008/0026440 A1 Jan. 31, 2008 84

-continued

&220s FEATURE <221 NAME/KEY: MISC FEATURE <222> LOCATION: (4) ... (4) <223> OTHER INFORMATION: X = any amino acid <400 SEQUENCE: 11 Gly His Ser Xaa Gly 1 5

<210> SEQ ID NO 12 &2 11s LENGTH 258 &212> TYPE DNA <213> ORGANISM: Schizochytrium sp. &220s FEATURE <221 NAME/KEY: CDS <222> LOCATION: (1) . . (258) <400 SEQUENCE: 12 gct gtc. tcg aac gag citt citt gag aag goc gag act gtc gtc at g gag 48 Ala Val Ser Asn. Glu Lieu Lleu Glu Lys Ala Glu Thr Val Val Met Glu 1 5 10 15 gto: ctic goc goc aag acc ggc tac gag acc gac atg atc gag got gac 96 Val Lieu Ala Ala Lys Thr Gly Tyr Glu Thir Asp Met Ile Glu Ala Asp 2O 25 30 atg gag ctic gag acc gag citc ggc att gac toc atc aag cqt gtc gag 144 Met Glu Lieu Glu Thr Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val Glu 35 40 45 atc ctic toc gag gtc. cag goc at g citc aat gito gag goc aag gat gtc 192 Ile Leu Ser Glu Val Glin Ala Met Lieu. Asn Val Glu Ala Lys Asp Wall 50 55 60 gat goc ctic agc cqc act c go act gtt ggit gag gtt gtc. aac goc atg 240 Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asn Ala Met 65 70 75 8O aag gCC gag atc gct ggC 258

85

<210> SEQ ID NO 13 &2 11s LENGTH: 86 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 13 Ala Val Ser Asn. Glu Lieu Lleu Glu Lys Ala Glu Thr Val Val Met Glu 1 5 10 15 Val Lieu Ala Ala Lys Thr Gly Tyr Glu Thir Asp Met Ile Glu Ala Asp 2O 25 30 Met Glu Lieu Glu Thr Glu Lieu Gly Ile Asp Ser Ile Lys Arg Val Glu 35 40 45 Ile Leu Ser Glu Val Glin Ala Met Lieu. Asn Val Glu Ala Lys Asp Wall 50 55 60 Asp Ala Leu Ser Arg Thr Arg Thr Val Gly Glu Val Val Asn Ala Met 65 70 75 8O Lys Ala Glu Ile Ala Gly 85

<210> SEQ ID NO 14 &2 11s LENGTH 5 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp.

US 2008/0026440 A1 Jan. 31, 2008 89

-continued cag cac cqc cog gtc. cc.g. cag gac aag coc titc tac att acc ctic cqc 2016 Gln His Arg Pro Val Pro Gln Asp Llys Pro Phe Tyr Ile Thr Leu Arg 660 665 67 O to c aac cag to g g g c gqt cac toc cag cac aag cac goc citt cag titc 2O64 Ser Asn Glin Ser Gly Gly His Ser Glin His Lys His Ala Leu Glin Phe 675 680 685 cac aac gag cag g g c gat citc titc att gat gito cag got to g g to atc 2112 His Asn. Glu Glin Gly Asp Leu Phe Ile Asp Val Glin Ala Ser Val Ile 69 O. 695 7 OO gcc acg gac agc citt gcc titc 21.33 Ala Thr Asp Ser Leu Ala Phe 705 710

<210> SEQ ID NO 18 &2 11s LENGTH 711 &212> TYPE PRT <213> ORGANISM: Schizochytrium sp. <400 SEQUENCE: 18 Phe Gly Ala Leu Gly Gly Phe Ile Ser Glin Glin Ala Glu Arg Phe Glu 1 5 10 15 Pro Ala Glu Ile Leu Gly Phe Thr Lieu Met Cys Ala Lys Phe Ala Lys 2O 25 30 Ala Ser Lieu. Cys Thr Ala Val Ala Gly Gly Arg Pro Ala Phe Ile Gly 35 40 45 Val Ala Arg Lieu. Asp Gly Arg Lieu Gly Phe Thr Ser Glin Gly. Thir Ser 50 55 60 Asp Ala Lieu Lys Arg Ala Glin Arg Gly Ala Ile Phe Gly Lieu. Cys Lys 65 70 75 8O Thir Ile Gly Leu Glu Trp Ser Glu Ser Asp Val Phe Ser Arg Gly Val 85 90 95 Asp Ile Ala Glin Gly Met His Pro Glu Asp Ala Ala Wall Ala Ile Val 100 105 110 Arg Glu Met Ala Cys Ala Asp Ile Arg Ile Arg Glu Val Gly Ile Gly 115 120 125 Ala Asn Glin Glin Arg Cys Thr Ile Arg Ala Ala Lys Lieu Glu Thr Gly 130 135 1 4 0 Asn Pro Glin Arg Glin Ile Ala Lys Asp Asp Wall Leu Lleu Val Ser Gly 145 15 O 155 160 Gly Ala Arg Gly Ile Thr Pro Lieu. Cys Ile Arg Glu Ile Thr Arg Glin 1.65 170 175 Ile Ala Gly Gly Lys Tyr Ile Leu Lieu Gly Arg Ser Lys Val Ser Ala 18O 185 19 O Ser Glu Pro Ala Trp Cys Ala Gly Ile Thr Asp Glu Lys Ala Val Glin 195 200 2O5 Lys Ala Ala Thr Glin Glu Lieu Lys Arg Ala Phe Ser Ala Gly Glu Gly 210 215 220 Pro Llys Pro Thr Pro Arg Ala Val Thr Lys Leu Val Gly Ser Val Leu 225 230 235 240 Gly Ala Arg Glu Val Arg Ser Ser Ile Ala Ala Ile Glu Ala Leu Gly 245 250 255 Gly Lys Ala Ile Tyr Ser Ser Cys Asp Wall Asn. Ser Ala Ala Asp Wal 260 265 27 O US 2008/0026440 A1 Jan. 31, 2008 90

-continued Ala Lys Ala Val Arg Asp Ala Glu Ser Glin Leu Gly Ala Arg Val Ser 275 280 285 Gly Ile Val His Ala Ser Gly Val Lieu Arg Asp Arg Lieu. Ile Glu Lys 29 O 295 3OO Lys Leu Pro Asp Glu Phe Asp Ala Val Phe Gly Thr Lys Val Thr Gly 305 310 315 320 Leu Glu Asn Lieu Lleu Ala Ala Val Asp Arg Ala Asn Lieu Lys His Met 325 330 335 Val Leu Phe Ser Ser Leu Ala Gly Phe His Gly Asn Val Gly Glin Ser 340 345 35 O Asp Tyr Ala Met Ala Asn. Glu Ala Lieu. Asn Lys Met Gly Lieu Glu Lieu 355 360 365 Ala Lys Asp Val Ser Wall Lys Ser Ile Cys Phe Gly Pro Trp Asp Gly 370 375 38O Gly Met Val Thr Pro Gln Leu Lys Lys Glin Phe Glin Glu Met Gly Val 385 390 395 400 Glin Ile Ile Pro Arg Glu Gly Gly Ala Asp Thr Val Ala Arg Ile Val 405 410 415 Leu Gly Ser Ser Pro Ala Glu Ile Leu Val Gly Asn Trp Arg Thr Pro 420 425 43 O Ser Lys Llys Val Gly Ser Asp Thir Ile Thr Lieu. His Arg Lys Ile Ser 435 4 40 4 45 Ala Lys Ser Asn Pro Phe Leu Glu Asp His Val Ile Glin Gly Arg Arg 450 455 460 Val Leu Pro Met Thr Leu Ala Ile Gly Ser Leu Ala Glu Thr Cys Leu 465 470 475 480 Gly Lieu Phe Pro Gly Tyr Ser Leu Trp Ala Ile Asp Asp Ala Glin Lieu 485 490 495 Phe Lys Gly Val Thr Val Asp Gly Asp Val Asn Cys Glu Val Thr Leu 5 OO 505 51O. Thr Pro Ser Thr Ala Pro Ser Gly Arg Val Asn Val Glin Ala Thr Leu 515 52O 525 Lys Thr Phe Ser Ser Gly Lys Leu Val Pro Ala Tyr Arg Ala Val Ile 530 535 540 Val Leu Ser Asn Gln Gly Ala Pro Pro Ala Asn Ala Thr Met Glin Pro 545 550 555 560 Pro Ser Lieu. Asp Ala Asp Pro Ala Leu Glin Gly Ser Val Tyr Asp Gly 565 570 575 Lys Thr Lieu Phe His Gly Pro Ala Phe Arg Gly Ile Asp Asp Val Lieu 58O 585 59 O Ser Cys Thr Lys Ser Glin Lieu Val Ala Lys Cys Ser Ala Val Pro Gly 595 600 605 Ser Asp Ala Ala Arg Gly Glu Phe Ala Thr Asp Thr Asp Ala His Asp 610 615 62O Pro Phe Val Asn Asp Leu Ala Phe Glin Ala Met Leu Val Trp Val Arg 625 630 635 640 Arg Thr Lieu Gly Glin Ala Ala Lieu Pro Asn. Ser Ile Glin Arg Ile Val 645 650 655 Gln His Arg Pro Val Pro Gln Asp Llys Pro Phe Tyr Ile Thr Leu Arg 660 665 67 O Ser Asn Glin Ser Gly Gly His Ser Glin His Lys His Ala Leu Glin Phe

US 2008/0026440 A1 Jan. 31, 2008 93

-continued Met Ala Ala Arg Asn. Wal Ser Ala Ala His Glu Met His Asp Glu Lys 1 5 10 15 Arg Ile Ala Val Val Gly Met Ala Val Glin Tyr Ala Gly Cys Lys Thr 2O 25 30 Lys Asp Glu Phe Trp Glu Val Lieu Met Asn Gly Lys Val Glu Ser Lys 35 40 45 Val Ile Ser Asp Lys Arg Lieu Gly Ser Asn Tyr Arg Ala Glu His Tyr 50 55 60 Lys Ala Glu Arg Ser Lys Tyr Ala Asp Thr Phe Cys Asn. Glu Thir Tyr 65 70 75 8O Gly Thr Lieu. Asp Glu Asn. Glu Ile Asp Asn. Glu His Glu Lieu Lleu Lieu 85 90 95 Asn Lieu Ala Lys Glin Ala Leu Ala Glu Thir Ser Wall Lys Asp Ser Thr 100 105 110 Arg Cys Gly Ile Val Ser Gly Cys Lieu Ser Phe Pro Met Asp Asn Lieu 115 120 125 Glin Gly Glu Lieu Lleu. Asn Val Tyr Glin Asn His Val Glu Lys Lys Lieu 130 135 1 4 0 Gly Ala Arg Val Phe Lys Asp Ala Ser His Trp Ser Glu Arg Glu Glin 145 15 O 155 160 Ser Asn Lys Pro Glu Ala Gly Asp Arg Arg Ile Phe Met Asp Pro Ala 1.65 170 175 Ser Phe Val Ala Glu Glu Lieu. Asn Lieu Gly Ala Lieu. His Tyr Ser Val 18O 185 19 O Asp Ala Ala Cys Ala Thr Ala Leu Tyr Val Lieu Arg Lieu Ala Glin Asp 195 200 2O5 His Lieu Val Ser Gly Ala Ala Asp Wal Met Lieu. Cys Gly Ala Thr Cys 210 215 220 Leu Pro Glu Pro Phe Phe Ile Leu Ser Gly Phe Ser Thr Phe Glin Ala 225 230 235 240 Met Pro Val Gly Thr Gly Glin Asn Val Ser Met Pro Leu. His Lys Asp 245 250 255 Ser Glin Gly Leu Thr Pro Gly Glu Gly Gly Ser Ile Met Val Leu Lys 260 265 27 O Arg Lieu. Asp Asp Ala Ile Arg Asp Gly Asp His Ile Tyr Gly. Thir Lieu 275 280 285 Leu Gly Ala Asn. Wal Ser Asn. Ser Gly Thr Gly Lieu Pro Leu Lys Pro 29 O 295 3OO Leu Lleu Pro Ser Glu Lys Lys Cys Lieu Met Asp Thr Tyr Thr Arg Ile 305 310 315 320 Asn Val His Pro His Lys Ile Glin Tyr Val Glu Cys His Ala Thr Gly 325 330 335 Thr Pro Glin Gly Asp Arg Val Glu Ile Asp Ala Wall Lys Ala Cys Phe 340 345 35 O Glu Gly Lys Val Pro Arg Phe Gly Thr Thr Lys Gly Asn Phe Gly. His 355 360 365 Thr Xaa Xaa Ala Ala Gly Phe Ala Gly Met Cys Lys Val Lieu Lleu Ser 370 375 38O Met Lys His Gly Ile Ile Pro Pro Thr Pro Gly Ile Asp Asp Glu Thr 385 390 395 400 Lys Met Asp Pro Leu Val Val Ser Gly Glu Ala Ile Pro Trp Pro Glu

US 2008/0026440 A1 Jan. 31, 2008 96

-continued

50 55 60 Lys Ala Thr Pro His Gly Cys Tyr Ile Glu Asp Val Glu Val Asp Phe 65 70 75 8O Glin Arg Leu Arg Thr Pro Met Thr Pro Glu Asp Met Leu Leu Pro Glin 85 90 95 Glin Leu Lieu Ala Val Thr Thr Ile Asp Arg Ala Ile Lieu. Asp Ser Gly 100 105 110 Met Lys Lys Gly Gly Asn Val Ala Val Phe Val Gly Lieu Gly. Thir Asp 115 120 125 Leu Glu Lieu. Tyr Arg His Arg Ala Arg Val Ala Leu Lys Glu Arg Val 130 135 1 4 0 Arg Pro Glu Ala Ser Lys Lys Lieu. Asn Asp Met Met Glin Tyr Ile Asn 145 15 O 155 160 Asp Cys Gly. Thir Ser Thr Ser Tyr Thr Ser Tyr Ile Gly Asn Leu Val 1.65 170 175 Ala Thr Arg Val Ser Ser Gln Trp Gly Phe Thr Gly Pro Ser Phe Thr 18O 185 19 O Ile Thr Glu Gly Asn. Asn. Ser Val Tyr Arg Cys Ala Glu Lieu Gly Lys 195 200 2O5 Tyr Leu Leu Glu Thr Gly Glu Val Asp Gly Val Val Val Ala Gly Val 210 215 220 Asp Leu. Cys Gly Ser Ala Glu Asn Leu Tyr Val Lys Ser Arg Arg Phe 225 230 235 240 Lys Val Ser Thr Ser Asp Thr Pro Arg Ala Ser Phe Asp Ala Ala Ala 245 250 255 Asp Gly Tyr Phe Val Gly Glu Gly Cys Gly Ala Phe Val Lieu Lys Arg 260 265 27 O Glu Thir Ser Cys Thr Lys Asp Asp Arg Ile Tyr Ala Cys Met Asp Ala 275 280 285 Ile Val Pro Gly Asn Val Pro Ser Ala Cys Lieu Arg Glu Ala Lieu. Asp 29 O 295 3OO Glin Ala Arg Val Lys Pro Gly Asp Ile Glu Met Leu Glu Lieu Ser Ala 305 310 315 320 Asp Ser Ala Arg His Lieu Lys Asp Pro Ser Val Lieu Pro Lys Glu Lieu 325 330 335 Thr Ala Glu Glu Glu Ile Gly Gly Lieu Glin Thr Ile Lieu Arg Asp Asp 340 345 35 O Asp Llys Lieu Pro Arg Asn Val Ala Thr Gly Ser Wall Lys Ala Thr Val 355 360 365 Gly Asp Thr Gly Tyr Ala Ser Gly Ala Ala Ser Lieu. Ile Lys Ala Ala 370 375 38O Lieu. Cys Ile Tyr Asn Arg Tyr Lieu Pro Ser Asn Gly Asp Asp Trp Asp 385 390 395 400 Glu Pro Ala Pro Glu Ala Pro Trp Asp Ser Thr Leu Phe Ala Cys Glin 405 410 415 Thir Ser Arg Ala Trp Lieu Lys Asn Pro Gly Glu Arg Arg Tyr Ala Ala 420 425 43 O Val Ser Gly Val Ser Glu Thir Arg Ser 435 4 40

<210> SEQ ID NO 23