(19) United States
(12) Patent Application Publication
Heaps et al.
(10) Pub. No.: US 2012/0058535 A1
(43) Pub. Date: Mar. 8, 2012

(54) BIOFUEL PRODUCTION IN PROKARYOTES AND EUKARYOTES

(75) Inventors: Nicole A. Heaps, San Diego, CA (US); Craig A. Behnke, San Diego, CA (US); David Molina, San Diego, CA (US)

(73) Assignee: SAPPHIRE ENERGY, INC., San Diego, CA (US)

(21) Appl. No.: 13/255,888

(22) PCT Filed: Mar. 5, 2010

(86) PCT No.: PCT/US10/26445
§ 371 (c)(1), (2), (4) Date: Nov. 9, 2011

(30) Foreign Application Priority Data
Mar. 11, 2009 (US) 61159366

Publication Classification
(51) Int. Cl.: C12N 9/00 (2006.01); C12N 1/21 (2006.01); C12N 15/63 (2006.01); C12N 1/13 (2006.01); C12N 5/10 (2006.01); C07H 21/04 (2006.01); C12N 1/19 (2006.01)
(52) U.S. Cl.: 435/183; 536/23.1; 435/252.3; 435/254.2; 435/257.2; 435/419; 435/320.1

(57) ABSTRACT
Terpene synthases are enzymes that directly convert IPP and DMAPP to terpenes, such as fusicoccadiene. Described herein are methods and compositions for the production of terpenes and terpenoids for use as fuel molecules or other useful components. Genetically engineered enzymes capable of producing terpenes and terpenoids are also described.


FIG. 1: Scheme of terpene biosynthesis from IPP and DMAPP. GPP leads to monoterpenes. FPP leads to sesquiterpenes and, via squalene, to sterols. GGPP leads to terpenes such as fusicocca-2,10(14)-diene.
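The chain lengths in FIG. 1 follow from simple carbon bookkeeping. The Python sketch below is illustrative only. It assumes that each condensation adds one C5 IPP unit to the growing allylic diphosphate, starting from DMAPP. For each chain length it reports the idealized (C5H8)n formula of the parent terpene hydrocarbon. The function names are hypothetical and are not part of this application.

```python
# Illustrative carbon bookkeeping for the prenyl diphosphates named in FIG. 1.
# Assumption: each condensation adds one C5 IPP unit to the growing allylic
# diphosphate, starting from DMAPP (C5).

PRENYL_DIPHOSPHATES = ["DMAPP", "GPP", "FPP", "GGPP"]  # C5, C10, C15, C20
TERPENE_CLASS = {10: "monoterpenes", 15: "sesquiterpenes", 20: "diterpenes"}


def carbons_after(n_ipp: int) -> int:
    """Carbon count of the allylic diphosphate after n_ipp IPP condensations onto DMAPP."""
    return 5 * (1 + n_ipp)


def hydrocarbon_formula(carbons: int) -> str:
    """Idealized (C5H8)n formula for a terpene hydrocarbon with this carbon count."""
    n = carbons // 5
    return f"C{5 * n}H{8 * n}"


if __name__ == "__main__":
    for n_ipp, name in enumerate(PRENYL_DIPHOSPHATES):
        c = carbons_after(n_ipp)
        label = TERPENE_CLASS.get(c, "C5 starter unit")
        print(f"{name}: C{c}, {label}, parent hydrocarbon {hydrocarbon_formula(c)}")
```

For GGPP the rule gives C20H32, which is the molecular formula of fusicocca-2,10(14)-diene. The (C5H8)n rule is only a rule of thumb. Squalene, for example, is formed by reductive head-to-head coupling of two FPP units and is C30H50 rather than C30H48.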

FIG. 2: The DXP (MEP) pathway to IPP and DMAPP. The labeled intermediates are DXP, MEP, CDP-ME, CMEPP, and HDMAPP. The labeled enzymes include DXP synthase, CDP-ME synthase, HDMAPP synthase, and HDMAPP reductase; an NADPH-dependent step is also shown.

FIG. 3: Supply of isoprenoid precursors. Pyruvate and glyceraldehyde 3-phosphate (via deoxyxylulose-5-phosphate) and acetyl-CoA (via mevalonate) both feed the C5 diphosphate pool. GPP leads to monoterpenes, the C15 diphosphate to sesquiterpenes and triterpenes, and the C20 diphosphate to diterpenes and tetraterpenes.

FIG. 4: Representative terpene synthase products:
- Monoterpenes: (-)-limonene and (-)-α-pinene.
- From FPP: δ-cadinene, vetispiradiene, 5-epi-aristolochene, (E)-α-bisabolene, and δ-selinene.
- From GGPP: abietadiene, taxa-4(5),11(12)-diene, (-)-copalyl diphosphate, and ent-kaurene.

FIG. 5A: Gene organization of plant terpene synthases. The figure marks Roman-numeral positions (III to XIV), an N-terminal two-residue motif (RR, RP, RK, or TT), the DDXXD motif, the product class (C10, C15, or C20) where given, and a number in parentheses for each gene. The genes shown are:
- vetispiradiene synthase (A. thaliana, putative; 607)
- casbene synthase (R. communis, C20; 601)
- 5-epi-aristolochene synthase (A. thaliana, putative; 604)
- 5-epi-aristolochene synthase (N. tabacum, C15; 550)
- vetispiradiene synthase (H. muticus, C15; 555)
- δ-cadinene synthase (G. arboreum, C15; 556)
- δ-cadinene synthase (A. thaliana, putative; 600)
- limonene synthase B (A. thaliana, putative; 565)
- limonene synthases A1 and A2 (A. thaliana, putative; 563 and 602)
- limonene synthase (M. spicata, C10; 599)
- limonene synthase (P. frutescens, C10; 603)

FIG. 5B: The same representation for Abies grandis synthases:
- (-)-limonene synthase (C10; 637)
- (-)-pinene synthase (C10; 628)
- δ-selinene synthase 2 (C15; 525)
- δ-selinene synthase 1 (C15; 581)
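FIG. 5A and FIG. 5B mark the conserved aspartate-rich DDXXD motif in each synthase. As a hedged illustration, the sketch below locates such a motif in a one-letter protein sequence with a regular expression in which X is any letter. The toy sequence is invented for this example and is not one of the SEQ ID NOs of this application.

```python
import re

# Lookahead with a capture group, so overlapping candidates are all reported.
DDXXD = re.compile(r"(?=(DD[A-Z]{2}D))")


def find_ddxxd(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, motif) for each DDXXD match in a one-letter protein sequence."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in DDXXD.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from this application.
    print(find_ddxxd("MKLDDIYDAGRDDLLDQ"))  # [(4, 'DDIYD'), (12, 'DDLLD')]
```

A match is only a sequence pattern. It does not by itself show that a protein is a functional terpene synthase.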


FIG. 8: Fusicoccadiene and carotenoids in strains IS87, IS88, and IS89.

FIG. 9: Samples 1 to 6 shown with three wild-type (WT) samples.

FIG. 10A


FIG. 11

FIG. 13A: Extracted-ion chromatogram (ion 272.00, window 271.70 to 272.30) of sample 0036-IS88-1, plotted as abundance versus retention time.
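Panels titled "Ion 272.00 (271.70 to 272.30)" are extracted-ion chromatograms. For each scan, only the signal inside a 0.30-unit window around m/z 272.00 is summed. Panels titled "TIC" sum all the signal in each scan. Below is a minimal Python sketch of both. It assumes centroided scans held as plain lists; the data layout and function names are assumptions, and instrument software may integrate profile data differently.

```python
from typing import Sequence

# One centroided scan: (retention time in minutes, m/z values, intensities).
Scan = tuple[float, Sequence[float], Sequence[float]]


def extracted_ion_chromatogram(scans: list[Scan], target: float = 272.00,
                               half_width: float = 0.30) -> list[tuple[float, float]]:
    """Per scan, sum the intensity of peaks whose m/z lies within target +/- half_width."""
    lo, hi = target - half_width, target + half_width
    return [
        (rt, sum((i for mz, i in zip(mzs, ints) if lo <= mz <= hi), 0.0))
        for rt, mzs, ints in scans
    ]


def total_ion_chromatogram(scans: list[Scan]) -> list[tuple[float, float]]:
    """Per scan, sum all peak intensities (the 'TIC' traces)."""
    return [(rt, sum(ints, 0.0)) for rt, _, ints in scans]


if __name__ == "__main__":
    # Hypothetical scans around a peak; the values are invented for illustration.
    scans = [
        (8.30, [91.0, 257.1, 272.1], [120.0, 60.0, 40.0]),
        (8.35, [91.0, 271.9, 272.2], [90.0, 10.0, 5.0]),
        (8.40, [91.0], [30.0]),
    ]
    print(extracted_ion_chromatogram(scans))
    print(total_ion_chromatogram(scans))
```

Narrowing `half_width` makes the trace more selective for the target ion but more sensitive to mass calibration error.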

FIG. 13B: Mass spectrum (abundance versus m/z) of scan 1291 (12.489 min) from sample 0036-IS88-1.
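Fusicocca-2,10(14)-diene has the molecular formula C20H32. Assume the 272.00 extraction window is meant to track the molecular ion of such a C20H32 diterpene; this is an interpretation, not stated in the figure text. On that assumption, the arithmetic below shows why the nominal value is 272. The element masses are standard values, and the dictionary layout is only illustrative.

```python
# Element masses (u): monoisotopic 12C and 1H, and standard average atomic weights.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207}
AVERAGE = {"C": 12.011, "H": 1.008}


def formula_mass(formula: dict[str, int], masses: dict[str, float]) -> float:
    """Sum element masses weighted by atom counts."""
    return sum(masses[element] * count for element, count in formula.items())


FUSICOCCADIENE = {"C": 20, "H": 32}  # C20H32, fusicocca-2,10(14)-diene

if __name__ == "__main__":
    print(f"{formula_mass(FUSICOCCADIENE, MONOISOTOPIC):.4f}")  # 272.2504
    print(f"{formula_mass(FUSICOCCADIENE, AVERAGE):.2f}")       # 272.48
```

The m/z of a singly charged molecular ion differs from the neutral monoisotopic mass only by the electron mass (about 0.0005 u). That difference is negligible at the resolution of these plots.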

FIG. 13C: Extracted-ion chromatogram (ion 272.00, window 271.70 to 272.30) of sample 0036-BD11, plotted as abundance versus retention time.

FIG. 14A: Extracted-ion chromatogram (ion 272.00, window 271.70 to 272.30) of sample IS1181, plotted as abundance versus retention time.

FIG. 14B: Mass spectrum (abundance versus m/z) of scan 322 (8.365 min) from sample IS1181.

FIG. 14C: Total ion chromatogram (TIC) of the wild-type (WT) sample, plotted as abundance versus retention time.

FIG. 14D: Mass spectrum (abundance versus m/z) of scan 381 (8.283 min) from the wild-type (WT) sample.

FIG. 15A: Extracted-ion chromatogram (ion 272.00, window 271.70 to 272.30) of sample 1, plotted as abundance versus retention time.

FIG. 15B: Mass spectrum (abundance versus m/z) of scan 347 (7.914 min) from sample 1.

FIG. 15C: Total ion chromatogram (TIC) of sample C6, plotted as abundance versus retention time.

FIG. 16: Map of the vector pEarleyGate 104 (12.504 kb).

FIG. 17A: Extracted-ion chromatogram (ion 272.00, window 271.70 to 272.30) of sample steamdist9095, plotted as abundance versus retention time.

FIG. 17B: Mass spectrum (abundance versus m/z) of scan 258 (6.933 min) from sample steamdist9095.

BOFUEL PRODUCTION IN PROKARYOTES NO:21, SEQID NO:28, SEQID NO:34, or SEQID NO:39. AND EUKARYOTES 3. The isolated polynucleotide of claim 1 or claim 2, wherein the polynucleotide further comprises a nucleic acid which CROSS REFERENCE TO RELATED facilitates homologous recombination into a genome of the APPLICATION photosynthetic bacterium, yeast, alga, or vascular plant. 4. 0001. This application claims the benefit of U.S. Provi The isolated polynucleotide of claim 3, wherein the genome sional Application No. 61/159,366, filed Mar. 11, 2009, the is a chloroplast genome of the alga or the vascular plant. 5. entire contents of which are incorporated by reference for all The isolated polynucleotide of claim 3, wherein the genome purposes. is a nuclear genome of the yeast, the alga, or the vascular plant. 6. The isolated polynucleotide of claim 1, wherein the INCORPORATION BY REFERENCE photosynthetic bacterium is a member of genera Synechocys 0002 All publications, patents, patent applications, public tis, genera Synechococcus, or genera. A throspira. 7. The iso databases, public database entries, and other references cited lated polynucleotide of claim 1, wherein the photosynthetic in this application are herein incorporated by reference in bacterium is a cyanobacterium. 8. The isolated polynuc their entirety as if each individual publication, patent, patent teotide of claim 1, wherein the alga is a microalga. 9. The application, public database, public database entry, or other isolated polynucleotide of claim 1, wherein the alga is C. reference was specifically and individually indicated to be reinhardtii, D. sauna, H. pluvalis, S. dimorphus, D. viridis, D. incorporated by reference. tertiolecta., N. Oculata, or N. Satina. 
10, The isolated poly nucleotide of claim 1, wherein the alga is a cyanophyta, a prochlorophyta, a rhodophyta, a chlorophyta, a heterokonto BACKGROUND phyta, a tribophyta, a glaucophyta, a chlorarachniophyte, a 0003 Products, such as oil, petrochemicals, and other sub euglenophyta, a euglenoid, a haptophyta, a chrysophyta, a stances useful for the production of petrochemicals are cryptophyta, a cryptomonad, a dinophyta, a dinoflagellata, a increasingly in demand. Much of today's fuel products are pyrmnesiophyta, a bacillariophyta, a Xanthophyta, a eustig generated from fossil fuels, which are not considered renew matophyta, a raphidophyta, a phaeophyta, or a phytopiank able energy sources, as they are the result of organic material ton. 11. The isolated polynucleotide of claim 1, wherein the being covered by successive layers of sediment over the polynucleotide further comprises a nucleic acid encoding a course of millions of years. There is also a growing desire to tag for purification or detection. 12. The isolated polynucle lessen dependence on imported crude oil. Public awareness otide of claim 11, wherein the tag is a His-6 tag, a FLAG regarding pollution and environmental hazards has also epitope, a c-myc epitope, a Strep-TAGII, a biotin tag, a glu increased. As a result, there has been a growing interest and tathione 5- (GST), a chitin binding protein (CBP), need for alternative methods to produce fuel products. Thus, a maltose binding protein (MBP), or a metal affinity tag. 13. there exists a pressing need for alternative methods to develop The isolated polynucleotide of claim 1, wherein the poly fuel products that are renewable, Sustainable, and less harm nucleotide further comprises a nucleic acid encoding an ful to the environment. amino acid sequence of SEQID NO:3, SEQID NO: 12, SEQ 0004 Liquid fuels (gasoline, diesel, jet fuel, and kerosene, ID NO: 19, SEQ ID NO: 23, or SEQ ID NO: 29. 14. 
The for example) are primarily composed of mixtures of paraf isolated polynucleotide of claim 1, wherein the polynucle finic and aromatic hydrocarbons. Terpenes are a class of otide further comprises a nucleic acid encoding a selectable biologically produced molecules synthesized from five car marker. 15. The isolated polynucleotide of claim 14, wherein bon precursor molecules in a wide range of organisms. Ter the selectable marker is kanamycin, chloramphenicol, ampi penes are pure hydrocarbons, while terpenoids may contain cillin, or glufosinate. 16. A bacterial, yeast, alga, or vascular one or more oxygen atoms. Because terpenes are hydrocar plant cell comprising the isolated polynucleotide of any one bons with a low oxygen content and contain no nitrogen or of claims 1 to 15. other heteroatoms, terpenes can be used as fuel components 0008 17. An isolated polynucleotide capable of trans with minimal processing. forming a photosynthetic bacterium, a yeast, an alga, or a 0005 Examples of terpenes are fusicoccadiene, casbene, vascular plant, comprising a nucleic acid encoding a terpene ent-kaurene, taxadiene, and abietadiene. synthase comprising, (a) an amino acid sequence of SEQID 0006. Described herein are methods and compositions for NO: 2, SEQ ID NO: 10, SEQID NO: 16, SEQ ID NO: 22, the production of terpenes and terpenoids for use as fuel SEQID NO: 27, SEQ ID NO:33, SEQID NO:38, SEQ ID molecules or components. NO:45, SEQID NO:50, or SEQID NO:55; or (b) a homolog SUMMARY of the amino acid sequence of SEQIDNO: 2, SEQIDNO: 10, SEQID NO: 16, SEQ. ID NO:22, SEQID NO: 27, SEQ ID 0007 1. An isolated polynucleotide capable of transform NO:33, SEQID NO:38, SEQID NO:45, SEQID NO:50, or ing a photosynthetic bacterium, a yeast, an alga, or a vascular SEQ NO: 55. 18. The isolated polynucleotide of claim 17, plant, wherein the polynucleotide comprises a nucleic acid wherein the homolog has at least 50%, at least 60%, at least sequence of SEQID NO: 1, SEQ. 
ID NO:4, SEQ ID NO: 7, 70%, at least 75%, at least 80%, at least 85%, at least 90%, at SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 15, SEQID least 95%, at least 98%, or at least 99% sequence identity to NO: 17, SEQID NO: 21, SEQID NO: 26, SEQID NO: 28, the amino acid sequence of SEQID NO: 2, SEQID NO: 10, SEQID NO:32, SEQ ID NO:34, SEQID NO:37, SEQ ID SEQID NO: 16, SEQ ID NO: 22, SEQID NO: 27, SEQ ID NO:39, SEQID NO: 44, SEQID NO: 46, SEQID NO:49, NO:33, SEQID NO:38, SEQID NO:45, SEQID NO:50, or SEQID NO:51, SEQID NO: 54, or SEQID NO: 56. 2. The SEQID NO:55.19. The isolated polynucleotide of claim 17, isolated polynucleotide of claim 1, wherein the polynucle wherein the terpene synthase comprises the amino acid otide comprises a nucleic acid sequence of SEQ ID NO: 4, sequence of SEQID NO: 2. 20. The isolated polynucleotide SEQ ID NO: 7, SEQ ID NO: 11, SEQ ID NO: 17, SEQID of claim 17, wherein the photosynthetic bacterium is a mem US 2012/0058535 A1 Mar. 8, 2012 ber of genera Synechocystis, genera Synechococcus, or gen wherein the nucleic acid comprises the nucleotide sequence era Athrospira. 21. The isolated polynucleotide of claim 17, of SEQ ID. NO: 7. 40. The vector of claim 26, wherein the wherein the photosynthetic bacterium is a cyanobacterium. terpene is a diterpene. 41. The vector of claim 40, wherein the 22. The isolated polynucleotide of claim 17, wherein the alga diterpene is a cyclical diterpene. 42. The vector of claim 26, is a linicroalga. 23. The isolated polynucleotide of claim 17, wherein the terpene is a fusicoccadiene, a casbene, an entkau wherein the alga is C. reinhardtii, D. Salina, H. pluvalis, S. rene, a taxadiene, or an abietadiene. 43. The vector of claim dimorphus, D. viridis, D. tertiolecta, N. Oculata, or N. satina. 42, wherein the terpene is a fusicoccadiene. 44. The vector of 24. 
The isolated polynucleotide of claim 17, wherein the alga claim 43, wherein the fusicoccadiene is fusicocca-2,10(14)- is a cyanophyta, a prochlorophyta, a rhodophyta, a chloro diene. 45. The vector of claim 26, wherein the terpene syn phyta, a heterokontophyta, a tribophyta, a glaucophyta, a thase is a fusion terpene synthase. 46, The vector of 45, chiorarachniophyte, a etiglenophyta, a eugienoid, a hapto wherein the fusion terpene synthase comprises a portion of a phyta, a chrysophyta, a cryptophyta, a cryptomon.ad, a dino casbene synthase and a portion of a geranylgeranyi-diphos phyta, a dinoflagellata, a pyrmnesiophyta, a bacillariophyta, a phate (GGPP) synthase. 47. The vector of 46, wherein the Xanthophyta, a eustigmatophyta, a raphidophyta, a phaeo fusion terpene synthase comprises the amino acid sequence phyta, or a phytoplankton. 25. A bacterial, yeast, alga, or of SEQID NO: 22.48. The vector of any one of claims 26-47, vascular plant cell comprising the isolated polynucleotide of wherein the polynucteotide further comprises a promoter for any one of claims 17 to 24. expression in the photosynthetic bacterium, yeast, alga, or 0009. 26. A vector comprising a polynucleotide compris vascular plant. 49. The vector of claim 48, wherein the pro ing a nucleic acid encoding a terpene synthase, wherein the moter is a constitutive promoter. 50. The vector of claim 48, terpene synthase cyclyzes a terpene, and wherein the terpene wherein the promoter is an inducible promoter. 51. The vector synthase is capable of being expressed in a photosynthetic of claim 50, wherein the inducible promoter is a light induc bacterium, a yeast, an alga, or a vascular plant. 27. The vector ible promoter, a nitrate inducible promoter, or a heat respon of claim 26, wherein the nucleic acid is codon biased for sive promoter. 52. The vector of claim 48, wherein the pro expression in the photosynthetic bacterium, yeast, alga, or moter is T7, psbD, psdA, tufA, Itra, atp A, or tubulin. 53. 
The vascular plant. 28. The vector of claim 27, wherein the codon vector of claim 48, wherein the promoter is a chloroplast bias is hot codon bias. 29. The vector of claim 27, wherein the promoter. 54. The vector of claim 48, wherein the promoter is codon bias is regular codon bias. 30. The vector of claim 26, psbA, psbD, atp A, or tufA. 55. The vector of any one of wherein the terpene synthase is a diterpene synthase. 31. The claims 48 to 54, wherein the promoter is operably linked to vector of claim 30, wherein the diterpene synthase is a fusi the polynucleotide. 56. The vector of claim 26, wherein said coccadiene synthase, a kaurene synthase, a casbene synthase, vector further comprises a 5' regulatory region. 57. The vector a taxadiene synthase, an abietadiene synthase, or a homolog of claim 56, wherein said 5' regulatory region further com of any one of the above. 32. The vector of claim 31, wherein prises a promoter. 58. The vector of claim 57, wherein said the diterpene synthase is a fusicoccadiene synthase or a promoter is a constitutive promoter. 59. The vector of claim homolog of a fusicoccadiene synthase. 33. The vector of 57, wherein said promoter is an inducible promoter. 60. The claim 26, wherein the nucleic acid comprises a nucleotide vector of claim 59, wherein said inducible promoter is a light sequence of SEQID NO: 1, SEQID NO:4, SEQID NO: 7, inducible promoter, nitrate inducible promoter, or a heat SEQ flD NO:9, SEQ ID NO: 11, SEQIDNC): 15, SEQ ID responsive promoter. 61. The vector of any one of claims 56 NO: 17, SEQID NO: 21, SEQID NO: 26, SEQID NO: 28, to 60, further comprising a 3' regulatory region. 62. The SEQID NO:32, SEQ ID NO:34, SEQID NO:37, SEQ ID vector of any one of claims 57 to 60, wherein the promoter is NO:39, SEQID NO: 44, SEQID NO: 46, SEQID NO:49, operably linked to the polynucleotide. 63. The vector of any SEQID NO:51, SEQID NO:54, or SEQID NO:56.34. 
The one of claims 26 to 62, wherein the polynucleotide further vector of claim 26, wherein the nucleic acid comprises a comprises a nucleic acid which facilitates homologous nucleotide sequence of SEQID NO: 4, SEQID NO: 7, SEQ recombination into a genome of the photosynthetic bacte IDNO:11, SEQID NO: 17, SEQIDNO:21, SEQID NO:28, rium, yeast, alga, or vascular plant. 64. The vector of claim 63, SEQID NO:34, or SEQID NO:39. 35. The vector of claim wherein the genome is a chloroplast genome of the alga or the 26, wherein the nucleic acid encoding a terpene synthase vascular plant. 65. The vector of claim 63, wherein the comprises, (a) anamino acid sequence of SEQID NO: 2, SEQ genome is a nuclear genome of the yeast, the alga., or the IDNO: 10, SEQID NO:16, SEQIDNO:22, SEQID NO:27, vascular plant. 66. The vector of claim 26, wherein the pho SEQID NO:33, SEQID NO:38, SEQID NO:45, SEQNO: tosynthetic bacterium is a member of genera Synechocystis, 50, or SEQ ID NO. 55; or (h) a homolog of the amino acid genera Synechococcus, or genera Athrospira. 67. The vector sequence of SEQID NO: 2, SEQID NO: 10, SEQID NO: 16, of claim 26, wherein the photosynthetic bacterium is a cyano SEQID NO: 22, SEQ ID NO: 27, SEQID NO:33, SEQ ID bacterium. 68. The vector of claim 26, wherein the alga is a NO:38, SEQID NO:45, SEQID NO:50, or SEQID NO:55. microalga. 69. The vector of claim 26, wherein the alga is C. 36. The vector of claim 35, wherein the homolog has at least reinhardtii, D. salina, H. pluvalis, S. dimorphus, D. viridis, D. 50%, at least 60%, at least 70%, at least 75%, at least 80%, at tefiolecta, N. Oculata, or N. satina. 70. 
The vector of claim least 85%, at least 90%, at least 95%, at least 98%, or at least 26, wherein the alga is a cyanophyta, a prochlorophyta, 99% sequence identity to the amino acid sequence of SEQID rhodophyta, chiorophyta, a heterokontophyta, a tribophyta, a NO: 2, SEQ ID NO: 10, SEQID NO: 16, SEQ ID NO: 22, glaucophyta., a chlorarachniophyte, a eugienophyta, a eugle SEQID NO: 27, SEQ ID NO:33, SEQID NO:38, SEQ ID noid, a haptophyta, a chrysophyta, a cryptophyta, a crypto NO: 45, SEQID NO: 50, or SEQID NO:55.37. The vector monad, a dinophyta, a dinoflagellata, a pyrmnesiophyta, a of claim 26, wherein the terpene synthase comprises an amino bacillariophyta, a Xanthophyta, a eustigmatophyta, a raphi acid sequence of SEQID NO: 2.38. The vector of claim 26, dophyta, phaeophyta, or a phytoplankton. 71. The vector of wherein the nucleic acid comprises a nucleotide sequence of claim 26, wherein the polynucleotide further comprises a SEQID. NO:4 or SEQID. NO: 7.39. The vector of claim38, nucleic acid encoding a tag for purification or detection of the US 2012/0058535 A1 Mar. 8, 2012 terpene synthase. 72, The vector of claim 71, wherein the tag the nucleic acid comprises a nucleotide sequence of SEQID is a His-6 tag, a FLAG epitope, a c-myc epitope, a Strep NO: 1, SEQID NO:4, SEQID NO: 7, SEQIDNC):9, SEQ TAGII, a biotin tag, a glutathione S-transferase (GST), a IDNO: 11, SEQID NO:15, SEQIDNO: 17, SEQID NO:21, chitin binding protein (CBP), a maltose binding protein SEQID NO: 26, SEQ ID NO: 28, SEQID NO:32, SEQ ID (MBP), or a metal affinity tag. 71 The vector of claim 26, NO:34, SEQID NO:37, SEQID NO:39, SEQID NO:44, wherein the polynucleotide further comprises a nucleic acid SEQID NO:46, SEQ ID NO:49, SEQID NO:51, SEQ ID encoding an amino acid sequence of SEQID NO:3, SEQID NO: 54, or SEQ ID NO: 56. 100. The vector of claim 91, NO: 12, SEQID NO: 19, SEQID NO:23, or SEQID NO:29, wherein the nucleic acid comprises a nucleotide sequence of 74. 
The vector of claim 26, wherein the polynucleotide fur SEQID NO:4, SEQID NO:7, SEQID NO: 11, SEQID NO: ther comprises a nucleic acid encoding a selectable marker. 17, SEQID NO:21, SEQID NO:28, SEQID NO:34, or SEQ 75. The vector of claim 74, wherein the selectable marker is ID NO:39. 101. The vector of claim 95, wherein the terpene kanamycin, chloramphenicol, ampicillin, or glufosinate. 76. synthase comprises, (a) an amino acid sequence of SEQ ID The vector of claim 26, wherein the photosynthetic bacte NO: 2, SEQ ID NO: 10, SEQID NO: 16, SEQ ID NO: 22, rium, yeast, alga, or vascular plant does not normally produce SEQID NO: 27, SEQ ID NO:33, SEQID NO:38, SEQ ID the terpene. NO:45, SEQID NO:50, or SEQ. ID NO:55;or (b) a homolog 0010 77. A vector comprising, a polynucleotide compris of the amino acid sequence of SEQIDNO: 2, SEQIDNO: 10, ing a nucleic acid sequence of SEQID NO:46, SEQID NO: SEQID NO: 16, SEQ ID NO: 22, SEQID NO: 27, SEQ ID 51, or SEQID NO:56.78. The vector of claim 77, wherein the NO:33, SEQID NO:38, SEQID NO:45, SEQID NO:50, or nucleic acid sequence is operably linked to a promoter in a SEQ ID NO: 55. 102. The vector of claim 101, wherein the host organism. 79. The vector of claim 78, wherein the pro lioniolog has at least 50%, at least 60%, at least 70%, at least moter is a constitutive promoter. 80. The vector of claim 78, 75%, at least 80%, at least 85%, at least 90%, at least 95%, at wherein the promoter is an inducible promoter. 81. The vector least 98%, or at least 99% sequence identity to the amino acid of claim 80, wherein the inducible promoter is a light induc sequence of SEQID NO: 2, SEQID NO: 10, SEQID NO: 16, ible promoter, a nitrate inducible promoter, or a heat respon SEQID NO: 22, SEQ ID NO: 27, SEQID NO:33, SEQ ID sive promoter. 82. The vector of claim 78, wherein the pro NO:38, SEQID NO:45, SEQID NO:50, or SEQID NO:55. moter is T7, psbD, psdA, tufA, Itra, atp A, or tubulin. 83. The 103. 
The vector of claim 95, wherein the terpene synthase is vector of claim 78, wherein the promoter is a chloroplast a fusion terpene synthase. 104. The vector of 103, wherein the promoter. 84. The vector of claim 78, wherein the promoter is fusion terpene synthase comprises a portion of a casbene pshA, psbD, atpA, or tufA. 85. The vector of claim 78, synthase and a portion of a geranylgeranyl-diphosphate wherein the organism is a photosynthetic bacterium, a yeast, (GGIP) synthase. 105. The vector of 104, wherein the fusion an alga, or a vascular plant. 86. The vector of claim 85, terpene synthase comprises the amino acid sequence of SEQ wherein the photosynthetic bacterium is a member of genera ID NO: 22. 106. The vector of any one of claims 91-105, Synechocystis, genera Synechococcus, or genera Athrospira. wherein the polynucleotide further comprises a promoter for 87. The vector of claim 85, wherein the photosynthetic bac expression in the photosynthetic bacterium, yeast, alga, or terium is a cyanobacterium. 88. The vector of claim 85, vascular plant. 107. The vector of claim 106, wherein the wherein the alga is a microalga. 89. The vector of claim 85, promoter is a constitutive promoter. 108, The vector of claim wherein the alga is C. reinhardtii, D. Salina, H. pluvalis, S. 106, wherein the promoter is an inducible promoter. 109. The dimorphus, D. viridis, D. tertiolecta, N. Oculata, or N. sauna. vector of claim 106, wherein the inducible promoter is a tight 90. The vector of claim 85, wherein the alga is a cyanophyta, inducible promoter, a nitrate inducible promoter, or a heat a prochlorophyta, a rhodophyta, a chlorophyta, a heterokon responsive promoter. 110. The vector of claim 106, wherein tophyta, a tribophyta, a glaucophyta, a chlorarachniaphyte, a the promoter is T7, psbD, psdA., tufA, Itra, atp A, or tubulin. euglenophyta, a euglenoid, a haptophyta, a chrysophyta, a 111. 
a cryptophyta, a cryptomonad, a dinophyta, a dinoflagellata, a prymnesiophyta, a bacillariophyta, a xanthophyta, a eustigmatophyta, a raphidophyta, a phaeophyta, or a phytoplankton.

[0011] 91. A vector comprising a polynucleotide comprising a nucleic acid encoding an enzyme capable of modulating a terpenoid biosynthetic pathway in an organism, wherein the organism is a photosynthetic bacterium, a yeast, an alga, or a vascular plant.
92. The vector of claim 91, wherein the nucleic acid is codon biased for expression in the photosynthetic bacterium, yeast, alga, or vascular plant.
93. The vector of claim 92, wherein the codon bias is hot codon bias.
94. The vector of claim 92, wherein the codon bias is regular codon bias.
95. The vector of claim 91, wherein the enzyme is a terpene synthase.
96. The vector of claim 95, wherein the terpene synthase is a diterpene synthase.
97. The vector of claim 96, wherein the diterpene synthase is a fusicoccadiene synthase, a kaurene synthase, a casbene synthase, a taxadiene synthase, an abietadiene synthase, or a homolog of any one of the above.
98. The vector of claim 97, wherein the diterpene synthase is a fusicoccadiene synthase or a homolog of a fusicoccadiene synthase.
99. The vector of claim 91, wherein

The vector of claim 106, wherein the promoter is a chloroplast promoter.
112. The vector of claim 106, wherein the promoter is psbA, psbD, atpA, or tufA.
113. The vector of any one of claims 106 to 112, wherein the promoter is operably linked to the polynucleotide.
114. The vector of claim 91, wherein said vector further comprises a 5' regulatory region.
115. The vector of claim 114, wherein said 5' regulatory region further comprises a promoter.
116. The vector of claim 115, wherein said promoter is a constitutive promoter.
117. The vector of claim 115, wherein said promoter is an inducible promoter.
118. The vector of claim 117, wherein said inducible promoter is a light inducible promoter, a nitrate inducible promoter, or a heat responsive promoter.
119. The vector of any one of claims 114 to 118, further comprising a 3' regulatory region.
120. The vector of any one of claims 115 to 118, wherein the promoter is operably linked to the polynucleotide.
121. The vector of any one of claims 91 to 120, wherein the polynucleotide further comprises a nucleic acid which facilitates homologous recombination into a genome of the photosynthetic bacterium, yeast, alga, or vascular plant.
122. The vector of claim 121, wherein the genome is a chloroplast genome of the alga or the vascular plant.
123. The vector of claim 121, wherein the genome is a nuclear genome of the yeast, the alga, or the vascular plant.
124. The vector of claim 91, wherein the photosynthetic bacterium is a member of genera Synechocystis, genera Synechococcus, or genera Arthrospira.
125. The vector of claim 91, wherein the photosynthetic bacterium is a cyanobacterium.
126. The vector of claim 91, wherein the alga is a microalga.
127. The vector of claim 91, wherein the alga is C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, D. tertiolecta, N. oculata, or N. salina.
128. The vector of claim 91, wherein the alga is a cyanophyta, a prochlorophyta, a rhodophyta, a chlorophyta, a heterokontophyta, a tribophyta, a glaucophyta, a chlorarachniophyte, a euglenophyta, a euglenoid, a haptophyta, a chrysophyta, a cryptophyta, a cryptomonad, a dinophyta, a dinoflagellata, a prymnesiophyta, a bacillariophyta, a xanthophyta, a eustigmatophyta, a raphidophyta, a phaeophyta, or a phytoplankton.
129. The vector of claim 91, wherein the polynucleotide further comprises a nucleic acid encoding a tag for purification or detection of the terpene synthase.
130. The vector of claim 129, wherein the tag is a His-6 tag, a FLAG epitope, a c-myc epitope, a Strep-TAG II, a biotin tag, a glutathione S-transferase (GST), a chitin binding protein (CBP), a maltose binding protein (MBP), or a metal affinity tag.
131. The vector of claim 91, wherein the polynucleotide further comprises a nucleic acid encoding an amino acid sequence of SEQ ID NO: 3, SEQ ID NO: 12, SEQ ID NO: 19, SEQ ID NO: 23, or SEQ ID NO: 29.
132. The vector of claim 91, wherein the polynucleotide further comprises a nucleic acid encoding a selectable marker.
133. The vector of claim 74, wherein the selectable marker is kanamycin, chloramphenicol, ampicillin, or glufosinate.

[0012] 134. A genetically modified organism, comprising a polynucleotide comprising a nucleic acid encoding a terpene synthase, wherein the terpene synthase cyclizes a terpene, wherein the terpene synthase is capable of being expressed in the organism, and wherein the organism is a photosynthetic bacterium, a yeast, an alga, or a vascular plant.
135. The genetically modified organism of claim 134, wherein the nucleic acid is codon biased for expression in the photosynthetic bacterium, yeast, alga, or vascular plant.
136. The genetically modified organism of claim 135, wherein the codon bias is hot codon bias.
137. The genetically modified organism of claim 135, wherein the codon bias is regular codon bias.
138. The genetically modified organism of claim 134, wherein the terpene synthase is a diterpene synthase.
139. The genetically modified organism of claim 138, wherein the diterpene synthase is a fusicoccadiene synthase, a kaurene synthase, a casbene synthase, a taxadiene synthase, an abietadiene synthase, or a homolog of any one of the above.
140. The genetically modified organism of claim 139, wherein the diterpene synthase is a fusicoccadiene synthase or a homolog of a fusicoccadiene synthase.
141. The genetically modified organism of claim 134, wherein the nucleic acid comprises a nucleotide sequence of SEQ ID NO: 1, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 21, SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 44, SEQ ID NO: 46, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 54, or SEQ ID NO: 56.
142. The genetically modified organism of claim 134, wherein the nucleic acid comprises a nucleotide sequence of SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 11, SEQ ID NO: 17, SEQ ID NO: 21, SEQ ID NO: 28, SEQ ID NO: 34, or SEQ ID NO: 39.
143. The genetically modified organism of claim 134, wherein the nucleic acid encoding a terpene synthase comprises (a) an amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 10, SEQ ID NO: 16, SEQ ID NO: 22, SEQ ID NO: 27, SEQ ID NO: 33, SEQ ID NO: 38, SEQ ID NO: 45, SEQ ID NO: 50, or SEQ ID NO: 55; or (b) a homolog of the amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 10, SEQ ID NO: 16, SEQ ID NO: 22, SEQ ID NO: 27, SEQ ID NO: 33, SEQ ID NO: 38, SEQ ID NO: 45, SEQ ID NO: 50, or SEQ ID NO: 55.
144. The genetically modified organism of claim 143, wherein the homolog has at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to the amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 10, SEQ ID NO: 16, SEQ ID NO: 22, SEQ ID NO: 27, SEQ ID NO: 33, SEQ ID NO: 38, SEQ ID NO: 45, SEQ ID NO: 50, or SEQ ID NO: 55.
145. The genetically modified organism of claim 134, wherein the terpene synthase comprises an amino acid sequence of SEQ ID NO: 2.
146. The genetically modified organism of claim 134, wherein the nucleic acid comprises a nucleotide sequence of SEQ ID NO: 4 or SEQ ID NO: 7.
147. The genetically modified organism of claim 134, wherein the nucleic acid comprises the nucleotide sequence of SEQ ID NO: 7.
148. The genetically modified organism of claim 134, wherein the terpene is a diterpene.
149. The genetically modified organism of claim 148, wherein the diterpene is a cyclical diterpene.
150. The genetically modified organism of claim 134, wherein the terpene is a fusicoccadiene, a casbene, an ent-kaurene, a taxadiene, or an abietadiene.
151. The genetically modified organism of claim 150, wherein the terpene is a fusicoccadiene.
152. The genetically modified organism of claim 151, wherein the fusicoccadiene is fusicocca-2,10(14)-diene.
153. The genetically modified organism of claim 134, wherein the terpene synthase is a fusion terpene synthase.
154. The genetically modified organism of claim 153, wherein the fusion terpene synthase comprises a portion of a casbene synthase and a portion of a geranylgeranyl-diphosphate (GGPP) synthase.
155. The genetically modified organism of claim 154, wherein the fusion terpene synthase comprises the amino acid sequence of SEQ ID NO: 22.
156. The genetically modified organism of any one of claims 134 to 155, wherein the polynucleotide further comprises a promoter for expression in the photosynthetic bacterium, yeast, alga, or vascular plant.
157. The genetically modified organism of claim 156, wherein the promoter is a constitutive promoter.
158. The genetically modified organism of claim 156, wherein the promoter is an inducible promoter.
159. The genetically modified organism of claim 158, wherein the inducible promoter is a light inducible promoter, a nitrate inducible promoter, or a heat responsive promoter.
160. The genetically modified organism of claim 156, wherein the promoter is T7, psbD, psdA, tufA, ltrA, atpA, or tubulin.
161. The genetically modified organism of claim 156, wherein the promoter is a chloroplast promoter.
162. The genetically modified organism of claim 156, wherein the promoter is psbA, psbD, atpA, or tufA.
163. The genetically modified organism of any one of claims 156 to 162, wherein the promoter is operably linked to the polynucleotide.
164. The genetically modified organism of claim 134, wherein the polynucleotide further comprises a 5' regulatory region.
165. The genetically modified organism of claim 164, wherein said 5' regulatory region further comprises a promoter.
166. The genetically modified organism of claim 165, wherein said promoter is a constitutive promoter.
167. The genetically modified organism of claim 165, wherein said promoter is an inducible promoter.
168. The genetically modified organism of claim 167, wherein said inducible promoter is a light inducible promoter, a nitrate inducible promoter, or a heat responsive promoter.
169. The genetically modified organism of any one of claims 164 to 168, further comprising a 3' regulatory region.
170. The genetically modified organism of any one of claims 165 to 168, wherein the promoter is operably linked to the polynucleotide.
171. The genetically modified organism of any one of claims 134-170, wherein the polynucleotide further comprises a nucleic acid which facilitates homologous recombination into a genome of the photosynthetic bacterium, yeast, alga, or vascular plant.
172. The genetically modified organism of claim 171, wherein the genome is a chloroplast genome of the alga or the vascular plant.
173. The genetically modified organism of claim 171, wherein the genome is a nuclear genome of the yeast, the alga, or the vascular plant.
174. The genetically modified organism of claim 134, wherein the photosynthetic bacterium is a member of genera Synechocystis, genera Synechococcus, or genera Arthrospira.
175. The genetically modified organism of claim 134, wherein the photosynthetic bacterium is a cyanobacterium.
176. The genetically modified organism of claim 134, wherein the alga is a microalga.
177. The genetically modified organism of claim 134, wherein the alga is C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, D. tertiolecta, N. oculata, or N. salina.
178. The genetically modified organism of claim 134, wherein the alga is a cyanophyta, a prochlorophyta, a rhodophyta, a chlorophyta, a heterokontophyta, a tribophyta, a glaucophyta, a chlorarachniophyte, a euglenophyta, a euglenoid, a haptophyta, a chrysophyta, a cryptophyta, a cryptomonad, a dinophyta, a dinoflagellata, a prymnesiophyta, a bacillariophyta, a xanthophyta, a eustigmatophyta, a raphidophyta, a phaeophyta, or a phytoplankton.
179. The genetically modified organism of claim 134, wherein the polynucleotide further comprises a nucleic acid encoding a tag for purification or detection of the terpene synthase.
180. The genetically modified organism of claim 179, wherein the tag is a His-6 tag, a FLAG epitope, a c-myc epitope, a Strep-TAG II, a biotin tag, a glutathione S-transferase (GST), a chitin binding protein (CBP), a maltose binding protein (MBP), or a metal affinity tag.
181. The genetically modified organism of claim 134, wherein the polynucleotide further comprises a nucleic acid encoding an amino acid sequence of SEQ ID NO: 3, SEQ ID NO: 12, SEQ ID NO: 19, SEQ ID NO: 23, or SEQ ID NO: 29.
182. The genetically modified organism of claim 134, wherein the polynucleotide further comprises a nucleic acid encoding a selectable marker.
183. The genetically modified organism of claim 182, wherein the selectable marker is kanamycin, chloramphenicol, ampicillin, or glufosinate.
184. The genetically modified organism of claim 134, wherein the photosynthetic bacterium, yeast, alga, or vascular plant does not normally produce the terpene.
185. The genetically modified organism of claim 134, wherein at least 0.25%, at least 0.5%, at least 0.75%, or at least 1.0% dry weight of the organism is the terpene.
186. The genetically modified organism of claim 134, wherein at least 0.05%, at least 0.1%, at least 0.25%, at least 0.5%, at least 0.75%, at least 1.0%, at least 1.25%, at least 1.5%, at least 1.75%, at least 2.0%, at least 3.0%, at least 4.0%, or at least 5.0% dry weight of the organism is the terpene.
187. The genetically modified organism of claim 134, wherein the genetically modified organism is capable of growing in a high saline environment.
188. The genetically modified organism of claim 187, wherein the organism is an alga.
189. The genetically modified organism of claim 188, wherein the alga is D. salina.
190. The genetically modified organism of claim 187, wherein the high saline environment comprises sodium chloride.
191. The genetically modified organism of claim 190, wherein the sodium chloride is about 0.5 to about 4.0 molar sodium chloride.

[0013] 192. A composition comprising at least 3% terpene and at least a trace amount of a cellular portion of a genetically modified organism.

[0014] 193. A method of producing a product, comprising: a) transforming an organism with a polynucleotide comprising a nucleic acid encoding a terpene synthase capable of being expressed in the organism, wherein the transformation results in the production or increased production of a terpene, and wherein the organism is a photosynthetic bacterium, a yeast, an alga, or a vascular plant; b) collecting the terpene from the transformed organism; and c) using the terpene to produce a product.
194. The method of claim 193, wherein the nucleic acid is codon biased for expression in the photosynthetic bacterium, yeast, alga, or vascular plant.
195. The method of claim 194, wherein the codon bias is hot codon bias.
196. The method of claim 194, wherein the codon bias is regular codon bias.
197. The method of claim 193, wherein the terpene synthase is a diterpene synthase.
198. The method of claim 197, wherein the diterpene synthase is a fusicoccadiene synthase, a kaurene synthase, a casbene synthase, a taxadiene synthase, an abietadiene synthase, or a homolog of any one of the above.
199. The method of claim 198, wherein the diterpene synthase is a fusicoccadiene synthase or a homolog of a fusicoccadiene synthase.
200. The method of claim 193, wherein the nucleic acid comprises a nucleotide sequence of SEQ ID NO: 1, SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 21, SEQ ID NO: 26, SEQ ID NO: 28, SEQ ID NO: 32, SEQ ID NO: 34, SEQ ID NO: 37, SEQ ID NO: 39, SEQ ID NO: 44, SEQ ID NO: 46, SEQ ID NO: 49, SEQ ID NO: 51, SEQ ID NO: 54, or SEQ ID NO: 56.
201. The method of claim 193, wherein the nucleic acid comprises a nucleotide sequence of SEQ ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 11, SEQ ID NO: 17, SEQ ID NO: 21, SEQ ID NO: 28, SEQ ID NO: 34, or SEQ ID NO: 39.
202. The method of claim 193, wherein the nucleic acid encoding a terpene synthase comprises (a) an amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 10, SEQ ID NO: 16, SEQ ID NO: 22, SEQ ID NO: 27, SEQ ID NO: 33, SEQ ID NO: 38, SEQ ID NO: 45, SEQ ID NO: 50, or SEQ ID NO: 55; or (b) a homolog of the amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 10, SEQ ID NO: 16, SEQ ID NO: 22, SEQ ID NO: 27, SEQ ID NO: 33, SEQ ID NO: 38, SEQ ID NO: 45, SEQ ID NO: 50, or SEQ ID NO: 55.
203. The method of claim 202, wherein the homolog has at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to the amino acid sequence of SEQ ID NO: 2, SEQ ID NO: 10, SEQ ID NO: 16, SEQ ID NO: 22, SEQ ID NO: 27, SEQ ID NO: 33, SEQ ID NO: 38, SEQ ID NO: 45, SEQ ID NO: 50, or SEQ ID NO: 55.
204. The method of claim 193, wherein the terpene synthase comprises an amino acid sequence of SEQ ID NO: 2.
205. The method of claim 193, wherein the nucleic acid comprises a nucleotide sequence of SEQ ID NO: 4 or SEQ ID NO: 7.
206. The method of claim 193, wherein the nucleic acid comprises the nucleotide sequence of SEQ ID NO: 7.
207. The method of claim 193, wherein the terpene is a diterpene.
208. The method of claim 207, wherein the diterpene is a cyclical diterpene.
209. The method of claim 193, wherein the terpene is a fusicoccadiene, a casbene, an ent-kaurene, a taxadiene, or an abietadiene.
210. The method of claim 209, wherein the terpene is a fusicoccadiene.
211. The method of claim 210, wherein the fusicoccadiene is fusicocca-2,10(14)-diene.
212. The method of claim 193, wherein the terpene synthase is a fusion terpene synthase.
213. The method of claim 212, wherein the fusion terpene synthase comprises a portion of a casbene synthase and a portion of a geranylgeranyl-diphosphate (GGPP) synthase.
214. The method of claim 213, wherein the fusion terpene synthase comprises the amino acid sequence of SEQ ID NO: 22.
215. The method of any one of claims 193 to 214, wherein the polynucleotide further comprises a promoter for expression in the photosynthetic bacterium, yeast, alga, or vascular plant.
216. The method of claim 215, wherein the promoter is a constitutive promoter.
217. The method of claim 215, wherein the promoter is an inducible promoter.
218. The method of claim 217, wherein the inducible promoter is a light inducible promoter, a nitrate inducible promoter, or a heat responsive promoter.
219. The method of claim 15, wherein the promoter is T7, psbD, psdA, tufA, ltrA, atpA, or tubulin.
220. The method of claim 215, wherein the promoter is a chloroplast promoter.
221. The method of claim 215, wherein the promoter is psbA, psbD, atpA, or tufA.
222. The method of any one of claims 215 to 221, wherein the promoter is operably linked to the polynucleotide.
223. The method of claim 193, wherein the polynucleotide further comprises a 5' regulatory region.
224. The method of claim 223, wherein said 5' regulatory region further comprises a promoter.
225. The method of claim 224, wherein said promoter is a constitutive promoter.
226. The method of claim 224, wherein said promoter is an inducible promoter.
227. The method of claim 226, wherein said inducible promoter is a light inducible promoter, a nitrate inducible promoter, or a heat responsive promoter.
228. The method of any one of claims 223 to 227, further comprising a 3' regulatory region.
229. The method of any one of claims 224 to 227, wherein the promoter is operably linked to the polynucleotide.
230. The method of any one of claims 193 to 229, wherein the polynucleotide further comprises a nucleic acid which facilitates homologous recombination into a genome of the photosynthetic bacterium, yeast, alga, or vascular plant.
231. The method of claim 230, wherein the genome is a chloroplast genome of the alga or the vascular plant.
232. The method of claim 230, wherein the genome is a nuclear genome of the yeast, the alga, or the vascular plant.
233. The method of claim 193, wherein the photosynthetic bacterium is a member of genera Synechocystis, genera Synechococcus, or genera Arthrospira.
234. The method of claim 193, wherein the photosynthetic bacterium is a cyanobacterium.
235. The method of claim 193, wherein the alga is a microalga.
236. The method of claim 193, wherein the alga is C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, D. tertiolecta, N. oculata, or N. salina.
237. The method of claim 193, wherein the alga is a cyanophyta, a prochlorophyta, a rhodophyta, a chlorophyta, a heterokontophyta, a tribophyta, a glaucophyta, a chlorarachniophyte, a euglenophyta, a euglenoid, a haptophyta, a chrysophyta, a cryptophyta, a cryptomonad, a dinophyta, a dinoflagellata, a prymnesiophyta, a bacillariophyta, a xanthophyta, a eustigmatophyta, a raphidophyta, a phaeophyta, or a phytoplankton.
238. The method of claim 193, wherein the polynucleotide further comprises a nucleic acid encoding a tag for purification or detection of the terpene synthase.
239. The method of claim 238, wherein the tag is a His-6 tag, a FLAG epitope, a c-myc epitope, a Strep-TAG II, a biotin tag, a glutathione S-transferase (GST), a chitin binding protein (CBP), a maltose binding protein (MBP), or a metal affinity tag.
240. The method of claim 193, wherein the polynucleotide further comprises a nucleic acid encoding an amino acid sequence of SEQ ID NO: 3, SEQ ID NO: 12, SEQ ID NO: 19, SEQ ID NO: 23, or SEQ ID NO: 29.
241. The method of claim 193, wherein the polynucleotide further comprises a nucleic acid encoding a selectable marker.
242. The method of claim 241, wherein the selectable marker is kanamycin, chloramphenicol, ampicillin, or glufosinate.
243. The method of claim 193, wherein the photosynthetic bacterium, yeast, alga, or vascular plant does not normally produce the terpene.
244. The method of any one of claims 193-243, further comprising growing the organism in an aqueous environment.
245. The method of claim 244, wherein the growing comprises supplying CO2 to the organism.
246. The method of claim 245, wherein the CO2 is at least partially derived from a burned fossil fuel.
247. The method of claim 245, wherein the CO2 is at least partially derived from flue gas.
248. The method of any one of claims 193 to 247, wherein the collecting step comprises one or more of the following steps: (a) harvesting the transformed organism; (b) harvesting the terpene from a medium comprising the transformed organism; (c) mechanically disrupting the transformed organism; or (d) chemically disrupting the transformed organism.

[0015] Methods and compositions described herein utilize terpene/terpenoid synthases, such as fusicoccadiene synthase, for the production of terpenes and terpenoids, including fusicoccadiene, in various organisms. Methods are provided to create organisms genetically modified to produce terpenes and terpenoids. Terpenes and terpenoids, or their derivatives, are a useful source of hydrocarbons, which can serve as source material for the production of fuel. Methods are provided by which terpene synthases, for example PaFS, are engineered to be expressed in genetically modified host cells, for example cyanobacteria, yeast, and algae, where the synthase(s) result in the production or increased production of terpenes and terpenoids, such as fusicoccadiene. In some instances, the terpenes and terpenoids are metabolically inactive in the host cell, leading to a buildup of hydrocarbons. Such buildup of hydrocarbons increases the usefulness of the engineered host cells for fuel production. In some instances, the hydrocarbons can be secreted from the host cell, either naturally or by introduction of a terpene/terpenoid secretion protein.

[0016] Described herein is a vector comprising a nucleic acid encoding a terpene synthase, wherein the terpene synthase condenses and/or cyclizes a terpene, and wherein the nucleic acid is codon biased for expression in photosynthetic bacteria, yeast, algae, or vascular plants. A vector described herein can contain a nucleic acid in which one or more codons are biased toward the codon usage of a target organism. Of the various methods available for introducing codon bias into a gene, vectors described herein can contain a codon bias known as "hot" codon bias. In some instances, a vector encodes a terpene synthase wherein the terpene synthase is a fusicoccadiene synthase or a homolog thereof. In some instances, the homolog has at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to the amino acid sequence of SEQ ID NO: 2. Alternatively, a vector can comprise a nucleic acid sequence such as SEQ ID NO: 4 or SEQ ID NO: 7, both of which encode a fusicoccadiene synthase.
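The homolog thresholds in [0016] and in the embodiments above (at least 50% up to at least 99% identity to SEQ ID NO: 2) are stated without saying, in this passage, how percent identity is computed. The Python sketch below assumes one common convention, not necessarily the one the applicants used: the two sequences are already aligned (equal length, "-" marking gaps), and identity is the number of identical residues divided by the number of aligned columns in which neither sequence has a gap. The function names and the toy peptides are hypothetical and are not taken from the sequence listing.

```python
GAP = "-"

# Identity thresholds recited for homologs (at least 50% ... at least 99%).
THRESHOLDS = (50, 60, 70, 75, 80, 85, 90, 95, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = GAP) -> float:
    """Percent identity over the gap-free columns of a pre-computed alignment.

    Assumption: both strings come from the same pairwise alignment, so they
    have equal length and use `gap` for insertions/deletions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence carries a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    matches = sum(1 for x, y in columns if x.upper() == y.upper())
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list[int]:
    """Return every recited threshold that `identity` meets or exceeds."""
    return [t for t in THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Toy aligned peptides for illustration only; neither is SEQ ID NO: 2.
    reference = "MKTAYIAKQR-QISFVKSHFS"
    candidate = "MKTAYLAKQRLQISFVKSHWS"
    pid = percent_identity(reference, candidate)
    # 20 gap-free columns, 18 identical -> 90.0%
    print(f"{pid:.1f}% identity; meets thresholds: {thresholds_met(pid)}")
```

Other conventions (dividing by the full alignment length, or by the length of the shorter or the reference sequence) can give a different number for the same pair, so a stated threshold is only meaningful together with the alignment method and the denominator.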
In some instances, vectors described herein further comprise a promoter for expression in photosynthetic bacteria, non-photosynthetic bacteria, yeast, or algae. A vector can utilize promoter sequences derived from, for example, T7 (bacteriophage T7), tD2 (truncated D2 promoter of Chlamydomonas), D1 (Chlamydomonas), psbD (Scenedesmus), or tufA (Scenedesmus). Other types of promoters contemplated in the present disclosure include promoters driving gene expression in a chloroplast or a nucleus of a host organism. A vector can include nucleic acid sequences which facilitate homologous recombination in a genome of an organism, such as a nuclear genome or a chloroplast genome, especially a microalgal chloroplast genome. Microalgal host organisms which can be transformed with the vectors of the present disclosure include Chlamydomonas reinhardtii, Dunaliella salina, Haematococcus pluvialis, Scenedesmus dimorphus, D. viridis, or D. tertiolecta.

[0017] Also described herein is a genetically modified organism comprising an endogenous or exogenous nucleic acid encoding an enzyme, wherein the enzyme condenses and/or cyclizes a terpene. Depending on the specific gene introduced, the enzyme may have chain elongation activity, cyclization activity, or both chain elongation and cyclization activities. Organisms useful for the present disclosure include a photosynthetic bacterium, a non-photosynthetic bacterium, a yeast, or an alga. An example of the photosynthetic bacterium is a cyanobacterium, such as Synechocystis, Synechococcus, or Arthrospira. Non-limiting examples of algal organisms are C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, and D. tertiolecta. Genetically modified organisms disclosed herein can produce one or more terpene synthases. A terpene synthase can be a fusicoccadiene synthase. One of the products that may be produced in the genetically modified organism is fusicoccadiene, for example, fusicocca-2,10(14)-diene. In some instances, the fusicoccadiene is metabolically inactive in the genetically modified organism.

[0018] A genetically modified organism of the present disclosure can be a photosynthetic bacterium wherein the bacterium contains at least 0.25%, at least 0.5%, at least 0.75%, or at least 1.0% dry weight as a fusicoccadiene. A genetically modified organism can also be an alga wherein the alga contains at least 0.05%, at least 0.1%, at least 0.25%, at least 0.5%, at least 0.75%, at least 1.0%, at least 1.25%, at least 1.5%, at least 1.75%, at least 2.0%, at least 3.0%, at least 4.0%, or at least 5.0% dry weight as fusicoccadiene. Exogenous or endogenous nucleic acids described herein can be present in the chloroplast and/or nucleus of an organism. In one embodiment, one or more nucleic acids are integrated into a genome of the chloroplast. In another embodiment, the chloroplast is homoplasmic for the nucleic acid. In some instances, genetic modification of a host cell results in the host cell comprising sufficient chlorophyll levels for the organism to be photoautotrophic. Examples of the organisms useful for genetic modification described herein include cyanophyta, prochlorophyta, rhodophyta, chlorophyta, heterokontophyta, tribophyta, glaucophyta, chlorarachniophytes, euglenophyta, euglenoids, haptophyta, chrysophyta, cryptophyta, cryptomonads, dinophyta, dinoflagellata, prymnesiophyta, bacillariophyta, xanthophyta, eustigmatophyta, raphidophyta, phaeophyta, and phytoplankton.

[0019] Some methods and compositions described herein are directed to a vector comprising a nucleic acid encoding an enzyme capable of modulating a fusicoccadiene biosynthetic pathway. Such a vector may further comprise a promoter for expression of the nucleic acid in bacteria, yeast, or algae. Nucleic acid(s) included in such vectors may contain a codon-biased form of a gene, optimized for expression in a host organism of choice. Such organisms can be photosynthetic, unicellular, and/or eukaryotic. In some instances, vectors described herein further comprise a nucleic acid encoding a tag for purification or detection of an enzyme, and a nucleic acid sequence for homologous recombination into a genome of a host cell. In some instances, the target genome is a chloroplast genome. In other instances, the target genome is a nuclear genome. In one embodiment, the fusicoccadiene produced is fusicocca-2,10(14)-diene.

[0020] Another aspect of the present disclosure is directed to a vector comprising a nucleic acid encoding an enzyme that produces a fusicoccadiene when the vector is integrated into a genome of an organism, such as photosynthetic bacteria, yeast, or algae, wherein the organism does not produce fusicoccadiene without the vector and wherein the fusicoccadiene is metabolically inactive in the organism. In some instances, each codon of the nucleic acid encoding the enzyme that is not a preferred codon of the organism is codon biased. A vector of the present disclosure can utilize "hot" codon bias or "regular" codon bias. A vector encoding an enzyme such as fusicoccadiene synthase or a homolog thereof may be modified by "hot" codon bias. A homolog useful in the present disclosure may have at least 50%, at least 60%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity to, for example, the amino acid sequence of SEQ ID NO: 2. In another embodiment, a nucleic acid encoding an enzyme that produces fusicoccadiene can be a nucleic acid sequence disclosed herein, such as SEQ ID NO: 4 or SEQ ID NO: 7. In some instances, a vector of the present disclosure may further comprise a promoter for expression in photosynthetic bacteria, yeast, or algae; for example, a vector may include a T7, psaD, tubulin, tD2, D1, psbD, or tufA promoter. In other instances, a promoter on a vector of the present disclosure may be a chloroplast promoter, such as tD2, D1, psbD, or tufA. A vector can also include nucleic acid sequences known to facilitate homologous recombination in a genome of an organism, such as a chloroplast genome, especially a microalgal chloroplast genome. Sequences for homologous recombination can include sequences from a chloroplast genome of C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, or D. tertiolecta.

[0021] Also provided herein are genetically modified chloroplasts comprising any of the vectors of the present disclosure. Additionally, non-vascular, photosynthetic organisms which comprise genetically modified chloroplasts of the present disclosure are disclosed. In some instances, a non-vascular organism is an alga, including microalgae, such as C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, and D. tertiolecta. In other instances, the non-vascular, photosynthetic organism can be a photosynthetic bacterium, such as a member of the genera Synechocystis, Synechococcus, or Arthrospira.

[0022] Further described herein are genetically modified, non-vascular photosynthetic organisms comprising an exogenous or endogenous nucleic acid encoding an enzyme that modulates a fusicoccadiene biosynthetic pathway. A genetic modification can lead to the production of a fusicoccadiene that is not naturally produced by the organisms lacking the nucleic acid.
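Paragraph [0020] states that each codon of the enzyme-encoding nucleic acid that is not a preferred codon of the host organism is codon biased. The disclosure distinguishes "hot" and "regular" codon bias without defining either in this passage, so the Python sketch below implements neither specifically. It only illustrates the simpler rule as stated: every codon that is not the host's preferred codon for its amino acid is replaced with the preferred codon, and the encoded protein is checked to be unchanged. The standard genetic code (NCBI translation table 1) is assumed; the preferred-codon table TOY_PREFERRED and all function names are hypothetical, not measured codon usage for C. reinhardtii, Synechocystis, or any other host named here.

```python
# Standard genetic code (NCBI translation table 1), built from the TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical host preference table for illustration only (amino acid -> codon).
TOY_PREFERRED = {"L": "TTA", "R": "CGT", "S": "TCA", "*": "TAA"}


def codons(cds: str) -> list[str]:
    """Split a coding sequence into codons, normalizing case and U -> T."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(cds: str) -> str:
    """Translate with the standard code; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def codon_bias(cds: str, preferred: dict[str, str]) -> str:
    """Replace each codon that is not the preferred codon for its amino acid.

    Amino acids missing from `preferred` keep their original codon, and every
    substitution is checked so the encoded protein cannot change.
    """
    out = []
    for codon in codons(cds):
        aa = CODON_TABLE[codon]  # raises KeyError on non-ACGT characters
        target = preferred.get(aa, codon)
        if CODON_TABLE[target] != aa:
            raise ValueError(f"{target} does not encode {aa}")
        out.append(target)
    return "".join(out)


if __name__ == "__main__":
    original = "ATGCTGCGCTCCTAA"  # toy ORF encoding M-L-R-S-stop, not a SEQ ID
    biased = codon_bias(original, TOY_PREFERRED)
    print(biased)  # ATGTTACGTTCATAA
    assert translate(biased) == translate(original) == "MLRS*"
```

A real codon-biasing workflow would derive the preferred codons from the host's measured codon usage (for a chloroplast-expressed gene, from chloroplast-encoded genes) rather than from a hand-written table.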
In some instances, a fusicoccadiene is metabolically inactive in the modified organism. Organisms useful for the present disclosure can be a unicellular organism, such as a cyanobacterium, yeast, or alga. In some instances, an exogenous nucleic acid encoding an enzyme is one that is specifically disclosed herein, such as SEQ ID NO: 44 and SEQ ID NO: 46 (nucleic acid sequences encoding the protein EAS27885 from Coccidioides immitis), SEQ ID NO: 49 and SEQ ID NO: 51 (nucleic acid sequences encoding the protein EAA68264 from Gibberella zeae), SEQ ID NO: 54 and SEQ ID NO: 56 (nucleic acid sequences encoding the protein ACLA_076850 from Aspergillus clavatus), the nucleic acid sequence of SEQ ID NO: 4, or the nucleic acid sequence of SEQ ID NO: 7.

[0023] Further provided herein is a method of producing a fuel product, comprising: a) transforming an organism, wherein the transformation results in the production or increased production of a fusicoccadiene; b) collecting the fusicoccadiene from the organism; and c) using the fusicoccadiene to produce a fuel product. In some instances, the organism is an alga, including microalgae such as C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, and D. tertiolecta. In another embodiment, the organism can be a photosynthetic bacterium, such as a member of the genera Synechocystis, Synechococcus, or Arthrospira. In still other embodiments, the organism can be a non-photosynthetic bacterium or yeast. In some aspects, a method provided herein further comprises growing the organism in an aqueous environment, wherein CO2 is supplied to the organism. The CO2 can be at least partially derived from a burned fossil fuel or flue gas. In some embodiments, the collecting step of the method comprises one or more of the following steps: (a) harvesting the transformed organism; (b) harvesting the diterpene from a cell medium; (c) mechanically disrupting the organism; or (d) chemically disrupting the organism.

[0024] Methods and compositions described herein are directed to a fuel product comprising a hydrocarbon refined from a fusicoccadiene. In some instances, the fusicoccadiene is obtained from a microorganism, such as bacteria, yeast, or algae. Such microorganisms can be photosynthetic. In one embodiment, the fusicoccadiene is fusicocca-2,10(14)-diene. A fuel product may further comprise a fuel additive.

[0025] A method for identifying diterpene synthases with a desired trait is also described herein. In some instances, such a method comprises the steps of: a) performing one or more genetic manipulations on a nucleic acid encoding a diterpene synthase to produce a modified diterpene synthase; b) transforming the modified diterpene synthase into a microorganism; c) growing the microorganism to produce a diterpene; d) analyzing the diterpene; and e) identifying the transformed microorganism having the desired trait. Examples of a desired trait are the expression level of the diterpene synthase, the production level of the diterpene, or the species of diterpene produced. Genetic manipulations utilized in the method include look-through mutagenesis or walk-through mutagenesis. In some instances, the organism is an alga, including microalgae such as C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, and D. tertiolecta. In another embodiment, the organism can be a photosynthetic bacterium, such as a member of the genera Synechocystis, Synechococcus, or Arthrospira. A diterpene produced by a method disclosed herein can be cyclical, such as fusicoccadiene.

[0026] Another aspect disclosed herein is a genetically modified organism comprising a nucleic acid encoding a diterpene synthase, wherein the organism can grow in a high saline environment. In one embodiment, the organism is a non-vascular, photosynthetic organism, for example D. salina. A high saline environment in some embodiments comprises 0.5-4.0 molar sodium chloride. A diterpene produced by these organisms can be cyclical, such as fusicoccadiene.

[0027] Described herein is a composition comprising at least 3% fusicoccadiene and at least a trace amount of a cellular portion of a genetically modified organism. The genetically modified organism can be modified by an exogenous or endogenous nucleic acid encoding fusicoccadiene synthase. In one embodiment, a fusicoccadiene synthase gene is derived from Phomopsis amygdali. An organism for use in the present disclosure can be a bacterium or yeast. In some embodiments, the bacterium is a photosynthetic bacterium, such as a member of the genera Synechocystis, Synechococcus, or Arthrospira. In other embodiments, the organism is an alga, including microalgae, such as C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, and D. tertiolecta.

[0028] Further provided herein is a vector comprising: (a) a nucleic acid encoding protein EAS27885 from Coccidioides immitis, protein EAA68264 from Gibberella zeae, or protein EAQ85668 from Chaetomium globosum, or a homolog thereof; and (b) a promoter configured for expression of the nucleic acid in a host cell. In some instances, the host cell is a bacterium, yeast, or alga. A bacterium useful in some embodiments can be a photosynthetic bacterium, for example, members of the genera Synechocystis, Synechococcus, and Arthrospira. Algae useful in some embodiments can be a microalga, such as C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, and D. tertiolecta. A promoter useful for some vectors of the present disclosure is a promoter capable of driving expression in a chloroplast. In some instances, a vector further comprises one or more nucleic acids which allow for homologous recombination with a genome of the host cell. In some embodiments, a target genome is a chloroplast genome. Host cells suitable for the vector include cyanophyta, prochlorophyta, rhodophyta, chlorophyta, heterokontophyta, tribophyta, glaucophyta, chlorarachniophytes, euglenophyta, euglenoids, haptophyta, chrysophyta, cryptophyta, cryptomonads, dinophyta, dinoflagellata, prymnesiophyta, bacillariophyta, xanthophyta, eustigmatophyta, raphidophyta, phaeophyta, and phytoplankton. A vector disclosed herein may further comprise a nucleic acid encoding a tag for purification or detection of the enzyme and/or a selectable marker.

[0029] In some embodiments, a host cell is provided comprising a vector comprising: (a) a nucleic acid encoding protein EAS27885 from Coccidioides immitis, protein EAA68264 from Gibberella zeae, or protein EAQ85668 from Chaetomium globosum, or a homolog thereof; and (b) a promoter configured for expression of the nucleic acid in a host cell. Host cells can include a bacterium, yeast, or alga. A bacterium can be a photosynthetic bacterium, for example, members of the genera Synechocystis, Synechococcus, and Arthrospira. Examples of alga for use in the present disclosure include C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, and D. tertiolecta. In some instances, the vector, or a portion thereof, is present in a chloroplast and can be integrated into a genome of a chloroplast. Where a vector is

incorporated into a chloroplast genome, the host cell can be homoplasmic for the vector, or portion thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

[0030] These and other features, aspects, and advantages of the present disclosure will become better understood with regard to the following description, appended claims and accompanying figures where:

[0031] FIG. 1 shows the isoprenoid pathway, and exemplary products of the pathway, for example, fusicocca-2,10(14)-diene.

[0032] FIG. 2 shows the MEP pathway for the production of IPP and DMAPP.

[0033] FIG. 3 shows an overview of terpene biosynthesis in photosynthetic eukaryotes.

[0034] FIG. 4 shows exemplary terpenes biosynthesized by eukaryotes or prokaryotes.

[0035] FIGS. 5A, B, and C show the genomic organization of exemplary plant terpenoid synthase genes.

[0036] FIGS. 6A, B, and C show mass spectrum analysis containing peaks corresponding to fusicoccadiene and indole produced: in vivo by recombinant fusicoccadiene synthase expressed in E. coli (FIG. 6A); in vitro by isolated recombinant fusicoccadiene synthase expressed in E. coli (FIG. 6B); and in vivo by recombinant fusicoccadiene synthase expressed in C. reinhardtii (FIG. 6C).

[0037] FIGS. 7A, B, and C show mass spectrum analysis containing peaks corresponding to fusicoccadiene produced by recombinant fusicoccadiene synthases encoded by genes with different codon biases expressed in C. reinhardtii: FIG. 7A, "regular" codon bias; FIG. 7B, C. reinhardtii cells lacking the recombinant fusicoccadiene synthase gene; and FIG. 7C, "hot" codon bias.

[0038] FIG. 8 shows a thin layer chromatogram of algal extracts demonstrating in vivo accumulation of fusicoccadiene.

[0039] FIG. 9 shows selection of six transformants of cyanobacterium clones transformed with PaFS.

[0040] FIGS. 10A and B show mass spectrum analysis containing peaks corresponding to fusicoccadiene produced by recombinant fusicoccadiene synthase expressed in cyanobacteria (Synechocystis).

[0041] FIG. 11 shows an SDS-PAGE gel showing production of fusicoccadiene synthase from a "hot" codon biased gene expressed in bacteria.

[0042] FIG. 12 shows a GC/MSD total ion chromatogram analysis containing peaks corresponding to geranylgeraniol produced by a recombinant fusicoccadiene synthase C-terminal prenyltransferase domain expressed in E. coli, along with positive and negative controls.

[0043] FIGS. 13A, B, and C show mass spectrum analysis containing peaks corresponding to fusicoccadiene produced by a recombinant fusicoccadiene synthase expressed in cyanobacteria (Synechocystis).

[0044] FIGS. 14A and 14B are the total ion chromatogram and mass spectrum, respectively, demonstrating in vivo accumulation of ent-kaurene in Chlamydomonas transformed with recombinant ent-kaurene synthase. FIGS. 14C and 14D are the total ion chromatogram and mass spectrum, respectively, of untransformed Chlamydomonas, demonstrating that there is no accumulation of ent-kaurene.

[0045] FIGS. 15A and 15B are the total ion chromatogram and mass spectrum, respectively, demonstrating in vivo accumulation of ent-kaurene in Scenedesmus transformed with recombinant ent-kaurene synthase. FIG. 15C is the total ion chromatogram of untransformed Scenedesmus, demonstrating that there is no accumulation of ent-kaurene.

[0046] FIG. 16 shows plant expression vector pEarleyGate 104.

[0047] FIGS. 17A and 17B are the total ion chromatogram and mass spectrum, respectively, demonstrating in vivo accumulation of casbene in Chlamydomonas transformed with a recombinant fusion synthase.

DETAILED DESCRIPTION

[0048] The following detailed description is provided to aid those skilled in the art in practicing the present disclosure. Even so, this detailed description should not be construed to unduly limit the present disclosure as modifications and variations in the embodiments discussed herein can be made by those of ordinary skill in the art without departing from the spirit or scope of the present disclosure.

[0049] As used in this specification and the appended claims, the singular forms "a", "an" and "the" include plural reference unless the context clearly dictates otherwise.

[0050] Endogenous

[0051] An endogenous nucleic acid, nucleotide, polypeptide, or protein as described herein is defined in relationship to the host organism. An endogenous nucleic acid, nucleotide, polypeptide, or protein is one that naturally occurs in the host organism.

[0052] Exogenous

[0053] An exogenous nucleic acid, nucleotide, polypeptide, or protein as described herein is defined in relationship to the host organism. An exogenous nucleic acid, nucleotide, polypeptide, or protein is one that does not naturally occur in the host organism or is at a different location in the host organism.

[0054] Isoprenes and Isoprenoids

[0055] Over 55,000 individual isoprenoid compounds have been characterized, and hundreds of new structures are reported each year. Most of the molecular diversity in the isoprenoid pathway is created from the diphosphate esters of simple linear polyunsaturated allylic alcohols. Examples include dimethylallyl alcohol (a 5-carbon molecule), geraniol (a 10-carbon molecule), farnesol (a 15-carbon molecule), and geranylgeraniol (a 20-carbon molecule). The hydrocarbon chains are constructed one isoprene unit at a time. At each step, the allylic moiety is added to the double bond in isopentenyl diphosphate, the fundamental five-carbon building block in the pathway, to form the next higher member of the series. Geranyl, farnesyl and geranylgeranyl diphosphate lie at multiple branch points in the isoprenoid pathway and are substrates for many enzymes. These include the primary cyclases, which are responsible for generating the diverse carbon skeletons for the synthesis of the thousands of mono-, sesqui-, di-, and triterpenes, sterols, and carotenoids found in nature. The structures of several of these cyclases have been reported (Lesburg, C. A., et al., Science, Vol. 277, 1820 (1997); Wendt, K. et al., Science, Vol. 277, 1811 (1997); and Starks, C. M., et al., Science, Vol. 277, 1815 (1997)).

[0056] The extensive family of isoprenoid compounds is synthesized from two precursors, isopentenyl diphosphate and dimethylallyl diphosphate. The chain elongation and cyclization reactions of isoprenoid metabolism are electrophilic alkylations. In these reactions, a new carbon-carbon single bond is formed by attaching a highly reactive electron-deficient carbocation to an electron-rich carbon-carbon double bond.
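
The short Python sketch below illustrates the chain-elongation arithmetic described above, in which one five-carbon isoprene unit is added per step. It counts carbons as DMAPP receives successive IPP units. It then maps the C10, C15, and C20 products to the terpene classes named in this description. The function names and the lookup table are hypothetical conveniences, not part of any disclosed method or library. The sketch also ignores stereochemistry and the chain-length specificity of real prenyltransferases.

```python
# Hypothetical sketch: carbon bookkeeping for prenyl diphosphate chain elongation.
# DMAPP (C5) is the allylic starter; each condensation adds one IPP (C5) unit.

CARBONS_PER_UNIT = 5  # one isoprene unit, as in IPP and DMAPP

# Linear precursors and the terpene class formed when a cyclase acts on each,
# following the GPP/FPP/GGPP -> mono-/sesqui-/diterpene mapping in the text.
PRECURSOR_CLASSES = {
    10: ("geranyl diphosphate (GPP)", "monoterpenes"),
    15: ("farnesyl diphosphate (FPP)", "sesquiterpenes"),
    20: ("geranylgeranyl diphosphate (GGPP)", "diterpenes"),
}


def chain_carbons(ipp_units: int) -> int:
    """Return the carbon count after DMAPP is condensed with `ipp_units` IPP units."""
    if ipp_units < 0:
        raise ValueError("ipp_units must be non-negative")
    return CARBONS_PER_UNIT * (1 + ipp_units)


def describe(ipp_units: int) -> str:
    """Summarize the elongation product and, if named in the text, its cyclase products."""
    carbons = chain_carbons(ipp_units)
    if carbons not in PRECURSOR_CLASSES:
        return f"DMAPP + {ipp_units} IPP -> C{carbons} (not covered by the mapping above)"
    name, terpene_class = PRECURSOR_CLASSES[carbons]
    return f"DMAPP + {ipp_units} IPP -> C{carbons} {name}; cyclization gives {terpene_class}"


if __name__ == "__main__":
    for n in range(1, 4):
        print(describe(n))
```

With three IPP units, the sketch reaches the C20 GGPP that the description attributes to the PaFS prenyltransferase reaction (three IPP plus one DMAPP). This GGPP precedes cyclization to fusicocca-2,10(14)-diene.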

From a chemical viewpoint, the most difficult step is generation of the carbocations. Nature has selected three strategies for catalysis:
- cleavage of the carbon-oxygen bond in an allylic diphosphate ester;
- protonation of a carbon-carbon double bond; or
- protonation of an epoxide.

Once formed, the carbocations can rearrange by hydrogen atom or alkyl group shifts. They can subsequently cyclize by alkylating nearby double bonds. Diverse families of isoprenoid structures are often formed from the same substrate in an enzyme-specific manner. They are thought to arise from differences in (i) the way the substrate is folded in the active site, (ii) how carbocationic intermediates are stabilized to encourage or discourage rearrangements, and (iii) how positive charge is quenched when the product is formed.

[0057] Several of the enzymes involved in isoprenoid chain elongation and cyclization have been studied, and genetic information is available for some of the enzymes. There is little overall similarity between the amino acid sequences of the chain elongation and cyclization enzymes. Even so, proteins from both classes that use allylic diphosphates as substrates contain highly conserved aspartate-rich DDXXD motifs (D is aspartate, X is any amino acid), thought to be Mg2+ binding sites.

[0058] The cyclase domains of the three isoprenoid cyclases, as well as farnesyl diphosphate synthase, have a similar structural motif. This motif consists of 10 to 12 mostly antiparallel alpha helices that form a large active site cavity (as described in Tarshis, L. C., Biochemistry, 33, 10871 (1994)). Lesburg, C. A. et al. (Science, Vol. 277, 1820 (1997)) have labeled this motif the "isoprenoid synthase fold." In addition, aspartate-rich clusters are present in all four proteins. Three enzymes that use diphosphate-containing substrates (pentalenene synthase, epi-aristolochene synthase, and farnesyl diphosphate synthase) contain DDXXD on the walls of their active site cavity (for example, as described in Sacchettini, J. C., and Poulter, C. D. Science, Vol. 277, no. 5333, pp. 1788-1789 (1997)). The aspartates are involved in binding multiple Mg2+ ions. The amino acid sequence of hopene synthase also contains a DDXXD motif. Pentalenene synthase and epi-aristolochene synthase also catalyze proton-promoted cyclizations (as described in, for example, Sacchettini, J. C., and Poulter, C. D. Science, Vol. 277, no. 5333, pp. 1788-1789 (1997); and Starks, C. M., et al., Science, Vol. 277, 1815 (1997)).

[0059] Terpenes and Terpenoids

[0060] Liquid fuels (gasoline, diesel, jet fuel, kerosene, etc.) are primarily composed of mixtures of paraffinic and aromatic hydrocarbons. Terpenes are a class of biologically produced molecules synthesized from five-carbon precursor molecules in a variety of organisms. Terpenes are pure hydrocarbons, while terpenoids may contain one or more oxygen atoms. Terpenes are hydrocarbons with a low oxygen content and contain no nitrogen or other heteroatoms. For this reason, they can be used as fuel components with minimal processing (as described, for example, in Calvin, M. (2008) "Fuel oils from euphorbs and other plants" Botanical Journal of the Linnean Society 94:97-110, and U.S. Pat. No. 7,037,348).

[0061] Terpenes are a subset of isoprenes. Terpenes are synthesized in biological systems from two five-carbon precursor molecules, isopentenyl diphosphate and dimethylallyl diphosphate (see FIG. 2). The five-carbon precursors are produced through two pathways, the MEP and the mevalonic acid pathways (see FIG. 2 and FIG. 3). Chain elongation enzymes use condensation reactions to produce three precursor molecules: the ten-carbon geranyl diphosphate, the fifteen-carbon farnesyl diphosphate, and the twenty-carbon geranylgeranyl diphosphate. Terpene synthases then cyclize these precursors into monoterpenes (C10 molecules), sesquiterpenes (C15 molecules), and diterpenes (C20 molecules), respectively. Farnesyl diphosphate can be condensed into C30 terpenes. Geranylgeranyl diphosphate can be condensed into C20, C40, or higher molecular weight terpenes. FIG. 1 and FIG. 3 provide an overview of terpenoid biosynthesis.

[0062] An overview of terpene biosynthesis in photosynthetic eukaryotes is shown in FIG. 3. The figure illustrates the intracellular compartmentalization of the mevalonate and mevalonate-independent pathways. These pathways produce isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP); the figure also shows the derived terpenoids. The cytosolic pool of IPP is derived from mevalonic acid (left). It serves as a precursor of farnesyl diphosphate (FPP) and, ultimately, the sesquiterpenes and triterpenes. The plastidial pool of IPP is derived from the glycolytic intermediates pyruvate and glyceraldehyde-3-phosphate (right). It provides the precursor of geranyl diphosphate (GPP) and geranylgeranyl diphosphate (GGPP) and, ultimately, the monoterpenes, diterpenes, and tetraterpenes. Reactions common to both pathways are enclosed by both boxes.

[0063] Exemplary terpenes biosynthesized by eukaryotes or prokaryotes are shown in FIG. 4. Monoterpenes, sesquiterpenes, and diterpenes are derived from the prenyl diphosphate substrates geranyl diphosphate, farnesyl diphosphate, and geranylgeranyl diphosphate, respectively. They are produced in both angiosperms and gymnosperms. (-)-Copalyl diphosphate and ent-kaurene are sequential intermediates in the biosynthesis of gibberellin plant growth hormones. Examples of terpenes that can be produced by an organism, for example, an alga, a yeast, a bacterium, or a higher plant, are casbene, ent-kaurene, taxadiene, or abietadiene (as shown in FIG. 4).

[0064] Fusicoccins and Fusicoccadienes

[0065] Fusicoccins or fusicoccadienes are compounds which function in plant pathogenesis and are synthesized by the fungus Phomopsis amygdali. Fusicoccadiene is a cyclic diterpene. Its biosynthesis begins with the condensation of isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) to form the C20 geranylgeranyl diphosphate (GGPP). A terpene cyclase (fusicoccadiene synthase) then cyclizes this linear isoprenoid to form the tricyclic ring structure of fusicocca-2,10(14)-diene. In P. amygdali, the formation of fusicocca-2,10(14)-diene is carried out by a bifunctional enzyme, fusicoccadiene synthase (PaFS). PaFS has both a prenyltransferase domain for the formation of GGPP and a terpene cyclase domain for formation of the tricyclic ring of fusicocca-2,10(14)-diene. The carbon skeleton is then modified by oxidation, reduction, methylation, and glycosylation to form fusicoccin A and fusicoccin J. These compounds assist plant pathogenesis by permanently activating plant 14-3-3 proteins.

[0066] The present description provides methods and compositions for constructing genetically modified organisms which produce terpenes/terpenoids, including cyclical terpenes, such as fusicoccadiene, casbene, ent-kaurene, taxadiene, and abietadiene. Also provided are methods of producing terpenes/terpenoids (such as fusicoccadiene) in genetically modified organisms. In some aspects, the terpenes/terpenoids may be collected from the organism(s) which have been modified to produce them. Collected terpenes/terpenoids may then be further modified, for example by refining and/or cracking, to produce fuel molecules or components.

[0067] In some instances, a host organism is transformed with a nucleic acid encoding at least one terpene/terpenoid synthase, such as fusicoccadiene synthase. Host organisms can include any suitable host, for example, a microorganism. Microorganisms which are useful for the methods described herein include, for example, photosynthetic bacteria (e.g., cyanobacteria), non-photosynthetic bacteria (e.g., E. coli), yeast (e.g., Saccharomyces cerevisiae), and algae (e.g., microalgae such as Chlamydomonas reinhardtii). Modified organisms are then grown, in some embodiments in the presence of CO2, to produce the terpene/terpenoid. In one embodiment, the terpene/terpenoid is fusicoccadiene.

[0068] Methods and compositions described herein may take advantage of naturally occurring production pathways in an organism, for example, a photosynthetic organism. An example of one such production pathway is the isoprenoid biosynthetic pathway. Methods and compositions described herein may also take advantage of naturally occurring biological molecules as substrates for the recombinantly expressed enzyme or enzymes of interest. IPP, DMAPP, FPP, and GPP may serve as substrates for enzymes of the present disclosure. They may be natively produced in bacteria, yeast, and algae, for example through the mevalonate pathway or the MEP pathway (see FIG. 2 and FIG. 3).

[0069] Insertion of genes encoding an enzyme of the present disclosure into a host organism may lead to increased production of terpenes/terpenoids and/or derivatives, such as fusicoccadiene. In one disclosed method, fusicocca-2,10(14)-diene is produced. Production of terpene/terpenoid derivatives may be artificially increased by introducing extra copies of an artificially engineered, exogenous enzyme modulating the isoprenoid biosynthetic pathway.

[0070] Production of fusicoccadiene can be modulated by introducing into an organism a fusicoccadiene synthase, such as PaFS, or a homolog derived from bacteria, yeast, fungi, or an animal. Fusicoccadiene synthase homologs have been identified in Coccidioides immitis, Gibberella zeae, Alternaria brassicicola, and Chaetomium globosum, for example. Production of fusicoccadiene can also be modulated by introducing a portion of PaFS into an organism, wherein the portion exerts an enzymatic activity on a substrate. Enzymes with terpene cyclase activity (terpene synthases) can also be utilized in optimizing the production of a fusicoccadiene. For example, enzymes capable of forming C20 geranylgeranyl diphosphate (GGPP) can be utilized in optimizing the production of a fusicoccadiene.

[0071] By way of example, a non-vascular photosynthetic microalga species can be genetically engineered to produce fusicoccadiene. Examples of such species are C. reinhardtii, D. salina, H. pluvialis, S. dimorphus, D. viridis, and D. tertiolecta. Production of fusicoccadiene in these microalgae can be achieved by engineering the microalgae to express an exogenous enzyme, PaFS, in the chloroplast or nucleus. PaFS can convert IPP and DMAPP into fusicocca-2,10(14)-diene.

[0072] The expression of PaFS can be accomplished by inserting an exogenous gene encoding PaFS into the chloroplast or nuclear genome of the microalgae. The modified strain of microalgae can be made homoplasmic to ensure that the PaFS gene will be stably maintained in the chloroplast genome of all descendants. A microalga is homoplasmic for a gene when the inserted gene is present in all copies of the chloroplast genome, for example. It is apparent to one of skill in the art that a chloroplast may contain multiple copies of its genome. Therefore, the term "homoplasmic" or "homoplasmy" refers to the state where all copies of a particular locus of interest are substantially identical. In plastid expression, genes are inserted by homologous recombination into all of the several thousand copies of the circular plastid genome present in each plant cell. This approach exploits the enormous copy-number advantage over nuclear-expressed genes to permit expression levels that can readily exceed 10% or more of the total soluble plant protein. Determining the plasmic state of an organism of the present disclosure involves screening transformants at a given locus of interest. The screen checks for the presence of exogenous nucleic acids and the absence of wild-type nucleic acids at that locus.

[0073] The present disclosure, among other embodiments, provides genetically modified microorganisms capable of producing useful products, for example, terpenes and terpenoids such as fusicoccadiene. In some embodiments, production of a desired terpene/terpenoid is achieved by expressing one or more codon-biased terpene/terpenoid synthases in the microorganism. Examples of terpene/terpenoid synthases useful for the present disclosure are PaFS or PaFS homologs. Nucleic acids encoding other proteins can also be cloned and utilized in the present disclosure. Examples are EAS27885 from Coccidioides immitis, EAA68264 from Gibberella zeae, and EAQ85668 from Chaetomium globosum. Nucleic acid sequences can be artificially modified to adopt "regular" codon bias or "hot" codon bias. Examples are IS-87 ("regular" codon biased PaFS with a tag: SEQ ID NO: 4) and IS-88 ("hot" codon biased PaFS with a tag: SEQ ID NO: 7). Such sequences can be utilized in the creation of genetically modified organisms useful for terpene/terpenoid (e.g., fusicoccadiene) production.

[0074] Terpene Synthases

[0075] Terpene synthases are also known as terpene cyclases, and these two terms are used interchangeably throughout the disclosure.

[0076] Generally speaking, terpene cyclases use one of three substrates: the ten-carbon geranyl diphosphate, the fifteen-carbon farnesyl diphosphate, or the twenty-carbon geranylgeranyl diphosphate. Cyclases acting on geranyl diphosphate produce ten-carbon monoterpenes. Those that act on farnesyl diphosphate produce sesquiterpenes, and those that act on geranylgeranyl diphosphate produce diterpenes. Some naturally occurring terpene synthases (for instance, fusicoccadiene synthase from P. amygdali) contain both a terpene cyclase domain and a prenyltransferase, or chain elongation, domain. If present, this chain elongation domain will produce the GPP, FPP, or GGPP substrate for the cyclase. It makes this substrate from the five-carbon isoprenoids isopentenyl diphosphate and dimethylallyl diphosphate.

[0077] In one exemplary organism (Phomopsis amygdali), fusicoccadiene synthase catalyzes two reactions. The first is a prenyltransferase reaction producing GGPP from three molecules of IPP and one molecule of DMAPP. The second is a reaction in which GGPP is cyclized to produce fusicocca-2,10(14)-diene and inorganic pyrophosphate. These two reactions reside in two separate domains of the protein: the N-terminal terpene cyclase domain and the C-terminal prenyltransferase domain.

[0078] Terpenoids are the largest, most diverse class of natural products, and they play numerous functional roles in primary metabolism. Well over 30 cDNAs encoding plant terpenoid synthases involved in primary and secondary metabolism have been cloned and characterized. Terpenoids are present and abundant in all phyla. They serve a multitude of functions in their internal environment (primary metabolism) and external environment (ecological interactions). The biosynthetic requirements for terpene production are the same for all organisms:
- a source of isopentenyl diphosphate;
- isopentenyl diphosphate isomerase or another source of dimethylallyl diphosphate;
- prenyltransferases; and
- terpene synthases.

[0079] More than 30,000 individual terpenoids have now been identified (for example, as described in Buckingham, J. (1998) Dictionary of Natural Products on CD-ROM, Version 6.1. Chapman & Hall, London). At least half of these are synthesized by plants. A relatively small, but quantitatively significant, number of terpenoids are involved in primary plant metabolism. Examples include the phytol side chain of chlorophyll, the carotenoid pigments, the phytosterols of cellular membranes, and the gibberellin plant hormones. However, the vast majority of terpenoids are classified as secondary metabolites. These compounds are not required for plant growth and development but are presumed to have an ecological function in communication or defense (for example, as described in Harborne, J. B. (1991) Recent advances in the ecological chemistry of plant terpenoids, pp. 396-426 in Ecological Chemistry and Biochemistry of Plant Terpenoids, edited by J. B. Harborne and F. A. Tomas-Barberan. Clarendon Press, Oxford). Mixtures of terpenoids, such as the aromatic essential oils, turpentines, and resins, form the basis of a range of commercially useful products (for example, as described in Zinkel, D. F. and Russell, J. (1989) Naval Stores: Production, Chemistry, Utilization. Pulp Chemicals Association, New York, p. 1060; and Dawson, F. A. (1994) The Amazing Terpenes. Naval Stores Rev. March/April: 6-12). Several terpenoids are also of pharmacological significance. These include the monoterpenoid (C10) dietary anticarcinogen limonene (Crowell, P. L. and Gould, M. N. (1994) CRC Crit. Rev. Oncogenesis 5:1-22). They also include the sesquiterpenoid (C15) antimalarial artemisinin (Van Geldre, E., et al. (1997) Plant Mol. Biol. 33: 199-209). A third example is the diterpenoid anticancer drug Taxol (Holmes, A. et al. (1995) Current status of clinical trials with paclitaxel and docetaxel, pp. 31-57 in Taxane Anticancer Agents: Basic Science and Current Status, edited by G. I. Georg, T. T. Chen, I. Ojima and D. M. Vyas. American Chemical Society Symposium Series 583, Washington, D.C.).

[0080] All terpenoids are derived from isopentenyl diphosphate (FIG. 2). In plants, this central precursor is synthesized by two routes. In the cytosol, it is made via the classical acetate/mevalonate pathway, by which the sesquiterpenes (C15) and triterpenes (C30) are formed (for example, as described in Qureshi, N. and Porter, J. W. (1981) Conversion of acetyl-coenzyme A to isopentenyl pyrophosphate, pp. 47-94 in Biosynthesis of Isoprenoid Compounds, Vol. 1, edited by J. W. Porter and S. L. Spurgeon, John Wiley & Sons, New York; and Newman, J. D. and Chappell, J. (1999) Crit. Rev. Biochem. Mol. Biol. 34: 95-106). In plastids, it is made via the alternative pyruvate/glyceraldehyde-3-phosphate pathway, by which the monoterpenes (C10), diterpenes (C20), and tetraterpenes (C40) are formed (for example, as described in Eisenreich, W. M., et al. (1998) Chem. Biol. 5:R221-R233; and Lichtenthaler, H. K. (1999) Annu. Rev. Plant Physiol. Plant Mol. Biol. 50:47-66). First, isopentenyl diphosphate isomerase converts isopentenyl diphosphate to dimethylallyl diphosphate. Prenyltransferases then condense dimethylallyl diphosphate with one, two, or three units of isopentenyl diphosphate. These condensations give geranyl diphosphate (C10), farnesyl diphosphate (C15), and geranylgeranyl diphosphate (C20), respectively (for example, as described in Ramos-Valdivia, A. C., et al. (1997) Nat. Prod. Rep. 14:591-603; Ogura, K. and Koyama, T. (1998) Chem. Rev. 98: 1263-1276; Koyama, T. and Ogura, K. (1999) Isopentenyl diphosphate isomerase and prenyltransferases, pp. 69-96 in Comprehensive Natural Products Chemistry: Isoprenoids Including Steroids and Carotenoids, Vol. 2, edited by D. E. Cane, Pergamon, Oxford; and FIG. 2). These three acyclic prenyl diphosphates serve as the immediate precursors of the corresponding monoterpenoid (C10), sesquiterpenoid (C15), and diterpenoid (C20) classes. A very large group of enzymes called the terpene (terpenoid) synthases converts them into these classes. These enzymes are often referred to as terpene cyclases, since the products of the reactions are most often cyclic.

[0081] A large number of terpenoid synthases of the monoterpene, sesquiterpene, and diterpene series have been isolated from both plant and microbial sources. These catalysts have been described in detail:
- Monoterpene synthases: for example, as described in Croteau, R. (1987) Chem. Rev. 87: 929-954; and Wise, M. L. and Croteau, R. (1999) Monoterpene biosynthesis, pp. 97-153 in Comprehensive Natural Products Chemistry: Isoprenoids Including Steroids and Carotenoids, Vol. 2, edited by D. E. Cane, Pergamon, Oxford.
- Sesquiterpene synthases: for example, as described in Cane, D. E. (1990) Isoprenoid biosynthesis: overview, pp. 1-13 in Comprehensive Natural Products Chemistry: Isoprenoids Including Steroids and Carotenoids, Vol. 2, edited by D. E. Cane, Pergamon, Oxford; and Cane, D. E. (1999) Sesquiterpene biosynthesis: cyclization mechanisms, pp. 150-200 in Comprehensive Natural Products Chemistry: Isoprenoids Including Steroids and Carotenoids, Vol. 2, edited by D. E. Cane, Pergamon, Oxford.
- Diterpene synthases: for example, as described in West, C. A. (1981) Biosynthesis of diterpenes, pp. 375-411 in Biosynthesis of Isoprenoid Compounds, Vol. 1, edited by J. W. Porter and S. L. Spurgeon, John Wiley & Sons, New York; and MacMillan, J. and Beale, M. (1999) Diterpene biosynthesis, pp. 217-243 in Comprehensive Natural Products Chemistry: Isoprenoids Including Steroids and Carotenoids, Vol. 2, edited by D. E. Cane, Pergamon, Oxford.

All terpenoid synthases are very similar in physical and chemical properties. For example, they all require a divalent metal ion as the only cofactor for catalysis, and all operate by electrophilic reaction mechanisms. In this regard, the terpenoid synthases resemble the prenyltransferases. What sets them apart as a unique enzyme class is the tremendous range of possible variations in the carbocationic reactions they catalyze (cyclizations, hydride shifts, rearrangements, and termination steps). Indeed, these variations on a common mechanistic theme permit the production of essentially all chemically feasible skeletal types, isomers, and derivatives. These products form the foundation for the great diversity of terpenoid structures.

[0082] Several groups have suggested that plant terpene synthases share a common evolutionary origin. This suggestion rests on their similar reaction mechanism and on conserved structural and sequence characteristics. These characteristics include amino acid sequence homology, conserved sequence motifs, intron number, and exon size (for example, as described in Mau, C. J. D. and West, C. A. (1994) Proc. Natl. Acad. Sci. USA 91: 8479-8501; Back, K. and Chappell, J. (1995) J. Biol. Chem. 270:7375-7381; Bohlmann, J., et al. (1998) Proc. Natl. Acad. Sci. USA 95: 4126-4133; and Cseke, L., et al. (1998) Mol. Biol. Evol.
15: 1491-1498). A sequence comparison between three isolated plant terpenoid synthase genes (a monoterpene cyclase, limonene synthase (Colby, S. M., et al. (1993) J. Biol. Chem. 268: 23016-23024); a sesquiterpene cyclase, epi-aristolochene synthase (Facchini, P. J. and Chappell, J. (1992) Proc. Natl. Acad. Sci. USA 89: 11088-11092); and a diterpene cyclase, casbene synthase (Mau, C. J. D. and West, C. A. (1994) Proc. Natl. Acad. Sci. USA 91: 8497-8501)) gave a clear indication that these genes, from phylogenetically distant plant species, were related. This conclusion is supported by genomic analysis of intron number and location (Mau, C. J. D. and West, C. A. (1994) Proc. Natl. Acad. Sci. USA 91: 8497-8501; Back, K. and Chappell, J. (1995) J. Biol. Chem. 270: 7375-7381; Chappell, J. (1995) Plant Physiol. 107: 1-6; and Chappell, J. (1995) Annu. Rev. Plant Physiol. Plant Mol. Biol. 46: 521-547). Phylogenetic analysis of the deduced amino acid sequences of 33 terpenoid synthases from angiosperms and gymnosperms allowed recognition of six terpenoid synthase (Tps) gene subfamilies on the basis of clades (Bohlmann, J., et al. (1998) Proc. Natl. Acad. Sci. USA 95: 4126-4133). The majority of the terpene synthases analyzed produce secondary metabolites and are classified into three subfamilies: Tpsa (sesquiterpene and diterpene synthases from angiosperms), Tpsb (monoterpene synthases from angiosperms of the Lamiaceae), and Tpsd (11 gymnosperm monoterpene, sesquiterpene, and diterpene synthases). The other three subfamilies, Tpsc, Tpse, and Tpsf, are each represented by a single angiosperm terpene synthase type: copalyl diphosphate synthase, kaurene synthase, and linalool synthase, respectively. The first two are diterpene synthases involved in early steps of gibberellin biosynthesis (MacMillan, J. and Beale, M. (1999) Diterpene biosynthesis, pp. 217-243 in Comprehensive Natural Products Chemistry: Isoprenoids Including Steroids and Carotenoids, Vol. 2, edited by D. E. Cane, Pergamon, Oxford). These two Tps subfamilies are accordingly grouped into a single clade and are involved in primary metabolism, which suggests that the bifurcation of terpenoid synthases of primary and secondary metabolism occurred before the separation of angiosperms and gymnosperms (Bohlmann, J., et al. (1998) Proc. Natl. Acad. Sci. USA 95: 4126-4133). A detailed analysis of the monoterpene synthase linalool synthase from Clarkia, representing Tpsf, was conducted by Cseke, L., et al. (1998) Mol. Biol. Evol. 15: 1491-1498.

[0083] The isolation and analysis of six genomic clones encoding terpene synthases of conifers ((-)-pinene (C10), (-)-limonene (C10), (E)-α-bisabolene (C15), δ-selinene (C15), and abietadiene synthase (C20) from Abies grandis, and taxadiene synthase (C20) from Taxus brevifolia), all of which are involved in natural products biosynthesis, has been described by Trapp, S. C. and Croteau, R. B., Genetics (2001) 158: 811-832. Trapp and Croteau compared the genome organization (intron number, size, placement and phase, and exon size) of these gymnosperm terpene synthases with that of eight previously characterized angiosperm terpene synthase genes and six putative terpene synthase genomic sequences from Arabidopsis thaliana. Three distinct classes of terpene synthase genes were discerned. The inferred patterns of sequential intron loss, and the loss of an unusual internal sequence element, suggest that the ancestral terpenoid synthase gene resembled a contemporary conifer diterpene synthase gene in containing at least 12 introns and 13 exons of conserved size.

[0084] In addition to the gene sequences for several angiosperm terpene synthases that can be found in public databases (see Table 1), Trapp, S. C. and Croteau, R. B. (Genetics (2001) 158: 811-832) determined the genomic (gDNA) sequences corresponding to six conifer terpene synthase cDNAs (Agggabi, AgfEαbis, Agg-pin1, Agfδsel1, Agg-lim, and Tbggtax; Table 1). This selection of genes represents constitutive and inducible terpenoid synthases from each class (monoterpene, sesquiterpene, and diterpene). Sequence alignment of each cDNA with the corresponding gDNA, including putative terpene synthases from Arabidopsis, established exon and intron boundaries, exon and intron sizes, and intron placement; generic dicot plant 5'- and 3'-splice site consensus sequences (5' NAG/GTAAGWWWW and 3' YAG/) were used to define specific boundaries (Hanley, B. A. and Schuler, M. A. (1988) Nucleic Acids Res. 16: 7159-7176; and Turner, G. (1993) Gene organization in filamentous fungi, pp. 107-125 in The Eukaryotic Genome: Organization and Regulation, edited by P. M. A. Broda, S. Oliver, and P. F. G. Sims, Cambridge University Press, New York). These analyses reveal a distinct pattern of intron phase for each intron throughout the entire Tps gene family.

[0085] A wide range of nomenclatures, none of them systematic, has been applied to the terpenoid synthases. Trapp, S. C. and Croteau, R. B. (Genetics (2001) 158: 811-832) use a unified and specific nomenclature system in which the Latin binomial (two letters), the substrate (one- to four-letter abbreviation), and the product (three letters) are specified. Thus, ag22, the original cDNA designation for abietadiene synthase from A. grandis (a Tpsd subfamily member), becomes AgggABI for the protein and Agggabi for the gene, with the remaining conifer synthases (and other selected genes) named accordingly (for example, as described in Table 1).

[0086] A key to Table 1 is provided below.

[0087] Tc, genomic sequences by Trapp, S. C. and Croteau, R. B. (Genetics (2001) 158: 811-832); NA, sequences unavailable in the public databases but disclosed in a journal reference; pc, sequences obtained by personal communication; ds, sequences in a public database by direct submission but not published; p, sequences in a database with putative function; c, gene confirmed by experimental determination stated in the database; i, two possible isozymes reported for the same region, referred to as A1 and A2; -, no former gene name or accession number. Species names are: Abies grandis, Arabidopsis thaliana, Clarkia concinna, Gossypium arboreum, Hyoscyamus muticus, Mentha longifolia, Mentha spicata, Nicotiana tabacum, Ricinus communis, Perilla frutescens, Taxus brevifolia, and Zea mays.

[0088] The former names for (-)-copalyl diphosphate synthase and ent-kaurene synthase were, respectively, ent-kaurene synthase A (KSA) and ent-kaurene synthase B (KSB), and the corresponding mutant phenotypes were ga1 and ga2; these designations have been used loosely.

[0089] Nomenclature architecture is specified as follows. The Latin binomial two-letter abbreviations are in spaces 1 and 2. The substrates (1- to 4-letter abbreviations) are in spaces 3-6, consisting of 1- or 2-letter abbreviations, in boldface, for the substrate utilized (e.g., g, geranyl diphosphate; f, farnesyl diphosphate; gg, geranylgeranyl diphosphate; c, copalyl diphosphate; ch, chrysanthemyl diphosphate; in lowercase), followed by stereochemistry and/or isomer definition

(e.g., α, β, δ, γ, etc., followed by epi (e), E, Z, -, +, etc.). The 3-letter product abbreviation indicates that the major product is an olefin; otherwise, the quenching nucleophile is indicated (e.g., ABI, abietadiene synthase; BORPP, bornyl diphosphate synthase; CEDOH, cedrol synthase). Uppercase specifies the protein and lowercase specifies the cDNA or gDNA. All letters except species names are in italics for cDNA and gene. The distinction between cDNA and gDNA must be stated, or a g is added before the abbreviation, e.g., Tbggtax cDNA and gTbggtax, or Tbggtax gene (nomenclature system devised by S. Trapp, E. Davis, J. Crock, and R. Croteau, and as discussed in Trapp, S. C. and Croteau, R. B., Genetics (2001) 158: 811-832).

[0090] A comparison of genomic structures (as shown in FIGS. 5A, B, and C) indicates that the plant terpene synthase genes fall into three classes based on intron/exon pattern: 12-14 introns (class I), 9 introns (class II), or 6 introns (class III). Using this classification, based on distinctive exon/intron patterns, the seven conifer genes that Trapp, S. C. and Croteau, R. B. (Genetics (2001) 158: 811-832) studied were assigned to class I or class II. Class I comprises the conifer diterpene synthase genes Agggabi and Tbggtax, the sesquiterpene synthase gene AgfEαbis, and angiosperm synthase genes specifically involved in primary metabolism (Atgg-copp1 and Ccglinoh). Class I terpene synthase genes contain 11-14 introns and 12-15 exons of characteristic size, including the CDIS domain, which comprises exons 4, 5, and 6, the first approximately 20 amino acids of exon 7, and introns 4, 5, and 6 (this unusual sequence element corresponds to a 215-amino-acid region (Pro137-Leu351) of the Agggabi sequence). Class II Tps genes comprise only conifer monoterpene and sesquiterpene synthases, and these contain 9 introns and 10 exons; introns 1 and 2 and the entire CDIS element, including introns 4, 5, and 6, have been lost. Class III Tps genes comprise only angiosperm monoterpene, sesquiterpene, and diterpene synthases involved in secondary metabolism, and they contain 6 introns and 7 exons; introns 1, 2, 7, 9, and 10 and the CDIS domain have been lost in the class III type. The introns of class III Tps genes (introns 3, 8, and 11-14) are conserved among all plant terpene synthase genes and were described as introns, respectively, in previous analyses (Mau, C. J. D. and West, C. A. (1994) Proc. Natl. Acad. Sci. USA 91: 8497-8501; Back, K. and Chappell, J. (1995) J. Biol. Chem. 270: 7375-7381; and Chappell, J. (1995) Annu. Rev. Plant Physiol. Plant Mol. Biol. 46: 521-547).

[0091] A number of diterpene products may be produced in vivo by inserting an exogenous or endogenous gene encoding a diterpene synthase into the chloroplast or nuclear genome of an organism, for example, a microalga, yeast, or plant. When the functional diterpene synthase is expressed by the organism, the exogenous or endogenous enzyme will utilize either the endogenous geranylgeranyl diphosphate (GGPP) as a substrate or, if the enzyme contains a GGPP synthase domain, the endogenous IPP and DMAPP as substrates. The enzyme will convert the substrates to a diterpene in vivo. Examples of diterpene synthases that may be used in this manner include abietadiene synthase, taxadiene synthase, casbene synthase, and ent-kaurene synthase.

[0092] Trapp, S. C. and Croteau, R. B. (Genetics (2001) 158: 811-832) studied the genomic organization of plant terpene synthase (Tps) genes; the results of their studies are shown in FIGS. 5A, B, and C. Black vertical bars represent introns 1-14 (Roman numerals in the figure) and are separated by shaded blocks of specified lengths representing exons 1-15. The terpenoid synthase genes are divided into three classes (class I, class II, and class III), which appear to have evolved sequentially from class I to class III by intron loss and loss of the conifer diterpene internal sequence domain (CDIS). (FIG. 5C) Class I Tps genes comprise 12-14 introns and 13-15 exons and consist primarily of diterpene synthases found in gymnosperms (secondary metabolism) and angiosperms (primary metabolism). (FIG. 5B) Class II Tps genes comprise 9 introns and 10 exons and consist only of gymnosperm monoterpene and sesquiterpene synthases involved in secondary metabolism. (FIG. 5A) Class III Tps genes comprise 6 introns and 7 exons and consist of angiosperm monoterpene, sesquiterpene, and diterpene synthases involved in secondary metabolism. Identically shaded exons illustrate the sequential loss of introns and of the CDIS domain over evolutionary time, from class I through class III. The methionine at the translational start site of the coding region (and alternatives), highly conserved histidines, and single or double arginines indicating the minimum mature protein (Williams, D. C., et al. (1998) Biochemistry 37: 12213-12220) are represented by M, H, RR, or RX (X representing other amino acids that are sometimes substituted), respectively. The enzymatic classification as a monoterpene, sesquiterpene, or diterpene synthase is represented by C10, C15, or C20, respectively. Conifer terpene synthases were isolated and sequenced to determine genomic structure; all other terpene synthase sequences were obtained from public databases or by personal communication (see Table 1). Putative terpene synthases are referred to as putative proteins and are illustrated based upon predicted homology. Two different predictions of the same putative protein (accession no. Z97341) are shown as limonene synthase A1 and A2; if A1 is correct, the genomic pattern suggests that Atlim (accession no. Z97341) is a sesquiterpene synthase; if A2 is correct, then Atlim is a monoterpene synthase. In the analysis of intron borders of the Msg-lim/Mlg-lim chimera and Hmfvet genes (see Table 1), only a single intron border (5' or 3') was sequenced to determine intron placement; intron size was not determined. The intron/exon borders predicted for a number of terpene synthases identified in the Arabidopsis database were determined to be incorrect; these data were reanalyzed and new predictions were used. The number in parentheses represents the deduced size (in amino acid residues) of the corresponding protein or preprotein, as appropriate.

[0093] Table 1 provides the names of various terpene synthases and the GenBank accession numbers for both the cDNA and the gDNA of many of the listed terpene synthases. A listing of the articles cited in Table 1 is provided below.

[0094] The following articles are cited in Table 1: Back, K. and Chappell, J. (1995) J. Biol. Chem. 270: 7375-7381; Bohlmann, J., et al. (1997) J. Biol. Chem. 272: 21784-21792; Bohlmann, J., et al. (1998a) Proc. Natl. Acad. Sci. USA 95: 6756-6761; Bohlmann, J., et al. (1999) Arch. Biochem. Biophys. 368: 232-243; Chen, X., et al. (1996) J. Nat. Prod. 59: 944-951; Colby, S. M., et al. (1993) J. Biol. Chem. 268: 23016-23024; Cseke, L., et al. (1998) Mol. Biol. Evol. 15: 1491-1498; Davis, E. M., et al. (1998) Plant Physiol. 116: 1192; Facchini, P. J. and Chappell, J. (1992) Proc. Natl. Acad. Sci. USA 89: 11088-11092; Mau, C. J. D. and West, C. A. (1994) Proc. Natl. Acad. Sci. USA 91: 8497-8501; Steele, C. L., et al. (1998) J. Biol. Chem. 273: 2078-2089; Stofer Vogel, B., et al. (1996) J. Biol. Chem. 271: 23262-23268; Sun, T. and Kamiya, Y. (1994) Plant Cell 6: 1509-1518; Sun, T. P., et al. (1992) Plant Cell 4: 119-128; Wildung, M. R. and Croteau, R. (1996) J. Biol. Chem. 271: 9201-9204; Yamaguchi, S., et al. (1998) Plant Physiol. 116: 1271-1278; and Yuba, A., et al. (1996) Arch. Biochem. Biophys. 332: 280-287.

TABLE 1

Where two accession numbers or two references are listed, the first refers to the cDNA and the second to the gDNA.

| Product | Species | Former gene name | Enzyme | cDNA/genomic | GenBank accession no. |
|---|---|---|---|---|---|
| Abietadiene | A. grandis | ag22 | AgggABI | Agggabi | U50768, AF326516 |
| (E)-α-Bisabolene | A. grandis | ag1 | AgfEαBIS | AgfEαbis | AF006195, AF326515 |
| (-)-Camphene | A. grandis | ag6 | Agg-CAM | Agg-cam | U87910 |
| γ-Humulene | A. grandis | ag5 | AgfγHUM | Agfγhum | U92267 |
| (-)-Limonene | A. grandis | ag10 | Agg-LIM1 | Agg-lim | AF006193, AF326518 |
| Myrcene | A. grandis | ag2 | AggMYR | Aggmyr | U87908 |
| (-)-α/β-Pinene | A. grandis | ag3 | Agg-PIN1 | Agg-pin1 | U87909, AF326517 |
| (-)-α-Pinene/(-)-limonene | A. grandis | ag11 | Agg-PIN2 | Agg-pin2 | AF139207 |
| (-)-β-Phellandrene | A. grandis | ag8 | Agg-βPHE | Agg-βphe | AF139205 |
| δ-Selinene | A. grandis | ag4 | AgfδSEL1 | Agfδsel1 | U92266, AF326513 |
| | | | AgfδSEL2 | Agfδsel2 | AF326514 |
| Taxadiene | T. brevifolia | Tb1 | TbggTAX | Tbggtax | U48796, AF326519 |
| Terpinolene | A. grandis | ag9 | AggTEO | Aggteo | AF139206, - |
| 5-epi-Aristolochene | Nicotiana tabacum | TEAS3 | NtfeARI3 | Ntfeari3 | L04680, L04680 |
| | | TEAS4 | NtfeARI4 | Ntfeari4 | L04680, L04680 |
| 5-epi-Aristolochene | A. thaliana | | AteARI | Ateari | AL022224 |
| δ-Cadinene | G. arboreum | CAD1-A | GafδCAD1A | Gafδcad1a | X96429, Y18484 |
| δ-Cadinene | G. hirsutum | CAD1-A | GhfδCAD1 | Ghfδcad1 | U88318, - |
| δ-Cadinene | G. arboreum | gCAD1-B | GafδCAD1B | Gafδcad1b | X95323 |
| Cadinene | A. thaliana | | AtCAD | Atcad | AL022224 |
| Casbene | Ricinus communis | CBS | RcggCAS | Rcggcas | L32134, NA |
| (-)-Copalyl diphosphate | A. thaliana | GA1 | Atgg-COPP1 | Atgg-copp1 | U11034, NA, AC004044 |
| ent-Kaurene | A. thaliana | GA2 | Atgg-KAU | Atgg-kau | AF034774, AC007202 |
| (-)-Limonene | Perilla frutescens | PFLC1 | Pfg-LIM1 | Pfg-lim1 | D49368, AB005744 |
| (-)-Limonene | Mentha spicata | LMS | Msg-LIM | Msg-lim | L13459 |
| (-)-Limonene | M. longifolia | LMS | Mlg-LIM | Mlg-lim | AF175323 |
| Limonene | A. thaliana | | AtLIMA1 | Atlima1 | Z97341 |
| | | | AtLIMA2 | Atlima2 | |
| Limonene | A. thaliana | | AtLIMB | Atlimb | Z97341 |
| (S)-Linalool | Clarkia concinna | LIS | CcgLINOH | Ccglinoh | AF067602 |
| Linalool | A. thaliana | | AtgLINOH | Atglinoh | AC002294 |
| Vetispiradiene | Hyoscyamus muticus | Chimera | HmfVET | Hmfvet | U20187, NA |
| Vetispiradiene | A. thaliana | | AtVET | Atvet | AL022224 |

TABLE 1 (continued): References

| Product | Reference | Region on chromosome |
|---|---|---|
| Abietadiene | Stofer Vogel et al. (1996); Trapp and Croteau | - |
| (E)-α-Bisabolene | Bohlmann et al. (1998a); Trapp and Croteau | |
| (-)-Camphene | Bohlmann et al. (1999) | |
| γ-Humulene | Steele et al. (1998) | |
| (-)-Limonene (A. grandis) | Bohlmann et al. (1997); Trapp and Croteau | |
| Myrcene | Bohlmann et al. (1997) | |
| (-)-α/β-Pinene | Bohlmann et al. (1997); Trapp and Croteau | |
| (-)-α-Pinene/(-)-limonene | Bohlmann et al. (1999) | |
| (-)-β-Phellandrene | Bohlmann et al. (1999) | |
| δ-Selinene | Steele et al. (1998); Trapp and Croteau | |
| Taxadiene | Wildung and Croteau (1996); Trapp and Croteau | |
| Terpinolene | Bohlmann et al. (1999) | |
| 5-epi-Aristolochene (N. tabacum) | Facchini and Chappell (1992); Facchini and Chappell (1992) | |
| 5-epi-Aristolochene (A. thaliana) | Bevan et al. | Chromosome 4 BAC F1C12 (ESSA I) nt 44054-38820 |
| δ-Cadinene (G. arboreum, CAD1-A) | Chen et al. (1996); Liang et al. | |
| δ-Cadinene (G. hirsutum) | Davis et al. (1998) | |
| δ-Cadinene (G. arboreum, gCAD1-B) | Chen et al. | |
| Cadinene (A. thaliana) | Bevan et al. | Chromosome 4 BAC F1C12 (ESSA I) nt 44054-38820 |
| Casbene | Mau and West (1994); West (pc) | |
| (-)-Copalyl diphosphate | Sun and Kamiya (1994); Sun et al. (1992); Bastide et al. | Chromosome 4 (top) BAC TS8 nt 34971-41856 |
| ent-Kaurene | Yamaguchi et al. (1998); Vysotskaia et al. | Chromosome 1 BAC T8K14 nt 43552-47420 |
| (-)-Limonene (P. frutescens) | Yuba et al. (1996); Tsubouchi | |
| (-)-Limonene (M. spicata) | Colby et al. (1993) | |
| (-)-Limonene (M. longifolia) | Crock and Croteau; Jones and Davis | |
| Limonene (AtLIMA1/AtLIMA2) | Bevan et al. | Chromosome 4 CF6 (ESSA I) nt 164983-170505 |
| Limonene (AtLIMB) | Bevan et al. | Chromosome 4 CF6 (ESSA I) nt 172598-175344 |
| (S)-Linalool | Cseke et al. (1998); Cseke et al. (1998) | - |
| Linalool (A. thaliana) | Federspiel | Chromosome 1 BAC F11P17 nt 73996-78905 |
| Vetispiradiene (H. muticus) | Back and Chappell (1995); Chappell (pc) | |
| Vetispiradiene (A. thaliana) | Bevan et al. | Chromosome 4 BAC F12C12 (ESSA I) nt 54692-56893 |
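The naming scheme of [0085] and [0089], which produces the Enzyme and cDNA/genomic entries of Table 1, is mechanical enough to express as a small function. The sketch below is a hypothetical helper (`tps_name` is not from the source). It assumes the species code is the genus and epithet initials, the substrate code is one of the lowercase abbreviations listed in [0089], and any stereochemistry or isomer prefix is passed through unchanged.

```python
VALID_SUBSTRATES = {"g", "f", "gg", "c", "ch"}  # substrate codes listed in [0089]


def tps_name(binomial: str, substrate: str, product: str,
             isomer: str = "", form: str = "protein") -> str:
    """Build a Trapp-Croteau style name such as AgggABI, Agggabi, or gAgggabi.

    binomial  -- Latin binomial, e.g. "Abies grandis"
    substrate -- lowercase substrate code (g, f, gg, c, ch)
    product   -- product abbreviation, e.g. "ABI", "TAX", "LINOH"
    isomer    -- optional stereochemistry/isomer prefix, e.g. "-" or "E\u03b1"
    form      -- "protein" (uppercase product), "gene" (lowercase product),
                 or "gdna" (lowercase product with a leading g)
    """
    if substrate not in VALID_SUBSTRATES:
        raise ValueError(f"unknown substrate code: {substrate!r}")
    genus, epithet = binomial.split()[:2]
    # Spaces 1-2: species code; spaces 3-6: substrate plus isomer definition.
    stem = genus[0].upper() + epithet[0].lower() + substrate + isomer
    if form == "protein":
        return stem + product.upper()
    if form == "gene":
        return stem + product.lower()
    if form == "gdna":
        return "g" + stem + product.lower()
    raise ValueError(f"unknown form: {form!r}")


if __name__ == "__main__":
    print(tps_name("Abies grandis", "gg", "ABI"))                  # AgggABI
    print(tps_name("Abies grandis", "gg", "ABI", form="gene"))     # Agggabi
    print(tps_name("Taxus brevifolia", "gg", "TAX", form="gdna"))  # gTbggtax
    print(tps_name("Abies grandis", "f", "BIS", isomer="E\u03b1"))  # AgfEαBIS
```

Only the product abbreviation changes case between protein and gene; the isomer prefix keeps its case, which is why the gene form of (E)-α-bisabolene synthase reads AgfEαbis. Isozyme suffixes such as the 1 in Agg-LIM1 are outside this sketch.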

[0095] In addition to the terpene synthases in Table 1, additional exemplary terpene synthases include bisabolene synthase, (-)-pinene synthase, δ-selinene synthase, (-)-limonene synthase, abietadiene synthase, and taxadiene synthase.

[0096] Examples of synthases include, but are not limited to, botryococcene synthase, limonene synthase, 1,8-cineole synthase, α-pinene synthase, camphene synthase, (+)-sabinene synthase, myrcene synthase, abietadiene synthase, taxadiene synthase, farnesyl pyrophosphate synthase, amorphadiene synthase, (E)-α-bisabolene synthase, diapophytoene synthase, or diapophytoene desaturase. Additional examples of enzymes useful in the disclosed embodiments are described in Table 2.

TABLE 2

Examples of Enzymes Involved in the Isoprenoid Pathway

| Enzyme | Source | NCBI protein ID |
|---|---|---|
| Limonene | M. spicata | 2ONH_A |
| Cineole | S. officinalis | AAC26016 |
| Pinene | A. grandis | AAK83564 |
| Camphene | A. grandis | AAB70707 |
| Sabinene | S. officinalis | AAC26018 |
| Myrcene | A. grandis | AAB71084 |
| Abietadiene | A. grandis | Q38710 |
| Taxadiene | T. brevifolia | AAK83566 |
| FPP | G. gallus | P08836 |
| Amorphadiene | A. annua | AAF61439 |
| Bisabolene | A. grandis | O81086 |
| Diapophytoene | S. aureus | |
| Diapophytoene desaturase | S. aureus | |
| GPPS-LSU | M. spicata | AAF08793 |
| GPPS-SSU | M. spicata | AAF08792 |
| GPPS | A. thaliana | CAC16849 |
| GPPS | C. reinhardtii | EDP05515 |
| FPP | E. coli | NP_414955 |
| FPP | A. thaliana | NP_199588 |
| FPP | A. thaliana | NP_193452 |
| FPP | C. reinhardtii | EDP03194 |
| Limonene | L. angustifolia | ABB73044 |
| Monoterpene | S. lycopersicum | AAX69064 |
| Terpinolene | O. basilicum | AAV63792 |
| Myrcene | O. basilicum | AAV63791 |
| Zingiberene | O. basilicum | AAV63788 |
| Myrcene | Q. ilex | CAC41012 |
| Myrcene | P. abies | AAS47696 |
| Myrcene, Ocimene | A. thaliana | NP_179998 |
| Myrcene, Ocimene | A. thaliana | NP_567511 |
| Sesquiterpene | Z. mays; B73 | AAS88571 |
| Sesquiterpene | A. thaliana | NP_199276 |
| Sesquiterpene | A. thaliana | NP_193064 |
| Sesquiterpene | A. thaliana | NP_193066 |
| Curcumene | P. cablin | AAS86319 |
| Farnesene | M. domestica | AAX19772 |
| Farnesene | C. sativus | AAU05951 |
| Farnesene | C. junos | AAK54279 |
| Farnesene | P. abies | AAS47697 |
| Bisabolene | P. abies | AAS47689 |
| Sesquiterpene | A. thaliana | NP_197784 |
| Sesquiterpene | A. thaliana | NP_175313 |
| GPP Chimera | GPPS-LSU + SSU fusion | |
| Geranylgeranyl reductase | A. thaliana | NP_177587 |
| Geranylgeranyl reductase | C. reinhardtii | EDP09986 |
| FPP A118W | G. gallus | |

[0097] The synthase may also be β-caryophyllene synthase, germacrene A synthase, 8-epicedrol synthase, valencene synthase, (+)-δ-cadinene synthase, germacrene C synthase, (E)-β-farnesene synthase, casbene synthase, vetispiradiene synthase, 5-epi-aristolochene synthase, aristolochene synthase, α-humulene, (E,E)-α-farnesene synthase, (-)-β-pinene synthase, limonene cyclase, linalool synthase, (+)-bornyl diphosphate synthase, levopimaradiene synthase, isopimaradiene synthase, (E)-γ-bisabolene synthase, copalyl pyrophosphate synthase, kaurene synthase, longifolene synthase, γ-humulene synthase, δ-selinene synthase, phellandrene synthase, terpinolene synthase, (-)-3-carene synthase, syn-copalyl diphosphate synthase, α-terpineol synthase, syn-pimara-7,15-diene synthase, ent-sandaracopimaradiene synthase, stemar-13-ene synthase, S-linalool synthase, geraniol synthase, γ-terpinene synthase, linalool synthase, E-β-ocimene synthase, epi-cedrol synthase, α-zingiberene synthase, guaiadiene synthase, cascarilladiene synthase, cis-muuroladiene synthase, aphidicolan-16β-ol synthase, elizabethatriene synthase, sandalol synthase, patchoulol synthase, zinzanol synthase, cedrol synthase, sclareol synthase, copalol synthase, or manool synthase.

[0098] Nucleic Acids, Proteins and Enzymes

[0099] The vectors and other nucleic acids disclosed herein can encode polypeptide(s) that promote the production of intermediates, products, precursors, and derivatives of the products (e.g., terpenes and terpenoids) described herein. For example, the vectors can encode polypeptide(s) that promote the production of intermediates, products, precursors, and derivatives in the isoprenoid pathway.

[0100] The enzymes utilized in practicing the present disclosure may be encoded by nucleotide sequences derived from any organism, including bacteria, plants, fungi, and animals. In some instances, the enzymes are terpene synthases. As used herein, a "terpene synthase" is a naturally or non-naturally occurring enzyme which produces, or increases production of, terpenes/terpenoids and/or their derivatives. Terpenes/terpenoids of the present disclosure can be monoterpenes, diterpenes, triterpenes, sesquiterpenes, or any other naturally or non-naturally occurring terpene. In some embodiments, the terpene is fusicoccadiene. In some instances, a terpene synthase of the present disclosure is fusicoccadiene synthase, producing fusicoccadiene. In other instances, a terpene synthase of the present disclosure catalyzes the conversion of IPP and/or DMAPP into a terpene/terpenoid of interest, such as fusicoccadiene. The enzymes may have one or more distinct catalytic activities, such as prenyltransferase activity and/or terpene cyclase activity. In some embodiments, a host cell may be genetically modified so as to produce more than one exogenous or endogenous polypeptide (e.g., enzyme) which, in combination, results in the production of a desired product (e.g., a terpene/terpenoid). In some instances, the polypeptides may be naturally occurring polypeptides. In other instances, the polypeptides and/or the genes encoding them may be modified from their natural state, including, but not limited to, functional truncations, genetic modifications, or synthetically synthesized polynucleotides. Polynucleotides encoding enzymes and other proteins useful in the present disclosure may be isolated and/or synthesized by any means known in the art, including, but not limited to, cloning, sub-cloning, and PCR. Exemplary DNA manipulations are described in Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press 1989) and Cohen et al., Meth. Enzymol. 297, 192-208, 1998.

[0101] An expression vector, including, but not limited to, regulatory elements and sequences encoding genes, may comprise nucleotide sequences that are codon biased for expression in the organism being transformed. Therefore, when synthesizing, for example, a gene for expression in a host cell, it may be desirable to design the gene such that its frequency of codon usage approaches the frequency of the preferred codon usage of the host cell. In some instances, a native (unmodified) gene may exhibit a complete or partial match to the codon bias of the intended target host cell. In such instances, little or no codon optimization need be performed. In some organisms, codon bias differs between the nuclear genome and organelle genomes; thus, codon optimization or biasing may be performed for the target genome (e.g., nuclear codon biased or chloroplast codon biased). The codons of the host organism may be, for example, A/T rich in the third nucleotide position. A/T rich codon bias is often used for algae. In some embodiments, the third nucleotide position of at least 50% of the codons is A or T. In other embodiments, the third nucleotide position of at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% of the codons is A or T.

[0102] One or more codons of an encoding polynucleotide can be biased to reflect chloroplast and/or nuclear codon usage. Most amino acids are encoded by two or more different (degenerate) codons, and it is well recognized that various organisms utilize certain codons in preference to others. Such preferential codon usage, which also is utilized in chloroplasts, is referred to herein as "chloroplast codon usage". The codon bias of Chlamydomonas reinhardtii has been reported; see U.S. Application 2004/0014174. Percent identity to the native sequence (in the organism from which the sequence was isolated) may be about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 99%, or higher.

[0103] The term "biased," when used in reference to a codon, means that the sequence of a codon in a polynucleotide has been changed such that the codon is one that is used preferentially in the target for which the bias is intended, e.g., alga cells or chloroplasts. A polynucleotide that is biased for chloroplast codon usage can be synthesized de novo, or can be genetically modified using routine recombinant DNA techniques, for example, by a site-directed mutagenesis method, to change one or more codons such that they are biased for chloroplast codon usage. Chloroplast codon bias can be variously skewed in different plants, including, for example, in alga chloroplasts as compared to tobacco. Generally, the chloroplast codon bias selected reflects the chloroplast codon usage of the plant which is being transformed with the nucleic acids of the present disclosure. For example, where C. reinhardtii is the host, the chloroplast codon usage is biased to reflect alga chloroplast codon usage (about 74.6% AT bias in the third codon position).

[0104] The terms "hot" codon bias and "regular" codon bias are used broadly here to refer to different types of codon bias artificially introduced into a gene. "Regular" codon bias refers to a codon bias closely following the codon usage of the host organism into which the gene is introduced. Such regular codon bias can involve the alteration of one or more codons from the native sequence to a codon preferred in a host organism. In some instances, a host organism will have different codon usages in different genomes. For example, the chloroplast genome of C. reinhardtii has a different codon bias than the nuclear genome. Therefore, codon biasing typically will reflect the targeted genome within the host cell.

[0105] "Hot" codon bias is similar to regular codon bias in that one or more codons from a native sequence are changed to reflect codon usage in the host organism. For "hot" codon bias, however, the synthetic gene contains the codon most frequently used by the host genome to encode the desired amino acid at that position, unless use of that codon would introduce an undesired restriction enzyme recognition sequence at a given position. For instance, there are three codons that encode the amino acid isoleucine: ATC, ATT, and ATA. In the Chlamydomonas chloroplast genome, the codon ATT is used 77% of the time, ATC is used 12% of the time, and ATA is used 11% of the time. In a "hot" codon biased gene, the codon ATT will therefore be used at all positions where isoleucine is to be encoded, unless use of ATT would introduce an undesired restriction enzyme recognition site.

[0106] Nucleic Acid and Amino Acid Sequences Useful in the Disclosed Embodiments

[0107] SEQ ID NO:1 Phomopsis amygdali fusicoccadiene synthase (PaFS) nucleotide sequence

[0108] SEQ ID NO:2 PaFS protein sequence

[0109] SEQ ID NO:3 Strep-Tag amino acid sequence including TG linker

[0110] SEQ ID NO:4 "Regular" codon optimized PaFS nucleotide sequence without tag

[0111] SEQ ID NO:5 "Regular" codon optimized PaFS nucleotide sequence with C-terminal Strep Tag

[0112] SEQ ID NO:6 Amino acid sequence of PaFS with C-terminal Strep Tag

[0113] SEQ ID NO:7 "Hot" codon optimized PaFS nucleotide sequence without tag

[0114] SEQ ID NO:8 "Hot" codon optimized PaFS nucleotide sequence with C-terminal Strep Tag

[0115] SEQ ID NO:9 Phaeosphaeria nodorum ent-kaurene synthase nucleotide sequence
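Paragraph [0105] describes "hot" codon biasing: each residue gets the host genome's most frequent codon unless that codon would create an unwanted restriction site. The greedy Python sketch below illustrates that rule under stated assumptions. Only the isoleucine ranking (ATT 77%, ATC 12%, ATA 11% in the Chlamydomonas chloroplast genome) comes from the text; the other rows of the hypothetical `CODON_PREFERENCE` table are placeholder orderings, not measured C. reinhardtii usage, and the phenylalanine order is deliberately chosen to trigger the fallback. The EcoRI site GAATTC stands in for whatever recognition sequences a real construct must avoid.

```python
# Codon preferences, most frequent first. Only the isoleucine ranking is from
# the text; the remaining rows are illustrative placeholders, NOT measured
# C. reinhardtii chloroplast codon usage.
CODON_PREFERENCE = {
    "M": ["ATG"],
    "I": ["ATT", "ATC", "ATA"],
    "E": ["GAA", "GAG"],
    "K": ["AAA", "AAG"],
    "F": ["TTC", "TTT"],  # placeholder order, chosen so the example hits a site
    "*": ["TAA", "TAG", "TGA"],
}

# Example unwanted site: the EcoRI recognition sequence.
FORBIDDEN_SITES = ("GAATTC",)


def hot_codon_optimize(protein: str,
                       preference: dict[str, list[str]] = CODON_PREFERENCE,
                       forbidden: tuple[str, ...] = FORBIDDEN_SITES) -> str:
    """Back-translate a protein with each residue's most frequent codon,
    falling back to the next codon only when the preferred one would create
    a forbidden recognition site in the sequence built so far."""
    dna = ""
    for residue in protein:
        for codon in preference[residue]:
            candidate = dna + codon
            # Checking the whole candidate also catches sites spanning codon junctions.
            if not any(site in candidate for site in forbidden):
                dna = candidate
                break
        else:
            raise ValueError(f"no codon for {residue!r} avoids the forbidden sites")
    return dna


if __name__ == "__main__":
    print(hot_codon_optimize("III"))    # ATTATTATT
    print(hot_codon_optimize("MIEF*"))  # ATGATTGAATTTTAA (TTC would form GAATTC)
```

A greedy left-to-right pass is the simplest reading of [0105]. It does not backtrack, so a sequence that could only be rescued by changing an earlier codon raises `ValueError` instead of being solved; a production design tool would need a search over alternatives.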
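Two quantities from [0101]-[0103] are easy to compute for a candidate gene: the fraction of codons with A or T in the third position (the text cites at least 50% as one embodiment and about 74.6% for C. reinhardtii chloroplast usage), and percent identity to the native sequence ([0102]). The sketch below assumes an in-frame coding sequence and, for identity, two ungapped sequences of equal length. Equal length holds when only synonymous codons are swapped. This is a simple position-by-position count, not an alignment-based measure such as BLAST; the function names are hypothetical.

```python
def third_position_at_fraction(cds: str) -> float:
    """Fraction of codons in an in-frame coding sequence whose third base is A or T."""
    cds = cds.upper()
    if not cds or len(cds) % 3:
        raise ValueError("coding sequence must be non-empty and a multiple of 3 long")
    third_bases = cds[2::3]  # every third base, starting at the first codon's 3rd position
    return sum(base in "AT" for base in third_bases) / len(third_bases)


def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length, ungapped sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    native = "ATCATAATTATC"  # illustrative stretch of four isoleucine codons
    hot = "ATTATTATTATT"     # the same residues, all encoded with ATT
    print(third_position_at_fraction(native))  # 0.5
    print(third_position_at_fraction(hot))     # 1.0
    print(percent_identity(native, hot))       # 75.0
```

In this toy case, switching every isoleucine to ATT raises the third-position A/T fraction from 0.5 to 1.0 while leaving the gene 75% identical to the native stretch, which shows how codon biasing trades identity to the native sequence for host-matched codon usage.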

[0116] SEQ ID NO:10 ent-Kaurene synthase protein sequence

[0117] SEQ ID NO:11 "Hot" codon optimized ent-kaurene synthase nucleic acid sequence, without tag

[0118] SEQ ID NO:12 N-terminal FLAG tag amino acid sequence

[0119] SEQ ID NO:13 "Hot" codon optimized ent-kaurene synthase nucleic acid sequence with N-terminal FLAG tag

[0120] SEQ ID NO:14 Amino acid sequence of ent-kaurene synthase with N-terminal FLAG tag

[0121] SEQ ID NO:15 Ricinus communis casbene synthase nucleotide sequence

[0122] SEQ ID NO:16 Casbene synthase protein sequence

[0123] SEQ ID NO:17 "Hot" codon optimized casbene synthase nucleic acid sequence, without tag

[0124] SEQ ID NO:18 "Hot" codon optimized casbene synthase nucleic acid sequence, with C-terminal Strep tag including TGIN linker

[0125] SEQ ID NO:19 Strep tag amino acid sequence including TGIN linker

[0126] SEQ ID NO:20 Casbene synthase protein sequence with Strep-tag

[0127] SEQ ID NO:21 Casbene synthase/GGPP synthase fusion protein nucleotide sequence, without tag

[0128] SEQ ID NO:22 Translation of casbene synthase/GGPP synthase fusion protein without tag

[0129] SEQ ID NO:23 CLIP-8x his tag protein sequence

[0130] SEQ ID NO:24 Casbene synthase/GGPP synthase fusion protein nucleotide sequence including CLIP-8x his tag

[0131] SEQ ID NO:25 Casbene synthase/GGPP synthase fusion protein sequence including CLIP-8x his tag

[0132] SEQ ID NO:26 Abies grandis abietadiene synthase gene nucleotide sequence

[0133] SEQ ID NO:27 Abietadiene synthase protein sequence

[0134] SEQ ID NO:28 Codon optimized abietadiene synthase nucleotide sequence without tag

[0135] SEQ ID NO:29 TEV-FLAG tag amino acid sequence

[0136] SEQ ID NO:30 Codon optimized abietadiene synthase nucleotide sequence with C-terminal TEV-FLAG tag

[0137] SEQ ID NO:31 Abietadiene synthase nucleotide sequence with C-terminal TEV-FLAG tag protein sequence

[0138] SEQ ID NO:32 Taxus brevifolia taxadiene synthase gene nucleotide sequence

[0139] SEQ ID NO:33 Taxadiene synthase protein sequence

[0140] SEQ ID NO:34 Codon optimized taxadiene synthase nucleotide sequence without tag

[0141] SEQ ID NO:35 Codon optimized taxadiene synthase nucleotide sequence with C-terminal TEV-FLAG tag protein sequence

[0142] SEQ ID NO:36 Taxadiene synthase nucleotide sequence with C-terminal TEV-FLAG tag protein sequence

[0143] SEQ ID NO:37 Prenyltransferase domain of fusicoccadiene synthase nucleotide sequence

[0144] SEQ ID NO:38 Prenyltransferase domain of fusicoccadiene synthase protein sequence

[0145] SEQ ID NO:39 "Hot" codon optimized prenyltransferase domain of fusicoccadiene synthase nucleotide sequence without tag

[0146] SEQ ID NO:40 "Hot" codon optimized prenyltransferase domain of fusicoccadiene synthase nucleotide sequence with C-terminal Strep Tag

[0147] SEQ ID NO:41 Prenyltransferase domain of fusicoccadiene synthase with C-terminal Strep Tag protein sequence

[0148] SEQ ID NO:42 Primer 1 from Example 12

[0149] SEQ ID NO:43 Primer 2 from Example 12

[0150] SEQ ID NO:44 Native nucleotide sequence encoding a hypothetical protein EAS27885 from C. immitis

[0151] SEQ ID NO:45 Translation of C. immitis protein EAS27885

[0152] SEQ ID NO:46 Codon optimized nucleotide sequence for C. immitis EAS27885 without tag

[0153] SEQ ID NO:47 C. immitis hypothetical protein nucleotide sequence as expressed (IS-92) with C-terminal Strep tag

[0154] SEQ ID NO:48 C. immitis hypothetical protein translation as expressed (IS-92) with C-terminal Strep tag

[0155] SEQ ID NO:49 Nucleotide sequence encoding a hypothetical protein EAA68264 from G. zeae

[0156] SEQ ID NO:50 Translation of gene encoding hypothetical protein EAA68264 from G. zeae

[0157] SEQ ID NO:51 Codon optimized gene encoding hypothetical protein EAA68264 from G. zeae without tag

[0158] SEQ ID NO:52 Codon optimized gene encoding hypothetical protein EAA68264 from G. zeae, nucleotide sequence as expressed with C-terminal Strep tag

[0159] SEQ ID NO:53 Translation of gene encoding hypothetical protein EAA68264 from G. zeae, nucleotide sequence as expressed with C-terminal Strep tag

[0160] SEQ ID NO:54 Nucleotide sequence from Aspergillus clavatus NRRL1 encoding hypothetical protein ACLA_076850

[0161] SEQ ID NO:55 Translation of nucleotide sequence from Aspergillus clavatus NRRL1 encoding hypothetical protein ACLA_076850

[0162] SEQ ID NO:56 Codon optimized nucleotide sequence for hypothetical protein ACLA_076850 without tags

[0163] SEQ ID NO:57 Codon optimized nucleotide sequence for hypothetical protein ACLA_076850 as expressed, with C-terminal Strep tag

[0164] SEQ ID NO:58 Translation of codon optimized nucleotide sequence for hypothetical protein ACLA_076850 as expressed, with C-terminal Strep tag

[0165] SEQ ID NO:59 Primer 1 from Example 13

[0166] SEQ ID NO:60 Primer 2 from Example 13

[0167] Percent Sequence Identity

[0168] One example of an algorithm that is suitable for determining percent sequence identity or sequence similarity between nucleic acid or polypeptide sequences is the BLAST algorithm, which is described, e.g., in Altschul et al., J. Mol. Biol. 215: 403-410 (1990). Software for performing BLAST analysis is publicly available through the National Center for Biotechnology Information. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, a cutoff of 100, a match reward (M) of 5, a mismatch penalty (N) of -4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (as described, for example, in Henikoff & Henikoff (1992) Proc. Natl. Acad. Sci. USA 89: 10915). In addition to calculating percent sequence identity, the BLAST algorithm also can perform a statistical analysis of the similarity between two

sequences (for example, as described in Karlin & Altschul, Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, less than about 0.01, or less than about 0.001.

[0169] A polynucleotide or nucleic acid of the present disclosure can encode more than one gene. For example, the polynucleotide can encode a first gene and a second gene, or a first gene, a second gene, and a third gene. Furthermore, any or all of the genes can be the same or different.

[0170] The polypeptides expressed in host cells of the present disclosure, including yeast, bacteria, or a microalga such as C. reinhardtii, may be assembled to form functional polypeptides and protein complexes. As such, one embodiment of the disclosure provides a method to produce functional protein complexes, including, for example, dimers, trimers, and tetramers, wherein the subunits of the complexes can be the same or different (e.g., homodimers or heterodimers, respectively).

[0171] A polynucleotide or nucleic acid molecule as described herein can contain two or more sequences that are linked in a manner such that the product is not found in a cell in nature. The two or more nucleotide sequences can be operatively linked and, for example, can encode a fusion polypeptide, or can comprise an encoding nucleotide sequence and a regulatory element. A nucleic acid molecule also can be based on, but manipulated so as to be different from, a naturally occurring polynucleotide (e.g., biased for chloroplast codon usage, or with a restriction enzyme site inserted into the nucleic acid). A nucleic acid molecule may further contain a peptide tag (e.g., His-6 tag), which can facilitate identification of expression of the polypeptide in a cell. Additional tags include, for example: a FLAG epitope; a c-myc epitope; Strep-TAGII; biotin; and glutathione S-transferase. Such tags can be detected by any method known in the art (e.g., anti-tag antibodies or streptavidin). Such tags may also be used to isolate the operatively linked polypeptide(s), for example by affinity chromatography.

[0172] A polynucleotide or nucleic acid sequence comprising naturally occurring nucleotides and phosphodiester bonds can be chemically synthesized or can be produced using recombinant DNA methods, using an appropriate polynucleotide as a template. In comparison, a polynucleotide comprising nucleotide analogs or covalent bonds other than phosphodiester bonds generally is chemically synthesized, although an enzyme such as T7 polymerase can incorporate certain types of nucleotide analogs into a polynucleotide and, therefore, can be used to produce such a polynucleotide recombinantly from an appropriate template (for example, as described in Jellinek et al., Biochemistry 34:11363-11372, 1995). Polynucleotides or nucleic acids useful for practicing the present disclosure may be isolated from any organism.

[0173] Products

[0174] Examples of products contemplated herein include hydrocarbon products and hydrocarbon derivative products. A hydrocarbon product is one that consists of only hydrogen and carbon atoms. A hydrocarbon derivative product is a hydrocarbon product with one or more heteroatoms, wherein the heteroatom is any atom that is not hydrogen or carbon. Examples of heteroatoms include, but are not limited to, nitrogen, oxygen, sulfur, and phosphorus. Some products can be hydrocarbon-rich, wherein, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the product by weight is made up of carbon and hydrogen.

[0175] One exemplary group of hydrocarbon products are the isoprenoids. Isoprenoids (including terpenoids) are derived from isoprene sub-units, but are modified, for example, by the addition of heteroatoms such as oxygen, by carbon skeleton rearrangement, and by alkylation. Isoprenoids generally have a number of carbon atoms which is evenly divisible by five, but this is not a requirement, as “irregular” terpenoids are known to one of skill in the art. Carotenoids, such as carotenes and xanthophylls, are examples of isoprenoids that are useful products. A steroid is an example of a terpenoid. Examples of isoprenoids include, but are not limited to, hemiterpenes (C5), monoterpenes (C10), sesquiterpenes (C15), diterpenes (C20), triterpenes (C30), tetraterpenes (C40), polyterpenes (Cn, wherein “n” is equal to or greater than 45), and their derivatives. Other examples of isoprenoids include, but are not limited to, limonene, 1,8-cineole, α-pinene, camphene, (+)-sabinene, myrcene, abietadiene, taxadiene, farnesyl pyrophosphate, fusicoccadiene, amorphadiene, (E)-α-bisabolene, zingiberene, or diapophytoene, and their derivatives.

[0176] Useful products include, but are not limited to, terpenes and terpenoids as described above. An exemplary group of terpenes are diterpenes (C20). Diterpenes are hydrocarbons that can be modified (e.g., oxidized, methyl groups removed, or cyclized); the carbon skeleton of a diterpene can be rearranged to form, for example, terpenoids, such as fusicoccadiene. Fusicoccadiene may also be formed, for example, directly from the isoprene precursors, without being bound by the availability of diterpene or GGDP. Genetic modification of organisms, such as algae, by the methods described herein, can lead to the production of fusicoccadiene, for example, and other types of terpenes, such as limonene, for example. Genetic modification can also lead to the production of modified terpenes, such as methyl squalene or hydroxylated and/or conjugated terpenes such as paclitaxel.

[0177] Other useful products can be, for example, a product comprising a hydrocarbon obtained from an organism expressing a diterpene synthase. Such exemplary products include ent-kaurene, casbene, and fusicoccadiene, and may also include fuel additives.

[0178] The products produced by the present disclosure may be naturally, or non-naturally (e.g., as a result of transformation), produced by the host cell(s) and/or organism(s) transformed. For example, products not naturally produced by algae may include non-native terpenes/terpenoids such as fusicoccadiene. The host cell may be genetically modified, for example, by transformation of the cell with a sequence encoding a protein, wherein expression of the protein results in the secretion of a non-naturally produced product or products.

[0179] Examples of useful products include petrochemical products and their precursors and all other substances that may be useful in the petrochemical industry. Products include, for example, petroleum products, precursors of petroleum, as well as petrochemicals and precursors thereof. The fuel or fuel products may be used in a combustor such as a boiler, kiln, dryer or furnace. Other examples of combustors are internal combustion engines such as vehicle engines or generators, including gasoline engines, diesel engines, jet engines, and other types of engines. Products described herein may also be used to produce plastics, resins, fibers, elastomers, pharmaceuticals, nutraceuticals, lubricants, and gels, for example.

[0180] Isoprenoid precursors are generated by one of two pathways: the mevalonate pathway or the methylerythritol phosphate (MEP) pathway (FIG. 2 and FIG. 3). Both pathways generate dimethylallyl pyrophosphate (DMAPP) and isopentenyl pyrophosphate (IPP), the common C5 precursors for isoprenoids. The DMAPP and IPP are condensed to form geranyl-diphosphate (GPP), or other precursors, such as farnesyl-diphosphate (FPP) or geranylgeranyl-diphosphate (GGPP), from which higher isoprenoids are formed.

[0181] Useful products can also include small alkanes (for example, 1 to approximately 4 carbons) such as methane, ethane, propane, or butane, which may be used for heating (such as in cooking) or making plastics. Products may also include molecules with a carbon backbone of approximately 5 to approximately 9 carbon atoms, such as naphtha or ligroin, or their precursors. Other products may have about 5 to about 12 carbon atoms, or may be cycloalkanes, used as gasoline or motor fuel. Molecules and aromatics of approximately 10 to approximately 18 carbons, such as kerosene, or its precursors, may also be useful as products. Other products include lubricating oil, heavy gas oil, or fuel oil, or their precursors, and can contain alkanes, cycloalkanes, or aromatics of approximately 12 to approximately 70 carbons. Products also include other residuals that can be derived from or found in crude oil, such as coke, asphalt, tar, and waxes, generally containing multiple rings with about 70 or more carbons, and their precursors.

[0182] The various products may be further refined to a final product for an end user by a number of processes. Refining can, for example, occur by fractional distillation. For example, a mixture of products, such as a mix of different hydrocarbons with various chain lengths, may be separated into various components by fractional distillation.

[0183] Refining may also include any one or more of the following steps: cracking, unifying, or altering the product. Large products, such as large hydrocarbons (e.g., ≥C10), may be broken down into smaller fragments by cracking. Cracking may be performed by heat or high pressure, such as by steam, visbreaking, or coking. Products may also be refined by visbreaking, for example by thermally cracking large hydrocarbon molecules in the product by heating the product in a furnace. Refining may also include coking, wherein a heavy, almost pure carbon residue is produced. Cracking may also be performed by catalytic means to enhance the rate of the cracking reaction by using catalysts such as, but not limited to, zeolite, aluminum hydrosilicate, bauxite, or silica-alumina. Catalysis may be by fluid catalytic cracking, whereby a hot catalyst, such as zeolite, is used to catalyze cracking reactions. Catalysis may also be performed by hydrocracking, where lower temperatures are generally used in comparison to fluid catalytic cracking. Hydrocracking can occur in the presence of elevated partial pressure of hydrogen gas. Products may be refined by catalytic cracking to generate diesel, gasoline, and/or kerosene.

[0184] The products may also be refined by combining them in a unification step, for example by using catalysts, such as platinum or a platinum-rhenium mix. The unification process can produce hydrogen gas, a by-product, which may be used in cracking.

[0185] The products may also be refined by altering, rearranging, or restructuring hydrocarbons into smaller molecules. There are a number of chemical reactions that occur in catalytic reforming processes which are known to one of ordinary skill in the art. Catalytic reforming can be performed in the presence of a catalyst and a high partial pressure of hydrogen. One common process is alkylation. For example, propylene and butylene are mixed with a catalyst such as hydrofluoric acid or sulfuric acid, and the resulting products are high octane hydrocarbons, which can be used to reduce knocking in gasoline blends.

[0186] The products may also be blended or combined into mixtures to obtain an end product. For example, the products may be blended to form gasoline of various grades, gasoline with or without additives, lubricating oils of various weights and grades, kerosene of various grades, jet fuel, diesel fuel, heating oil, and chemicals for making plastics and other polymers. Compositions of the products described herein may be combined or blended with fuel products produced by other means.

[0187] Some products produced from the host cells of the disclosure, especially after refining, will be identical to existing petrochemicals, i.e., contain the same chemical structure. For instance, crude oil contains the isoprenoid pristane, which is thought to be a breakdown product of phytol, which is a component of chlorophyll. Some of the products may not be the same as existing petrochemicals. However, although a molecule may not exist in conventional petrochemicals or refining, it may still be useful in these industries. For example, a hydrocarbon could be produced that is in the boiling point range of gasoline, and that could be used as gasoline or an additive, even though the hydrocarbon does not normally occur in gasoline.

[0188] Vectors

[0189] The organisms/host cells herein can be transformed to modify the production and/or secretion of a product(s) with an expression vector, or a linearized portion thereof, for example, to increase production and/or secretion of a product(s). The product(s) can be naturally or not naturally produced by the organism.

[0190] An expression vector, or a linearized portion thereof, can comprise one or more polynucleotides that comprise nucleotide sequences that are exogenous or endogenous to the host organism.

[0191] In some instances, a sequence to be inserted into a host cell genome (e.g., a nuclear genome or chloroplast genome) is flanked by two sequences. These flanking sequences include those that have at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% sequence identity to the sequence found in the host cell. The flanking homologous sequences enable recombination of the exogenous or endogenous sequence into the genome of the host organism through homologous recombination. In some instances, the flanking homologous sequences can be at least 100, at least 200, at least 300, at least 400, at least 500, at least 1000, or at least 1500 nucleotides in length.

[0192] Any of the vectors described herein can further comprise a regulatory control sequence. A regulatory control sequence may include, for example, promoter(s), operator(s), repressor(s), enhancer(s), transcription termination sequence(s), sequence(s) that regulate translation, or other regulatory control sequence(s) that are compatible with the host cell and control the expression of the nucleic acid molecules of the present disclosure. In some cases, a regulatory control sequence includes transcription control sequence(s) that are able to control, modulate, or effect the initiation, elongation, and/or termination of transcription. For example, a regulatory control sequence can increase the transcription and/or translation rate and/or efficiency of a gene or gene product in an organism, wherein expression of the gene or gene product is upregulated, resulting (directly or indirectly) in the increased production, secretion, or both, of a product described herein. The regulatory control sequence may also result in increased production, secretion, or both, of a product by increasing the stability of a gene or gene product.

[0193] A regulatory control sequence can be exogenous or endogenous in relationship to the host organism. A regulatory control sequence may encode one or more polypeptides that are enzymes that promote expression and production of a desired product. For example, an exogenous regulatory control sequence may be derived from another species of the same genus of the organism (e.g., another algal species).

[0194] Regulatory control sequences that can be used in the disclosed embodiments can effect inducible or constitutive expression of a desired sequence. For example, algal regulatory control sequences can be used; these sequences can be of nuclear, viral, extrachromosomal, mitochondrial, or chloroplastic origin.

[0195] Suitable regulatory control sequences include those naturally associated with the nucleotide sequence to be expressed (for example, an algal promoter operably linked with an algal-derived nucleotide sequence in nature). Suitable regulatory control sequences also include regulatory control sequences not naturally associated with the nucleic acid molecule to be expressed (for example, an algal promoter of one species operatively linked to a nucleotide sequence of another organism or algal species).

[0196] A nucleic acid sequence is operably linked when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA for a presequence or secretory leader is operatively linked to DNA for a polypeptide if it is expressed as a preprotein which participates in the secretion of the polypeptide; a promoter is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, operably linked sequences are contiguous and, in the case of a secretory leader, contiguous and in reading phase. Linking is achieved by ligation at restriction enzyme sites. If suitable restriction sites are not available, then synthetic oligonucleotide adapters or linkers can be used as is known to those skilled in the art (see, for example, Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press (1989); and Ausubel et al., Short Protocols in Molecular Biology, 2nd Ed., John Wiley & Sons (1992)).

[0197] To determine whether a putative regulatory control sequence is suitable, the putative regulatory control sequence can be linked to a nucleic acid molecule encoding a protein that produces a detectable signal. The construct comprising the putative regulatory control sequence and nucleic acid may then be introduced into an alga or other organism by standard techniques, and expression of the protein monitored. For example, if the nucleic acid molecule encodes a dominant selectable marker, the alga or organism to be used is tested for the ability to grow in the presence of a compound for which the marker provides resistance.

[0198] In some cases, a regulatory control sequence is a promoter, such as a promoter adapted for expression of a nucleotide sequence in a non-vascular, photosynthetic organism. For example, the promoter may be an algal promoter, for example as described in U.S. Publ. Appl. No. 2006/0234368, now U.S. Pat. No. 7,449,568, issued Nov. 11, 2008, and U.S. Publ. Appl. No. 2004/0014174, and in Hohmann, Transgenic Plant J. 1:81-98 (2007). The promoter may be a chloroplast specific promoter or a nuclear specific promoter. The promoter may be an EF1-α gene promoter or a D promoter. In some embodiments, the polypeptide, for example a synthase, is operably linked to an EF1-α gene promoter. In other embodiments, a synthase is operably linked to a D promoter. Other exemplary promoters that can be used in the embodiments disclosed herein include, but are not limited to, the psbA, psbD, tufA, rbcL, HSP70A, and RBCS2 promoters.

[0199] A regulatory control sequence can be placed in a construct in a variety of locations, including, for example, within coding and non-coding regions, 5' untranslated regions (e.g., regions upstream from the coding region), or 3' untranslated regions (e.g., regions downstream from the coding region). Thus, in some instances a regulatory control sequence can include one or more 3' or 5' untranslated regions, one or more introns, or one or more exons.

[0200] For example, the vector can comprise a 5' regulatory region. In some embodiments, the 5' regulatory region comprises a promoter. The vector can also comprise a 3' regulatory region. The promoter can be a constitutive promoter or an inducible promoter. Examples of inducible promoters include, for example, a light inducible promoter, a nitrate inducible promoter, or a heat responsive promoter.

[0201] For example, in some embodiments, a regulatory control sequence can comprise a Cyclotella cryptica acetyl-CoA carboxylase 5' untranslated regulatory control sequence or a Cyclotella cryptica acetyl-CoA carboxylase 3' untranslated regulatory control sequence (for example, as described in U.S. Pat. No. 5,661,017).

[0202] A regulatory control sequence may also encode chimeric or fusion polypeptides, such as the protein AB or SAA, that promote expression of an endogenous or exogenous nucleotide sequence or protein. Other regulatory control sequences can include intron sequences that may promote translation of an endogenous or exogenous sequence.

[0203] The regulatory control sequences used in any of the vectors described herein may be inducible. Inducible regulatory control sequences, such as promoters, can be inducible by light, for example. Regulatory control sequences may also be autoregulatable. Examples of autoregulatable regulatory control sequences include those that are autoregulated by, for example, endogenous ATP levels or by the product produced by the organism. In some instances, the regulatory control sequences may be inducible by an exogenous agent. Other inducible elements are well known in the art and may be adapted for use in the present disclosure.

[0204] Various combinations of the regulatory control sequences described herein may be embodied by the present disclosure and combined with other features of the present disclosure. In some cases, an expression vector comprises one or more regulatory control sequences operatively linked to a nucleotide sequence encoding a polypeptide. Such sequences may, for example, upregulate secretion, production, or both, of a product described herein. In some cases, an expression vector comprises one or more regulatory control sequences operatively linked to a nucleotide sequence encoding a polypeptide that affects (for example, upregulates) secretion, production, or both, of a product.

[0205] In some instances, such vectors include promoters. Promoters useful in the present disclosure may come from any source (e.g., viral, bacterial, fungal, protist, or animal). The promoters contemplated for use herein can be, for example, specific to photosynthetic organisms, prokaryotic or eukaryotic non-vascular photosynthetic organisms, vascular photosynthetic organisms (e.g., flowering plants), yeast, or non-photosynthetic bacteria. The promoter can be, for example, a promoter for expression in a chloroplast and/or other plastid organelle. Alternatively, the promoter can be a promoter for expression in a bacterial host including, for example, a cyanobacterium. In one example, the promoter is chloroplast based. Examples of promoters contemplated for use in the present disclosure include those disclosed in U.S. Application No. 2004/0014174. The promoter can be a constitutive promoter or an inducible promoter. A promoter typically includes necessary nucleic acid sequences near the start site of transcription (e.g., a TATA element).

[0206] A “constitutive” promoter is a promoter that is active under most environmental and developmental conditions. An “inducible” promoter is a promoter that is active under environmental or developmental regulation. Examples of inducible promoters/regulatory elements include, for example, a nitrate-inducible promoter (for example, as described in Bock et al., Plant Mol. Biol. 17:9 (1991)), a light-inducible promoter (for example, as described in Feinbaum et al., Mol. Gen. Genet. 226:449 (1991); and Lam and Chua, Science 248:471 (1990)), or a heat responsive promoter (for example, as described in Muller et al., Gene 111:165-73 (1992)).

[0207] To select integration sites and/or determine codon usage, the genome of C. reinhardtii can be consulted. The entire chloroplast genome of C. reinhardtii is available to the public on the world wide web, at the URL “http://www.chlamy.org/chloro/default.html”, which is incorporated herein by reference. The chloroplast genome is also described in GenBank Acc. No.: AF396929, and in Maul, J. E., et al., Plant Cell 14 (11), 2659-2679 (2002). Generally, a portion of the nucleotide sequence of the chloroplast genomic DNA is selected as an integration site such that it is not a portion of a gene, a regulatory sequence or a coding sequence, especially where integration of exogenous DNA would produce a deleterious effect with respect to the chloroplast and/or host cell (e.g., replication of the chloroplast genome). In this respect, the website containing the C. reinhardtii chloroplast genome, GenBank Acc. No.: AF396929, and Maul, J. E., et al., Plant Cell 14 (11), 2659-2679 (2002), all provide maps showing the coding and non-coding regions of the chloroplast genome, thus facilitating selection of a sequence useful for constructing a vector of the present disclosure. For example, the chloroplast vector p322 is a clone extending from the Eco (EcoRI) site at about position 143.1 kb to the Xho (XhoI) site at about position 148.5 kb of the C. reinhardtii chloroplast genome (http://www.chlamy.org/chloro/default.html).

[0208] A vector utilized in the practice of the disclosure also can contain one or more additional nucleotide sequences that confer desirable characteristics on the vector, including, for example, sequences such as cloning sites that facilitate manipulation of the vector, regulatory elements that direct replication of the vector or transcription of nucleotide sequences contained therein, or sequences that encode a selectable marker. As such, the vector can contain, for example, one or more cloning sites such as a multiple cloning site, which can, but need not, be positioned such that an exogenous or endogenous polynucleotide can be inserted into the vector and operatively linked to a desired element.

[0209] The vector can also contain a prokaryote origin of replication (ori), for example, an E. coli ori or a cosmid ori, thus allowing maintenance of the vector in a prokaryote host cell, as well as in a plant chloroplast, as desired. In some instances, the vectors of the present disclosure will contain elements such as an S. cerevisiae origin of replication. Such features, combined with appropriate selectable markers, allow the vector to be “shuttled” between the target host cell and a bacterial and/or yeast cell, for example. The ability to transfer a shuttle vector of the disclosure into a secondary host may allow for more convenient manipulation of the features of the vector. For example, a reaction mixture comprising a vector comprising a polynucleotide of interest can be transformed into a prokaryote host cell such as E. coli, amplified, collected using routine methods, and examined to identify vectors containing an insert, peptide, or construct of interest. If desired, the vector can be further manipulated, for example, by performing site-directed mutagenesis on the polynucleotide of interest, then again amplifying and selecting for vectors that have the mutated polynucleotide of interest. The shuttle vector can then be introduced into plant cell chloroplasts, for example, wherein the polypeptide of interest can be expressed and, if desired, isolated according to methods known to one of skill in the art.

[0210] A vector can also contain additional elements such as a regulatory element. A regulatory element, as the term is used herein, broadly refers to a nucleotide sequence that regulates the transcription or translation of a polynucleotide, or the localization of a polypeptide to which it is operatively linked. Examples include, but are not limited to, an RBS, a promoter, enhancer, transcription terminator, an initiation (start) codon, a splicing signal for intron excision and maintenance of a correct reading frame, a STOP codon, an amber or ochre codon, and an IRES. A regulatory element can be a cell compartmentalization signal, for example, a sequence that targets a polypeptide to the cytosol, nucleus, chloroplast membrane, or cell membrane. In some aspects of the present disclosure, a cell compartmentalization signal (e.g., a chloroplast targeting sequence) may be ligated to a gene and/or transcript, such that translation of the gene occurs in the chloroplast. In other aspects, a cell compartmentalization signal may be ligated to a gene such that, following translation of the gene, the protein is transported to the chloroplast. Such signals are well known in the art and have been widely reported (for example, as described in U.S. Pat. No. 5,776,689; Quinn et al., J. Biol. Chem. 1999; 274(20):14444-54; and von Heijne et al., Eur. J. Biochem. 1989; 180(3):535-45).

[0211] A vector, or a linearized portion thereof, may include a nucleotide sequence encoding a reporter polypeptide or other selectable marker. The term “reporter” or “selectable marker” refers to a polynucleotide (or encoded polypeptide) that confers a detectable phenotype. A reporter may encode a detectable polypeptide, for example, a green fluorescent protein or an enzyme such as luciferase, which, when contacted with an appropriate agent (a particular wavelength of light or luciferin, respectively), generates a signal that can be detected by the eye or by using appropriate instrumentation (for example, as described in Giacomin, Plant Sci. 116:59-72, 1996; Srikantha, J. Bacteriol. 178:121, 1996; Gerdes, FEBS Lett. 389:44-47, 1996; and Jefferson, EMBO J. 6:3901-3907,
1987, β-glucuronidase). A selectable marker can be, for example, a molecule that, when present or expressed in a cell, provides a selective advantage (or disadvantage) to the cell containing the marker, for example, the ability to grow in the presence of an agent that otherwise would kill the cell.

[0212] A selectable marker can provide a means to obtain prokaryotic cells, plant cells, or both, that express the marker and, therefore, can be useful as a component of a vector of the disclosure (for example, as described in Bock, R. (2001) Journal of Molecular Biology 312(3) 425-438). One class of selectable markers are native or modified genes which restore a biological or physiological function to a host cell (e.g., restore photosynthetic capability or restore a metabolic pathway). Other examples of selectable markers include, but are not limited to, those that confer antimetabolite resistance, for example, dihydrofolate reductase, which confers resistance to methotrexate (for example, as described in Reiss, Plant Physiol. (Life Sci. Adv.) 13:143-149, 1994); neomycin phosphotransferase, which confers resistance to the aminoglycosides neomycin, kanamycin, and paromomycin (for example, as described in Herrera-Estrella, EMBO J. 2:987-995, 1983); hygro, which confers resistance to hygromycin (for example, as described in Marsh, Gene 32:481-485, 1984); trpB, which allows cells to utilize indole in place of tryptophan; hisD, which allows cells to utilize histidinol in place of histidine (for example, as described in Hartman, Proc. Natl. Acad. Sci., USA 85:8047, 1988); mannose-6-phosphate isomerase, which allows cells to utilize mannose (for example, as described in WO94/20627); ornithine decarboxylase, which confers resistance to the ornithine decarboxylase inhibitor 2-(difluoromethyl)-DL-ornithine (DFMO; for example, as described in McConlogue, 1987. In: Current Communications in Molecular Biology, Cold Spring Harbor Laboratory ed.); and deaminase from Aspergillus terreus, which confers resistance to Blasticidin S (for example, as described in Tamura, Biosci. Biotechnol. Biochem. 59:2336-2338, 1995). Additional selectable markers include those that confer herbicide resistance, for example, a phosphinothricin acetyltransferase gene, which confers resistance to phosphinothricin (for example, as described in White et al., Nucl. Acids Res. 18:1062, 1990; and Spencer et al., Theor. Appl. Genet. 79:625-631, 1990), a mutant EPSPV-synthase, which confers glyphosate resistance (for example, as described in Hinchee et al., BioTechnology 91:915-922, 1998), a mutant acetolactate synthase, which confers imidazolinone or sulfonylurea resistance (for example, as described in Lee et al., EMBO J. 7:1241-1248, 1988), a mutant psbA, which confers resistance to atrazine (for example, as described in Smeda et al., Plant Physiol. 103:911-917, 1993), a mutant protoporphyrinogen oxidase (for example, as described in U.S. Pat. No. 5,767,373), or other markers conferring resistance to a herbicide such as glufosinate. Selectable markers include, for example, polynucleotides that confer dihydrofolate reductase (DHFR), neomycin, and tetracycline resistance for eukaryotic cells; ampicillin resistance for prokaryotes such as E. coli; and bleomycin, gentamycin, glyphosate, hygromycin, kanamycin, methotrexate, phleomycin, phosphinothricin, spectinomycin, streptomycin, sulfonamide, and sulfonylurea resistance in plants (for example, as described in Maliga et al., Methods in Plant Molecular Biology, Cold Spring Harbor Laboratory Press, 1995, page 39).

[0213] Reporter genes have been successfully used in chloroplasts of higher plants, and high levels of recombinant protein expression have been reported. In addition, reporter genes have been used in the chloroplast of C. reinhardtii. Reporter genes greatly enhance the ability to monitor gene expression in a number of biological organisms. For example, in the chloroplasts of higher plants, β-glucuronidase (uidA, for example, as described in Staub and Maliga, EMBO J. 12:601-606, 1993), neomycin phosphotransferase (nptII, for example, as described in Carrer et al., Mol. Gen. Genet. 241:49-56, 1993), adenosyl-3-adenyltransferase (aadA, for example, as described in Svab and Maliga, Proc. Natl. Acad. Sci., USA 90:913-917, 1993), and Aequorea victoria GFP (for example, as described in Sidorov et al., Plant J. 19:209-216, 1999) have been used as reporter genes (as described in Heifetz, Biochimie 82:655-666, 2000). Each of these genes has attributes that make it a useful reporter of chloroplast gene expression, such as ease of analysis, sensitivity, or the ability to examine expression in situ. Proteins, such as Bacillus thuringiensis Cry toxins, have been expressed in the chloroplasts of higher plants, conferring resistance to insect herbivores (for example, as described in Kota et al., Proc. Natl. Acad. Sci., USA 96:1840-1845, 1999). Human somatotropin (for example, as described in Staub et al., Nat. Biotechnol. 18:333-338, 2000), a potential biopharmaceutical, has also been expressed. In addition, several reporter genes have been expressed in the chloroplast of the eukaryotic green alga C. reinhardtii, including aadA (for example, as described in Goldschmidt-Clermont, Nucl. Acids Res. 19:4083-4089, 1991; and Zerges and Rochaix, Mol. Cell Biol. 14:5268-5277, 1994), uidA (for example, as described in Sakamoto et al., Proc. Natl. Acad. Sci., USA 90:477-501, 1993; and Ishikura et al., J. Biosci. Bioeng. 87:307-314, 1999), Renilla luciferase (for example, as described in Minko et al., Mol. Gen. Genet. 262:421-425, 1999), and the aminoglycoside phosphotransferase from Acinetobacter baumannii, aphA6 (for example, as described in Bateman and Purton, Mol. Gen. Genet. 263:404-410, 2000).

[0214] A gene encoding a protein of interest may be fused to a molecular marker or tag. In some instances, the tag may be an epitope tag or a tag polypeptide. For example, epitope tags can comprise a sufficient number of amino acid residues to provide an epitope against which an antibody can be made, yet are short enough that they do not interfere with the activity of the polypeptide to which they are fused. A tag may be unique so that an antibody raised to the tag does not substantially cross-react with other epitopes (e.g., a FLAG tag). Other appropriate tags that may be used, for example, are affinity tags. Affinity tags are appended to proteins so that they can be purified from their crude biological source using an affinity technique. Examples of such tags include, but are not limited to, chitin binding protein (CBP), maltose binding protein (MBP), glutathione-S-transferase (GST), a Strep-Tag II tag, and metal affinity tags (e.g., poly(His)). Positioning of tag(s) at the C- and/or N-terminus may be determined based on, for example, protein function. One of skill in the art will recognize that selection of an appropriate tag and its location in relationship to the protein of interest will be based on multiple factors, including, for example, the intended use of the protein and the target protein itself.

[0215] One approach to construction of a genetically manipulated organism (e.g., an algal strain) involves transformation with a nucleic acid which encodes a gene of interest, for example, a gene encoding fusicoccadiene synthase. In some embodiments, a transformation may introduce nucleic acids into any plastid of the host alga cell (e.g., chloroplast).

In other embodiments, a transforming vector may be extrachromosomal (e.g., does not integrate into a genome). The organism transformed can be an alga. In still other embodiments, bacteria or yeast are transformed. Transformed cells are typically plated on selective media following the introduction of exogenous nucleic acids. This method may also comprise several steps for screening. Initially, a screen of primary transformants is typically conducted to determine which clones have proper insertion of the exogenous nucleic acids. Clones which show the proper integration and/or vector capture may be propagated and re-screened to ensure genetic stability. Such methodology ensures that the transformants contain the genes of interest. In many instances, such screening is performed by polymerase chain reaction (PCR); however, any other appropriate technique known in the art may be utilized.

[0216] Many different methods of PCR are known in the art (e.g., nested PCR or real time PCR). For any given screen, one of skill in the art will recognize that PCR components may be varied to achieve optimal screening results. For example, magnesium concentration may need to be adjusted upwards when PCR is performed on disrupted alga cells to which EDTA (which chelates magnesium) is added to chelate toxic metals. In such instances, magnesium concentration may need to be adjusted upward, or downward (compared to the standard concentration in commercially available PCR kits), by about 0.1, about 0.2, about 0.3, about 0.4, about 0.5, about 0.6, about 0.7, about 0.8, about 0.9, about 1.0, about 1.1, about 1.2, about 1.3, about 1.4, about 1.5, about 1.6, about 1.7, about 1.8, about 1.9, or about 2.0 mM. Thus, after adjusting, the final magnesium concentration in a PCR reaction may be, for example, about 0.7, about 0.8, about 0.9, about 1.0, about 1.1, about 1.2, about 1.3, about 1.4, about 1.5, about 1.6, about 1.7, about 1.8, about 1.9, about 2.0, about 2.1, about 2.2, about 2.3, about 2.4, about 2.5, about 2.6, about 2.7, about 2.8, about 2.9, about 3.0, about 3.1, about 3.2, about 3.3, about 3.4, or about 3.5 mM or higher. Several examples provided below utilize PCR; however, one of skill in the art will recognize that other PCR techniques may be substituted for the particular protocols described. Following screening for clones with proper integration of exogenous nucleic acids, clones are typically screened for the presence of the encoded protein. Protein expression screening can be performed by Western blot analysis and/or enzyme activity assays.

[0217] A polynucleotide or recombinant nucleic acid molecule of the disclosure can be introduced into host cells, including bacteria, yeast, and algae, chloroplasts or nuclei using any method known in the art. A polynucleotide can be introduced into a cell by a variety of methods, which are well known in the art and selected, in part, based on the particular host cell. For example, when a bacterium is used as a host cell, the expression vector can be introduced into the host cell by any conventional method known to one of skill in the art, such as calcium chloride transformation or electroporation, as described, for example, in Molecular Cloning (J. Sambrook et al., Cold Spring Harbor, 1989). When yeast is used as a host cell, the expression vector can be introduced into the host cell using a lithium or spheroplast transformation technique, for example. In addition, a polynucleotide can be introduced into a plant cell using various techniques. Such techniques include, but are not limited to: a direct gene transfer technique such as electroporation; microprojectile mediated (biolistic) transformation using a particle gun; a "glass bead method"; pollen-mediated transformation; liposome-mediated transformation; transformation using wounded or enzyme-degraded immature embryos; or transformation using wounded or enzyme-degraded embryogenic callus (for example, as described in Potrykus, Ann. Rev. Plant Physiol. Plant Mol. Biol. 42:205-225, 1991).

[0218] The term "exogenous" is used herein in a comparative sense to indicate that a nucleotide sequence (or polypeptide) being referred to is from a source other than a reference source, is linked to a second nucleotide sequence (or polypeptide) with which it is not normally associated, or is modified such that it is in a form that is not normally associated with a reference material.

[0219] Plastid transformation is a method for introducing a polynucleotide into a plant cell chloroplast (for example, as described in U.S. Pat. Nos. 5,451,513, 5,545,817, and 5,545,818; WO95/16783; and McBride et al., Proc. Natl. Acad. Sci., USA 91:7301-7305, 1994). In some embodiments, chloroplast transformation involves introducing a desired nucleotide sequence flanked by regions of chloroplast DNA, allowing for homologous recombination of the nucleotide sequence into the target chloroplast genome.

[0220] One of skill in the art will recognize that transformation of host cells with a vector as described above includes transformation with a circular or a linearized vector, or with a linearized portion of a vector. In some instances, flanking nucleotide sequences of one to 1.5 kb of chloroplast genomic DNA may be used. Smaller flanking regions can also be used. One of skill in the art would be able to determine the size of the flanking region that should be used without undue experimentation. Using this method, point mutations in the chloroplast 16S rRNA and rps12 genes, which confer resistance to spectinomycin and streptomycin, can be utilized as selectable markers for transformation (for example, as described in Svab et al., Proc. Natl. Acad. Sci., USA 87:8526-8530, 1990), and can result in stable homoplasmic transformants at a frequency of approximately one per 100 bombardments of target leaves.

[0221] Microprojectile mediated transformation also can be used to introduce a polynucleotide into a plant cell chloroplast (for example, as described in Klein et al., Nature 327:70-73, 1987). This method utilizes microprojectiles such as gold or tungsten, which are coated with the desired polynucleotide by precipitation with calcium chloride, spermidine or polyethylene glycol. The microprojectile particles are accelerated at high speed into a plant tissue using a device such as the BIOLISTIC PD-1000 particle gun (BioRad; Hercules, Calif.). Methods for transformation using biolistic methods are well known in the art (see, e.g., Christou, Trends in Plant Science 1:423-431, 1996). Microprojectile mediated transformation has been used, for example, to generate a variety of transgenic plant species, including cotton, tobacco, corn, hybrid poplar and papaya. Important cereal crops such as wheat, oat, barley, sorghum and rice also have been transformed using microprojectile mediated delivery (for example, as described in Duan et al., Nature Biotech. 14:494-498, 1996; and Shimamoto, Curr. Opin. Biotech. 5:158-162, 1994). The transformation of most dicotyledonous plants is possible with the methods described above. Monocotyledonous plants also can be transformed using, for example, biolistic methods as described above, protoplast transformation, electroporation of partially permeabilized cells, introduction of DNA using glass fibers, and the glass bead agitation method.
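The magnesium adjustment in paragraph [0216] reduces to simple stoichiometry. The sketch below is illustrative only: it assumes EDTA binds Mg2+ 1:1 and essentially completely, and it assumes a kit standard of 1.5 mM MgCl2; that default, the function names, and the 0.5 mM EDTA carry-over in the demo are hypothetical values, not taken from this disclosure.

```python
"""Rough Mg2+ top-up for PCR on EDTA-treated alga lysate (illustrative sketch)."""

# Assumption (not from the disclosure): MgCl2 supplied by a typical PCR kit, in mM.
KIT_MG_MM = 1.5


def mg_supplement_mm(edta_mm: float,
                     target_free_mg_mm: float = KIT_MG_MM,
                     kit_mg_mm: float = KIT_MG_MM) -> float:
    """Extra Mg2+ (mM) needed so that free Mg2+ reaches the target.

    Assumes EDTA binds Mg2+ 1:1 and completely, so that
        free Mg2+ = kit Mg2+ + supplement - EDTA.
    A negative return value means the Mg2+ should be lowered instead.
    """
    if min(edta_mm, target_free_mg_mm, kit_mg_mm) < 0:
        raise ValueError("concentrations must be non-negative")
    return target_free_mg_mm + edta_mm - kit_mg_mm


def final_total_mg_mm(edta_mm: float,
                      target_free_mg_mm: float = KIT_MG_MM,
                      kit_mg_mm: float = KIT_MG_MM) -> float:
    """Total (free + EDTA-bound) Mg2+ in the reaction after adjustment."""
    return kit_mg_mm + mg_supplement_mm(edta_mm, target_free_mg_mm, kit_mg_mm)


if __name__ == "__main__":
    # Hypothetical carry-over of 0.5 mM EDTA from the lysis step.
    extra = mg_supplement_mm(edta_mm=0.5)
    total = final_total_mg_mm(edta_mm=0.5)
    print(f"add {extra:.1f} mM Mg2+; final total {total:.1f} mM")
    # -> add 0.5 mM Mg2+; final total 2.0 mM
```

The resulting 2.0 mM total sits within the roughly 0.7 to 3.5 mM final concentrations recited in [0216]. The sketch ignores other Mg2+ binders in the reaction, such as dNTPs, so in practice the value would still be tuned empirically.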

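Paragraphs [0219] and [0220] describe inserting a sequence flanked by chloroplast DNA so that homologous recombination places it into the plastid genome, with flanks of about one to 1.5 kb. The following is a toy, string-level model of the intended double-crossover outcome, not a description of the disclosure's constructs. It assumes the genome is a linear string (the circularity of the real plastome is ignored) and that each homology arm occurs once. The names `Cassette`, `build_cassette` and `recombine`, the placeholder payload, and the random stand-in genome are all hypothetical.

```python
"""Toy string model of a flanked insertion into a plastid genome (illustrative)."""

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    left_arm: str   # plastid DNA immediately upstream of the insertion site
    payload: str    # e.g. selectable marker + gene of interest (placeholder)
    right_arm: str  # plastid DNA immediately downstream of the insertion site

    @property
    def sequence(self) -> str:
        return self.left_arm + self.payload + self.right_arm


def build_cassette(genome: str, site: int, payload: str, arm_len: int = 1000) -> Cassette:
    """Cut homology arms of `arm_len` bases on either side of a 0-based `site`.

    The genome is treated as a linear string, so the site must lie at least
    `arm_len` bases from both ends (wrap-around of the circular plastome is ignored).
    """
    if not 0 < arm_len <= site <= len(genome) - arm_len:
        raise ValueError("site too close to a sequence end for the requested arm length")
    return Cassette(genome[site - arm_len:site], payload, genome[site:site + arm_len])


def recombine(genome: str, cassette: Cassette) -> str:
    """Model the double-crossover outcome: the payload ends up between the two arms."""
    left = genome.find(cassette.left_arm)
    if left < 0 or left != genome.rfind(cassette.left_arm):
        raise ValueError("left arm must occur exactly once in the target")
    junction = left + len(cassette.left_arm)
    if not genome.startswith(cassette.right_arm, junction):
        raise ValueError("right arm is not adjacent to the left arm in the target")
    return genome[:junction] + cassette.payload + genome[junction:]


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-in sequence, not real plastid DNA.
    plastome = "".join(rng.choice("ACGT") for _ in range(5000))
    cassette = build_cassette(plastome, site=2500, payload="ATGGCTAAATAA")
    edited = recombine(plastome, cassette)
    print(len(plastome), len(cassette.sequence), len(edited))
    # -> 5000 2012 5012
```

The 1000-base default arm length is chosen to match the lower end of the one to 1.5 kb range in [0220]. Shorter arms can be passed explicitly, which matches the statement there that smaller flanking regions can be used.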
[0222] Transformation frequency may be increased by replacement of recessive rRNA or r-protein antibiotic resistance genes with a dominant selectable marker, including, but not limited to, the bacterial aadA gene (for example, as described in Svab and Maliga, Proc. Natl. Acad. Sci., USA 90:913-917, 1993). Approximately 15 to 20 cell division cycles following transformation may be required to reach a homoplastidic state. It is apparent to one of skill in the art that a chloroplast may contain multiple copies of its genome, and therefore, the term "homoplasmic" or "homoplasmy" refers to the state where all copies of a particular locus of interest are substantially identical. Plastid expression, in which genes are inserted by homologous recombination into all of the several thousand copies of the circular plastid genome present in each plant cell, takes advantage of the enormous copy number advantage over nuclear-expressed genes to permit expression levels that can readily exceed 10% of the total soluble plant protein.

[0223] A method of the disclosure can be performed by introducing a recombinant nucleic acid molecule into a chloroplast or into the nucleus of a cell, wherein the recombinant nucleic acid molecule includes a first polynucleotide, which encodes at least one polypeptide (i.e., 1, 2, 3, 4, or more). In some embodiments, a polypeptide is operatively linked to a second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth and/or subsequent polypeptide. For example, several enzymes in a hydrocarbon production pathway may be linked, either directly or indirectly, such that products produced by one enzyme in the pathway, once produced, are in close proximity to the next enzyme in the pathway.

[0224] For transformation of chloroplasts, one aspect of the present disclosure is the utilization of a recombinant nucleic acid construct which contains both a selectable marker and one or more genes of interest. In one instance, transformation of chloroplasts is performed by co-transformation of chloroplasts with two constructs: one containing a selectable marker and a second containing the gene(s) of interest. The time required to grow some transformed organisms may be lengthy. The transformants are then screened both for the presence of the selectable marker and for the presence of the gene(s) of interest. Typically, secondary screening for the gene(s) of interest is performed by Southern blot.

[0225] In chloroplasts, regulation of gene expression generally occurs after transcription, and often during translation initiation. This regulation is dependent upon the chloroplast translational apparatus, as well as nuclear-encoded regulatory factors (for example, as described in Barkan and Goldschmidt-Clermont, Biochimie 82:559-572, 2000; and Zerges, Biochimie 82:583-601, 2000). The chloroplast translational apparatus generally resembles that of bacteria: chloroplasts contain 70S ribosomes; have mRNAs that lack 5' caps and generally do not contain 3' poly-adenylated tails (for example, as described in Harris et al., Microbiol. Rev. 58:700-754, 1994); and translation is inhibited in chloroplasts and in bacteria by selective agents such as chloramphenicol.

[0226] Some methods of the present disclosure take advantage of proper positioning of a ribosome binding sequence (RBS) with respect to a coding sequence, for example, a polynucleotide of interest. It has previously been noted that such placement of an RBS results in robust translation in plants (for example, as described in U.S. Application 2004/0014174, incorporated herein by reference). An advantage of expressing polypeptides in chloroplasts is that the polypeptides do not proceed through cellular compartments typically traversed by polypeptides expressed from a nuclear gene and, therefore, are not subject to certain post-translational modifications such as glycosylation. As such, the polypeptides and protein complexes produced by some methods of the disclosure can be expected to be produced without such post-translational modification.

[0227] The terms "polynucleotide", "nucleic acid", "nucleotide sequence", or "nucleic acid molecule", or similar terms known to one of skill in the art, are used broadly herein to mean a sequence of two or more deoxyribonucleotides or ribonucleotides that are linked together by a phosphodiester bond. As such, these terms are used interchangeably throughout the specification. These terms include, but are not limited to, RNA and DNA, a gene or a portion thereof, a cDNA, or a synthetic polydeoxyribonucleic acid sequence, and can be single stranded or double stranded, as well as a DNA/RNA hybrid. Furthermore, these terms as used herein include naturally occurring nucleic acid molecules, which can be isolated from a cell, as well as synthetic polynucleotides, which can be prepared, for example, by methods of chemical synthesis or by enzymatic methods such as the polymerase chain reaction (PCR).

[0228] The nucleotides comprising a polynucleotide can be naturally occurring deoxyribonucleotides, such as adenine, cytosine, guanine or thymine linked to 2'-deoxyribose, or ribonucleotides such as adenine, cytosine, guanine or uracil linked to ribose. Depending on the use, however, a polynucleotide also can contain nucleotide analogs, including non-naturally occurring synthetic nucleotides or modified naturally occurring nucleotides. Nucleotide analogs are well known in the art and are commercially available, as are polynucleotides containing such nucleotide analogs (for example, as described in Lin et al., Nucl. Acids Res. 22:5220-5234, 1994; Jellinek et al., Biochemistry 34:11363-11372, 1995; and Pagratis et al., Nature Biotechnol. 15:68-73, 1997). A phosphodiester bond can link the nucleotides of a polynucleotide of the present disclosure; however, other bonds, including, for example, a thiodiester bond, a phosphorothioate bond, a peptide-like bond, and any other bond known in the art, may be utilized to produce synthetic polynucleotides (for example, as described in Tam et al., Nucl. Acids Res. 22:977-986, 1994; and Ecker and Crooke, BioTechnology 13:351-360, 1995).

[0229] Any of the products described herein can be prepared by transforming an organism to cause the production and/or secretion by such organism of the product. An organism is considered to be a photosynthetic organism even if a transformation event destroys or diminishes the photosynthetic capability of the transformed organism (e.g., exogenous nucleic acid is inserted into a gene encoding a protein required for photosynthesis).

[0230] Any of the expression vectors described herein may be adapted for expression of a desired nucleic acid in a chloroplast or nucleus of a host organism. A number of chloroplast promoters from higher plants have been identified, for example, as described in Kung and Lin, Nucleic Acids Res. 13:7543-7549 (1985). A chloroplast can be transformed by an expression vector comprising a nucleic acid sequence that encodes a protein. In one embodiment, the protein may be targeted to the chloroplast by a chloroplast targeting sequence. For example, targeting an expression vector or the gene product(s) encoded by an expression vector to the chloroplast may further enhance the effects provided by the regulatory control sequences described herein, and may effect the expression of a protein or peptide that allows for or improves the accumulation of a fuel molecule.

[0231] The concept of chloroplast targeting described herein may be combined with other features of the present disclosure. For example, a nucleotide sequence encoding a terpene synthase (e.g., fusicoccadiene synthase) may be operably linked to a nucleotide sequence encoding a chloroplast targeting sequence, and the "linked" sequence then cloned into an expression vector. A host cell is then transformed with the expression vector and may produce more of the synthase as compared to a host cell transformed with an expression vector encoding the terpene synthase but not a chloroplast targeting sequence. The increased terpene synthase expression may also result in more of the terpene (e.g., fusicoccadiene) being produced.

[0232] In yet another example, an expression vector comprising a nucleotide sequence encoding an enzyme that produces a product (e.g., a fuel product, fragrance product, or insecticide product) not naturally produced by the organism, by using precursors that are naturally produced by the organism as substrates, is targeted to the chloroplast. By targeting the enzyme to the chloroplast, production of the product may be increased in comparison to a host cell wherein the enzyme is expressed but not targeted to the chloroplast. Without being bound by theory, this may be due to increased precursors being produced in the chloroplast and, thus, more products may be produced by the enzyme encoded by the introduced nucleotide sequence.

[0233] Modification of Enzymes

[0234] Various methods may be used to generate a variant polypeptide, for example, a variant terpene synthase. In some embodiments, variant polypeptide enzymes are generated by look-through mutagenesis, walk-through mutagenesis, gene shuffling, directed evolution, or sexual PCR. These methods allow for the generation of variant polypeptides containing random sequence(s), variant polypeptides made using predetermined modifications of particular residues, variant polypeptides that utilize evolutionary traits from different genes, and variant polypeptides that combine characteristics/functions of different parent genes.

[0235] The method of walk-through mutagenesis comprises introducing a predetermined amino acid into each and every position in a predefined region (or several different regions) of the amino acid sequence of a parent polypeptide. Walk-through mutagenesis is described in greater detail in U.S. Pat. No. 5,798,208, which is hereby incorporated by reference in its entirety.

[0236] Look-through mutagenesis comprises introducing a predetermined amino acid into a selected set of positions, or a position, within a defined region (or several different regions) of the amino acid sequence of a parent polypeptide. Look-through mutagenesis is described in greater detail in US Patent Publication No. 2008/0214406, which is hereby incorporated by reference in its entirety.

[0237] Gene shuffling is a method for recursive in vitro or in vivo homologous recombination of pools of nucleic acid fragments or polynucleotides. Mixtures of related nucleic acid sequences or polynucleotides are randomly fragmented and reassembled to yield a library or mixed population of recombinant nucleic acid molecules or polynucleotides. The equivalents of some standard genetic matings may also be performed by "gene shuffling" in vitro. For example, a "molecular backcross" can be performed by repeated mixing of the mutant's nucleic acid with the wild-type nucleic acid while selecting for the mutations of interest. In one example of in vivo shuffling, the mixed population of the specific nucleic acid sequence is introduced into bacterial or eukaryotic cells under conditions such that at least two different nucleic acid sequences are present in each host cell.

[0238] Variant polypeptides of the disclosure having altered properties can also be produced using "Sexual PCR." In such an approach, amplified or cloned polynucleotides possessing a desired characteristic (for example, encoding a polypeptide with a region of higher specificity to a substrate) are selected (via screening of a library of polynucleotides, for example) and pooled.

[0239] Variant polypeptides of the disclosure having altered properties can also be produced using "Sequence Saturation Mutagenesis." In such an approach, every nucleotide in a selected range of nucleotides is randomized using an early termination/extension protocol, described in Wong et al. (2004) Nucleic Acids Research, 32(3):e26.

[0240] Other techniques known to one skilled in the art can be used to generate variant polypeptides that can be used in the disclosed embodiments.

[0241] Host Organism

[0242] Examples of organisms that can be transformed using the compositions and methods herein include prokaryotic or eukaryotic organisms. In some instances, the organism is photosynthetic and can be vascular or non-vascular. Organisms useful herein can be unicellular or multicellular.

[0243] A host organism is an organism comprising a host cell. In some embodiments, the host organism is photosynthetic. A photosynthetic organism is one that naturally photosynthesizes (has a plastid) or that is genetically engineered or otherwise modified to be photosynthetic. In some instances, a photosynthetic organism may be transformed with a construct of the disclosure which renders all or part of the photosynthetic apparatus inoperable. In some instances, a host organism is non-vascular and photosynthetic. In some embodiments, the host organism is prokaryotic. Examples of some prokaryotic organisms of the present disclosure include, but are not limited to, cyanobacteria (e.g., Synechococcus, Synechocystis, Arthrospira, Gloeocapsa, Oscillatoria, and Pseudanabaena) and E. coli. The host organism can be unicellular or multicellular. In some embodiments, the host organism is eukaryotic, for example, algae (e.g., microalgae, macroalgae, green algae, red algae, or brown algae) or fungi (e.g., yeast such as S. cerevisiae, Sz. pombe, and Candida spp.). In one embodiment, the green alga is Chlorophycean. In some embodiments, the host cell is a microalga. Examples of organisms contemplated herein include, but are not limited to, rhodophyta, chlorophyta, heterokontophyta, tribophyta, glaucophyta, chlorarachniophytes, euglenoids, haptophyta, cryptomonads, dinoflagellata, and phytoplankton.

[0244] As used herein, the term "non-vascular photosynthetic organism" refers to any macroscopic or microscopic organism, including, but not limited to, algae, protists (such as euglena), cyanobacteria and other photosynthetic bacteria, which does not have a vascular system such as that found in higher plants. Examples of non-vascular photosynthetic organisms include bryophytes, such as marchantiophytes or anthocerotophytes. In some instances, the organism is a cyanobacterium or an alga (e.g., a macroalga or microalga). The algae can be unicellular or multicellular algae. The algae can be a species of Chlamydomonas, Scenedesmus, Chlorella, or Nannochloropsis, for example. Examples of microalgae include, but are not limited to, Chlamydomonas reinhardtii, D. salina, H. pluvialis, S. dimorphus, Chlorella vulgaris, N. salina, N. oculata, D. viridis, and D. tertiolecta. For example, the microalga Chlamydomonas reinhardtii may be transformed with a vector, or a linearized portion thereof, encoding a fusicoccadiene synthase. In another embodiment, the alga is C. reinhardtii 137c.

[0245] In other instances, the organism can be a photosynthetic bacterium. A photosynthetic bacterium can be, for example, a member of the genus Synechocystis, Synechococcus, or Arthrospira.

[0246] Also described herein are methods for utilizing non-photosynthetic bacteria as hosts to produce, for example, terpenoids. In some instances, the terpenoid is, for example, fusicoccadiene. Non-photosynthetic bacteria can be useful for producing terpenoids as non-metabolized products. In addition, various E. coli strains, such as BL21, or Bacillus spp. can be used in the present disclosure.

[0247] Genetic modification of yeast host cells can be accomplished by complementation, transformation, homologous recombination, or other methods known to one of skill in the art. Genetic modification of bacterial cells can be accomplished, for example, by transient or stable transformation, or by modification of the bacterial genome. Techniques for transforming bacteria are well known to one of skill in the art.

[0248] As described above, methods and compositions of the present disclosure can also be performed using prokaryotic or eukaryotic organisms, for example, microorganisms. In addition to photosynthetic bacteria, non-photosynthetic bacteria including, but not limited to, Escherichia coli and Bacillus spp., can be utilized as host organisms for the embodiments disclosed herein. Additionally, fungi, in particular yeasts including, but not limited to, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida spp., can be utilized as host organisms for the embodiments disclosed herein.

[0249] The methods and compositions of the disclosure can be practiced using any plant having chloroplasts, including, for example, microalgae and macroalgae. Examples of such plants are marine algae and seaweed, as well as plants that grow in soil.

[0250] Methods and compositions of the disclosure can generate a plant (e.g., an alga) containing chloroplasts or a nucleus that is genetically modified to contain a stably integrated polynucleotide (for example, as described in Hager and Bock, Appl. Microbiol. Biotechnol. 54:302-310, 2000). Accordingly, the present disclosure further provides a transgenic (transplastomic) plant, which comprises one or more chloroplasts and/or a nucleus comprising a polynucleotide encoding one or more endogenous or exogenous polypeptides (such as a terpene/terpenoid synthase), including a polypeptide or polypeptides that can specifically associate to form a functional protein complex, for example, a fusicoccadiene synthase.

[0251] In one embodiment, the photosynthetic organism is a plant. The term "plant" is used broadly herein to refer to a eukaryotic organism containing plastids, particularly chloroplasts, and includes any such organism at any stage of development, or to part of a plant, including a plant cutting, a plant cell, a plant cell culture, a plant organ, a plant seed, and a plantlet. A plant cell is the structural and physiological unit of the plant, comprising a protoplast and a cell wall. A plant cell can be in the form of an isolated single cell or a cultured cell, or can be part of a higher organized unit, for example, a plant tissue, plant organ, or plant. Thus, a plant cell can be a protoplast, a gamete producing cell, or a cell or collection of cells that can regenerate into a whole plant. As such, a seed, which comprises multiple plant cells and is capable of regenerating into a whole plant, is considered a plant cell for purposes of this disclosure. A plant tissue or plant organ can be a seed, protoplast, callus, or any other group of plant cells that is organized into a structural or functional unit. Exemplary useful parts of a plant include harvestable parts and parts useful for propagation of progeny plants. A harvestable part of a plant can be any useful part of a plant, for example, flowers, pollen, seedlings, tubers, leaves, stems, fruit, seeds, roots, and the like. A part of a plant useful for propagation includes, for example, seeds, fruits, cuttings, seedlings, tubers, rootstocks, and the like.

[0252] In other embodiments, the photosynthetic organism is a vascular plant. Non-limiting examples of such plants include various monocots and dicots, including high oil seed plants such as high oil seed Brassica (e.g., Brassica nigra, Brassica napus, Brassica hirta, Brassica rapa, Brassica campestris, Brassica carinata, and Brassica juncea), soybean (Glycine max), castor bean (Ricinus communis), cotton, safflower (Carthamus tinctorius), sunflower (Helianthus annuus), flax (Linum usitatissimum), corn (Zea mays), coconut (Cocos nucifera), palm (Elaeis guineensis), oil nut trees such as olive (Olea europaea), sesame, and peanut (Arachis hypogaea), as well as Arabidopsis, tobacco, wheat, barley, oats, amaranth, potato, rice, tomato, and legumes (e.g., peas, beans, lentils, alfalfa, etc.).

[0253] One of skill in the art will recognize that the organisms listed herein are merely representative of the possible host organisms that can be used in any of the disclosed embodiments, and are non-limiting examples.

[0254] Some of the host organisms which may be used to practice the present disclosure are halophilic (e.g., Dunaliella salina, D. viridis, or D. tertiolecta). For example, D. salina can grow in ocean water, salt lakes (salinity from about 30 to about 300 parts per thousand), and high salinity media (e.g., artificial seawater medium, seawater nutrient agar, brackish water medium, or seawater medium). In some embodiments of the disclosure, a host cell comprising a vector of the present disclosure can be grown in a liquid environment which is about 0.1, about 0.2, about 0.3, about 0.4, about 0.5, about 0.6, about 0.7, about 0.8, about 0.9, about 1.0, about 1.1, about 1.2, about 1.3, about 1.4, about 1.5, about 1.6, about 1.7, about 1.8, about 1.9, about 2.0, about 2.1, about 2.2, about 2.3, about 2.4, about 2.5, about 2.6, about 2.7, about 2.8, about 2.9, about 3.0, about 3.1, about 3.2, about 3.3, about 3.4, about 3.5, about 3.6, about 3.7, about 3.8, about 3.9, about 4.0, about 4.1, about 4.2, or about 4.3 molar, or higher concentrations, of sodium chloride. One of skill in the art will recognize that other salts (sodium salts, calcium salts, sulfate salts, or potassium salts, for example) may also be present in the liquid environment.

[0255] Where a halophilic organism is utilized for the present disclosure, it may be transformed with any of the vectors described herein. For example, D. salina may be transformed with a vector which is capable of insertion into the chloroplast genome and which contains nucleic acids which encode a terpene producing enzyme (e.g., fusicoccadiene synthase). Transformed halophilic organisms may then be grown in high-saline environments (salt lakes, salt ponds, or high-saline media, for example) to produce the product(s) of interest. Isolation of the product(s) may involve removing a transformed organism from a high-saline environment prior to extracting the product(s) from the organism. In instances where the product is secreted into the surrounding environment, it may be necessary to desalinate the liquid environment prior to any further processing of the product.

[0256] Host cells can be grown under conditions which result in the production of a desired product, such as a terpene or terpenoid (e.g., fusicoccadiene). One of skill in the art will recognize that different growth conditions will be required, depending on the host cell. For example, where an alga (e.g., C. reinhardtii) is the host organism, growth in a liquid environment containing sufficient nitrogen, phosphorus and other essential elements may be required. In another example, where a non-photosynthetic bacterium such as E. coli is the host cell, growth on solid or liquid media may be appropriate to induce production of the desired product. In some instances, the growth environment is an aqueous environment.

[0257] A host organism may be grown under conditions which permit photosynthesis; however, this is not a requirement (e.g., a host organism may be grown in the absence of light). In some instances, the host organism may be genetically modified in such a way that its photosynthetic capability is diminished and/or destroyed.

growth conditions where a

regions generating CO2 while making fuels by growing one or more of the modified organisms described herein near the ethanol production plant.

[0260] In some embodiments, the pH of the media in which the host organism is grown may be controlled. The pH may be controlled using the addition of various acids. The acids used to control pH may include CO2, nitric acid, phosphoric acid, or other acids. The pH of the media may be controlled to remain within the range of about pH 7.5 to about 8, about 8 to about 8.5, about 8.5 to about 9, about 9 to about 9.5, about 9.5 to about 10, about 10 to about 10.5, about 10.5 to about 11, or about 11 to about 11.5.

[0261] As discussed above, the organisms may be grown in outdoor open water, such as ponds, the ocean, the sea, rivers, waterbeds, marsh water, shallow pools, lakes, or reservoirs, for example. When grown in water, the organisms can be contained in a halo-like object comprising lego-like particles. The halo object encircles the algae and allows it to retain nutrients from the water beneath, while keeping it in open sunlight.

[0262] In some instances, organisms can be grown in containers wherein each container comprises 1 or 2 or a plurality of organisms. The containers can be configured to float on water.
For example, a container can be filled by a combination host organism is not capable of photosynthesis (e.g., because of air and water to make the container and the host organism of the absence of light and/or genetic modification), typically, (s) in it buoyant, A host organism that is adapted to grow in the organism will be provided the necessary nutrients to Sup fresh water can thus be grown in Salt water (i.e., the ocean) port growth in the absence of photosynthesis. For example, a and vice versa. This mechanism allows for the automatic culture medium in (or on) which an organism is grown, may death of the organism if there is any damage to the container. be supplemented with any required nutrient, including an 0263. In some instances a plurality of containers can be organic carbon source, nitrogen source, phosphorous source, contained withina halo-like structure as described above. For Vitamins, metals, lipids, nucleic acids, micronutrients, and/or example, up to 100, up to 1,000, up to 10,000, up to 100,000, any organism-specific requirement. Organic carbon Sources up to 1,000,000, or more containers can be arranged in a include any source of carbon which the host organism is able meter-square of a halo-like structure. to metabolize including, hut not limited to, acetate, simple 0264. In some embodiments, the product (e.g. fuel prod carbohydrates (e.g., glucose, Sucrose, or lactose), complex uct) is collected by harvesting the organism. The product may carbohydrates (e.g., starch or glycogen), proteins, and lipids, then be extracted from the organism. In some instances, the One of skill in the art will recognize that not all organisms will product may be produced without killing the organisms. Pro be able to sufficiently metabolize a particular nutrient and that ducing and/or expressing the product may not render the nutrient mixtures may need to be modified from one organism organism unviable. 
In other instances, the product may be to another in order to provide the appropriate nutrient mix. secreted into a growing environment. 0258. A host organism transformed to produce a protein 0265. The product-containing biomass can be harvested described herein, for example, a synthase, can be grown on from its growth environment (e.g. lake, pond, photobioreac land, e.g., ponds, aqueducts, landfills, or in closed or partially tor, or partially closed bioreactor system, for example) using closed bioreactor Systems, Organisms, such as algae, can be any suitable method. Non-limiting examples of harvesting grown directly in water, for example, in oceans, seas, lakes, techniques are centrifugation or flocculation. Once harvested, rivers, or reservoirs. In embodiments where algae are mass the product-containing biomass can be subjected to a drying cultured, the algae can be grown in high density photobiore process. Alternately, an extraction step may be performed on actors. Methods of mass-culturing algae are known in the art, wet biomass. The product-containing biomass can be dried For example, algae can be grown in high density ph.otobiore using any Suitable method. Non-limiting examples of drying actors (see, for example, Lee et al. Biotech. Bioengineering methods include Sunlight, rotary dryers, flash dryers, vacuum 44:1161-1167, 1994) and other bioreactors (such as those for dryers, ovens, freeze dryers, hot air dryers, microwave dryers sewage and waste water treatments) (for example, as and Superheated Steam dryers. After the drying process the described in Sawayama et al. Appl. Micro. Biotech., 41:729 product-containing biomass can be referred to as a dry or 731, 1994). Additionally, algae may be mass-cultured to semi-dry biomass. remove heavy metals (for example, as described in Wilkin 0266. In some embodiments, the production of the product son, Biotech. 
Letters, 11:861-864, 1989), hydrogen (for (e.g., fuel product, fragrance product, or insecticide product) example, as described in U.S. Patent Application Publication is inducible. The product may be induced to be expressed No. 20030162273), and pharmaceutical compounds, and/or produced, for example, by exposure to light. In yet 0259. In some cases, host organism(s) are grown near other embodiments, the production of the product is auto ethanol production plants or other facilities or regions (e.g., regulatable. The product may form a feedback loop, wherein cities or highways, for example) generating CO. As such, the when the product (e.g. fuel product, fragrance product, or methods discussed herein include business methods for sell insecticide product) reaches a certain level, expression or ing carbon credits to ethanol plants or other facilities or secretion of the product may be inhibited. In other embodi US 2012/0058535 A1 Mar. 8, 2012 29 ments, the level of a metabolite of the organism may inhibit Results are shown in FIG. 11 (Lanes: M-motecular weight expression or secretion of the product. For example, endog marker; 1 =Resin; 2=Elution 5; 3=Elution 4; 4=Elution 3; enous ATP produced by the organism as a result of increased 5=Elution 2: 6=Elution 1; 7=Flow through; 8–PeHet: energy production to express or produce the product, may 9–Clarified; 10–Crude Lysate). A fraction of the crude cell form. a feedback loop to inhibit expression of the product. in lysate was extracted with heptane and analyzed by Gas Chro yet another embodiment, production of the product may be matography using a Mass Selective Detector (GC/MSD), inducible, for example, by an exogenous agent. For example, The results showed accumulation of fusicoccadiene cells. 
an expression vector for effecting production of a product in This was identified by an essential oils mass spectrum library the host organism may comprise an inducible regulatory con match and by comparison with the GC/MSD spectrum pre trol sequence that is activated or inactivated by an exogenous sented in Toyomasu T. et al., (2007), PNAS 104(9):3084 agent. 3O88. 0267. The following examples are intended to provide 0271 The purified protein was also assayed for activity. illustrations of the application of the present disclosure. The The enzyme was incubated in an assay mixture containing following examples are not intended to completely define or IPP and 1-'C-DMAPP (DMAPP with one carbonuniformly otherwise limit the scope of the disclosure. labeled with 'C). The products of the reaction were extracted with heptane and analyzed by GC/MSD. During the interval EXAMPLES between the first experiment, this, and following experi Example 1 ments, the GC column was changed, resulting in a small change in retention time as the column length was increased. Synthesis of Codon Biased Genes Encoding Fusicoc The result is shown in FIG. 6A, demonstrating the mass cadiene Synthase spectrum of the product (both the m/Z 272 molecular ion and 0268 A nucleic acid (SEQID NO: 1) encoding Phomop the m/Z 229 fragment) was shifted by +1 amu (peak eluted at sis amygdali fusicoccadiene synthase (SEQ ID NO: 2)(gene 12.50 mM). product B.AF45924, 1, termed “PaFS) was synthesized by Example 3 DNA 2.0 in two different codon biases; one codon optimized by DNA 2.0 according to their usual algorithm using the C. Biosynthesis of fusicocca-2,10(14)-diene Ecoli in reinhardtii chloroplast optimization (“regular bias; IS87: vivo SEQID NO: 4), the other utilized the most frequent C. rein hardtii codon at each amino acid position except where a (0272. 
The codon biased PaFS (SEQID NO:8) with a Strep change was necessary to eliminate undesired restriction sites tag II described in Example 1 was cloned into a bacterial (“hot” codon bias: IS88: SEQID NO: 7). In both cases, DNA expression vector behind the T7 promoter as described in encoding the amino acid sequence of SEQ ID NO: 3 was Example 2. The bacterial gene construct was transformed into fused directly to the C-terminus to add an Age restriction BL21 (DE3) plysS cells (Novagen), grown, and induced with enzyme site to the gene, and to add the Strep-Tagll sequence IPTG at 17°C. for 36 hours. After induction, the cells were for affinity purification and detection. The resulting amino collected by centrifugation, lysed, and extracted with chloro acid sequence is shown in SEQID NO: 6. form. The chloroform extract was dried in a rotary evaporator, and the residue was dissolved in heptane. The sample was Example 2 analyzed by GC/MSD (FIG. 6B) and found to contain fusic occadiene (peak eluted at 12.08 minutes). Production of Fusicoccadiene in vitro by Recombi nant Fusicoccadiene Synthase Example 4 0269. The codon biased PaFS with a Strep tag II described Algal Expression of fusicoccadiene Synthase in Example 1 above, was introduced into E. coli BL-21 cells, In this instance, the nucleic acid sequence encoding fusicoc (0273. The “hot” codon biased PaFS with a Strep tag II cadiene synthase with a Strep tag II (SEQ ID NO: 8) was (encoded by the nucleic acid sequence of SEQ ID NO: 8) ligated into the plasmid pST7, a customized vector using T7 described in Example I was cloned into two algal expression promoter and terminator and containing NdeI and Xbal sites vectors: 1) Chlamydomonas expression vector pSE-3HB for addition of the synthetic fusicoccadiene gene. The result Kart-tD2; a vector containing a Kanamycin resistance gene ing plasmid was transformed into E. coli BL-21 (DE3) plysS driven by the Chlamydmonomas atpA promoter, fusicocca cells (Novagen). 
All DNA manipulations carried out in the diene synthase driven by the t2 promoter (i.e., a truncated construction of thistransforming DNA were essentially as Chlamydomonas D2 promoter), and flanked by homologous described by Sambrook et al., Molecular Cloning: A Labora regions to drive integration into the Chlamydomonas chloro tory Manual (Cold Spring Harbor Laboratory Press 1989) and plast genome3HB si{e; 2 Chlamydomonas expression vec Cohen et al., Meth. Enzymol. 297, 192-208, 1998. torpSE-D1-Kan; a vector containing a Kanamycin resistance (0270 Expression of IS-88 (“hot” codon optimized fusic gene driven by the Chlamydomonas atpA promoter, fusicoc occadiene synthase; encoded by the nucleic acid sequence of cadiene synthase driven by the D1 promoter, and flanked by SEQ ID NO: 8) in a bacterial host under control of the T7 homologous regions to drive integration into the Chlamy promoter was induced with IPTG. The bacteria were lysed by domonas chloroplast genome D1 site resulting in replace microfluidization, clarified by centrifugation, and the Super ment of the native D1 gene. natant was applied to Streptactin resin (Qiagen, Inc.) used 0274 The algal expression vector pSE-3HB-Kan-tD2 according to manufacturers instructions. The resin was containing SEQID NO:8 was introduced into the chloroplast washed and then the bound protein was eluted with desthio of the algal host strains (strain backgrounds 1690 and 137c, biotin, as instructed. The samples were run on an SDS-PAGE both mating type positive) using biolistic gold followed by gel, stained with coomassie brilliant blue, and imaged. growth on TAP plates with kanamycin selection (50 ug/ml). US 2012/0058535 A1 Mar. 8, 2012 30
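Example 1 defines the "hot" codon bias as using the most frequent C. reinhardtii codon at each amino acid position except where a change is needed to eliminate an undesired restriction site. The document does not say how DNA 2.0 implemented this, so the block below is only a minimal sketch of one way to apply the rule: back-translate left to right, take the top-ranked codon, and fall back to the next synonymous codon (backtracking if necessary) whenever a forbidden site would appear. The ranking in `CODON_RANK` is hypothetical and covers only a few amino acids; it is not the real C. reinhardtii chloroplast codon-usage table. The forbidden list (NdeI, XbaI, BamHI and EcoRI recognition sequences) and all names are likewise illustrative assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical codon ranking (most preferred first) for a few amino acids.
# The codons are genuine synonyms, but the ORDER is illustrative only and is
# NOT the actual C. reinhardtii chloroplast codon-usage table.
CODON_RANK: Dict[str, List[str]] = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTC", "TTT"],
    "H": ["CAT", "CAC"],
    "D": ["GAT", "GAC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "*": ["TAA", "TAG", "TGA"],
}

# Assumed "undesired" sites: NdeI, XbaI, BamHI, EcoRI recognition sequences.
FORBIDDEN_SITES = ["CATATG", "TCTAGA", "GGATCC", "GAATTC"]


def _ends_in_site(dna: str, added: int) -> bool:
    """True if a forbidden site ends within the last `added` bases of `dna`."""
    for site in FORBIDDEN_SITES:
        start = max(0, len(dna) - added - len(site) + 1)
        if site in dna[start:]:
            return True
    return False


def hot_codon_encode(protein: str) -> Optional[str]:
    """Back-translate `protein`, preferring the top-ranked codon at every
    position and using lower-ranked synonyms only when needed to avoid a
    forbidden site (iterative depth-first search with backtracking).
    Returns None if no site-free encoding exists with this table."""
    next_choice = [0] * len(protein)  # next codon index to try at each residue
    codons: List[str] = []
    i = 0
    while 0 <= i < len(protein):
        options = CODON_RANK[protein[i]]
        placed = False
        while next_choice[i] < len(options):
            codon = options[next_choice[i]]
            next_choice[i] += 1
            if not _ends_in_site("".join(codons) + codon, 3):
                codons.append(codon)
                placed = True
                break
        if placed:
            i += 1
        else:
            next_choice[i] = 0  # reset this residue, then revisit the previous one
            i -= 1
            if i >= 0:
                codons.pop()
    return "".join(codons) if i == len(protein) else None


if __name__ == "__main__":
    print(hot_codon_encode("MHMEF*"))
```

With this toy ranking, `hot_codon_encode("MHMEF*")` returns `ATGCACATGGAATTTTAA`: His falls back from CAT to CAC because CAT followed by the only Met codon would spell the NdeI site CATATG, and Phe falls back from TTC to TTT because GAA followed by TTC would spell the EcoRI site GAATTC. The worst-case search is exhaustive, which is fine for a sketch but says nothing about how production tools work.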

Colonies were screened for homoplasmicity and the presence stained with the general dye p-anisaidehyde. The spot near of the fusicoccadiene synthase gene by PCR. Cultures (2 ml) the top of the plate shows the purified fusicoccadiene, of gene positive, homoplasmic algae were collected by cen trifugation, resuspended in 250 ul of methanol. 500 ul of Example 6 saturated NaCl in water and 500 ul of petroleum ether were Production of Fusicaccadiene Synechocystis sp. added to the resuspended cultures. The solution was vortexed Strain PCC6803 for three minutes, then centrifuged at 14,000xg for five min utes at room temperature to separate the organic and aqueous 0278. The nucleic acid encoding the “hot” codon bias of layers. The organic layer (1000 was transferred to a vial insert PaFS (IS-88: SEQID NO: 8) was cloned into the cyanobac in a standard 2 ml sample vial and analyzed using GC/MSD, terium Synechocystis, downstream of the truncated IAA pro on the same column as in Example 2. The mass spectrum at moter from PCC 6803, with the 3'-UTR of the gene encoding 12.49 minutes for one sample (IS-88, PaFS with the “hot” the S-layer protein from L. brevis as the terminator sequence. codon bias under the D2 promoter, in the 1690 algal back The truncated IlrtA has previously been demonstrated to con ground) was obtained. The diagnostic ions at m/Z=272,229. stitutively drive protein expression PCC 6803. The regions of homology utilized for integration into the chromosome were 135, 122, 107.95, and 79 are present in this spectrum, dem from the Ikb regions Surrounding the psby gene, a disposable onstrating the presence of fusicocca-2.10 (14)-diene (FIG. subunit of the Synechocystis photosystem. The vector con 6C), tains a kanamycin marker for antibiotic selection at a concen tration of 5 g/ml. Example 5 0279. This DNA was introduced by natural transformation into Synechocystis sp strain PCC 6803 as follows. 
Liquid Codon Optimiza on of PaFS in Algal Host Cells with cultures of cells in log phase were concentrated to 10 million Different Genetic Background. celis/mt and washed once with an excess volume of 10 mM NaCl. After removal of the salt solution, the cells were resus 0275. Two codon optimizations of PaFS for algal expres pended in an equal Volume of nitrate-containing medium and sion were tested. As described above, “regular codon bias treated with plasmid DNA at a concentration of 1 ug/mL. The was applied to a nucleic acid encoding PaFS by DNA 2.0 cells and DNA were incubated at room temperature with software to generate sequence IS-87 (SEQ ID NO: 5). shaking and 5% CO2 overnight while shaded from light. The Sequence IS-88 (SEQID NO:8) was generated by replacing following day, the cell suspension was plated onto a nitrate all codons of PaFS with the codons most frequently used in containing agar plate in the presence of 5 ug/mL, kanamycin. the C. reinhanitii chloroplast genome except where such a The plates were exposed to low light levels in the presence of CO for 3 days, and then shifted to highlight conditions for 48 replacement would introduce an undesirable feature Such as a hrs to facilitate clearing. Upon appearance of colonies, clones restriction enzyme site. were isolated, patched to another 5 ugmL kanamycin plate, 0276 Three algal samples were extracted as described in and incubated at room temperature with 5% CO for an addi Example 4 (replacing the petroleum ether with heptane) and tional 5 days. Patches that grew colonies were subjected to analyzed by GC/MSD. FIG. 7A shows the mass spectrum for colony PCR screening with primers specific to the “hot” an algal extract from cells containino PalFS with regular codon bias of the fusicoccadiene synthase gene (termed codon bias in the C. reinhardtii 137c genetic background at PAFS103). Six gene-positive clones were identified (FIG.9). 12.49 minutes post-injection. FIG. 7B shows the mass spec 0280. 
In order to confirm the presence offusicoccadiene in trum of an algal extract from wild type C. reinhardiii 1690 the gene-positive clones, three of the six clones (clones 1, 3 cells that lack the PaFS gene according to PeR screening and 4) were inoculated into liquid medium and grown for 48 (gene negative). Finally FIG.7C shows the mass spectrum for hours in the presence of light and 5% CO. 3 milliliters of an algal extract from cells containing the PaFS “hot” codon liquid culture of the clones were harvested, pelleted by cen bias gene in C, reinhardtii 1690 from Example 4. The ions for trifugation, and resuspended in brine solution, PCC6803 cells fusicoccadiene are clearly present in FIG. 7A and FIG.7C at expressing a Xylanase gene integrated at the same locus m/Z 229, 135, 123, and 95, and are absent in FIG. 7B. Of the (psby), were utilized as a negative control. Whole cell lysates differently optimized PaFS versions, the “Hot' codon opti were then prepared by Sonication, and the resulting lysates mized clone (SEQID NO:8) produced a much stronger fusi extracted with 500ul of heptane for 2 hours at room tempera coccadiene signal than the "Regular codon optimized clone ture. After phase separation by centriffigation, the organic layer was analyzed by GC. IMSD. Results are shown in FIG. (SEQ ID NO. 5). 10A and FIG. 10B. 0277. Thin layer chromatography was performed to com (0281 FIG. 10A shows the mtz-435 extracted ion chro pare differently optimized PaFS versions (FIG. 8). In FIG. 8, matogram data for three clones (0036-88-1, 0036-88-3, and lane one is fusicoccadiene produced in Viva by E. coli as 0036-88-4 respectively) and a negative control (0036-BD described in Example 3. Lanes 2, 3, and 4 show the heptane 11). 
The three fusicoccadiene synthase-containing clones all extracts of Chkonydomonas cell cultures expressing genes have a significant peak at 12.48 minutes, while the BD-11 IS-87 (regular codon bias fusicoccadiene synthase; encoded clone does not have a peak. FIG. 10B is the mass spectrometry by the nucleic acid sequence of SEQID NO. 5), IS-88 (“hot” data for clone number one (0036-88-1) confirming the pres codon bias fusicoccadiene synthase; encoded by the nucleic ence of the fusicoccadiene ions as described in example 4. acid sequence of SEQID NO: 8), or IS-89 (the nucleic acid 0282. The m/z 272 extracted ion chromatogram and mass sequence encoding the prenyltransferase domain of fusicoc spectrum of clone I is shown in FIG. 13A and 13B respec cadiene synthase) (SEQID NO: 40), 2 ul samples were spot tively. The extracted ion chromatogram contains a peak at ted onto a silica gel TLC plate, developed with heptane, and 12.5 minutes that gives the characteristic mass spectrum for US 2012/0058535 A1 Mar. 8, 2012

fusicoccadiene containing ions 135, 229 and 272. The sequence of SEQ .1D NO:53. The hot codon optimized m/Z 272 extracted ion chromatogram of the negative control nucleic acid encoding protein ACLA 076850 including the containing a Xylanase gene instead of PaFS contains no peak Strep-tag sequence (SEQ NO:57) encodes the protein at 12.5 minutes (FIG. 13C). sequence of SEQ ID NO:58, The synthesized genes were cloned into several expression vectors: 1) bacterial expres Example 7 sion vector behind the T7 promoter as described in Example 2; 2) Chlamydomonas expression vector behind the t)2 pro Expression of the C-Terminal Domain of fusicocca moter as described in Example 4; 3) Chlamydomonas expres diene Synthase sion vector behind the D1 promoter as described in Example 0283. The C-terminal prenyltransferase domain (SEQ ID 4; and 4) Cyanobacterial expression vector behind the tirtA NO: 40) was cloned into vectorpST7 and transformed into E. promoter as described in Example 6. The host cells are cul coli strain BL-21 as described in Example 2. Cells were tured in conditions appropriate for bacteria (as described in grown in LB/Kan to an ODoo -0.6 and induced by the Example 2), algae (as described in Example 4), or cyanobac addition of IPTG at 16° C. for 24 h. Cells were harvested by teria (as described in Example 6). Cell extracts were prepared centrifugation and the enzyme was purified using streptactin and tested for terpenoid production by the GC/MSD resin Qiagen, Inc. as instructed by the manufacturer. The described in Example 2. purified enzyme was analyzed by SDS-PAGE to confirm the molecular mass, The purified enzyme was assayed for activity Example 9 by incubating with IPP and DMAPP, or with IPP and FPP em. Substrates. 
After an overnight incubation at 30 the assay mix Expression of Ent-Kaurene in Algal Host Cells ture was treated with alkaline phosphatase to convert the 0286 A gene from Phaeosphaeria nodorum was identified &phosphate esters into their corresponding alcohols. This from Genbank (SEQ ID NO: 9) as encoding ertt-Kaurene mixture was then extracted using heptane, and the heptane Synthase (SEQ ID NO: 10). A “hot” codon optimized extract was analyzed by GC/MSD for the production of gera sequence was synthesized by DNA 2.0 (SEQ ID NO: 13) nylgeraniol (GGOH). In addition to the experimental encoding the ent-kaurene synthase with an N-terminal FLAG samples, a sample of pure GGPP (Sigma-Aldrich) was treated tag (SEQ ID NO:14), SEQ ID NO: 13 was cloned into the with phosphatase and extracted as a positive control. A mass algal expression vector pSE-3HB-Kan-tD2 and transformed spectrum library match confirmed the production of GGOH. into C. reinhardtii as described in Example 4. from both HP and DMAPP as well as IPP and HP Results are 0287 Transformants were grown to mid-log phase and shown in FIG. 12, collected by centrifugation and resuspended in brine. Cells 0284 FIG. 12 shows the total ion chromatograms of three were lysed by bead beating with zirconium beads. Whole cell reaction mixture extracts as analyzed by GC/MSD. One lysates were extracted with 1 mL of heptane by vigorous sample was of the standard compound, another sample was of Vortexing. The resulting emulsion was clarified by centrifu the untransformed E. coli cells, and the third sample is of E. gation and the heptane was transferred to a glass vial contain coli expressing the GGPP synthase as described above. In this ing a small amount of silica gel. The sample was Vortexed and chromatogram, geraniol elutes at time=14.3 minutes. The the silica gel allowed to settle. The heptane layer was than standard compound GGOH produced a peak with abun analyzed by GC/MSD. FIG. 14A is the m/z 272 extractedion dance=40000. 
The sample from warms-formed E. coli pro chroinatogram of the organic extract from Chlamyclomonas duced a peak with abundance=7000, and the sample from the cells expressing ent-kaurene showing a strong peak at 8.36 GGPP synthase containing E. coli produced a peak with minutes. The mass spectrum (FIG. 14B) of the peak at 8.36 abundance=25000, clearly demonstrating an increase in minutes shows the characteristic ions of ent-kaurene includ GGPP production in the transformed bacteria. ing 229, 257, and 272. Chlarnydamonas cells lacking the gene for ent-kaurene were extracted following the same procedure Example 8 for use as a negative control. The total ion chromatogram of the organic extract of these samples does not contain a peak at Cloning and Transformation of PaFS Homologs 8.36 minutes (FIG. 14C). The mass spectrum of the strong 0285) A GenBank database search for nucleic acids with peak at 8.28 minutes does not contain the ions for ent-kaurene sequence similarity to PaFS was performed. The nucleotide namely, 229, 257 and 272 (FIG. 14D). sequence (SEQID NO. 44), encoding the protein EAS27885 0288 Ent-kaurene synthase was also cloned and (SEQID NO: 45) from Coccidioides immitis; the nucleotide expressed in Scenedesmus cells, The codon optimized ent sequence (SEQID NO: 49) encoding the protein EAA68264 Kaurene synthase (SEQ ID NO: 13) was cloned into the (SEQ ID NO: 50) from Gibberella zeae; and the nucleotide Scenedesmus chloroplast expression vector p04-138, which sequence (SEQ ID NO: 54), encoding the protein ACLA uses the Scenedesmus psbD promoter to drive expression and 076850 from Aspergillus clavatusi (SEQ ID NO: 55) were recombines into the chioroplast genome in an intergenic found as candidate genes with the potential to contain PaFS region near the psbA site. The vector also contains the like activity. These genes were synthesized by DNA 2.0 uti chloramphenicol acetyltransferase resistance gene driven by lizing the most frequent C. 
reinhardtii codon at each amino the Scenedesmus tufA promoter. Transformants were pro acid position except where a change is necessary to eliminate duced as described in Example 4, except selection was on 25 undesired restriction sites “hot” codon bias). The hot codon ug/ml chloramphenicol instead of kanamycin. optimized nucleic acid encoding protein EAS27885 includ 0289 Cells expressing ent-kaurene synthase were lysed ing the Strep-tag sequence (SEQ NDN() 47) encodes the and extracted following the same procedure used for the protein sequence of SEQID NO:48, The hot codon optimized Chlamydanionas samples described in Example 4. The nucleic acid encoding protein EAA68264 including the organic extracts of the Scenedesmus samples were analyzed Strep-tag sequence (SEQ ID NO:52) encodes the protein by GC/MSD. FIG. 15A shows the totalion chromatogram for US 2012/0058535 A1 Mar. 8, 2012 32 an extract of a Scenedesmus sample that was gene positive for and 272. No gene for casbene synthase is present in C rein ent-kaurene synthase. The mass spectrum of this peak shown hardtii and the wild-type organism does not produce or accu in FIG. 15B contains the molecular ion of 272 as well as the mulate casbene. characteristic 229 and 257 ions, Scenedestnus cells which do not contain the ent-kaurene synthase gene were used as a Example 12 negative control. The total ion chromatogram of the organic Production of Fusicoccadiene in Yeast extracts from this sample shows no peak at 7.9 minutes (FIG. 15C). 0294 The “hot” codon biased PaFS with a Strep tag II (SEQID NO: 8) described in Example 1 is cloned into a yeast expression vector pPIC3.5 under the control of the AOX1 Example 10 promoter, which can be induced by addition of alcohol to the yeast in culture. Expression of Casbene Synthase in Algal Host Cells 0295) To clone the IS-88 gene into the yeast expression vector, the DNA in SEQID NO: 8 is amplified by PCR using 0290. 
A gene from Ricinus communis was identified from Genbank (SEQ ID NO: 15) as encoding casbene synthase (SEQ ID NO: 16). A "hot" codon optimized sequence was synthesized by DNA 2.0 (SEQ ID NO: 18) encoding the ent-kaurene synthase with a C-terminal Strep tag (SEQ ID NO: 20). SEQ ID NO: 18 was cloned into the algal expression vector pSE-3FB-Kan-tD2 and transformed into C. reinhardtii as described in Example 4.

[0291] Transformants are grown to mid-log phase. Cells are collected by centrifugation, resuspended in brine, and lysed by bead beating with zirconium beads. Whole-cell lysates are extracted with 1 mL of heptane by vigorous vortexing. The resulting emulsion is clarified by centrifugation, and the heptane supernatant is transferred to a glass vial containing a small amount of silica gel. The sample is vortexed and the silica gel is allowed to settle. The heptane layer is then analyzed by GC/MSD.

Example 11

Synthesis and Expression of Codon-Biased Gene Encoding a Fusion of Casbene Synthase and Geranylgeranyl Diphosphate Synthase

[0292] To increase the in vivo accumulation of casbene in algae, a gene encoding a fusion of the Ricinus communis casbene synthase and the geranylgeranyl diphosphate synthase domain of Phomopsis amygdali fusicoccadiene synthase was designed. The design used the most frequent C. reinhardtii codon at each amino acid position, except where a change was necessary to eliminate undesired restriction sites ("hot" codon bias). The gene was synthesized by DNA 2.0 (SEQ ID NO: 24) and encodes the amino acid sequence SEQ ID NO: 25. In this fusion protein, amino acid residues 1-546 are from the casbene synthase gene, and amino acid residues 547-932 are from the geranylgeranyl diphosphate synthase gene. SEQ ID NO: 24 was cloned into the pSE-3HB-k-tD2 expression vector and transformed into C. reinhardtii as described in Example 4.

[0293] Transformants were grown to produce a 1 L liquid culture. This culture was steam distilled using hexane as the solvent according to the method of H. Maarse and R. Kepner (1970), J. Agric. Food Chem. 18(6):1095-1101. After 10 hours at reflux, the hexane fraction was concentrated by rotary evaporation and analyzed by GC/MSD on a FAMEWAX column. FIG. 17A shows the m/z 272 extracted ion chromatogram of the hexane concentrate, with a peak at 6.93 minutes. FIG. 17B shows the mass spectrum of this peak. The characteristic ions for casbene are present, including: 229, 257

Primer 1-GGATCCAATAATGGAATTTAAATATTCACAAG (SEQ ID NO: 42) and Primer 2-GAATTCTTATTTCTCAAATTGAGGGTG (SEQ ID NO: 43). These primers add a BamHI restriction site and a Kozak translation initiation site to the 5' end of the IS-88 gene, and an EcoRI restriction site to the 3' end of the IS-88 gene. After amplification, both the PCR product and vector pPIC3.5 (Invitrogen, Carlsbad, Calif.) are digested with BamHI and EcoRI. The vector digest is treated with calf intestinal phosphatase, and the digested vector and PCR product are run out on an agarose gel. The gel is stained with ethidium bromide, and the bands corresponding to the digested vector and insert are purified from the gel. The vector and insert are mixed, ligated, and transformed into E. coli. After transformation, the bacteria are plated onto LB solid agar plates containing ampicillin. Resistant colonies are expanded and DNA is prepared from the bacteria. The vector is then digested again with EcoRI and BamHI to confirm the correct insertion of the IS-88 gene.

[0296] Once the correct expression vector is isolated, it is introduced into Pichia pastoris according to the directions provided with the "Pichia Expression Kit" (Invitrogen, Carlsbad, Calif.). Cultures (2 ml) of Pichia yeast expressing IS-88 are grown and induced using methanol as directed. The cells are collected by centrifugation and resuspended in 250 µl of methanol. Saturated NaCl in water (500 µl), 500 µl of petroleum ether, and 250 µl of 1 mm zirconium beads (Bio-spec Products) are added. The solution is vortexed for three minutes and centrifuged at 14,000 g for five minutes at room temperature to separate the organic and aqueous layers. The organic layer (100 µl) is transferred to a vial insert in a standard 2 ml sample vial and analyzed using GC/MSD, as described in Example 2.

Example 13

Higher Plant Expression of Fusicoccadiene Synthase

[0297] The "hot" codon biased PaFS with a Strep tag II (SEQ ID NO: 8) described in Example 1 is cloned into the Gateway cloning vector pENTR/D-TOPO (Invitrogen, Carlsbad, Calif.). It is then transferred to the plant expression vector pEarleyGate104 (FIG. 16).

[0298] To clone the IS-88 gene into the Gateway cloning vector, the DNA in SEQ ID NO: 8 is amplified by PCR using Primer 1 (CACCATGGAATTTAAATATTCAGAAG, SEQ ID NO: 59) and Primer 2 (TTATTTCTCAAATTGAGGTG, SEQ ID NO: 60). The primers add a directional topoisomerase cloning sequence to the 5' end of the IS-88 gene. After amplification, the PCR product is mixed with the pENTR/D-TOPO vector and transformed into E. coli. After transformation, the bacteria are plated onto LB solid agar plates containing 50 µg/ml kanamycin. Resistant colonies are

grown and DNA is isolated from the cells. The cloning vector containing the IS-88 gene and Gateway recombination sequences is digested with Mita. It is then mixed with pEarleyGate104 DNA and clonase, according to the Invitrogen directions. The reaction mixture is transformed into E. coli and plated onto LB solid agar plates containing 50 µg/ml kanamycin. Resistant colonies are isolated and the plasmid DNA is isolated.

[0299] The expression vector pEarleyGate104-IS-88 is introduced into Agrobacterium tumefaciens according to the directions provided with the "Agrobacterium transformation kit" (MP Biomedicals Life Sciences, Solon, Ohio). Kanamycin-resistant Agrobacterium cells are isolated on Agrobacterium medium agar (MP Biomedicals Life Sciences, Solon, Ohio) containing kanamycin.

[0300] To produce transgenic higher plants, A. tumefaciens bacteria containing the pEarleyGate104-IS88 plasmid are grown in Agrobacterium medium. They are then used to transform Arabidopsis thaliana seedlings according to the method of Clough and Bent (1998, Plant Journal 16:735-743). Transgenic plants are identified by resistance to treatment with the herbicide glufosinate.

[0301] Transgenic whole Arabidopsis plants are grown to maturity and ground in a mortar and pestle using 1 ml of methanol per plant. The ground suspension is transferred to a 2 ml centrifuge tube. Saturated NaCl in water (500 µl), 500 µl of petroleum ether, and 250 µl of 1 mm zirconium beads (Bio-spec Products) are added to the suspension. The solution is vortexed for three minutes and centrifuged at 14,000 g for five minutes at room temperature to separate the organic and aqueous layers. The organic layer (100 µl) is transferred to a vial insert in a standard 2 ml sample vial and analyzed using GC/MSD as in Example 2.

Example 14

Use of a Diterpene Synthase as a Readout of Isoprenoid Pathway Metabolic Flux

[0302] Algal cells expressing the "hot" codon optimized fusicoccadiene synthase (SEQ ID NO: 8) are cultured under a number of different conditions expected to modulate the flux through the isoprenoid pathway. These conditions include, among others:
- reduction of nitrogen levels in the growth media;
- reduction of sulfur levels in the growth media;
- reduction or increase in light levels during growth;
- modulation of temperature during growth.

Cells are collected by centrifugation and extracted with organic solvent as described in Example 2. The organic extracts are analyzed by GC/MSD to quantify the relative amount of fusicoccadiene present in the algae. The values are normalized to either the number of cells per volume or the ash-free dry weight per volume of the test cultures. The relative amount of fusicoccadiene present reflects the flux through the isoprenoid pathway under the different culture conditions.

[0303] In the same manner, genetically induced changes in flux through the isoprenoid pathway can be determined by quantifying fusicoccadiene levels. Algae expressing fusicoccadiene synthase are modified genetically by a number of means. These include mutagenesis, breeding, introduction of other transgenes, or gene silencing using recombinant nucleic acids (for example, siRNA or miRNA). The quantity of fusicoccadiene present is measured as above, and the relative amount again reflects the flux through the isoprenoid pathway.

[0304] Technical and scientific terms used herein have the meanings commonly understood by one of ordinary skill in the art to which the instant disclosure pertains, unless otherwise defined. Reference is made herein to various materials and methodologies known to those of skill in the art. Standard reference works setting forth the general principles of recombinant DNA technology include, for example:
- Sambrook et al., "Molecular Cloning: A Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y., 1989;
- Kaufman et al., eds., "Handbook of Molecular and Cellular Methods in Biology and Medicine", CRC Press, Boca Raton, 1995;
- McPherson, ed., "Directed Mutagenesis: A Practical Approach", IRL Press, Oxford, 1991.

Standard reference literature teaching general methodologies and principles of yeast genetics useful for selected aspects of the disclosure includes:
- Sherman et al., "Laboratory Course Manual Methods in Yeast Genetics", Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1986;
- Guthrie et al., "Guide to Yeast Genetics and Molecular Biology", Academic, New York, 1991.

[0305] While certain embodiments have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that methods and structures within the scope of these claims and their equivalents be covered thereby.
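Examples 11 and 13 describe "hot" codon bias. The rule is to use the most frequent C. reinhardtii codon at each position, except where a change is needed to remove an undesired restriction site. The Python sketch below shows one way to apply that rule greedily. It is illustrative only, and several parts are assumptions rather than details from this disclosure:
- The codon rankings in `CODON_PREFERENCE` are hypothetical placeholders, not measured C. reinhardtii codon usage.
- The forbidden sites (BamHI GGATCC, EcoRI GAATTC) were chosen because they match the cloning sites added by the Example 12 primers.
- The names `hot_codon_optimize` and `translate` are made up for this sketch.

```python
"""Sketch of "hot" codon selection: use the top-ranked codon unless it creates a forbidden site."""

# HYPOTHETICAL codon rankings, most to least preferred. Placeholder values only,
# NOT actual C. reinhardtii codon-usage data.
CODON_PREFERENCE = {
    "M": ["ATG"],
    "E": ["GAA", "GAG"],
    "F": ["TTC", "TTT"],
    "K": ["AAG", "AAA"],
    "Y": ["TAC", "TAT"],
    "S": ["AGC", "TCC", "TCG"],
    "G": ["GGC", "GGT"],
    "D": ["GAC", "GAT"],
    "*": ["TAA", "TGA", "TAG"],
}

# Recognition sequences kept out of the coding sequence (assumed: BamHI, EcoRI).
FORBIDDEN_SITES = ("GGATCC", "GAATTC")

# Reverse lookup, used to confirm that the designed DNA still encodes the protein.
CODON_TO_AA = {codon: aa for aa, codons in CODON_PREFERENCE.items() for codon in codons}


def hot_codon_optimize(protein, table=CODON_PREFERENCE, forbidden=FORBIDDEN_SITES):
    """Greedily choose, per residue, the highest-ranked codon that introduces no forbidden site."""
    dna = ""
    for aa in protein:
        for codon in table[aa]:
            candidate = dna + codon
            # The sequence built so far is site-free, so a new site could only
            # appear near the junction; scanning the whole candidate is simpler.
            if not any(site in candidate for site in forbidden):
                dna = candidate
                break
        else:
            # No backtracking in this sketch: give up if every ranked codon fails.
            raise ValueError(f"no ranked codon for {aa!r} avoids the forbidden sites")
    return dna


def translate(dna, codon_to_aa=CODON_TO_AA):
    """Translate DNA using only the codons listed in the table above."""
    return "".join(codon_to_aa[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    designed = hot_codon_optimize("MEFKY*")
    print(designed)             # ATGGAATTTAAGTACTAA
    print(translate(designed))  # MEFKY*
```

Here the top-ranked Phe codon (TTC) is rejected because GAA followed by TTC would create an EcoRI site (GAATTC), so the next codon, TTT, is used instead. A production design tool would probably backtrack rather than stop at the first residue that cannot be placed.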

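Example 14 normalizes the GC/MSD fusicoccadiene signal to either cell number per volume or ash-free dry weight (AFDW) per volume. It then compares conditions. The sketch below shows that arithmetic, under these assumptions:
- The readout is an integrated fusicoccadiene peak area that scales linearly with the amount present.
- Test and control cultures are extracted identically.
- The `Culture` fields, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Culture:
    """Hypothetical per-culture measurements for an Example 14-style comparison."""
    name: str
    peak_area: float        # integrated fusicoccadiene GC/MSD peak area (arbitrary units)
    cells_per_ml: float     # cell density of the sampled culture
    afdw_mg_per_ml: float   # ash-free dry weight per volume


def normalized_signal(culture: Culture, basis: str = "cells") -> float:
    """Peak area per cell ("cells") or per mg of ash-free dry weight ("afdw")."""
    if basis == "cells":
        return culture.peak_area / culture.cells_per_ml
    if basis == "afdw":
        return culture.peak_area / culture.afdw_mg_per_ml
    raise ValueError(f"unknown basis: {basis!r}")


def relative_flux(test: Culture, control: Culture, basis: str = "cells") -> float:
    """Fold change in normalized fusicoccadiene, used as a proxy for isoprenoid pathway flux."""
    return normalized_signal(test, basis) / normalized_signal(control, basis)


if __name__ == "__main__":
    control = Culture("replete", peak_area=2.0e6, cells_per_ml=1.0e6, afdw_mg_per_ml=0.5)
    low_n = Culture("low nitrogen", peak_area=3.0e6, cells_per_ml=5.0e5, afdw_mg_per_ml=0.4)
    print(relative_flux(low_n, control, "cells"))  # 3.0
    print(relative_flux(low_n, control, "afdw"))
```

With these made-up numbers the two bases disagree: 3.0-fold per cell versus about 1.9-fold per mg AFDW. The reason is that biomass per cell differs between the two hypothetical cultures. Reporting both normalizations makes that kind of discrepancy visible.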
SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 60

<210> SEQ ID NO 1
<211> LENGTH: 2160
<212> TYPE: DNA
<213> ORGANISM: Phomopsis amygdali

<400> SEQUENCE: 1

atggagttca aatactcgga agtcgttgaa ccctcaactt attacactga ggggctttgc 60

gaaggtatcg atgtgcgcaa gagcaagttc accactcttg aggatcgagg tgccattcgt 120


Met Glu Phe Lys Tyr Ser Glu Val Val Glu Pro Ser Thr Tyr Tyr Thr
1 5 10 15
Glu Gly Leu Cys Glu Gly Ile Asp Val Arg Lys Ser Lys Phe Thr Thr
20 25 30
Leu Glu Asp Arg Gly Ala Ile Arg Ala His Glu Asp Trp Asn Lys His
35 40 45
Ile Gly Pro Cys Gly Glu Tyr Arg Gly Thr Leu Gly Pro Arg Phe Ser
50 55 60
Phe Ile Ser Val Ala Val Pro Glu Cys Ile Pro Glu Arg Leu Glu Val
65 70 75 80
Ile Ser Tyr Ala Asn Glu Phe Ala Phe Leu His Asp Asp Val Thr Asp
85 90 95
His Val Gly His Asp Thr Gly Glu Val Glu Asn Asp Glu Met Met Thr
100 105 110
Val Phe Leu Glu Ala Ala His Thr Gly Ala Ile Asp Thr Ser Asn Lys
115 120 125
Val Asp Ile Arg Arg Ala Gly Lys Lys Arg Ile Gln Ser Gln Leu Phe
130 135 140
Leu Glu Met Leu Ala Ile Asp Pro Glu Cys Ala Lys Thr Thr Met Lys
145 150 155 160
Ser Trp Ala Arg Phe Val Glu Val Gly Ser Ser Arg Gln His Glu Thr
165 170 175
Arg Phe Val Glu Leu Ala Lys Tyr Ile Pro Tyr Arg Ile Met Asp Val
180 185 190
Gly Glu Met Phe Trp Phe Gly Leu Val Thr Phe Gly Leu Gly Leu His
195 200 205
Ile Pro Asp His Glu Leu Glu Leu Cys Arg Glu Leu Met Ala Asn Ala
210 215 220
Trp Ile Ala Val Gly Leu Gln Asn Asp Ile Trp Ser Trp Pro Lys Glu
225 230 235 240
Arg Asp Ala Ala Thr Leu His Gly Lys Asp His Val Val Asn Ala Ile
245 250 255
Trp Val Leu Met Gln Glu His Gln Thr Asp Val Asp Gly Ala Met Gln
260 265 270
Ile Cys Arg Lys Leu Ile Val Glu Tyr Val Ala Lys Tyr Leu Glu Val
275 280 285
Ile Glu Ala Thr Lys Asn Asp Glu Ser Ile Ser Leu Asp Leu Arg Lys
290 295 300
Tyr Leu Asp Ala Met Leu Tyr Ser Ile Ser Gly Asn Val Val Trp Ser
305 310 315 320
Leu Glu Cys Pro Arg Tyr Asn Pro Asp Val Ser Phe Asn Lys Thr Gln
325 330 335
Leu Glu Trp Met Arg Gln Gly Leu Pro Ser Leu Glu Ser Cys Pro Val
340 345 350
Leu Ala Arg Ser Pro Glu Ile Asp Ser Asp Glu Ser Ala Val Ser Pro
355 360 365
Thr Ala Asp Glu Ser Asp Ser Thr Glu Asp Ser Leu Gly Ser Gly Ser
370 375 380
Arg Gln Asp Ser Ser Leu Ser Thr Gly Leu Ser Leu Ser Pro Val His
385 390 395 400
Ser Asn Glu Gly Lys Asp Leu Gln Arg Val Asp Thr Asp His Ile Phe
405 410 415
Phe Glu Lys Ala Val Leu Glu Ala Pro Tyr Asp Tyr Ile Ala Ser Met
420 425 430
Pro Ser Lys Gly Val Arg Asp Gln Phe Ile Asp Ala Leu Asn Asp Trp
435 440 445
Leu Arg Val Pro Asp Val Lys Val Gly Lys Ile Lys Asp Ala Val Arg
450 455 460
Val Leu His Asn Ser Ser Leu Leu Leu Asp Asp Phe Gln Asp Asn Ser
465 470 475 480
Pro Leu Arg Arg Gly Lys Pro Ser Thr His Asn Ile Phe Gly Ser Ala
485 490 495
Gln Thr Val Asn Thr Ala Thr Tyr Ser Ile Ile Lys Ala Ile Gly Gln
500 505 510
Ile Met Glu Phe Ser Ala Gly Glu Ser Val Gln Glu Val Met Asn Ser
515 520 525
Ile Met Ile Leu Phe Gln Gly Gln Ala Met Asp Leu Phe Trp Thr Tyr
530 535 540
Asn Gly His Val Pro Ser Glu Glu Glu Tyr Tyr Arg Met Ile Asp Gln
545 550 555 560
Lys Thr Gly Gln Leu Phe Ser Ile Ala Thr Ser Leu Leu Leu Asn Ala
565 570 575
Ala Asp Asn Glu Ile Pro Arg Thr Lys Ile Gln Ser Cys Leu His Arg
580 585 590
Leu Thr Arg Leu Leu Gly Arg Cys Phe Gln Ile Arg Asp Asp Tyr Gln
595 600 605
Asn Leu Val Ser Ala Asp Tyr Thr Lys Gln Lys Gly Phe Cys Glu Asp
610 615 620
Leu Asp Glu Gly Lys Trp Ser Leu Ala Leu Ile His Met Ile His Lys
625 630 635 640
Gln Arg Ser His Met Ala Leu Leu Asn Val Leu Ser Thr Gly Arg Lys
645 650 655
His Gly Gly Met Thr Leu Glu Gln Lys Gln Phe Val Leu Asp Ile Ile
660 665 670
Glu Glu Glu Lys Ser Leu Asp Tyr Thr Arg Ser Val Met Met Asp Leu
675 680 685
His Val Gln Leu Arg Ala Glu Ile Gly Arg Ile Glu Ile Leu Leu Asp
690 695 700
Ser Pro Asn Pro Ala Met Arg Leu Leu Leu Glu Leu Leu Arg Val
705 710 715

<210> SEQ ID NO 3
<211> LENGTH: 12
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Strep tag II
<400> SEQUENCE: 3
Thr Gly Ser Ala Trp Ser His Pro Gln Phe Glu Lys
1 5 10

<210> SEQ ID NO 4
<211> LENGTH: 2157
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Codon optimized sequence

<4 OOs, SEQUENCE: 4 atggaattta aatatt caga agttgtagaa ccatcaactt actatacaga aggattatgt 6 O gaagg tattg atgitacgitaa atcaaaattt act actittag aagat.cgtgg togctatt cqt 12 O gcacacgaag actggaacaa acacattggit C catgtggtgaat atcgtgg cacattaggit 18O ccacgttitta gttittattitc agttgcagta cct gaatgca titccagaaag attagaagtt 24 O atat cittatgctaatgagtt cqcttittctt cacgatgatg taact gacca cqttggit cac 3OO gacacaggag aggttgaaaa catgaaatg atgactgtat ttittagaagc tigCacataca 360 ggtgctattg acacttctaa taaagtagat attcgt.cgtg ctggtaaaaa acg tatt caa 42O t citcaactitt ttittagaaat gcttgctatt gatcc tdaat gtgctaaaac aac tatgaaa 48O agttgggcac gtttcgtaga gg taggttca agt cqtcagc acgaaacticg ttttgtagaa 54 O ttagcaaaat acattccata ccg tattatg gatgttggtgaaatgttittg gttcggttta 6OO gttacttittg gtttaggttt acatatt cot gat catgagt tagaactittg tagagaactt 660 atggctaatg Cttggattgc agtaggttta caaaatgata tittggagttg gccaaaagaa 72 O cgtgatgctg caa.cattaca togg taaagat catgtagtta atgcaatttg ggittittaatg 78O

Caagaacacic aaactgacgt agacggtgca atgcaaatct gcc.gtaaact tattgtagaa 84 O tacgtagcaa aatact taga agtaattgaa got actaaaa atgatgaaag tatttctitta 9 OO gatttacgta aatat cittga tigcaatgctt tacagtatta gtggaaacgt agtatggtct 96.O ttagaatgcc ct cqttataa cccagatgtt tottttalaca aaacacaatt agaatggatg O2O cgtcaagg to titc catctitt agagt cittgt cct gitat tag citcqttctic c agagatagat O8O tctgatgaaa gtgctgtttc accaa.cagct gatgaat cag attctacaga agatagttta 14 O ggttctggitt cacgtcaaga cagttcatta t c tactggtc. ttagtttatic accagttcat 2OO tctaatgagg gaaaag actt acaacgtgtt gatactgacc at atttittitt cqaaaaag.ca 26 O gtattagagg citcct tatga ttacatagct agtatgcctt ctaaaggtgt acgtgat caa 32O tt cattgacg ct cittaacga ttggittacgt gttcc tdacg taaaagttgg taaaatcaaa 38O gacgctgttc gtgtact tca taatagttca ttatt attag atgattt coa agacaattica 44 O c cattacgta gaggtaaacc titc tact cat aacatttittg gtag togcaca aac agittaat SOO acagcaa.cat act caatcat taaagctatt gga caaataa toggaattitt c togctggtgaa 560 agtgtacaag aagttatgaa citcaattatg attittatt co aaggccaagc tatggattta 62O ttctggacat ataatggaca togttc catca gaagaagagt attatcgitat gattgaccala 68O aaaactggtc. aattatt ct c tattgcaa.ca agt cittctitc ttaatgcagc tigataatgaa 74 O ataccacgta ctaaaattica at catgtc.tt caccgtttaa cacgtttatt aggtogttgt 8OO tittcaaattic gtgacgacta t caaaactta gitatctgctg attatactaa acaaaaaggit 86 O ttttgttgaag accttgatga gqqtaaatgg totttagctt taatt cacat gatt cacaaa 92 O Caacgtag to a catggcatt attaaatgtt ttaagtacag gtcgtaalaca tdgtgg tatg 98 O actittagagc aaaaacaatt cqtacttgat attattgaag aggaaaaatc tittagattat 2O4. O acacgttcag titatgatgga cittacacgtt caattacgtg ctgaaattgg tog tattgag 21OO US 2012/0058535 A1 Mar. 8, 2012 38

atccttttag attctcctaa tcctgctatg agacttttat tagaattatt acgtgtt 2157

<210s, SEQ ID NO 5 &211s LENGTH: 21.96 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: Codon optimized sequence 22 Os. FEATURE: <221 > NAMEAKEY: misc feature <222s. LOCATION: (2158) ... (21.96) <223> OTHER INFORMATION: Tag <4 OOs, SEQUENCE: 5 atggaattta aatatt caga agttgtagaa ccatcaactt actatacaga aggattatgt 6 O gaagg tattg atgitacgitaa atcaaaattt act actittag aagat.cgtgg togctatt cqt 12 O gcacacgaag actggaacaa acacattggit C catgtggtgaat atcgtgg cacattaggit 18O ccacgttitta gttittattitc agttgcagta cct gaatgca titccagaaag attagaagtt 24 O atat cittatgctaatgagtt cqcttittctt cacgatgatg taact gacca cqttggit cac 3OO gacacaggag aggttgaaaa catgaaatg atgactgtat ttittagaagc tigCacataca 360 ggtgctattg acacttctaa taaagtagat attcgt.cgtg ctggtaaaaa acg tatt caa 42O t citcaactitt ttittagaaat gcttgctatt gatcc tdaat gtgctaaaac aac tatgaaa 48O agttgggcac gtttcgtaga gg taggttca agt cqtcagc acgaaacticg ttttgtagaa 54 O ttagcaaaat acattccata ccg tattatg gatgttggtgaaatgttittg gttcggttta 6OO gttacttittg gtttaggttt acatatt cot gat catgagt tagaactittg tagagaactt 660 atggctaatg Cttggattgc agtaggttta caaaatgata tittggagttg gccaaaagaa 72 O cgtgatgctg caa.cattaca togg taaagat catgtagtta atgcaatttg ggittittaatg 78O

Caagaacacic aaactgacgt agacggtgca atgcaaatct gcc.gtaaact tattgtagaa 84 O tacgtagcaa aatact taga agtaattgaa got actaaaa atgatgaaag tatttctitta 9 OO gatttacgta aatat cittga tigcaatgctt tacagtatta gtggaaacgt agtatggtct 96.O ttagaatgcc ct cqttataa cccagatgtt tottttalaca aaacacaatt agaatggatg O2O cgtcaagg to titc catctitt agagt cittgt cct gitat tag citcqttctic c agagatagat O8O tctgatgaaa gtgctgtttc accaa.cagct gatgaat cag attctacaga agatagttta 14 O ggttctggitt cacgtcaaga cagttcatta t c tactggtc. ttagtttatic accagttcat 2OO tctaatgagg gaaaag actt acaacgtgtt gatactgacc at atttittitt cqaaaaag.ca 26 O gtattagagg citcct tatga ttacatagct agtatgcctt ctaaaggtgt acgtgat caa 32O tt cattgacg ct cittaacga ttggittacgt gttcc tdacg taaaagttgg taaaatcaaa 38O gacgctgttc gtgtact tca taatagttca ttatt attag atgattt coa agacaattica 44 O c cattacgta gaggtaaacc titc tact cat aacatttittg gtag togcaca aac agittaat SOO acagcaa.cat act caatcat taaagctatt gga caaataa toggaattitt c togctggtgaa 560 agtgtacaag aagttatgaa citcaattatg attittatt co aaggccaagc tatggattta 62O ttctggacat ataatggaca togttc catca gaagaagagt attatcgitat gattgaccala 68O aaaactggtc. aattatt ct c tattgcaa.ca agt cittctitc ttaatgcagc tigataatgaa 74 O ataccacgta ctaaaattica at catgtc.tt caccgtttaa cacgtttatt aggtogttgt 8OO US 2012/0058535 A1 Mar. 8, 2012 39

- Continued tittcaaattic gtgacgacta t caaaactta gitatctgctg attatactaa acaaaaaggit 1860 ttttgttgaag accttgatga gqqtaaatgg totttagctt taatt cacat gatt cacaaa 1920 Caacgtag to a catggcatt attaaatgtt ttaagtacag gtcgtaalaca tdgtgg tatg 198O actittagagc aaaaacaatt cqtacttgat attattgaag aggaaaaatc tittagattat 2O4. O acacgttcag titatgatgga cittacacgtt caattacgtg ctgaaattgg tog tattgag 21OO atcc titt tag attct c ctaa toctogctato agacttittat tagaattatt acgtgttacc 216 O gg tagtgctt ggt cacaccc tica atttgag aaataa 21.96

<210> SEQ ID NO 6
<211> LENGTH: 731
<212> TYPE: PRT
<213> ORGANISM: Phomopsis amygdali
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (720)..(731)
<223> OTHER INFORMATION: Strep tag II
<400> SEQUENCE: 6
Met Glu Phe Lys Tyr Ser Glu Val Val Glu Pro Ser Thr Tyr Tyr Thr
1 5 10 15
Glu Gly Leu Cys Glu Gly Ile Asp Val Arg Lys Ser Lys Phe Thr Thr
20 25 30
Leu Glu Asp Arg Gly Ala Ile Arg Ala His Glu Asp Trp Asn Lys His
35 40 45
Ile Gly Pro Cys Gly Glu Tyr Arg Gly Thr Leu Gly Pro Arg Phe Ser
50 55 60
Phe Ile Ser Val Ala Val Pro Glu Cys Ile Pro Glu Arg Leu Glu Val
65 70 75 80
Ile Ser Tyr Ala Asn Glu Phe Ala Phe Leu His Asp Asp Val Thr Asp
85 90 95
His Val Gly His Asp Thr Gly Glu Val Glu Asn Asp Glu Met Met Thr
100 105 110
Val Phe Leu Glu Ala Ala His Thr Gly Ala Ile Asp Thr Ser Asn Lys
115 120 125
Val Asp Ile Arg Arg Ala Gly Lys Lys Arg Ile Gln Ser Gln Leu Phe
130 135 140
Leu Glu Met Leu Ala Ile Asp Pro Glu Cys Ala Lys Thr Thr Met Lys
145 150 155 160
Ser Trp Ala Arg Phe Val Glu Val Gly Ser Ser Arg Gln His Glu Thr
165 170 175
Arg Phe Val Glu Leu Ala Lys Tyr Ile Pro Tyr Arg Ile Met Asp Val
180 185 190
Gly Glu Met Phe Trp Phe Gly Leu Val Thr Phe Gly Leu Gly Leu His
195 200 205
Ile Pro Asp His Glu Leu Glu Leu Cys Arg Glu Leu Met Ala Asn Ala
210 215 220
Trp Ile Ala Val Gly Leu Gln Asn Asp Ile Trp Ser Trp Pro Lys Glu
225 230 235 240
Arg Asp Ala Ala Thr Leu His Gly Lys Asp His Val Val Asn Ala Ile
245 250 255
Trp Val Leu Met Gln Glu His Gln Thr Asp Val Asp Gly Ala Met Gln
260 265 270
Ile Cys Arg Lys Leu Ile Val Glu Tyr Val Ala Lys Tyr Leu Glu Val
275 280 285
Ile Glu Ala Thr Lys Asn Asp Glu Ser Ile Ser Leu Asp Leu Arg Lys
290 295 300
Tyr Leu Asp Ala Met Leu Tyr Ser Ile Ser Gly Asn Val Val Trp Ser
305 310 315 320
Leu Glu Cys Pro Arg Tyr Asn Pro Asp Val Ser Phe Asn Lys Thr Gln
325 330 335
Leu Glu Trp Met Arg Gln Gly Leu Pro Ser Leu Glu Ser Cys Pro Val
340 345 350
Leu Ala Arg Ser Pro Glu Ile Asp Ser Asp Glu Ser Ala Val Ser Pro
355 360 365
Thr Ala Asp Glu Ser Asp Ser Thr Glu Asp Ser Leu Gly Ser Gly Ser
370 375 380
Arg Gln Asp Ser Ser Leu Ser Thr Gly Leu Ser Leu Ser Pro Val His
385 390 395 400
Ser Asn Glu Gly Lys Asp Leu Gln Arg Val Asp Thr Asp His Ile Phe
405 410 415
Phe Glu Lys Ala Val Leu Glu Ala Pro Tyr Asp Tyr Ile Ala Ser Met
420 425 430
Pro Ser Lys Gly Val Arg Asp Gln Phe Ile Asp Ala Leu Asn Asp Trp
435 440 445
Leu Arg Val Pro Asp Val Lys Val Gly Lys Ile Lys Asp Ala Val Arg
450 455 460
Val Leu His Asn Ser Ser Leu Leu Leu Asp Asp Phe Gln Asp Asn Ser
465 470 475 480
Pro Leu Arg Arg Gly Lys Pro Ser Thr His Asn Ile Phe Gly Ser Ala
485 490 495
Gln Thr Val Asn Thr Ala Thr Tyr Ser Ile Ile Lys Ala Ile Gly Gln
500 505 510
Ile Met Glu Phe Ser Ala Gly Glu Ser Val Gln Glu Val Met Asn Ser
515 520 525
Ile Met Ile Leu Phe Gln Gly Gln Ala Met Asp Leu Phe Trp Thr Tyr
530 535 540
Asn Gly His Val Pro Ser Glu Glu Glu Tyr Tyr Arg Met Ile Asp Gln
545 550 555 560
Lys Thr Gly Gln Leu Phe Ser Ile Ala Thr Ser Leu Leu Leu Asn Ala
565 570 575
Ala Asp Asn Glu Ile Pro Arg Thr Lys Ile Gln Ser Cys Leu His Arg
580 585 590
Leu Thr Arg Leu Leu Gly Arg Cys Phe Gln Ile Arg Asp Asp Tyr Gln
595 600 605
Asn Leu Val Ser Ala Asp Tyr Thr Lys Gln Lys Gly Phe Cys Glu Asp
610 615 620
Leu Asp Glu Gly Lys Trp Ser Leu Ala Leu Ile His Met Ile His Lys
625 630 635 640
Gln Arg Ser His Met Ala Leu Leu Asn Val Leu Ser Thr Gly Arg Lys
645 650 655
His Gly Gly Met Thr Leu Glu Gln Lys Gln Phe Val Leu Asp Ile Ile
660 665 670
Glu Glu Glu Lys Ser Leu Asp Tyr Thr Arg Ser Val Met Met Asp Leu
675 680 685
His Val Gln Leu Arg Ala Glu Ile Gly Arg Ile Glu Ile Leu Leu Asp
690 695 700
Ser Pro Asn Pro Ala Met Arg Leu Leu Leu Glu Leu Leu Arg Val Thr
705 710 715 720
Gly Ser Ala Trp Ser His Pro Gln Phe Glu Lys
725 730

<210s, SEQ ID NO 7 &211s LENGTH: 2157 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: Codon optimized seqeunce <4 OO > SEQUENCE: 7 atggaattta aatatt caga agttgttgaa ccatcaa.cat attatacaga aggtt tatgt 6 O gaagg tattg atgttcgtaa atcaaaattt acaac attag aagat.cgtgg togctatt cqt 12 O gct catgaag attggaataa acatattggit C catgtggtgaat atcgtgg tacattaggit 18O ccacgtttitt catttattitc agttgctgtt coagaatgta titccagaacg tittagaagtt 24 O attt catacg ctaatgaatt togcttttitta catgatgatgttacagat.ca tdttggit cat 3OO gatacaggtg aagttgaaaa tatgaaatg atgacagttt ttittagaagc tigct cataca 360 ggtgctattgata catcaaa taaagttgat attcgt.cgtg ctggtaaaaa acg tatt caa 42O t cacaattat ttittagaaat gttagctatt gatccagaat gtgctaaaac aacaatgaaa 48O t catgggctic gttttgttga agttggttca toacgtcaac atgaaacacg ttttgttgaa 54 O ttagctaaat at attc cata t cqtattatg gatgttggtgaaatgttittg gtttggttta 6OO gttacatttg gtttaggttt acatatt coa gat catgaat tagaattatgtcgtgaactt 660 atggctaatg Cttggattgc tigttggttta caaaatgata tittggtcatg gccaaaagaa 72 O cgtgatgctg ctacattaca togg taaagat catgttgtta atgct atttg ggittittaatg 78O caagaa catc aaacagatgt tdatggtgct atgcaaattt gtcgtaaact tattgttgaa 84 O tatgttgcta aatatttaga agittattgaa got acaaaaa atgatgaatc aattt catta 9 OO gatttacgta aatatttaga tigctatotta tatt caattt caggtaatgt tdtttggtca 96.O ttagaatgtc. 
cacgittataa to cagatgtt to atttaata aaacacaatt agaatggatg O2O cgtcaaggitt taccat catt agaat catgt ccagttt tag citcqttcacc agaaattgat O8O t cagatgaat cagcagtttc accaactgct gatgaat cag attcaacaga agatt catta 14 O ggttcaggitt cacgtcaaga titcat catta t caac aggtt tat cattatic accagttcat 2OO tcaaatgaag gtaaagattt acaacgtgtt gatacagatc at atttittitt tdaaaaagct 26 O gttittagaag ctic catacga ttatattgct tcaatgc.cat caaaaggtgt togtgaccala 32O tittattgatg ctittaaatga ttggittacgt gttccagatgttaaagttgg taaaattaaa 38O gatgctgttc gtgttttaca taatt catca ttatt attag atgattitt.ca agataattica 44 O c cattacgt.c gtggtaaacc atcaacacat aatatttittg gttcagotca aac agittaat SOO acagctacat attcaattat taaagctatt ggtcaaatta toggaattitt c togctggtgag 560 t cagttcaag aagttatgaa citcaattatg attittatttic aaggit caagc tatggattta 62O US 2012/0058535 A1 Mar. 8, 2012 42

- Continued ttittggacat ataatggit ca togttc catca gaagaagaat attatcgitat gattgaccala 168O aaaacagg to aattattittcaattgctaca toattatt at taaatgctgc tigataatgaa 1740 attic cacgta caaaaattica at catgttta catcgtttaa cacgtttatt aggtogttgt 18OO tittcaaattic gtgatgatta t caaaattta gtttctgctg attacactaa acaaaaagga 1860 ttctgttgaag atttagatga agg taaatgg to attagctt taatt cacat gatt cataaa 1920 caacgttcac acatggctitt attaaatgtt ttatcaa.cag gtcgtaalaca tdgtgg tatg 198O acattagaac aaaaacaatt tdttittagat attattgaag aagaaaaatc attagattat 2O4. O acacgttcag titatgatgga t citt catgtt caattacgtg ctgaaattgg tog tattgaa 21OO attitt attag attcaccalaa to cagctato cqttt attat tagaattatt acgtgtt 2157

<210s, SEQ ID NO 8 &211s LENGTH: 21.96 &212s. TYPE: DNA <213> ORGANISM: Artificial Sequence 22 Os. FEATURE: <223> OTHER INFORMATION: Codon optimized sequence 22 Os. FEATURE: <221 > NAMEAKEY: misc feature <222s. LOCATION: (2158) ... (21.96) <223> OTHER INFORMATION: Tag <4 OOs, SEQUENCE: 8 atggaattta aatatt caga agttgttgaa ccatcaa.cat attatacaga aggtt tatgt 6 O gaagg tattg atgttcgtaa atcaaaattt acaac attag aagat.cgtgg togctatt cqt 12 O gct catgaag attggaataa acatattggit C catgtggtgaat atcgtgg tacattaggit 18O ccacgtttitt catttattitc agttgctgtt coagaatgta titccagaacg tittagaagtt 24 O attt catacg ctaatgaatt togcttttitta catgatgatgttacagat.ca tdttggit cat 3OO gatacaggtg aagttgaaaa tatgaaatg atgacagttt ttittagaagc tigct cataca 360 ggtgctattgata catcaaa taaagttgat attcgt.cgtg ctggtaaaaa acg tatt caa 42O t cacaattat ttittagaaat gttagctatt gatccagaat gtgctaaaac aacaatgaaa 48O t catgggctic gttttgttga agttggttca toacgtcaac atgaaacacg ttttgttgaa 54 O ttagctaaat at attc cata t cqtattatg gatgttggtgaaatgttittg gtttggttta 6OO gttacatttg gtttaggttt acatatt coa gat catgaat tagaattatgtcgtgaactt 660 atggctaatg Cttggattgc tigttggttta caaaatgata tittggtcatg gccaaaagaa 72 O cgtgatgctg ctacattaca togg taaagat catgttgtta atgct atttg ggittittaatg 78O caagaa catc aaacagatgt tdatggtgct atgcaaattt gtcgtaaact tattgttgaa 84 O tatgttgcta aatatttaga agittattgaa got acaaaaa atgatgaatc aattt catta 9 OO gatttacgta aatatttaga tigctatotta tatt caattt caggtaatgt tdtttggtca 96.O ttagaatgtc. cacgittataa to cagatgtt to atttaata aaacacaatt agaatggatg 1 O2O cgtcaaggitt taccat catt agaat catgt ccagttt tag citcqttcacc agaaattgat 108 O t cagatgaat cagcagtttc accaactgct gatgaat cag attcaacaga agatt catta 114 O ggttcaggitt cacgtcaaga titcat catta t caac aggtt tat cattatic accagttcat 12 OO tcaaatgaag gtaaagattt acaacgtgtt gatacagatc at atttittitt tdaaaaagct 126 O gttittagaag ctic catacga ttatattgct tcaatgc.cat caaaaggtgt togtgaccala 132O

Glu His Gly Asp Lys Val Trp Leu Phe Pro Glu Ser Phe Lys Tyr Leu
50 55 60
Leu Glu Lys Gln Gly Glu Asp Gly Ser Trp Glu Arg His Pro Arg Ser
65 70 75 80
Lys Thr Val Gly Val Leu Asn Thr Ala Ala Ala Cys Leu Ala Leu Leu
85 90 95
Arg His Val Lys Asn Pro Leu Gln Leu Gln Asp Ile Ala Ala Gln Asp
100 105 110
Ile Glu Leu Arg Ile Gln Arg Gly Leu Arg Ser Leu Glu Glu Gln Leu
115 120 125
Ile Ala Trp Asp Asp Val Leu Asp Thr Asn His Ile Gly Val Glu Met
130 135 140
Ile Val Pro Ala Leu Leu Asp Tyr Leu Gln Ala Glu Asp Glu Asn Val
145 150 155 160
Asp Phe Glu Phe Glu Ser His Ser Leu Leu Met Gln Met Tyr Lys Glu
165 170 175
Lys Met Ala Arg Phe Ser Pro Glu Ser Leu Tyr Arg Ala Arg Pro Ser
180 185 190
Ser Ala Leu His Asn Leu Glu Ala Leu Ile Gly Lys Leu Asp Phe Asp
195 200 205
Lys Val Gly His His Leu Tyr Asn Gly Ser Met Met Ala Ser Pro Ser
210 215 220
Ser Thr Ala Ala Phe Leu Met His Ala Ser Pro Trp Ser His Glu Ala
225 230 235 240
Glu Ala Tyr Leu Arg His Val Phe Glu Ala Gly Thr Gly Lys Gly Ser
245 250 255
Gly Gly Phe Pro Gly Thr Tyr Pro Thr Thr Tyr Phe Glu Leu Asn Trp
260 265 270
Val Leu Ser Thr Leu Met Lys Ser Gly Phe Thr Leu Ser Asp Leu Glu
275 280 285
Cys Asp Glu Leu Ser Ser Ile Ala Asn Thr Ile Ala Glu Gly Phe Glu
290 295 300
Cys Asp His Gly Val Ile Gly Phe Ala Pro Arg Ala Val Asp Val Asp
305 310 315 320
Asp Thr Ala Lys Gly Leu Leu Thr Leu Thr Leu Leu Gly Met Asp Glu
325 330 335
Gly Val Ser Pro Ala Pro Met Ile Ala Met Phe Glu Ala Lys Asp His
340 345 350
Phe Leu Thr Phe Leu Gly Glu Arg Asp Pro Ser Phe Thr Ser Asn Cys
355 360 365
His Val Leu Leu Ser Leu Leu His Arg Thr Asp Leu Leu Gln Tyr Leu
370 375 380
Pro Gln Ile Arg Lys Thr Thr Thr Phe Leu Cys Glu Ala Trp Trp Ala
385 390 395 400
Cys Asp Gly Gln Ile Lys Asp Lys Trp His Leu Ser His Leu Tyr Pro
405 410 415
Thr Met Leu Met Val Gln Ala Phe Ala Glu Ile Leu Leu Lys Ser Ala
420 425 430
Glu Gly Glu Pro Leu His Asp Ala Phe Asp Ala Ala Thr Leu Ser Arg
435 440 445
Val Ser Ile Cys Val Phe Gln Ala Cys Leu Arg Thr Leu Leu Ala Gln
450 455 460
Ser Gln Asp Gly Ser Trp His Gly Gln Pro Glu Ala Ser Cys Tyr Ala
465 470 475 480
Val Leu Thr Leu Ala Glu Ser Gly Arg Leu Val Leu Leu Gln Ala Leu
485 490 495
Gln Pro Gln Ile Ala Ala Ala Met Glu Lys Ala Ala Asp Val Met Gln
500 505 510
Ala Gly Arg Trp Ser Cys Ser Asp His Asp Cys Asp Trp Thr Ser Lys
515 520 525
Thr Ala Tyr Arg Val Asp Leu Val Ala Ala Ala Tyr Arg Leu Ala Ala
530 535 540
Met Lys Ala Ser Ser Asn Leu Thr Phe Thr Val Asp Asp Asn Val Ser
545 550 555 560
Lys Arg Ser Asn Gly Phe Gln Gln Leu Val Gly Arg Thr Asp Leu Phe
565 570 575
Ser Gly Val Pro Ala Trp Glu Leu Gln Ala Ser Phe Leu Glu Ser Ala
580 585 590
Leu Phe Val Pro Leu Leu Arg Asn His Arg Leu Asp Val Phe Asp Arg
595 600 605
Asp Asp Ile Lys Val Ser Lys Asp His Tyr Leu Asp Met Ile Pro Phe
610 615 620
Thr Trp Val Gly Cys Asn Asn Arg Ser Arg Thr Tyr Val Ser Thr Ser
625 630 635 640
Phe Leu Phe Asp Met Met Ile Ile Ser Met Leu Gly Tyr Gln Ile Asp
645 650 655
Glu Phe Phe Glu Ala Glu Ala Ala Pro Ala Phe Ala Gln Cys Ile Gly
660 665 670
Gln Leu His Gln Val Val Asp Lys Val Val Asp Glu Val Ile Asp Glu
675 680 685
Val Val Asp Lys Val Val Gly Lys Val Val Gly Lys Val Val Gly Lys
690 695 700
Val Val Asp Glu Arg Val Asp Ser Pro Thr His Glu Ala Ile Ala Ile
705 710 715 720
Cys Asn Ile Glu Ala Ser Leu Arg Arg Phe Val Asp His Val Leu His
725 730 735
His Gln His Val Leu His Ala Ser Gln Gln Glu Gln Asp Ile Leu Trp
740 745 750
Arg Glu Leu Arg Ala Phe Leu His Ala His Val Val Gln Met Ala Asp
755 760 765
Asn Ser Thr Leu Ala Pro Pro Gly Arg Thr Phe Phe Asp Trp Val Arg
770 775 780
Thr Thr Ala Ala Asp His Val Ala Cys Ala Tyr Ser Phe Ala Phe Ala
785 790 795 800
Cys Cys Ile Thr Ser Ala Thr Ile Gly Gln Gly Gln Ser Met Phe Ala
805 810 815
Thr Val Asn Glu Leu Tyr Leu Val Gln Ala Ala Ala Arg His Met Thr
820 825 830
Thr Met Cys Arg Met Cys Asn Asp Ile Gly Ser Val Asp Arg Asp Phe
835 840 845
Ile Glu Ala Asn Ile Asn Ser Val His Phe Pro Glu Phe Ser Thr Leu
850 855 860
Ser Leu Val Ala Asp Lys Lys Lys Ala Leu Ala Arg Leu Ala Ala Tyr
865 870 875 880
Glu Lys Ser Cys Leu Thr His Thr Leu Asp Gln Phe Glu Asn Glu Val
885 890 895
Leu Gln Ser Pro Arg Val Ser Ser Ala Ala Ser Gly Asp Phe Arg Thr
900 905 910
Arg Lys Val Ala Val Val Arg Phe Phe Ala Asp Val Thr Asp Phe Tyr
915 920 925
Asp Gln Leu Tyr Ile Leu Arg Asp Leu Ser Ser Ser Leu Lys His Val
930 935 940
Gly Thr
945

<210> SEQ ID NO 11
<211> LENGTH: 2835
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Codon optimized sequence

<400> SEQUENCE: 11

tittgctaaat ttgatatgtt agaagaagaa got cqtgctt tagttcgitaa agttggtaat 60
gctgttgatc caattitatgg tttittcaaca acat catgtc aaattitatga tacagcttgg 120
gctgctatga tittcaaaaga agaacatggit gataaagttt ggittattitcc agaat cattt 180
aaat attt at tagaaaaaca aggtgaagat ggttcatggg aacgt catcc acgttcaaaa 240
acagttggtg ttittaaatac togctgctgct tdtttagctt tattacgtca tottaaaaat 300
c cattacaat tacaagat at togctgcticaa gat attgaat tacg tatt ca acgtggttta 360
cgttcattag aagaacaact tattgcttgg gatgatgttt tagatacaaa toatattggit 420
gttgaaatga ttgttccagc titt attagat tatttacaag ctgaagatga aaatgttgat 480
tittgaatttgaat cacattc attact tatgcaaatgtata aagaaaaaat ggct cqttitt 540
t caccagaat cattatat cq togct cqtcca toatcagctt tacataattt agaagct citt 600
attggtaaat tagattittga taaagttggit cat catttat ataatggttcaatgatggct 660
t cac catcat caacagoagc titttittaatg cacgctt cac cittggtcaca tdaagct gag 720
gctt atttac git catgtttt taagctggit acaggtaaag gttcaggtgg ttitt c caggit 780
acatat coaa caacat attt tdaattaa at tdggttittat caacactitat gaaat caggit 840
titta cattat cagatttaga atgtgatgaa ttatcatcaa ttgctaatac aattgctgaa 900
ggttittgaat gtgat catgg tittattggit tttgctic cac gtgctgttga tigttgatgat 960
acagctaaag gtttattaac attaa catta ttagg tatgg atgaaggtgt ttcaccagot 1020
c caatgattig citatgtttga agctaaagat cattttittaa catttittagg tdaacgtgat 1080
c cat cattta catcaaattig to atgttitta titat cattat tacatcqtac agatttatta 1140
caat atttac cacaaatticq taaaacaaca acatttittat gtgaggctt g g togggcttgt 1200
gatggit caaa ttaaagataa atggcattta t cacatttat atccaacaat gttaatggitt 1260
caggcttittg ctgaaattitt attaaaatct gctgaaggtgaac cattaca tdatgcttitt 1320
gatgctgcta cattat cacg togtttcaatt tdtgtttittc aggcttgttt acgtacatta 1380
ttagct caat cacaagatgg tt catggcat gigt caac cag aggctt catgttatgctgtt 1440
ttaa cattag ctgaat cagg togtttagtt ttattacaag cattaca acc acaaattgct 1500
gctgctatgg aaaaagctgc tigatgttatgcaagctggtc gttggtcatgttcagat cat 1560
gattgttgatt gga catcaaa alacagctitat cqtgttgatt tagttgctgc tigctitat cqt 1620
ttagctgcta tdaaag catc atcaaattta acatttacag ttgatgataa tdtttcaaaa 1680
cgttcaaatg gttittcaaca attagttggit cqtacagatt tattitt cagg togttccagct 1740
tgggaattac aag cat catt tttagaatca gctittatttgttc cattatt acgtaat cat 1800
cgtttagatgtttittgat cq tdatgatatt aaagtttcaa aagat catta tittagatatg 1860
attic cattta catgggttgg ttgtaataat cqttcacgta catacgtttic aac at cattt 1920
ttatttgata tdatgatt at ttcaatgtta ggittatcaaa ttgatgaatt ttittgaagct 1980
gaagctgctic cagcttittgc ticaatgtatt ggit caattac atcaagttgt tdataaagtt 2040
gttgatgaag ttattgatga agttgtagat aaagttgttg gtaaagttgt aggtaaagtt 2100
gttggtaaag ttgttgatga acgtgttgat t caccaacac atgaagctat togctatttgt 2160
aatattgaag cat cattacg togttttgtt gat catgttt tacat catca acatgttitta 2220
catgctt cac aacaagaaca agatattitta tdgcgtgaat tacgtgctitt tttacatgct 2280
catgttgttcaaatggctga taattcaaca ttagctic cac caggit cqtac atttitttgat 2340
tgggttcgta caactgctgc tigatcatgtt gcttgtgctt att catttgc titttgcttgt 2400
tg tattacat cagctacaat tdgtolaaggt caatcaatgt ttgctacagt taatgaatta 2460
tatttagttcaagctgctgc ticgtcacatg acaacaatgt gtcg tatgtg taatgatatt 2520
ggttcagttg atcgtgattt tattgaagct aat attaact cagttcattt to cagaattit 2580
t caacattat cattagttgc tigataaaaaa aaagcattag citcgtttagc tigct tatgaa 2640
aaat catgtt taacacatac attagat caa tittgaaaatgaagttttaca at caccacgt 2700
gttt catcag cagctt cagg togattittcgt acacgtaaag ttgctgttgt togtttittitt 2760
gctgatgtta cagatttitta tdatcaatta tatattittac gtgattitatic atcat catta 2820
aaacatgttg gtaca 2835

<210> SEQ ID NO 12
<211> LENGTH: 10
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Tag

<400> SEQUENCE: 12

Met Asp Tyr Lys Asp Asp Asp Asp Lys Gly 1 5 10

<210> SEQ ID NO 13
<211> LENGTH: 2874
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Codon optimized sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (1)..(30)
<223> OTHER INFORMATION: Tag

<400> SEQUENCE: 13


tggcgtgaat tacgtgcttt tttacatgct catgttgttcaaatggctga taattcaaca 2340
ttagct coac caggtogtac atttitttgat tdggttcgta caactgctgc tigatcatgtt 2400
gcttgtgctt att catttgc titttgcttgt tdt attacat cagot acaat togg to aaggit 2460
caat caatgt ttgctacagt taatgaatta tatttagttcaagctgctgc ticgtcacatg 2520
acaacaatgt gtcg tatgtg taatgatatt ggttcagttg atcgtgattt tattgaagct 2580
aat attaact cagttcattt to cagaattt toaac attat cattagttgc tigataaaaaa 2640
aaag cattag citcgtttago togct tatgaaaaatcatgtt taacacatac attagat caa 2700
tittgaaaatgaagttttaca at caccacgt gttt catcag cagcttcagg tdattitt cqt 2760
acacgtaaag ttgctgttgt togttitttitt gctgatgtta cagatttitta tdatcaatta 2820
tatattittac gtgattitatic atcat catta aaa catgttg gtaca accqg ttaa 2874

<210> SEQ ID NO 14
<211> LENGTH: 955
<212> TYPE: PRT
<213> ORGANISM: Phaeosphaeria nodorum
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (1)..(10)
<223> OTHER INFORMATION: Tag

<400> SEQUENCE: 14

Met Asp Tyr Lys Asp Asp Asp Asp Lys Gly Phe Ala Lys Phe Asp Met 1 5 10 15
Leu Glu Glu Glu Ala Arg Ala Leu Val Arg Lys Val Gly Asn Ala Val 20 25 30
Asp Pro Ile Tyr Gly Phe Ser Thr Thr Ser Cys Gln Ile Tyr Asp Thr 35 40 45
Ala Trp Ala Ala Met Ile Ser Lys Glu Glu His Gly Asp Lys Val Trp 50 55 60
Leu Phe Pro Glu Ser Phe Lys Tyr Leu Leu Glu Lys Gln Gly Glu Asp 65 70 75 80
Gly Ser Trp Glu Arg His Pro Arg Ser Lys Thr Val Gly Val Leu Asn 85 90 95
Thr Ala Ala Ala Cys Leu Ala Leu Leu Arg His Val Lys Asn Pro Leu 100 105 110
Gln Leu Gln Asp Ile Ala Ala Gln Asp Ile Glu Leu Arg Ile Gln Arg 115 120 125
Gly Leu Arg Ser Leu Glu Glu Gln Leu Ile Ala Trp Asp Asp Val Leu 130 135 140
Asp Thr Asn His Ile Gly Val Glu Met Ile Val Pro Ala Leu Leu Asp 145 150 155 160
Tyr Leu Gln Ala Glu Asp Glu Asn Val Asp Phe Glu Phe Glu Ser His 165 170 175
Ser Leu Leu Met Gln Met Tyr Lys Glu Lys Met Ala Arg Phe Ser Pro 180 185 190
Glu Ser Leu Tyr Arg Ala Arg Pro Ser Ser Ala Leu His Asn Leu Glu 195 200 205
Ala Leu Ile Gly Lys Leu Asp Phe Asp Lys Val Gly His His Leu Tyr 210 215 220
Asn Gly Ser Met Met Ala Ser Pro Ser Ser Thr Ala Ala Phe Leu Met 225 230 235 240
His Ala Ser Pro Trp Ser His Glu Ala Glu Ala Tyr Leu Arg His Val 245 250 255
Phe Glu Ala Gly Thr Gly Lys Gly Ser Gly Gly Phe Pro Gly Thr Tyr 260 265 270
Pro Thr Thr Tyr Phe Glu Leu Asn Trp Val Leu Ser Thr Leu Met Lys 275 280 285
Ser Gly Phe Thr Leu Ser Asp Leu Glu Cys Asp Glu Leu Ser Ser Ile 290 295 300
Ala Asn Thr Ile Ala Glu Gly Phe Glu Cys Asp His Gly Val Ile Gly 305 310 315 320
Phe Ala Pro Arg Ala Val Asp Val Asp Asp Thr Ala Lys Gly Leu Leu 325 330 335
Thr Leu Thr Leu Leu Gly Met Asp Glu Gly Val Ser Pro Ala Pro Met 340 345 350
Ile Ala Met Phe Glu Ala Lys Asp His Phe Leu Thr Phe Leu Gly Glu 355 360 365
Arg Asp Pro Ser Phe Thr Ser Asn Cys His Val Leu Leu Ser Leu Leu 370 375 380
His Arg Thr Asp Leu Leu Gln Tyr Leu Pro Gln Ile Arg Lys Thr Thr 385 390 395 400
Thr Phe Leu Cys Glu Ala Trp Trp Ala Cys Asp Gly Gln Ile Lys Asp 405 410 415
Lys Trp His Leu Ser His Leu Tyr Pro Thr Met Leu Met Val Gln Ala 420 425 430
Phe Ala Glu Ile Leu Leu Lys Ser Ala Glu Gly Glu Pro Leu His Asp 435 440 445
Ala Phe Asp Ala Ala Thr Leu Ser Arg Val Ser Ile Cys Val Phe Gln 450 455 460
Ala Cys Leu Arg Thr Leu Leu Ala Gln Ser Gln Asp Gly Ser Trp His 465 470 475 480
Gly Gln Pro Glu Ala Ser Cys Tyr Ala Val Leu Thr Leu Ala Glu Ser 485 490 495
Gly Arg Leu Val Leu Leu Gln Ala Leu Gln Pro Gln Ile Ala Ala Ala 500 505 510
Met Glu Lys Ala Ala Asp Val Met Gln Ala Gly Arg Trp Ser Cys Ser 515 520 525
Asp His Asp Cys Asp Trp Thr Ser Lys Thr Ala Tyr Arg Val Asp Leu 530 535 540
Val Ala Ala Ala Tyr Arg Leu Ala Ala Met Lys Ala Ser Ser Asn Leu 545 550 555 560
Thr Phe Thr Val Asp Asp Asn Val Ser Lys Arg Ser Asn Gly Phe Gln 565 570 575
Gln Leu Val Gly Arg Thr Asp Leu Phe Ser Gly Val Pro Ala Trp Glu 580 585 590
Leu Gln Ala Ser Phe Leu Glu Ser Ala Leu Phe Val Pro Leu Leu Arg 595 600 605
Asn His Arg Leu Asp Val Phe Asp Arg Asp Asp Ile Lys Val Ser Lys 610 615 620
Asp His Tyr Leu Asp Met Ile Pro Phe Thr Trp Val Gly Cys Asn Asn 625 630 635 640
Arg Ser Arg Thr Tyr Val Ser Thr Ser Phe Leu Phe Asp Met Met Ile 645 650 655
Ile Ser Met Leu Gly Tyr Gln Ile Asp Glu Phe Phe Glu Ala Glu Ala 660 665 670
Ala Pro Ala Phe Ala Gln Cys Ile Gly Gln Leu His Gln Val Val Asp 675 680 685
Lys Val Val Asp Glu Val Ile Asp Glu Val Val Asp Lys Val Val Gly 690 695 700
Lys Val Val Gly Lys Val Val Gly Lys Val Val Asp Glu Arg Val Asp 705 710 715 720
Ser Pro Thr His Glu Ala Ile Ala Ile Cys Asn Ile Glu Ala Ser Leu 725 730 735
Arg Arg Phe Val Asp His Val Leu His His Gln His Val Leu His Ala 740 745 750
Ser Gln Gln Glu Gln Asp Ile Leu Trp Arg Glu Leu Arg Ala Phe Leu 755 760 765
His Ala His Val Val Gln Met Ala Asp Asn Ser Thr Leu Ala Pro Pro 770 775 780
Gly Arg Thr Phe Phe Asp Trp Val Arg Thr Thr Ala Ala Asp His Val 785 790 795 800
Ala Cys Ala Tyr Ser Phe Ala Phe Ala Cys Cys Ile Thr Ser Ala Thr 805 810 815
Ile Gly Gln Gly Gln Ser Met Phe Ala Thr Val Asn Glu Leu Tyr Leu 820 825 830
Val Gln Ala Ala Ala Arg His Met Thr Thr Met Cys Arg Met Cys Asn 835 840 845
Asp Ile Gly Ser Val Asp Arg Asp Phe Ile Glu Ala Asn Ile Asn Ser 850 855 860
Val His Phe Pro Glu Phe Ser Thr Leu Ser Leu Val Ala Asp Lys Lys 865 870 875 880
Lys Ala Leu Ala Arg Leu Ala Ala Tyr Glu Lys Ser Cys Leu Thr His 885 890 895
Thr Leu Asp Gln Phe Glu Asn Glu Val Leu Gln Ser Pro Arg Val Ser 900 905 910
Ser Ala Ala Ser Gly Asp Phe Arg Thr Arg Lys Val Ala Val Val Arg 915 920 925
Phe Phe Ala Asp Val Thr Asp Phe Tyr Asp Gln Leu Tyr Ile Leu Arg 930 935 940
Asp Leu Ser Ser Ser Leu Lys His Val Gly Thr 945 950 955

<210> SEQ ID NO 15
<211> LENGTH: 1806
<212> TYPE: DNA
<213> ORGANISM: Ricinus communis

<400> SEQUENCE: 15

atggcattgc cat cagotgc tatgcaatcc aaccotgaaa agcttaactt attt cacaga 60
ttgttcaagct tacccaccac tagcttggaa tatggcaata atcgct tcc c tittcttittcc 120
t catctgcca agt cacactt taaaaaacca act caag cat gtttatcct c aacaacccac 180
caagaagttc gtc cattago atactittcct c ct actgtct ggggcaatcg ctittgct tcc 240


Leu Thr Phe Asn Pro Ser Glu Phe Glu Ser Tyr Asp Glu Arg Val Ile 85 90 95
Val Leu Lys Lys Lys Val Lys Asp Ile Leu Ile Ser Ser Thr Ser Asp 100 105 110
Ser Val Glu Thr Val Ile Leu Ile Asp Leu Leu Cys Arg Leu Gly Val 115 120 125
Ser Tyr His Phe Glu Asn Asp Ile Glu Glu Leu Leu Ser Lys Ile Phe 130 135 140
Asn Ser Gln Pro Asp Leu Val Asp Glu Lys Glu Cys Asp Leu Tyr Thr 145 150 155 160
Ala Ala Ile Val Phe Arg Val Phe Arg Gln His Gly Phe Lys Met Ser 165 170 175
Ser Asp Val Phe Ser Lys Phe Lys Asp Ser Asp Gly Lys Phe Lys Glu 180 185 190
Ser Leu Arg Gly Asp Ala Lys Gly Met Leu Ser Leu Phe Glu Ala Ser 195 200 205
His Leu Ser Val His Gly Glu Asp Ile Leu Glu Glu Ala Phe Ala Phe 210 215 220
Thr Lys Asp Tyr Leu Gln Ser Ser Ala Val Glu Leu Phe Pro Asn Leu 225 230 235 240
Lys Arg His Ile Thr Asn Ala Leu Glu Gln Pro Phe His Ser Gly Val 245 250 255
Pro Arg Leu Glu Ala Arg Lys Phe Ile Asp Leu Tyr Glu Ala Asp Ile 260 265 270
Glu Cys Arg Asn Glu Thr Leu Leu Glu Phe Ala Lys Leu Asp Tyr Asn 275 280 285
Arg Val Gln Leu Leu His Gln Gln Glu Leu Cys Gln Phe Ser Lys Trp 290 295 300
Trp Lys Asp Leu Asn Leu Ala Ser Asp Ile Pro Tyr Ala Arg Asp Arg 305 310 315 320
Met Ala Glu Ile Phe Phe Trp Ala Val Ala Met Tyr Phe Glu Pro Asp 325 330 335
Tyr Ala His Thr Arg Met Ile Ile Ala Lys Val Val Leu Leu Ile Ser 340 345 350
Leu Ile Asp Asp Thr Ile Asp Ala Tyr Ala Thr Met Glu Glu Thr His 355 360 365
Ile Leu Ala Glu Ala Val Ala Arg Trp Asp Met Ser Cys Leu Glu Lys 370 375 380
Leu Pro Asp Tyr Met Lys Val Ile Tyr Lys Leu Leu Leu Asn Thr Phe 385 390 395 400
Ser Glu Phe Glu Lys Glu Leu Thr Ala Glu Gly Lys Ser Tyr Ser Val 405 410 415
Lys Tyr Gly Arg Glu Ala Phe Gln Glu Leu Val Arg Gly Tyr Tyr Leu 420 425 430
Glu Ala Val Trp Arg Asp Glu Gly Lys Ile Pro Ser Phe Asp Asp Tyr 435 440 445
Leu Tyr Asn Gly Ser Met Thr Thr Gly Leu Pro Leu Val Ser Thr Ala 450 455 460
Ser Phe Met Gly Val Gln Glu Ile Thr Gly Leu Asn Glu Phe Gln Trp 465 470 475 480
Leu Glu Thr Asn Pro Lys Leu Ser Tyr Ala Ser Gly Ala Phe Ile Arg 485 490 495
Leu Val Asn Asp Leu Thr Ser His Val Thr Glu Gln Gln Arg Gly His 500 505 510
Val Ala Ser Cys Ile Asp Cys Tyr Met Asn Gln His Gly Val Ser Lys 515 520 525
Asp Glu Ala Val Lys Ile Leu Gln Lys Met Ala Thr Asp Cys Trp Lys 530 535 540
Glu Ile Asn Glu Glu Cys Met Arg Gln Ser Gln Val Ser Val Gly His 545 550 555 560
Leu Met Arg Ile Val Asn Leu Ala Arg Leu Thr Asp Val Ser Tyr Lys 565 570 575
Tyr Gly Asp Gly Tyr Thr Asp Ser Gln Gln Leu Lys Gln Phe Val Lys 580 585 590
Gly Leu Phe Val Asp Pro Ile Ser Ile 595 600

<210> SEQ ID NO 17
<211> LENGTH: 1638
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Codon optimized sequence

<400> SEQUENCE: 17

atgtcaacaa cacatcaaga agttcgtcca ttagcttatt titccaccaac agitttggggt 60
aatcgttittg citt cattaac atttaatcca toaga atttgaatct tatga tigaacgtgtt 120
attgttittaa aaaaaaaagt taaagatatt ttaattt cat caacatcaga titcagttgaa 180
acagttattt taattgattt attatgtcgt ttaggtgttt catat cattt tdaaaatgat 240
attgaagaat tattatcaaa aatttittaat t cacaac cag atttagttga tigaaaaagaa 300
tgttgattitat atacagoagc tattgtttitt cqtgtttittc gtcaacatgg ttittaaaatg 360
t catcagatgtttitttcaaa atttaaagat t cagatggta aatttaaaga at cattacgt 420
ggtgatgcta aagg tatgtt at cattattt gaagcatcac attitat cagt to atggtgaa 480
gatattittag aagaag catt togcttttaca aaagattatt tacaatcatc togctgttgaa 540
ttattt coaa atttaaaacg t catattaca aatgctt tag aacaaccatt to attcaggit 600
gttccacgtt tagaagct cq taaatttatt gatttatatgaagctgatat togaatgtcgt 660
aatgaaac at tattagaatt togctaaatta gattataatc gtgttcaatt attacat caa 720
caagaattat gtcaattittcaaaatggtgg aaagatttaa atttagcttic agatatt cot 780
tatgct cqtg atcg tatggc tigaaatttitt ttittgggctg ttgctatgta ttittgaacca 840
gattatgctic atacacgitat gattattgct aaagttgttt tact tatttic tittaattgat 900
gatacaattig atgct tatgc tacaatggaa gaalacacata ttittagotga agctgttgct 960
cgttgggata tdt catgttt agaaaaatta ccagattata togaaagttat ttataaatta 1020
ttattaaata cattitt caga atttgaaaaa gaattaacag cagaaggtaa at catatto a 1080
gttaaatatg gtcgtgaagc atttcaagaa ttagttcgtg gtt attattt agaagctgtt 1140
tggcgtgatgaaggtaaaat tcc at cattt gatgattatt tatataatgg ttcaatgaca 1200
acaggtttac cattagtttcaac agctt catttatgggtg ttcaagaaat tacaggttta 1260
aatgaattitc aatggittaga aacaaatcca aaattat citt atgcttcagg togcttittatt 1320
cgtttagtta atgatttaac atctoratgtt acagaacaac aacgtggtca tdttgct tca 1380
tg tattgatt gttatatgaa totalacatggt gtttcaaaag atgaagctgt taaaattitta 1440
caaaaaatgg ctacagattgttggaaagaa atcaatgaag aatgtatgcg tcaat cacala 1500
gttt cagttg gtcatttaat gcg tattgtt aatttagctic gtttalacaga tigttt catat 1560
aaatatggtg atggittatac agatt cacaa caattaaaac aatttgttaa aggtttattt 1620
gttgat coaa tittcaatt 1638

<210> SEQ ID NO 18
<211> LENGTH: 1683
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Codon optimized sequence
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (1639)..(1683)
<223> OTHER INFORMATION: Tag

<400> SEQUENCE: 18

atgtcaacaa cacatcaaga agttcgtcca ttagcttatt titccaccaac agitttggggt 60
aatcgttittg citt cattaac atttaatcca toaga atttgaatct tatga tigaacgtgtt 120
attgttittaa aaaaaaaagt taaagatatt ttaattt cat caacatcaga titcagttgaa 180
acagttattt taattgattt attatgtcgt ttaggtgttt catat cattt tdaaaatgat 240
attgaagaat tattatcaaa aatttittaat t cacaac cag atttagttga tigaaaaagaa 300
tgttgattitat atacagoagc tattgtttitt cqtgtttittc gtcaacatgg ttittaaaatg 360
t catcagatgtttitttcaaa atttaaagat t cagatggta aatttaaaga at cattacgt 420
ggtgatgcta aagg tatgtt at cattattt gaagcatcac attitat cagt to atggtgaa 480
gatattittag aagaag catt togcttttaca aaagattatt tacaatcatc togctgttgaa 540
ttattt coaa atttaaaacg t catattaca aatgctt tag aacaaccatt to attcaggit 600
gttccacgtt tagaagct cq taaatttatt gatttatatgaagctgatat togaatgtcgt 660
aatgaaac at tattagaatt togctaaatta gattataatc gtgttcaatt attacat caa 720
caagaattat gtcaattittcaaaatggtgg aaagatttaa atttagcttic agatatt cot 780
tatgct cqtg atcg tatggc tigaaatttitt ttittgggctg ttgctatgta ttittgaacca 840
gattatgctic atacacgitat gattattgct aaagttgttt tact tatttic tittaattgat 900
gatacaattig atgct tatgc tacaatggaa gaalacacata ttittagotga agctgttgct 960
cgttgggata tdt catgttt agaaaaatta ccagattata togaaagttat ttataaatta 1020
ttattaaata cattitt caga atttgaaaaa gaattaacag cagaaggtaa at catatto a 1080
gttaaatatg gtcgtgaagc atttcaagaa ttagttcgtg gtt attattt agaagctgtt 1140
tggcgtgatgaaggtaaaat tcc at cattt gatgattatt tatataatgg ttcaatgaca 1200
acaggtttac cattagtttcaac agctt catttatgggtg ttcaagaaat tacaggttta 1260
aatgaattitc aatggittaga aacaaatcca aaattat citt atgcttcagg togcttittatt 1320
cgtttagtta atgatttaac atctoratgtt acagaacaac aacgtggtca tdttgct tca 1380
tg tattgatt gttatatgaa totalacatggt gtttcaaaag atgaagctgt taaaattitta 1440
caaaaaatgg ctacagattgttggaaagaa atcaatgaag aatgtatgcg tcaat cacala 1500
gttt cagttg gtcatttaat gcg tattgtt aatttagctic gtttalacaga tigttt catat 1560
aaatatggtg atggittatac agatt cacaa caattaaaac aatttgttaa aggtttattt 1620
gttgat coaa tittcaattac cqg tattaat t cagottggit cacat coaca atttgaaaaa 1680
taa 1683

<210> SEQ ID NO 19
<211> LENGTH: 14
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Strep tag

<400> SEQUENCE: 19

Thr Gly Ile Asn Ser Ala Trp Ser His Pro Gln Phe Glu Lys 1 5 10

<210> SEQ ID NO 20
<211> LENGTH: 560
<212> TYPE: PRT
<213> ORGANISM: Ricinus communis
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (547)..(560)
<223> OTHER INFORMATION: Strep tag

<400> SEQUENCE: 20

Met Ser Thr Thr His Gln Glu Val Arg Pro Leu Ala Tyr Phe Pro Pro 1 5 10 15
Thr Val Trp Gly Asn Arg Phe Ala Ser Leu Thr Phe Asn Pro Ser Glu 20 25 30
Phe Glu Ser Tyr Asp Glu Arg Val Ile Val Leu Lys Lys Lys Val Lys 35 40 45
Asp Ile Leu Ile Ser Ser Thr Ser Asp Ser Val Glu Thr Val Ile Leu 50 55 60
Ile Asp Leu Leu Cys Arg Leu Gly Val Ser Tyr His Phe Glu Asn Asp 65 70 75 80
Ile Glu Glu Leu Leu Ser Lys Ile Phe Asn Ser Gln Pro Asp Leu Val 85 90 95
Asp Glu Lys Glu Cys Asp Leu Tyr Thr Ala Ala Ile Val Phe Arg Val 100 105 110
Phe Arg Gln His Gly Phe Lys Met Ser Ser Asp Val Phe Ser Lys Phe 115 120 125
Lys Asp Ser Asp Gly Lys Phe Lys Glu Ser Leu Arg Gly Asp Ala Lys 130 135 140
Gly Met Leu Ser Leu Phe Glu Ala Ser His Leu Ser Val His Gly Glu 145 150 155 160
Asp Ile Leu Glu Glu Ala Phe Ala Phe Thr Lys Asp Tyr Leu Gln Ser 165 170 175
Ser Ala Val Glu Leu Phe Pro Asn Leu Lys Arg His Ile Thr Asn Ala 180 185 190
Leu Glu Gln Pro Phe His Ser Gly Val Pro Arg Leu Glu Ala Arg Lys 195 200 205
Phe Ile Asp Leu Tyr Glu Ala Asp Ile Glu Cys Arg Asn Glu Thr Leu 210 215 220
Leu Glu Phe Ala Lys Leu Asp Tyr Asn Arg Val Gln Leu Leu His Gln 225 230 235 240
Gln Glu Leu Cys Gln Phe Ser Lys Trp Trp Lys Asp Leu Asn Leu Ala 245 250 255
Ser Asp Ile Pro Tyr Ala Arg Asp Arg Met Ala Glu Ile Phe Phe Trp 260 265 270
Ala Val Ala Met Tyr Phe Glu Pro Asp Tyr Ala His Thr Arg Met Ile 275 280 285
Ile Ala Lys Val Val Leu Leu Ile Ser Leu Ile Asp Asp Thr Ile Asp 290 295 300
Ala Tyr Ala Thr Met Glu Glu Thr His Ile Leu Ala Glu Ala Val Ala 305 310 315 320
Arg Trp Asp Met Ser Cys Leu Glu Lys Leu Pro Asp Tyr Met Lys Val 325 330 335
Ile Tyr Lys Leu Leu Leu Asn Thr Phe Ser Glu Phe Glu Lys Glu Leu 340 345 350
Thr Ala Glu Gly Lys Ser Tyr Ser Val Lys Tyr Gly Arg Glu Ala Phe 355 360 365
Gln Glu Leu Val Arg Gly Tyr Tyr Leu Glu Ala Val Trp Arg Asp Glu 370 375 380
Gly Lys Ile Pro Ser Phe Asp Asp Tyr Leu Tyr Asn Gly Ser Met Thr 385 390 395 400
Thr Gly Leu Pro Leu Val Ser Thr Ala Ser Phe Met Gly Val Gln Glu 405 410 415
Ile Thr Gly Leu Asn Glu Phe Gln Trp Leu Glu Thr Asn Pro Lys Leu 420 425 430
Ser Tyr Ala Ser Gly Ala Phe Ile Arg Leu Val Asn Asp Leu Thr Ser 435 440 445
His Val Thr Glu Gln Gln Arg Gly His Val Ala Ser Cys Ile Asp Cys 450 455 460
Tyr Met Asn Gln His Gly Val Ser Lys Asp Glu Ala Val Lys Ile Leu 465 470 475 480
Gln Lys Met Ala Thr Asp Cys Trp Lys Glu Ile Asn Glu Glu Cys Met 485 490 495
Arg Gln Ser Gln Val Ser Val Gly His Leu Met Arg Ile Val Asn Leu 500 505 510
Ala Arg Leu Thr Asp Val Ser Tyr Lys Tyr Gly Asp Gly Tyr Thr Asp 515 520 525
Ser Gln Gln Leu Lys Gln Phe Val Lys Gly Leu Phe Val Asp Pro Ile 530 535 540
Ser Ile Thr Gly Ile Asn Ser Ala Trp Ser His Pro Gln Phe Glu Lys 545 550 555 560

<210> SEQ ID NO 21
<211> LENGTH: 2793
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Codon optimized sequence

<400> SEQUENCE: 21

atgtcaacaa cacatcaaga agttcgtcca ttagcttatt titccaccaac agitttggggt 60

aatcgttittg caagtttaac atttaatcca toaga atttgaat catacga tigaacgtgtt 120
attgttittaa aaaaaaaagt taaagatatt ttaattt cat caacatcaga titcagttgaa 180
acagttattt taattgattt attatgtcgt ttaggtgttt catat cattt tdaaaatgat 240
attgaagaat tattatcaaa aatttittaat t cacaac cag atttagttga tigaaaaagaa 300
tgttgattitat atacagoagc tattgtttitt cqtgtttittc gtcaacatgg ttittaaaatg 360
t catcagatgtttitttcaaa atttaaagat t cagatggta aatttaaaga at cattacgt 420
ggtgatgcta aagg tatgtt at cattattt gaagcatcac attitat cagt to atggtgaa 480
gatattittag aagaag catt togcttttaca aaagattatt tacaatcatc togctgttgaa 540
ttattt coaa atttaaaacg t catattaca aatgctt tag aacaaccatt to attcaggit 600
gttccacgtt tagaagct cq taaatttatt gatttatatgaagctgatat togaatgtcgt 660
aatgaaac at tattagaatt togctaaatta gattataatc gtgttcaatt attacat caa 720
caagaattat gtcaattittcaaaatggtgg aaagatttaa atttagcttic agatatt coa 780
tacgct cqtg atcg tatggc tigaaatttitt ttittgggctg ttgctatgta ttittgaacca 840
gattatgctic atacacgitat gattattgct aaagttgttc ttittaatttic tittaattgat 900
gatacaattig atgct tatgc tacaatggaa gaalacacata ttittagotga agctgttgct 960
cgttgggata tdt catgttt agaaaaatta ccagattata togaaagttat ttataaatta 1020
ttattaaata catttt caga atttgaaaaa gaattaactg ctgaagg taa atcatattoa 1080
gttaaatatg gtcgtgaagc atttcaagaa ttagttcgtg gtt attattt agaagctgtt 1140
tggcgtgatgaaggtaaaat tcc at cattt gatgattatt tatataatgg ttcaatgaca 1200
acaggtttac cattagtttcaac agctt catttatgggtg ttcaagaaat tacaggttta 1260
aatgaattitc aatggittaga aacaaatcca aaattat cat acgcttcagg togcttittatt 1320
cgtttagtta atgatttaac at cacatgtt acagaacaac aacgtggtca tdttgct tca 1380
tg tattgatt gttatatgaa totalacatggt gtttcaaaag atgaagctgt taaaattitta 1440
caaaaaatgg ctactgattgttggaaagaa attaacgaag aatgtatgcg tcaat cacala 1500
gttt cagttg gtcatttaat gcg tattgtt aatttagctic gtttalacaga tigttt catat 1560
aaatatggtg atggittatac agatt cacaa caattaaaac aatttgttaa aggtttattt 1620
gttgat coaa tittcaattac acaattagaa tdgatgcgtc aaggtttacc at cattagaa 1680
t catgtccag ttittagct cq t t caccagaa attgatt cag atgaatcagc agittt cacca 1740
acagcagatgaat cagattic aacagaagat t cattaggitt caggttcacg tdaagattica 1800
t cattatcaa caggtttatc attat cacca gttcattcaa atgaaggtaa agatttacaa 1860
cgtgttgata cagat catat ttitttittgaa aaagctgttt tagaagctic catacgattat 1920
attgcttcaa togc catcaaa aggtgttcgt gat caattta ttgatgctitt aaatgattgg 1980
ttacgtgttc cagatgttaa agttggtaaa attaaagatg citgttcgtgt tttacataat 2040
t cat catt at tattagatga ttittcaagat aattcac cat tacgtcgtgg taaac catca 2100
acacataata tttittggttc agctcaaaca gttaatacag ctacatatt c aattattaaa 2160
gctattggtc aaattatgga attittctgct ggtgaat cag ttcaagaagt tatgaactica 2220
attatgattt tatttcaagg toaagctato gatttattitt gga catataa togg to atgtt 2280
c catcagaag aagaat atta t cqtatgatt gatcaaaaaa caggit caatt attittcaatt 2340

gctacatcat tatt attaaa togctgctgat aatgaaattic cacgtacaaa aattcaatca 2400
tgtttacatc gtttaacacg titt attaggit cqttgttittcaaatt cqtga tigattat caa 2460
aatttagttt cagcagatta tacaaaacaa aaaggtttitt gtgaagattt agatgaaggit 2520
aaatggtcat tagctittaat t cacatgatt cataaacaac gttcacacat ggctitt atta 2580
aatgttittat caacaggit cq taalacatggt gig tatgacat tagaacaaaa acaatttgtt 2640
ttagat atta ttgaagaaga aaaat catta gattatacac gttcagttat gatggattta 2700
catgttcaat tacgtgctga aattggtcgt attgaaattt tattagattic accaaatcca 2760
gctatocgitt tatt attaga attattacgt gtt 2793

<210> SEQ ID NO 22
<211> LENGTH: 931
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Fusion protein

<400> SEQUENCE: 22

Met Ser Thr Thr His Gln Glu Val Arg Pro Leu Ala Tyr Phe Pro Pro 1 5 10 15
Thr Val Trp Gly Asn Arg Phe Ala Ser Leu Thr Phe Asn Pro Ser Glu 20 25 30
Phe Glu Ser Tyr Asp Glu Arg Val Ile Val Leu Lys Lys Lys Val Lys 35 40 45
Asp Ile Leu Ile Ser Ser Thr Ser Asp Ser Val Glu Thr Val Ile Leu 50 55 60
Ile Asp Leu Leu Cys Arg Leu Gly Val Ser Tyr His Phe Glu Asn Asp 65 70 75 80
Ile Glu Glu Leu Leu Ser Lys Ile Phe Asn Ser Gln Pro Asp Leu Val 85 90 95
Asp Glu Lys Glu Cys Asp Leu Tyr Thr Ala Ala Ile Val Phe Arg Val 100 105 110
Phe Arg Gln His Gly Phe Lys Met Ser Ser Asp Val Phe Ser Lys Phe 115 120 125
Lys Asp Ser Asp Gly Lys Phe Lys Glu Ser Leu Arg Gly Asp Ala Lys 130 135 140
Gly Met Leu Ser Leu Phe Glu Ala Ser His Leu Ser Val His Gly Glu 145 150 155 160
Asp Ile Leu Glu Glu Ala Phe Ala Phe Thr Lys Asp Tyr Leu Gln Ser 165 170 175
Ser Ala Val Glu Leu Phe Pro Asn Leu Lys Arg His Ile Thr Asn Ala 180 185 190
Leu Glu Gln Pro Phe His Ser Gly Val Pro Arg Leu Glu Ala Arg Lys 195 200 205
Phe Ile Asp Leu Tyr Glu Ala Asp Ile Glu Cys Arg Asn Glu Thr Leu 210 215 220
Leu Glu Phe Ala Lys Leu Asp Tyr Asn Arg Val Gln Leu Leu His Gln 225 230 235 240
Gln Glu Leu Cys Gln Phe Ser Lys Trp Trp Lys Asp Leu Asn Leu Ala 245 250 255
Ser Asp Ile Pro Tyr Ala Arg Asp Arg Met Ala Glu Ile Phe Phe Trp 260 265 270
Ala Val Ala Met Tyr Phe Glu Pro Asp Tyr Ala His Thr Arg Met Ile 275 280 285
Ile Ala Lys Val Val Leu Leu Ile Ser Leu Ile Asp Asp Thr Ile Asp 290 295 300
Ala Tyr Ala Thr Met Glu Glu Thr His Ile Leu Ala Glu Ala Val Ala 305 310 315 320
Arg Trp Asp Met Ser Cys Leu Glu Lys Leu Pro Asp Tyr Met Lys Val 325 330 335
Ile Tyr Lys Leu Leu Leu Asn Thr Phe Ser Glu Phe Glu Lys Glu Leu 340 345 350
Thr Ala Glu Gly Lys Ser Tyr Ser Val Lys Tyr Gly Arg Glu Ala Phe 355 360 365
Gln Glu Leu Val Arg Gly Tyr Tyr Leu Glu Ala Val Trp Arg Asp Glu 370 375 380
Gly Lys Ile Pro Ser Phe Asp Asp Tyr Leu Tyr Asn Gly Ser Met Thr 385 390 395 400
Thr Gly Leu Pro Leu Val Ser Thr Ala Ser Phe Met Gly Val Gln Glu 405 410 415
Ile Thr Gly Leu Asn Glu Phe Gln Trp Leu Glu Thr Asn Pro Lys Leu 420 425 430
Ser Tyr Ala Ser Gly Ala Phe Ile Arg Leu Val Asn Asp Leu Thr Ser 435 440 445
His Val Thr Glu Gln Gln Arg Gly His Val Ala Ser Cys Ile Asp Cys 450 455 460
Tyr Met Asn Gln His Gly Val Ser Lys Asp Glu Ala Val Lys Ile Leu 465 470 475 480
Gln Lys Met Ala Thr Asp Cys Trp Lys Glu Ile Asn Glu Glu Cys Met 485 490 495
Arg Gln Ser Gln Val Ser Val Gly His Leu Met Arg Ile Val Asn Leu 500 505 510
Ala Arg Leu Thr Asp Val Ser Tyr Lys Tyr Gly Asp Gly Tyr Thr Asp 515 520 525
Ser Gln Gln Leu Lys Gln Phe Val Lys Gly Leu Phe Val Asp Pro Ile 530 535 540
Ser Ile Thr Gln Leu Glu Trp Met Arg Gln Gly Leu Pro Ser Leu Glu 545 550 555 560
Ser Cys Pro Val Leu Ala Arg Ser Pro Glu Ile Asp Ser Asp Glu Ser 565 570 575
Ala Val Ser Pro Thr Ala Asp Glu Ser Asp Ser Thr Glu Asp Ser Leu 580 585 590
Gly Ser Gly Ser Arg Gln Asp Ser Ser Leu Ser Thr Gly Leu Ser Leu 595 600 605
Ser Pro Val His Ser Asn Glu Gly Lys Asp Leu Gln Arg Val Asp Thr 610 615 620
Asp His Ile Phe Phe Glu Lys Ala Val Leu Glu Ala Pro Tyr Asp Tyr 625 630 635 640
Ile Ala Ser Met Pro Ser Lys Gly Val Arg Asp Gln Phe Ile Asp Ala 645 650 655
Leu Asn Asp Trp Leu Arg Val Pro Asp Val Lys Val Gly Lys Ile Lys 660 665 670
Asp Ala Val Arg Val Leu His Asn Ser Ser Leu Leu Leu Asp Asp Phe 675 680 685
Gln Asp Asn Ser Pro Leu Arg Arg Gly Lys Pro Ser Thr His Asn Ile 690 695 700
Phe Gly Ser Ala Gln Thr Val Asn Thr Ala Thr Tyr Ser Ile Ile Lys 705 710 715 720
Ala Ile Gly Gln Ile Met Glu Phe Ser Ala Gly Glu Ser Val Gln Glu 725 730 735
Val Met Asn Ser Ile Met Ile Leu Phe Gln Gly Gln Ala Met Asp Leu 740 745 750
Phe Trp Thr Tyr Asn Gly His Val Pro Ser Glu Glu Glu Tyr Tyr Arg 755 760 765
Met Ile Asp Gln Lys Thr Gly Gln Leu Phe Ser Ile Ala Thr Ser Leu 770 775 780
Leu Leu Asn Ala Ala Asp Asn Glu Ile Pro Arg Thr Lys Ile Gln Ser 785 790 795 800
Cys Leu His Arg Leu Thr Arg Leu Leu Gly Arg Cys Phe Gln Ile Arg 805 810 815
Asp Asp Tyr Gln Asn Leu Val Ser Ala Asp Tyr Thr Lys Gln Lys Gly 820 825 830
Phe Cys Glu Asp Leu Asp Glu Gly Lys Trp Ser Leu Ala Leu Ile His 835 840 845
Met Ile His Lys Gln Arg Ser His Met Ala Leu Leu Asn Val Leu Ser 850 855 860
Thr Gly Arg Lys His Gly Gly Met Thr Leu Glu Gln Lys Gln Phe Val 865 870 875 880
Leu Asp Ile Ile Glu Glu Glu Lys Ser Leu Asp Tyr Thr Arg Ser Val 885 890 895
Met Met Asp Leu His Val Gln Leu Arg Ala Glu Ile Gly Arg Ile Glu 900 905 910
Ile Leu Leu Asp Ser Pro Asn Pro Ala Met Arg Leu Leu Leu Glu Leu 915 920 925
Leu Arg Val 930

<210> SEQ ID NO 23
<211> LENGTH: 191
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: CLIP-8xhis tag

<400> SEQUENCE: 23

Thr Gly Asp Lys Asp Cys Glu Met Lys Arg Thr Thr Leu Asp Ser Pro 1 5 10 15
Leu Gly Lys Leu Glu Leu Ser Gly Cys Glu Gln Gly Leu His Glu Ile 20 25 30
Ile Phe Leu Gly Lys Gly Thr Ser Ala Ala Asp Ala Val Glu Val Pro 35 40 45
Ala Pro Ala Ala Val Leu Gly Gly Pro Glu Pro Leu Ile Gln Ala Thr 50 55 60
Ala Trp Leu Asn Ala Tyr Phe His Gln Pro Glu Ala Ile Glu Glu Phe 65 70 75 80
Pro Val Pro Ala Leu His His Pro Val Phe Gln Gln Glu Ser Phe Thr 85 90 95
Arg Gln Val Leu Trp Lys Leu Leu Lys Val Val Lys Phe Gly Glu Val 100 105 110
Ile Ser Glu Ser His Leu Ala Ala Leu Val Gly Asn Pro Ala Ala Thr 115 120 125
Ala Ala Val Asn Thr Ala Leu Asp Gly Asn Pro Val Pro Ile Leu Ile 130 135 140
Pro Cys His Arg Val Val Gln Gly Asp Ser Asp Val Gly Pro Tyr Leu 145 150 155 160
Gly Gly Leu Ala Val Lys Glu Trp Leu Leu Ala His Glu Gly His Arg 165 170 175
Leu Gly Lys Pro Gly Leu Gly His His His His His His His His 180 185 190

<210> SEQ ID NO 24
<211> LENGTH: 3369
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Fusion protein
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (2794)..(3369)
<223> OTHER INFORMATION: CLIP-8xhis tag

<400> SEQUENCE: 24

atgtcaacaa cacatcaaga agttcgtcca ttagcttatt titccaccaac agitttggggt 60
aatcgttittg caagtttaac atttaatcca toaga atttgaat catacga tigaacgtgtt 120
attgttittaa aaaaaaaagt taaagatatt ttaattt cat caacatcaga titcagttgaa 180
acagttattt taattgattt attatgtcgt ttaggtgttt catat cattt tdaaaatgat 240
attgaagaat tattatcaaa aatttittaat t cacaac cag atttagttga tigaaaaagaa 300
tgttgattitat atacagoagc tattgtttitt cqtgtttittc gtcaacatgg ttittaaaatg 360
t catcagatgtttitttcaaa atttaaagat t cagatggta aatttaaaga at cattacgt 420
ggtgatgcta aagg tatgtt at cattattt gaagcatcac attitat cagt to atggtgaa 480
gatattittag aagaag catt togcttttaca aaagattatt tacaatcatc togctgttgaa 540
ttattt coaa atttaaaacg t catattaca aatgctt tag aacaaccatt to attcaggit 600
gttccacgtt tagaagct cq taaatttatt gatttatatgaagctgatat togaatgtcgt 660
aatgaaac at tattagaatt togctaaatta gattataatc gtgttcaatt attacat caa 720
caagaattat gtcaattittcaaaatggtgg aaagatttaa atttagcttic agatatt coa 780
tacgct cqtg atcg tatggc tigaaatttitt ttittgggctg ttgctatgta ttittgaacca 840
gattatgctic atacacgitat gattattgct aaagttgttc ttittaatttic tittaattgat 900
gatacaattig atgct tatgc tacaatggaa gaalacacata ttittagotga agctgttgct 960
cgttgggata tdt catgttt agaaaaatta ccagattata togaaagttat ttataaatta 1020
ttattaaata cattitt caga atttgaaaaa gaattaactg citgaaggtaa at catatto a 1080
gttaaatatg gtcgtgaagc atttcaagaa ttagttcgtg gtt attattt agaagctgtt 1140
tggcgtgatgaaggtaaaat tcc at cattt gatgattatt tatataatgg ttcaatgaca 1200
acaggtttac cattagtttcaac agctt catttatgggtg ttcaagaaat tacaggttta 1260

- Continued aatgaattitc aatggittaga aacaaatcca aaattat cat acgcttcagg togcttittatt 32O cgtttagtta atgatttaac at cacatgtt acagaacaac aacgtggtca tdttgct tca 38O tg tattgatt gttatatgaa totalacatggt gtttcaaaag atgaagctgt taaaattitta 44 O caaaaaatgg Ctactgattgttggaaagaa attaacgaag aatgtatgcg tcaat cacala SOO gttt cagttg gtcatttaat gcg tattgtt aatttagctic gtttalacaga tigttt catat 560 aaatatggtg atggittatac agatt cacaa caattaaaac aatttgttaa aggtttattt 62O gttgat coaa tittcaattac acaattagaa tdgatgcgt.c aaggtttacc at cattagaa 68O t catgtc.cag ttittagct cq t t caccagaa attgatt cag atgaatcagc agittt cacca 74 O acagcagatgaat cagattic aacagaagat t cattaggitt caggttcacg tdaagattica 8OO t cattatcaa caggtttatc attat cacca gttcattcaa atgaaggtaa agatttacaa 86 O cgtgttgata cagat catat ttitttittgaa aaa.gctgttt tagaa.gctic catacgattat 92 O attgcttcaa togc catcaaa aggtgttcgt gat caattta ttgatgctitt aaatgattgg 98 O ttacgtgttc cagatgttaa agttggtaaa attaaagatg citgttcgtgt tttacataat 2O4. O t cat catt at tattagatga ttittcaagat aattcac cat tacgt.cgtgg taaac catca 21OO acacataata tttittggttc agctcaaaca gttaatacag ctacatatt c aattattaaa 216 O gctattggtc. 
aaattatgga attittctgct ggtgaat cag ttcaagaagt tatgaactica 222 O attatgattt tatttcaagg toaa.gctato gatttattitt gga catataa togg to atgtt 228O c catcagaag aagaat atta t cqtatgatt gatcaaaaaa caggit caatt attittcaatt 234 O gctacatcat tatt attaaa togctgctgat aatgaaattic cacgtacaaa aattcaatca 24 OO tgtttacatc gtttaa.cacg titt attaggit cqttgttittcaaatt cqtga tigattat caa 246 O aatttagttt cagcagatta tacaaaacaa aaaggtttitt gtgaagattt agatgaaggit 252O aaatggtcat tagctittaat t cacatgatt cataaacaac gttcacacat ggctitt atta 2580 aatgttittat caa.caggit cq taalacatggt gig tatgacat tagaacaaaa acaatttgtt 264 O ttagat atta ttgaagaaga aaaat catta gattatacac gttcagttat gatggattta 27 OO catgttcaat tacgtgctga aattggtcgt attgaaattt tattagattic accaaatcca 276 O gctato.cgitt tatt attaga attattacgt gttaccggtg ataaagattg togaaatgaaa 282O cgtacaac at tagatt cacc attaggtaaa ttagaattat caggttgttga acaaggttta 288O catgaaatta ttttitt tagg taaaggtaca totgctgcag atgctgttga agttc.ca.gct 294 O cctgctgcag titt taggtgg to cagaacct ttaattcaag ctacagottg gttaaatgct 3 OOO tattitt catc aaccagaa.gc tattgaagaa titt coagttc cagctttaca totatic cagtt 3 O 6 O tittcaacaag aat catttac acgtcaagta t tatggaaat tattaaaagt tdttaaattit 312 O ggtgaagtta titt cagaatc acatttagct gctittagttg gtaatccago agctacagoa 318O gcagttaata cagctittaga tigg taatcca gttccaattt taatticcatgtcatcgtgtt 324 O gttcaaggtg attcagatgt titc cat at ttaggtggitt tagctgttaa agaatggitta 33 OO ttagct catgaaggtoat cq tttaggtaaa ccaggitt tag gtcat cacca toat cac cat 3360

CaCCaCtaa 3369
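The following minimal Python sketch, offered only as an illustration, translates nucleotides 3301 to 3369 of SEQ ID NO 24 with the standard genetic code. It assumes the reading frame begins at nucleotide 1; the names `CODON_TABLE` and `translate` are hypothetical helpers, not part of the listing. The result should match the C-terminus of SEQ ID NO 25: the last residues of the CLIP-8xhis tag, eight histidines, then a stop codon.

```python
# Standard genetic code, built from the conventional TCAG ordering
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string (whitespace allowed) to one-letter codes.

    Hypothetical helper for illustration; assumes the frame starts at base 1.
    """
    seq = "".join(dna.split()).lower()
    if len(seq) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Nucleotides 3301-3360 and 3361-3369 of SEQ ID NO 24.
tail = ("ttagctcatg aaggtcatcg tttaggtaaa ccaggtttag gtcatcacca tcatcaccat"
        " caccactaa")
print(translate(tail))  # -> LAHEGHRLGKPGLGHHHHHHHH*
```

The one-letter output corresponds to residues 1101 to 1122 of SEQ ID NO 25 (Leu Ala His Glu Gly His Arg Leu Gly Lys Pro Gly Leu Gly His×8), followed by the TAA stop codon.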

<210> SEQ ID NO 25
<211> LENGTH: 1122
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Fusion protein
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (932)..(1122)
<223> OTHER INFORMATION: CLIP-8xhis tag
<400> SEQUENCE: 25
Met Ser Thr Thr His Gln Glu Val Arg Pro Leu Ala Tyr Phe Pro Pro 1 5 10 15
Thr Val Trp Gly Asn Arg Phe Ala Ser Leu Thr Phe Asn Pro Ser Glu 20 25 30
Phe Glu Ser Tyr Asp Glu Arg Val Ile Val Leu Lys Lys Lys Val Lys 35 40 45
Asp Ile Leu Ile Ser Ser Thr Ser Asp Ser Val Glu Thr Val Ile Leu 50 55 60
Ile Asp Leu Leu Cys Arg Leu Gly Val Ser Tyr His Phe Glu Asn Asp 65 70 75 80
Ile Glu Glu Leu Leu Ser Lys Ile Phe Asn Ser Gln Pro Asp Leu Val 85 90 95
Asp Glu Lys Glu Cys Asp Leu Tyr Thr Ala Ala Ile Val Phe Arg Val 100 105 110
Phe Arg Gln His Gly Phe Lys Met Ser Ser Asp Val Phe Ser Lys Phe 115 120 125
Lys Asp Ser Asp Gly Lys Phe Lys Glu Ser Leu Arg Gly Asp Ala Lys 130 135 140
Gly Met Leu Ser Leu Phe Glu Ala Ser His Leu Ser Val His Gly Glu 145 150 155 160
Asp Ile Leu Glu Glu Ala Phe Ala Phe Thr Lys Asp Tyr Leu Gln Ser 165 170 175
Ser Ala Val Glu Leu Phe Pro Asn Leu Lys Arg His Ile Thr Asn Ala 180 185 190
Leu Glu Gln Pro Phe His Ser Gly Val Pro Arg Leu Glu Ala Arg Lys 195 200 205
Phe Ile Asp Leu Tyr Glu Ala Asp Ile Glu Cys Arg Asn Glu Thr Leu 210 215 220
Leu Glu Phe Ala Lys Leu Asp Tyr Asn Arg Val Gln Leu Leu His Gln 225 230 235 240
Gln Glu Leu Cys Gln Phe Ser Lys Trp Trp Lys Asp Leu Asn Leu Ala 245 250 255
Ser Asp Ile Pro Tyr Ala Arg Asp Arg Met Ala Glu Ile Phe Phe Trp 260 265 270
Ala Val Ala Met Tyr Phe Glu Pro Asp Tyr Ala His Thr Arg Met Ile 275 280 285
Ile Ala Lys Val Val Leu Leu Ile Ser Leu Ile Asp Asp Thr Ile Asp 290 295 300
Ala Tyr Ala Thr Met Glu Glu Thr His Ile Leu Ala Glu Ala Val Ala 305 310 315 320
Arg Trp Asp Met Ser Cys Leu Glu Lys Leu Pro Asp Tyr Met Lys Val 325 330 335
Ile Tyr Lys Leu Leu Leu Asn Thr Phe Ser Glu Phe Glu Lys Glu Leu 340 345 350
Thr Ala Glu Gly Lys Ser Tyr Ser Val Lys Tyr Gly Arg Glu Ala Phe 355 360 365
Gln Glu Leu Val Arg Gly Tyr Tyr Leu Glu Ala Val Trp Arg Asp Glu 370 375 380
Gly Lys Ile Pro Ser Phe Asp Asp Tyr Leu Tyr Asn Gly Ser Met Thr 385 390 395 400
Thr Gly Leu Pro Leu Val Ser Thr Ala Ser Phe Met Gly Val Gln Glu 405 410 415
Ile Thr Gly Leu Asn Glu Phe Gln Trp Leu Glu Thr Asn Pro Lys Leu 420 425 430
Ser Tyr Ala Ser Gly Ala Phe Ile Arg Leu Val Asn Asp Leu Thr Ser 435 440 445
His Val Thr Glu Gln Gln Arg Gly His Val Ala Ser Cys Ile Asp Cys 450 455 460
Tyr Met Asn Gln His Gly Val Ser Lys Asp Glu Ala Val Lys Ile Leu 465 470 475 480
Gln Lys Met Ala Thr Asp Cys Trp Lys Glu Ile Asn Glu Glu Cys Met 485 490 495
Arg Gln Ser Gln Val Ser Val Gly His Leu Met Arg Ile Val Asn Leu 500 505 510
Ala Arg Leu Thr Asp Val Ser Tyr Lys Tyr Gly Asp Gly Tyr Thr Asp 515 520 525
Ser Gln Gln Leu Lys Gln Phe Val Lys Gly Leu Phe Val Asp Pro Ile 530 535 540
Ser Ile Thr Gln Leu Glu Trp Met Arg Gln Gly Leu Pro Ser Leu Glu 545 550 555 560
Ser Cys Pro Val Leu Ala Arg Ser Pro Glu Ile Asp Ser Asp Glu Ser 565 570 575
Ala Val Ser Pro Thr Ala Asp Glu Ser Asp Ser Thr Glu Asp Ser Leu 580 585 590
Gly Ser Gly Ser Arg Gln Asp Ser Ser Leu Ser Thr Gly Leu Ser Leu 595 600 605
Ser Pro Val His Ser Asn Glu Gly Lys Asp Leu Gln Arg Val Asp Thr 610 615 620
Asp His Ile Phe Phe Glu Lys Ala Val Leu Glu Ala Pro Tyr Asp Tyr 625 630 635 640
Ile Ala Ser Met Pro Ser Lys Gly Val Arg Asp Gln Phe Ile Asp Ala 645 650 655
Leu Asn Asp Trp Leu Arg Val Pro Asp Val Lys Val Gly Lys Ile Lys 660 665 670
Asp Ala Val Arg Val Leu His Asn Ser Ser Leu Leu Leu Asp Asp Phe 675 680 685
Gln Asp Asn Ser Pro Leu Arg Arg Gly Lys Pro Ser Thr His Asn Ile 690 695 700
Phe Gly Ser Ala Gln Thr Val Asn Thr Ala Thr Tyr Ser Ile Ile Lys 705 710 715 720
Ala Ile Gly Gln Ile Met Glu Phe Ser Ala Gly Glu Ser Val Gln Glu 725 730 735
Val Met Asn Ser Ile Met Ile Leu Phe Gln Gly Gln Ala Met Asp Leu 740 745 750
Phe Trp Thr Tyr Asn Gly His Val Pro Ser Glu Glu Glu Tyr Tyr Arg 755 760 765
Met Ile Asp Gln Lys Thr Gly Gln Leu Phe Ser Ile Ala Thr Ser Leu 770 775 780
Leu Leu Asn Ala Ala Asp Asn Glu Ile Pro Arg Thr Lys Ile Gln Ser 785 790 795 800
Cys Leu His Arg Leu Thr Arg Leu Leu Gly Arg Cys Phe Gln Ile Arg 805 810 815
Asp Asp Tyr Gln Asn Leu Val Ser Ala Asp Tyr Thr Lys Gln Lys Gly 820 825 830
Phe Cys Glu Asp Leu Asp Glu Gly Lys Trp Ser Leu Ala Leu Ile His 835 840 845
Met Ile His Lys Gln Arg Ser His Met Ala Leu Leu Asn Val Leu Ser 850 855 860
Thr Gly Arg Lys His Gly Gly Met Thr Leu Glu Gln Lys Gln Phe Val 865 870 875 880
Leu Asp Ile Ile Glu Glu Glu Lys Ser Leu Asp Tyr Thr Arg Ser Val 885 890 895
Met Met Asp Leu His Val Gln Leu Arg Ala Glu Ile Gly Arg Ile Glu 900 905 910
Ile Leu Leu Asp Ser Pro Asn Pro Ala Met Arg Leu Leu Leu Glu Leu 915 920 925
Leu Arg Val Thr Gly Asp Lys Asp Cys Glu Met Lys Arg Thr Thr Leu 930 935 940
Asp Ser Pro Leu Gly Lys Leu Glu Leu Ser Gly Cys Glu Gln Gly Leu 945 950 955 960
His Glu Ile Ile Phe Leu Gly Lys Gly Thr Ser Ala Ala Asp Ala Val 965 970 975
Glu Val Pro Ala Pro Ala Ala Val Leu Gly Gly Pro Glu Pro Leu Ile 980 985 990
Gln Ala Thr Ala Trp Leu Asn Ala Tyr Phe His Gln Pro Glu Ala Ile 995 1000 1005
Glu Glu Phe Pro Val Pro Ala Leu His His Pro Val Phe Gln Gln 1010 1015 1020
Glu Ser Phe Thr Arg Gln Val Leu Trp Lys Leu Leu Lys Val Val 1025 1030 1035
Lys Phe Gly Glu Val Ile Ser Glu Ser His Leu Ala Ala Leu Val 1040 1045 1050
Gly Asn Pro Ala Ala Thr Ala Ala Val Asn Thr Ala Leu Asp Gly 1055 1060 1065
Asn Pro Val Pro Ile Leu Ile Pro Cys His Arg Val Val Gln Gly 1070 1075 1080
Asp Ser Asp Val Gly Pro Tyr Leu Gly Gly Leu Ala Val Lys Glu 1085 1090 1095
Trp Leu Leu Ala His Glu Gly His Arg Leu Gly Lys Pro Gly Leu 1100 1105 1110
Gly His His His His His His His His 1115 1120
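The feature locations of SEQ ID NO 24 and SEQ ID NO 25 are related by codon arithmetic: if the reading frame starts at nucleotide 1, nucleotide n lies in codon floor((n - 1) / 3) + 1. The minimal Python sketch below checks this for the CLIP-8xhis feature; the function name `nt_to_codon` is a hypothetical helper introduced only for illustration. It shows that nucleotide 2794 is the first base of codon 932, and that the DNA feature ends one codon past residue 1122 because it includes the stop codon.

```python
def nt_to_codon(nt_pos: int) -> int:
    """Map a 1-based nucleotide position to its 1-based codon number.

    Hypothetical helper; assumes an open reading frame starting at
    nucleotide 1, as in SEQ ID NO 24.
    """
    return (nt_pos - 1) // 3 + 1


DNA_LENGTH = 3369        # <211> of SEQ ID NO 24
PROTEIN_LENGTH = 1122    # <211> of SEQ ID NO 25
TAG_NT = (2794, 3369)    # <222> of the CLIP-8xhis feature in SEQ ID NO 24
TAG_AA = (932, 1122)     # <222> of the CLIP-8xhis feature in SEQ ID NO 25

# The coding sequence is the protein plus one stop codon.
assert DNA_LENGTH == 3 * (PROTEIN_LENGTH + 1)

# The DNA feature starts in frame, on the first base of the first tag codon.
assert (TAG_NT[0] - 1) % 3 == 0
assert nt_to_codon(TAG_NT[0]) == TAG_AA[0]

# Its last base falls in codon 1123, the stop codon after residue 1122.
assert nt_to_codon(TAG_NT[1]) == TAG_AA[1] + 1

print(nt_to_codon(TAG_NT[0]), nt_to_codon(TAG_NT[1]))  # -> 932 1123
```

Equivalently, the 576-nucleotide DNA feature encodes the 191-residue protein feature plus one stop codon (3 × 192 = 576).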

<210> SEQ ID NO 26
<211> LENGTH: 2607
<212> TYPE: DNA