US009359581B2

(12) United States Patent
Ness et al.
(10) Patent No.: US 9,359,581 B2
(45) Date of Patent: *Jun. 7, 2016

(54) BIOTRANSFORMATION USING GENETICALLY MODIFIED CANDIDA

(71) Applicant: Synthezyme, LLC, Rensselaer, NY (US)

(72) Inventors: Jon E. Ness, Redwood City, CA (US); Jeremy Minshull, Palo Alto, CA (US)

(73) Assignee: Synthezyme LLC, Rensselaer, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 14/084,230

(22) Filed: Nov. 19, 2013

(65) Prior Publication Data
US 2015/0094483 A1, Apr. 2, 2015

Related U.S. Application Data

(60) Division of application No. 12/775,306, filed on May 6, 2010, now Pat. No. 8,597,923, which is a continuation-in-part of application No. 12/436,729, filed on May 6, 2009.

(60) Provisional application No. 61/176,064, filed on May 6, 2009.

(51) Int. Cl.
C12P 7/64 (2006.01)
C11C 3/00 (2006.01)
C12N 15/81 (2006.01)
C12N 9/04 (2006.01)

(52) U.S. Cl.
CPC: C11C 3/006 (2013.01); C12N 9/0006 (2013.01); C12N 15/815 (2013.01); C12P 7/6409 (2013.01); C12P 7/649 (2013.01); C12P 7/6427 (2013.01); C12P 7/6436 (2013.01); Y02E 50/13 (2013.01); Y02P 20/52 (2015.11)

(58) Field of Classification Search
None. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
8,158,391 B2 * 4/2012 Gross et al. ...... 435/134

OTHER PUBLICATIONS
Chica et al. Curr Opin Biotechnol. Aug. 2005; 16(4):378-84.*
Sen et al. Appl Biochem Biotechnol. Dec. 2007; 143(3):212-23.*

* cited by examiner

Primary Examiner: Christian Fronda

(74) Attorney, Agent, or Firm: Laurence P. Colton; Smith Tempel Blaha LLC

(57) ABSTRACT
A substantially pure Candida host cell is provided for the biotransformation of a substrate to a product wherein the host cell is characterized by a first genetic modification class that comprises one or more genetic modifications that collectively or individually disrupt at least one alcohol dehydrogenase gene in the substantially pure Candida host cell.

33 Claims, 28 Drawing Sheets

U.S. Patent    Jun. 7, 2016    Sheet 1 of 28    US 9,359,581 B2

[Figure 1 (Sheet 1 of 28): pathway diagram of fatty acid oxidation. ω-Oxidation branch: fatty acid, ω-HO-fatty acid (CYP52A type P450), OCH-fatty acid (fatty alcohol oxidase; CYP52A type P450), HOOC-fatty acid (dehydrogenase; CYP52A type P450). β-Oxidation branch: fatty acid, α,β-unsaturated fatty-acyl CoA (acyl CoA oxidase), β-hydroxy fatty-acyl CoA (hydratase), β-oxo fatty-acyl CoA (dehydrogenase), fatty-acyl CoA + acetyl CoA (thiolase).]

[Figure 2 (Sheet 2 of 28): pathway diagram with the same ω-oxidation and β-oxidation steps as Figure 1; the first β-oxidation step is marked as blocked (X) and labeled Δpox4 Δpox5 (strain DP1).]

[Figure 3A (Sheet 3 of 28): amino acid sequence alignment of alcohol dehydrogenases. Rows: C. albicans ADH 1A, C. albicans ADH 1B, C. tropicalis ADH A4, C. albicans ADH 2A, C. albicans ADH 2B, C. tropicalis ADH B4, C. tropicalis ADH A10, C. tropicalis ADH B11. The residue-level alignment is not recoverable from the extracted text.]

[Figure 3B (Sheet 4 of 28): continuation of the alignment in Figure 3A. Sequence identifiers given at the end of the rows: C. albicans ADH 1A, SEQ ID NO: 172; C. albicans ADH 1B, SEQ ID NO: 173; C. tropicalis ADH A4, SEQ ID NO: 155; C. albicans ADH 2A, SEQ ID NO: 174; C. albicans ADH 2B, SEQ ID NO: 175; C. tropicalis ADH B4, SEQ ID NO: 154; C. tropicalis ADH A10, SEQ ID NO: 152; C. tropicalis ADH B11, SEQ ID NO: 151.]

[Figure 4 (Sheet 5 of 28): schematic of a targeting construct. Labels, in order: targeting sequence; recombinase site; recombinase encoding gene; selective marker; recombinase site; targeting sequence. At each end: restriction site to release targeting construct from sequences required for bacterial propagation.]

[Figure 5 (Sheet 6 of 28): schematic of integration by homologous recombination. Labels: targeting construct (targeting sequence 1; recombinase site; recombinase encoding gene; selective marker; recombinase site; targeting sequence 2); genomic DNA (target sequence 1; DNA region to be replaced; target sequence 2); homologous recombination; integrant; excised DNA.]

[Figure 6 (Sheet 7 of 28): schematic of marker excision. Labels: genomic DNA; recombinase site (two copies); recombinase encoding gene; selective marker; induction of recombinase; recombinase.]

[Figure 7 (Sheet 8 of 28): schematic of a targeting construct carrying insertion sequences. Labels: insertion sequences; targeting sequence (two copies); recombinase site (two copies); recombinase encoding gene; selective marker. At each end: restriction site to release targeting construct from sequences required for bacterial propagation.]

[Figure 8 (Sheet 9 of 28): schematic of integration of a construct carrying insertion sequences. Labels: targeting construct (insertion sequences; targeting sequence 1; recombinase site; recombinase encoding gene; selective marker; recombinase site; targeting sequence 2); genomic DNA (target sequence 1; DNA region to be replaced; target sequence 2); homologous recombination; integrant; excised DNA.]

[Figure 9 (Sheet 10 of 28): schematic of marker excision leaving insertion sequences in the genome. Labels: genomic DNA; insertion sequences; recombinase site (two copies); recombinase encoding gene; selective marker; induction of recombinase; recombinase.]

[Figure 10 (Sheet 11 of 28): schematic with gel image. Labels: original genomic sequence; target sequence 1; target sequence 2; recombinase site (two copies); recombinase encoding gene; selective marker; recombinase. Gel lanes labeled A, B and C; size markers 2 kb, 1 kb and 500 bp.]

[Figure 11 (Sheet 12 of 28): schematic of a construct. Labels, in order: targeting sequence; selective marker; targeting sequence.]

[Figure 12 (Sheet 13 of 28): pathway diagram with the same ω-oxidation and β-oxidation steps as Figure 1, with blocked steps marked (X), including Δpox4 Δpox5 (strain DP1).]

[Figure 13 (Sheet 14 of 28): multi-panel chart; Part B, Part C and Part D are labeled in the extracted text. Legend entries: myristic acid; pH controlled. X-axes: Transformation time (h); Strains (DP1, DP174). Data values are not recoverable.]

[Figure 14 (Sheet 15 of 28): pathway diagram with the same ω-oxidation and β-oxidation steps as Figure 1, with blocked steps marked (X) at Δpox4 Δpox5 (strain DP1), at the fatty alcohol oxidase step and at the dehydrogenase step.]

[Figure 15 (Sheet 16 of 28): two-panel chart. Panel A: ω-hydroxy laurate substrate. Panel B: ω-hydroxy palmitate substrate. X-axis: Strains (DP186, DP258, DP259). Data values are not recoverable.]

[Figure 16 (Sheet 17 of 28): two-panel chart. Panel A: ω-hydroxy laurate substrate. Panel B: ω-hydroxy palmitate substrate. X-axis: Strains (DP186, DP283, DP284, Control). Data values are not recoverable.]

[Figure 17 (Sheet 18 of 28): diagram with labels B2; A4; B4; CaADHA; Ca ADH2A (as extracted).]

[Figure 18 (Sheet 19 of 28): diagram with labels 5' ADH; 5' ADH Out; 3' ADH In; 3' ADH Out (as extracted).]

[Figure 19 (Sheet 20 of 28): two-panel chart. Panel A: Cell growth. Panel B: Diacid production. Series: DP1, DP283, DP415. Data values are not recoverable.]

[Figure 20 (Sheet 21 of 28): chart. Series: DP (number not legible), DP390, DP415, DP417, DP434. Data values are not recoverable.]

[Figure 21 (Sheet 22 of 28): vector map. Labels: bacterial origin of replication; linearization position; transcription terminator 2; promoter 1; selectable marker; bacterial promoter; promoter 2; transcription terminator 1.]

[Figure 22 (Sheet 23 of 28): three-part schematic. Part A: linearized construct. Part B: genomic sequence. Part C: construct integrated into genome. Labels: promoter 1 (5' part and 3' part); promoter 2; gene for expression; selectable marker; bacterial promoter; bacterial origin of replication; transcription terminators 1 and 2; genomic gene expressed from promoter 1.]

[Figure 23 (Sheet 24 of 28): vector map. Labels: pUC bacterial origin of replication; BsiWI site for linearization; CYC transcription terminator; isocitrate lyase promoter; Zeocin resistance gene; EM7 bacterial promoter; TEF promoter; gene for expression; isocitrate lyase transcription terminator.]

[Figure 24 (Sheet 25 of 28): two-panel chart. Panel titles: ω-hydroxymyristic acid production; Tetradecanedioic acid (diacid) production. X-axis: Time (h), 0 to 84. Series: DP428, DP1 and a third strain extracted as "DP20". Data values are not recoverable.]

[Figure 25 (Sheet 26 of 28): three-panel chart. Panel titles: Myristic acid (C14:0); Oleic acid (C18:1); Linoleic acid (C18:2). X-axis: Bioconversion time (h). Series: ω-hydroxy acid and diacid produced by P450A13 and by P450A17. Strain key: DP428, P450A17; DP522, P450A13. Data values are not recoverable.]

[Figure 26 (Sheet 27 of 28): two-panel chart. Series: ω-hydroxymyristate; diacid. X-axis: Bioconversion time (h). Data values are not recoverable.]

Figure27 US 9,359,581 B2 1. 2 BOTRANSFORMATION USING estas potential sources of chemical intermediates, and thus as GENETICALLY MODIFIED CANDIDA possible replacements for chemicals derived from crude oil and its distillates. RELATED APPLICATIONS from the genus Candida are industrially important, they tolerate high concentrations offatty acids and hydrocar This application is a divisional of U.S. patent application bons in their growth media and have been used to produce Ser. No. 12/775,306, entitled “OXIDATION OF COM long chain fatty diacids (Picataggio et al. (1992), Biotechnol POUNDS USING GENETICALLY MODIFIED CAN ogy (NY): 10,894-8.) However they frequently lackenzymes DIDA, filed on May 6, 2010, which claims priority to U.S. that would facilitate conversion of plant cell wall material 10 (cellulose, hemicellulose, pectins and lignins) into Sugar Provisional Patent Application No. 61/176,064, entitled monomers for use in biofuel production. Methods for addi BIOSYNTHETIC ROUTES TO ENERGY RICHMOL tion of genes encoding proteins capable of catalyzing Such ECULES USING GENETICALLY MODIFIED CAN conversion into the Candida genome are thus of commercial DIDA, filed May 6, 2009, and which is a continuation-in-part interest. Further, because yeasts do not always contain enzy of U.S. patent application Ser. No. 12/436,729, entitled “BIO 15 matic systems for uptake and metabolism of all of the Sugar SYNTHETIC ROUTES TO LONG-CHAIN ALPHA, monomers derived from plant cell wall material, genes encod OMEGA-HYDROXYACIDS, DIACIDS AND THEIR ing enzymes that enable Candida to utilize Sugars that it does CONVERSION TO OLIGOMERS AND POLYMERS not normally use, and methods for adding these genes to the filed May 6, 2009. The disclosures of the above-referenced Candida genome, are thus of commercial interest. applications are incorporated by reference herein in their Currently, C.O)-dicarboxylic acids are almost exclusively entirety. produced by chemical conversion processes. 
However, the chemical processes for production of C.O)-dicarboxylic acids STATEMENT REGARDING FEDERALLY from non-renewable petrochemical feedstocks usually pro SPONSORED RESEARCH ORDEVELOPMENT duces numerous unwanted byproducts, requires extensive 25 purification and gives low yields (Picataggio et al., 1992, This invention was made with government Support under Bio/Technology 10, 894-898). Moreover, C.O)-dicarboxylic grant number DAAD19-03-1-0091, W911QY-04-C-0082 acids with carbon chain lengths greater than 13 are not readily and NBCH1070004 awarded by the Defense Advanced available by chemical synthesis. While several chemical Research Projects Agency (DARPA) to Richard A. Gross. routes to synthesize long-chain C.O)-dicarboxylic acids are The United States Government has certain rights in this 30 available, their synthesis is difficult, costly and requires toxic invention. reagents. Furthermore, most methods result in mixtures con taining shorter chain lengths. Furthermore, other than four SEQUENCE LISTING carbon C,c)-unsaturated diacids (e.g. maleic acid and fumaric acid), longer chain unsaturated C.()-dicarboxylic acids or This application includes a Sequence Listing Submitted as 35 those with other functional groups are currently unavailable since chemical oxidation cleaves unsaturated bonds or modi filename 49840.054DV1 SL.txt, of size 337,998 bytes, cre fies them resulting in cis-trans isomerization and other by ated Jun. 11, 2014. The Sequence Listing is incorporated by products. reference herein in its entirety. Many microorganisms have the ability to produce C.co 40 dicarboxylic acids when cultured in n-alkanes and fatty acids, 1. FIELD including Candida tropicalis, Candida cloacae, Cryptococ cus neoforman and Corynebacterium sp. (Shiio et al., 1971, Methods for biological production of C,c)-hydroxyacids Agr. Biol. Chem. 35, 2033-2042: Hill et al., 1986, Appl. using genetically modified strains of the Candida are Microbiol. Biotech. 
24: 168-174; and Broadway et al., 1993, provided. Also provided are methods for the genetic modifi 45 J. Gen. Microbiol. 139, 1337-1344). Candida tropicalis and cation of the yeast Candida. Also provided are DNA con similar yeasts are known to produce C.()-dicarboxylic acids structs for removal of genes that can interfere with the pro with carbon lengths from C12 to C22 via an co-oxidation duction of energy rich molecules by Candida. Also provided pathway. The terminal methyl group of n-alkanes or fatty are DNA constructs for insertion of genes for expression into acids is first hydroxylated by a membrane-bound enzyme the Candida genome. 50 complex consisting of cytochrome P450 monooxygenase and associated NADPH cytochrome reductase that is the rate 2. BACKGROUND limiting step in the co-oxidation pathway. Two additional enzymes, the fatty alcohol oxidase and fatty aldehyde dehy Genes that encode proteins that catalyze chemical trans drogenase, further oxidize the alcohol to create ()-aldehyde formations of alkanes, alkenes, fatty acids, fatty alcohols, 55 acid and then the corresponding C.O)-dicarboxylic acid (ES fatty aldehydes, aldehydes and alcohols may aid in the bio chenfeldt et al., 2003, Appl. Environ. Microbiol. 69, 5992 synthesis of energy rich molecules, or in the conversion of 5999). However, there is also a f-oxidation pathway for fatty Such compounds to compounds better Suited to specific appli acid oxidation that exists within Candida tropicalis. Both cations. 
Such molecules include hydrocarbons (alkane, alk fatty acids and C.co-dicarboxylic acids in wild type Candida ene and isoprenoid), fatty acids, fatty alcohols, fatty alde 60 tropicalis are efficiently degraded after activation to the cor hydes, esters, ethers, lipids, triglycerides, and waxes, and can responding acyl-CoA ester through the B-Oxidation pathway, be produced from plant derived substrates, such as plant cell leading to carbon-chain length shortening, which results in walls (lignocellulose, cellulose, hemicellulose, and pectin) the low yields of C,c)-dicarboxylic acids and numerous by starch, and Sugar. These molecules are of particular interestas products. potential Sources of energy from biological Sources, and thus 65 Mutants of C. tropicalis in which the f-oxidation of fatty as possible replacements from energy sources derived from acids is impaired may be used to improve the production of crude oil and its distillates. These molecules are also of inter C.co-dicarboxylic acids (Uemura et al., 1988, J. Am. Oil. US 9,359,581 B2 3 4 Chem. Soc. 64, 1254-1257; and Yiet al., 1989, Appl. Micro Another embodiment provides a method for producing an biol. Biotech. 30, 327-331). Recently, genetically modified C-carboxyl-(o-hydroxy fatty acid having a carbon chain strains of the yeast Candida tropicalis have been developed to length in the range from C6 to C22, an O.()-dicarboxylic fatty increase the production of C,c)-dicarboxylic acids. An engi acid having a carbon chain length in the range from C6 to neered Candida tropicalis (Strain H5343, ATCC No. 20962) C22, or mixtures thereof in a Candida host cell. The method with the POX4 and POX5 genes that code for enzymes in the comprises (A) making one or more first genetic modifications first step of fatty acid B-Oxidation disrupted was generated so in a first genetic modification class to the Candida host cell. 
that it can prevent the strain from metabolizing fatty acids, The method further comprises (B) making one or more sec which directs the metabolic flux toward ()-oxidation and ond genetic modifications in a second genetic modification results in the accumulation of C,c)-dicarboxylic acids (FIGS. 10 class to the Candida host cell, where steps (A) and (B) col 3A and 3B). See U.S. Pat. No. 5.254.466 and Picataggio et al., 1992, Bio/Technology 10: 894-898, each of which is hereby lectively form a genetically modified Candida host cell. The incorporated by reference herein. Furthermore, by introduc method further comprises (C) producing an O-carboxyl-(r)- tion of multiple copies of cytochrome P450 and reductase hydroxy fatty acid having a carbon chain length in the range genes into C. tropicalis in which the 3-oxidation pathway is 15 from C6 to C22, an O.()-dicarboxylic fatty acid having a blocked, the C. tropicalis strain AR40 was generated with carbon chain length in the range from C6 to C22, or mixtures increased co-hydroxylase activity and higher specific produc thereof, by fermenting the genetically modified Candida host tivity of diacids from long-chain fatty acids. See, Picataggio cell in a culture medium comprising a nitrogen Source, an et al., 1992, Bio/Technology 10: 894-898 (1992); and U.S. organic Substrate having a carbon chain length in the range Pat. No. 5,620,878, each of which is hereby incorporated by from C6 to C22, and a cosubstrate. Here, the first genetic reference herein. Genes encoding proteins that catalyze modification class comprises one or more genetic modifica chemical transformations of alkanes, alkenes, fatty acids, tions that disrupt the B-oxidation pathway of the Candida host fatty alcohols, fatty aldehydes, aldehydes and alcohols may cell. 
Also, the second genetic modification class comprises also reduce the usefulness of these compounds as energy one or more genetic modifications that collectively or indi Sources, for example by oxidizing them or further metaboliz 25 vidually disrupt at least one gene selected from the group ing them. Methods for identifying and eliminating from the consisting of a CYP52A type cytochrome P450, a fatty alco Candida genome genes encoding enzymes that oxidize or hol oxidase, and an alcohol dehydrogenase in the Candida metabolize alkanes, alkenes, fatty acids, fatty alcohols, fatty host cell. aldehydes, aldehydes and alcohols are thus of commercial One embodiment provides a substantially pure Candida interest. For example fatty alcohols cannot be prepared using 30 host cell for the production of energy rich molecules. The any described strain of Candida because the hydroxy fatty Candida host cell is characterized by a first genetic modifi acid is oxidized to form a dicarboxylic acid, which has cation class and a second genetic modification class. The first reduced energy content relative to the hydroxy fatty acid. genetic modification class comprises one or more genetic Furthermore, neither the general classes nor the specific modifications that collectively or individually disrupt at least sequences of the Candida enzymes responsible for the oxi 35 one gene in the Substantially pure Candida host cell selected dation from hydroxy fatty acids to dicarboxylic acids have from the group consisting of a fatty alcohol oxidase, and an been identified. There is therefore a need in the art for meth alcohol dehydrogenase. The second genetic modification ods to prevent the oxidation of hydroxy fatty acids to diacids class comprises one or more genetic modifications that col during fermentative production. lectively or individually add to the host cell genome at least 40 one gene selected from the group consisting of a lipase, a 3. 
SUMMARY cellulase, a ligninase or a cytochrome P450 that is not iden tical to a naturally occurring counterpart gene in the Candida Methods for the genetic modification of Candida species to host cell; or a lipase, a cellulase, aligninase or a cytochrome produce strains improved for the production of biofuels are P450 that is expressed under control of a promoter other than disclosed. Methods by which yeaststrains may be engineered 45 the promoter that controls expression of the naturally occur by the addition or removal of genes to modify the oxidation of ring counterpart gene in the Candida host cell. compounds of interest as biofuels are disclosed. Enzymes to One embodiment provides a substantially pure Candida facilitate conversion of plant cell wall material (cellulose, host cell for the biotransformation of organic molecules. The hemicellulose, pectins and lignins) into Sugar monomers and Candida host cell is characterized by a first genetic modifi enzymes to enable Candida to utilize Such Sugars for use in 50 cation class and a second genetic modification class. The first biofuel production and methods for addition of genes encod genetic modification class comprises one or more genetic ing Such enzymes into the Candida genome are disclosed. modifications that collectively or individually disrupt at least One embodiment provides a substantially pure Candida one alcohol dehydrogenase gene in the Substantially pure host cell for the production of an O-carboxyl-(o-hydroxy fatty Candida host cell. The second genetic modification class acid having a carbon chain length in the range from C6 to 55 comprises one or more genetic modifications that collectively C22, an C. ()-dicarboxylic fatty acid having a carbon chain or individually add to the host cell genome at least one gene length in the range from C6 to C22, or mixtures thereof. 
The that is not identical to a naturally occurring counterpart gene Candida host cell is characterized by a first genetic modifi in the Candida host cell; or at least one gene that is identical cation class and a second genetic modification class. The first to a naturally occurring counterpart gene in the Candida host genetic modification class comprises one or more genetic 60 cell, but that is expressed under control of a promoter other modifications that disrupt the B-Oxidation pathway in the than the promoter that controls expression of the naturally Substantially pure Candida host cell. The second genetic occurring counterpart gene in the Candida host cell. modification class comprises one or more genetic modifica In some embodiments the first genetic modification class tions that collectively or individually disrupt at least one gene comprises disruption of at least one alcohol dehydrogenase in the substantially pure Candida host cell selected from the 65 gene selected from the group consisting of ADH-A4, ADH group consisting of a CYP52A type cytochrome P450, a fatty A4B, ADH-B4, ADH-B4B, ADH-A10, ADH-A1 OB, ADH alcohol oxidase, and an alcohol dehydrogenase. B11 and ADH-B11B. US 9,359,581 B2 5 6 In some embodiments the first genetic modification class In some embodiments the first genetic modification class comprises disruption of at least one alcohol dehydrogenase comprises disruption of at least one alcohol dehydrogenase gene whose nucleotide sequence is gene whose amino acid sequence, predicted from translation at least 95% identical to a stretch of at least 50, at least 60, of the gene that encodes it, comprises a first peptide. In some at least 70, at least 80, at least 90, at least 100, at least 110 at 5 embodiments the first peptide has the sequence VKYSGVCH least 120 contiguous nucleotides of SEQID NO:39, SEQID (SEQ ID NO: 156). 
In some embodiments, the first peptide NO:40, SEQID NO:42, SEQID NO:43, or SEQID NO:56, has the sequence VKYSGVCHXXXXXWKGDW (SEQ ID O NO: 162). In some embodiments the first peptide has the at least 90% identical to a stretch of at least 50, at least 60, Sequence VKYSGVCHXXXXXWKGDWXXXXKLPXVG at least 70, at least 80, at least 90, at least 100, at least 110 at 10 GHEGAGVVV (SEQID NO: 163). It will be understood that least 120 contiguous nucleotides of SEQID NO:39, SEQID in amino acid sequences presented herein, each 'X' respre NO:40, SEQID NO:42, SEQID NO:43, or SEQID NO:56, sents a placeholder for a residue of any of the naturally occur O ring aminoa acids. at least 85% identical to a stretch of at least 50, at least 60, In some embodiments the first genetic modification class at least 70, at least 80, at least 90, at least 100, at least 110 at 15 comprises disruption of at least one alcohol dehydrogenase least 120 contiguous nucleotides of SEQID NO:39, SEQID gene whose amino acid sequence, predicted from translation NO:40, SEQID NO:42, SEQID NO:43, or SEQID NO:56, of the gene that encodes it, comprises a second peptide. In or at least 80% identical to a stretch of at least 50, at least Some embodiments the second peptide has the sequence 60, at least 70, at least 80, at least 90, at least 100, at least 110 QYATADAVQAA (SEQID NO: 158). In some embodiments at least 120 contiguous nucleotides of SEQID NO:39, SEQ the second peptide has the sequence SGYxHDGXFXOYATA ID NO:40, SEQID NO:42, SEQID NO:43, or SEQID NO: DAVQAA (SEQ ID NO: 164). In some embodiments the 56, or second peptide has the sequence GAEPNCXXADxSGYX at least 75% identical to a stretch of at least 50, at least 60, HDGxFXOYATADAVQAA (SEQID NO: 165). 
at least 70, at least 80, at least 90, at least 100, at least 110 at In some embodiments the first genetic modification class least 120 contiguous nucleotides of SEQID NO:39, SEQID 25 comprises disruption of at least one alcohol dehydrogenase NO:40, SEQID NO:42, SEQID NO:43, or SEQID NO:56, gene whose amino acid sequence, predicted from translation O of the gene that encodes it, comprises a third peptide. In some at least 70% identical to a stretch of at least 50, at least 60, embodiments the third peptide has the sequence at least 70, at least 80, at least 90, at least 100, at least 110 at CAGVTVYKALK (SEQ ID NO: 159). In some embodi least 120 contiguous nucleotides of SEQID NO:39, SEQID 30 ments the third peptide has the sequence APIX NO:40, SEQID NO:42, SEQID NO:43, or SEQID NO:56, CAGVTVYKALK (SEQID NO: 166). O In some embodiments the first genetic modification class at least 65% identical to a stretch of at least 50, at least 60, comprises disruption of at least one alcohol dehydrogenase at least 70, at least 80, at least 90, at least 100, at least 110 at gene whose amino acid sequence, predicted from translation least 120 contiguous nucleotides of SEQID NO:39, SEQID 35 of the gene that encodes it, comprises a fourth peptide. In NO:40, SEQID NO:42, SEQID NO:43, or SEQID NO:56, Some embodiments the fourth peptide has the sequence O GQWVAISGA(SEQID NO: 160). In some embodiments the at least 60% identical to a stretch of at least 50, at least 60, fourth peptide has the sequence GQWVAISGAXGGLGSL at least 70, at least 80, at least 90, at least 100, at least 110 at (SEQID NO: 167). In some embodiments the fourth peptide least 120 contiguous nucleotides of SEQID NO:39, SEQID 40 has the sequence GQWVAISGAXGGLGSLXVQYA (SEQID NO:40, SEQID NO:42, SEQID NO:43, or SEQID NO:56, NO: 168). In some embodiments, the fourth peptide has the O sequence GQWVAISGAXGGLGSLXVQYAXAMG (SEQID at least 50% identical to a stretch of at least 50, at least 60, NO: 169). 
at least 70, at least 80, at least 90, at least 100, at least 110, or at least 120 contiguous nucleotides of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56, or
at least 40% identical to a stretch of at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, or at least 120 contiguous nucleotides of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56, or
at least 30% identical to a stretch of at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, or at least 120 contiguous nucleotides of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56, or
at least 20% identical to a stretch of at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, or at least 120 contiguous nucleotides of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56, or
at least 10% identical to a stretch of at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 110, or at least 120 contiguous nucleotides of any one of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56.

In some embodiments the fourth peptide has the sequence GQWVAISGAXGGLGSLXVQYAXAMGXRVXAIDGG (SEQ ID NO: 170).

In some embodiments the first genetic modification class comprises disruption of at least one alcohol dehydrogenase gene whose amino acid sequence, predicted from translation of the gene that encodes it, comprises a fifth peptide. In some embodiments the fifth peptide has the sequence VGGHEGAGVVV (SEQ ID NO: 157).

In some embodiments the first genetic modification class comprises disruption of at least one alcohol dehydrogenase gene whose amino acid sequence, predicted from translation of the gene that encodes it, comprises at least one, two, three, four or five peptides selected from the group consisting of a first peptide having the sequence VKYSGVCH (SEQ ID NO: 156), a second peptide having the sequence QYATADAVQAA (SEQ ID NO: 158), a third peptide having the sequence CAGVTVYKALK (SEQ ID NO: 159), a fourth peptide having the sequence GQWVAISGA (SEQ ID NO: 160) and a fifth peptide having the sequence VGGHEGAGVVV (SEQ ID NO: 157).

In some embodiments the first genetic modification class comprises disruption of at least one alcohol dehydrogenase gene whose amino acid sequence, predicted from translation of the gene that encodes it, has at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a stretch of at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, or at least 100 contiguous residues of any one of SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, or SEQ ID NO: 155.

In some embodiments, the first genetic modification class causes an alcohol dehydrogenase to have decreased function relative to the function of the wild-type counterpart in the Candida host cell.

In some embodiments, decreased function of an alcohol dehydrogenase in a Candida host cell is measured by incubating the Candida host cell in a mixture comprising a substrate possessing a hydroxyl group and measuring the rate of conversion of the substrate to a more oxidized product such as an aldehyde or a carboxyl group.

In some embodiments the first genetic modification class comprises disruption of at least one alcohol dehydrogenase gene whose amino acid sequence, predicted from translation
of the gene that encodes it, has at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a stretch of between 5 and 120 contiguous residues, between 40 and 100 contiguous residues, between 50 and 90 contiguous residues, or between 60 and 80 contiguous residues of any one of SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, or SEQ ID NO: 155.

In some embodiments the first genetic modification class comprises disruption of at least one alcohol dehydrogenase gene whose amino acid sequence, predicted from translation of the gene that encodes it, has at least 90 percent sequence identity to a stretch of between 10 and 100 contiguous residues of any one of SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, or SEQ ID NO: 155.

In some embodiments, the first genetic modification class causes disruption of an alcohol dehydrogenase in a Candida host cell. In some embodiments disruption of an alcohol dehydrogenase is measured by incubating the Candida host cell in a mixture comprising a substrate possessing a hydroxyl group and measuring the rate of conversion of the substrate to a more oxidized product such as an aldehyde or a carboxyl group. The rate of conversion of the substrate by the Candida host cell is compared with the rate of conversion produced by a second host cell that does not contain the disrupted gene but contains a wild type counterpart of the gene, when the Candida host cell and the second host cell are under the same environmental conditions (e.g., same temperature, same media, etc.). The rate of formation of the product can be measured using colorimetric assays, or chromatographic assays, or mass spectroscopy assays. In some embodiments

The rate of conversion of the substrate by the Candida host cell is compared with the rate of conversion produced by a second host cell that does not contain the disrupted gene but contains a wild type counterpart of the gene, when the Candida host cell and the second host cell are under the same environmental conditions (e.g., same temperature, same media, etc.). The rate of formation of the product can be measured using colorimetric assays, or chromatographic assays, or mass spectroscopy assays. In some embodiments the alcohol dehydrogenase is deemed to have decreased function if the rate of conversion is at least 5% lower, at least 10% lower, at least 15% lower, at least 20% lower, at least 25% lower, or at least 30% lower in the Candida host cell than the second host cell.

In some embodiments, decreased function of an alcohol dehydrogenase in a Candida host cell is measured by incubating the Candida host cell in a mixture comprising a substrate possessing a hydroxyl group and measuring the rate of conversion of the substrate to a more oxidized product such as an aldehyde or a carboxyl group. The amount of the substrate converted to product by the Candida host cell in a specified time is compared with the amount of substrate converted to product by a second host cell that does not contain the disrupted gene but contains a wild type counterpart of the gene, when the Candida host cell and the second host cell are under the same environmental conditions (e.g., same temperature, same media, etc.). The amount of product can be measured using colorimetric assays, or chromatographic assays, or mass spectroscopy assays. In some embodiments the alcohol dehydrogenase is deemed to have decreased function if the amount of product is at least 5% lower, at least 10% lower, at least 15% lower, at least 20% lower, at least 25% lower, or at least 30% lower in the Candida host cell than the second host cell.
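The comparison described above reduces to a percent decrease relative to the wild-type control measured under the same conditions. A minimal sketch of that arithmetic follows. The function names and the example values are hypothetical, and the threshold tuple simply mirrors the percentages recited in the text.

```python
# Thresholds recited in the text for "decreased function" (percent lower).
THRESHOLDS_PERCENT = (5, 10, 15, 20, 25, 30)


def percent_decrease(modified: float, wild_type: float) -> float:
    """Percent by which the modified strain's value is lower than the control's.

    Both values must come from the same assay under the same conditions
    (rate of conversion, or amount of product formed in a specified time).
    """
    if wild_type <= 0:
        raise ValueError("wild-type control value must be positive")
    return 100.0 * (wild_type - modified) / wild_type


def thresholds_met(modified: float, wild_type: float) -> list[int]:
    """Return every recited threshold that the observed decrease satisfies."""
    decrease = percent_decrease(modified, wild_type)
    return [t for t in THRESHOLDS_PERCENT if decrease >= t]


# Hypothetical values in arbitrary units (e.g. product formed per hour).
decrease = percent_decrease(modified=3.0, wild_type=4.0)
met = thresholds_met(modified=3.0, wild_type=4.0)
```

With these made-up numbers the modified strain converts a quarter less substrate than the control, so every threshold up to and including 25% is satisfied and the 30% threshold is not.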
the alcohol dehydrogenase is deemed disrupted if the rate of conversion is at least 5% lower, at least 10% lower, at least 15% lower, at least 20% lower, at least 25% lower, or at least 30% lower in the Candida host cell than the second host cell.

In some embodiments, disruption of an alcohol dehydrogenase in a Candida host cell is measured by incubating the Candida host cell in a mixture comprising a substrate possessing a hydroxyl group and measuring the rate of conversion of the substrate to a more oxidized product such as an aldehyde or a carboxyl group. The amount of the substrate converted to product by the Candida host cell in a specified time is compared with the amount of substrate converted to product by a second host cell that does not contain the disrupted gene but contains a wild type counterpart of the gene, when the Candida host cell and the second host cell are under the same environmental conditions (e.g., same temperature, same media, etc.). The amount of product can be measured using colorimetric assays, or chromatographic assays, or mass spectroscopy assays. In some embodiments the alcohol dehydrogenase is deemed disrupted if the amount of product is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, or at least 25% lower in the Candida host cell than the second host cell.

In some embodiments, the first genetic modification class causes an alcohol dehydrogenase to have a modified activity spectrum relative to an activity spectrum of the wild-type counterpart.

In some embodiments, activity of an alcohol dehydrogenase in a Candida host cell is measured by incubating the Candida host cell in a mixture comprising a substrate possessing a hydroxyl group and measuring the rate of conversion of the substrate to a more oxidized product such as an aldehyde or a carboxyl group. The rate of conversion of the substrate by the Candida host cell is compared with the rate of conversion produced by a second host cell that does not contain the disrupted gene but contains a wild type counterpart of the gene, when the Candida host cell and the second host cell are under the same environmental conditions (e.g., same temperature, same media, etc.). The rate of formation of the product can be measured using colorimetric assays, or chromatographic assays, or mass spectroscopy assays. In some embodiments the alcohol dehydrogenase is deemed to have a modified activity spectrum if the rate of conversion is at least 5% lower, at least 10% lower, at least 15% lower, at least 20% lower, or at least 25% lower in the Candida host cell than the second host cell.

In some embodiments, activity of an alcohol dehydrogenase in a Candida host cell is measured by incubating the Candida host cell in a mixture comprising a substrate possessing a hydroxyl group and measuring the rate of conversion of the substrate to a more oxidized product such as an aldehyde or a carboxyl group. The amount of the substrate converted to product by the Candida host cell in a specified

the β-oxidation pathway is blocked (indicated by broken arrows), so that fatty acids are not used as substrates for growth. This genetic modification allows Candida species of yeast including Candida tropicalis to be used as a biocatalyst for the production of α,ω-diacids. See, for example, Picataggio et al., 1991, Mol Cell Biol 11, 4333-4339; and Picataggio et al., 1992, Biotechnology 10, 894-898.
The β-oxidation pathway may be disrupted by any genetic modification or treatment of the host cells with a chemical, for example an inhibitor, that substantially reduces or eliminates the activity of one or more enzymes in the β-oxidation pathway, including the hydratase, dehydrogenase or thiolase enzymes, and thereby reduces the flux through that pathway and thus the utilization of fatty acids as growth substrates.

FIGS. 3A and 3B show an alignment, using ClustalW, of the amino acid sequences of alcohol dehydrogenase proteins predicted from the sequences of genes from Candida albicans and Candida tropicalis. The genes from Candida tropicalis were isolated as partial genes by PCR with degenerate primers, so the nucleic acid sequences of the genes and the predicted amino acid sequences of the encoded proteins are incomplete. Amino acid sequences of the partial genes are predicted and provided: SEQ ID NO: 155 (ADH-A4), SEQ ID NO: 154 (ADH-B4), SEQ ID NO: 152 (ADH-A10), SEQ ID NO: 153 (ADH-A10B) and SEQ ID NO: 151 (ADH-B11).

FIG. 4 shows a schematic representation of a DNA "genomic targeting" construct for deleting sequences from the genome of yeasts. The general structure is that the construct has two targeting sequences that are homologous to the sequences of two regions of the target yeast chromosome. Between these targeting sequences are two sites recognized by a site-specific recombinase (indicated as "recombinase site"). Between the two site-specific recombinase sites are sequence elements, one of which encodes a selective marker and the other of which (optionally) encodes the site-specific recombinase that recognizes the recombinase sites. In one embodiment the sequence of the DNA construct between the targeting sequences is the "SAT1 flipper", a DNA construct for inserting and deleting sequences into the chromosome of Candida (Reuss et al., (2004), Gene: 341, 119-27).

time is compared with the amount of substrate converted to product by a second host cell that does not contain the disrupted gene but contains a wild type counterpart of the gene, when the Candida host cell and the second host cell are under the same environmental conditions (e.g., same temperature, same media, etc.). The amount of product can be measured using colorimetric assays, or chromatographic assays, or mass spectroscopy assays. In some embodiments the alcohol dehydrogenase is deemed to have a modified activity spectrum if the amount of product is at least 5% lower, at least 10% lower, at least 15% lower, at least 20% lower, or at least 25% lower in the Candida host cell than the second host cell.

In some embodiments the second genetic modification class comprises addition of at least one modified CYP52A type cytochrome P450 selected from the group consisting of CYP52A13, CYP52A14, CYP52A17, CYP52A18, CYP52A12, and CYP52A12B.

Disclosed are biosynthetic routes that convert (oxidize) fatty acids to their corresponding α-carboxyl-ω-hydroxyl fatty acids. This is accomplished by culturing fatty acid substrates with a yeast, preferably a strain of Candida and more preferably a strain of Candida tropicalis. The yeast converts fatty acids to long-chain ω-hydroxy fatty acids and α,ω-dicarboxylic acids, and mixtures thereof. Methods by which yeast strains may be engineered by the addition or removal of genes to modify the oxidation products formed are disclosed. Fermentations are conducted in liquid media containing fatty acids as substrates. Biological conversion methods for these compounds use readily renewable resources such as fatty acids as starting materials rather than non-renewable petrochemicals. For example, ω-hydroxy fatty acids and α,ω-dicarboxylic acids can be produced from inexpensive long-chain fatty acids, which are readily available from renewable
agricultural and forest products such as soybean oil, corn oil and tallow. Moreover, a wide range of α-carboxyl-ω-hydroxyl fatty acids with different carbon length can be prepared because the biocatalyst accepts a wide range of fatty acid substrates. Products described herein produced by the biocatalytic methods described herein are new and not commercially available since chemical methods are impractical to prepare the compounds and biocatalytic methods to these products were previously unknown.

4. BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows two pathways for metabolism of fatty acids, ω-oxidation and β-oxidation, both of which exist in yeasts of the genus Candida including Candida tropicalis. The names of classes of compounds are shown, arrows indicate transformations from one compound to another, and the names of classes of enzymes that perform these conversions are indicated by underlined names adjacent to the arrows.

FIG. 2 shows two pathways for metabolism of fatty acids, ω-oxidation and β-oxidation, both of which exist in yeasts of the genus Candida including Candida tropicalis. The names of classes of compounds are shown, arrows indicate transformations from one compound to another, and the names of classes of enzymes that perform these conversions are indicated by underlined names adjacent to the arrows. By inactivating the genes encoding acyl coA oxidase (pox4 and pox5),

In the "SAT1 flipper", the recombinase is the flp recombinase from Saccharomyces cerevisiae (Vetter et al., 1983, Proc Natl Acad Sci USA: 80, 7284-8) (FLP) and the flanking sequences recognized by the recombinase are recognition sites for the flp recombinase (FRT). The selective marker is the gene encoding resistance to nourseothricin from transposon Tn1825 (Tietze et al., 1988, J Basic Microbiol: 28, 129-36). The DNA sequence of the SAT1-flipper is given as SEQ ID NO: 1. The genomic targeting sequence can be propagated in bacteria, for example E. coli, in which case the complete plasmid will also contain sequences required for propagation in bacteria, comprising a bacterial origin of replication and a bacterial selective marker such as a gene conferring antibiotic resistance. The targeting construct can be released from this plasmid in a linear form by digestion with one or more restriction enzymes with recognition sites that flank the targeting sequences.

FIG. 5 shows a schematic representation of the homologous recombination between a "genomic targeting" construct of the form shown in FIG. 4, with the DNA contained in a yeast genome (either in the chromosome or in the mitochondrial DNA). The targeting construct (A) contains two regions of sequence homology to the genomic sequence (B); the corresponding sequences in the genomic sequence flank the DNA region to be replaced. Introduction of the targeting construct into the host cell is followed by homologous recombination catalyzed by host cell enzymes. The result is an integrant of the targeting construct into the genomic DNA (C) and the excised DNA (D) which will generally be lost from the cell.

FIG. 6 shows a schematic representation of excision of the targeting construct from the yeast genome that occurs when expression of the recombinase in the targeting construct is induced in the integrant (A) shown in FIG. 5.

sequences. This changes the size of the DNA segment between the two PCR primers; in the case shown the size is increased. (C) Induction of the recombinase results in excision of the recombinase encoding gene, the selective marker and one of the recombinase sites. This again changes the size of the DNA segment between the two PCR primers. (D) PCR
amplification from yeast genomic DNA unmodified (gel lanes marked A), with integrated genomic targeting vector (gel lanes marked B) or after excision of the genomic targeting vector (gel lanes marked C).

FIG. 11 shows a schematic representation of a DNA "genomic targeting" construct for inserting or deleting sequences in the genome of yeasts. The general structure is that the construct has two targeting sequences that are homologous to the sequences of two regions of the target yeast chromosome. Between these targeting sequences is a sequence that encodes a selective marker.

FIG. 12 shows two pathways for metabolism of fatty acids, ω-oxidation and β-oxidation, both of which exist in Candida species of yeast including Candida tropicalis. The names of classes of compounds are shown, arrows indicate transformations from one compound to another, and the names of classes of enzymes that perform these conversions are indicated by underlined names adjacent to the arrows. By inactivating the Candida tropicalis genes pox4 and pox5 (or their functional homologs in other Candida species), the β-oxidation pathway is blocked (indicated by broken arrows), so that fatty acids are not used as substrates for growth. Furthermore, inactivation of CYP52A type cytochrome P450 enzymes, as illustrated in the Figure, prevents the ω-oxidation of these fatty acids. These enzymes may also be responsible for some or all of the transformations involved in oxidizing ω-hydroxy fatty acids to α,ω-dicarboxylic acids. See Eschenfeldt et al., 2003, "Transformation of fatty acids catalyzed by cytochrome P450 monooxygenase enzymes of Candida tropicalis." Appl. Environ. Microbiol. 69: 5992-5999, which is hereby incorporated by reference herein.

FIG. 13 shows the levels of ω-hydroxy myristate and the over-oxidized C14 diacid produced by Candida tropicalis

Induction of the site-specific recombinase causes recombination between the two recombinase recognition sites. The result is the excision of the sequences between the two recombinase sites (C) leaving a single recombinase site in the genomic DNA (B).

FIG. 7 shows a schematic representation of a DNA "genomic targeting" construct for inserting sequences into the genome of yeasts. The general structure is that the construct has two targeting sequences that are homologous to the sequences of two regions of the target yeast chromosome. Between these targeting sequences are two sites recognized by a site-specific recombinase (indicated as "recombinase site"). Between the two site-specific recombinase sites are sequence elements, one of which encodes a selective marker and the other of which (optionally) encodes the site-specific recombinase that recognizes the recombinase sites. Insertion of additional sequences between one of the targeting sequences and its closest recombinase recognition site will result in those sequences being inserted into the chromosome after excision of the targeting construct ("Insertion sequences"). The genomic targeting sequence can be propagated in bacteria, for example E. coli, in which case the complete plasmid will also contain sequences required for propagation in bacteria, comprising a bacterial origin of replication and a bacterial selective marker such as a gene conferring antibiotic resistance. The targeting construct can be released from this plasmid in a linear form by digestion with one or more restriction enzymes with recognition sites that flank the targeting sequences.

FIG. 8 shows a schematic representation of the homologous recombination between a "genomic targeting" construct of the form shown in FIG. 7, with the DNA contained in a yeast genome (either in the chromosome or in the mitochondrial DNA).
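The integrate-then-excise scheme described for FIGS. 6 and 7 can be pictured with a toy model in which the integrated construct is a list of named elements. This is only an illustrative sketch: the element names and the `excise` helper are hypothetical, and the model assumes two directly repeated recombinase sites, with recombination removing everything between them together with one copy of the site.

```python
def excise(locus: list[str], site: str = "recombinase_site") -> list[str]:
    """Model site-specific recombination between two copies of `site`.

    Everything between the first and last copy is removed along with one
    copy of the site, so a single site stays behind in the locus.
    """
    indices = [i for i, element in enumerate(locus) if element == site]
    if len(indices) < 2:
        return list(locus)  # fewer than two sites: nothing to recombine
    first, last = indices[0], indices[-1]
    return locus[: first + 1] + locus[last + 1 :]


# Integrated construct of the FIG. 7 type (hypothetical element names).
integrant = [
    "targeting_sequence_1",
    "insertion_sequences",
    "recombinase_site",
    "recombinase_gene",
    "selective_marker",
    "recombinase_site",
    "targeting_sequence_2",
]

after_excision = excise(integrant)
```

In this model the insertion sequences survive excision because they sit outside the pair of recombinase sites, which is the behaviour the text attributes to the FIG. 7 construct.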
The targeting construct (A) contains two regions of sequence homology to the genomic sequence (B); the corresponding sequences in the genomic sequence flank the DNA region to be replaced. Introduction of the targeting construct into the host cell is followed by homologous recombination catalyzed by host cell enzymes. The result is an integrant of the targeting construct into the genomic DNA (C) and the excised DNA (D) which will generally be lost from the cell.

FIG. 9 shows a schematic representation of excision of the targeting construct from the yeast genome that occurs when expression of the recombinase in the targeting construct is induced in the integrant (A) shown in FIG. 8. Induction of the site-specific recombinase causes recombination between the two recombinase recognition sites. The result is the excision of the sequences between the two recombinase sites (C) leaving a single recombinase site together with the additional sequences that were included between the targeting sequences and the recombinase site (see FIG. 7) in the genomic DNA (B).

FIG. 10 shows a schematic representation of three stages in generation of a targeted deletion in a yeast genome (either in the chromosome or in the mitochondrial DNA), and the results of a PCR test to distinguish between the three stages. (A) PCR primers (thick arrows) are designed to flank the targeted region. (B) Insertion of a genomic targeting construct into the genome inserts two recombinase sites, a recombinase gene and a selection marker between the two target

strains DP1 (ura3A/ura3B pox5A::ura3A/pox5B::ura3A pox4A::ura3A/pox4B::URA3A) and DP174 (ura3A/ura3B pox5A::ura3A/pox5B::ura3A pox4A::ura3A/pox4B::URA3A ΔCYP52A17/ΔCYP52A18 ΔCYP52A13/ΔCYP52A14). Cultures of the yeast strains were grown at 30° C. and 250 rpm for 16 hours in a 500 ml flask containing 30 ml of media F (media F is peptone 3 g/l, yeast extract 6 g/l, yeast nitrogen base 6.7 g/l, sodium acetate 3 g/l, K2HPO4 7.2 g/l, KH2PO4 9.3 g/l) plus 30 g/l glucose. After 16 hours 0.5 ml of culture was added to 4.5 ml fresh media F plus 60 g/l glucose in a 125 ml flask, and grown at 30° C. and 250 rpm for 12 hours before addition of substrate. After addition of substrates growth was continued at 30° C. and 250 rpm. Parts A and B: the substrate methyl myristate was then added to a final concentration of 10 g/l and the pH was adjusted to between 7.5 and 8. The culture was pH controlled by adding 2 mol/l NaOH every 12 hours and glucose was fed as a cosubstrate by adding 400 g/l glucose every 8 hours. Samples were taken at the times indicated, cell culture was acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of ω-hydroxy myristate and of the C14 diacid produced by oxidation of the ω-hydroxy myristate were measured by LC-MS (liquid chromatography mass spectroscopy). The diacid was quantified relative to a known standard. No such standard was available for the ω-hydroxy myristate, so it was quantified by measuring the area under the peak in the MS chromatogram. Parts C and D: the substrates methyl myristate, sodium myristate or myristic acid were added to a final concentration of 10 g/l and the pH was adjusted to between 7.5 and 8. The culture was pH controlled by adding 2 mol/l NaOH every 12 hours and glucose was fed as a cosubstrate by adding 400 g/l glucose every 8 hours.

media F plus 20 g/l glycerol in a 125 ml flask, and grown at 30° C. and 250 rpm for 12 hours before addition of substrate. After addition of substrates growth was continued at 30° C. and 250 rpm. Part A: the substrate ω-hydroxy laurate was then
added to a final concentration of 5 g/l and the pH was adjusted to between 7.5 and 8. Samples were taken after 24 hours, cell culture was acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of α,ω-dicarboxy laurate were measured by LC-MS (liquid chromatography mass spectroscopy). Part B: the substrate ω-hydroxy palmitate was then added to a final concentration of 5 g/l and the pH was adjusted to between 7.5 and 8. Samples were taken after 24 hours, cell culture was acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of α,ω-dicarboxy laurate were measured by LC-MS (liquid chromatography mass spectroscopy).

FIG. 17 shows a phylogenetic tree with five Candida tropicalis alcohol dehydrogenase sequences (A10, B11, B2, A4 and B4) and two alcohol dehydrogenases from Candida albicans (Ca ADH1A and Ca ADH2A).

FIG. 18 shows a schematic design for selecting two sets of nested targeting sequences for the deletion of two alleles of a gene whose sequences are very similar, for example the alcohol dehydrogenase genes. The construct for the first allele uses ~200 base pair at the 5' end and ~200 base pair at the 3' end as targeting sequences (5'-ADH Out and 3'-ADH Out). The construct for the second allele uses two sections of ~200 base pair between the first two targeting sequences (5'-ADH In and 3'-ADH In). These sequences are eliminated by the first targeting construct from the first allele of the gene and will thus serve as a targeting sequence for the second allele of the gene.

FIG. 19 shows the levels of α,ω-dicarboxylic acids produced by Candida tropicalis strains DP1, DP283 and DP415 (see Table 3 for genotypes). Cultures of the yeast strains were grown at 30° C. and 250 rpm for 18 hours in a 500 ml flask containing 30 ml of media F (media F is peptone 3 g/l, yeast

Samples were taken after 48 hours, cell culture was acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of ω-hydroxy myristate and of the C14 diacid produced by oxidation of the ω-hydroxy myristate were measured by LC-MS (liquid chromatography mass spectroscopy).

FIG. 14 shows two pathways for metabolism of fatty acids, ω-oxidation and β-oxidation, both of which exist in Candida species of yeast including Candida tropicalis. The names of classes of compounds are shown, arrows indicate transformations from one compound to another, and the names of classes of enzymes that perform these conversions are indicated by underlined names adjacent to the arrows. By inactivating the Candida tropicalis genes pox4 and pox5 (or their functional homologs in other Candida species), the β-oxidation pathway is blocked (indicated by broken arrows), so that fatty acids are not used as substrates for growth. Furthermore, inactivation of CYP52A type cytochrome P450 enzymes prevents the ω-oxidation of fatty acids. Several enzymes including, but not limited to CYP52A type P450s, are responsible for transformations involved in oxidizing ω-hydroxy fatty acids to α,ω-dicarboxylic acids. If other enzymes involved in oxidation of ω-hydroxy fatty acids are present in the strain, then the strain will convert ω-hydroxy fatty acids fed in the media to α,ω-dicarboxylic acids. If other enzymes involved in oxidation of ω-hydroxy fatty acids have been eliminated from the strain, then the strain will convert ω-hydroxy fatty acids fed in the media to α,ω-dicarboxylic acids.

FIG. 15 shows the levels of α,ω-dicarboxylic acids produced by Candida tropicalis strains DP186, DP258 and DP259 (see Table 3 for genotypes). Cultures of the yeast strains were grown at 30° C. and 250 rpm for 16 hours in a 500 ml flask containing 30 ml of media F (media F is peptone 3 g/l, yeast extract 6 g/l, yeast nitrogen base 6.7 g/l, sodium acetate 3 g/l,
K2HPO4 7.2 g/l, KH2PO4 9.3 g/l) plus 20 g/l glycerol. After 16 hours 0.5 ml of culture was added to 4.5 ml fresh media F plus 20 g/l glycerol in a 125 ml flask, and grown at 30° C. and 250 rpm for 12 hours before addition of substrate. After addition of substrates growth was continued at 30° C. and 250 rpm. Part A: the substrate ω-hydroxy laurate was then added to a final concentration of 5 g/l and the pH was adjusted to between 7.5 and 8. Samples were taken after 24 hours, cell culture was acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of α,ω-dicarboxy laurate were measured by LC-MS (liquid chromatography mass spectroscopy). Part B: the substrate ω-hydroxy palmitate was then added to a final concentration of 5 g/l and the pH was adjusted to between 7.5 and 8. Samples were taken after 24 hours, cell culture was acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of α,ω-dicarboxy laurate were measured by LC-MS (liquid chromatography mass spectroscopy).

FIG. 16 shows the levels of α,ω-dicarboxylic acids produced by Candida tropicalis strains DP186, DP283 and DP284 (see Table 3 for genotypes). Cultures of the yeast strains were grown at 30° C. and 250 rpm for 16 hours in a 500 ml flask containing 30 ml of media F (media F is peptone 3 g/l, yeast extract 6 g/l, yeast nitrogen base 6.7 g/l, sodium acetate 3 g/l, K2HPO4 7.2 g/l, KH2PO4 9.3 g/l) plus 20 g/l glycerol. After 16 hours 0.5 ml of culture was added to 4.5 ml fresh

extract 6 g/l, yeast nitrogen base 6.7 g/l, sodium acetate 3 g/l, K2HPO4 7.2 g/l, KH2PO4 9.3 g/l) plus 20 g/l glycerol. After 18 hours the preculture was diluted in fresh media to A600 = 1.0. This culture was shaken until the A600 reached between 5.0 and 6.0. Biocatalytic conversion was initiated by adding 5 ml culture to a 125 ml flask together with 50 mg of ω-hydroxy lauric acid, and pH adjusted to ~7.5 with 2 M NaOH. Part A: cell growth was followed by measuring the A600 every 2 hours. Part B: formation of diacid; every 2 hours a sample of the cell culture was taken, acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of α,ω-dicarboxy laurate were measured by LC-MS (liquid chromatography mass spectroscopy).

FIG. 20 shows the levels of α,ω-dicarboxylic acids produced by Candida tropicalis strains DP1, DP390, DP415, DP417, DP421, DP423, DP434 and DP436 (see Table 3 for genotypes). Cultures of the yeast strains were grown at 30° C. and 250 rpm for 18 hours in a 500 ml flask containing 30 ml of media F (media F is peptone 3 g/l, yeast extract 6 g/l, yeast nitrogen base 6.7 g/l, sodium acetate 3 g/l, K2HPO4 7.2 g/l, KH2PO4 9.3 g/l) plus 20 g/l glycerol. After 18 hours the preculture was diluted in fresh media to A600 = 1.0. This culture was shaken until the A600 reached between 5.0 and 6.0. Biocatalytic conversion was initiated by adding 5 ml culture to a 125 ml flask together with 50 mg of ω-hydroxy lauric acid, and pH adjusted to ~7.5 with 2 M NaOH. Formation of diacid was measured at the indicated intervals by taking a sample of the cell culture and acidifying to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of α,ω-dicarboxy laurate were measured by LC-MS (liquid chromatography

DP1, DP201 and DP428 (see Table 3 for genotypes). Cultures of the yeast strains were grown at 30° C. and 250 rpm for 18 hours in a 500 ml flask containing 30 ml of media F (media F is peptone 3 g/l,
yeast extract 6 g/l, yeast nitrogen base 6.7 g/l. mass spectroscopy). Sodium acetate 3 g/l. khpo 7.2 g/l. khpo 9.3 g/l) plus 20 g/1 FIG. 21 shows a schematic representation of a DNA glucose plus 5 g/l ethanol. After 18 hours 3 ml of preculture 'genomic insertion' construct for inserting sequences to be was added to 27 ml fresh media fplus 20 g/l glucose plus 5 g/1 expressed into the genome of yeasts. The general structure is ethanol in a 500 ml flask, and grown at 30° c. and 250 rpm for that the construct has a gene for expression which is preceded 20 hours before addition of substrate. Biocatalytic conversion by a promoter that is active in the yeast (Promoter 1). Pro 10 was initiated by adding 40 g/l of methyl myristate, the ph was moter 1 comprises a linearization position which may be a site adjusted to ~7.8 with 2 m naoh. The culture was ph controlled recognized by a restriction enzyme which cleaves the by adding 2 mol/l naoh every 12 hours, glycerol was fed as genomic insertion construct once to linearize it, or an anneal cosubstrate by adding 500 g/l glycerol and ethanol was fed as ing site for PCR primers to amplify a linear molecule from the a inducer by adding 50% ethanol every 12 hours. Samples construct. Three positions (A, B and C) are marked in Pro 15 were taken at the times indicated, cell culture was acidified to moter 1 for reference in FIG.22 when the construct is linear ph ~1.0 by addition of 6 nhcl. products were extracted from ized. The gene for expression is optionally followed by a the cell culture by diethyl ether and the concentrations of transcription terminator (Transcription terminator 1). The ()-hydroxy myristate and C. ()-dicarboxymyristate were mea genomic insertion construct also comprises a selectable Sured by lc-ms (liquid chromatography mass spectroscopy). marker. The selectable marker is preferably one that is active FIG. 25 shows the levels of C,c)-dicarboxylic acids and in both bacterial and yeast hosts. 
To achieve this, the select ()-hydroxy fatty acids produced by Candida tropicalis Strains able marker may be preceded by a yeast promoter (promoter dp428 and dp522 (see table 3 for genotypes). Cultures of the 2) and a bacterial promoter, and optionally it may be followed yeast strains were grown at 30 c. in a dasgip parallel fermen by a transcription terminator (transcription terminator 2). The tor containing 200 ml of media f (media f is peptone 3 g/l. genomic insertion construct also comprises a bacterial origin 25 yeast extract 6 g/l. yeast nitrogen base 6.7 g/l. Sodium acetate of replication. 3 g/l. khpo 7.2 g/l. khpo 9.3 g/l) plus 30 g/l glucose. The FIG. 22 shows a schematic representation of the integra ph was maintained at 6.0 by automatic addition of 6 m naoh or tion of a DNA “genomic insertion' construct into the DNA of 2 m hiso solution. Dissolved oxygen was kept at 70% by a yeast genome. Part A shows an integration construct of the agitation and o-cascade control mode. After 6 hour growth, structure shown in FIG. 22, with parts marked. The construct 30 ethanol was fed into the cell culture to 5 g/l. After 12h growth, is linearized, for example by digesting with an enzyme that biocatalytic conversion was initiated by adding (a) 20 g/l of recognizes a unique restriction site within promoter 1, or by methyl myristate, (b) 20 g/l oleic acid or (c) 10 g/l linoleic PCR amplification, or by any other method, so that a portion acid. During the conversion phase, 80% glycerol was fed as of promoter 1 is at one end of the linearized construct (5' part), co-substrate for conversion of methyl myristate and 500 g/1 and the remainder at the other end (3' end). Three positions 35 glucose was fed as co-substrate for conversion of oleic acid (A, B and C) are marked in Promoter 1, these refer to the and linoleic acid by dissolved oxygen-stat control mode (the positions in FIG. 21. 
Part B shows the intact Promoter 1 in the high limit of dissolved oxygen was 75% and low limit of yeast genome, followed by the gene that is normally tran dissolved oxygen was 70%, which means glycerol feeding scribed from Promoter 1 (genomic gene expressed from pro was initiated when dissolved oxygen is higher than 75% and moter 1). Three positions (A, B and C) are also marked in the 40 stopped when dissolved oxygen was lower than 70%). Every genomic copy of Promoter 1. Part C shows the genome after 12 hour, ethanol was added into cell culture to 2 g/l, and fatty integration of the construct. The construct integrates at posi acid substrate was added to 20 g/1 until the total substrate tion B in Promoter 1, the site at which the construct was concentration added was (a) 60 g/l of methyl myristate, (b) 60 linearized. This results in a duplication of promoter 1 in the g/l oleic acid or (c)30 g/l linoleic acid. Formation of products genome, with one copy of the promoter driving transcription 45 was measured at the indicated intervals by taking samples and of the introduced gene for expression and the other copy acidifying to ph ~1.0 by addition of 6 nhcl: products were driving the transcription of the genomic gene expressed from extracted from the cell culture by diethyl ether and the con promoter 1. centrations of ()-hydroxy fatty acids and C.co-dicarboxylic FIG. 23 shows a specific embodiment of the DNA acids were measured by lc-ms (liquid chromatography mass “genomic insertion' construct shown in FIG. 21. The general 50 spectroscopy). structure is that the construct has a gene for expression which FIG. 26 shows the levels of C,c)-dicarboxylic acids and is preceded by a promoter that is active in the yeast (the ()-hydroxy fatty acids produced by Candida tropicalis strain Candida tropicalis isocitrate lyase promoter). The isocitrate dp428 (see table 3 for genotype) in two separate fermentor lyase promoter comprises a unique BsiWI site whereby the runs. C. 
Tropicalis dp428 was taken from a glycerol stock or construct may be cleaved by endocunclease BsiWI once to 55 fresh agar plate and inoculated into 500 ml shake flask con linearize it. The gene for expression is followed by a tran taining 30 ml of ypd medium (20 g/l glucose, 20 g/l peptone Scription terminator (isocitrate lyase transcription termina and 10 g/l yeast extract) and shaken at 30 c. 250 rpm for 20 tor). The genomic insertion construct also comprises a select hours. Cells were collected by centrifugation and re-sus able marker conferring resistance to the antibiotic Zeocin. pended infm3 medium for inoculation. (fm3 medium is 30 g/1 This selectable marker is active in both bacterial and yeast 60 glucose, 7 g/l ammonium sulfate, 5.1 g/l potassium phos hosts and preceded by a yeast promoter (the TEF1 promoter) phate, monobasic, 0.5 g/l magnesium Sulfate, 0.1 g/l calcium and a Bacterial promoter (the EM7 promoter), and followed chloride, 0.06 g/l citric acid, 0.023 g/l ferric chloride, 0.0002 by a transcription terminator (the CYC1 transcription termi g/l biotin and 1 ml/l of a trace elements solution. The trace nator 2). The genomic insertion construct also comprises a elements solution contains 0.9 g/l boric acid, 0.07 g/l cupric bacterial origin of replication (the puC origin of replication). 65 sulfate, 0.18 g/l potassium iodide, 0.36 g/l ferric chloride, FIG. 24 shows the levels of C,c)-dicarboxylic acids and 0.72 g/l manganese sulfate, 0.36 g/l sodium molybdate, 0.72 ()-hydroxy fatty acids produced by Candida tropicalis strains g/l Zinc sulfate.) Conversion was performed by inoculating 15 US 9,359,581 B2 17 18 ml of preculture into 135 ml fm3 medium, methyl myristate that embodiment in which each alternative is excluded singly was added to 20 g/l and the temperature was kept at 30c. 
The or in any combination with the other alternatives are also ph was maintained at 6.0 by automatic addition of 6 m naoh or hereby disclosed; more than one element of a disclosed 2 m hSo Solution. Dissolved oxygen was kept at 70% by embodiment can have Such exclusions, and all combinations agitation and o-cascade control mode. After six hour growth, of elements having such exclusions are hereby disclosed. ethanol was fed into the cell culture to 5 g/l. During the Unless defined otherwise herein, all technical and scien conversion phase, 80% glycerol was fed as co-substrate by tific terms used herein have the same meaning as commonly dissolved oxygen-stat control mode (the high limit of dis understood by one of ordinary skill in the art. Singleton et al., solved oxygen was 75% and low limit of dissolved oxygen Dictionary of Microbiology and Molecular Biology, 2nd Ed., was 70%, which means glycerol feeding was initiated when 10 John Wiley and Sons, New York, 1994, and Hale & Marham, dissolved oxygen is higher than 75% and stopped when dis The Harper Collins Dictionary of Biology, Harper Perennial, solved oxygen was lower than 70%). Every 12 hour, ethanol NY. 1991, provide one of ordinary skill in the art with a was added into cell culture to 2 g/l, and methyl myristate was general dictionary of many of the terms used herein. Although added to 40 g/1 until the total methyl myristate added was 140 any methods and materials similar or equivalent to those g/l (e.g. the initial 20 g/l plus 3 Subsequent 40 g/l additions). 15 described herein can be used in the practice or testing of the Formation of products was measured at the indicated inter disclosed embodiments, the preferred methods and materials vals by taking samples and acidifying to ph ~1.0 by addition are described. 
Unless otherwise indicated, nucleic acids are of 6 nhcl: products were extracted from the cell culture by written left to right in 5' to 3' orientation; amino acid diethyl ether and the concentrations of co-hydroxy myristate sequences are written left to right in amino to carboxy orien and am-dicarboxymyristate were measured by lc-ms (liquid tation, respectively. The terms defined immediately below are chromatography mass spectroscopy). more fully defined by reference to the specification as a FIG. 27 shows the red fluorescent protein mGherry pro whole. duced by Candida tropicalis strain DP197 (see Table 3 for As used, herein, computation of percent identity takes full genotypes). Cultures of the yeast strains were grown at 30°C. weight of any insertions in two sequences for which percent on plates containing Buffered Minimal Medium+0.5% Glu 25 identity is computed. To compute percent identity between cose, 0.5% Glycerol, and 0.5% EtOH. two sequences, they are aligned and any necessary insertions in either sequence being compared are then made in accor 5. DETAILED DESCRIPTION dance with sequence alignment algorithms known in the art. Then, the percent identity is computed, where each insertion It is to be understood that what is disclosed herein is not 30 in either sequence necessary to make the optimal alignment limited to the particular methodology, devices, solutions or between the two sequences is counted as a mismatch. apparatuses described, as such methods, devices, solutions or The terms “polynucleotide.” “oligonucleotide,” “nucleic apparatuses can, of course, vary. acid and “nucleic acid molecule' and “gene' are used inter changeably herein to refer to a polymeric form of nucleotides 5.1. Definitions 35 of any length, and may comprise ribonucleotides, deoxyribo nucleotides, analogs thereof, or mixtures thereof. This term Use of the singular forms “a,” “an and “the include refers only to the primary structure of the molecule. 
Thus, the plural references unless the context clearly dictates other term includes triple-, double- and single-stranded deoxyribo wise. Thus, for example, reference to “a polynucleotide' nucleic acid (“DNA), as well as triple-, double- and single includes a plurality of polynucleotides, reference to “a sub 40 stranded ribonucleic acid (“RNA). It also includes modified, strate' includes a plurality of such substrates, reference to “a for example by alkylation, and/or by capping, and unmodified variant' includes a plurality of variants, and the like. forms of the polynucleotide. More particularly, the terms Terms such as “connected,” “attached,” “linked, and “con "polynucleotide.” “oligonucleotide.” “nucleic acid and jugated are used interchangeably herein and encompass “nucleic acid molecule' include polydeoxyribonucleotides direct as well as indirect connection, attachment, linkage or 45 (containing 2-deoxy-D-ribose), polyribonucleotides (con conjugation unless the context clearly dictates otherwise. taining D-ribose), including tRNA, rRNA, hRNA, siRNA and Where a range of values is recited, it is to be understood that mRNA, whether spliced or unspliced, any other type of poly each intervening integer value, and each fraction thereof, nucleotide which is an N- or C-glycoside of a purine or between the recited upper and lower limits of that range is pyrimidine base, and other polymers containing nonnucleo also specifically disclosed, along with each Subrange between 50 tidic backbones, for example, polyamide (e.g., peptide Such values. The upper and lower limits of any range can nucleic acids (“PNAs)) and polymorpholino (commercially independently be included in or excluded from the range, and available from the Anti-Virals, Inc., Corvallis, Oreg., as Neu each range where either, neither or both limits are included is gene) polymers, and other synthetic sequence-specific also encompassed in the disclosed embodiments. 
Where a nucleic acid polymers providing that the polymers contain value being discussed has inherent limits, for example where 55 nucleobases in a configuration which allows for base pairing a component can be present at a concentration of from 0 to and base stacking, such as is found in DNA and RNA. There 100%, or where the pH of an aqueous solution can range from is no intended distinction in length between the terms “poly 1 to 14, those inherent limits are specifically disclosed. Where nucleotide.” “oligonucleotide.” “nucleic acid' and “nucleic a value is explicitly recited, it is to be understood that values acid molecule.” and these terms are used interchangeably which are about the same quantity or amount as the recited 60 herein. These terms refer only to the primary structure of the value are also encompassed. Where a combination is dis molecule. Thus, these terms include, for example, 3'-deoxy closed, each subcombination of the elements of that combi 2", 5'-DNA, oligodeoxyribonucleotide N3' P5' phosphorami nation is also specifically disclosed and is within the scope of dates. 2'-O-alkyl-substituted RNA, double- and single the disclosed embodiments. Conversely, where different ele stranded DNA, as well as double- and single-stranded RNA, ments or groups of elements are individually disclosed, com 65 and hybrids thereof including for example hybrids between binations thereof are also disclosed. Where any embodiment DNA and RNA or between PNAS and DNA or RNA, and also is disclosed as having a plurality of alternatives, examples of include known types of modifications, for example, labels, US 9,359,581 B2 19 20 alkylation, “caps, substitution of one or more of the nucle unique base pairs are known, such as those described in Leach otides with an analog, internucleotide modifications such as, et al., 1992, J. Am. Chem. Soc.114:3675-3683 and Switzer et for example, those with uncharged linkages (e.g., methyl al., Supra. 
phosphonates, phosphotriesters, phosphoramidates, carbam The phrase “DNA sequence” refers to a contiguous nucleic ates, etc.), with negatively charged linkages (e.g., phospho- 5 acid sequence. The sequence can be either single stranded or rothioates, phosphorodithioates, etc.), and with positively double stranded, DNA or RNA, but double stranded DNA charged linkages (e.g., aminoalkylphosphoramidates, ami sequences are preferable. The sequence can be an oligonucle noalkylphosphotriesters), those containing pendant moieties, otide of 6 to 20 nucleotides in length to a full length genomic Such as, for example, proteins (including enzymes (e.g. sequence of thousands or hundreds of thousands of base pairs. nucleases), toxins, antibodies, signal peptides, poly-L-lysine, 10 DNA sequences are written from 5' to 3' unless otherwise etc.), those with intercalators (e.g., acridine, psoralen, etc.), indicated. those containing chelates (of, e.g., metals, radioactive metals, The term “protein’ refers to contiguous “amino acids” or boron, oxidative metals, etc.), those containing alkylators, amino acid “residues. Typically, proteins have a function. those with modified linkages (e.g., alpha anomeric nucleic 15 However, for purposes of this disclosure, proteins also acids, etc.), as well as unmodified forms of the polynucleotide encompass polypeptides and Smaller contiguous amino acid or oligonucleotide. sequences that do not have a functional activity. The func Where the polynucleotides are to be used to express tional proteins of this disclosure include, but are not limited encoded proteins, nucleotides that can perform that function to, esterases, dehydrogenases, hydrolases, oxidoreductases, or which can be modified (e.g., reverse transcribed) to per- 20 transferases, lyases, ligases, receptors, receptor ligands, form that function are used. 
Where the polynucleotides are to cytokines, antibodies, immunomodulatory molecules, signal be used in a scheme that requires that a complementary strand ing molecules, fluorescent proteins and proteins with insec be formed to a given polynucleotide, nucleotides are used ticidal or biocidal activities. Useful general classes of which permit such formation. enzymes include, but are not limited to, proteases, cellulases, It will be appreciated that, as used herein, the terms 25 lipases, hemicellulases, laccases, amylases, glucoamylases, “nucleoside' and “nucleotide' will include those moieties esterases, lactases, polygalacturonases, galactosidases, ligni which contain not only the known purine and pyrimidine nases, oxidases, peroxidases, glucose isomerases, nitrilases, bases, but also other heterocyclic bases which have been hydroxylases, polymerases and depolymerases. In addition to modified. Such modifications include methylated purines or enzymes, the encoded proteins which can be used in this pyrimidines, acylated purines or pyrimidines, or otherhetero- 30 disclosure include, but are not limited to, transcription fac cycles. Modified nucleosides or nucleotides can also include modifications on the sugar moiety, e.g., where one or more of tors, antibodies, receptors, growth factors (any of the PDGFs, the hydroxyl groups are replaced with halogen, aliphatic EGFs, FGFs, SCF, HGF, TGFs, TNFs, insulin, IGFs, LIFs, groups, or is functionalized as ethers, amines, or the like. 
oncostatins, and CSFs), immunomodulators, peptide hor Standard A-T and G-C base pairs form under conditions 35 mones, cytokines, integrins, interleukins, adhesion mol which allow the formation of hydrogen bonds between the ecules, thrombomodulatory molecules, protease inhibitors, N3-H and C4-oxy of thymidine and the NI and C6-NH2, angiostatins, defensins, cluster of differentiation antigens, respectively, of adenosine and between the C2-oxy, N3 and interferons, chemokines, antigens including those from infec C4-NH2, of cytidine and the C2-NH2, N H and C6-oxy, tious viruses and organisms, oncogene products, thrombopoi respectively, of guanosine. Thus, for example, guanosine 40 etin, erythropoietin, tissue plasminogen activator, and any (2-amino-6-oxy-9-beta-D-ribofuranosyl-purine) may be other biologically active protein which is desired for use in a modified to form isoguanosine (2-oxy-6-amino-9-beta-D-ri clinical, diagnostic or veterinary setting. All of these proteins bofuranosyl-purine). Such modification results in a nucleo are well defined in the literature and are so defined herein. side base which will no longer effectively form a standard Also included are deletion mutants of Such proteins, indi base pair with cytosine. However, modification of cytosine 45 vidual domains of Such proteins, fusion proteins made from (1-beta-D-ribofuranosyl-2-oxy-4-amino-pyrimidine) to form Such proteins, and mixtures of Such proteins; particularly isocytosine (1-beta-D-ribofuranosyl-2-amino-4-oxy-pyrimi useful are those which have increased half-lives and/or dine-) results in a modified nucleotide which will not effec increased activity. tively base pair with guanosine but will form a base pair with “Polypeptide' and “protein’ are used interchangeably isoguanosine (U.S. Pat. No. 5,681.702 to Collins et al., hereby 50 herein and include a molecular chain of amino acids linked incorporated by reference in its entirety). Isocytosine is avail through peptide bonds. 
The terms do not refer to a specific able from Sigma Chemical Co. (St. Louis, Mo.); isocytidine length of the product. Thus, “peptides.” “oligopeptides, and may be prepared by the method described by Switzer et al. “proteins' are included within the definition of polypeptide. (1993) Biochemistry 32: 10489-10496 and references cited The terms include polypeptides containing in co-and/or post therein; 2'-deoxy-5-methyl-isocytidine may be prepared by 55 translational modifications of the polypeptide made in vivo or the method of Tor et al., 1993, J. Am. Chem. Soc. 115:4461 in vitro, for example, glycosylations, acetylations, phospho 4467 and references cited therein; and isoguanine nucleotides rylations, PEGylations and Sulphations. In addition, protein may be prepared using the method described by Switzer et al., fragments, analogs (including amino acids not encoded by the 1993, supra, and Mantsch et al., 1993, Biochem. 14:5593 genetic code, e.g. homocysteine, ornithine, p-acetylphenyla 5601, or by the method described in U.S. Pat. No. 5,780,610 60 lanine, D-amino acids, and creatine), natural or artificial to Collins et al., each of which is hereby incorporated by mutants or variants or combinations thereof, fusion proteins, reference in its entirety. Other nonnatural base pairs may be derivatized residues (e.g. alkylation of amine groups, acety synthesized by the method described in Piccirilliet al., 1990, lations or esterifications of carboxyl groups) and the like are Nature 343:33-37, hereby incorporated by reference in its included within the meaning of polypeptide. entirety, for the synthesis of 2,6-diaminopyrimidine and its 65 “Amino acids” or "amino acid residues' may be referred to complement (1-methylpyrazolo-4.3 pyrimidine-5,7-(4H, herein by either their commonly known three letter symbols 6H)-dione. 
Other such modified nucleotidic units which form or by the one-letter symbols recommended by the IUPAC US 9,359,581 B2 21 22 IUB Biochemical Nomenclature Commission. Nucleotides, In some embodiments, a gene is deemed to be disrupted likewise, may be referred to by their commonly accepted when the disrupted gene expresses protein in a first host cell single-letter codes. organism that contains the disrupted gene in amounts that are The term “expression system” refers to any in vivo or in 20% or less than the amounts of protein expressed by the wild vitro biological system that is used to produce one or more 5 type counterpart of the gene in a second host cell organism protein encoded by a polynucleotide. that does not contain the disrupted gene, when the first host The term “translation” refers to the process by which a cell organism and the second host cell organism are under the polypeptide is synthesized by a ribosome reading the same environmental conditions (e.g., same temperature, sequence of a polynucleotide. same media, etc.). 10 In some embodiments, a gene is deemed to be disrupted In some embodiments, the term “disrupt” means to reduce when the disrupted gene expresses protein in a first host cell or diminish the expression of a gene in a host cell organism. organism that contains the disrupted gene in amounts that are In some embodiments, the term “disrupt” means to reduce 30% or less than the amounts of protein expressed by the wild or diminish a function of a protein encoded by a gene in a host type counterpart of the gene in a second host cell organism cell organism. 
This function may be, for example, an enzy 15 that does not contain the disrupted gene, when the first host matic activity of the protein, a specific enzymatic activity of cell organism and the second host cell organism are under the the protein, a protein-protein interaction that the protein same environmental conditions (e.g., same temperature, undergoes in a host cell organism, or a protein-nucleic acid same media, etc.). interaction that the protein undergoes in a host cell organism. In some embodiments, a gene is deemed to be disrupted In some embodiments, the term “disrupt” means to elimi when the disrupted gene expresses protein in a first host cell nate the expression of a gene in a host cell organism. organism that contains the disrupted gene in amounts that are In some embodiments, the term “disrupt” means to elimi 40% or less than the amounts of protein expressed by the wild nate the function of a protein encoded by a gene in a host cell type counterpart of the gene in a second host cell organism organism. This function may be, for example, an enzymatic that does not contain the disrupted gene, when the first host activity of the protein, a specific enzymatic activity of the 25 cell organism and the second host cell organism are under the protein, a protein-protein interaction that the protein under same environmental conditions (e.g., same temperature, goes in a host cell organism, or a protein-nucleic acid inter same media, etc.). action that the protein undergoes in a host cell organism. 
In some embodiments, a gene is deemed to be disrupted In Some embodiments, the term "disrupt” means to cause a when the disrupted gene expresses protein in a first host cell protein encoded by a gene in a host cell organism to have a 30 organism that contains the disrupted gene in amounts that are modified activity spectrum (e.g., reduced enzymatic activity) 50% or less than the amounts of protein expressed by the wild relative to wild-type activity spectrum of the protein. type counterpart of the gene in a second host cell organism In some embodiments, disruption is caused by mutating a that does not contain the disrupted gene, when the first host gene in a host cell organism that encodes a protein. For cell organism and the second host cell organism are under the example, a point mutation, an insertion mutation, a deletion 35 same environmental conditions (e.g., same temperature, mutation, or any combination of such mutations, can be used same media, etc.). to disrupt the gene. In some embodiments, this mutation In some embodiments, a gene is deemed to be disrupted causes the protein encoded by the gene to express poorly or when the disrupted gene expresses protein in a first host cell not at all in the host cell organism. In some embodiments, this organism that contains the disrupted gene in amounts that are mutation causes the gene to no longer be present in the host 40 60% or less than the amounts of protein expressed by the wild cell organism. In some embodiments, this mutation causes type counterpart of the gene in a second host cell organism the gene to no longer encode a functional protein in the host that does not contain the disrupted gene, when the first host cell organism. 
The mutation to the gene may be in the portion cell organism and the second host cell organism are under the of the gene that encodes a protein product (exon), it may be in same environmental conditions (e.g., same temperature, any of the regulatory sequences (e.g., promoter, enhancer, 45 same media, etc.). etc.) that regulate the expression of the gene, or it may arise in In some embodiments, a gene is deemed to be disrupted an intron. when the disrupted gene expresses protein in a first host cell In some embodiments, the disruption (e.g., mutation) of a organism that contains the disrupted gene in amounts that are gene causes the protein encoded by the gene to have a muta 70% or less than the amounts of protein expressed by the wild tion that diminishes a function of the protein relative to the 50 type counterpart of the gene in a second host cell organism function of the wild type counterpart of the mutated protein. that does not contain the disrupted gene, when the first host As used, herein, the wild type counterpart of a mutated cell organism and the second host cell organism are under the protein is the unmutated protein, occurring in wild type host same environmental conditions (e.g., same temperature, cell organism, which corresponds to the mutated protein. For same media, etc.). example, if the mutated protein is a protein encoded by 55 In some embodiments, a gene is deemed to be disrupted mutated Candida tropicalis PDX 5, the wildtype counterpart when the abundance of mRNA transcripts that encode the of the mutated protein is the gene product from naturally disrupted gene in a first host cell organism that has the dis occurring Candida tropicalis PDX5 that is not mutated. 
rupted gene are 20% or less than the abundance of mRNA As used herein, the wildtype counterpart of a mutated gene transcripts that encode the gene in second wild type host cell is the unmutated gene occurring in wild type host cell organ 60 organism that does not contain the disrupted gene when the ism, which corresponds to the mutated gene. For example, if first host cell organism and the second host cell organism are the mutated gene is Candida tropicalis PDX 5 containing a under the same environmental conditions (e.g., temperature, point mutation, the wild type counterpart is Candida tropica media, etc.). lis PDX5 without the point mutation. In some embodiments, a gene is deemed to be disrupted In some embodiments, a gene is deemed to be disrupted 65 when the abundance of mRNA transcripts that encode the when the gene is not capable of expressing protein in the host disrupted gene in a first host cell organism that has the dis cell organism. rupted gene are 30% or less than the abundance of mRNA US 9,359,581 B2 23 24 transcripts that encode the gene in second wild type host cell the protein are under the same conditions (e.g., temperature, organism that does not contain the disrupted gene when the concentration, pH, concentration of Substrate, salt concentra first host cell organism and the second host cell organism are tion, etc.). under the same environmental conditions (e.g., temperature, In some embodiments, a protein is deemed to be disrupted media, etc.). 
when the protein has an enzymatic activity that is 60% or less In some embodiments, a gene is deemed to be disrupted than the activity of the wild type counterpart of the protein when the abundance of mRNA transcripts that encode the when the disrupted protein and the wild type counterpart of disrupted gene in a first host cell organism that has the dis the protein are under the same conditions (e.g., temperature, rupted gene are 40% or less than the abundance of mRNA concentration, pH, concentration of Substrate, salt concentra transcripts that encode the gene in second wild type host cell 10 tion, etc.). organism that does not contain the disrupted gene when the In some embodiments, a protein is deemed to be disrupted first host cell organism and the second host cell organism are when the protein has an enzymatic activity that is 70% or less under the same environmental conditions (e.g., temperature, than the activity of the wild type counterpart of the protein media, etc.). 15 when the disrupted protein and the wild type counterpart of In some embodiments, a gene is deemed to be disrupted the protein are under the same conditions (e.g., temperature, when the abundance of mRNA transcripts that encode the concentration, pH, concentration of Substrate, salt concentra disrupted gene in a first host cell organism that has the dis tion, etc.). rupted gene are 50% or less than the abundance of mRNA In some embodiments enzymatic activity is defined as transcripts that encode the gene in second wild type host cell moles of substrate converted per unit time-ratexreaction vol organism that does not contain the disrupted gene when the ume. Enzymatic activity is a measure of the quantity of active first host cell organism and the second host cell organism are enzyme present and is thus dependent on conditions, which under the same environmental conditions (e.g., temperature, are to be specified. The SI unit for enzyme activity is the katal, media, etc.). 1 katal=1 mol s-1. 
In some embodiments, a gene is deemed to be disrupted 25 In some embodiments enzymatic activity is expressed as an when the abundance of mRNA transcripts that encode the enzyme unit (EU)=130 umol/min, where 1 U corresponds to disrupted gene in a first host cell organism that has the dis 16.67 nanokatals. See Nomenclature Committee of the Inter rupted gene are 60% or less than the abundance of mRNA national Union of Biochemistry (NC-IUB) (1979), “Units of transcripts that encode the gene in second wild type host cell Enzyme Activity.” Eur. J. Biochem. 97: 319-320, which is organism that does not contain the disrupted gene when the 30 hereby incorporated by reference herein. first host cell organism and the second host cell organism are In some embodiments, a protein is deemed to be disrupted under the same environmental conditions (e.g., temperature, when a sample of the disrupted protein “disrupted sample” media, etc.). having a purity of 50% weight per weight (w/w) or weight per In some embodiments, a gene is deemed to be disrupted volume (w/v) or greater, a purity of 55% (w/w or w/v) or when the abundance of mRNA transcripts that encode the 35 greater, a purity of 60% (w/w or w/v) or greater, a purity of disrupted gene in a first host cell organism that has the dis 65% (w/w or w/v) or greater, a purity of 70% (w/w or w/v) or rupted gene are 70% or less than the abundance of mRNA greater, a purity of 75% (w/w or w/v) or greater, a purity of transcripts that encode the gene in second wild type host cell 80% (w/w or w/v) or greater, a purity of 85% (w/w or w/v) or organism that does not contain the disrupted gene when the greater, a purity of 90% (w/w or w/v) or greater, a purity of first host cell organism and the second host cell organism are 40 95% (w/w or w/v) or greater, a purity of 99% (w/w or w/v) or under the same environmental conditions (e.g., temperature, greater in the disrupted Sample has a specific enzymatic activ media, etc.). 
ity that is 20% or less than the specific enzymatic activity of In some embodiments, a protein is deemed to be disrupted a sample of the wildtype counterpart of the protein “wildtype when the protein has an enzymatic activity that is 20% or less sample' in which the purity of the wildtype counterpart of the than the activity of the wild type counterpart of the protein 45 protein in the wild type sample is the same as or greater than when the disrupted protein and the wild type counterpart of the purity of the disrupted protein in the disrupted protein the protein are under the same conditions (e.g., temperature, sample, wherein disrupted protein sample and the sample concentration, pH, concentration of Substrate, salt concentra wild type sample are under the same conditions (e.g., tem tion, etc.). perature, concentration, pH, concentration of Substrate, salt In some embodiments, a protein is deemed to be disrupted 50 concentration, etc.). when the protein has an enzymatic activity that is 30% or less In some embodiments, a protein is deemed to be disrupted than the activity of the wild type counterpart of the protein when a sample of the disrupted protein “disrupted sample' when the disrupted protein and the wild type counterpart of having a purity of 50% (w/w or w/v) or greater, a purity of the protein are under the same conditions (e.g., temperature, 55% (w/w or w/v) or greater, a purity of 60% (w/w or w/v) or concentration, pH, concentration of Substrate, salt concentra 55 greater, a purity of 65% (w/w or w/v) or greater, a purity of tion, etc.). 
70% (w/w or w/v) or greater, a purity of 75% (w/w or w/v) or In some embodiments, a protein is deemed to be disrupted greater, a purity of 80% (w/w or w/v) or greater, a purity of when the protein has an enzymatic activity that is 40% or less 85% (w/w or w/v) or greater, a purity of 90% (w/w or w/v) or than the activity of the wild type counterpart of the protein greater, a purity of 95% (w/w or w/v) or greater, a purity of when the disrupted protein and the wild type counterpart of 60 99% (w/w or w/v) or greater in the disrupted sample has a the protein are under the same conditions (e.g., temperature, specific enzymatic activity that is 30% or less than the specific concentration, pH, concentration of Substrate, salt concentra enzymatic activity of a sample of the wild type counterpart of tion, etc.). the protein “wild type sample in which the purity of the wild In some embodiments, a protein is deemed to be disrupted type counterpart of the protein in the wild type sample is the when the protein has an enzymatic activity that is 50% or less 65 same as or greater than the purity of the disrupted protein in than the activity of the wild type counterpart of the protein the disrupted protein sample, wherein disrupted protein when the disrupted protein and the wild type counterpart of sample and the sample wild type sample are under the same US 9,359,581 B2 25 26 conditions (e.g., temperature, concentration, pH, concentra greater, a purity of 95% (w/w or w/v) or greater, a purity of tion of Substrate, Salt concentration, etc.). 
99% (w/w or w/v) or greater in the disrupted sample has a In some embodiments, a protein is deemed to be disrupted specific enzymatic activity that is 70% or less than the specific when a sample of the disrupted protein “disrupted sample' enzymatic activity of a sample of the wild type counterpart of having a purity of 50% (w/w or w/v) or greater, a purity of 5 the protein “wild type sample in which the purity of the wild 55% (w/w or w/v) or greater, a purity of 60% (w/w or w/v) or type counterpart of the protein in the wild type sample is the greater, a purity of 65% (w/w or w/v) or greater, a purity of same as or greater than the purity of the disrupted protein in 70% (w/w or w/v) or greater, a purity of 75% (w/w or w/v) or the disrupted protein sample, wherein disrupted protein greater, a purity of 80% (w/w or w/v) or greater, a purity of sample and the sample wild type sample are under the same 85% (w/w or w/v) or greater, a purity of 90% (w/w or w/v) or 10 conditions (e.g., temperature, concentration, pH, concentra greater, a purity of 95% (w/w or w/v) or greater, a purity of tion of Substrate, Salt concentration, etc.). 99% (w/w or w/v) or greater in the disrupted sample has a In Some embodiments, the enzymatic activity or enzymatic specific enzymatic activity that is 40% or less than the specific specific activity is measured by an assay that measures the enzymatic activity of a sample of the wildtype counterpart of consumption of Substrate or the production of product over the protein “wild type sample in which the purity of the wild 15 time such as those disclosed in Schnell et al., 2006, Comptes type counterpart of the protein in the wild type sample is the Rendus Biologies 329, 51-61, which is hereby incorporated same as or greater than the purity of the disrupted protein in by reference herein. 
the disrupted protein sample, wherein disrupted protein In Some embodiments, the enzymatic activity or enzymatic sample and the sample wild type sample are under the same specific activity is measured by an initial rate experiment. In conditions (e.g., temperature, concentration, pH, concentra Such an assay, the protein (enzyme) is mixed with a large tion of Substrate, Salt concentration, etc.). excess of the Substrate, the enzyme-substrate intermediate In some embodiments, a protein is deemed to be disrupted builds up in a fast initial transient. Then the reaction achieves when a sample of the disrupted protein “disrupted sample' a steady-state kinetics in which enzyme Substrate intermedi having a purity of 50% (w/w or w/v) or greater, a purity of ates remains approximately constant over time and the reac 55% (w/w or w/v) or greater, a purity of 60% (w/w or w/v) or 25 tion rate changes relatively slowly. Rates are measured for a greater, a purity of 65% (w/w or w/v) or greater, a purity of short period after the attainment of the quasi-steady state, 70% (w/w or w/v) or greater, a purity of 75% (w/w or w/v) or typically by monitoring the accumulation of product with greater, a purity of 80% (w/w or w/v) or greater, a purity of time. Because the measurements are carried out for a very 85% (w/w or w/v) or greater, a purity of 90% (w/w or w/v) or short period and because of the large excess of Substrate, the greater, a purity of 95% (w/w or w/v) or greater, a purity of 30 approximation free Substrate is approximately equal to the 99% (w/w or w/v) or greater in the disrupted sample has a initial substrate can be made. The initial rate experiment is specific enzymatic activity that is 50% or less than the specific relatively free from complications such as back-reaction and enzymatic activity of a sample of the wildtype counterpart of enzyme degradation. 
the protein “wild type sample in which the purity of the wild In Some embodiments, the enzymatic activity or enzymatic type counterpart of the protein in the wild type sample is the 35 specific activity is measured by progress curve experiments. same as or greater than the purity of the disrupted protein in In Such experiments, the kinetic parameters are determined the disrupted protein sample, wherein disrupted protein from expressions for the species concentrations as a function sample and the sample wild type sample are under the same of time. The concentration of the substrate or product is conditions (e.g., temperature, concentration, pH, concentra recorded in time after the initial fast transient and for a suffi tion of Substrate, Salt concentration, etc.). 40 ciently long period to allow the reaction to approach equilib In some embodiments, a protein is deemed to be disrupted rium. when a sample of the disrupted protein “disrupted sample' In Some embodiments, the enzymatic activity or enzymatic having a purity of 50% (w/w or w/v) or greater, a purity of specific activity is measured by transient kinetics experi 55% (w/w or w/v) or greater, a purity of 60% (w/w or w/v) or ments. In Such experiments, reaction behaviour is tracked greater, a purity of 65% (w/w or w/v) or greater, a purity of 45 during the initial fast transient as the intermediate reaches the 70% (w/w or w/v) or greater, a purity of 75% (w/w or w/v) or steady-state kinetics period. greater, a purity of 80% (w/w or w/v) or greater, a purity of In Some embodiments, the enzymatic activity or enzymatic 85% (w/w or w/v) or greater, a purity of 90% (w/w or w/v) or specific activity is measured by relaxation experiments. 
In greater, a purity of 95% (w/w or w/v) or greater, a purity of these experiments, an equilibrium mixture of enzyme, Sub 99% (w/w or w/v) or greater in the disrupted sample has a 50 strate and product is perturbed, for instance by a temperature, specific enzymatic activity that is 60% or less than the specific pressure or pH jump, and the return to equilibrium is moni enzymatic activity of a sample of the wildtype counterpart of tored. The analysis of these experiments requires consider the protein “wild type sample in which the purity of the wild ation of the fully reversible reaction. type counterpart of the protein in the wild type sample is the In Some embodiments, the enzymatic activity or enzymatic same as or greater than the purity of the disrupted protein in 55 specific activity is measured by continuous assays, where the the disrupted protein sample, wherein disrupted protein assay gives a continuous reading of activity, or discontinuous sample and the sample wild type sample are under the same assays, where samples are taken, the reaction stopped and conditions (e.g., temperature, concentration, pH, concentra then the concentration of substrates/products determined. tion of Substrate, Salt concentration, etc.). In Some embodiments, the enzymatic activity or enzymatic In some embodiments, a protein is deemed to be disrupted 60 specific activity is measured by a fluorometric assay (e.g., when a sample of the disrupted protein “disrupted sample' Bergmeyer, 1974, "Methods of Enzymatic Analysis”, Vol. 4, having a purity of 50% (w/w or w/v) or greater, a purity of Academic Press, New York, N.Y., 2066-2072), a calorimetric 55% (w/w or w/v) or greater, a purity of 60% (w/w or w/v) or assay (e.g., Todd and Gomez, 2001, Anal Biochem. 
296, greater, a purity of 65% (w/w or w/v) or greater, a purity of 179-187), a chemiluminescent assay, a light scattering assay, 70% (w/w or w/v) or greater, a purity of 75% (w/w or w/v) or 65 a radiometric assay, or a chromatrographic assay (e.g., greater, a purity of 80% (w/w or w/v) or greater, a purity of Churchwella et al., 2005, Journal of Chromatography B825, 85% (w/w or w/v) or greater, a purity of 90% (w/w or w/v) or 134-143). US 9,359,581 B2 27 28 In some embodiments, a protein is deemed to be disrupted 10% (wt/vol) dextran sulfate, and 5-20x106 cpm32P-labeled when the protein has a function whose performance is 20% or probe. Filters are incubated in hybridization mixture for less than the function of the wild type counterpart of the 18-20 hours at 40°C., and then washed for 1.5 hour at 55° C. protein when the disrupted protein and the wild type counter in a solution containing 2xSSC, 25 mM Tris-HCl (pH 7.4), 5 part of the protein are under the same conditions (e.g., tem mM EDTA, and 0.1% SDS. The wash solution is replaced perature, concentration, pH, concentration of Substrate, salt with fresh solution and incubated an additional 1.5 hour at 60° concentration, etc.). C. Filters are blotted dry and exposed for autoradiography. If In some embodiments, a protein is deemed to be disrupted necessary, filters are washed for a third time at 65-68°C. and when the protein has a function whose performance is 30% or reexposed to film. Other conditions of low stringency that less than the function of the wild type counterpart of the 10 may be used are well known in the art (e.g., as employed for protein when the disrupted protein and the wild type counter cross-species hybridizations). part of the protein are under the same conditions (e.g., tem In some embodiments, the invention relates to nucleic perature, concentration, pH, concentration of Substrate, salt acids under conditions of moderate stringency (moderately concentration, etc.). stringent conditions). 
As used herein, conditions of moderate In some embodiments, a protein is deemed to be disrupted 15 stringency (moderately stringent conditions), are as known to when the protein has a function whose performance is 40% or those having ordinary skill in the art. Such conditions are also less than the function of the wild type counterpart of the defined by Sambrooketal. Molecular Cloning: A Laboratory protein when the disrupted protein and the wild type counter Manual, 2nd Ed. Vol. 1, pp. 1.101-104, Cold Spring Harbor part of the protein are under the same conditions (e.g., tem Laboratory Press, 1989, which is hereby incorporated by perature, concentration, pH, concentration of Substrate, salt reference herein in its entirety. They include, for example, use concentration, etc.). of a prewashing solution for the nitrocellulose filters 5xSSC, In some embodiments, a protein is deemed to be disrupted 0.5% SDS, 1.0 mM EDTA (pH 8.0), hybridization conditions when the protein has a function whose performance is 50% or of 50 percent formamide, 6xSSC at 42°C. (or other similar less than the function of the wild type counterpart of the hybridization solution, or Stark's solution, in 50% formamide protein when the disrupted protein and the wild type counter 25 at 42°C.), and washing conditions of about 60°C., 0.5xSSC, part of the protein are under the same conditions (e.g., tem 0.1% SDS. See also, Ausubel et al., eds., in the Current perature, concentration, pH, concentration of Substrate, salt Protocols in Molecular Biology series of laboratory tech concentration, etc.). nique manuals, (C) 1987-1997, Current Protocols, 1994 In some embodiments, a protein is deemed to be disrupted 1997, John Wiley and Sons, Inc., hereby incorporated by when the protein has a function whose performance is 60% or 30 reference herein in its entirety. 
The skilled artisan will recog less than the function of the wild type counterpart of the nize that the temperature, Salt concentration, and chaotrope protein when the disrupted protein and the wildtype counter composition of hybridization and wash solutions can be part of the protein are under the same conditions (e.g., tem adjusted as necessary according to factors such as the length perature, concentration, pH, concentration of Substrate, salt and nucleotide base composition of the probe. Other condi concentration, etc.). 35 tions of moderate Stringency that may be used are well known In some embodiments, a protein is deemed to be disrupted in the art. when the protein has a function whose performance is 70% or In some embodiments, the invention relates to nucleic less than the function of the wild type counterpart of the acids under conditions of high Stringency (high Stringent protein when the disrupted protein and the wild type counter conditions). As used herein conditions of high Stringency part of the protein are under the same conditions (e.g., tem 40 (high Stringent conditions) are as known to those having perature, concentration, pH, concentration of Substrate, salt ordinary skill in the art. By way of example and not limitation, concentration, etc.). procedures using Such conditions of high Stringency are as In some embodiments, a protein is disrupted by a genetic follows. Prehybridization offilters containing DNA is carried modification. In some embodiments, a protein is disrupted by out for 8 hours to overnight at 65 C in buffer composed of exposure of a host cell to a chemical (e.g., an inhibitor that 45 6xSSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP. substantially reduces or eliminates the activity of the 0.02% Ficoll, 0.02% BSA, and 500 mg/ml denatured salmon enzyme). In some embodiments, this compound satisfies the sperm DNA. 
Filters are hybridized for 48 hours at 65 C in Lipinski’s Rule of Five: 30 (i) not more than five hydrogen prehybridization mixture containing 100 mg/ml denatured bond donors (e.g., OH and NH groups), (ii) not more than ten salmon sperm DNA and 5-20x106 cpm of 32P-labeled probe. hydrogen bond acceptors (e.g. N and O), (iii) a molecular 50 Washing of filters is done at 37 C for one hour in a solution weight under 500 Daltons, and (iv) a Log P under 5. The containing 2xSSC, 0.01% PVP 0.01% Ficoll, and 0.01% "Rule of Five' is so called because three of the four criteria BSA. This is followed by a wash in 0.1xSSC at 50 C for 45 involve the number five. See, Lipinski, 1997, Adv. Drug Del. minutes before autoradiography. Other conditions of high Rev. 23, 3, which is hereby incorporated herein by reference stringency that may be used are well known in the art. in its entirety. 55 As used herein, computation of percent identity takes full In some embodiments, the invention relates to nucleic weight of any insertions in two sequences for which percent acids hybridized using conditions of low stringency (low identity is computed. To compute percent identity between stringency conditions). By way of example and not limitation, two sequences, they are aligned and any necessary insertions hybridization using Such conditions of low stringency are as in either sequence being compared are then made in accor follows (see also Shilo and Weinberg, 1981, Proc. Natl. Acad. 60 dance with sequence alignment algorithms known in the art. Sci. U.S.A. 78:6789-6792): filters containing DNA are pre Then, the percent identity is computed, where each insertion treated for 6 hours at 40° C. in a solution containing 35% in either sequence necessary to make the optimal alignment formamide, 5xSSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, between the two sequences is counted as a mismatch. 
Unless 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 mg/ml denatured explicitly indicated otherwise, the percent identity of two salmon sperm DNA. Hybridizations are carried out in the 65 sequences is the percent identity across the entire length of same solution with the following modifications: 0.02% PVP. each of the sequences being compared, with gaps insertions 0.02% Ficoll, 0.2% BSA, 100 mg g/ml salmon sperm DNA, processed as specified in this paragraph. US 9,359,581 B2 29 30 5.2. Enzymes to Derive and Utilize Sugar from Plant machines consisting cellulase catalytic modules, carbohy Cell Walls and Plant Starches drate binding domains that lock into the Substrate, and dock erins plus cohesions that serve to connect the catalytic and Many biofuel production pathways start from Sugars which carbohydrate binding domains to the surface of the bacterial are expensive and compete, directly or indirectly, with food cell that is expressing the cellulosome. crops. Commercially advantageous production pathways are 5.2.1.2. Hemicellulose: those that begin with cheaper raw materials such as agricul Hemicellulose is the second most abundant component of tural by-products, or agricultural products that require mini plant cell walls. Hemicelluloses are heterogeneous polymers mal processing for example cell wall material. built up by many different Sugar monomers. In contrast, cel In addition to naturally occurring enzymes, modified 10 lulose contains only anhydrous glucose. For instance, besides enzymes may be added into the host genome. For example glucose, Sugar monomers in hemicellulose can include enzymes may be altered by incorporating systematically var Xylose, mannose, galactose, rhamnose, and arabinose. Hemi ied sets of amino acid changes, with the resulting changes in celluloses contain most of the D-pentose Sugars, and occa phenotypes measured and used to identify sequence changes sionally small amounts of L-Sugars as well. 
Xylose is always conferring improved function (see for example Liao et al., 15 the Sugar monomer present in the largest amount, but man 2007, BMC Biotechnol 7: 16; Ehren et al., 2008, Protein Eng nuronic acid and galacturonic acid also tend to be present. Des Sel 21:699-707 and Heinzelman et al., 2009, Proc Natl Hemicellulose degrading enzymes include the Xylan AcadSci USA 106:5610-5615). degrading enzymes (endo-B-Xylanase, C-glucuronidase, 5.2.1. Enzymes for Cellulose, Hemicellulose, and Lignocel C-arabinofuranosidase, and B-Xylosidase) and glucomannan lulose Degradation degrading enzymes (B-mannanase and B-mannosidase). Organisms capable of generating enzymes for the break Xylan is the predominant component of hemicellulose from down of cellulose, hemicellulose, and pectin include, Tricho hardwood and agricultural plants, like grasses and stray. Glu derma viride, Fusarium oxysporium, Piptoporus betulinus, comannan is the dominant component of hemicellulose from Penicillium echinulatum, Penicillium purpurogenium, Peni hardwood. cillium rubrum, Aspergillus niger, Aspergillus fumigatus, 25 Cellulose does not typically exist in nature by itself and so Aspergillus phoenicus, Sporotrichum thermophile, Scyta other enzymes are needed for effective biomass utilization. lidium thermophillum, Clostridium straminisolvens, Ther Xylanases hydrolyze the B-1,4-Xylan linkage of hemicellu monospora curvata, Rhodospirillum rubrum, Cellulomonas lose to produce the pentose Xylose. There are a large number fini, Clostridium Stercorarium, Bacillus polymyxa, Bacillus of distinctXylanase protein families. Some fungi secretexyla coagulans, Pyrococcu firiosus, Acidothermus cellulolyticus, 30 nase isozymes: Trichoderme viride makes 13 and Asperigil Saccharophagus degradans, Neurospora crass, Humicola lus niger produces 15. 
Xylanases will be an increasing impor fiascoatra, Chaectomium globosum, Thielavia terrestris-255, tant component of hemicellulose utilization as an added Mycelieopthra fergussi-246C, Aspergillus wentii, Aspergil enzyme or part of an integrated bioprocessing system pro lus Ornatus, Pleurotus florida, Pleurotus cornucopiae, Tra duced in situ by a Suitable organism. Xylanases would be of mates versicolor, Bacteroides thetaiotaOmicron, and Nectria 35 utility in a Candida Strain configured for cellulose degrada catalinensis; see Kumaret al., 2008, JInd Microbiol Biotech tion. nol: 35, 377-91. 5.2.1.3. Pectin: 5.2.1.1. Cellulose Pectins are the third main structural polysaccharide of Cellulose is a homopolymeric compound composed of plant cell walls. Pectins are abundant in Sugar beat pulp and B-D-glucopyranose units, linked by a B-(1->4)-glycosidic 40 fruits, e.g., citrus and apples, where it can form up to /2 the bond and represents the most abundant polysaccharide in polymeric content of cell walls. The pectin backbone consists plant cell walls. of homo-galacturonic acid regions and neutral Sugar side Trichoderma reesei is one of the prototypical cellulose chains from L-rhamnose, arabinose, galactose, and Xylose. metabolizing fungi. It encodes genes for 3 enzyme classes L-rhamnose residues in the backbone carry sidechains con required for the degradation of cellulose to glucose. These are 45 taining arabinose and galactose. Pectin degrading enzymes Exoglucanases or cellobiohydrolases (genes CBH1 and include pectin lyase, endo-polygalacturonase, C.-arabinofura CBH2), Endoglucanases (genes EG1, EG2, EG3, EG5) and nosidase, C.-galactosidase, polymethylgalacturonase, pectin B-glucosidase (gene BGL1). Genes for these 3 classes of depolymerase, pectinase, exopolygalacturanosidase hydro enzymes could be expressed and secreted from a modified C. lase, C-L-Rhamnosidase, C-L-Arabinofuranosidase, polym tropicalis strainto allow it to generate glucose from cellulose. 
50 ethylgalacturonate lyase (pectin lyase), polygalacturonate Clostridium thermocellum is a prototypical cellulose lyase (pectate lyase), exopolygalacturonate lyase (pectate degrading bacterium. It encodes numerous genes that form disaccharide-lyase). Pectinases would be of utility in a Can the cellulosome, a complex of enzymes used in the degrada dida Strain configured for cellulose degradation tion of cellulose. Enzymes participate in the formation of the 5.2.2. Biological Delignification cellulosome include scaffoldin (cipA), cellulase (cell), cello 55 The white rot fungi are a diverse group of Basidiomycetes biohydrolase (cbh A, celK, cello), Xylanase (xynY. XynZ. that are capable of completely degrading all the major com XynA, XynU, XynC, XynD, XynB, XynV), endoglucanase ponents of plant cell walls, including cellulose, hemicellulose (celH, celE, celS, celF, celN, celQ, celD, celB, celT, celG, and lignin. Phanerochaete chrysosporium is a prototypical celA), mannanase (manA), chitinase (chiA), lichenase (licB) example that has recently been the focus of a genome and a protein with unknown function Csep (cseP). 60 sequencing and anotization project. See review of genome Encoding all or a Subset of the genes required to replicate project and genes used in delignification (Kersten et al., 2007. the C. thermocellum cellulosome, component enzymes or Fungal Genet Biol: 44, 77-87). engineered derivatives would be of utility in a Candida strain Lignocellulosic biomass refers to plant biomass that is configured for cellulose degradation. There is emerging evi composed of cellulose, hemicellulose, and lignin. The carbo dence that effective hydrolysis of cellulose requires a multi 65 hydrate polymers (cellulose and hemicelluloses) are tightly component system like the cellulosome that interacts with the bound to the lignin, by hydrogen and covalent bonds. Biom substrate and the surface of the cell. 
Cellulosomes are nano ass comes in many different types, which may be grouped into US 9,359,581 B2 31 32 four main categories: (1) wood residues (including sawmill Other lignocellulose degrading organisms include Pleuro and paper mill discards), (2) municipal paper waste, (3) agri tus erygii (has a versatile peroxidase that exhibits both LiP cultural residues (including corn Stover and Sugarcane and MnPactivities), Cyathus sp., Streptomyces viridosporus bagasse), and (4) dedicated energy crops (which are mostly T7A (the lignin peroxidase, LiP has been studied in some composed of fast growing tall, woody grasses). Fermentation 5 detail), Phelebia tremellosus, Pleurotus florida, Peurotus cor of lignocellulosic biomass to ethanol is an attractive route to nucopiae, Pleurotus Ostreatus, Trametes versicolor; Irpex lac energy feedstocks that Supplements the depleting stores of teus, Ganoderma lucidum, Ganoderma applanatum, Corio fossil fuels. Biomass is a carbon-neutral source of energy, lus versicolor, Aspergillus 2BNL1, Aspergillus 1AAL1, since it comes from dead plants, which means that the com Lentinus edodes UEC 2019, Ceriporiopsis subvermispora, bustion of ethanol produced from lignocelluloses will pro 10 duce no net carbon dioxide in the earth's atmosphere. Also, Panus conchatus. biomass is readily available, and the fermentation of ligno 5.2.3. Enzymes Needed for Utilization of Starch: celluloses provides an attractive way to dispose of many Enzymes for saccharification include C.-amylases, B-amy industrial and agricultural waste products. Finally, lignocel lases, y-amylases, glucoamylase, maltogenase and pullanase. lulosic biomass is a very renewable resource. Many of the 15 dedicated energy crops can provide high-energy biomass, 5.3. Potential Feedstocks Used Directly or Following which may be harvested multiple times each year. 
One barrier to the production of biofuels from biomass is that the sugars necessary for fermentation are trapped inside the lignocellulose. Lignocellulose has evolved to resist degradation and to confer hydrolytic stability and structural robustness to the cell walls of the plants. This robustness or "recalcitrance" is attributable to the crosslinking between the polysaccharides (cellulose and hemicellulose) and the lignin via ester and ether linkages. Ester linkages arise between oxidized sugars, the uronic acids, and the phenol and phenylpropanol functionalities of the lignin. To extract the fermentable sugars, one must first disconnect the celluloses from the lignin, and then acid-hydrolyze the newly freed celluloses to break them down into simple monosaccharides. Another challenge to biomass fermentation is the high percentage of pentoses in the hemicellulose, such as xylose, or wood sugar. Unlike hexoses such as glucose, pentoses are difficult to ferment. The problems presented by the lignin and hemicellulose fractions are the foci of much contemporary research.

Hundreds of sequences from P. chrysosporium are predicted to encode extracellular enzymes, including many oxidative enzymes potentially involved in lignocellulose degradation, such as peroxidases, copper radical oxidases, FAD-dependent oxidases, and multicopper oxidases. The oxidases and peroxidases are responsible for generating reactive and nonspecific free radicals that affect lignin degradation. Enzymes that accelerate the rate of lignocellulose degradation would be of utility in a Candida strain configured for cellulose degradation.

Large and complex families of cytochrome P450s, peroxidases, glycoside hydrolases, proteases, copper radical oxidases and multicopper oxidases are observed in the P. chrysosporium genome. Structurally related genes may encode proteins with subtle differences in function, and such diversity may provide the flexibility needed to adapt to changing environmental conditions (pH, temperature, ionic strength), substrate composition and accessibility, and wood species. Alternatively, some of the genetic multiplicity may merely reflect redundancy.

Lignin peroxidases (LiP) and manganese peroxidases (MnP) have been the most intensively studied extracellular enzymes of P. chrysosporium. Also implicated in lignocellulose degradation are copper radical oxidases (e.g., glyoxal oxidase, GLX), flavin and cytochrome enzymes such as cellobiose dehydrogenase (CDH), glucose oxidases (glucose 1-oxidase and glucose 2-oxidase), aryl alcohol oxidases, veratryl alcohol oxidase, and multicopper oxidases (mco1).

Proteases produced by P. chrysosporium may be involved in activation of cellulase activity. P. chrysosporium apparently does not code for laccases, which are used by other organisms for lignocellulose degradation.

Enzymatic, Physical, Chemical, and/or Mechanical Pretreatment

Almost anything derived from the Kingdom Plantae, and more specifically anything containing lignocellulose, cellulose, hemicellulose, pectin, and/or starch, can be used as a feedstock for the production of biofuels.

The heterogeneous structure of the lignin polymer renders it highly difficult to degrade. Lignin degradation occurs quite slowly in nature via the action of wood rot fungi that produce ligninases. These fungi and some bacteria recycle the carbon locked in woody plants, taking years to digest a large tree. A major strategy for increasing availability of sugar polymers is to genetically decrease the lignin content of plants. Alfalfa lines downregulated in several steps of lignin biosynthesis were tested for sugar release during chemical saccharification with promising results. Plants with the lowest lignin compensated by making more carbohydrate. Moreover, the carbohydrate was more readily released with decreasing lignin. Sugars present were xylose, arabinose, glucose, and galactose, which were representative of hemicellulosic and pectic cell wall polymers (Chen et al., 2007, Nat Biotechnol: 25, 759-61).

5.3.1. Physical, Chemical, and/or Mechanical Lignocellulose Pre-treatments

Lignocellulosic substrates used by an engineered C. tropicalis strain may be subjected to one or more of the following pretreatments: mechanical pretreatment (milling), thermal pretreatment (steam pretreatment, steam explosion, and/or liquid hot water pretreatment), alkaline pretreatment, oxidative pretreatment, thermal pretreatment in combination with acid pretreatment, thermal pretreatment in combination with alkaline pretreatment, thermal pretreatment in combination with oxidative pretreatment, thermal pretreatment in combination with alkaline oxidative pretreatment, ammonia and carbon dioxide pretreatment, enzymatic pretreatment, and/or pretreatment with an engineered organism (Hendriks et al., 2009, Bioresour Technol: 100, 10-8).

5.4. Sugars Derived from Plant Cell Walls that May Require Engineering of C. tropicalis

Plant biomass hydrolysates contain carbon sources that may not be readily utilized by yeast unless appropriate enzymes are added via metabolic engineering (van Maris et al., 2006, Antonie Van Leeuwenhoek: 90, 391-418). For example, S. cerevisiae readily ferments glucose, mannose, and fructose via the Embden-Meyerhof pathway of glycolysis, while galactose is fermented via the Leloir pathway. Construction of yeast strains that efficiently convert other potentially fermentable substrates in plant biomass will require metabolic engineering. The most abundant of these compounds is xylose.
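As an illustrative summary of the native S. cerevisiae fermentation routes named in Section 5.4, the following Python sketch records which sugars the text says are fermented without engineering and by which pathway. It is a minimal sketch covering only sugars named in the text; the names NATIVE_ROUTES and native_route are hypothetical and are not part of the disclosure.

```python
from typing import Optional

# Sugars that the text says S. cerevisiae ferments natively, with the
# pathway named for each. Only sugars mentioned in Section 5.4 are listed.
NATIVE_ROUTES = {
    "glucose": "Embden-Meyerhof pathway",
    "mannose": "Embden-Meyerhof pathway",
    "fructose": "Embden-Meyerhof pathway",
    "galactose": "Leloir pathway",
}


def native_route(sugar: str) -> Optional[str]:
    """Return the native route named in the text, or None if none is listed.

    A None result for a sugar such as xylose corresponds to the statement
    that fermenting it will require metabolic engineering.
    """
    return NATIVE_ROUTES.get(sugar.strip().lower())


if __name__ == "__main__":
    for name in ("glucose", "galactose", "xylose"):
        print(name, "->", native_route(name) or "requires metabolic engineering")
```

A None result only means the text names no native route for that sugar; the table makes no claim about sugars the text does not mention.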
Other fermentable substrates include lose-5-phosphate and D-xylulose-5-phosphate, respectively. L-arabinose, galacturonic acid, and rhamnose. These enzymes are encoded by the araA, araB, and ara) 5.4.1. Xylose Fermentation genes respectively. Xylose-fermenting yeasts link Xylose metabolism to the A first attempt to express the E. coli genes in S. Cerivisiae pentose-phosphate pathway. These yeasts use two oxi was only partly successful, with the strain generating only doreductases, Xylose reductase (XR) and Xylitol dehydroge L-arabinitol. One of the most promising examples of S. nase (XDH), to convert xylose to xylulose 5-phosphate, cerivisiae engineering for L-arabinose fermentation is which enters the pentose phosphate pathway. described in (Beckeret al., 2003, Appl Environ Microbiol: 69. Although strains of S. cerivisiae that express both xylose 4144-50). In this work the bacterial L-arabinose operon con reductase (XR) and xylitol dehydrogenase (XDH) have been 10 sisted of E. coli araB and araD and Bacillus subtilis araA, constructed, anaerobic fermentation was accompanied by along with overexpression of the yeast galactose permease considerable xylitol production. For every one NADPH used gene (GAL2). Gallip is known to transport L-arabinose. by XR, one NADH needs to be reoxidized, and the only way Although overexpression of these enzymes did not result in to do it be the engineered yeasts is to produce Xylitol, 15 immediate growth on L-arabinose as the Sole carbon Source, although ethanol vs. xylitol production can be impacted both the growth rate of the transformants increased progressively positively and negatively by starting strain, source of heter after 4-5 days incubation. Eventually an L-arabinose-utiliz ologous enzymes, and culture conditions. Ideally, the XRand ing Strain was selected after several sequential transfers in XDH can be engineered to be linked to the same coenzyme L-arabinose medium. 
In addition to being able to grow aero system eliminating the production of excess NADH in the bically on L-arabinose, the evolved strain produced ethanol process of ethanol production. from L-arabinose at 60% the theoretical yield under oxygen One of the most Successful examples of engineering S. limited conditions. An enhanced transaldolase (TALI) activ cerivisiae for ethanol production from Xylose uses the fungal ity was reported to enhance L-arabinose fermentation and Xylose isomerase (XylA) from obligately anaerobic fungi overexpression of GAL2 was found not to be essential for Piromyces sp. E2. The introduction of the Xy1A gene was 25 growth on L-arabinose, Suggesting that other yeast Sugar sufficient to enable the resulting strain to grow slowly with transporters can also transport L-arabinose. A similar Xylose as sole carbon Source under aerobic conditions. Via an approach would be feasible in Candida, re-coding the genes extensive selection procedure a new strain was derived to be better expressed in Candida, and to remove those (Kuyperet al., 2005, FEMSYeast Res: 5,399-409) which was codons that are non-canonical in Candida. capable of anaerobic growth on Xylose producing mainly 30 5.4.3. Galacturonic Acid Fermentation: ethanol, CO2, glycerol, biomass, and notably little xylitol. Reduction of galacturonic acid to the same level of a hex The ethanol production rate was considered still too low for ose requires the input of two electron pairs, for instance via industrial applications. To obtain a higher specific rate of two NADH-dependent reduction steps. Galacturonic acid is a ethanol production, a strain was constructed that in addition major component of pectin and therefore occurs in all plant to the Xy 1A gene, overexpressed all genes involved in the 35 biomass hydrolysates. 
Pectin-rich residues from citrus fruit, conversion of Xylose into the intermediates of glycolysis, apples, Sugar cane and Sugar beets contain especially large including Xylulokinase, ribulose 5-phosphate isomerase, amounts of D-galacturonic acid. If D-galacturonic acid can be ribulose 5-phosphate epimerase, transketolase, and transal converted to ethanol, this would increase the relevance of dolase. In addition the gene GRE3, encoding aldose red these abundantly available feedstocks. cutase, was deleted to further minimize xylitol production. 40 Several yeasts, e.g., Candida and Pichia, can grow on The resulting strain could be cultivated under anaerobic con D-galacturonic acid, and therefore potential sources for trans ditions without further selection or mutagenesis and at the port enzymes and a heterologous pathway if needed. time had the highest reported specific ethanol production rate. The ability to utilize D-galacturonic acid is widespread Candida tropicalis has been shown to be able to ferment among bacteria, which all seem to use the same metabolic xylose to ethanol (Zhanget al., 2008, Sheng Wu Gong Cheng 45 pathway. In the bacterial pathway, D-galacturonic acid is Xue Bao: 24, 950-6.) Pichia stipitis is another yeast that is converted to pyruvate and glyceraldehydes-3-phosphate via a able to ferment Xylose to alcohol and being studied (Agbogbo five-step pathway. Overall this results in the conversion of et al., 2008, Appl Biochem Biotechnol: 145, 53-8). D-galacturonic acid, NADH, and ATP into pyruvate, glycer 5.4.2. L-Arabinose Fermentation aldehydes-3-phosphate and water. Glyceraldehyde-3-phos Although D-xylose is the most abundant pentose in hemi 50 phate can be converted to equimolar amounts of ethanol and cellulosic Substrates, L-arabinose is present in significant CO2 via standard glycolytic reactions yielding 2 ATP. How amounts, thus the importance of converting arabinose to etha ever, conversion of pyruvate to ethanol requires oxidation of nol. 
a second NADH. Saccharomyces cannot ferment or assimilate L-arabinose. During anaerobic growth and fermentation on Sugars (hex Although many types of yeast are capable of assimilating 55 oses, but also Xylose by engineered Xylose-fermenting L-arabinose aerobically, most are unable to ferment it to strains) of S. Cerivisiae, a significant fraction of the carbon is ethanol. Some Candida species are able to make arabinose channeled into glycerol to compensate for oxidative, NADH fermentation to ethanol, but production rates are low. generating reactions in biosynthesis. L-arabinose fermentation may be-rare among yeasts due to In theory, introduction of the prokaryotic glacturonic acid a redox imbalance in the fungal L-arabinose pathway, there 60 fermentation route into yeast can create an alternative redox fore an alternative approach to using the fungal enzymes is to sink for the excess NADH formed in biosynthesis. This would construct L-arabinose fermenting yeast by overexpression of have two advantages. Firstly, the NADH derived from bio the bacterial L-arabinose pathway. In the bacterial pathway synthetic processes can be used to increase ethanol yield on no redox reactions are involved in the initial steps of L-ara glacturonic acid to 2 molethanol per mol of glacturonic acid, binose metabolism. Instead the enzymes, L-arabinose 65 as the pyruvate formed can now be converted to ethanol. isomerase, L-ribulokinase, and L-ribulose-5-phosphate Secondly, since the Sugar requirements production for glyc 4-epimerase are involved in converting L-arabinose to L-ribu erol are reduced, the ethanol yield on Sugars will increase. US 9,359,581 B2 35 36 Bacterial D-galacturonate catabolism uses the following rhamnono-1,5-lactone. The 1.4 lactone is hydrolyzed to enzymes: D-galacturonate isomerase, altronate oxidoreduc L-rhamnonate by L-rhamnono-1,4-lactonase. 
The unstable tase, altronate dehydratase, 2-dehydro-3-deoxygluconoki 1.5-lactone has been reported to spontaneously hydrolyze to nase, 2-keto-3-deoxy-6-phosphogluconate aldolase, glycer L-rhamnonate. L-Rhamnonate is Subsequently dehydrated to aldehydes-3-phosphate. Although a large number of yeasts 2-keto-3-deoxy-L-rhamnonate by L-rhamnonate dehy and molds use galacturonic acid as carbon and energy for dratase. The product of this reaction is then cleaved into growth, knowledge of the underlying metabolic process is pyruvate and L-lactaldehyde by an aldolase. In P stipitis the limited. At present, the prokaryotic pathway offers the most thus formed L-lactaldehyde is converted to lactate and NADH promising approach for engineering Candida for galactur by lactaldehyde dehydrogenase. Introduction of this fungal onic acid metabolism. 10 pathway into S. cerivisiae should enable the conversion of 5.4.4. L-Rhamnose Fermentation: L-rhamnose to equimolar amounts of ethanol, lactaldehyde The deoxyhexose L-rhamnose is named after the plant it and CO2 without a net generation of ATP. This conversion was first isolated from: the buckthorn (Rhamnus). In contrast would require the introduction of a transporter and four het with most natural Sugars, L-rhamnose is much more common erologous enzymes (including 1.4-lactonase). than D-rhamnose. It occurs as part of therhamnogalacturonan 15 5.4.5. Inhibitor Tolerance: of pectin and hemicellulose. Being a 6-deoxy Sugar, L-rham The harsh conditions that prevail during the chemical and nose is more reduced than the rapidly fermentable Sugars physical pretreatment of ligncullulse result in the release of glucose and fructose. many Substances that inhibit growth and productivity of S. cerivisiae cannot grow on L-rhamonose. The metabolic microorganisms such as S. cerivisiae. The number and iden engineering of S. 
cerivisiae for the production of ethanol will tity of the toxic compounds varies with the nature of the raw have to address two key aspects: the enhancement of rham material and pretreatment conditions. nose transport across the plasma membrane and the introduc There are two approaches to limit the impact of the inhibi tion of a rhamnose-metabolizing pathway. tors on the fermentation process: (i) introduction of additional Two possible strategies to engineer uptake follow. Firstly, chemical, physical, or biological process steps for removal or after introduction of an ATP-yielding pathway for L-rham 25 inactivation of inhibitors (ii) improvement of S. cerivisiae to nose catabolism (see below), selection for growth on L-rham the inhibitors. nose can be used to investigate whether or not mutations in hexose transporters enable uptake of L-rhamnose. 5.5. Fermentation Products from Biomass Although the rhamnose transporters from bacteria (e.g., E. coli) are well characterized, functional expression of bacterial 30 5.5.1. Butanol transporters in the yeast plasma membrane may be challeng Metabolic engineering of Escherichia coli for butanol pro ing. Pichia stipidis is able to use L-rhamnose. Using infor duction by inserting genes from the butanol production bac mation generated by the P stipidis genome project, it might teria Clostridium acetobutylicum into E. coli has been be possible to identify a rhamnose transporter if such a gene described (Inui et al., 2008, Appl Microbiol Biotechnol: 77, can be shown to be induced by rhamnose (as proposed for 35 1305-16). galacturonic acid above). A similar strategy can be envisioned for an engineered C. After uptake the next requirement for Successful rhamnose tropicalis Strain configured to derive Sugars from biomass. fermentation is conversion into intermediates of central Enzymes (and genes) from Clostridium acetobutylicum metabolism. 
required for butanol production from Acetyl-CoA include: Two pathways for rhamnose utilization have been reported 40 Acetyl-CoA acetyltransferase (thiL), B-hydroxybutyryl-CoA in microorganisms. dehydrogenase (hbd), 3-hydroxybutyryl-CoA dehydratase The first catabolic pathway involves phosphorylated inter (crt), butyryl-CoA dehydrogenase (bcd, etfA, etfB), butyrla mediates and is used, for example, by E. coli. In this pathway, ldehyde dehydrogenase (adhel, adhe), butanol dehydroge L-rhamnose is converted to L-rhamnulose by L-rhamnose nase (adhel, adhe), butyrlaldehyde dehydrogenase (bdha), isomerase. After the Subsequent phosphorylation to L-rham 45 butanol dehydrogenase (bdha), butyrlaldehyde dehydroge nulose by rhamnulokinase, L-rhamnulose-1-phosphate is nase (bdhB), butanol dehydrogenase (bdhB). split into dihdroxy-acetone-phosphate (DHAP) and L-lactal n-Butanol is a commercially important alcohol that is con dehyde by rhamnulose-1-phosphate aldolase. DHAP can be sidered by some to be a strong Candidate for widespread use normally processed by glycolysis, yielding 1 molethanol per as a motor fuel. n-Butanol is currently produced via chemical mol L-rhamnose. In E. coli, further metabolism of L-lactal 50 synthesis almost exclusively. The dominant synthetic process dehyde depends on the redox state of the cells. L-lactaldehyde in industry, the acetaldehyde method, relies on propylene can be oxidized to lactate by lactaldehyde dehydrogenase, derived from petroleum 1. The U.S. market for butanol is reduced to 1,2-propanediol by lactaldehyde reductase, or pro 2.9 billion pounds per year 2. Currently, the primary use of cessed via a redox-neutral mix of these two reactions. Intro n-butanol is as a solvent, however, several companies includ duction of this pathway into S. 
Cerivisiae, L-rhamnulose is 55 ing British Petroleum and DuPont are developing methods to expected to be converted to equimolar amounts of ethanol, utilize bacteria to produce n-butanol on a large scale for fuel lactaldehyde and CO2 with generation of 1 ATP. In summary, 3. Microorganisms capable of producing n-butanol by fer this strategy would require the introduction of a transporter mentation are Clostridia acetobutylicum, C. beijerinckii, and and three heterologous enzymes into S. Cerivisiae. C. tetanomorphum. A second route for rhamnose degradation, which does not 60 n-Butanol has several characteristics that make it a viable involve phosphorylated intermediates was first described for alternative fuel option. It has an energy density that is similar the Aureobasidium pullulans and is referred to as to gasoline. Additionally, it could power a combustion engine direct oxidative catabolism of rhamnose. A similar pathway with minimal or no modifications. In either a blended or neat occurs in the yeasts P stipitis and Debaryomyces polymor form, n-butanol could be easily integrated into our current phus. This pathway is initiated by the oxidation of L-rham 65 infrastructure. nose by NAD+-dependent L-rhamnose dehydrogenase, Enzymes for butanol production include Pyruvate dehy yielding either L-rhamnono-1.4-lactone or the unstable drogenase complex, acetyl-CoA acetyltransferase, 3-hy US 9,359,581 B2 37 38 droxybutyryl-CoA dehydrogenase, crotonase, butyryl-CoA isopropanol for methanol in fatty acid esters is a higher tol dehydrogenase, aldhyde and/or alcohol dehydrogenase. erance for cold temperatures. The fatty acid isopropyl ester (Steen et al., 2008, Microb Cell Fact: 7, 36: Atsumi et al., would remain liquid in cooler climates. The biosynthesis 2008, Metab Eng: 10, 305-11.) genes for isopropanol originally found in Clostridia acetobu 5.5.2. Branched Chain Alcohols tylicum were engineered into an E. 
coli Strain for optimal In the quest to find a substitute for petroleum based fuels, industrial usage (Hanai et al., 2007, Appl Environ Microbiol: several low energy molecules have been Suggested due to the 73, 7814-8). ease of production. However, a molecule with similar energy Synthesis would require pyruvate dehydrogenase com density to current fuels would be preferred as a biofuel. 10 plex, acetyl-CoA acetyltransferase, acetoacetyltransferase, Branched higher alcohols have a higher energy density than secondary alcohol dehydrogenase. some of the alcohols proposed as alternative fuels. Other 5.5.5. Methanol Pathway from Methane various properties of these alcohols also display more desir Methanol can be synthesized chemically or biochemically able features. For example, a lower miscibility with water and from methane gas. Over 30 million tons per year of methanol 15 are produced worldwide 1. Currently, chemical synthesis is lower vapor pressure are benefits of higher alcohols. A unique the method of choice. Methanol is widely used as a solvent, in approach to alcohol synthesis taken by Atsumi etal. Atsumi et antifreeze, and as an intermediate in synthesis of more com al., 2008, Nature: 451, 86-9, employs synthetic biology to plex chemicals. Methanol is used as a fuel in Indy race cars engineer non-fermentative pathways based on amino acid and it has been blended into gasoline for civilian automobiles. biosynthesis. These pathways produce alcohols that are not Microorganisms capable of methanol production include natural fermentation products. Some of the features of these Methylobacterium sp., Methylococcus capsulatus, and Methylosinus trichosporium. molecules include branching and addition of aromatic cyclic Enzymes required: methane monooxygenase. hydrocarbon structures. 5.5.6. Other Possible End Products Requiring Metabolic An engineered C. 
tropicalis capable of generating 2-me 25 Engineering thyl-1-butanol from L-threonine would use either the endog Esters: Fatty acid ethyl ester, Fatty acid methyl ester enous or exogenously added threonine biosynthetic enzymes, Ethers: Dimethyl ether, Dimethylfuran, Methyl-t-butyl ether L-threonine ammonia lyase, endogenous or exogenously Hydrocarbons: Alkanes, Alkenes, Isoprenoids added isoluecine biosynthetic enzymes, 2-keto-acid decar boxylase, and an alcohol dehydrogenase. 5.5.7. Over-Production of Fatty Acids 3-methyl-1-butanol pathway from pyruvate would require 30 Because many of the strains described here are no longer valine biosynthesis enzymes, leucine biosynthesis enzymes, able to utilize many fatty acids as carbon and energy sources 2-keto-acid decarboxulase, alcohol dehydrogenase. due to the knockouts in both B-oxidation (pox4a/pox4b 2-phenylethanol pathway from pyruvate would require pox5a/pox5b) and co-oxidation pathways (P450 (cytochrome Phenylalanine biosynthesis enzymes, 2-keto acid decarboxy P450), fao (fatty alcohol oxidase), and adh (alcohol dehydro 35 genase) gene), the strain is an ideal Candidate for metabolic lase, alcohol dehydrogenase. engineering for manipulation of the fatty acid biosynthetic 5.5.3. Isobutanol Pathway from Pyruvate pathways for overproduction of fatty acids. Isobutanol has a higher carbon content than ethanol, there Fatty acids (and/or lipids) so produced could either be used fore making its energy properties closer to gasoline. Cur for production of biofuels such as biodiesel or by restoring a rently, isobutanol is used as a precursor for commodity 40 P450 or P450s for endogenous production of co-hydroxy fatty chemicals including isobutyl acetate. Atsumi et al., 2008, acids. Methods for over-production of endogenous fatty acids Nature: 451, 86-9, synthesized isobutanol via synthetic biol may be similar to those used by Lu X et al., 2008, Metab Eng: ogy. The origin of the enzymes required to synthesize isobu 10,333-9. 
tanol were from a variety of microorganisms including Lac Steps include: 45 1. Knocking out the E. coli fad) gene, which encodes an tococcus lactis and Saccharomyces cerevisiae. In addition to acyl-CoA synthetase, to block fatty acid degradation. This expressing foreign enzymes, the host, E. coli, was modified to may be accomplished by knocking out acyl-coA synthetases direct metabolism toward isobutanol production. The inter and acyl-coA oxidases of the Candida tropicalis (e.g., POX4 esting feature of this pathway to synthesize isobutanol is that and POX4 genes are already absent). it employs amino acid biosynthesis to generate the essential 50 2. Heterologous expression of acyl-ACP thioesterases to precursor. This allows the microbe to produce the alcohol in increase the abundance of shorter chain fatty acids, e.g., the presence of oxygen. In fact, semi-aerobic conditions U31813 from Cinnamomum camphorum (improved fuel increased yields. This approach has been applied to generate quality). several other alcohols, such as 2-phenylethanol and 2-me 3. Increasing the Supply of malonyl-CoA by over-express thyl-1-butanol. The pathways for these interesting alcohols 55 ing acetyl-coA carboxylase. have not yet been optimized. 4. Releasing feedback inhibition caused by long-chain Synthesis would require valine biosynthesis enzymes, fatty acids by overexpression of an endogenous or exogenous 2-keto-acid decarboxylase, alcohol dehydrogenase. acyl-ACP thioesterase. Acyl-ACP thioesterases release free 5.5.4. Isopropanol Pathway from Pyruvate fatty acids from acyl-ACPs. Isopropanol is commonly employed as an industrial 60 Mechanisms for membrane proliferation (more membrane-more lipid?): cleaner and solvent. Additionally, it is sold as “rubbing alco Expression/overexpression of P450s including fatty acid, hol” for use as a disinfectant. 
As a significant component in alkane, and alkene metabolizing P450s lead to membrane dry gas, a fuel additive, it solubilizes water in gasoline, proliferation in Yeasts. May be possible to express an enzy thereby removing the threat of frozen supply lines. Proposed 65 matically inactive P450 that elicits proliferation via mem biofuel applications include partial replacement of gasoline brane anchor. Expression of secreted enzymes, such as inver and in production offatty acid esters. A benefit of substituting tase (SUC2) can lead to membrane genesis in yeasts. US 9,359,581 B2 39 40 Growth with compounds that lead to membrane prolifera n-alkanes, n-alkenes, n-alkynes and/or fatty alcohols that tion. have a carbon chain length from 12 to 22 are converted to a Altering genetics of peroxisome proliferation. hydroxyl or carboxyl group. Enzymes that are Candidates for manipulating either by modulating or eliminating expression, or Substituting homo 5.7. Genetic Modifications of Candida tropicalis logues or engineered enzymes, e.g. that eliminate feedback or end product inhibition. Yeasts of the genus Candida including Candida tropicalis contains two pathways for the metabolism of fatty acids: 5.6. Production of Long-Chain C2-Hydroxy Fatty ()-oxidation and B-Oxidation. These pathways are shown Acids 10 schematically in FIG. 2, together with some classes of enzymes capable of catalyzing the chemical conversions in Whole-cell biocatalysts currently used to oxidize long each pathway. In order for Candida to be used to transform chain fatty acids include Candida tropicalis, Candida cloa fatty acids into useful compounds-such as diacids and cae, Cryptococcus neoforman and Corynebacterium sp. 
One hydroxyl fatty acids, or high energy compounds, or other 15 chemicals it is advantageous to eliminate metabolic pathways preferred microorganisms is Candida tropicalis ATCC20962 that can divert either the substrates or products of the desired in which the B-oxidation pathway is blocked by disrupting pathway. For example it may be desirable to prevent Candida PDX 4 and PDX 5 genes which respectively encode the from metabolizing fatty acids through the B-Oxidation path acyl-coenzyme A oxidases PXP-4 (SEQ ID NO: 134) and way, so that more fatty acids are available for conversion to PXP-5 (SEQ ID NO: 135). This prevents metabolism of the C.()-diacids and ()-hydroxy fatty acids by the co-oxidation fatty acid by the yeast (compare FIGS. 2 and 3). The fatty pathway. This can be accomplished by deleting the acyl coen acids or alkynes used have 14 to 22 carbon atoms, can be Zyme A oxidase genes, as shown in FIG. 2 (Picataggio et al., natural materials obtained from plants or synthesized from 1992, Biotechnology (NY): 10,894-8; Picataggio et al., 1991, natural fatty acids, such as lauric acid (C12:0), myristic acid Mol Cell Biol: 11,4333-9). (C14:0), palmitic acid (C16:0), stearic acid (C18:0), oleic 25 Candida tropicalis Strains lacking both alleles of each of acid (C18:1), linoleic acid (C18:2), O-linolenic acid (c)3. two acyl coenzyme A oxidase isozymes, encoded by the poX4 C18:3) ricinoleic acid (12-hydroxy-9-cis-octadecenoic acid, and poX5 genes, are efficient biocatalysts for the production 12-OH-C 18:1), erucic acid (C22:1). epoxy stearic acid. of C,c)-diacids (Picataggio et al., 1992, Biotechnology (NY): Examples of other substrates that can be used in biotransfor 10,894-8; Picataggio et al., 1991, Mol Cell Biol: 11, 4333 mations to produce C.()-dicarboxylic acid and ()-hydroxy 30 4339). However for the production of co-hydroxy fatty acids, acid compounds are 7-tetradecyne and 8-hexadecyne. 
Dis additional enzymes must be eliminated to prevent the oxida closed herein, naturally derived fatty acids, chemically or tion of the co-hydroxyl group to a carboxyl group. enzymatically modified fatty acids, n-alkane, n-alkene, To prevent the oxidation of hydroxyl groups to carboxyl n-alkyne and/or fatty alcohols that have a carbon chain length groups, in Some embodiments it is particularly advantageous 35 to eliminate or inactivate one or more genes encoding a cyto from 12 to 22 are used as carbon sources for the yeast chrome P450. catalyzed biotransformation. For example, Candida tropica To prevent the oxidation of hydroxyl groups to carboxyl lis ATCC20962 can be used as a catalyst under aerobic con groups, in Some embodiments it is particularly advantageous ditions in liquid medium to produce ()-hydroxy fatty acids to eliminate or inactivate one or more genes encoding a fatty and C.co-dicarboxylic acids. Candida tropicalis ATCC20962 40 alcohol dehydrogenase. is initially cultivated in liquid medium containing inorganic To prevent the oxidation of hydroxyl groups to -carboxyl salts, nitrogen Source and carbon source. The carbon Source groups, in Some embodiments it is particularly advantageous for initial cultivations can be saccharide such as Sucrose, to eliminate or inactivate one or more genes encoding an glucose, Sorbitol, etc., and other carbohydrates Such as glyc alcohol dehydrogenase. erol, acetate and ethanol. Then, the Substrate such as naturally 45 In one embodiment yeast genes can be inactivated by delet derived fatty acids, chemically or enzymatically modified ing regions from the yeast genome that encode a part of the fatty acids, n-alkane, n-alkene, n-alkyne and fatty alcohol for yeast gene that encodes the protein product (the open reading oxidation of terminal methyl or hydroxyl moieties is added frame) so that the full-length protein can no longer be made into the culture. The pH is adjusted to 7.5-8.0 and fermenta by the cell. 
In another embodiment yeast genes can be inac tions are conducted under aerobic conditions with agitation in 50 tivated by inserting additional DNA sequences into the part of a shaker incubator, fermentor or other suitable bioreactor. the yeast gene that encodes the protein product so that the For example, the fermentation process may be divided into protein that is made by the cell contains changes that prevent two phases: a growth phase and a transformation phase in it from functioning correctly. In another embodiment yeast which co-oxidation of the substrate is performed. The seeds genes are inactivated by inserting or deleting sequences from 55 control regions of the gene, so that the expression of the gene inoculated from fresh agar plate or glycerol stock are firstly is no longer correctly controlled; for example additions or cultivated in a preculture medium for 16-20 hours, at 30°C. deletions to the promoter can be used to prevent transcription and pH 6.5 in a shaker. Subsequently, this culture is used to of the gene, additions or deletions to the polyadenylation inoculate the conversion medium with co-substrates. The signal can be used to affect the stability of the mRNA, addi growth phase of the culture is performed for 10-12 hours to 60 tions or deletions to introns or intron splicing signals can be generate high cell density cultures at pH 6.5 and 30°C. The used to prevent correct splicing or nuclear export of the pro transformation phase is begun with addition of the fatty acid cessed mRNA. or other substrate for the bio-oxidation. The medium pH is For the production of oxidized compounds in yeast-includ adjusted to 7.5-8.0 by addition of a base solution. Co-sub ing ()-hydroxy fatty acids and high energy compounds, it may strates are fed during the transformation phase to provide 65 also be advantageous to add certain new genes into the yeast energy for cell growth. By use of this method, the terminal cell. 
methyl group of fatty acids, synthetically derived substrates,

For example to facilitate the production of ω-hydroxy fatty acids from fatty acids with different chain lengths or degrees or positions of unsaturation, the enzymes that are naturally present in the yeast are often inadequate; they may oxidise the fatty acid to the ω-hydroxy fatty acid too slowly, they may only oxidise a subset of the fatty acids in a mixture to their corresponding ω-hydroxy fatty acids, they may oxidise the fatty acid in the wrong position or they may oxidise the ω-hydroxy fatty acid itself to a diacid. Advantageous enzymes could thus be those that oxidise a compound to the corresponding hydroxylated compound more rapidly, those that oxidise a fatty acid to its corresponding ω-hydroxy fatty acid more rapidly, those that accept as substrates a wider range of substrates and those that do not over-oxidise target compounds including ω-hydroxy fatty acids to diacids.

To achieve novel phenotypes in Candida species, including the ability to perform biotransformations such as novel chemical conversions, or increased rates of conversion of one or more substrates to one or more products, or increased specificity of conversion of one or more substrates to one or more products, or increased tolerance of a compound by the yeast, or increased uptake of a compound by the yeast, it may be advantageous to incorporate a gene encoding a polypeptide into the genome of the yeast.

Preferred sites of integration include positions within the genome where the gene would be under control of a promoter that transcribes high levels of an endogenous protein, or under control of a promoter that leads to regulated transcription for example in response to changes in the concentrations of one or more compound in the cellular or extracellular environment. Examples of preferred sites of integration include sites in the genome that are under control of the promoter for an isocitrate lyase gene, sites in the genome that are under control of the promoter for a cytochrome P450 gene, sites in the genome that are under control of the promoter for a fatty alcohol oxidase gene and sites in the genome that are under control of the promoter for an alcohol dehydrogenase gene to obtain high levels of expression of a polypeptide or expression of a polypeptide under specific circumstances.

To achieve such novel phenotypes in Candida species, it may be advantageous to modify the activity of a polypeptide by altering its sequence, and to test the effect of the polypeptide with altered sequence within the yeast. Polypeptides of particular interest for conferring the ability to synthesize novel hydroxy fatty acids include cytochrome P450s and their reductases, glycosyltransferases and desaturases. A preferred method for testing the effect of sequence changes in a polypeptide within yeast is to introduce a plurality of genes of known sequence, each encoding a unique modified polypeptide, into the same genomic location in a plurality of strains.

Some embodiments described herein make use of a selective marker. A selective marker can be a gene that produces a selective advantage for the cells under certain conditions such as a gene encoding a product that confers resistance to an antibiotic or other compound that normally inhibits the growth of the host cell.

A selective marker can be a reporter, such as, for example, any nucleic acid sequence encoding a detectable gene product. The gene product may be an untranslated RNA product such as mRNA or antisense RNA. Such untranslated RNA may be detected by techniques known in the art, such as PCR, Northern or Southern blots. The selective marker may encode a polypeptide, such as a protein or peptide. A polypeptide may be detected immunologically or by means of its biological activity. The selective marker may be any known in the art. The selective marker need not be a natural gene. Useful

introns may be absent) or in terms of coding sequences. One example of such a detectable gene product is one that causes the yeast to adopt a unique characteristic color associated with the detectable gene product. For example, if the targeting construct contains a selective marker that is a gene that directs the cell to synthesize a fluorescent protein, then all of the colonies that contain the fluorescent protein are carrying the targeting construct and are therefore likely to be integrants. Thus the cells that will be selected for further analysis are those that contain the fluorescent protein.

The selective marker may encode a protein that allows the yeast cell to be selected by, for example, a nutritional requirement. For example, the selective marker may be the ura4 gene that encodes orotidine-5'-phosphate decarboxylase. The ura4 gene encodes an enzyme involved in the biosynthesis of uracil and offers both positive and negative selection. Only cells expressing ura4 are able to grow in the absence of uracil, where the appropriate yeast strain is used. Cells expressing ura4 die in the presence of 5-fluoro-orotic acid (FOA) as the ura4 gene product converts FOA into a toxic product. Cells not expressing ura4 can be maintained by adding uracil to the medium. The sensitivity of the selection process can be adjusted by using medium containing 6-azauracil, a competitive inhibitor of the ura4 gene product. The his3 gene, which encodes imidazoleglycerol-phosphate dehydratase, is also suitable for use as a selective marker that allows nutritional selection. Only cells expressing his3 are able to grow in the absence of histidine, where the appropriate yeast strain is used.

The selective marker may encode for a protein that allows the yeast to be used in a chromogenic assay. For example, the selective marker may be the lacZ gene from Escherichia coli. This encodes the β-galactosidase enzyme which catalyses the hydrolysis of β-galactoside sugars such as lactose. The enzymatic activity of the enzyme may be assayed with various specialized substrates, for example X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside) or o-nitrophenyl-β-D-galactopyranoside, which allow selective marker enzyme activity to be assayed using a spectrophotometer, fluorometer or a luminometer.

In some embodiments, the selective marker comprises a gene that encodes green fluorescent protein (GFP), which is known in the art.

In some embodiments, the selective marker encodes a protein that is capable of inducing the cell, or an extract of a cell, to produce light. For example, the selective marker encodes luciferase in some embodiments. The use of luciferase is known in the art. They are usually derived from firefly (Photinus pyralis) or sea pansy (Renilla reniformis). The luciferase enzyme catalyses a reaction using D-luciferin and ATP in the presence of oxygen and Mg2+ resulting in light emission. The luciferase reaction is quantitated using a luminometer that measures light output. The assay may also include coenzyme A in the reaction that provides a longer, sustained light reaction with greater sensitivity. An alternative form of enzyme that allows the production of light and which can serve as a selective marker is aequorin, which is known in the art.

In some embodiments the selective marker encodes β-lactamase. This selective marker has certain advantages over, for example, lacZ. There is no background activity in mammalian cells or yeast cells, it is compact (29 kDa), it functions as a monomer (in comparison with lacZ which is a tetramer), and has good enzyme activity.
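The positive and negative selection described for the ura4 marker in this section can be restated as a small truth table. The following is only a sketch: the function name `ura4_outcome` and the all-or-nothing growth model are illustrative assumptions, it presumes the "appropriate yeast strain" (one that cannot make uracil without the marker), and it does not model the graded effect of 6-azauracil.

```python
def ura4_outcome(expresses_ura4: bool, uracil_in_medium: bool, foa_in_medium: bool) -> str:
    """Qualitative growth outcome for a uracil-requiring host strain (simplified)."""
    if expresses_ura4 and foa_in_medium:
        # Negative selection: the ura4 gene product converts FOA into a toxic product.
        return "no growth"
    if not expresses_ura4 and not uracil_in_medium:
        # Positive selection: without ura4 the cell needs uracil supplied in the medium.
        return "no growth"
    return "growth"


# Enumerate every combination of marker status and medium composition.
for has_marker in (True, False):
    for uracil in (True, False):
        for foa in (True, False):
            print(has_marker, uracil, foa, ura4_outcome(has_marker, uracil, foa))
```

Read this way, medium lacking uracil selects for cells carrying the marker, while medium containing both FOA and uracil selects for cells that have lost it.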
selective markers may be the same as certain natural genes, but may differ from them either in terms of non-coding sequences (for example one or more naturally occurring

This may use CCF2/AM, a FRET-based membrane permeable, intracellularly trapped fluorescent substrate. CCF2/AM has a 7-hydroxycoumarin linked to a fluorescein by a cephalosporin core. In the intact molecules, excitation of the coumarin results in efficient FRET to the fluorescein, resulting in green fluorescence. Cleavage of the CCF2 by β-lactamase results in spatial separation of the two dyes, disrupting FRET and causing cells to change from green to blue when viewed using a fluorescent microscope. The retention of the cleaved product allows the blue colour to develop over time, giving a low detection limit of, for example, 50 enzyme molecules per cell. This results in the selective marker being able to be assayed with high sensitivity. It also allows the ability to confirm results by visual inspection of the cells or the samples.

In some embodiments, the selective marker comprises any of the aforementioned genes under the control of a promoter. In some embodiments, the selective marker comprises any of the aforementioned genes under the control of a promoter as well as one or more additional regulatory elements, such as upstream activating sequences (UAS), termination sequences and/or secretory sequences known in the art. The secretory sequences may be used to ensure that the product of the reporter gene is secreted out of the yeast cell.

5.7.1. Methods for Deletion of Sequences from the Candida Genome

Many yeasts recombine DNA in regions of sequence homology. A linear DNA molecule that is introduced into a yeast cell can recombine homologously with the chromosomal DNA if its ends share sufficient sequence identity with chromosomal sequences. Since the sequences of the ends of the DNA molecule are the primary determinant of where in the yeast chromosome the homologous recombination event occurs, it is possible to construct a DNA molecule that encodes one or more functional genes, and to target that molecule to integrate at a specific location in the yeast chromosome. In this way, yeast genes in the chromosome or mitochondria may be disrupted, by interrupting the gene sequence with other sequences.

In one embodiment, a DNA construct comprises two sequences with homology to two sequences in the target yeast genome ("targeting sequences"), separated by a selective marker, as shown in FIG. 11. The two target sequences within the yeast genome are preferably located on the same molecule of DNA (e.g. the same nuclear or mitochondrial chromosome), and are preferably less than 1,000,000 base pairs apart, more preferably they are less than 100,000 base pairs apart, and more preferably they are less than 10,000 base pairs apart. Cells containing a genomic integration of the targeting construct can be identified using the selective marker.

A schematic representation of one form of a DNA molecule for yeast genomic integration (a "genomic targeting construct") is shown in FIG. 4. In this embodiment the genomic targeting construct has two targeting sequences that are homologous to the sequences of two regions of the target yeast genome. In some embodiments these sequences are each at least 100 base pairs in length, or between 100 and 300 base pairs in length. The targeting sequences are preferably 100% identical to sequences in the host genome or between 95% and 100% identical to sequences in the host genome. Between these targeting sequences are two sites recognized by a site-specific recombinase such as the natural or modified versions of cre or flp or PhiC31 recombinases or serine recombinases such as those from bacteriophage R4 or bacteriophage TP901-1. Between the two site specific recombinase recognition sites are functional sequence elements which may include sequences that encode a site-specific recombinase that recognizes the recombinase sites and which may also encode a selective marker as illustrated in FIG. 4. In one embodiment this DNA construct incorporates the "SAT1 flipper", a DNA construct for inserting and deleting sequences into the chromosome of Candida (Reuss et al., 2004, Gene: 341, 119-27). In the "SAT1 flipper" the recombinase is the flp recombinase from Saccharomyces cerevisiae (Vetter et al., 1983, Proc Natl Acad Sci USA: 80, 7284-8) (FLP) and the flanking sequences recognized by the recombinase are recognition sites for the flp recombinase (FRT). The selective marker is the nourseothricin resistance marker from transposon Tn1825 (Tietze et al., 1988, J Basic Microbiol: 28, 129-36). The entire construct can then be targeted to the Candida chromosome by adding flanking sequences with homology to a gene in the Candida chromosome. The DNA sequence of the SAT1-flipper is SEQ ID NO: 1.

Yeast preferentially recombines linear DNA. It is therefore advantageous to prepare the targeting construct as a linear molecule prior to transforming it into the yeast target. In some embodiments it is desirable to prepare and propagate the targeting construct as plasmid DNA in a bacterial host such as E. coli. For propagation in a bacterial host it is generally preferred that plasmid DNA be circular. It is thus sometimes necessary to convert the targeting construct from a circular molecule to a linear molecule. Furthermore for propagation of the targeting construct in a bacterial host, additional sequence elements may be necessary, so a targeting construct may, in addition to the elements shown in FIGS. 4 and 7, comprise an origin of replication and a bacterial selectable marker. It may therefore be advantageous to place restriction sites in the targeting construct to cleave between the elements of the targeting construct shown in FIGS. 4 and 7 and the elements not shown but required for propagation in a bacterial host. Cleavage with restriction enzymes that recognize these sites will linearize the DNA and leave the targeting sequences at the ends of the molecule, favoring homologous recombination with the target host genome. One of ordinary skill in the art will recognize that there are alternative ways to obtain linear DNA, for example by amplifying the desired segment of DNA by PCR. It is also possible to prepare the DNA directly and transform it into the target yeast strain without propagating as a plasmid in a bacterial host.

Introduction of the linearized targeting construct into a yeast host cell such as a Candida host cell is followed by homologous recombination catalyzed by host cell enzymes. This event is represented schematically in FIG. 5. Homologous recombination occurs between each of the two targeting sequences in the genomic targeting construct and the homologous sites in the yeast genome. The result is an integration of the targeting construct into the genomic DNA. Cells containing a genomic integration of the targeting construct can be identified using the selective marker.

Cells containing a genomic integration of the targeting construct can optionally be tested to ensure that the integration has occurred at the desired site within the genome. In one embodiment, such testing is performed by amplification of a section of the genomic DNA by the polymerase chain reaction. Integration of the targeting construct into the yeast genome will replace genomic sequences with targeting construct sequences. This replacement may be detected by a difference in size of amplicon using oligonucleotide primers that anneal to sequences outside the targeted sequence. This is illustrated in FIG. 10. One of ordinary skill in the art will readily appreciate that there are many alternative ways to design oligonucleotides to produce diagnostic amplicons using the polymerase chain reaction. For example one oligonucleotide that anneals inside the targeted region and one oligonucleotide that anneals outside but close to the targeted region can be used to produce an amplicon from the natural genomic sequence but will not produce an amplicon if the targeting construct has eliminated the targeted genomic sequence. Conversely one oligonucleotide that anneals inside the targeting construct and one oligonucleotide that anneals outside but close to the targeted region will not produce an amplicon from the natural genomic sequence but will produce an amplicon if the targeting construct has integrated at the targeted genomic location. In general oligonucleotide pairs for producing diagnostic amplicons should be oriented with their 3' ends towards each other and the sites in the genome where the two oligonucleotides anneal should be separated by between 100 and 10,000 bases, more preferably by between 150 and 5,000 bases and more preferably by between 200 and 2,000 bases.

sis of the sizes may enable the two possible genotypes to be distinguished, or sequencing of the amplicon may enable the two possible genotypes to be distinguished.

In some embodiments it may be advantageous to delete sequences whose deletion will result in the inactivation of a cytochrome P450; in some embodiments it may be advantageous to delete sequences whose deletion will result in the inactivation of a fatty alcohol oxidase; in some embodiments it may be advantageous to delete sequences whose deletion will result in the inactivation of an alcohol dehydrogenase.

5.7.2. Methods for Addition of Sequences to the Candida Genome
In some embodiments, new DNA sequences can be inserted into the yeast genome at a specific location using variations of the targeting construct. Because many yeasts recombine DNA in regions of sequence homology, a linear DNA molecule that is introduced into a yeast cell can recombine homologously with the chromosomal DNA if its ends share sufficient sequence identity with chromosomal sequences. It is thus possible to insert a DNA sequence into the yeast genome at a specific location by flanking that sequence with sequences homologous to sequences within the yeast genome that surround the desired genomic insertion site. Such replacements are quite rare, generally occurring less than 1 time in 1,000 yeast cells, so it is often advantageous to use a selective marker to indicate when new DNA sequences have been incorporated into the yeast genome. A selective marker can be used in conjunction with a sequence to be integrated into the yeast genome by modifying the strategy described for deleting sequences from the yeast genome.

If a targeting construct comprises additional sequences between one of the targeting sequences and the proximal recombinase site, those sequences will be retained in the genome following integration and excision of the targeting construct. An example of such a construct is shown in FIG. 7, with the additional sequences indicated as "insertion sequences." Integration of the targeting construct for insertion into the yeast genome is shown schematically in FIG. 8. Homologous recombination occurs between each of the two targeting sequences in the genomic targeting construct and the homologous sites in the yeast genome. The result is an integration of the targeting construct into the genomic DNA. Cells containing a genomic integration of the targeting construct can be identified using the selective marker.

Cells containing a genomic integration of the targeting construct can optionally be tested to ensure that the integration has occurred at the desired site within the genome. In one embodiment, such testing may be performed by amplification of a section of the genomic DNA by the polymerase chain reaction, for example as illustrated in FIG. 10. One of ordinary skill in the art will readily appreciate that there are many alternative ways to design oligonucleotides to produce diagnostic amplicons using the polymerase chain reaction.

The selectable marker and other sequences from the targeting construct can be removed from the yeast genome using a recombinase-based strategy: the recombinase effects a recombination reaction between the two recombinase sites, excising the sequences between them. In the targeting construct shown in FIG. 7 this is done by induction of the gene encoding the recombinase present in the targeting construct.

In some instances it may not be possible to distinguish between two possible genotypes based on the size of the amplicons produced by PCR from genomic DNA. In these cases an additional test is possible, for example digestion of the amplicon with one or more restriction enzymes and analysis of the sizes may enable the two possible genotypes to be distinguished, or sequencing of the amplicon may enable the two possible genotypes to be distinguished.

The same selectable marker may be used for the disruption of more than one genomic target. This can be achieved by removing the selectable marker from the yeast genome after each disruption. In one embodiment, this is achieved when the selectable marker separates two sites that are recognized by a recombinase. When the recombinase is present and active, it effects a recombination reaction between the two sites, excising the sequences between them. In the targeting construct shown in FIG. 6, this is done by induction of the gene encoding the recombinase present in the targeting construct. Expression of the recombinase causes a recombination event between the two recombinase recognition sites of the targeting construct, as shown schematically in FIG. 6. The result is that the sequences between the two recombinase sites are excised from the genome. In other embodiments it is possible to integrate a recombinase into a second site in the host genome instead of having it present in the targeting construct.

Cells from which a genomic integration of the targeting construct has been excised can optionally be tested to ensure that the excision has occurred by testing cells from individual colonies to determine whether they still carry the selective marker. In some embodiments, such testing is performed by amplification of a section of the genomic DNA by the polymerase chain reaction. Excision of part of the targeting construct from the yeast genome may be detected by a difference in size of amplicon using oligonucleotide primers that anneal to sequences outside the targeted sequence. This is illustrated in FIG. 10. One of ordinary skill in the art will readily appreciate that there are many alternative ways to design oligonucleotides to produce diagnostic amplicons using the polymerase chain reaction. For example one oligonucleotide that anneals inside the targeting construct (e.g. within the selective marker) and one oligonucleotide that anneals outside but close to the targeted region can be used to produce an amplicon from the integrated targeting construct but will not produce an amplicon if the targeting construct has been excised. In general oligonucleotide pairs for producing diagnostic amplicons should be oriented with their 3' ends towards each other and the sites in the genome where the two oligonucleotides anneal should be separated by between 100 and 10,000 bases, more preferably by between 150 and 5,000 bases and more preferably by between 200 and 2,000 bases.
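The primer-pair guidance given in this section (3' ends facing each other, annealing sites separated by one of three nested ranges, and genotypes distinguished by amplicon size when both primers anneal outside the targeted sequence) can be sketched as a few small checks. This is an illustration only: the function names, the inclusive treatment of the range limits, the coordinate convention and every length in the worked example are assumptions made for the sketch, not values from the specification.

```python
from typing import Optional

# Stated separation ranges, narrowest first; treating the limits as inclusive is an assumption.
SPACING_RANGES = [
    ("200-2,000", 200, 2_000),
    ("150-5,000", 150, 5_000),
    ("100-10,000", 100, 10_000),
]


def narrowest_range(separation_bases: int) -> Optional[str]:
    """Return the narrowest stated range containing the separation, or None if outside all."""
    for label, low, high in SPACING_RANGES:
        if low <= separation_bases <= high:
            return label
    return None


def primers_converge(top_strand_5p: int, bottom_strand_5p: int) -> bool:
    """True when the top-strand primer starts upstream of the bottom-strand primer,
    so that the two 3' ends point towards each other."""
    return top_strand_5p < bottom_strand_5p


def outside_primer_amplicon(left_gap: int, between_len: int, right_gap: int) -> int:
    """Amplicon length when both primers anneal outside the targeted sequence."""
    return left_gap + between_len + right_gap


# Hypothetical lengths in bases, chosen only to show that the three genotypes differ in size.
wild_type = outside_primer_amplicon(150, 1_200, 150)  # native sequence still present
integrant = outside_primer_amplicon(150, 4_000, 150)  # targeting construct integrated
excised = outside_primer_amplicon(150, 100, 150)      # short residue left after excision

for name, size in (("wild type", wild_type), ("integrant", integrant), ("excised", excised)):
    print(name, size, narrowest_range(size))
```

When two genotypes happen to give amplicons of similar size, a size comparison like this cannot separate them, which is the situation in which restriction digestion or sequencing of the amplicon is suggested.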
In some instances it may not be possible to distinguish between two possible genotypes based on the size of the amplicons produced by PCR from genomic DNA. In these cases an additional test is possible, for example digestion of the amplicon with one or more restriction enzymes and analy

Expression of the recombinase causes a recombination event between the two recombinase recognition sites of the targeting construct, as shown schematically in FIG. 9. The result is that the sequences between the two recombinase sites are excised from the genome, leaving the insertion sequences integrated into the yeast genome.

Cells to which a genomic integration has been introduced can optionally be tested to ensure that the addition has occurred correctly by polymerase chain reaction amplification of DNA from the yeast genome. These amplicons may then be tested to measure their size (for example by agarose gel electrophoresis), or their sequence may be determined to ensure that precisely the desired changes have been effected.

In some embodiments, it may be advantageous to insert sequences into a site in the genome that is known to be transcriptionally active. For example inserting a sequence encoding a polypeptide into a genomic site where transcription is regulated by a promoter that expresses high levels of mRNA can produce high levels of mRNA encoding the polypeptide. In some embodiments this can be done by replacing a polypeptide encoding sequence in the genome with a sequence encoding a different polypeptide, for example using the genomic targeting constructs of the form shown in FIG. 7.

In some embodiments, the insertion of a sequence encoding a polypeptide into a genomic site where transcription is regulated by a promoter that expresses high levels of mRNA is accomplished by adding a polypeptide encoding sequence into the genome at a position where a part of the genomic sequence is duplicated so that the gene that was originally present in the genome remains. In some embodiments this can

manipulations causing the presence of more than one copy of the gene within the host cell genome and frequently resulting in higher activity of the gene.

5.7.3. Other Microorganisms of Interest for the Production of Oxidized Fatty Acids

Homology-based recombination occurs in the Saccharomycetaceae Family (which is in the Saccharomycotina Subphylum); Saccharomycetaceae include the Genera Ascobotryozyma, Candida, Citeromyces, Debaryomyces, Dekkera (Brettanomyces), Eremothecium, Issatchenkia, Kazachstania, Kluyveromyces, Kodamaea, Kregervanrija, Kuraishia, Lachancea, Lodderomyces, Nakaseomyces, Pachysolen, Pichia (Hansenula), Saccharomyces, Saturnispora, Tetrapisispora, Torulaspora, Vanderwaltozyma, Williopsis, Zygosaccharomyces. The deletion and insertion methods described here are therefore likely to work in these Genera.

Within the Subphylum Saccharomycotina is a monophyletic clade containing organisms that translate CTG as serine instead of leucine (Fitzpatrick et al. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis BMC Evolutionary Biology 2006, 6:99) including the species Candida lusitaniae, Candida guilliermondii and Debaryomyces hansenii, and the second group containing Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis and Lodderomyces elongisporus.
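The CTG reassignment noted above can be illustrated by translating the same coding sequence with two codon tables. The sketch below builds the standard genetic code and a copy in which only CTG is changed to serine; the names `STANDARD`, `CTG_CLADE`, `translate` and `count_ctg_codons` are illustrative, and the short test sequences are invented for the example. A single-codon override is a simplification intended only to show the effect on protein sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Same table with the single reassignment described in the text: CTG read as serine.
CTG_CLADE = {**STANDARD, "CTG": "S"}


def translate(dna: str, table: dict) -> str:
    """Translate a coding sequence codon by codon, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


def count_ctg_codons(dna: str) -> int:
    """Count in-frame CTG codons, i.e. positions whose meaning differs between the tables."""
    dna = dna.upper()
    return sum(1 for i in range(0, len(dna) - 2, 3) if dna[i:i + 3] == "CTG")


gene = "ATGCTGTAA"  # invented example: Met, CTG, stop
print(translate(gene, STANDARD))   # leucine at the CTG codon
print(translate(gene, CTG_CLADE))  # serine at the CTG codon
```

One practical consequence is that a gene taken from an organism using the standard code may yield a protein with serine in place of leucine at every CTG codon when expressed in a host from this clade, so counting in-frame CTG codons is a reasonable first check on a candidate sequence.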
be effected using a DNA construct comprising a promoter sequence found in the yeast genome positioned such that transcription initiated by the promoter produces RNA that can subsequently encode the polypeptide. Such a construct also comprises a selectable marker that will function in the yeast and optionally a selectable marker that will function in a bacterial host. These may optionally be the same selectable marker. An example of such a construct is shown in FIG. 21. Integration of this construct into the yeast genome is shown schematically in FIG. 22.

In some embodiments, a sequence encoding a polypeptide is inserted under control of the promoter for an isocitrate lyase gene or the promoter for a cytochrome P450 gene including the promoter of CYP52A12 or the promoter of CYP52A13 or the promoter of CYP52A14 or the promoter of CYP52A17 or the promoter of CYP52A18 or the promoter for a fatty alcohol oxidase gene including the promoter of FAO1 or the promoter of FAO1B or the promoter of FAO2A or the promoter of FAO2B, or the promoter for an alcohol dehydrogenase gene including the promoter of ADH-A4 or the promoter of ADH-A4B or the promoter of ADH-B4 or the promoter of ADH-B4B or the promoter of ADH-A10 or the promoter of ADH-B11 or the promoter of ADH-A10B or the promoter of ADH-B11B to obtain high levels of expression of a polypeptide.

In addition to naturally occurring enzymes, modified enzymes may be added into the host genome. For example enzymes may be altered by incorporating systematically varied sets of amino acid changes, with the resulting changes in phenotypes measured and used to identify sequence changes conferring improved function. See, for example, United States Patent Publications Nos. 2006/0136184 and 2008/0050357; Liao et al., 2007, BMC Biotechnol 7, 16; Ehren et al., 2008, Protein Eng Des Sel 21, 699-707 and Heinzelman et al., 2009, Proc Natl Acad Sci USA 106, 5610-5615. Using these methods, modified versions of enzymes may be obtained that confer on the host cell an improved ability to utilize one or more substrate or an improved ability to perform one or more chemical conversion. A gene that has been modified by these methods may be made more useful in the genome of the host by amplification, that is by genetic

Of particular interest are modifications of the activities of cytochrome P450s, fatty alcohol oxidases and alcohol dehydrogenases to modulate the host's production of oxidized molecules by yeasts in this clade. Yeast species of particular interest and industrial relevance within this clade include Candida aaseri, Candida abiesophila, Candida africana, Candida aglyptinia, Candida agrestis, Candida akabanensis, Candida alai, Candida albicans, Candida alimentaria, Candida amapae, Candida ambrosiae, Candida amphixiae, Candida anatomiae, Candida ancudensis, Candida anglica, Candida anneliseae, Candida antarctica, Candida antillancae, Candida anutae, Candida apicola, Candida apis, Candida arabinofermentans, Candida arcana, Candida ascalaphidarum, Candida asparagi, Candida atakaporum, Candida atbi, Candida athensensis, Candida atlantica, Candida atmosphaerica, Candida auringiensis, Candida auris, Candida aurita, Candida austromarina, Candida azyma, Candida azymoides, Candida barrocoloradensis, Candida batistae, Candida beechii, Candida bentonensis, Candida bertae, Candida berthetii, Candida bituminphila, Candida blanki, Candida blattae, Candida blattariae, Candida bohiensis, Candida boidinii, Candida bokatorum, Candida boleticola, Candida bolitotheri, Candida bombi, Candida bombiphila, Candida bondarzewiae, Candida bracarensis, Candida bribrorum, Candida bromeliacearum, Candida buenavistaensis, Candida buinensis, Candida butyri, Candida californica, Candida canberraensis, Candida cariosilignicola, Candida carpophila, Candida carvicola, Candida caseinolytica, Candida castrensis, Candida catenulata, Candida cellae, Candida cellulolytica, Candida cerambycidarum, Candida chauliodes, Candida chickasaworum, Candida chilensis, Candida choctaworum, Candida chodatii, Candida chrysomelidarum, Candida cidri, Candida cloacae, Candida coipomoensis, Candida conglobata, Candida corydali, Candida cylindracea, Candida davenportii, Candida davisiana, Candida deformans, Candida dendrica, Candida dendronema, Candida derodonti, Candida diddensiae, Candida digboiensis, Candida diospyri, Candida diversa, Candida dosseyi, Candida drinydis, Candida drosophilae, Candida dubliniensis, Candida easanensis, Candida edaphicus, Candida edax, Candida elateridarum, Candida emberorum, Candida endomychidarum, Candida entomophila, Candida ergastensis, Candida ernobii, Candida etchellsii, Candida ethanolica, Candida famata, Candida fennica, Candida fermenticarens, Candida flocculosa, Candida floricola, Candida floris, Candida flosculorum, Candida fluviatilis, Candida fragi, Candida freyschlussii, Candida friedrichii, Candida frijolesensis, Candida fructus, Candida fukazawae, Candida fungicola, Candida galacta, Candida galis, Candida galli, Candida gatunensis, Candida gelsemii, Candida geochares, Candida germanica, Candida ghanaensis, Candida gigantensis, Candida glaebosa, Candida glucosophila, Candida glycerinogenes, Candida gorgasii, Candida gotoi, Candida gropengiesseri, Candida guaymorum, Candida haemulonii, Candida halonitratophila, Candida halophila, Candida hasegawae, Candida hawaiiana, Candida heliconiae, Candida hispaniensis, Candida homilentoma, Candida songkhlaensis, Candida sonorensis, Candida sophiae-reginae, Candida sorbophila, Candida sorbosivorans, Candida sorboxylosa, Candida spandovensis, Candida steatolytica, Candida stellata, Candida stellimalicola, Candida stri, Candida subhashii, Candida succiphila, Candida suecica, Candida suzukii, Candida takamatsuzukensis, Candida taliae, Candida tammamiensis, Candida tanzawaensis, Candida tartarivorans, Candida temnochilae, Candida tenuis, Candida tepae, Candida terraborum, Candida tetrigidarum, Candida thaimueangensis, Candida thermophila, Candida tilneyi, Candida tolerans, Candida torresii, Candida tritonae, Candida tropicalis, Candida trypodendroni, Candida tsuchiyae, Candida tumulicola, Candida ubatubensis, Candida ulmi, Candida vaccinii, Candida valdiviana, Candida vanderkliftii, Candida vanderwaltii,
Candida humicola, Candida humilis, Candida hungarica, Candida hyderabadensis, Candida incommunis, Candida inconspicua, Candida insectalens, Candida insectamans, Candida insectorum, Candida intermedia, Candida ipomoeae, Candida ishiwadae, Candida jaroonii, Candida jeffriesii, Candida kanchanaburiensis, Candida karawaiiewii, Candida kashinagacola, Candida kazuoi, Candida khmerensis, Candida kipukae, Candida kofuensis, Candida krabiensis, Candida kruisii, Candida kunorum, Candida labiduridarum, Candida lactis-condensi, Candida lassenensis, Candida laureliae, Candida leandrae, Candida lessepsii, Candida lignicola, Candida litsaeae, Candida litseae, Candida llanquihuensis, Candida lycoperdinae, Candida lyxosophila, Candida magnifica, Candida magnoliae, Candida maltosa, Candida mannitofaciens, Candida maris, Candida maritima, Candida maxii, Candida melibiosica, Candida membranifaciens, Candida mesenterica, Candida metapsilosis, Candida methanolophaga, Candida methanolovescens, Candida methanosorbosa, Candida methylica, Candida michaelii, Candida mogii, Candida montana, Candida multigennis, Candida mycetangii, Candida naeodendra, Candida nakhonratchasimensis, Candida nanaspora, Candida natalensis, Candida neerlandica, Candida nemodendra, Candida nitrativorans, Candida nitratophila, Candida nivariensis, Candida nodaensis, Candida norvegica, Candida novakii, Candida odintsovae, Candida oleophila, Candida ontarioensis, Candida ooitensis, Candida orba, Candida oregonensis, Candida orthopsilosis, Candida ortonii, Candida ovalis, Candida pallodes, Candida palmioleophila, Candida paludigena, Candida panamensis, Candida panamericana, Candida parapsilosis, Candida pararugosa, Candida pattaniensis, Candida peltata, Candida peoriaensis, Candida petrohuensis, Candida phangingensis, Candida picachoensis, Candida piceae, Candida picinguabensis, Candida pignaliae, Candida pimensis, Candida pini, Candida plutei, Candida pomicola, Candida ponderosae, Candida populi, Candida powellii, Candida prunicola, Candida pseudogliaebosa, Candida pseudohaemulonii, Candida pseudointermedia, Candida pseudolambica, Candida pseudorhagii, Candida pseudovanderkliftii, Candida psychrophila, Candida pyralidae, Candida qinlingensis, Candida quercitrusa, Candida quercuum, Candida railenensis, Candida ralunensis, Candida rancensis, Candida restingae, Candida rhagii, Candida riodocensis, Candida rugopelliculosa, Candida rugosa, Candida vartiovaarae, Candida versatilis, Candida vini, Candida viswanathii, Candida wickerhamii, Candida wounanorum, Candida wyomingensis, Candida xylopsoci, Candida yuchorum, Candida zemplinina, Candida zeylanoides

5.7.4. Engineering of Additional Enzymes into Candida to Further Diversify Structures of Products Formed.

Different fatty acids are hydroxylated at different rates by different cytochrome P450s. To achieve efficient hydroxylation of a desired fatty acid feedstock, one strategy is to express P450 enzymes within Candida that are active for ω-hydroxylation of a wide range of highly abundant fatty acid feedstocks. Of particular interest are P450 enzymes that catalyze ω-hydroxylation of lauric acid (C12:0), myristic acid (C14:0), palmitic acid (C16:0), stearic acid (C18:0), oleic acid (C18:1), linoleic acid (C18:2), and α-linolenic acid (ω3, C18:3). Examples of P450 enzymes with known ω-hydroxylation activity on different fatty acids that may be cloned into Candida are the following: CYP94A1 from Vicia sativa (Tijet et al., 1988, Biochemistry Journal 332, 583-589); CYP94A5 from Nicotiana tabacum (Le Bouquin et al., 2001, Eur J Biochem 268, 3083-3090); CYP78A1 from Zea mays (Larkin, 1994, Plant Mol Biol 25, 343-353); CYP86A1 (Benveniste et al., 1998, Biochem Biophys Res Commun 243, 688-693) and CYP86A8 (Wellesen et al., 2001, Proc Natl Acad Sci USA 98, 9694-9699) from Arabidopsis thaliana; CYP92B1 from Petunia hybrida (Petkova-Andonova et al., 2002, Biosci Biotechnol Biochem 66, 1819-1828); CYP102A1 (BM-3) mutant F87 from Bacillus megaterium (Oliver et al., 1997, Biochemistry 36, 1567-1572); and CYP4 family from mammal and insect (Hardwick, 2008, Biochem Pharmacol 75, 2263-2275).

A second strategy to obtain efficient hydroxylation (or further oxidation of the hydroxy group to an aldehyde or dicarboxylic acid) of a modified fatty acid is to perform the hydroxylation first and then to expose the hydroxylated fatty acid or aldehyde or dicarboxylic acid to an additional enzyme.

For example incorporating one or more desaturase enzymes into engineered Candida would allow the introduction of double bonds into ω-hydroxyl fatty acids or aldehydes or dicarboxylic acids at desired positions. Examples of desaturases with known specificity that may be cloned into Candida are the following: A' desaturase from rat liver microsomes (Savile et al., 2001, J Am Chem Soc.
123,4382 Candida Sagamina, Candida Saitoana, Candida sake, Can 60 4385), A desaturase from Bacillus subtilis (Fauconnot and dida Salmanticensis, Candida Santamariae, Candida Santia Buist, 2001, Bioorg Med Chem Lett 11, 2879-2881), A cobensis, Candida Saopaulonensis, Candida Savonica, Can desaturase from Tetrahymena thermophila (Fauconnot and dida Schatavi, Candida sequanensis, Candida Sergipensis, Buist, 2001, J Org Chem 66, 1210-1215), A desaturase from Candida Shehatae, Candida Silvae, Candida Silvanorum, Saccharomyces cerevisiae (Buist and Behrouzian, 1996, J Candida Silvatica, Candida Silvicola, Candida Silvicultrix, 65 Am Chem Soc 118, 6295-6296): A' desaturase from Candida Sinolaborantium, Candida Sithepensis, Candida Spodoptera littoralis (Pinilla et al., 1999, Biochemistry 38, Smithsonii, Candida sojae, Candida Solani, Candida 15272-15277), A' desaturase from Arabidopsis thaliana US 9,359,581 B2 51 52 (Buist and Behrouzian, 1998, JAm ChemSoc. 120,871-876); host cell genome and frequently resulting in higher activity of A desaturase from Caenorhabditis elegans (Meesapyodsuk the gene. Expression of one or more additional enzymes may et al., 2000, Biochemistry 39, 11948-11954). Many other also be used to functionalize the oxidized fatty acid, either the desaturases are known in the literature that can also be hydroxyl group or more highly oxidized groups such as alde expressed in engineered Candida Strains including Candida 5 hydes or carboxylic acids tropicalis strains to introduce unsaturation at specific sites of fatty acid Substrates prior to co-hydroxylation or to catalyze 6. Biotransformation Examples carbon-carbon double bond formation after ()-hydroxylation of fatty acids. 
The following examples are set forth so as to provide those Expression in engineered Candida strains of P450 10 of ordinary skill in the art with a complete description of how enzymes that are known in the literature to introduce addi to practice, make and use exemplary embodiments of the tional internal hydroxylation at specific sites of fatty acids or disclosed methods, and are not intended to limit the scope of ()-hydroxyfatty acids can be used to produce internally oxi what is regarded as the invention. dized fatty acids or co-hydroxyfatty acids or aldehydes or dicarboxylic acids. Examples of P450 enzymes with known 15 6.1. General Biotransformation Procedure in in-chain hydroxylation activity on different fatty acids that Shake-Flask may be cloned into Candida are the following: CYP81B1 from Helianthus tuberosus with ()-1 to co-5 hydroxylation C. tropicalis ATCC20962 from freshagar plate or glycerol (Cabello-Hurtado et al., 1998, J Biol Chem 273, 7260-7267); stock was precultured in 30 ml YPD medium consisting of (g CYP790C1 from Helianthus tuberosus with (0-1 and (0-2 1'): yeast extract, 10; peptone, 10; glucose, 20 and shaken at hydroxylation (Kandelet al., 2005, J Biol Chem 280, 35881 250 rpm, 30° for 20 hours in 500 ml flask. After 16 hours of 35889); CYP726A1 from Euphorbia lagscae with epoxida cultivation at 250 rpm, 30° C., preculture was inoculated at tion on fatty acid unsaturation (Cahoon et al., 2002, Plant 10% (v/v) to 30 ml conversion medium consisting of (g 1"): Physiol 128, 6.15-624); CYP152B1 from Sphingomonas peptone, 3; yeast extract, 6; yeast nitrogen base, 6.7; acetic paucimobilis with O-hydroxylation (Matsunaga et al., 2000, 25 acid, 3: KHPO, 7.2: KHPO 9.3; glucose/glycerol. 20 in Biomed Life Sci 35, 365-371); CYP2E1 and 4A1 from 500 ml flask and shaked at 250 rpm. The initial concentration human liver with ()-1 hydroxylation (Adas et al., 1999, J. Lip of substrate was about 10-20 gl'. 
pH was adjusted to 7.5 by Res 40, 1990-1997); P450s from Bacillus subtilis with C addition of 2 moll-1 NaOH solution after 12 hour culture. and 3-hydroxylation (Lee et al., 2003, J Biol Chem 278, During biotransformation, concentrated co-substrate (glu 9761–9767); and CYP102A1 (BM-3) from Bacillus megate 30 cose/glycerol/sodium acetate/ethanol) was fed (1-2.5% per rium with (D-1, co-2 and c)-3 hydroxylation (Shirane et al., day) and pH was maintained at 7.5-8.0 by addition of NaOH 1993, Biochemistry 32, 13732-13741). solution. Samples were taken on a daily basis to determine In addition to naturally occurring enzymes, modified levels of product by LC-MS. enzymes may be added into the host genome. For example enzymes may be altered by incorporating systematically var 35 6.2. General Biotransformation Procedure in ied sets of amino acid changes, with the resulting changes in Fermentor phenotypes measured and used to identify sequence changes conferring improved function. See, for example, United Fermentation was carried out in 3-1 Bioflo3000 fermentor States Patent Publications Nos. 2006O136184 and (New Brunswick Scientific Co., USA) in fed-batch culture. 2008.0050357; Liao et al., 2007, BMC Biotechnol 7, 16; 40 The conversion medium mentioned above was used except Ehren et al., 2008, Protein Eng Des Sel 21, 699-707 and for addition of 0.05% antifoam 204 (Sigma) and 0.5% sub Heinzelman et al., 2009, Proc Natl AcadSci USA 106,5610 strate. The seed culture from freshagar plate or glycerol stock 5615. Using these methods, modified versions of cytochrome was prepared in 50 ml of conversion medium for 20 hours at P450s may be obtained with improved ability to oxidise fatty 30°C.,250 rpm prior to inoculation into the fermentor vessel. acids of different lengths (for example C6, C7, C8, C9, C10, 45 Following inoculation, the culture was maintained at pH 6.3 C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, and grown at 30,900 rpm withaeration rate of 1.5VVm. 
After C22, C23, C24) or different degrees of saturation (for 12 hour fermentations (growth phase), biotransformation example fatty acids with one carbon-carbon double bond, phase was started with feeding of substrate (2 mill'). Con fatty acids with two carbon-carbon double bonds and fatty centrated glucose (500 gl') as co-substrate was fed continu acids with three carbon-carbon double bonds) or with unsat 50 ously at the rate of 1.2 g 1-1 h-1. During the biotransforma urated fatty acids where the unsaturated bond is at different tion phase, pH was maintained at 7.6 automatically by positions relative to the carboxyl group and the ()-position, to addition of 4 mol l NaOH solution. Antifoam (Antifoam hydroxy fatty acids or to dicarboxylic fatty acids. Further, 204) was also added to the fermentor as necessary. Samples using these methods modified versions of fatty alcohol oxi were taken on a daily basis to determine levels of product by dases or alcohol dehydrogenases may be obtained with 55 LC-MS. improved ability to oxidise hydroxy-fatty acids of different lengths (for example C6, C7, C8, C9, C10, C11, C12, C13, 6.3. General Extraction and Purification Procedure of C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24) or Biotransformation Products different degrees of saturation (for example fatty acids with one carbon-carbon double bond, fatty acids with two carbon 60 The fermentation broth was acidified to pH 1.0 with HCl carbon double bonds andfatty acids with three carbon-carbon and extracted twice with diethyl ether. To avoid the epoxy double bonds) or with unsaturated fatty acids where the unsat ring-opening during acidification, the fermentation broth urated bond is at different positions relative to the carboxyl with products containing epoxy groups was slowly acidified group and the co-position. A gene that has been modified by to pH 3.0 with 5 N HC1. 
Solvent was evaporated under these methods may be made more useful in the genome of the 65 vacuum with a rotary evaporator. The residual obtained was host by amplification, that is by genetic manipulations caus separated by silica gel column chromatography using silica ing the presence of more than one copy of the gene within the gel 60. The fractions containing impurities, un-reacted mono US 9,359,581 B2 53 54 fatty acids and products were gradually eluted with a mixture tice, make and use various disclosed exemplary embodi of n-hexane/diethyl ether that their ratio ranges from 90:30 to ments, and are not intended to limit the scope of what is 10:90. The fractions containing same compound were col regarded as the invention. lected together and the solvents were evaporated under The strains shown in Table 2 and further described in this vacuum with a rotary evaporator. 5 section were constructed by the synthesis and cloning of DNA and its Subsequent transformation into the appropriate C. tropicalis strain. Table 2 summarizes the DNA sequences 7. Genetic Modification Examples synthesized and used in these examples. Table 3 Summarizes the C. tropicalis strains constructed in these examples. Sec The following examples are set forth so as to provide those tion 7.1 describes the methods used for transformation of of ordinary skill in the art with a description of how to prac Candida tropicalis. TABLE 2

NAME | SEQ ID NO: | GI No. | SOURCE/CONSTRUCTION | APPLICATION
SAT1 Flipper | 1 | 50059745 | Joachim Morschhauser | Source of the SAT1 Flipper
CYP52A17 | 2 | 29469874 | | Used to design CYP52A17 Δ
CYP52A17 Δ | 3 | Not applicable | Gene synthesis | Used to construct CYP52A17::SAT1
CYP52A17::SAT1 | 4 | Not applicable | Subcloning of SAT1 flipper into CYP52A17 Δ | Used to delete CYP52A17
CYP52A13 | 5 | 29469864 | | Used to design CYP52A13 Δ
CYP52A13 Δ | 6 | Not applicable | Gene synthesis | Used to construct CYP52A13::SAT1
CYP52A13::SAT1 | 7 | Not applicable | Subcloning of SAT1 flipper into CYP52A13 Δ | Used to delete CYP52A13
CYP52A18 | 8 | 29469876 | | Used to design CYP52A18 Δ
CYP52A18 Δ | 9 | Not applicable | Gene synthesis | Used to construct CYP52A18::SAT1
CYP52A18::SAT1 | 11 | Not applicable | Subcloning of SAT1 flipper into CYP52A18 Δ | Used to delete CYP52A18
CYP52A14 | 13 | 29469866 | | Used to design CYP52A14 Δ Gene#1179
CYP52A14 Δ | 14 | Not applicable | Gene synthesis | Used to construct CYP52A14::SAT1
CYP52A14::SAT1 | 15 | Not applicable | Subcloning of SAT1 flipper into CYP52A14 Δ | Used to delete CYP52A14
FAO1 | 16 | 44194456 | | Used to design FAO1 Δ
FAO1 Δ | 17 | Not applicable | Gene synthesis | Used to construct FAO1::SAT1
FAO1::SAT1 | 18 | Not applicable | Subcloning of SAT1 flipper into FAO1 Δ | Used to delete FAO1
FAO1B | 19 | Not applicable | | Used to design FAO1B Δ
FAO1B Δ | 20 | Not applicable | Assembly PCR. Product not cloned. | Used to construct FAO1B::SAT1
FAO1B::SAT1 | 21 | Not applicable | Ligation of SAT1 flipper to assembly PCR product of FAO1B Δ | Used to delete FAO1B
FAO2A | 22 | 44194479 | | Used to design FAO2A Δ
FAO2A Δ | 23 | Not applicable | Gene synthesis | Used to construct FAO2A::SAT1
FAO2A::SAT1 | 24 | Not applicable | Subcloning of SAT1 flipper into FAO2A Δ | Used to delete FAO2A
FAO2B | 25 | 44194514 | | Used to design FAO2B Δ
FAO2B Δ | 26 | Not applicable | Gene synthesis | Used to construct FAO2B::SAT1
FAO2B::SAT1 | 27 | Not applicable | Subcloning of SAT1 flipper into FAO2B Δ | Used to delete FAO2B
CYP52A12 | 28 | 29469862 | | Used to design CYP52A12 Δ
CYP52A12 Δ | 29 | Not applicable | Gene synthesis | Used to construct CYP52A12::SAT1
CYP52A12::SAT1 | 30 | Not applicable | Subcloning of SAT1 flipper into CYP52A12 Δ | Used to delete CYP52A12
CYP52A12B | | Not applicable | | Used to design CYP52A12B Δ
CYP52A12B Δ | 31 | Not applicable | Gene synthesis | Used to construct CYP52A12B::SAT1
CYP52A12B::SAT1 | 32 | Not applicable | Subcloning of SAT1 flipper into CYP52A12B Δ | Used to delete CYP52A12B
ADH-A4 | 39 | Not applicable | | Used to design ADH-A4 Δ
ADH-A4 Δ | 44 | Not applicable | Gene synthesis | Used to construct ADH-A4::SAT1
ADH-A4::SAT1 | 45 | Not applicable | Subcloning of SAT1 flipper into ADH-A4 Δ | Used to delete ADH-A4
ADH-A4B | | Not applicable | | Used to design ADH-A4B Δ
ADH-A4B Δ | 46 | Not applicable | Gene synthesis | Used to construct ADH-A4B::SAT1
ADH-A4B::SAT1 | 47 | Not applicable | Subcloning of SAT1 flipper into ADH-A4B Δ | Used to delete ADH-A4B
ADH-B4 | 42 | Not applicable | | Used to design ADH-B4 Δ
ADH-B4 Δ | 48 | Not applicable | Gene synthesis | Used to construct ADH-B4::SAT1
ADH-B4::SAT1 | 49 | Not applicable | Subcloning of SAT1 flipper into ADH-B4 Δ | Used to delete ADH-B4
ADH-B4B | | Not applicable | | Used to design ADH-B4B Δ
ADH-B4B Δ | 50 | Not applicable | Gene synthesis | Used to construct ADH-B4B::SAT1
ADH-B4B::SAT1 | 51 | Not applicable | Subcloning of SAT1 flipper into ADH-B4B Δ | Used to delete ADH-B4B
ADH-A10 | 40 | Not applicable | | Used to design ADH-A10 Δ
ADH-A10 Δ | 52 | Not applicable | Gene synthesis | Used to construct ADH-A10::SAT1
ADH-A10::SAT1 | 53 | Not applicable | Subcloning of SAT1 flipper into ADH-A10 Δ | Used to delete ADH-A10
ADH-B11 | 43 | Not applicable | | Used to design ADH-B11 Δ
ADH-B11 Δ | 54 | Not applicable | Gene synthesis | Used to construct ADH-B11::SAT1
ADH-B11::SAT1 | 55 | Not applicable | Subcloning of SAT1 flipper into ADH-B11 Δ | Used to delete ADH-B11
ADH-A10B | 56 | Not applicable | | Used to design ADH-A10B Δ
ADH-A10B Δ | 57 | Not applicable | Gene synthesis | Used to construct ADH-A10B::SAT1
ADH-A10B::SAT1 | 58 | Not applicable | Subcloning of SAT1 flipper into ADH-A10B Δ | Used to delete ADH-A10B
ADH-B11B | 59 | Not applicable | | Used to design ADH-B11B Δ
ADH-B11B Δ | 60 | Not applicable | Gene synthesis | Used to construct ADH-B11B::SAT1
ADH-B11B::SAT1 | 61 | Not applicable | Subcloning of SAT1 flipper into ADH-B11B Δ | Used to delete ADH-B11B
ICL promoter | 62 | Not applicable | Gene synthesis | Used as a component of genomic integration and expression constructs (e.g. SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 74, etc.)
ICL terminator | 63 | Not applicable | Gene synthesis | Used as a component of genomic integration and expression constructs (e.g. SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 74, etc.)
TEF1 promoter | 64 | Not applicable | Gene synthesis | Used as a component of genomic integration and expression constructs (e.g. SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 74, etc.)
EM7 promoter | 65 | Not applicable | Gene synthesis | Used as a component of genomic integration and expression constructs (e.g. SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 74, etc.)
ZeoR | 66 | Not applicable | Gene synthesis of gene optimized for Candida | Used as a component of genomic integration and expression constructs (e.g. SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 74, etc.)
CYC1 transcription terminator | 67 | Not applicable | Gene synthesis | Used as a component of genomic integration and expression constructs (e.g. SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 74, etc.)
pUC origin of replication | 68 | Not applicable | Gene synthesis | Used as a component of genomic integration and expression constructs (e.g. SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 74, etc.)
CYP52A17 | 69 | Not applicable | Gene synthesis | Cloned into genomic integration and expression constructs to express (e.g. SEQ ID NO: 70)
pXICL::CYP52A17 | 70 | Not applicable | CYP52A17 cloned into genomic integration vector | Used to express CYP52A17 in Candida tropicalis under control of the isocitrate lyase promoter
CYP52A13 | 71 | Not applicable | Gene synthesis | Cloned into genomic integration and expression constructs to express (e.g. SEQ ID NO: 71)
pXICL::CYP52A13 | 72 | Not applicable | CYP52A13 cloned into genomic integration vector | Used to express CYP52A13 in Candida tropicalis under control of the isocitrate lyase promoter
CYP52A12 | 73 | Not applicable | Gene synthesis | Cloned into genomic integration and expression constructs to express (e.g. SEQ ID NO: 74)
pXICL::CYP52A12 | 74 | Not applicable | CYP52A12 cloned into genomic integration vector | Used to express CYP52A12 in Candida tropicalis under control of the isocitrate lyase promoter
mCherry | 75 | Not applicable | Gene synthesis | Cloned into genomic integration and expression constructs to express mCherry (e.g. SEQ ID NO: 76)
pXICL::mCherry | 76 | Not applicable | mCherry cloned into genomic integration vector | Used to express mCherry in Candida tropicalis under control of the isocitrate lyase promoter

TABLE 3
Strain Name | Genotype | Description
DP1 | ura3A/ura3B pox5::ura3A pox5::ura3A pox4A::ura3A pox4B::URA3A | American Type Culture Collection (ATCC 20962)
DP65 | DP1 CYP52A17::SAT1 | Electroporation of DP1 with CYP52A17::SAT1 (SEQ ID NO: 4) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into CYP52A17
DP78 | DP1 ΔCYP52A17 | Growth of DP65 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from CYP52A17
DP107 | DP1 ΔCYP52A17 CYP52A13::SAT1 | Electroporation of DP78 with CYP52A13::SAT1 (SEQ ID NO: 7) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into CYP52A13
DP113 | DP1 ΔCYP52A17 ΔCYP52A13 | Growth of DP107 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from CYP52A13
DP140 | DP1 ΔCYP52A17/CYP52A18::SAT1 ΔCYP52A13 | Electroporation of DP113 with CYP52A18::SAT1 (SEQ ID NO: 11) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into CYP52A18
DP142 | DP1 ΔCYP52A17 ΔCYP52A18 ΔCYP52A13 | Growth of DP140 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from CYP52A18
DP170 | | Electroporation of DP142 with CYP52A14::SAT1 (SEQ ID NO: 15) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into CYP52A14
DP174 | | Growth of DP170 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from CYP52A14
DP182 | | Electroporation of DP174 with FAO1::SAT1 (SEQ ID NO: 18) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into FAO1
DP186 | | Growth of DP182 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from FAO1
DP197 | DP1 ΔCYP52A17 ΔCYP52A18 ΔCYP52A13 ΔCYP52A14 ΔFAO1 pXICL::mCherry | Electroporation of DP186 with pXICL::mCherry (SEQ ID NO: 76) and selection for Zeocin resistance followed by PCR screens for targeting construct insertion into the isocitrate lyase gene
 | | Electroporation of DP186 with pXICL::CYP52A17 (SEQ ID NO: 70) and selection for Zeocin resistance followed by PCR screens for targeting construct insertion into the isocitrate lyase gene
DP238 | | Electroporation of DP186 with FAO1B::SAT1 (SEQ ID NO: 21) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into FAO1B
DP240 | | Growth of DP238 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from FAO1B
DP255 | | Electroporation of DP240 with FAO2A::SAT1 (SEQ ID NO: 21) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into FAO2A
DP256 | | Growth of DP255 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from FAO2A
DP258, DP259 | | Electroporation of DP256 with FAO2B::SAT1 (SEQ ID NO: 27) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into FAO2B
DP261 | | Growth of DP259 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from FAO2B
DP268 | | Electroporation of DP261 with CYP52A12::SAT1 (SEQ ID NO: 30) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into CYP52A12
DP272 | | Growth of DP268 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from CYP52A12
DP282 | | Electroporation of DP272 with CYP52A12B::SAT1 (SEQ ID NO: 32) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into CYP52A12B
DP283, DP284 | | Growth of DP282 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from CYP52A12B
DP387 | | Electroporation of DP283 with ADH-A4::SAT1 (SEQ ID NO: 45) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into ADH-A4
DP388 | | Growth of DP387 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from ADH-A4
DP389 | | Electroporation of DP388 with ADH-A4B::SAT1 (SEQ ID NO: 47) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into ADH-A4B
DP390 | | Growth of DP389 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from ADH-A4B
DP397 | | Electroporation of DP390 with ADH-B4::SAT1 (SEQ ID NO: 49) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into ADH-B4
DP398 | | Growth of DP397 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from ADH-B4
DP409 | | Electroporation of DP398 with ADH-B4B::SAT1 (SEQ ID NO: 49) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into ADH-B4B
DP411 | | Growth of DP409 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from ADH-B4B
DP415 | | Electroporation of DP411 with ADH-A10::SAT1 (SEQ ID NO: 53) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into ADH-A10
DP416 | | Growth of DP415 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from ADH-A10
DP417 | | Electroporation of DP416 with ADH-B11::SAT1 (SEQ ID NO: 55) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into ADH-B11
DP421 | | Growth of DP417 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from ADH-B11
DP423, DP424 | | Electroporation of DP421 with ADH-A10B::SAT1 (SEQ ID NO: 58) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into ADH-A10B
DP427, DP428 | DP1 ΔCYP52A17 ΔCYP52A18 … | Electroporation of DP421 with pXICL::CYP52A17 (SEQ ID NO: 70) and selection for Zeocin resistance followed by PCR screens for targeting construct insertion into the isocitrate lyase gene
DP431 | | Growth of DP424 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from ADH-A10B
DP433, DP434 | | Electroporation of DP431 with ADH-B11B::SAT1 (SEQ ID NO: 61) and selection for nourseothricin resistance followed by PCR screens for targeting construct insertion into ADH-B11B
DP436, DP437 | DP1 ΔCYP52A17 ΔCYP52A18 … | Growth of DP433 with maltose followed by agar plate screen for loss of nourseothricin resistance and PCR screen for excision of targeting construct from ADH-B11B
DP522, DP523 | DP1 ΔCYP52A17 ΔCYP52A18 … | Electroporation of DP421 with pXICL::CYP52A13 (SEQ ID NO: 72) and selection for Zeocin resistance followed by PCR screens for targeting construct insertion into the isocitrate lyase gene
DP526, DP527 | DP1 ΔCYP52A17 ΔCYP52A18 … | Electroporation of DP421 with pXICL::CYP52A12 (SEQ ID NO: 74) and selection for Zeocin resistance followed by PCR screens for targeting construct insertion into the isocitrate lyase gene
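The serial strain construction recorded in Table 3 alternates two steps: insertion of a SAT1 targeting construct by electroporation, then excision of that construct by growth on maltose. As an illustrative sketch only, the first ten derived strains can be modelled as parent/step records and walked back to DP1. The dictionary layout and the function name `excised_loci` are assumptions made for this example and are not part of the disclosure; the strain names, parents, and loci are transcribed from the Table 3 descriptions.

```python
# Illustrative sketch: (parent, step, locus) records transcribed from Table 3.
# The data layout and function name are hypothetical, chosen for this example.
LINEAGE = {
    "DP65": ("DP1", "insert", "CYP52A17"),
    "DP78": ("DP65", "excise", "CYP52A17"),
    "DP107": ("DP78", "insert", "CYP52A13"),
    "DP113": ("DP107", "excise", "CYP52A13"),
    "DP140": ("DP113", "insert", "CYP52A18"),
    "DP142": ("DP140", "excise", "CYP52A18"),
    "DP170": ("DP142", "insert", "CYP52A14"),
    "DP174": ("DP170", "excise", "CYP52A14"),
    "DP182": ("DP174", "insert", "FAO1"),
    "DP186": ("DP182", "excise", "FAO1"),
}


def excised_loci(strain):
    """Return the loci whose targeting construct was excised, oldest first."""
    loci = []
    # Walk from the requested strain back to the starting strain (DP1).
    while strain in LINEAGE:
        parent, step, locus = LINEAGE[strain]
        if step == "excise":
            loci.append(locus)
        strain = parent
    return list(reversed(loci))


if __name__ == "__main__":
    print(excised_loci("DP186"))
```

For DP186 the walk collects CYP52A17, CYP52A13, CYP52A18, CYP52A14 and FAO1, which is the same set of deletions listed in the genotype given for DP197, a strain derived from DP186.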

7.1. General Protocols for Transformation of Candida

The protocols described in this section have been performed using Candida tropicalis. However, it is expected that they will work in the Saccharomycetaceae Family in general and the Candida genus in particular without undue experimentation, since the methods rely upon homologous recombination, which is found throughout this Family.

7.1.1. Preparation of DNA Targeting Constructs Prior to Integration into Candida tropicalis

A linear segment of DNA of the form shown schematically in either FIG. 4 or FIG. 7 was prepared by digesting between 2.5 and 5 µg of the plasmid containing the targeting construct with flanking restriction enzymes; in the examples below the restriction enzyme BsmBI from New England Biolabs was used according to the manufacturer's instructions. The digest was purified using Qiagen's PCR purification kit, eluted in 75 µl of Qiagen's EB buffer (elution buffer) and transformed into C. tropicalis by electroporation.

7.1.2. Preparation of Electrocompetent Candida tropicalis

The desired C. tropicalis strain was densely streaked from a culture stored at -80° C. in growth media (YPD) containing 10% glycerol, onto 2-3 100 mm YPD agar plates and incubated overnight at 30° C. The next morning 10 ml YPD broth was spread onto the surface of the YPD agar plates and the yeast cells were scraped from the plates with the aid of a sterile glass spreader. Cells (of the same strain) from the 2-3 plates were combined in a 50 ml conical tube, and the A600 of a 1:20 dilution determined. Sufficient cells to prepare 50 ml of YPD containing yeast cells at an A600 of 0.2 were placed in each of two 50 ml conical tubes and pelleted in a centrifuge for 5 min at 400xg. The cells in each tube were suspended in 10 ml of TE/Li mix (100 mM LiCl, 10 mM Tris-Cl, 1 mM EDTA, pH 7.4). Both tubes were incubated in a shaking incubator for 1 hour at 30° C. and 125 rpm, then 250 µl of 1 M DTT was added to each 10 ml cell suspension and incubation continued for a further 30 min at 30° C. and 125 rpm.

The cells were then washed twice in water and once in sorbitol. Sterile, ice-cold purified water (40 ml) was added to each of the cell suspensions, which were then centrifuged for 5 min at 400xg at 4° C. and the supernatant decanted off. The cells in each tube were resuspended in 50 ml of sterile, ice-cold purified water, centrifuged for 5 min at 400xg at 4° C., and the supernatant decanted off. The cells in each tube were then resuspended in 25 ml of ice-cold 1 M sorbitol (prepared with purified water) and centrifuged for 5 min at 400xg. The supernatant was decanted from each tube and the cells resuspended in the small residual volume of sorbitol solution (the volume of each suspension was approximately 200 µl). The cell suspensions from both tubes were then pooled; this provided enough cells for 4-8 electroporations. In a 1.5 ml Eppendorf tube on ice, 60 µl of cells were mixed with 60 µl (~2.5 µg) of BsmBI-digested vector DNA containing the genomic targeting construct. A No DNA Control was prepared for every transformation by mixing cells with Qiagen EB (elution buffer) instead of DNA. The cell-DNA mixtures were mixed with a vortexer and transferred to an ice-cold Bio-Rad 0.2 cm electrode gap Gene Pulser cuvette. The cells were then electroporated at 1.8 kV using a Bio-Rad E. coli Pulser, 1 ml of 1 M D-sorbitol was added, and the electroporated cells were transferred to a 14 ml culture tube and 1 ml of 2xYPD broth was added. Cells were then rolled on a Rollerdrum for 1 hour at 37° C. before spreading 100 µl on 100 mm diameter plates containing YPD agar + 200 µg/ml nourseothricin. Plates were incubated for 2-4 days at 30° C. Large colonies (8-16) were individually streaked onto a YPD agar plate to purify. A single colony from each streak was patched to a YPD agar stock plate and incubated overnight at 30° C.

7.1.3. Genomic DNA Preparation and PCR Test for Integration of Genomic Targeting Constructs at the Desired Location in Candida tropicalis

Between 5 and 30 nourseothricin-resistant isolates were each inoculated into 2 ml of YP broth and rolled overnight at 30° C. on a Rollerdrum. Genomic DNA from a 0.5 ml sample of each culture was isolated using Zymo Research's YeaStar genomic DNA isolation kit according to the manufacturer's instructions, eluting the DNA in 120 µl of TE, pH 8.0.

For PCR tests, 2.5 µl of the resulting gDNA was used in a 50 µl PCR amplification reaction. As a control for each analysis, genomic DNA was prepared from the parental strain that was transformed with the targeting construct. Oligonucleotide primers for PCR analysis were chosen to lie within the targeting construct and/or in the genomic sequence surrounding the desired integration location, as shown for example in FIG. 10. The size of amplicons was used to determine which strain(s) possessed the desired genomic structure. PCR primer sequences and diagnostic amplicon sizes are described for many of the targeting constructs in Section 7.

PCR reaction mixes were prepared containing 5 µl of 10x NEB Standard Taq Buffer, 2.5 µl of dNTP mix (6 mM of each of dATP, dCTP, dGTP, dTTP), 2.5 µl of oligonucleotide primer 1 (10 mM), 2.5 µl of oligonucleotide primer 2 (10 mM), 1 µl of NEB Taq DNA polymerase (5 U of enzyme), 2.5 µl of Candida gDNA and water to 50 µl. PCR reactions were subjected to the following temperatures for the times indicated to amplify the target DNA:
Step 1: 1.5 min at 95° C.
Step 2: 30 sec at 95° C.
Step 3: 30 sec at 48° C. (or ~5° C. lower than the calculated Tm for the primers as appropriate)
Step 4: 1 min at 72° C. (or 1 minute per 1 kb for predicted amplicon size)
Step 5: Go to step 2 a further 29 times
Step 6: 2 min at 72° C.
Step 7: Hold at 4° C.
Step 8: End

The amplicon sizes were determined by running 5-10 µl of the completed PCR reaction on a 1% agarose-TBE gel.

7.1.4. Selection and Screen for Isolates Having Excised Targeting Constructs from the Genome of Candida tropicalis

Strains carrying a genomic targeting construct to be excised were inoculated from a YPD agar stock plate into 2 ml YP (YPD without dextrose) broth + 2% maltose in a 14 ml culture tube. The culture tubes were rolled for ~48 hours at 30° C. on a Rollerdrum. Growth with maltose induced production of Flp recombinase in the host strain from the integrated targeting construct. The Flp recombinase then acted at Frt sites located near the ends of the targeting construct (between the targeting sequences) to excise the sequences between the Frt sites, including the genes encoding Flp recombinase and conferring nourseothricin resistance. The culture was then diluted in serial 10-fold dilutions from 10-fold to 10,000-fold. Aliquots (100 µl) of 100, 1,000 and 10,000-fold dilutions were spread onto YPD agar plates.

Putative excisants were identified by replica-plating colonies on the YPD agar plates from the dilution series (the most useful plates for this purpose were those with 50-500 colonies) to a YPD agar + 200 µg/ml nourseothricin plate and then to a YPD agar plate. Putative excisants were identified as colonies that grew on YPD agar, but not on YPD agar + 200 µg/ml nourseothricin, following overnight incubation at room temperature. Putative excisants were streaked for single colonies to a YPD agar plate and incubated overnight at 30° C. A single isolate of each of the putative excisants was patched to a YPD agar stock plate and incubated overnight at 30° C.

Putative excisants were inoculated from the stock plate to 2 ml of YPD broth in a 14 ml culture tube and rolled overnight at 30° C. on a Rollerdrum. Genomic DNA was prepared from 0.5 ml of the overnight culture using the YeaStar Genomic DNA Isolation Kit from Zymo Research and eluted in 120 µl of TE, pH 8.0. Excision of the targeting construct was tested by PCR as described in 7.1.3.

forms the desired reaction, and to engineer it so that its activity is increased towards desired substrates and reduced towards undesired substrates. In one embodiment its activity for ω-hydroxylation of fatty acids is increased relative to its oxidation of ω-hydroxy fatty acids to α,ω-diacids, thereby favoring the production of ω-hydroxy fatty acids over α,ω-diacids.

7.2.1. Deletion of CYP52A17

The sequence of a gene encoding a cytochrome P450 in Candida tropicalis, CYP52A17, is given as SEQ ID NO: 2. This sequence was used to design a "pre-targeting" construct comprising two targeting sequences from the 5' and 3' ends of the structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 base pair stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the CYP52A17 pre-targeting construct is given as SEQ ID NO: 3. Not shown in SEQ ID NO: 3 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The sequence was synthesized using standard DNA synthesis techniques well known in the art.

7.2.
Deletion of Cytochrome P450 Genes from Candida A targeting construct for deletion of CYP52A17 from the Candida tropicalis genome was prepared by digesting the The CYP52A type P450s are responsible for oxidation of a SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI variety of compounds in several Candida species, including 35 and XhoI, and ligating it into the CYP52A17 pre-targeting ()-hydroxylation offatty acids (Craft et al., 2003, Appl Envi construct (SEQID NO: 3) from which the 20 bp stuffer had ron Microbiol: 69, 5983-91; Eschenfeldt et al., 2003, Appl been removed by digestion with restriction enzymes Notland Environ Microbiol: 69, 5992-9; Ohkuma et al., 1991, DNA XhoI. The sequence of the resulting targeting construct for Cell Biol: 10, 271-82; Zimmer et al., 1995, DNA Cell Biol: deletion of CYP52A17 is given as SEQ ID NO: 4. This 40 sequence is a specific example of the construct shown generi 14,619-28; and Zimmer et al., 1996, Biochem Biophy's Res cally in FIG. 4: it has nearly 300 base pairs of the genomic Commun: 224, 784-9.) They have also been implicated in the sequence of CYP52A17 at each end to serve as a targeting further oxidation of theseconpounds. See Eschenfeldt et al., sequence; between the targeting sequences are two frt sites 2003, “Transformation of fatty acids catalyzed by cyto that are recognized by the flp recombinase; between the two chrome P450 monooxygenase enzymes of Candida tropica 45 frt sites are sequences encoding the flp recombinase and a lis.” Appl. Environ. Microbiol. 69: 5992-5999, which is protein conferring resistance to the antibiotic nourseothricin. hereby incorporated by reference herein. In some embodi Not shown in SEQID NO. 
4 but also present in the targeting ments it is desirable to engineer one or more CYP52A type construct were a selective marker conferring resistance to P450s in a strain of Candida in order to modify the activity or kanamycin and a bacterial origin of replication, so that the specificity of the P450 enzyme. In some such embodiments it 50 targeting construct can be grown and propagated in Ecoli. is advantageous to eliminate the activities of one or more The targeting sequences shown in SEQID NO. 4 also include CYP52A type P450 enzymes endogenous to the strain. Rea a BSmBI restriction site at each end of the construct, so that Sons to delete endogenous P450 enzymes include more accu the final targeting construct can be linearized and optionally rate determination of the activity and specificity of a P450 55 separated from the bacterial antibiotic resistance marker and enzyme that is being engineered and elimination of P450 origin of replication prior to transformation into Candida enzymes whose activities may interfere with synthesis of the tropicalis. desired product. Strains lacking one or more of their natural Candida tropicalis strain DP65 was prepared by integra CYP52A P450s are within the scope of the disclosed tech tion of the construct shown as SEQID NO. 4 into the genome 60 of strain DP1 (Table 3) at the site of the genomic sequence of nology. For example in order to obtain a strain of Candida the gene for CYP52A17. Candida tropicalis strain DP78 was species of yeast including Candida tropicalis for the produc prepared by excision of the targeting construct from the tion of oxidized compounds including ()-hydroxy fatty acids, genome of strain DP65, thereby deleting the gene encoding one method is to reduce or eliminate CYP52A type P450s and CYP52A17. Integration and deletion of targeting sequence other enzyme activities within the cell-that oxidise ()-hy 65 SEQID NO: 4, and analysis of integrants and excisants were droxy fatty acids to C.O)-diacids. 
It is then possible to re performed as described in Section 7.1. Sequences of oligo introduce one CYP52A type P450 or other enzyme that per nucleotide primers for analysis of strains were: US 9,359,581 B2 70 Candida tropicalis strain DP107 was prepared by integra (SEO ID NO: 77) tion of the construct shown as SEQID NO: 7 into the genome 17-IN-L3: TGGCGGAAGTGCATGTGACACAACG ofstrain DP65 (Table 3) at the site of the genomic sequence of (SEO ID NO: 78) the gene for CYP52A13. Candida tropicalis strain DP113 17-IN-R2: GTGGTTGGTTTGTCTGAGTGGAGAG 5 was prepared by excision of the targeting construct from the (SEO ID NO: 79) genome of strain DP107, thereby deleting the gene encoding SAT1-R: TGGTACTGGTTCTCGGGAGCACAGG CYP52A13. Integration and deletion of targeting sequence SEQID NO: 7, and analysis of integrants and excisants were (SEQ ID NO: 80) performed as described in Section 7.1. SAT1 - F : CGCTAGACAAATTCTTCCAAAAATTTTAGA 10 Sequences of oligonucleotide primers for analysis of For strain DP65 (integration of SEQID NO: 4), PCR with strains were: primers 17-IN-L3 and SAT1-R produces a 959 base pair amplicon: PCR with primers SAT1-F and 17-IN-R2 produces (SEQ ID NO: 81) a 922 base pair amplicon. PCR with primers 17-IN-L3 and 15 17-IN-R2 from a strain carrying a wild type copy of 13 - IN-L2: CATGTGGCCGCTGAATGTGGGGGCA CYP52A17 produces a 2.372 base pair amplicon. For strain (SEQ ID NO: 82) DP78, with a deleted copy of CYP52A17, PCR with primers 13 - IN-R2: GCCATTTTGTTTTTTTTTACCCCTCTAACA 17-IN-L3 and 17-IN-R2 produces a 1,478 base pair amplicon. (SEO ID NO : 79) Deletion of a portion of the coding sequence of the gene for SAT1-R: CYP52A17 will disrupt the function of the protein encoded (SEQ ID NO: 80) by this gene in the Candida host cell. SAT1 - F : 7.2.2. Deletion of CYP52A13 The sequence of a gene encoding a cytochrome P450 in For strain DP107 (integration of SEQID NO:7), PCR with Candida tropicalis, CYP52A13 is given as SEQ ID NO: 5. 
25 primers 13-IN-L2 and SAT1-R produces an 874 base pair This sequence was used to design a "pre-targeting construct amplicon: PCR with primers SAT1-F and 13-IN-R2 produces comprising two targeting sequences from the 5' and 3' end of an 879 base pair amplicon. PCR with primers 13-IN-L2 and the structural gene. The targeting sequences were separated 13-IN-R2 from a strain with wild type CYP52A13 produces by a sequence, given as SEQID NO: 12, comprising a NotI a 2.259 base pair amplicon. For strain DP113 with a deleted 30 version of CYP52A13 PCR with primers 13-IN-L2 and restriction site, a 20 base pair stuffer fragment and an XhoI 13-IN-R2 produces a 1,350 base pair amplicon. restriction site. The targeting sequences were flanked by two Deletion of a portion of the coding sequence of the gene for BSmBI restriction sites, so that the final targeting construct CYP52A13 will disrupt the function of the protein encoded can be linearized prior to transformation into Candida tropi by this gene in the Candida host cell. calis. The sequence of the CYP52A13 pre-targeting construct 35 7.2.3. Deletion of CYP52A18 is given as SEQID NO: 6. Not shown in SEQID NO: 6 but The sequence of a gene encoding a cytochrome P450 in also present in the pre-targeting construct were a selective Candida tropicalis, CYP52A18 is given as SEQ ID NO: 8. marker conferring resistance to kanamycin and a bacterial This sequence was used to design a "pre-targeting construct origin of replication, so that the pre-targeting construct can be comprising two targeting sequences from the 5' and 3' end of grown and propagated in Ecoli. The sequence was synthe 40 the structural gene. The targeting sequences were separated sized using standard DNA synthesis techniques well known by a sequence, given as SEQID NO: 12, comprising a NotI in the art. restriction site, a 20 base pair stuffer fragment and an XhoI A targeting construct for deletion of CYP52A13 from the restriction site. 
The targeting sequences were flanked by two Candida tropicalis genome was prepared by digesting the BSmBI restriction sites, so that the final targeting construct SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI 45 can be linearized prior to transformation into Candida tropi and XhoI, and ligating it into the CYP52A13 pre-targeting calis. The sequence of the CYP52A18 pre-targeting construct construct (SEQID NO: 6) from which the 20 bp stuffer had is given as SEQ ID NO: 9. The CYP52A18 pre-targeting been removed by digestion with restriction enzymes Notland construct also contains a polylinker sequence (SEQ ID NO: XhoI. The sequence of the resulting targeting construct for 10) between the 5' targeting sequence and the NotI site. This deletion of CYP52A13 is given as SEQ ID NO: 7. This 50 polylinker sequence was placed to allow the insertion of sequence is a specific example of the construct shown generi sequences into the targeting construct to allow it to function cally in FIG. 4: it has nearly 300 base pair of the genomic as an insertion targeting construct of the form shown sche sequence of CYP52A13 at each end to serve as a targeting matically in FIG. 7. Not shown in SEQ ID NO: 9 but also sequence; between the targeting sequences are two frt sites present in the pre-targeting construct were a selective marker that are recognized by the flp recombinase; between the two 55 conferring resistance to kanamycin and a bacterial origin of frt sites are sequences encoding the flp recombinase and a replication, so that the pre-targeting construct can be grown protein conferring resistance to the antibiotic nourseothricin. and propagated in Ecoli. The sequence was synthesized using Not shown in SEQID NO: 7 but also present in the targeting standard DNA synthesis techniques well known in the art. 
A construct were a selective marker conferring resistance to targeting construct for deletion of CYP52A18 from the Can kanamycin and a bacterial origin of replication, so that the 60 dida tropicalis genome was prepared by digesting the SAT-1 targeting construct can be grown and propagated in Ecoli. flipper (SEQ ID NO: 1) with restriction enzymes NotI and The targeting sequences shown in SEQID NO:7 also include XhoI, and ligating it into the CYP52A18 pre-targeting con a BSmBI restriction site at each end of the construct, so that struct (SEQID NO:9) from which the 20 base pair stuffer had the final targeting construct can be linearized and optionally been removed by digestion with restriction enzymes Notland separated from the bacterial antibiotic resistance marker and 65 XhoI. The sequence of the resulting targeting construct for origin of replication prior to transformation into Candida deletion of CYP52A18 is given as SEQ ID NO: 11. This tropicalis. sequence is a specific example of the construct shown generi US 9,359,581 B2 71 72 cally in FIG. 4: it has nearly 300 base pairs of the genomic matically in FIG. 7. Not shown in SEQID NO: 14 but also sequence of CYP52A18 at each end to serve as a targeting present in the pre-targeting construct were a selective marker sequence; between the targeting sequences are two frt sites conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown that are recognized by the flp recombinase; between the two and propagated in Ecoli. The sequence was synthesized using frt sites are sequences encoding the flp recombinase and a standard DNA synthesis techniques well known in the art. protein conferring resistance to the antibiotic nourseothricin. 
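The layout of a deletion construct and the effect of Flp-mediated excision described above can be pictured with a minimal sketch. It models the genome as an ordered list of named parts rather than as DNA sequence. The part names, the helper functions and the retention of a single frt site after excision are illustrative assumptions (the text states only that the sequences between the frt sites are excised); none of this is taken from the SEQ ID listings.

```python
def targeting_construct(gene):
    """Ordered parts of a deletion construct of the kind shown generically in FIG. 4.

    Part names are hypothetical labels, not sequences from the specification.
    """
    return [f"{gene}_5prime_arm", "frt", "FLP", "SAT1", "frt", f"{gene}_3prime_arm"]


def integrate(genome, gene):
    """Replace the wild-type gene with the targeting construct.

    Simplification: the whole gene entry is swapped for the construct, standing in
    for homologous recombination at the two targeting arms.
    """
    out = []
    for part in genome:
        if part == gene:
            out.extend(targeting_construct(gene))
        else:
            out.append(part)
    return out


def excise(genome):
    """Remove everything between the first and last frt site.

    Assumption: one frt site remains at the locus after recombination.
    """
    sites = [i for i, part in enumerate(genome) if part == "frt"]
    if len(sites) < 2:
        return list(genome)  # nothing to excise
    return genome[: sites[0] + 1] + genome[sites[-1] + 1:]


genome = ["upstream", "CYP52A18", "downstream"]
integrant = integrate(genome, "CYP52A18")   # analogous to an integrant strain
excisant = excise(integrant)                # analogous to the excisant strain
print(integrant)
print(excisant)
```

In this picture the integrant still carries the nourseothricin resistance marker (`SAT1`), while the excisant has lost both the marker and the recombinase gene, which is why excisants are screened for loss of nourseothricin resistance.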
Not shown in SEQ ID NO: 11 but also present in the targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the targeting construct can be grown and propagated in E. coli. The targeting sequences shown in SEQ ID NO: 11 also include a BsmBI restriction site at each end of the construct, so that the final targeting construct can be linearized and optionally separated from the bacterial antibiotic resistance marker and origin of replication prior to transformation into Candida tropicalis.

Candida tropicalis strain DP140 was prepared by integration of the construct shown as SEQ ID NO: 11 into the genome of strain DP113 (Table 3) at the site of the genomic sequence of the gene for CYP52A18. Candida tropicalis strain DP142 was prepared by excision of the targeting construct from the genome of strain DP140, thereby deleting the gene encoding CYP52A18. Integration and deletion of targeting sequence SEQ ID NO: 11, and analysis of integrants and excisants were performed as described in Section 7.1.

Oligonucleotide primers for analysis of strains were:

(SEQ ID NO: 83)
18-IN-L2: GGAAGTGCATGTGACACAATACCCT
(SEQ ID NO: 84)
18-IN-R2: GGTGGTTTGTCTGAGTGAGAACGTTTAATT
(SEQ ID NO: 79)
SAT1-R: TGGTACTGGTTCTCGGGAGCACAGG
(SEQ ID NO: 80)
SAT1-F: CGCTAGACAAATTCTTCCAAAAATTTTAGA

For strain DP140 (integration of SEQ ID NO: 11), PCR with primers 18-IN-L2 and SAT1-R produces a 676 base pair amplicon; PCR with primers SAT1-F and 18-IN-R2 produces a 605 base pair amplicon. PCR from a strain with a wild type version of CYP52A18 with primers 18-IN-L2 and 18-IN-R2 produces a 2,328 base pair amplicon. For strain DP142 with a deleted version of CYP52A18, PCR with primers 18-IN-L2 and 18-IN-R2 produces an 878 base pair amplicon.

Deletion of a portion of the coding sequence of the gene for CYP52A18 will disrupt the function of the protein encoded by this gene in the Candida host cell.

7.2.4. Deletion of CYP52A14

The sequence of a gene encoding a cytochrome P450 in Candida tropicalis, CYP52A14 is given as SEQ ID NO: 13. This sequence was used to design a "pre-targeting" construct comprising two targeting sequences from the 5' and 3' end of the structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 base pair stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the CYP52A14 pre-targeting construct is given as SEQ ID NO: 14. The CYP52A14 pre-targeting construct also contains a polylinker sequence (SEQ ID NO: 10) between the 5' targeting sequence and the NotI site. This polylinker sequence was placed to allow the insertion of sequences into the targeting construct to allow it to function as an insertion targeting construct of the form shown sche

A targeting construct for deletion of CYP52A14 from the Candida tropicalis genome was prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating it into the CYP52A14 pre-targeting construct (SEQ ID NO: 14) from which the 20 bp stuffer had been removed by digestion with restriction enzymes NotI and XhoI. The sequence of the resulting targeting construct for deletion of CYP52A14 is given as SEQ ID NO: 15. This sequence is a specific example of the construct shown generically in FIG. 4: it has nearly 300 base pairs of the genomic sequence of CYP52A14 at each end to serve as a targeting sequence; between the targeting sequences are two frt sites that are recognized by the flp recombinase; between the two frt sites are sequences encoding the flp recombinase and a protein conferring resistance to the antibiotic nourseothricin. Not shown in SEQ ID NO: 15 but also present in the targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the targeting construct can be grown and propagated in E. coli. The targeting sequences shown in SEQ ID NO: 15 also include a BsmBI restriction site at each end of the construct, so that the final targeting construct can be linearized and optionally separated from the bacterial antibiotic resistance marker and origin of replication prior to transformation into Candida tropicalis.

Candida tropicalis strain DP170 was prepared by integration of the construct shown as SEQ ID NO: 15 into the genome of strain DP142 (Table 3) at the site of the genomic sequence of the gene for CYP52A14. Candida tropicalis strain DP174 was prepared by excision of the targeting construct from the genome of strain DP170, thereby deleting the gene encoding CYP52A14. Integration and deletion of targeting sequence SEQ ID NO: 15, and analysis of integrants and excisants were performed as described in Section 7.1.

Oligonucleotide primers for analysis of strains were:

(SEQ ID NO: 85)
14-IN-L2: GACGTAGCCGATGAATGTGGGGTGC
(SEQ ID NO: 86)
14-IN-R2: TGCCATTTATTTTTTATTACCCCTCTAAAT
(SEQ ID NO: 79)
SAT1-R: TGGTACTGGTTCTCGGGAGCACAGG
(SEQ ID NO: 80)
SAT1-F: CGCTAGACAAATTCTTCCAAAAATTTTAGA

For strain DP170 (integration of SEQ ID NO: 15), PCR with primers 14-IN-L2 and SAT1-R produces a 664 base pair amplicon; PCR with primers SAT1-F and 14-IN-R2 produces a 609 base pair amplicon. For a strain with a wild type version of CYP52A14, PCR with primers 14-IN-L2 and 14-IN-R2 produces a 2,234 base pair amplicon. For strain DP174 with a deleted version of CYP52A14, PCR with primers 14-IN-L2 and 14-IN-R2 produces an 870 base pair amplicon.

Deletion of a portion of the coding sequence of the gene for CYP52A14 will disrupt the function of the protein encoded by this gene in the Candida host cell.

7.3. Deletion of Fatty Alcohol Oxidase Genes from Candida

At least one enzyme capable of oxidizing ω-hydroxy fatty acids is present in Candida tropicalis in addition to the cytochrome P450 genes encoding CYP52A13, CYP52A14, CYP52A17 and CYP52A18. Oxidation of energy rich molecules reduces their energy content. For the production of incompletely oxidized compounds including ω-hydroxy fatty acids, it is advantageous to reduce or eliminate the further oxidation of incompletely oxidized compounds such as ω-hydroxy fatty acids. Under one aspect, this can be achieved by deleting the genes encoding the oxidizing enzymes from the Candida genome. Candidate genes for this activity include fatty alcohol oxidase and dehydrogenases as shown in FIG. 14. One class of enzymes known to oxidize incompletely oxidised compounds including hydroxy fatty acids are the fatty alcohol oxidases.

7.3.1. Deletion of FAO1

The sequence of a gene encoding a fatty alcohol oxidase in Candida tropicalis, FAO1 is given as SEQ ID NO: 16. This sequence was used to design a "pre-targeting" construct comprising two targeting sequences from the 5' and 3' end of the structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 base pair stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the FAO1 pre-targeting construct is given as SEQ ID NO: 17. The FAO1 pre-targeting construct also contains a polylinker sequence (SEQ ID NO: 10) between the 5' targeting sequence and the NotI site. This polylinker sequence was placed to allow the insertion of sequences into the targeting construct to allow it to function as an insertion targeting construct of the form shown schematically in FIG. 7. Not shown in SEQ ID NO: 17 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The sequence was synthesized using standard DNA synthesis techniques well known in the art.

DP186 was prepared by excision of the targeting construct from the genome of strain DP182, thereby deleting the gene encoding FAO1. Integration and deletion of targeting sequence SEQ ID NO: 18, and analysis of integrants and excisants were performed as described in Section 7.1.

Sequences of oligonucleotide primers for analysis of strains were:

(SEQ ID NO: 87)
FAO1-IN-L: ATTGGCGTCGTGGCATTGGCGGCTC
(SEQ ID NO: 88)
FAO1-IN-R: TGGGCGGAATCAAGTGGCTT
(SEQ ID NO: 79)
SAT1-R: TGGTACTGGTTCTCGGGAGCACAGG
(SEQ ID NO: 80)
SAT1-F: CGCTAGACAAATTCTTCCAAAAATTTTAGA

For strain DP182 (integration of SEQ ID NO: 18), PCR with primers FAO1-IN-L and SAT1-R produces a 624 base pair amplicon; PCR with primers SAT1-F and FAO1-IN-R produces a 478 base pair amplicon. For a strain with a wild type copy of FAO1, PCR with primers FAO1-IN-L and FAO1-IN-R produces a 2,709 base pair amplicon. For strain DP186 with a deleted copy of FAO1, PCR with primers FAO1-IN-L and FAO1-IN-R produces a 699 base pair amplicon.

Deletion of a portion of the coding sequence of the gene for FAO1A will disrupt the function of the protein encoded by this gene in the Candida host cell.

7.3.2. Deletion of FAO1B

No sequence had been reported for a second allele for FAO1 (FAO1B) at the time of this work. To identify the allele (FAO1B) we used PCR amplification primers and sequencing primers designed to anneal to the known sequenced allele of FAO1.
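The way the diagnostic amplicon sizes are used to tell a wild type locus from a deleted one can be sketched as a small lookup. The expected sizes are the FAO1 values given above for the flanking primer pair FAO1-IN-L and FAO1-IN-R (2,709 base pairs for wild type, 699 base pairs for the deleted copy). The function name and the 5% size tolerance are assumptions made for illustration; the text does not state how closely a band on a 1% agarose gel must match.

```python
# Expected amplicon sizes (bp) for primers FAO1-IN-L + FAO1-IN-R, from the text.
FAO1_FLANKING = {"wild type": 2709, "deleted": 699}


def call_bands(observed_bp, expected, tolerance=0.05):
    """Return the labels whose expected size matches an observed band.

    `tolerance` is a hypothetical fractional window standing in for the
    resolution of the gel; it is not a value from the specification.
    """
    calls = set()
    for band in observed_bp:
        for label, size in expected.items():
            if abs(band - size) <= tolerance * size:
                calls.add(label)
    return sorted(calls)


print(call_bands([2700], FAO1_FLANKING))   # band near 2.7 kb
print(call_bands([700], FAO1_FLANKING))    # band near 0.7 kb
print(call_bands([1500], FAO1_FLANKING))   # band matching neither expectation
```

The same table-driven approach would apply to the other loci by substituting their primer pairs and amplicon sizes.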
A targeting construct for deletion of FAO1 from the Candida tropicalis genome was prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating it into the FAO1 pre-targeting construct (SEQ ID NO: 17) from which the 20 base pair stuffer had been removed by digestion with restriction enzymes NotI and XhoI. The sequence of the resulting targeting construct for deletion of FAO1 is given as SEQ ID NO: 18. This sequence is a specific example of the construct shown generically in FIG. 4: it has nearly 300 base pairs of the genomic sequence of FAO1 at the 5' end and 220 base pairs of the genomic sequence of FAO1 at the 3' end to serve as a targeting sequence; between the targeting sequences are two frt sites that are recognized by the flp recombinase; between the two frt sites are sequences encoding the flp recombinase and a protein conferring resistance to the antibiotic nourseothricin. Not shown in SEQ ID NO: 18 but also present in the targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the targeting construct can be grown and propagated in E. coli. The targeting sequences shown in SEQ ID NO: 18 also include a BsmBI restriction site at each end of the construct, so that the final targeting construct can be linearized and optionally separated from the bacterial antibiotic resistance marker and origin of replication prior to transformation into Candida tropicalis.

Candida tropicalis strain DP182 was prepared by integration of the construct shown as SEQ ID NO: 18 into the genome of strain DP174 (Table 3) at the site of the genomic sequence of the gene for FAO1. Candida tropicalis strain

The primers used were:

(SEQ ID NO: 89)
FAO1 F1: CGTCGACACCCTTATGTTAT
(SEQ ID NO: 90)
FAO1 F2: CGTTGACTCCTATCAAGGACA
(SEQ ID NO: 91)
FAO1 R1: GGTCTTCTCTTCCTGGATAATG
(SEQ ID NO: 92)
FAO1 F3: CCAGCAGTTGTTTGTTCTTG
(SEQ ID NO: 93)
FAO1 R2: AATCCTGTGCTTTGTCGTAGGC
(SEQ ID NO: 94)
FAO1 F4: TCCTTAACAAGAAGGGCATCG
(SEQ ID NO: 95)
FAO1 R3: TTCTTGAATCCGGAGTTGAC
(SEQ ID NO: 96)
FAO1 F5: TCTTAGTCGTGATACCACCA
(SEQ ID NO: 97)
FAO1 R4: CTAAGGATTCTCTTGGCACC
(SEQ ID NO: 98)
FAO1 R5: GTGACCATAGGATTAGCACC

Genomic DNA was prepared from strains DP1 (which has FAO1) and DP186 (which is deleted for FAO1) as described in section 7.1.3. The FAO genes were amplified from genomic DNA by PCR using oligonucleotide primers FAO1 F1 and FAO1 R5. Genomic DNA from both strains yielded an amplicon of approximately 2 kilobases. Both amplicons were directly sequenced using the ten oligonucleotide primers listed above. The amplicon from DP1 gave sequence where there were occasionally two bases that appeared to be equally represented. The amplicon from DP186 had no such ambiguous bases but its sequence was slightly different (95% identical) from the reported sequence of FAO1. We concluded that the sequence corresponded to a second allele of FAO1, which we refer to as FAO1B. The sequence of FAO1B is given as SEQ ID NO: 19.

This sequence was used to design a "pre-targeting" construct comprising two targeting sequences from the 5' and 3' end of the structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 bp stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the FAO1B pre-targeting construct is given as SEQ ID NO: 20.

A targeting construct for deletion of FAO1 from the Candida tropicalis genome was prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating it into the FAO1B pre-targeting construct (SEQ ID NO: 20) that had also been digested with restriction enzymes NotI and XhoI. The FAO1B pre-targeting construct (SEQ ID NO: 20) was not cloned or propagated in a bacterial host, so digestion with restriction enzymes NotI and XhoI produced two fragments which were then ligated with the digested SAT-1 flipper to produce a targeting construct for deletion of FAO1B, given as SEQ ID NO: 21. This sequence is a specific example of the construct shown generically in FIG. 4: it has nearly 300 base pairs of the genomic sequence of FAO1B at the 5' end and 220 base pairs of the genomic sequence of FAO1B at the 3' end to serve as a targeting sequence; between the targeting sequences are two frt sites that are recognized by the flp recombinase; between the two frt sites are sequences encoding the flp recombinase and a protein conferring resistance to the antibiotic nourseothricin.

Candida tropicalis strain DP238 was prepared by integration of the construct shown as SEQ ID NO: 21 into the genome of strain DP186 (Table 3) at the site of the genomic sequence of the gene for FAO1B. Candida tropicalis strain DP240 was prepared by excision of the targeting construct from the genome of strain DP238, thereby deleting the gene encoding FAO1B. Integration and deletion of targeting sequence SEQ ID NO: 21, and analysis of integrants and excisants were performed as described in Section 7.1. Sequences of oligonucleotide primers for analysis of strains were: FAO1 F1 (SEQ ID NO: 89), FAO1 R5 (SEQ ID NO: 98), SAT1-R (SEQ ID NO: 79), SAT1-F (SEQ ID NO: 80).

For strain DP182 (integration of SEQ ID NO: 18), PCR with primers FAO1 F1 and SAT1-R produces a 558 base pair amplicon; PCR with primers SAT1-F and FAO1 R5 produces a 557 base pair amplicon. For a strain with a wild type copy of FAO1B, PCR with primers FAO1 F1 and FAO1 R5 produces a 2,007 base pair amplicon. For strain DP186, with a deleted copy of FAO1B, PCR with primers FAO1 F1 and FAO1 R5 produces a 711 base pair amplicon.

Deletion of a portion of the coding sequence of the gene for FAO1B will disrupt the function of the protein encoded by this gene in the Candida host cell.

7.3.3. Deletion of FAO2A

The sequence of a gene encoding a fatty alcohol oxidase in Candida tropicalis, FAO2A is given as SEQ ID NO: 22. This sequence was used to design a "pre-targeting" construct comprising two targeting sequences from the 5' and 3' end of the structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 bp stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the FAO2A pre-targeting construct is given as SEQ ID NO: 23. Not shown in SEQ ID NO: 23 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The sequence was synthesized using standard DNA synthesis techniques well known in the art.

A targeting construct for deletion of FAO2A from the Candida tropicalis genome was prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating it into the FAO2A pre-targeting construct (SEQ ID NO: 23) from which the 20 bp stuffer had been removed by digestion with restriction enzymes NotI and XhoI. The sequence of the resulting targeting construct for deletion of FAO2A is given as SEQ ID NO: 24. This sequence is a specific example of the construct shown generically in FIG. 4: it has nearly 300 base pairs of the genomic sequence of FAO2A at the 5' and 3' ends of the structural gene to serve as a targeting sequence; between the targeting sequences are two frt sites that are recognized by the flp recombinase; between the two frt sites are sequences encoding the flp recombinase and a protein conferring resistance to the antibiotic nourseothricin. Not shown in SEQ ID NO: 24 but also present in the targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the targeting construct can be grown and propagated in E. coli. The targeting sequences shown in SEQ ID NO: 24 also include a BsmBI restriction site at each end of the construct, so that the final targeting construct can be linearized and optionally separated from the bacterial antibiotic resistance marker and origin of replication prior to transformation into Candida tropicalis.

Candida tropicalis strain DP255 was prepared by integration of the construct shown as SEQ ID NO: 24 into the genome of strain DP240 (Table 3) at the site of the genomic sequence of the gene for FAO2A. Candida tropicalis strain DP256 was prepared by excision of the targeting construct from the genome of strain DP255, thereby deleting most of the coding portion of the gene encoding FAO2A. Integration and deletion of targeting sequence SEQ ID NO: 24, and analysis of integrants and excisants were performed as described in Section 7.1. Sequences of oligonucleotide primers for analysis of strains were:

(SEQ ID NO: 99)
FAO2A-IN-L: CTTTTCTGATTCTTGATTTTCCCTTTTCAT
(SEQ ID NO: 100)
FAO2A-IN-R: ATACATCTAGTATATAAGTGTCGTATTTCC
(SEQ ID NO: 79)
SAT1-R: TGGTACTGGTTCTCGGGAGCACAGG
(SEQ ID NO: 80)
SAT1-F: CGCTAGACAAATTCTTCCAAAAATTTTAGA

For strain DP255 (integration of SEQ ID NO: 24), PCR with primers FAO2A-IN-L and SAT1-R produces a 581 base pair amplicon; PCR with primers SAT1-F and FAO2A-IN-R produces a 569 base pair amplicon. For a strain with a wild type copy of FAO2A, PCR with primers FAO2A-IN-L and FAO2A-IN-R produces a 2,199 base pair amplicon. For strain DP186 with a deleted copy of FAO2A, PCR with primers FAO2A-IN-L and FAO2A-IN-R produces a 747 base pair amplicon.

Deletion of a portion of the coding sequence of the gene for FAO2A will disrupt the function of the protein encoded by this gene in the Candida host cell.

7.3.4. Deletion of FAO2B

The sequence of a gene encoding a fatty alcohol oxidase in Candida tropicalis, FAO2B is given as SEQ ID NO: 25. This sequence was used to design a "pre-targeting" construct comprising two targeting sequences from the 5' and 3' end of the structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 base pair stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the FAO2B pre-targeting construct is given as SEQ ID NO: 26.

(SEQ ID NO: 101)
FAO2B-IN-L: TGCTTTTCTGATTCTTGATCATCCCCTTAG
(SEQ ID NO: 102)
FAO2B-IN-R: ATACATCTAGTATATAAGTGTCGTATTTCT
(SEQ ID NO: 79)
SAT1-R: TGGTACTGGTTCTCGGGAGCACAGG
(SEQ ID NO: 80)
SAT1-F: CGCTAGACAAATTCTTCCAAAAATTTTAGA

For strain DP259 (integration of SEQ ID NO: 27), PCR with primers FAO2B-IN-L and SAT1-R produces a 551 base pair amplicon; PCR with primers SAT1-F and FAO2B-IN-R produces a 571 base pair amplicon. For a strain with a wild type copy of FAO2B, PCR with primers FAO2B-IN-L and FAO2B-IN-R produces a 2,198 base pair amplicon. For strain DP186 with a deleted copy of FAO2B, PCR with primers FAO2B-IN-L and FAO2B-IN-R produces a 719 base pair amplicon.
Not shown in SEQ ID NO: 26 but also Deletion of a portion of the coding sequence of the gene for present in the pre-targeting construct were a selective marker FAO2B will disrupt the function of the protein encoded by conferring resistance to kanamycin and a bacterial origin of this gene in the Candida host cell. replication, so that the pre-targeting construct can be grown and propagated in Ecoli. The sequence was synthesized using 25 7.4. Deletion of More Cytochrome P450 Genes from standard DNA synthesis techniques well known in the art. Candida A targeting construct for deletion of FAO2B from the Can dida tropicalis genome was prepared by digesting the SAT-1 At least one enzyme capable of oxidizing ()-hydroxy fatty acids is present in Candida tropicalis in addition to the cyto flipper (SEQ ID NO: 1) with restriction enzymes NotI and 30 chrome P450 genes encoding CYP52A13, CYP52A14, XhoI, and ligating it into the FAO2B pre-targeting construct CYP52A17 and CYP52A18 and fatty alcohol oxidase genes (SEQIDNO: 26) from which the 20 base pair stuffer had been FAO1, FAO1B, FAO2A and FAO2B. Oxidation of energy removed by digestion with restriction enzymes NotI and rich molecules reduces their energy content. For the produc XhoI. The sequence of the resulting targeting construct for tion of incompletely oxidized compounds-including ()-hy deletion of FAO2B is given as SEQID NO: 27. This sequence 35 droxy fatty acids, it is advantageous to reduce or eliminate the further oxidation of incompletely oxidized compounds ()-hy is a specific example of the construct shown generically in droxy fatty acids. Under one aspect, this can be achieved by FIG. 4: it has nearly 300 base pairs of the genomic sequence deleting the genes encoding the oxidizing enzymes from the of FAO2B at the 5' and 3' ends of the structural gene to serve Candida genome. 
One class of enzymes known to oxidize as a targeting sequence; between the targeting sequences are 40 incompletely oxidised compounds are the cytochrome P450s. two frt sites that are recognized by the flp recombinase; The CYP52A type P450s are responsible for co-hydroxy between the two frt sites are sequences encoding the flp lation of fatty acids in several Candida species (Craft et al., recombinase and a protein conferring resistance to the anti 2003, Appl Environ Microbiol: 69,5983-91; Eschenfeldt et al., 2003, Appl Environ Microbiol: 69,5992-9; Ohkuma et al., biotic nourseothricin. Not shown in SEQID NO: 27 but also 45 1991, DNA Cell Biol: 10, 271-82; Zimmer et al., 1995, DNA present in the targeting construct were a selective marker Cell Biol: 14,619-28; Zimmer et al., 1996, Biochem Biophys conferring resistance to kanamycin and a bacterial origin of Res Commun: 224,784-9.) They have also been implicated in replication, so that the targeting construct can be grown and the further oxidation of these ()-hydroxy fatty acids to C.co propagated in Ecoli. The targeting sequences shown in SEQ diacids. See Eschenfeldt, et al., 2003, Appli. Environ. Micro ID NO: 27 also includes a BSmBI restriction site at each end 50 biol. 69: 5992-5999, which is hereby incorporated by refer ence herein. Another CYP52A type P450 whose expression is of the construct, so that the final targeting construct can be induced by fatty acids is CYP52A12. linearized and optionally separated from the bacterial antibi 74.1. Deletion of CYP52A12 otic resistance marker and origin of replication prior to trans The sequence of a gene encoding a cytochrome P450 in formation into Candida tropicalis. 55 Candida tropicalis, CYP52A12 is given as SEQID NO: 28. 
Candida tropicalis strain DP259 was prepared by integra This sequence was used to design a "pre-targeting construct tion of the construct shown as SEQ ID NO: 27 into the comprising two targeting sequences from the 5' and 3' end of genome of strain DP256 (Table 3) at the site of the genomic the structural gene. The targeting sequences were separated sequence of the gene for FAO2BA. Candida tropicalis strain by a sequence, given as SEQID NO: 12, comprising a NotI DP261 was prepared by excision of the targeting construct 60 restriction site, a 20 base pair stuffer fragment and a XhoI from the genome of strain DP259, thereby deleting most of restriction site. The targeting sequences were flanked by two the coding region of the gene encoding FAO2B. Integration BSmBI restriction sites, so that the final targeting construct and deletion of targeting sequence SEQ ID NO: 27, and can be linearized prior to transformation into Candida tropi analysis of integrants and excisants were performed as calis. The sequence of the CYP52A12 pre-targeting construct described in Section 7.1. 65 is given as SEQID NO:29. Not shown in SEQID NO: 29 but Sequences of oligonucleotide primers for analysis of also present in the pre-targeting construct were a selective strains were: marker conferring resistance to kanamycin and a bacterial US 9,359,581 B2 79 80 origin of replication, so that the pre-targeting construct can be 7.4.2. Deletion of CYP52A12B grown and propagated in Ecoli. The sequence was synthe No sequence had been reported for a second allele for sized using standard DNA synthesis techniques well known CYP52A12 at the time of this work. We reasoned that in a in the art. diploid organisms a second allele existed (CYP52A17 and A targeting construct for deletion of CYP52A12 from the 5 CYP52A18 are an allelic pair and CYP52A13 and Candida tropicalis genome was prepared by digesting the CYP52A14 are an allelic pair). 
To delete the second allele we SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI synthesized a deletion construct based on the CYP52A12 and XhoI, and ligating it into the CYP52A12 pre-targeting sequence (SEQID NO: 28), but designed it so that the target ing sequences were homologous to regions of the CYP52A12 construct (SEQ ID NO: 29) from which the 20 base pair 10 gene that are missing because they have been deleted in Strain stuffer had been removed by digestion with restriction DP272. First we constructed a “pre-targeting construct com enzymes NotI and XhoI. The sequence of the resulting tar prising two targeting sequences from near the 5' and 3' ends of geting construct for deletion of CYP52A12 is given as SEQ the structural gene, but internal to the two sequences used in ID NO: 30. This sequence is a specific example of the con the design of the targeting construct for the deletion of struct shown generically in FIG. 4: it has nearly 300 base pairs 15 CYP52A12. The targeting sequences were separated by a of the genomic sequence of CYP52A12 at each end to serve sequence, given as SEQID NO: 12, comprising a Not restric as a targeting sequence; between the targeting sequences are tion site, a 20 base pair stuffer fragment and a XhoI restriction two frt sites that are recognized by the flp recombinase; site. The targeting sequences were flanked by two BsmBI between the two frt sites are sequences encoding the flp restriction sites, so that the final targeting construct can be recombinase and a protein conferring resistance to the anti linearized prior to transformation into Candida tropicalis. The sequence of the CYP52A12B pre-targeting construct is biotic nourseothricin. Not shown in SEQID NO:30 but also given as SEQID NO:31. 
Not shown in SEQID NO:31 but present in the targeting construct were a selective marker also present in the pre-targeting construct were a selective conferring resistance to kanamycin and a bacterial origin of 25 marker conferring resistance to kanamycin and a bacterial replication, so that the targeting construct can be grown and origin of replication, so that the pre-targeting construct can be propagated in Ecoli. The targeting sequences shown in SEQ grown and propagated in Ecoli. The sequence was synthe ID NO:30 also include a BSmBI restriction site at each end of sized using standard DNA synthesis techniques well known the construct, so that the final targeting construct can be in the art. linearized and optionally separated from the bacterial antibi 30 A targeting construct for deletion of CYP52A12B from the otic resistance marker and origin of replication prior to trans Candida tropicalis genome was prepared by digesting the formation into Candida tropicalis. SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI Candida tropicalis strain DP268 was prepared by integra and XhoI, and ligating it into the CYP52A12B pre-targeting tion of the construct shown as SEQ ID NO: 30 into the 35 construct (SEQ ID NO: 31) from which the 20 base pair genome of strain DP261 (Table 3) at the site of the genomic stuffer had been removed by digestion with restriction sequence of the gene for CYP52A12. Candida tropicalis enzymes NotI and XhoI. The sequence of the resulting tar strain DP272 was prepared by excision of the targeting con geting construct for deletion of CYP52A12B is given as SEQ struct from the genome of strain DP268, thereby deleting the ID NO:32. This sequence is a specific example of the con gene encoding CYP52A12. Integration and deletion of tar 40 struct shown generically in FIG. 
4: it has nearly 300 base pairs geting sequence SEQID NO:30, and analysis of integrants of the genomic sequence of CYP52A12 at each end to serve and excisants were performed as described in Section 7.1. as a targeting sequence; between the targeting sequences are Sequences of oligonucleotide primers for analysis of two frt sites that are recognized by the flp recombinase; strains were: between the two frt sites are sequences encoding the flp 45 recombinase and a protein conferring resistance to the anti biotic nourseothricin. Not shown in SEQID NO: 32 but also (SEQ ID NO: 103) present in the targeting construct were a selective marker 12-IN-L: CGCCAGTCTTTCCTGATTGGGCAAG conferring resistance to kanamycin and a bacterial origin of replication, so that the targeting construct can be grown and (SEQ ID NO : 104) so propagated in Ecoli. The targeting sequences shown in SEQ 12-IN-R2: GGACGTTGTCGAGTAGAGGGATGTG ID NO:32 also include a BSmBI restriction site at each end of (SEO ID NO: 79) the construct, so that the final targeting construct can be SAT1-R: linearized and optionally separated from the bacterial antibi (SEQ ID NO: 80) otic resistance marker and origin of replication prior to trans SAT1 - F : 55 formation into Candida tropicalis. Candida tropicalis strain DP282 was prepared by integra For strain DP268 (integration of SEQ ID NO: 30), PCR tion of the construct shown as SEQ ID NO: 32 into the with primers 12-IN-L and SAT1-R produces a 596 base pair genome of strain DP272 (Table 3) at the site of the genomic amplicon: PCR with primers SAT1-F and 12-IN-R2 produces sequence of the gene for CYP52A12B. Candida tropicalis a 650 base pair amplicon. For a strain with a wild type copy of 60 strain DP284 was prepared by excision of the targeting con CYP52A12, PCR with primers 12-IN-L and 12-IN-R2 pro struct from the genome of strain DP282, thereby deleting a duces a 2.348 base pair amplicon. 
For strain DP272 with a portion of the coding region of the gene encoding deleted copy of CYP52A12, PCR with primers 12-IN-L and CYP52A12B. Integration and deletion of targeting sequence 12-IN-R2 produces a 843 base pair amplicon. SEQID NO:32, and analysis of integrants and excisants were Deletion of a portion of the coding sequence of the gene for 65 performed as described in Section 7.1. CYP52A12 will disrupt the function of the protein encoded Sequences of oligonucleotide primers for analysis of by this gene in the Candida host cell. strains were: US 9,359,581 B2

12-F1 (SEQ ID NO: 105): CTGTACTTCCGTACTTGACC
12-R1 (SEQ ID NO: 106): GAGACCTGGATCAGATGAG
SAT1-R (SEQ ID NO: 79)
SAT1-F (SEQ ID NO: 80)

Oligonucleotides 12-F1 and 12-R1 are designed to anneal to a part of the genome that is missing in strains with deletions in CYP52A12. In such strains they will thus only be able to anneal to and amplify from the second allele CYP52A12B. For strain DP282 (integration of SEQ ID NO: 32), PCR with primers 12-F1 and SAT1-R produces a 978 base pair amplicon; PCR with primers SAT1-F and 12-R1 produces a 947 base pair amplicon. PCR from a strain with a wild type copy of CYP52A12B with primers 12-F1 and 12-R1 produces a 1,478 base pair amplicon. For strain DP272 with a deleted copy of CYP52A12B, PCR with primers 12-F1 and 12-R1 produces a 505 base pair amplicon.

Deletion of a portion of the coding sequence of the gene for CYP52A12B will disrupt the function of the protein encoded by this gene in the Candida host cell.

7.5. Deletion of Alcohol Dehydrogenase Genes from Candida

At least one enzyme capable of oxidizing ω-hydroxy fatty acids is present in Candida tropicalis in addition to the cytochrome P450 genes encoding CYP52A13, CYP52A14, CYP52A17, CYP52A18, CYP52A12, CYP52A12B and the fatty alcohol oxidase genes FAO1, FAO1B, FAO2A and FAO2B. Oxidation of energy rich molecules reduces their energy content. For the production of incompletely oxidized compounds including ω-hydroxy fatty acids, it is advantageous to reduce or eliminate the further oxidation of incompletely oxidized compounds, including for example ω-hydroxy fatty acids. Under one aspect, this can be achieved by deleting the genes encoding the oxidizing enzymes from the Candida genome. One class of enzymes known to oxidize alcohols is alcohol dehydrogenases.

7.5.1. Identification of Candida tropicalis Alcohol Dehydrogenases

The sequences of four alcohol dehydrogenase genes were obtained from the Candida Genome Database in the Department of Genetics at the School of Medicine, Stanford University, Palo Alto, Calif. The sequences of these genes are given as SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35 and SEQ ID NO: 36. These sequences were aligned and two degenerate oligonucleotide primers were designed, whose sequences are given as SEQ ID NO: 37 and SEQ ID NO: 38. These two primers were used to PCR amplify from genomic DNA from Candida tropicalis strain DP1. The resulting amplicon of ~1,000 base pairs was cloned and 96 independent transformants were picked, plasmid prepared and sequenced using two primers with annealing sites located in the vector reading into the cloning site and two primers designed to anneal to highly conserved sequences within the Candida albicans alcohol dehydrogenase sequences:

ADH-F (SEQ ID NO: 107): GTTTACAAAGCCTTAAAGACT
ADH-R (SEQ ID NO: 108): TTGAACGGCCAAAGAACCTAA

Five different sequences were obtained by sequencing the 96 independent clones, called Ct ADH-A4, Ct ADH-A10, Ct ADH-B2, Ct ADH-B4 and Ct ADH-B11. These sequences are provided as SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 41, SEQ ID NO: 42 and SEQ ID NO: 43 respectively. In silico translation of Ct ADH-B2 (SEQ ID NO: 41) yielded an amino acid sequence with multiple in-frame stop codons, so it is almost certainly a pseudogene and does not encode a functional protein. The other four sequences all encode protein sequences without stop codons.

Amino acid sequences of the partial genes are predicted and provided: SEQ ID NO: 155 (ADH-A4), SEQ ID NO: 154 (ADH-B4), SEQ ID NO: 152 (ADH-A10), SEQ ID NO: 153 (ADH-A10B) and SEQ ID NO: 151 (ADH-B11).

In some embodiments an alcohol dehydrogenase gene is identified in the genome of a yeast of the genus Candida by comparison with the nucleotide sequence of an alcohol dehydrogenase from Candida tropicalis and is identified as an alcohol dehydrogenase if (i) it comprises an open reading frame encoding a polypeptide at least 275 amino acids long or at least 300 amino acids long and (ii) the gene is at least 65% identical, at least 70% identical, at least 75% identical, at least 80% identical, at least 85% identical, at least 90% identical, at least 95% identical, or at least 98% identical for a stretch of at least 80, at least 85, at least 90, at least 95, at least 100, at least 105, at least 110, at least 115, or at least 120 contiguous nucleotides of the coding sequence of a Candida tropicalis gene selected from the group consisting of ADH-A4 (SEQ ID NO: 39), ADH-B4 (SEQ ID NO: 42), ADH-A10 (SEQ ID NO: 40), ADH-A10B (SEQ ID NO: 56), and ADH-B11 (SEQ ID NO: 43).

The sequence relationships of these protein sequences are shown in a phylogenetic tree in FIG. 17. Ct ADH-A4 (encoded by SEQ ID NO: 39) is most homologous to Candida albicans ADH1A and Ct ADH-B4 (encoded by SEQ ID NO: 42) is most homologous to Candida albicans ADH2A.

An alignment, using ClustalW, of the amino acid sequences of alcohol dehydrogenase proteins predicted from the sequences of genes from Candida albicans and Candida tropicalis is shown in FIGS. 3A and 3B. The genes from Candida tropicalis were isolated as partial genes by PCR with degenerate primers, so the nucleic acid sequences obtained for the genes represent only a partial sequence of the gene, and the predicted amino acid sequences of the encoded proteins represent only a partial sequence of the protein. A consensus is indicated underneath the aligned amino acid sequences of FIGS. 3A and 3B, with a * indicating that all 4 Candida albicans alcohol dehydrogenase sequences and all 4 Candida tropicalis alcohol dehydrogenase sequences are completely identical at those residues. BLAST searching of protein sequences in Genbank with highly conserved peptide regions within the alcohol dehydrogenases yields results that uniquely identify yeast alcohol dehydrogenases.

In some embodiments an alcohol dehydrogenase gene is identified in the genome of a yeast of the genus Candida by comparison of the amino acid sequence of its predicted translation product with the predicted polypeptide sequence of an alcohol dehydrogenase from Candida tropicalis and is identified as an alcohol dehydrogenase if it comprises a first peptide sequence VKYSGVCH (SEQ ID NO: 156) or VKYSGVCHxxxxxWKGDW (SEQ ID NO: 162) or VKYSGVCHxxxxxWKGDWxxxxKLPxVGGHEGAGVVV (SEQ ID NO: 163) or VGGHEGAGVVV (SEQ ID NO: 157).

In some embodiments an alcohol dehydrogenase gene is identified in the genome of a yeast of the genus Candida by comparison of the amino acid sequence of its predicted translation product with the predicted polypeptide sequence of an alcohol dehydrogenase from Candida tropicalis and is identified as an alcohol dehydrogenase if it comprises a second peptide sequence QYATADAVQAA (SEQ ID NO: 158) or SGYxHDGxFxQYATADAVQAA (SEQ ID NO: 164) or GAEPNCxxADxSGYxHDGxFxQYATADAVQAA (SEQ ID NO: 165).

In some embodiments an alcohol dehydrogenase gene is identified in the genome of a yeast of the genus Candida by comparison of the amino acid sequence of its predicted translation product with the predicted polypeptide sequence of an alcohol dehydrogenase from Candida tropicalis and is identified as an alcohol dehydrogenase if it comprises a third peptide sequence CAGVTVYKALK (SEQ ID NO: 159) or APIxCAGVTVYKALK (SEQ ID NO: 166).

In some embodiments an alcohol dehydrogenase gene is identified in the genome of a yeast of the genus Candida by comparison of the amino acid sequence of its predicted translation product with the predicted polypeptide sequence of an alcohol dehydrogenase from Candida tropicalis and is identified as an alcohol dehydrogenase if it comprises a fourth peptide sequence GQWVAISGA (SEQ ID NO: 160) or GQWVAISGAxGGLGSL (SEQ ID NO: 167) or GQWVAISGAxGGLGSLxVQYA (SEQ ID NO: 168) or GQWVAISGAxGGLGSLxVQYAxAMG (SEQ ID NO: 169) or GQWVAISGAxGGLGSLxVQYAxAMGxRVxAIDGG (SEQ ID NO: 170).

The four coding sequences were sufficiently dissimilar to reach the conclusion that they were not allelic pairs, but rather represented four different genes, each of which probably had its own allelic partner in the genome. Each of the coding sequences was thus used to design two targeting constructs, similarly to the strategy described for CYP52A12B in Section 7.4.2. The construct for the first allele of each ADH gene used ~200 base pairs at the 5' end and ~200 base pairs at the 3' end as targeting sequences (5'-ADH Out and 3'-ADH Out in FIG. 18). The construct for the second allele used two sections of ~200 base pairs between the first two targeting sequences (5'-ADH In and 3'-ADH In in FIG. 18). These sequences will be eliminated by the first targeting construct from the first allele of the gene and will thus serve as a targeting sequence for the second allele of the gene. As described below, this strategy succeeded with two ADH allelic pairs: those for ADH-A4 and ADH-B4. However, at the first attempt it was not successful for deletion of the second allele of ADH-A10 or ADH-B11, so the second alleles of these genes were isolated and sequenced, and those sequences were used to delete the second alleles of ADH-A10 or ADH-B11.

Deletion of a portion of the sequence of an alcohol dehydrogenase gene will disrupt the function of that alcohol dehydrogenase enzyme in the Candida host cell.

In some embodiments, disruption of an alcohol dehydrogenase in a first host cell organism is measured by incubating the first host cell organism in a mixture comprising a substrate possessing a hydroxyl group and measuring the rate of conversion of the substrate to a more oxidized product such as an aldehyde or a carboxyl group. The rate of conversion of the substrate by the first host cell organism is compared with the rate of conversion produced by a second host cell organism that does not contain the disrupted gene but contains a wild type counterpart of the gene, when the first host cell organism and the second host cell organism are under the same environmental conditions (e.g., same temperature, same media, etc.). The rate of formation of the product can be measured using colorimetric assays, or chromatographic assays, or mass spectroscopy assays. In some embodiments the alcohol dehydrogenase is disrupted if the rate of conversion is at least 5% lower, at least 10% lower, at least 15% lower, at least 20% lower, or at least 25% lower in the first host cell organism than the second host cell organism.

In some embodiments, disruption of an alcohol dehydrogenase in a first host cell organism is measured by incubating said first host cell organism in a mixture comprising a substrate possessing a hydroxyl group and measuring the rate of conversion of the substrate to a more oxidized product such as an aldehyde or a carboxyl group. The amount of the substrate converted to product by the first host cell organism in a specified time is compared with the amount of substrate converted to product by a second host cell organism that does not contain the disrupted gene but contains a wild type counterpart of the gene, when the first host cell organism and the second host cell organism are under the same environmental conditions (e.g., same temperature, same media, etc.). The amount of product can be measured using colorimetric assays, or chromatographic assays, or mass spectroscopy assays. In some embodiments the alcohol dehydrogenase is disrupted if the amount of product is at least 5% lower, at least 10% lower, at least 15% lower, at least 20% lower, at least 25% lower, or at least 30% lower in the first host cell organism than the second host cell organism.

7.5.2. Deletion of ADH-A4

Sequence SEQ ID NO: 39 was used to design a "pre-targeting" construct comprising two targeting sequences from the 5' and 3' end of the ADH-A4 structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 base pair stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the ADH-A4 pre-targeting construct is given as SEQ ID NO: 44. Not shown in SEQ ID NO: 44 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The sequence was synthesized using standard DNA synthesis techniques well known in the art.

A targeting construct for deletion of ADH-A4 from the Candida tropicalis genome was prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating it into the ADH-A4 pre-targeting construct (SEQ ID NO: 44) from which the 20 bp stuffer had been removed by digestion with restriction enzymes NotI and XhoI. The sequence of the resulting targeting construct for deletion of ADH-A4 is given as SEQ ID NO: 45. This sequence is a specific example of the construct shown generically in FIG. 4: it has nearly 200 base pairs of the genomic sequence of ADH-A4 at each end to serve as a targeting sequence; between the targeting sequences are two frt sites that are recognized by the flp recombinase; between the two frt sites are sequences encoding the flp recombinase and a protein conferring resistance to the antibiotic nourseothricin. Not shown in SEQ ID NO: 44 but also present in the targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the targeting construct can be grown and propagated in E. coli. The targeting sequences shown in SEQ ID NO: 44 also include a BsmBI restriction site at each end of the construct, so that the final targeting construct can be linearized and optionally separated from the bacterial antibiotic resistance marker and origin of replication prior to transformation into Candida tropicalis.

Candida tropicalis strain DP387 was prepared by integration of the construct shown as SEQ ID NO: 45 into the genome of strain DP283 (Table 3) at the site of the genomic sequence of the gene for ADH-A4. Candida tropicalis strain DP388 was prepared by excision of the targeting construct from the genome of strain DP387, thereby deleting the gene encoding ADH-A4. Integration and deletion of targeting sequence SEQ ID NO: 45, and analysis of integrants and excisants were performed as described in Section 7.1.

Sequences of oligonucleotide primers for analysis of strains were:

A4-OUT-F (SEQ ID NO: 109): GAATTAGAATACAAAGATATCCCAGTG
A4-OUT-R (SEQ ID NO: 110): CATCAACTTGAAGACCTGTGGCAAT
SAT1-R (SEQ ID NO: 79)
SAT1-F (SEQ ID NO: 80)

For strain DP387 (integration of SEQ ID NO: 45), PCR with primers A4-OUT-F and SAT1-R produces a 464 base pair amplicon; PCR with primers SAT1-F and A4-OUT-R produces a 464 base pair amplicon. PCR from a strain with a wild type copy of ADH-A4 with primers A4-OUT-F and A4-OUT-R produces a 948 base pair amplicon. For strain DP388 with a deleted copy of ADH-A4, PCR with primers A4-OUT-F and A4-OUT-R produces a 525 base pair amplicon.

7.5.3. Deletion of ADH-A4B

No sequence was identified for a second allele for ADH-A4 in the initial set of 96 sequences but we reasoned that in a diploid organism, a second allele existed. To delete the second allele (ADH-A4B) we synthesized a deletion construct based on the ADH-A4 sequence (SEQ ID NO: 39), but designed it so that the targeting sequences were homologous to regions of the ADH-A4 gene that are missing because they have been deleted in strain DP388. First we constructed a "pre-targeting" construct comprising two targeting sequences internal to the two sequences used in the design of the targeting construct for the deletion of ADH-A4. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 base pair stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the ADH-A4B pre-targeting construct is given as SEQ ID NO: 46. Not shown in SEQ ID NO: 46 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The sequence was synthesized using standard DNA synthesis techniques well known in the art.

A targeting construct for deletion of ADH-A4B from the Candida tropicalis genome was prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating it into the ADH-A4B pre-targeting construct (SEQ ID NO: 46) from which the 20 base pair stuffer had been removed by digestion with restriction enzymes NotI and XhoI. The sequence of the resulting targeting construct for deletion of ADH-A4B is given as SEQ ID NO: 47. This sequence is a specific example of the construct shown generically in FIG. 4: it has nearly 200 base pairs of the genomic sequence of ADH-A4B at each end to serve as a targeting sequence; between the targeting sequences are two frt sites that are recognized by the flp recombinase; between the two frt sites are sequences encoding the flp recombinase and a protein conferring resistance to the antibiotic nourseothricin. Not shown in SEQ ID NO: 47 but also present in the targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the targeting construct can be grown and propagated in E. coli. The targeting sequences shown in SEQ ID NO: 47 also include a BsmBI restriction site at each end of the construct, so that the final targeting construct can be linearized and optionally separated from the bacterial antibiotic resistance marker and origin of replication prior to transformation into Candida tropicalis.

Candida tropicalis strain DP389 was prepared by integration of the construct shown as SEQ ID NO: 47 into the genome of strain DP388 (Table 3) at the site of the genomic sequence of the gene for ADH-A4B. Candida tropicalis strain DP390 was prepared by excision of the targeting construct from the genome of strain DP389, thereby deleting a portion of the coding region of the gene encoding ADH-A4B. Integration and deletion of targeting sequence SEQ ID NO: 47, and analysis of integrants and excisants were performed as described in Section 7.1.

Sequences of oligonucleotide primers for analysis of strains were:

A4-IN-F (SEQ ID NO: 111): GAACGGTTCCTGTATGTCCTGTGAGTT
A4-IN-R (SEQ ID NO: 112): CGGATTGGTCAATGGCTTTTTCGGAA
SAT1-R (SEQ ID NO: 79)
SAT1-F (SEQ ID NO: 80)

Oligonucleotides A4-IN-F and A4-IN-R are designed to anneal to a part of the genome that is missing in strains with deletions in ADH-A4. In such strains they will thus only be able to anneal to and amplify from the second allele ADH-A4B. For strain DP389 (integration of SEQ ID NO: 47), PCR with primers A4-IN-F and SAT1-R produces a 462 base pair amplicon; PCR with primers SAT1-F and A4-IN-R produces a 462 base pair amplicon. PCR from a strain with a wild-type copy of ADH-A4B with primers A4-IN-F and A4-IN-R produces a 488 base pair amplicon. For strain DP390 with a deleted copy of ADH-A4B, PCR with primers A4-IN-F and A4-IN-R produces a 521 base pair amplicon. The amplicons with primers A4-IN-F and A4-IN-R could not distinguish between a strain carrying a wild-type or a deleted copy of ADH-A4B, but digestion of the amplicon with either NotI or XhoI will cleave the amplicon derived from the deleted copy of the gene but not from the wild type, thereby distinguishing between them.

7.5.4. Deletion of ADH-B4

Sequence SEQ ID NO: 42 was used to design a "pre-targeting" construct comprising two targeting sequences from the 5' and 3' end of the ADH-B4 structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 bp stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the ADH-B4 pre-targeting construct is given as SEQ ID NO: 48. Not shown in SEQ ID NO: 48 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The sequence was synthesized using standard DNA synthesis techniques well known in the art.

A targeting construct for deletion of ADH-B4 from the Candida tropicalis genome was prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating it into the ADH-B4 pre-targeting construct (SEQ ID NO: 48) from which the 20 bp stuffer had been removed by digestion with restriction enzymes NotI and XhoI. The sequence of the resulting targeting construct for deletion of ADH-B4 is given as SEQ ID NO: 49. This sequence is a specific example of the construct shown generically in FIG. 4: it has nearly 200 bp of the genomic sequence

DP398 with a deleted copy of ADH-B4, PCR with primers B4-OUT-F and B4-OUT-R produces a 525 base pair amplicon.

7.5.5. Deletion of ADH-B4B

No sequence was identified for a second allele for ADH-B4 in the initial set of 96 sequences but we reasoned that in a diploid organism a second allele existed. To delete the second allele (ADH-B4B) we synthesized a deletion construct based on the ADH-B4 sequence (SEQ ID NO: 42), but designed it so that the targeting sequences were homologous to regions of the ADH-B4 gene that are missing because they have been deleted in strain DP398. First we constructed a "pre-targeting" construct comprising two targeting sequences internal to the two sequences used in the design of the targeting construct for the deletion of ADH-B4. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 base pair stuffer fragment and an XhoI restriction site. The targeting sequences were flanked by two BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis.
The sequence of the ADH-B4B pre-targeting of ADH-B4 at each end to serve as a targeting sequence; construct is given as SEQID NO: 50. Not shown in SEQID between the targeting sequences are two frt sites that are NO: 50 but also present in the pre-targeting construct were a recognized by the flp recombinase; between the two frt sites 25 selective marker conferring resistance to kanamycin and a are sequences encoding the flp recombinase and a protein bacterial origin of replication, so that the pre-targeting con conferring resistance to the antibiotic nourseothricin. Not struct can be grown and propagated in Ecoli. The sequence shown in SEQ ID NO: 49 but also present in the targeting was synthesized using standard DNA synthesis techniques well known in the art. construct were a selective marker conferring resistance to 30 kanamycin and a bacterial origin of replication, so that the A targeting construct for deletion of ADH-B4B from the targeting construct can be grown and propagated in Ecoli. Candida tropicalis genome was prepared by digesting the The targeting sequences shown in SEQ ID NO: 49 also SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI include a BSmBI restriction site at each end of the construct, and XhoI, and ligating it into the ADH-B4B pre-targeting 35 construct (SEQ ID NO: 50) from which the 20 base pair so that the final targeting construct can be linearized and stuffer had been removed by digestion with restriction optionally separated from the bacterial antibiotic resistance enzymes NotI and XhoI. The sequence of the resulting tar marker and origin of replication prior to transformation into geting construct for deletion of ADH-B4B is given as SEQID Candida tropicalis. NO: 51. This sequence is a specific example of the construct Candida tropicalis strain DP397 was prepared by integra 40 shown generically in FIG. 
4: it has nearly 200 bp of the tion of the construct shown as SEQ ID NO: 49 into the genomic sequence of ADH-B4B at each end to serve as a genome of strain DP390 (Table 3) at the site of the genomic targeting sequence; between the targeting sequences are two sequence of the gene for ADH-B4. Candida tropicalis strain frt sites that are recognized by the flp recombinase; between DP398 was prepared by excision of the targeting construct the two frt sites are sequences encoding the flp recombinase from the genome of strain DP397, thereby deleting the gene 45 and a protein conferring resistance to the antibiotic nourseo encoding ADH-B4. Integration and deletion of targeting thricin. Not shown in SEQID NO: 51 but also present in the sequence SEQ ID NO: 49, and analysis of integrants and targeting construct were a selective marker conferring resis excisants were performed as described in Section 7.1. Sequences of oligonucleotide primers for analysis of tance to kanamycin and a bacterial origin of replication, so strains were: that the targeting construct can be grown and propagated in E. 50 coli. The targeting sequences shown in SEQID NO: 51 also include a BSmBI restriction site at each end of the construct, (SEQ ID NO: 113) so that the final targeting construct can be linearized and B4 - OUT-F: AAATTAGAATACAAGGACATCCCAGTT optionally separated from the bacterial antibiotic resistance marker and origin of replication prior to transformation into (SEQ ID NO: 114) 55 Candida tropicalis. B4 - OUT-R: CATCAACTTGTAGACTTCTGGCAAT Candida tropicalis strain DP409 was prepared by integra (SEO ID NO: 79) tion of the construct shown as SEQ ID NO: 51 into the SAT1-R: genome of strain DP398 (Table 3) at the site of the genomic sequence of the gene for ADH-B4B. 
Candida tropicalis strain (SEQ ID NO: 80) 60 DP411 was prepared by excision of the targeting construct SAT1 - F : from the genome of strain DP409, thereby deleting a portion For strain DP397 (integration of SEQ ID NO: 49), PCR of the coding region of the gene encoding ADH-B4B. Inte with primers B4-OUT-F and SAT1-R produces a 464 bp gration and deletion of targeting sequence SEQID NO: 51, amplicon: PCR with primers SAT1-F and B4-OUT-R pro and analysis of integrants and excisants were performed as duces a 464 base pair amplicon. PCR from a strain with a wild 65 described in Section 7.1. type copy of ADH-B4 with primers B4-OUT-F and Sequences of oligonucleotide primers for analysis of B4-OUT-R produces a 948 base pair amplicon. For strain strains were: US 9,359,581 B2 90 so that the final targeting construct can be linearized and (SEQ ID NO: 115) optionally separated from the bacterial antibiotic resistance B4 - OUT-R: GAACGGTTCCTGTATGAACTGTGAGTA marker and origin of replication prior to transformation into (SEQ ID NO: 116) Candida tropicalis. B4 - IN-R: CAGATTGGTTGATGGCCTTTTCGGAG 5 Candida tropicalis strain DP415 was prepared by integra (SEO ID NO: 79) tion of the construct shown as SEQ ID NO: 53 into the SAT1-R: genome of strain DP411 (Table 3) at the site of the genomic sequence of the gene for ADH-A10. Candida tropicalis strain (SEQ ID NO: 80) DP416 was prepared by excision of the targeting construct SAT1 - F : 10 from the genome of strain DP415, thereby deleting the gene Oligonucleotides B4-IN-F and B4-IN-R are designed to encoding ADH-A10. Integration and deletion of targeting anneal to a part of the genome that is missing in Strains with sequence SEQ ID NO: 53, and analysis of integrants and deletions in ADH-B4. In such strains they will thus only be excisants were performed as described in Section 7.1. able to anneal to and amplify from the second allele ADH 15 Sequences of oligonucleotide primers for analysis of B4B. 
For strain DP409 (integration of SEQID NO:51), PCR strains were: with primers B4-IN-F and SAT1-R produces a 462 base pair amplicon: PCR with primers SAT1-F and B4-IN-R produces (SEO ID NO : 117) a 462 base pair amplicon. PCR from a strain with a wild-type A10 - OUT-F: AAGTTAGAATACAAAGACGTGCCGGTC copy of ADH-B4B with primers B4-IN-F and B4-IN-R pro duces a 488 base pair amplicon. For strain DP411 with a (SEQ ID NO: 118) deleted copy of ADH-B4B, PCR with primers B4-IN-F and A1O-OUT-R: CATCAAGTCAAAAATCTCTGGCACT B4-IN-R produces a 521 base pair amplicon. The amplicons (SEO ID NO: 147) with primers B4-IN-F and B4-IN-R could not distinguish SAT1-R: between a strain carrying a wild-type or a deleted copy of 25 ADH-B4B, but digestion of the amplicon with either NotI or (SEQ ID NO: 80) XhoI will cleave the amplicon derived from the deleted copy SAT1 - F : of the gene but not from the wild type, thereby distinguishing For strain DP415 (integration of SEQ ID NO: 49), PCR between them. with primers A10-OUT-F and SAT1-R produces a 464 base 7.5.6. Deletion of ADH-A10 30 pair amplicon: PCR with primers SAT1-F and A10-OUT-R Sequence SEQ ID NO: 40 was used to design a “pre produces a 464 base pair amplicon. PCR from a strain with a targeting construct comprising two targeting sequences wild type copy of ADH-A10 with primers A10-OUT-F and from the 5' and 3' end of the ADH-A10 structural gene. The A 10-OUT-R produces a 948 base pair amplicon. For strain targeting sequences were separated by a sequence, given as DP416 with a deleted copy of ADH-A10, PCR with primers SEQ ID NO: 12, comprising a NotI restriction site, a 20 bp 35 A10-OUT-F and A10-OUT-R produces a 525 base pair stuffer fragment and an XhoI restriction site. The targeting amplicon. sequences were flanked by two BsmBI restriction sites, so 7.5.7. Deletion of ADH-B11 that the final targeting construct can be linearized prior to Sequence SEQ ID NO: 43 was used to design a “pre transformation into Candida tropicalis. 
The sequence of the targeting construct comprising two targeting sequences ADH-A10 pre-targeting construct is given as SEQIDNO: 52. 40 from the 5' and 3' end of the ADH-B11 structural gene. The Not shown in SEQ ID NO: 52 but also present in the pre targeting sequences were separated by a sequence, given as targeting construct were a selective marker conferring resis SEQID NO: 12, comprising a NotI restriction site, a 20 base tance to kanamycin and a bacterial origin of replication, so pair stuffer fragment and an XhoI restriction site. The target that the pre-targeting construct can be grown and propagated ing sequences were flanked by two BSmBI restriction sites, so in Ecoli. The sequence was synthesized using standard DNA 45 that the final targeting construct can be linearized prior to synthesis techniques well known in the art. transformation into Candida tropicalis. The sequence of the A targeting construct for deletion of ADH-A10 from the ADH-B11 pre-targeting construct is given as SEQID NO:54. Candida tropicalis genome was prepared by digesting the Not shown in SEQ ID NO: 54 but also present in the pre SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI targeting construct were a selective marker conferring resis and XhoI, and ligating it into the ADH-A10 pre-targeting 50 tance to kanamycin and a bacterial origin of replication, so construct (SEQ ID NO: 52) from which the 20 base pair that the pre-targeting construct can be grown and propagated stuffer had been removed by digestion with restriction in Ecoli. The sequence was synthesized using standard DNA enzymes NotI and XhoI. The sequence of the resulting tar synthesis techniques well known in the art. geting construct for deletion of ADH-A10 is given as SEQID A targeting construct for deletion of ADH-B11 from the NO: 53. This sequence is a specific example of the construct 55 Candida tropicalis genome was prepared by digesting the shown generically in FIG. 
4: it has nearly 200 bp of the SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI genomic sequence of ADH-A10 at each end to serve as a and XhoI, and ligating it into the ADH-B11 pre-targeting targeting sequence; between the targeting sequences are two construct (SEQ ID NO: 54) from which the 20 base pair frt sites that are recognized by the flp recombinase; between stuffer had been removed by digestion with restriction the two frt sites are sequences encoding the flp recombinase 60 enzymes NotI and XhoI. The sequence of the resulting tar and a protein conferring resistance to the antibiotic nourseo geting construct for deletion of ADH-B11 is given as SEQID thricin. Not shown in SEQID NO: 53 but also present in the NO: 55. This sequence is a specific example of the construct targeting construct were a selective marker conferring resis shown generically in FIG. 4: it has nearly 200base pair of the tance to kanamycin and a bacterial origin of replication, so genomic sequence of ADH-B11 at each end to serve as a that the targeting construct can be grown and propagated in E 65 targeting sequence; between the targeting sequences are two coli. The targeting sequences shown in SEQID NO: 53 also frt sites that are recognized by the flp recombinase; between include a BSmBI restriction site at each end of the construct, the two frt sites are sequences encoding the flp recombinase US 9,359,581 B2 91 92 and a protein conferring resistance to the antibiotic nourseo NotI restriction site, a 20 base pair stuffer fragment and an thricin. Not shown in SEQID NO: 55 but also present in the XhoI restriction site. The targeting sequences were flanked by targeting construct were a selective marker conferring resis two BSmBI restriction sites, so that the final targeting con tance to kanamycin and a bacterial origin of replication, so struct can be linearized prior to transformation into Candida that the targeting construct can be grown and propagated in E tropicalis. 
The sequence of the ADH-A1 OB pre-targeting coli. The targeting sequences shown in SEQID NO: 53 also construct is given as SEQID NO: 57. Not shown in SEQID include a BSmBI restriction site at each end of the construct, NO: 57 but also present in the pre-targeting construct were a so that the final targeting construct can be linearized and selective marker conferring resistance to kanamycin and a optionally separated from the bacterial antibiotic resistance bacterial origin of replication, so that the pre-targeting con marker and origin of replication prior to transformation into 10 Candida tropicalis. struct can be grown and propagated in Ecoli. The sequence Candida tropicalis strain DP417 was prepared by integra was synthesized using standard DNA synthesis techniques tion of the construct shown as SEQ ID NO: 55 into the well known in the art. genome of strain DP416 (Table 3) at the site of the genomic A targeting construct for deletion of ADH-A1 OB from the sequence of the gene for ADH-B1 1. Candida tropicalis strain 15 Candida tropicalis genome was prepared by digesting the DP421 was prepared by excision of the targeting construct SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI from the genome of strain DP417, thereby deleting the gene and XhoI, and ligating it into the ADH-A10B pre-targeting encoding ADH-B1 1. Integration and deletion of targeting construct (SEQID NO:57) from which the 20 bp stuffer had sequence SEQ ID NO: 55, and analysis of integrants and been removed by digestion with restriction enzymes Notland excisants were performed as described in Section 7.1. XhoI. The sequence of the resulting targeting construct for Sequences of oligonucleotide primers for analysis of deletion of ADH-A10B is given as SEQ ID NO: 58. This strains were: sequence is a specific example of the construct shown generi cally in FIG. 
4: it has nearly 200 base pairs of the genomic sequence of ADH-A10B at each end to serve as a targeting (SEQ ID NO: 119) 25 sequence; between the targeting sequences are two frt sites B11 - OUT-F: CCATTGCAATACACCGATATCCCAGTT that are recognized by the flp recombinase; and between the (SEQ ID NO: 12O) two frt sites are sequences encoding the flp recombinase and B11 - OUT-R: CAACAATTTGAAAATCTCTGGCAAT a protein conferring resistance to the antibiotic nourseothri cin. Not shown in SEQ ID NO: 58 but also present in the (SEO ID NO: 79) 30 targeting construct were a selective marker conferring resis SAT1-R: tance to kanamycin and a bacterial origin of replication, so (SEO ID NO: 80) that the targeting construct can be grown and propagated in E SAT1 - F : coli. The targeting sequences shown in SEQID NO: 58 also For strain DP417 (integration of SEQ ID NO: 49), PCR include a BSmBI restriction site at each end of the construct, with primers B11-OUT-F and SAT1-R produces a 464 base 35 so that the final targeting construct can be linearized and pair amplicon: PCR with primers SAT1-F and B11-OUT-R optionally separated from the bacterial antibiotic resistance produces a 464 base pair amplicon. PCR from a strain with a marker and origin of replication prior to transformation into wild type copy of ADH-B11 with primers B11-OUT-F and Candida tropicalis. B11-OUT-R produces a 948base pair amplicon. For strain Candida tropicalis strain DP424 was prepared by integra DP421 with a deleted copy of ADH-B11, PCR with primers 40 tion of the construct shown as SEQ ID NO: 58 into the B11-OUT-F and B11-OUT-R produces a525 base pair ampli genome of strain DP421 (Table 3) at the site of the genomic CO. sequence of the gene for ADH-A10B. Candida tropicalis 7.5.8. 
Deletion of ADH-A1 OB strain DP431 was prepared by excision of the targeting con No sequence was identified for a second allele for ADH struct from the genome of strain DP424, thereby deleting a A10 in the initial set of 96 sequences but we reasoned that in 45 portion of the coding region of the gene encoding ADH a diploid organism a second allele existed. At our first attempt A1OB. Integration and deletion of targeting sequence SEQID we were unable to delete the second allele (ADH-A10B) NO: 58, and analysis of integrants and excisants were per using the strategy described for ADH-A4B and ADH-B4B. formed as described in Section 7.1. Sequences of oligonucle We used the primers A10-IN-F and A10-IN-R to amplify an otide primers for analysis of strains were A10-IN-F (SEQID ~500 base pair amplicon from genomic DNA from strain 50 NO: 121), A10-IN-R (SEQID NO: 122), SAT1-R (SEQ ID DP415 which has the SAT1-flipper inserted into the first NO: 79), and SAT1-F (SEQID NO: 80). ADH-A10 allele, preventing it from amplifying with these Oligonucleotides A10-IN-F and A10-IN-R are designed to primers. The amplicon was cloned and sequenced, the anneal to a part of the genome that is missing in Strains with sequence is given as SEQID NO: 56. deletions in ADH-A10. In such strains they will thus only be 55 able to anneal to and amplify from the second allele ADH A10B. For strain DP424 (integration of SEQ ID NO: 58), (SEQ ID NO: 121) PCR with primers A10-IN-F and SAT1-R produces a 462 A1O-IN-F: GAATGGTTCGTGTATGAACTGTGAGTT base pair amplicon: PCR with primers SAT1-F and A10-IN-R produces a 462 base pair amplicon. PCR from a strain with a (SEQ ID NO: 122) 60 wild-type copy of ADH-A1 OB with primers A10-IN-F and A1O-IN-R: CCGACTGGTTGATTGCCTTTTCGGAC A 10-IN-R produces a 488 base pair amplicon. For strain We constructed a “pre-targeting construct comprising two DP431 with a deleted copy of ADH-A10B, PCR with primers targeting sequences based on SEQID NO: 56. 
A single muta A 10-IN-F and A10-IN-R produces a 521 base pair amplicon. tion was introduced into the sequence obtained as SEQ ID The amplicons with primers A10-IN-F and A10-IN-R could NO: 56: a Gat position 433 was mutated to a C to destroy an 65 not distinguish between a strain carrying a wild-type or a unwanted BSmBI site. The targeting sequences were sepa deleted copy of ADH-A1OB, but digestion of the amplicon rated by a sequence, given as SEQID NO: 12, comprising a with either NotI or XhoI will cleave the amplicon derived US 9,359,581 B2 93 94 from the deleted copy of the gene but not from the wild type, portion of the coding region of the gene encoding ADH thereby distinguishing between them. B11B. Integration and deletion of targeting sequence SEQID 7.5.9. Deletion of ADH-B11B NO: 61, and analysis of integrants and excisants were per No sequence was identified for a second allele for ADH formed as described in Section 7.1. B11 in the initial set of 96 sequences but we reasoned that in Sequences of oligonucleotide primers for analysis of a diploid organism a second allele existed. At our first attempt strains were: we were unable to delete the second allele (ADH-B11B) using the strategy described for ADH-A4B and ADH-B4B. We used the primers B11-OUT-F and B11-OUT-R to amplify (SEQ ID NO: 119) an -950 base pair amplicon from genomic DNA from strain 10 B11 - OUT-F: DP417 which has the SAT1-flipper inserted into the first (SEQ ID NO: 123) ADH-B11 allele, preventing it from amplifying with these B11 - IN-R: CAGACTGGTTGATGGCTTTTTCAGAA primers. The amplicon was cloned and sequenced, the (SEO ID NO : 79) sequence is given as SEQID NO. 59. SAT1-R: 15 (SEQ ID NO: 80) (SEQ ID NO: 121) SAT1 - F : B11 - OUT-F GAATGGTTCGTGTATGAACTGTGAGTT For strain DP433 (integration of SEQ ID NO: 61), PCR (SEQ ID NO: 122) with primers B11-OUT-F and SAT1-R produces a 692 base B11 - OUT-R CCGACTGGTTGATTGCCTTTTCGGAC pair amplicon. 
PCR from a strain with a wild-type copy of We constructed a “pre-targeting construct comprising two ADH-B11B with primers B11-OUT-F and B11-IN-R pro targeting sequences based on SEQID NO. 59. The targeting duces a 718 base pair amplicon. For strain DP437 with a sequences were separated by a sequence, given as SEQ ID deleted copy of ADH-B11B, PCR with primers B11-OUT-F NO: 12, comprising a NotI restriction site, a 20 base pair 25 and B11-IN-R produces a 751 base pair amplicon. The ampli stuffer fragment and an XhoI restriction site. The targeting cons with primers B11-OUT-F and B11-IN-R could not dis sequences were flanked by two BsmBI restriction sites, so tinguish between a strain carrying a wild-type or a deleted that the final targeting construct can be linearized prior to copy of ADH-B11B, but digestion of the amplicon with either transformation into Candida tropicalis. The sequence of the NotI or XhoI will cleave the amplicon derived from the ADH-B11B pre-targeting construct is given as SEQID NO: 30 deleted copy of the gene but not from the wild type, thereby 60. Not shown in SEQ ID NO: 60 but also present in the distinguishing between them. pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, 7.6. Insertion of P450 Genes into the Genome of so that the pre-targeting construct can be grown and propa Candida gated in Ecoli. The sequence was synthesized using standard 35 DNA synthesis techniques well known in the art. 
To achieve novel phenotypes in yeasts of the genus Can A targeting construct for deletion of ADH-B11B from the dida (e.g., Candida tropicalis), including biotransformations Candida tropicalis genome was prepared by digesting the of compounds by Candida tropicalis, including chemical con SAT-1 flipper (SEQID NO: 1) with restriction enzymes NotI versions not previously obtained, or increased rates of con and XhoI, and ligating it into the ADH-B11B pre-targeting 40 version of one or more Substrates to one or more products, or construct (SEQ ID NO: 60) from which the 20 base pair increased specificity of conversion of one or more Substrates stuffer had been removed by digestion with restriction to one or more products, or increased tolerance of a com enzymes NotI and XhoI. The sequence of the resulting tar pound by the yeast, or increased uptake of a compound by the geting construct for deletion of ADH-B11B is given as SEQ yeast, it may be advantageous to incorporate a gene encoding ID NO: 61. This sequence is a specific example of the con 45 a polypeptide into the genome of the yeast. Expression of the struct shown generically in FIG. 4: it has nearly 200 base pair polypeptide in the yeast then allows the phenotype of the of the genomic sequence of ADH-B11B at each end to serve yeast to be modified. as a targeting sequence; between the targeting sequences are In some embodiments of the invention it may be advanta two frt sites that are recognized by the flp recombinase; geous to integrate a gene encoding a polypeptide into a strain between the two frt sites are sequences encoding the flp 50 of Candida tropicalis in which one or more of the alcohol recombinase and a protein conferring resistance to the anti dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH biotic nourseothricin. Not shown in SEQID NO: 61 but also B4B, ADH-A10, ADH-A1 OB, ADH-B1B and ADH-B11 present in the targeting construct were a selective marker have been disrupted. 
In some embodiments of the invention it conferring resistance to kanamycin and a bacterial origin of may be advantageous to integrate a gene encoding a polypep replication, so that the targeting construct can be grown and 55 tide into a yeast strain of the genus Candida in which one or propagated in Ecoli. The targeting sequences shown in SEQ more alcohol dehydrogenase genes have been disrupted, and ID NO: 61 also include a BSmBI restriction site at each end of wherein the disrupted alcohol dehydrogenase gene shares at the construct, so that the final targeting construct can be least 95% nucleotide identity, or at least 90% nucleotide linearized and optionally separated from the bacterial antibi identity, or at least 85% nucleotide identity for a stretch of at otic resistance marker and origin of replication prior to trans 60 least 100 contiguous nucleotides within the coding region, or formation into Candida tropicalis. at least 80% identical for a stretch of at least 100 contiguous Candida tropicalis strain DP433 was prepared by integra nucleotides of the coding sequence or at least 75% identical tion of the construct shown as SEQ ID NO: 61 into the for a stretch of at least 100 contiguous nucleotides of the genome of strain DP431 (Table 3) at the site of the genomic coding sequence, or at least 70% identical for a stretch of at sequence of the gene for ADH-B11B. Candida tropicalis 65 least 100 contiguous nucleotides of the coding sequence, or at strain DP437 was prepared by excision of the targeting con least 65% identical for a stretch of at least 100 contiguous struct from the genome of strain DP433, thereby deleting a nucleotides of the coding sequence, or at least 60% identical US 9,359,581 B2 95 96 for a stretch of at least 100 contiguous nucleotides of the Some embodiments said third peptide has the sequence APIX coding sequence with one of the Candida tropicalis genes CAGVTVYKALK (SEQID NO: 166). 
ADH-A4 (SEQ ID NO:39), ADH-B4 (SEQ ID NO: 42), In some embodiments the first genetic modification class comprises disruption of at least one alcohol dehydrogenase ADH-A10 (SEQID NO: 40), ADH-A10B (SEQID NO:56), whose amino acid sequence, predicted from translation of the ADH-B11 (SEQID NO: 43). gene that encodes it, comprises a fourth peptide. In some In some embodiments it may be advantageous to integrate embodiments said fourth peptide has the sequence a gene encoding a polypeptide into a yeast Strain of the genus GQWVAISGA (SEQ ID NO: 160). In some embodiments Candida in which (i) one or more alcohol dehydrogenase said fourth peptide has the sequence GQWVAISGAXG genes have been disrupted and (ii) the disrupted alcoholdehy GLGSL (SEQ ID NO: 167). In some embodiments said drogenase comprises a first peptide. In some embodiments 10 fourth peptide has the sequence GQWVAISGAXGGLGSLX said first peptide has the sequence VKYSGVCH (SEQ ID VQYA (SEQID NO: 168). In some embodiments said fourth NO: 156). In some embodiments said first peptide has the peptide has the sequence GQWVAISGAXGGLGSLX sequence VKYSGVCHxxxxxWKGDW (SEQID NO: 162). VQYAXAMG (SEQID NO: 169). In some embodiments said In some embodiments the first peptide has the sequence fourth peptide has the sequence GQWVAISGAXGGLGSLX 15 VQYAXAMGxRVxAIDGG. (SEQ ID NO: 170). In some VKYSGVCHXXXXXWKGDWXXXXKLPXVG embodiments the disrupted alcoholdehydrogenase sequence, GHEGAGVVV (SEQID NO: 163). predicted from translation of the gene that encodes it, com In some embodiments the disrupted alcohol dehydroge prises a fifth peptide. In some embodiments said fifth peptide nase sequence, predicted from translation of the gene that has the sequence VGGHEGAGVVV (SEQ ID NO: 157). encodes it, comprises a second peptide. In some embodi Cytochrome P450s are of particular utility in the hydroxy ments said second peptide has the sequence QYATA lation of a variety of substrates including fatty acids. Different DAVQAA (SEQ ID NO: 158). 
In some embodiments said cytochrome P450s are known to have different substrate and second peptide has the sequence SGYxHDGxFXOYATA regiospecificities and different specific activities. It is there DAVQAA (SEQ ID NO: 164). In some embodiments said fore useful in some embodiments of the invention to incor second peptide has the sequence GAEPNCXXADxSGYX 25 porate agene encoding a cytochrome P450 into the genome of HDGxFXOYATADAVQAA (SEQ ID NO: 165). In some the yeast. The exact P450 to be used will depend upon the embodiments the disrupted alcoholdehydrogenase sequence, substrate and the position on the substrate to be hydroxylated. predicted from translation of the gene that encodes it, com A list of P450 enzymes that may be of utility in the hydroxy prises a third peptide. In some embodiments said third peptide lation of substrates when expressed within a yeast cell are has the sequence CAGVTVYKALK (SEQ ID NO: 159). In given in Table 4. TABLE 4
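The restriction-site logic used in Sections 7.5.3 to 7.5.9 can be illustrated with a short sketch. It rests on two points from the text: amplicons from a deleted second allele are cleaved by NotI or XhoI while wild-type amplicons are not, and an unwanted internal BsmBI site had to be removed from one targeting sequence. The recognition sequences are the standard ones for these enzymes. The function names and the toy DNA sequences are hypothetical and are not taken from the sequence listing.

```python
# Recognition sequences written 5'->3' on the top strand.
NOTI = "GCGGCCGC"   # palindromic
XHOI = "CTCGAG"     # palindromic
BSMBI = "CGTCTC"    # not palindromic, so both strands must be scanned

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def has_site(seq, site):
    """True if `site` occurs on either strand of `seq`."""
    seq = seq.upper()
    return site in seq or reverse_complement(site) in seq


def classify_internal_amplicon(amplicon):
    """Classify an amplicon made with the internal (IN-F / IN-R) primers.

    Assumption taken from the text: only the deleted allele carries the
    NotI and XhoI sites left behind after excision of the SAT1 flipper.
    """
    if has_site(amplicon, NOTI) or has_site(amplicon, XHOI):
        return "deleted"
    return "wild-type"


def has_internal_bsmbi(targeting_seq):
    """True if a targeting sequence contains a BsmBI site, which would
    interfere with linearizing the construct at its flanking BsmBI sites."""
    return has_site(targeting_seq, BSMBI)


# Hypothetical toy sequences, for illustration only.
wild_type = "ATGGCTAAGTTCACCGGTGAA"
deleted = "ATGGCT" + NOTI + "GAAGTTCCTATAC" + XHOI + "ACCGGTGAA"

print(classify_internal_amplicon(wild_type))
print(classify_internal_amplicon(deleted))
print(has_internal_bsmbi("TTGACGTCTCAAGG"))   # site present
print(has_internal_bsmbi("TTGACCTCTCAAGG"))   # same stretch after a G-to-C change
```

In practice the same check would be run on the actual amplicon or targeting sequences. The size-based criteria (for example 488 versus 521 base pairs) are left out here because the text notes that they do not reliably separate the two genotypes.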

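The two sequence criteria used in Section 7.6 to describe a disrupted alcohol dehydrogenase (a peptide motif in which x stands for any residue, and a minimum percent identity over at least 100 contiguous nucleotides) can be sketched as follows. This is a simplified illustration under stated assumptions. The identity function compares two sequences position by position without gaps, which stands in for a proper alignment, and the protein and nucleotide strings are hypothetical. Only the motif VKYSGVCHxxxxxWKGDW (SEQ ID NO: 162) comes from the text.

```python
import re


def motif_to_regex(motif):
    """Compile a peptide motif into a regex; lower-case 'x' matches any residue."""
    parts = ["[A-Z]" if ch == "x" else re.escape(ch.upper()) for ch in motif]
    return re.compile("".join(parts))


def contains_motif(protein, motif):
    """True if the protein sequence contains the motif anywhere."""
    return motif_to_regex(motif).search(protein.upper()) is not None


def best_window_identity(seq_a, seq_b, window=100):
    """Highest fraction of identical positions over any ungapped window.

    Assumption: seq_a and seq_b are already in register (same start,
    no insertions or deletions); a real comparison would align them first.
    """
    n = min(len(seq_a), len(seq_b))
    if n < window:
        raise ValueError("sequences are shorter than the window")
    matches = [int(a == b) for a, b in zip(seq_a[:n].upper(), seq_b[:n].upper())]
    current = sum(matches[:window])
    best = current
    # Slide the window one position at a time, updating the match count.
    for i in range(window, n):
        current += matches[i] - matches[i - window]
        best = max(best, current)
    return best / window


# Hypothetical protein fragment containing the first-peptide motif.
toy_protein = "MSAVKYSGVCHTDLHAWKGDWPLP"
print(contains_motif(toy_protein, "VKYSGVCHxxxxxWKGDW"))

# Hypothetical nucleotide sequences differing at every tenth position.
reference = "A" * 120
candidate = ("C" + "A" * 9) * 12
print(best_window_identity(reference, candidate) >= 0.85)
```

A threshold such as 0.95, 0.90 or 0.85 can then be applied to the returned fraction, matching the graded identity levels listed in the text.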
TABLE 4

First Database Accession Number | Second Database Accession Number | Name | Species
gi 29469875 | gb AAO73958.1 | CYP52A17 | Candida tropicalis
gi 29469877 | gb AAO73959.1 | CYP52A18 | Candida tropicalis
gi 231889 | sp P30610.1 | CP52H_CANTR (Cytochrome P450 52A8) |
gi 3913326 | sp Q12586.1 | CP52I_CANMA (Cytochrome P450 52A9) |
gi 29469881 | gb AAO73961.1 | CYP52A20 | Candida tropicalis
gi 29469879 | gb AAO73960.1 | CYP52A19 | Candida tropicalis
gi 3913329 | sp Q12589.1 | CP52K_CANMA (Cytochrome P450 52A11) |
gi 3913328 | sp Q12588.1 | CP52J_CANMA (Cytochrome P450 52A10) |
gi 68492087 | ref XP_710174.1 | P450 drug resistance protein | Candida albicans
gi 3395458 | emb CAA75058.1 | alk8 | Candida albicans
gi 68474594 | ref XP_718670.1 | CaO19.7513 | Candida albicans
gi 29469865 | gb AAO73953.1 | CYP52A13 | Candida tropicalis
gi 149239010 | ref XP_001525381.1 | cytochrome P450 52A11 | Lodderomyces elongisporus
gi 29469867 | gb AAO73954.1 | CYP52A14 | Candida tropicalis
gi 7548332 | gb AAA34353.2 | cytochrome P-450-alk2 | Candida tropicalis
gi 732622 | emb CAA39366.1 | n-alkane inducible cytochrome P-450 | Candida maltosa
gi 231886 | sp P30607.1 | CP52B_CANTR (Cytochrome P450 52A2) |
gi 68474592 | ref XP_718669.1 | CaO19.7512 | Candida albicans
gi 150864612 | ref XP_001383506.2 | n-alkane inducible cytochrome P-450 | Pichia stipitis
gi 231888 | sp P30609.1 | CP52G_CANTR (Cytochrome P450 52A7) |
gi 298217 | gb AAB24479.1 | cytochrome P450 monoxygenase alk4, P450 alk4 = CYP52A7 gene product alkane-inducible | Candida tropicalis
gi 149246109 | ref XP_001527524.1 | cytochrome P450 52A2 | Lodderomyces elongisporus
gi 29469869 | gb AAO73955.1 | CYP52A15 | Candida tropicalis
gi 190319368 | gb AAD22536.2 | AF103948_1 cytochrome P450 alkane hydroxylase | Debaryomyces hansenii
gi 146419207 | ref XP_001485567.1 | cytochrome P450 52A12 | Pichia guilliermondii
gi 29469863 | gb AAO73952.1 | CYP52A12 | Candida tropicalis
gi 50423067 | ref XP_460112.1 | DEHA0E19635g | Debaryomyces hansenii
gi 29469871 | gb AAO73956.1 | CYP52A16 | Candida tropicalis
gi 199432969 | emb CAG88381.2 | DEHA2E18612p | Debaryomyces hansenii
gi 170892 | gb AAA34354.1 | cytochrome P-450-alk1 | Candida tropicalis
gi 50423065 | ref XP_460111.1 | DEHA0E19613g | Debaryomyces hansenii
gi 1169075 | sp P10615.3 | CP52A_CANTR (Cytochrome P450 52A1) |
gi 226487 | prf 1515252A | cytochrome P450alk1 |
gi 732623 | emb CAA39367.1 | n-alkane inducible cytochrome P-450 | Candida maltosa
gi 146413358 | ref XP_001482650.1 | PGUG_05670 | Pichia guilliermondii
gi 117182 | sp P16141.3 | CP52D_CANMA (Cytochrome P450 52A4) |
gi 2608 | emb CAA36197.1 | unnamed protein product | Candida maltosa
gi 231887 | sp P30608.1 | CP52F_CANTR (Cytochrome P450 52A6) |
gi 199432970 | emb CAG88382.2 | DEHA2E18634p | Debaryomyces hansenii
gi 190349008 | gb EDK41572.2 | PGUG_05670 | Pichia guilliermondii
gi 150864699 | ref XP_001383636.2 | Cytochrome P450 52A12 (Alkane hydroxylase 1) (Alkane-inducible p450alk 1) (DH-ALK2) | Pichia stipitis
gi 117181 | sp P16496.3 | CP52C_CANMA (Cytochrome P450 52A3) |
gi 199432968 | emb CAG88380.2 | DEHA2E18590p | Debaryomyces hansenii
gi 50423063 | ref XP_460110.1 | DEHA0E19591g | Debaryomyces hansenii
gi 553118 | gb AAA34320.1 | alkane hydroxylating cytochrome P-450 |
gi 117183 | sp P24458.1 | CP52E_CANMA (Cytochrome P450 52A5) |
gi 68475852 | ref XP_717999.1 | potential alkane hydroxylating monooxygenase P450 | Candida albicans
gi 18203639 | | CP52M_DEBHA (Cytochrome P450 52A13) |
gi 146412241 | ref XP_001482092.1 | cytochrome P450 52A13 | Pichia guilliermondii
gi 126134585 | ref XP_001383817.1 | Cytochrome P450 52A13 (Alkane hydroxylase 2) (Alkane-inducible p450alk 2) (DH-ALK2) | Pichia stipitis
gi 50418551 | ref XP_457792.1 | DEHA0C02981g | Debaryomyces hansenii
gi 149236533 | ref XP_001524144.1 | cytochrome P450 52A5 | Lodderomyces elongisporus
gi 150864746 | ref XP_001383710.2 | Cytochrome P450 52A6 (CYPLIIA6) (Alkane inducible P450-ALK3) | Pichia stipitis
gi 149239404 | ref XP_001525578.1 | cytochrome P450 52A3 | Lodderomyces elongisporus
gi 50417817 | ref XP_457727.1 | DEHA0C01177g | Debaryomyces hansenii
gi 199430432 | emb CAG85755.2 | DEHA2C01100p | Debaryomyces hansenii
gi 149239402 | ref XP_001525577.1 | cytochrome P450 52A8 | Lodderomyces elongisporus
gi 29469873 | gb AAO73957.1 | CYP52D2 | Candida tropicalis
gi 150866745 | ref XP_001386440.2 | Cytochrome P450 52A3 (CYPLIIA3) (Alkane inducible P450-ALK1-A) (P450-CM1) (CYP52A3-A) (Cytochrome P-450ALK) | Pichia stipitis
gi 190347603 | gb EDK39907.2 | PGUG_04005 | Pichia guilliermondii
gi 146414612 | ref XP_001483276.1 | PGUG_04005 | Pichia guilliermondii
gi 3913325 | sp Q12585.1 | CP52T_CANMA (Cytochrome P450 52D1) |
gi 50553995 | ref XP_504406.1 | YALI0E25982p | Yarrowia lipolytica
gi 3298289 | dbj BAA31433.1 | ALK1 | Yarrowia lipolytica
gi 50554897 | ref XP_504857.1 | YALI0F01320p | Yarrowia lipolytica
gi 50545727 | ref XP_500402.1 | YALI0B01848p | Yarrowia lipolytica
gi 50546066 | ref XP_500560.1 | YALI0B06248p | Yarrowia lipolytica
gi 50547357 | ref XP_501148.1 | YALI0B20702p | Yarrowia lipolytica
gi 50546771 | ref XP_500855.1 | YALI0B13816p | Yarrowia lipolytica
gi 50546773 | ref XP_500856.1 | YALI0B13838p | Yarrowia lipolytica
gi 70982077 | ref XP_746567.1 | cytochrome P450 alkane hydroxylase | Aspergillus fumigatus
gi 119487140 | ref XP_001262425.1 | cytochrome P450 alkane hydroxylase | Neosartorya fischeri
gi 50545119 | ref XP_500097.1 | YALI0A15488p | Yarrowia lipolytica
gi 115387741 | ref XP_001211376.1 | cytochrome P450 52A12 | Aspergillus terreus
gi 145248800 | ref XP_001400739.1 | An14g01110 | Aspergillus niger
gi 121714465 | ref XP_001274843.1 | cytochrome P450 alkane hydroxylase | Aspergillus clavatus
gi 50545471 | ref XP_500273.1 | YALI0A20130p | Yarrowia lipolytica

First Database Second Database Accession Number Accession Number Name Species gi21254.1280 refXP 002150795.1 cytochrome P450 alkane Penicillium marneffei hydroxylase gi 16978.3066 refXP OO1825995.1 Aspergilius oryzae gi 67541935 refXP 664735.1 AN7131.2 Aspergilius nidulans gi 218716670 gb EED16091.1 cytochrome P450 alkane Taiaromyces stipitatus hydroxylase gi 211584648 emb CAP74173.1 Pc14g00320 Penicillium chrysogenium gi 68475719 refXP 718066.1 potential alkane Candida albicans hydroxylating monooxygenase P450 fragment gi 231890 sp P30611.1 CP52N CANTR (Cytochrome P450 52B1) gi50553800 refXP 5043.11.1 YALIOE23474p Yarrowia lipolytica gi 115391153 refXP OO1213081.1 ATEG O3903 Aspergilius terretts gi 1169076 sp P43083.1 CP52V CANAP (Cytochrome P450 52E1) gi212537573 ref XP 00214.8942.1 cytochrome P450 family Penicillium marneffei protein gi 11948.0837 ref XP OO1260447.1 cytochrome P450 family Neosartorya fischeri protein gi 159129370 gb EDP54484.1 cytochrome P450 family Aspergiiitisfit migatus protein gi 7100.1214 refXP 755288.1 cytochrome P450 family Aspergiiitisfit migatus protein gi50548557 refXP 5O1748.1 YALIOC12122p Yarrowia lipolytica gi 211592844 emb CAP992.12.1 Pc22g 19240 Penicillium chrysogenium gi 231891 sp P30612.1 CP52P CANTR (Cytochrome P450 52C1) gi3913327 sp Q12587.1 CP52Q CANMA (Cytochrome P450 52C2) gi50548395 refXP 5O1667.1 YALIOC10054p Yarrowia lipolytica gi 145248373 refXP OO1396435.1 An13g03000 Aspergilius niger gi 169783674 refXP OO1826299.1 Aspergillus oryzae gi 169774249 refXP OO1821592.1 Aspergilius oryzae gi212536398 refXP 0021483.55.1 cytochrome P450 alkane Penicillium marneffei hydroxylase gi 211590140 emb CAP96310.1 Pc21g 14130 Penicillium chrysogenium gi 189200681 refXP OO1936677.1 cytochrome P450 52A12 Pyrenophora tritici-repentis gi 121698.992 refXP OO1267871.1 cytochrome P450 family Aspergilius clavatus protein gi 154310961 refXP OO1554811.1 BC1G 06459 Botryotinia fuckeiana gi 119497443 refXP OO1265480.1 cytochrome P450 alkane 
Neosartorya fischeri hydroxylase gi67539774 ref XP 663661.1 AN6057.2 Aspergilius nidulans gi3913324 sp Q12573.1 CP52W CANAP (Cytochrome P450 52E2) gi 159130401 gb EDP55514.1 cytochrome P450 alkane Aspergiiitisfit migatus hydroxylase gi70990140 refXP 749.919.1 cytochrome P450 alkane Aspergiiitisfit migatus hydroxylase gi212543867 refXP 002152088. N-alkane-inducible Penicillium marnefei ATCC 18224 cytochrome P450 gi 189204508 re P OO1938589. cytochrome P450 52A12 Pyrenophora tritici-repentis gi 67904794 re P 682653.1 AN93842 Aspergilius nidulans gi 115401146 re P OO1216161. ATEG O7540 Aspergilius terretts gi 169765686 re P OO1817314. Aspergilius oryzae gi 156034334 re P OO1585586. SS1G 13470 Scierotinia Scierotiorum gi 115389132 re P OO1212071. ATEG 02893 Aspergilius terretts gi 149249004 re P OO1528842. LELG 05768 Lodderomyces elongisportis gi 119490743 re P OO1263094. n-alkane-inducible Neosartorya fischeri cytochrome P450 gi 1695,98696 re P 001792.771. SNOG 02153 Phaeosphaeria nodorum gi 145233653 re P OO1400199. Ano2g10700 Aspergilius niger gi 121703415 re P OO1269972. cytochrome P450 alkane Aspergilius clavatus hydroxylase gi 145244813 re P 001394.678. An 11g07010 Aspergilius niger gi 115400535 re P OO1215856. ATEG 06678 Aspergilius terretts gi 156054264 re P OO1593,058. SS1G 0598O Scierotinia Scierotiorum gi 145235009 re P 001390153. Ano3g02570 Aspergilius niger gi 121714697 re P OO1274959. n-alkane-inducible Aspergilius clavatus cytochrome P450 gi 115383936 re P 001208515. ATEG 01150 Aspergilius terretts gi 119188703 re P OO1244958. CIMG 04399 Coccidioides inmitis gi 1543.03347 re P OO1552081. BC1G 09422 Botryotinia fuckeiana gi 68469246 re P 721410.1 potential n-alkane inducible Candida albicans cytochrome P-450 US 9,359,581 B2 101 102 TABLE 4-continued

First Database Second Database Accession Number Accession Number Name Species gi 211588353 emb CAP864.58.1 Pc20g11290 Penicillium chrysogenium gi 218.719422 gb EED18842.1 cytochrome P450 Taiaromyces stipitatus gi 189196472 refX P OO1934,574.1 cytochrome P450 52A11 Pyrenophora tritici-repentis gi 145228377 refX P OO1388497.1 Ano1g00510 Aspergilius niger gi 145243810 refX P 001394.417.1 An 11g04220 Aspergilius niger gi 119467390 refX P 001257501.1 n-alkane-inducible Neosartorya fischeri cytochrome P450 gi 218713692 gb EED13116.1 cytochrome P450 alkane Taiaromyces stipitatus hydroxylase gi 156040904 P OO1587438.1 SS1G 11430 Scierotinia Scierotiorum gi 21158.8608 CAP86724.1 Pc20g 13950 Penicillium chrysogenium gi 189210960 P OO1941811.1 cytochrome P450 52A11 Pyrenophora tritici-repentis gi 1543.00280 P 001550556.1 BC1G 11329 Botryotinia fuckeiana gi39965179 P 365075.1 MGG 0.9920 Magnaporthe grisea gi 70984521 P 747767.1 cytochrome P450 alkane Aspergiiitisfit migatus hydroxylase gi 164424932 re P 958030.2 Neurospora crassa gi 169785321 re P OO1827121.1 Aspergilius oryzae gi 171687345 re P OO1908613.1 Podospora anserina gi495225 dbre AAOS145.1 n-alkane-inducible Candida maltosa cytochrome P-450 gi 169778468 P OO1823699.1 Aspergilius oryzae gi 685237 h CAA35593.1 cytochrome P-450-alk2 Candida tropicalis gi 115398792 P OO1214985.1 ATEG 05807 Aspergilius terretts gi 156045685 P OO1589398.1 SS1 G 10037 Scierotinia Scierotiorum gi 116181964 P OO1220831.1 CHGG O1610 Chaetonium globosum gi212539338 P 002149824.1 N-alkane-inducible Penicillium marneffei cytochrome P450 gi55823915 gb AAV66104.1 cytochrome P450 Fusarium heterosporum gi 16978.6131 refX P OO1827526.1 Aspergilius oryzae gi 67526919 refX P 661521.1 AN3917.2 Aspergilius nidulans gi57157397 db B AD83681.1 cytochrome P-450 Alternaria Soiani gi 39954838 refX P 364111.1 MGG O8956 Magnaporthe grisea gi46108804 refX P. 
381460.1 FGO1284.1 Gibbereia zeae gi 167962420 db B AGO9241.1 n-alkane inducible Candida maltosa cytochrome P-450 gi 11946.9615 refX P OO1257962.1 cytochrome P450 alkane Neosartorya fischeri hydroxylase gi70991773 refX P 750735.1 cytochrome P450 alkane Aspergiiitisfit migatus hydroxylase gi 17167918.5 refX P OO1904540.1 unnamed protein product Podospora anserina gi 119488606 refX P OO1262753.1 n-alkane-inducible Neosartorya fischeri cytochrome P450 gi 218722969 gb EED22387.1 cytochrome P450 Taiaromyces stipitatus gi 145243244 refX P 001394159.1 An 11g01550 Aspergilius niger gi212533853 refX P 002147083.1 N-alkane-inducible Penicillium marneffei cytochrome P450 gi 218720976 gb EED20395.1 cytochrome P450 alkane Taiaromyces stipitatus hydroxylase gi 145604320 refXP 362943.2 MGG O8494 Magnaporthe grisea gi 154319876 refXP OO1559255.1 BC1G O2419 Botryotinia fuckeiana gi 154272319 refXP OO1537012.1 HCAG 08121 Aiellomyces capsulatus gi 39976331 refXP 369556.1 MGG O5908 Magnaporthe grisea gi 11620.0125 refXP OO1225874.1 CHGG O8218 Chaetonium globosum gi 218722681 gb EED22099.1 cytochrome P450 alkane Taiaromyces stipitatus hydroxylase gi 145606889 refXP 361347.2 MGG O3821 Magnaporthe grisea gi211592275 emb CAP98.620.1 Pc22g13320 Penicillium chrysogenium gi 171688.034 refXP OO1908957.1 unnamed protein product Podospora anserina gi 211587061 emb CAP94723.1 Pc18g.04990 Penicillium chrysogenium gi 169612986 refXP OO1799910.1 SNOG 09621 Phaeosphaeria nodorum gi212539354 refXP 0021.49832.1 N-alkane-inducible Penicillium marneffei cytochrome P450 gi212533239 refXP 002146776.1 cytochrome P450 alkane Penicillium marneffei hydroxylase gi41079162 gb AAR99474.1 alkane monooxygenase Graphium sp. 
P-450 gi 159122944 gb EDP48064.1 cytochrome P450 alkane Aspergiiitisfit migatus hydroxylase gi6753.7376 ref XP 662462.1 AN4858.2 Aspergilius nidulans gi 39954738 ref XP364102.1 MGG O8947 Magnaporthe grisea gi39968,921 refXP 365851.1 MGG 10071 Magnaporthe grisea gi 70983886 refXP 747469.1 cytochrome P450 alkane Aspergiiitisfit migatus hydroxylase gi 171691438 ref XP OO1910644.1 unnamed protein product Podospora anserina gi 1191934.52 refXP OO1247332.1 CIMG 01103 Coccidioides inmitis US 9,359,581 B2 103 104 TABLE 4-continued

First Database Second Database Accession Number Accession Number Name Species gi emb CAC10088.1 related to n-alkane-inducible Neurospora crassa cytochrome P450 gi 69626152 P OO 806478. SNOG 16355 Phaeosphaeria nodorum gi 1919.1908 P OO 246560. CIMG 00331 Coccidioides inmitis gi 54296O77 P OO S48471. BC1G 12768 Botryotinia fuckeiana gi 64,42964.5 P 964653.2 Neurospora crassa gi 23117OO e l b CAC24473.1 Candida albicans gi S4305169 552.987. BC1G O8879 Botryotinia fuckeiana gl 39978.177 P 370476.1 MGG O6973 Magnaporthe grisea gl 70982576 re P 746816.1 cytochrome P450 alkane Aspergiiitisfit migatus hydroxylase gi 5431914.5 re P OO SS8890. BC1G O2524 Botryotinia fuckeiana gl 4612788S re P. 388496.1 FGO8320.1 Gibbereia zeae gl 3233066S gb AAP79879.1 cytochrome P450 Phanerochaete chrysosporium monooxygenase pc-3 gi 161936OS refXP OO 222615. CHGG O6520 Chaetonium globosum gi 45241598 refXP OO 393445. Ano9gO1270 Aspergilius niger gi 4921 O127 refXP OO S22438. MGCH7 ch7g545 Magnaporthe grisea gi 21 699244 refXP OO 267956. cytochrome P450 alkane Aspergilius clavatus hydroxylase gi S6032429 X 585052. SS1G 13912 Scierotinia Scierotiorum gi 591.22551 cytochrome P450 alkane Aspergiiitisfit migatus hydroxylase gi 45613078 P OO 412594. MGG 12496 Magnaporthe grisea gi 212531571 X P 002145942. N-alkane-inducible Penicillium marneffei cytochrome P450 gi 452S2862 refXP OO 397944. An16g06420 Aspergilius niger gi 698,55683 re X P OO 834508. CC1G O2244 Coprinopsis cinerea oikayama gi 21253O338 refX P 002145326. N-alkane-inducible Penicillium marneffei cytochrome P450 gi 61657996 gb AAX4940.0.1 cytochrome P450 Phanerochaete chrysosporium monooxygenase pc-2 gi 701.10164 re P OO1886288. CYP63 cytochrome P450 Laccaria bicolor monooxygenase-like protein gi 46.323950 refXP 748328.2 cytochrome P450 Aspergiiitisfit migatus oxidoreductase, alkane hydroxylase gi 56042346 refXP OO 587730. SS1 G 10970 Scierotinia Scierotiorum gi 89196282 refXP OO 934479. 
cytochrome P45071A23 Pyrenophora tritici-repentis gi 8369901 gb AAL67906.1 cytochrome P450 Phanerochaete chrysosporium monooxygenase pc-2 gli218714942 gb EED14365.1 cytochrome P450 Taiaromyces stipitatus gi 7O106497 refXP OO 884.460. cytochrome P450 Laccaria bicolor gi 6986SS34 refXP OO 839366. CC1G 08233 Coprinopsis cinerea oikayama gi 698,55669 refXP OO 8345O1. CC1G 02237 Coprinopsis cinerea oikayama gi 89.197495 refXP OO 935085. cytochrome P450 52A1 Pyrenophora tritici-repentis gl 21.8713646 gb EED13070.1 cytochrome P450 Taiaromyces stipitatus gi 70106217 refXP OO 884.32O. cytochrome P450 Laccaria bicolor gi 16197088 refXP OO 224356. CHGG O5142 Chaetonium globosum gi 8369899 gb AAL67905.1 cytochrome P450 Phanerochaete chrysosporium monooxygenase pc-1 gi 54312290 re P OO 555473. BC1G 061.78 Botryotinia fuckeiana gi S6064223 re P OO 598O33. SS1G OO119 Scierotinia Scierotiorum gi S60392.63 re P OO S86739. SS1G 11768 Scierotinia Scierotiorum gi 701OS2O6 re P OO 883816. Laccaria bicolor gi 696.13228 P OO 800031. SNOG 09744 Phaeosphaeria nodorum gi 6986.3123 P OO 838184. CC1G 12233 Coprinopsis cinerea oikayama gl 67902848 P 68 68.0.1 AN84.11.2 Aspergilius nidulans gi 58.3924.52 e l b CAO91865.1 monooxygenase Penicillium expansium gi 698,57173 P OO 83S239. CC1G O7782 Coprinopsis cinerea oikayama gi 6978.1220 P OO 825073. Aspergilius oryzae gl P 663925.1 AN6321.2 Aspergilius nidulans gi 45234SS3 re P OO 3.8992S. Ano3g00180 Aspergilius niger gi 70106275 P OO 884349. Laccaria bicolor gi 4561 OO12 P 366716.2 MGG O2792 Magnaporthe grisea gi 19473653 P OO 258702. cytochrome P450 Neosartorya fischeri monooxygenase gi 180263SS CAL.69594.1 Cordyceps bassiana gi S430994.5 . 554.305. BC1G 06893 Botryotinia fuckeiana gl 211593324 e l b CAP997O6.1 Pc22g24180 Penicillium chrysogenium gi 701 11410 P OO 886909. cytochrome P450 Laccaria bicolor monooxygenase CYP63 gi 698.64610 re P OO 838912. CC1G 05465 Coprinopsis cinerea oikayama gi 4524OOO7 re P OO 3926SO. 
Ano8g05330 Aspergilius niger gi 154333O2 re P OO 216788. Aspergilius terretts gi 21701751 re P OO 269140. Cytochrome P450 Aspergilius clavatus oxidoreductase US 9,359,581 B2 105 106 TABLE 4-continued

First Database Second Database Accession Number Accession Number Name Species gi 154289956 refXP OO15455.81.1 BC1G 15919 Botryotinia fuckeiana gi212527006 ref XP 002143660.1 cytochrome P450 alkane Penicillium marneffei hydroxylase gi 156054506 refXP OO15.93179.1 SS1 G 0610 Scierotinia Scierotiorum gi 167962.125 dbi BAGO9240.1 n-alkane inducible Candida maltosa gi 1696.10561 refXP OO1798.699.1 Phaeosphaeria nodorum gi 154322320 refXP OO1560475.1 Botryotinia fuckeiana gi 171986596 gb ACB59278.1 Pseudozymafiocculosa gi 16985.0022 refXP OO1831709.1 Coprinopsis cinerea oikayama gi 84514171 gb ABC59094.1 cytochrome P450 Medicago truncatula monooxygenase CYP704G9 gi 157349259 emb CAO244.05.1 Vitis vinifera gi 154322983 refXP OO1560806.1 BC1G 00834 Botryotinia fuckeiana gi 71726950 gb AAZ39646.1 cytochrome P450 Petuniax hybrida monooxygenase gi 2160323 dbi BAAO5146.1 n-alkane-inducible Candida maltosa cytochrome P-450 gi 218717320 gb EED16741.1 cytochrome P450 Taiaromyces stipitatus gi 118485860 gb ABK94777.1 Populus trichocarpa gi 71024781 refXP 762620.1 UMO6473.1 Ustilago maydis gi58265 104 refXP 569708.1 Cryptococci is neoformans var. neoformans gi 169596949 refXP OO1791898.1 SNOG 01251 Phaeosphaeria nodorum gi 1573.55912 emb CAO49769.1 Vitis vinifera gi 134109309 refXP 776769.1 CNBC2600 Cryptococci is neoformans var. neoformans gi 157349262 emb CAO24408.1 Vitis vinifera gi 147765747 emb CAN6O189.1 Vitis vinifera gi 1698.64676 refXP OO1838945.1 CC1G 05498 Coprinopsis cinerea oikayama gi 157352095 emb CAO43102.1 Vitis vinifera gi 147791153 emb CAN63571.1 Vitis vinifera gi 84514173 gb ABC59095.1 cytochrome P450 Medicago truncatula monooxygenase CYP704G7 gi 71024761 refXP 76261.0.1 UMO64631 Ustilago maydis gi 1573.55911 emb CAO49768.1 Vitis vinifera gi 115451645 ref NP OO1049423.1 OsO3g0223100 Oryza sativa gi 22748335 gbAANO5337.1 cytochrome P450 Oryza sativa gi 168059245 refXP OO1781.614.1 Physcomitrella patens Subsp. 
patens gi 15225499 refNP 182075.1 CYP704A2 (cytochrome Arabidopsis thaliana P450, family 704, Subfamily A, polypeptide 2) oxygen binding gi75319885 CO4C1, PINTA (Cytochrome P450 704C1) gi 167521978 refXP OO1745327.1 Monosiga brevicollis gi 21536522 gb AAM60854.1 cytochrome P450-like Arabidopsis thaliana protein gi 15242759 refNP 201150.1 CYP94B1 (cytochrome Arabidopsis thaliana P450, family 94, Subfamily B, polypeptide 1) oxygen binding gi 168031659 refXP OO1768338.1 Physcomitrella patens Subsp. patens gi 15733.9131 emb CAO42482.1 Vitis vinifera gi 30682301 refNP 196442.2 cytochrome P450 family Arabidopsis thaliana protein gi 8346562 emb CAB93726.1 cytochrome P450-like Arabidopsis thaliana protein gi2344895 gb AAC31835.1 cytochrome P450 Arabidopsis thaliana gi 30689861 refNP 850427.1 CYP704A1 (cytochrome Arabidopsis thaliana P450, family 704, Subfamily A, polypeptide 1) oxygen binding gi 15221776 refNP 173862.1 CYP86C1 (cytochrome Arabidopsis thaliana P450, family 86, Subfamily C, polypeptide 1) oxygen binding gi 147793,015 emb CAN77648.1 Vitis vinifera gi 157356646 emb CAO62841.1 Vitis vinifera gi 147844260 emb CAN80040.1 Vitis vinifera gi215466577 gb EEB965.17.1 MPER 04337 Moniliophthora perniciosa gi 15222515 refNP 176558.1 CYP86A7 (cytochrome Arabidopsis thaliana P450, family 86, subfamily A, polypeptide 7) oxygen binding gi 1946977 24 gb ACF82946.1 Zea mays gi 168021353 refXP OO1763,206.1 Physcomitrella patens Subsp. patens gi 115483.036 ref NP OO1065111.1 Os10g.0525000 Oryza sativa (japonica cultivar-group) US 9,359,581 B2 107 108 TABLE 4-continued

First Database Second Database Accession Number Accession Number Name Species gi 157338660 emb CAO42011.1 Vitis vinifera gi 147836212 emb CAN75428.1 Vitis vinifera gi5042165 emb CAB44684.1 cytochrome P450-like Arabidopsis thaliana protein gi79326551 ref NP OO1031814.1 CYP96A10 (cytochrome Arabidopsis thaliana P450, family 96, Subfamily A, polypeptide 10) heme binding iron ion binding? oxygenase gi 26452145 dbi BAC43161.1 cytochrome P450 Arabidopsis thaliana gi 110289450 gb AAP54707.2 Cytochrome P450 family Oryza sativa protein, expressed gi 21593258 gb AAM65207.1 cytochrome P450 Arabidopsis thaliana gi 11548.3034 ref NP OO1065110.1 Os10g.0524700 Oryza sativa gi 118486379 gb ABK95.030.1 Populus trichocarpa gi 10442763 gb AAG17470.1 AF123610 9 cytochrome Titiciinaestivitin P450 gi 125532704 gb EAY79269.1 OsI 34384 Oryza sativa gi 15237250 refNP 197710.1 CYP86B1 (cytochrome Arabidopsis thaliana P450, family 86, Subfamily B, polypeptide 1) oxygen binding gi 12554.9414 gb EAY95236.1 OsI 17053 Oryza sativa gi 110289453 gb AAP547 10.2 Cytochrome P450 family Oryza sativa protein gi 20146744 gb AAM12480.1 ACO74232 7 cytochrome Oryza sativa P450-like protein gi 218184911 gb EEC67338.1 OsI 34388 Oryza sativa Indica Group gi 125549325 gb EAY95147.1 OsI 16965 Oryza sativa Indica Group gi 1984.72816 refXP 002133118.1 GA29OOO Drosophia pseudoobscura pseudoobscura gi 195574346 refXP 002105150.1 GD21336 Drosophila simulans gi 168024173 ref XP OO1764611.1 Physcomitrella patens Subsp. 
patens gi 11544.0549 ref NP OO1044554.1 OsO1g0804400 Oryza sativa (japonica cultivar-group) gi 15223657 refNP 176086.1 CYP96A15/MAH1 (MID Arabidopsis thaliana CHAINALKANE HYDROXYLASE 1) oxygen binding gi 12554O131 gb EAY86526.1 OsI O7906 Oryza sativa gi 11546.0030 refNP 001053615.1 OsO4g.0573900 Oryza sativa (japonica cultivar-group) gi 157349258 emb CAO24404.1 Vitis vinifera gi 157346575 emb CAO16644.1 Vitis vinifera gi 147835182 enb CAN76753.1 Vitis vinifera gi 1956.13956 gb ACG28808.1 Zea mays gi 194753285 refXP OO1958947.1 GF12 635 Drosophila ananassae gi 156546811 ref XP OO1606040.1 Nasonia vitripennis gi 125583181 gb EAZ24112.1 Os) 007595 Oryza sativa (japonica cultivar-group) gi 15229477 refNP 189243.1 CYP86C2 (cytochrome Arabidopsis thaliana P450, family 86, Subfamily C, po ypeptide 2) oxygen binding gi940446 emb CAA62082.1 cytoc hrome p450 Arabidopsis thaliana gi 115447789 ref NP OO1047674.1 OsO2g0666500 Oryza sativa (japonica cultivar-group) gi 15227788 refNP 1798.99.1 CYP96A1 (cytochrome Arabidopsis thaliana P450, family 96, Subfamily A, po ypeptide 1) oxygen binding gi 195503768 refXP 002098.791.1 GE23 738 Drosophila yakuba gi 147804860 emb CAN66874.1 Vitis vinifera gi 84514169 gb ABC59093.1 cytoc hrome P450 Medicago truncatula OO oxygenase CYP94C9 gi 19698839 gb AAL91155.1 cytoc hrome P450 Arabidopsis thaliana gi 15237768 refNP 200694.1 CYP86A1 (cytochrome Arabidopsis thaliana P450, family 86, Subfamily A, po ypeptide 1) oxygen binding gi 157353969 emb CAO4651.0.1 Vitis vinifera gi 169865676 refXP OO1839436.1 CC1G 06649 Coprinopsis cinerea oikayama gi85001697 gb ABC68403.1 cytoc hrome P450 Glycine max OO oxygenase CYP86A24 gi 115466172 refNP 001056685.1 Os06g0129900 Oryza sativa gi 195637782 gb ACG38359.1 cytoc hrome P450 86A2 Zea mays gi 194704220 gb ACF86194.1 Zea mays gi 710064.08 ref XP 757870.1 UMO 723.1 Ustilago maydis 521 gi 195161677 refXP 002021689.1 GL26 642 Drosophila persimilis gi 115459886 refNP 001053543.1 OsO4g.056O100 Oryza sativa gi 
194704096 gb ACF86132.1 Zea mays gi 147773635 emb CAN67559.1 Vitis vinifera US 9,359,581 B2 109 110 TABLE 4-continued

First Database Second Database Accession Number Accession Number Name Species gi 125575195 gb EAZ16479.1 Os. O30688 Oryza sativa gi 115482616 refNP 001064901.1 Os10g.0486100 Oryza sativa gi 71726942 gb AAZ39.642.1 cytochrome P450 fatty acid Petuniax hybrida omega-hydroxylase gi 195626182 gb ACG34921.1 cytochrome P45086A1 Zea mays gi 194907382 refXP OO1981543.1 GG11553 Drosophila erecta gi 71006688 refXP 758O10.1 UMO1863.1 Ustilago maydis gi 15734.6247 emb CAO15944.1 Vitis vinifera gi 116830948 gb ABK28430.1 Arabidopsis thaliana gi 13641298 gb AAK31592.1 cytochrome P450 Brassica rapa Subsp. pekinensis gi 2258321 gb AAB63277.1 cytochrome P450 Phanerochaete chrysosporium gi 15218671 refNP 174713.1 CYP94D1 (cytochrome Arabidopsis thaliana P450, family 94, subfamily D, polypeptide 1) oxygen binding gi 195623910 gb ACG33785.1 cytochrome P45086A1 Zea mays gi 157337152 emb CAO21498.1 Vitis vinifera
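
By way of illustration only, the peptide motifs recited in this section (for example SEQ ID NO: 156 and SEQ ID NO: 162) can be checked against a predicted protein sequence with a simple pattern match. The following is a minimal Python sketch, not a method taken from the specification. It assumes that "x" (or "X") in a motif denotes any single residue; the function names and the example protein fragment are hypothetical and were invented for this sketch.

```python
import re

# Motifs quoted in this section; "x" (or "X") is assumed to mean "any residue".
MOTIFS = {
    "SEQ ID NO: 156": "VKYSGVCH",
    "SEQ ID NO: 162": "VKYSGVCHxxxxxWKGDW",
    "SEQ ID NO: 158": "QYATADAVQAA",
    "SEQ ID NO: 159": "CAGVTVYKALK",
    "SEQ ID NO: 160": "GQWVAISGA",
    "SEQ ID NO: 157": "VGGHEGAGVVV",
}


def motif_to_regex(motif):
    """Translate a motif into a regular expression, mapping x/X to a wildcard."""
    return re.compile("".join("." if c in "xX" else re.escape(c) for c in motif))


def find_motifs(protein, motifs=MOTIFS):
    """Return {label: 0-based start index} for every motif found in the protein."""
    protein = protein.upper()
    hits = {}
    for label, motif in motifs.items():
        match = motif_to_regex(motif).search(protein)
        if match:
            hits[label] = match.start()
    return hits


# Hypothetical predicted protein fragment, invented for this example.
example = "MSVKYSGVCHTDLHAWKGDWAAA"
print(find_motifs(example))
```

In this sketch a sequence is reported as comprising a given peptide when the corresponding pattern occurs anywhere in the translated sequence; an actual analysis would start from the translation of the gene in question.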

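Several embodiments in this section are defined by a minimum nucleotide identity over a stretch of at least 100 contiguous nucleotides of a coding sequence. As a rough illustration, the following Python sketch computes the best identity found in any 100-nucleotide window. It assumes, for simplicity, that the two sequences have already been aligned without gaps and have equal length; in practice an alignment tool would be used first. The function names and the example sequences are hypothetical.

```python
def best_window_identity(seq_a, seq_b, window=100):
    """Highest percent identity over any stretch of `window` contiguous positions.

    Assumes seq_a and seq_b are pre-aligned, gap-free and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if len(seq_a) < window:
        raise ValueError("sequences are shorter than the window")
    matches = [1 if a == b else 0 for a, b in zip(seq_a.upper(), seq_b.upper())]
    current = sum(matches[:window])  # matches in the first window
    best = current
    for i in range(window, len(matches)):
        # Slide the window one position to the right.
        current += matches[i] - matches[i - window]
        best = max(best, current)
    return 100.0 * best / window


def shares_identity(seq_a, seq_b, threshold_percent, window=100):
    """True if some window reaches the stated identity threshold."""
    return best_window_identity(seq_a, seq_b, window) >= threshold_percent


def substitute(base):
    """Return a base that differs from the input (used only to build the example)."""
    return "C" if base == "A" else "A"


# Hypothetical sequences: the variant differs from the reference at every 10th position.
reference = "ACGT" * 30
variant = "".join(substitute(b) if i % 10 == 0 else b for i, b in enumerate(reference))

print(best_window_identity(reference, variant))
print(shares_identity(reference, variant, 85), shares_identity(reference, variant, 95))
```

The window size and thresholds are parameters, so the same check can be repeated for each of the recited identity levels.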
In some embodiments of the invention it is advantageous to integrate one or more genes encoding a P450 enzyme into a yeast strain, a species of the genus Candida or a strain of Candida tropicalis in which genes or pathways that cause further oxidation of the substrate have been disrupted. In some embodiments, a strain of yeast in which one or more cytochrome P450s or one or more alcohol oxidase or one or more alcohol dehydrogenase have been disrupted will oxidize hydroxyl groups to aldehydes or acids more slowly than strains of yeast in which these genes have not been disrupted. In some embodiments of the invention it may be advantageous to integrate a cytochrome P450 into a strain of Candida tropicalis in which fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B have been disrupted. In some embodiments of the invention it may be advantageous to integrate a cytochrome P450 into a strain of Candida tropicalis in which at least one of the fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B have been disrupted. In some embodiments of the invention it may be advantageous to integrate a cytochrome P450 into a strain of Candida tropicalis in which alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 have been disrupted. In some embodiments of the invention it may be advantageous to integrate a cytochrome P450 into a strain of Candida tropicalis in which one or more of the alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10, ADH-A10B, ADH-B1B and ADH-B11 have been disrupted. In some embodiments of the invention it may be advantageous to integrate a gene encoding a cytochrome P450 into a yeast species of the genus Candida in which one or more alcohol dehydrogenase genes have been disrupted, and wherein the disrupted alcohol dehydrogenase gene shares at least 95% nucleotide identity, or at least 90% nucleotide identity, or at least 85% nucleotide identity for a stretch of at least 100 contiguous nucleotides, or at least 80% identical for a stretch of at least 100 contiguous nucleotides of the coding sequence, or at least 75% identical for a stretch of at least 100 contiguous nucleotides of the coding sequence, or at least 70% identical for a stretch of at least 100 contiguous nucleotides of the coding sequence, or at least 65% identical for a stretch of at least 100 contiguous nucleotides of the coding sequence, or at least 60% identical for a stretch of at least 100 contiguous nucleotides of the coding sequence within the coding region with one of the Candida tropicalis genes ADH-A4 (SEQ ID NO: 39), ADH-B4 (SEQ ID NO: 42), ADH-A10 (SEQ ID NO: 40), ADH-A10B (SEQ ID NO: 56), ADH-B11 (SEQ ID NO: 43).

In some embodiments it may be advantageous to integrate a gene encoding a cytochrome P450 into a yeast strain of the genus Candida in which (i) one or more alcohol dehydrogenase genes have been disrupted and (ii) the disrupted alcohol dehydrogenase comprises a first peptide. In some embodiments the first peptide has the sequence VKYSGVCH (SEQ ID NO: 156). In some embodiments the first peptide has the sequence VKYSGVCHxxxxxWKGDW (SEQ ID NO: 162). In some embodiments the first peptide has the sequence VKYSGVCHxxxxxWKGDWxxxxKLPxVGGHEGAGVVV (SEQ ID NO: 163). In some embodiments the disrupted alcohol dehydrogenase sequence, predicted from translation of the gene that encodes it, comprises a second peptide. In some embodiments the second peptide has the sequence QYATADAVQAA (SEQ ID NO: 158). In some embodiments the second peptide has the sequence SGYxHDGxFxQYATADAVQAA (SEQ ID NO: 164). In some embodiments the second peptide has the sequence GAEPNCxxADxSGYxHDGxFxQYATADAVQAA (SEQ ID NO: 165).

In some embodiments the disrupted alcohol dehydrogenase sequence, predicted from translation of the gene that encodes it, comprises a third peptide. In some embodiments the third peptide has the sequence CAGVTVYKALK (SEQ ID NO: 159). In some embodiments the third peptide has the sequence APIxCAGVTVYKALK (SEQ ID NO: 166).

In some embodiments the first genetic modification class comprises disruption of at least one alcohol dehydrogenase whose amino acid sequence, predicted from translation of the gene that encodes it, comprises a fourth peptide. In some embodiments the fourth peptide has the sequence GQWVAISGA (SEQ ID NO: 160). In some embodiments the fourth peptide has the sequence GQWVAISGAxGGLGSL (SEQ ID NO: 167). In some embodiments the fourth peptide has the sequence GQWVAISGAxGGLGSLxVQYA (SEQ ID NO: 168). In some embodiments the fourth peptide has the sequence GQWVAISGAxGGLGSLxVQYAxAMG (SEQ ID NO: 169). In some embodiments the fourth peptide has the sequence GQWVAISGAxGGLGSLxVQYAxAMGxRVxAIDGG (SEQ ID NO: 170).

In some embodiments the disrupted alcohol dehydrogenase sequence, predicted from translation of the gene that encodes it, comprises a fifth peptide. In some embodiments said fifth peptide has the sequence VGGHEGAGVVV (SEQ ID NO: 157).

In some embodiments of the invention it may be advantageous to integrate a cytochrome P450 into a strain of Candida tropicalis in which cytochrome P450 genes CYP52A17, CYP52A18, CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have been disrupted. In some embodiments it may be advantageous to integrate a cytochrome P450 into a strain of Candida tropicalis in which one or more of the cytochrome P450 genes CYP52A17, CYP52A18, CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have been disrupted. In some embodiments it may be advantageous to integrate a cytochrome P450 into a strain of Candida tropicalis in which fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B, alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 and cytochrome P450 genes CYP52A17, CYP52A18, CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have been disrupted, for example strain DP421, in which the β-oxidation pathway has also been disrupted.

In some embodiments, a cytochrome P450 is integrated into a strain of Candida tropicalis in which endogenous cytochrome P450s have been disrupted.

In some embodiments, a cytochrome P450 is integrated into a strain of Candida in which endogenous cytochrome P450s have been disrupted.

In some embodiments, a cytochrome P450 is integrated into a strain of yeast of a species of the genus Candida in which endogenous cytochrome P450s have been disrupted.

In some embodiments, one or more genes, two or more genes, or three or more genes listed in Table 4 are integrated into a yeast strain, a species of Candida, or a strain of Candida tropicalis in which genes or pathways that cause further oxidation of a fatty acid substrate (e.g., an α-carboxyl-ω-hydroxy fatty acid having a carbon chain length in the range from C6 to C22, an α,ω-dicarboxylic fatty acid having a carbon chain length in the range from C6 to C22, or mixtures thereof) have been disrupted. In some embodiments, this strain of yeast is one in which one or more disrupted cytochrome P450s, or one or more disrupted alcohol oxidases, or one or more disrupted alcohol dehydrogenases present in the strain of yeast will oxidize hydroxyl groups to aldehydes or acids more slowly than strains of yeast in which these genes have not been disrupted. In some embodiments, this strain of yeast is one in which one or more disrupted cytochrome P450s, one or more disrupted alcohol oxidases, and one or more disrupted alcohol dehydrogenases will oxidize hydroxyl groups to aldehydes or acids more slowly than strains of yeast in which these genes have not been disrupted.

In some embodiments, one or more genes, two or more genes, or three or more genes listed in Table 4 are integrated into a strain of Candida tropicalis in which fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B have been disrupted.

In some embodiments, one or more genes, two or more genes, or three or more genes listed in Table 4 are integrated into a strain of Candida tropicalis in which endogenous alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 have been disrupted.

In some embodiments, one or more genes, two or more genes, or three or more genes listed in Table 4 are integrated into a strain of Candida tropicalis in which endogenous cytochrome P450 genes CYP52A17, CYP52A18, CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have been disrupted.

In some embodiments, one or more genes, two or more genes, or three or more genes listed in Table 4 are integrated into a strain of Candida tropicalis in which fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B, alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 and cytochrome P450 genes CYP52A17, CYP52A18, CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have been disrupted, for example strain DP421, in which the β-oxidation pathway has also been disrupted.

In some embodiments, one or more genes, two or more genes, or three or more genes listed in Table 4 are integrated into a strain of Candida tropicalis in which endogenous cytochrome P450s have been disrupted.

In some embodiments, one or more genes, two or more genes, or three or more genes listed in Table 4 are integrated into a strain of Candida in which endogenous cytochrome P450s have been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a yeast strain, a species of Candida, or a strain of Candida tropicalis in which genes or pathways that cause further oxidation of a fatty acid substrate (e.g., an α-carboxyl-ω-hydroxy fatty acid having a carbon chain length in the range from C6 to C22, an α,ω-dicarboxylic fatty acid having a carbon chain length in the range from C6 to C22, or mixtures thereof) have been disrupted. In some embodiments, this strain of yeast is one in which one or more disrupted cytochrome P450s, or one or more disrupted alcohol oxidases, or one or more disrupted alcohol dehydrogenases present in the strain of yeast will oxidize hydroxyl groups to aldehydes or acids more slowly than strains of yeast in which these genes have not been disrupted. In some embodiments, this strain of yeast is one in which one or more disrupted cytochrome P450s, one or more disrupted alcohol oxidases, and one or more disrupted alcohol dehydrogenases will oxidize hydroxyl groups to aldehydes or acids more slowly than strains of yeast in which these genes have not been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B have been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which endogenous alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 have been disrupted.

encodes it, comprises a fifth peptide. In some embodiments the fifth peptide has the sequence VGGHEGAGVVV (SEQ ID NO: 157).

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent
sequence identity, at least 60 percent sequence identity, at In some embodiments, a gene having at least 40 percent least 65 percent sequence identity, at least 70 percent sequence identity, at least 45 percent sequence identity, at sequence identity, at least 75 percent sequence identity, at least 50 percent sequence identity, at least 55 percent least 80 percent sequence identity, at least 85 percent sequence identity, at least 60 percent sequence identity, at sequence identity, at least 90 percent sequence identity, or at least 65 percent sequence identity, at least 70 percent least 95 percent sequence identity to a gene listed in Table 4 is sequence identity, at least 75 percent sequence identity, at integrated into a strain of yeast species of the genus Candida 10 least 80 percent sequence identity, at least 85 percent in which one or more alcoholdehydrogenase genes have been sequence identity, at least 90 percent sequence identity, or at disrupted, and wherein at least one disrupted alcohol dehy least 95 percent sequence identity to a gene listed in Table 4 is drogenase gene shares at least 95% nucleotide identity, or at integrated into a strain of yeast species of the genus Candida least 90% nucleotide identity, or at least 85% nucleotide 15 in which one or more alcoholdehydrogenase genes have been identity for a stretch of at least 100 contiguous nucleotides disrupted and wherein at least one disrupted alcohol dehy within the coding region, or at least 80% identical for a stretch drogenase gene comprises a first peptide. In some embodi of at least 100 contiguous nucleotides of the coding sequence, ments the first peptide has the sequence VKYSGVCH (SEQ or at least 75% identical for a stretch of at least 100 contigu ID NO: 156). In some embodiments the first peptide has the ous nucleotides of the coding sequence, or at least 70% iden sequence VKYSGVCHxxxxxWKGDW (SEQID NO: 162). 
tical for a stretch of at least 100 contiguous nucleotides of the In some embodiments the first peptide has the sequence coding sequence, or at least 65% identical for a stretch of at VKYSGVCHXXXXXWKGDWXXXXKLPXVG least 100 contiguous nucleotides of the coding sequence, or at GHEGAGVVV (SEQ ID NO: 163). In some embodiments least 60% identical for a stretch of at least 100 contiguous the disrupted alcohol dehydrogenase sequence, predicted nucleotides of the coding sequence with one of the Candida 25 from translation of the gene that encodes it, comprises a tropicalis genes ADH-A4 (SEQID NO:39), ADH-B4 (SEQ second peptide. In some embodiments the second peptide has ID NO: 42), ADH-A10 (SEQID NO: 40), ADH-A10B (SEQ the sequence QYATADAVQAA (SEQID NO: 158). In some ID NO. 56), ADH-B11 (SEQID NO: 43). embodiments the second peptide has the sequence SGYX In some embodiments, a gene listed in Table 4 is integrated HDGxFXOYATADAVQAA (SEQ ID NO: 164). In some into a strain of yeast of the genus Candida in which (i) one or 30 embodiments the second peptide has the sequence GAEP more alcohol dehydrogenase genes has been disrupted and NCXXADxSGYxHDGxFxOYATADAVQAA (SEQ ID NO: 165). (ii) at least one disrupted alcohol dehydrogenase gene com In some embodiments, the disrupted alcohol dehydroge prises a first peptide. In some embodiments the first peptide nase sequence, predicted from translation of the gene that has the sequence VKYSGVCH (SEQID NO: 156). In some 35 encodes it, comprises a third peptide. In some embodiments embodiments the first peptide has the sequence VKYS the third peptide has the sequence CAGVTVYKALK (SEQ GVCHXXXXXWKGDW (SEQID NO: 162). In some embodi ID NO: 159). In some embodiments the third peptide has the ments the first peptide has the sequence VKYSGVCHXXXXX sequence APIXCAGVTVYKALK (SEQID NO: 166). WKGDWXXXXKLPxVGGHEGAGVVV (SEQID NO: 163). 
In some embodiments the first genetic modification class In some embodiments the disrupted alcohol dehydrogenase 40 comprises disruption of at least one alcohol dehydrogenase sequence, predicted from translation of the gene that encodes whose amino acid sequence, predicted from translation of the it, comprises a second peptide. In some embodiments the gene that encodes it, comprises a fourth peptide. In some second peptide has the sequence QYATADAVOAA (SEQID embodiments the fourth peptide has the sequence NO: 158). In some embodiments the second peptide has the GQWVAISGA(SEQID NO: 160). In some embodiments the sequence SGYxHDGxFXOYATADAVQAA (SEQ ID NO: 45 fourth peptide has the sequence GQWVAISGAXGGLGSL 164). In some embodiments the second peptide has the (SEQID NO: 167). In some embodiments the fourth peptide Sequence GAEPNCXXADxSGYxHDGxFxOYATA has the sequence GQWVAISGAXGGLGSLXVQYA (SEQID DAVQAA (SEQ ID NO: 165). NO: 168). In some embodiments, the fourth peptide has the In some embodiments the disrupted alcohol dehydroge sequence GQWVAISGAXGGLGSLXVQYAXAMG (SEQID nase sequence, predicted from translation of the gene that 50 NO: 169). In some embodiments the fourth peptide has the encodes it, comprises a third peptide. In some embodiments Sequence GQWVAISGAxGGLGSLXVQYAXAM the third peptide has the sequence CAGVTVYKALK (SEQ GxRVXAIDGG (SEQID NO: 170). ID NO: 159). In some embodiments the disrupted alcohol dehydroge In some embodiments the third peptide has the sequence nase sequence, predicted from translation of the gene that APIxCAGVTVYKALK (SEQID NO: 166). 55 encodes it, comprises a fifth peptide. In some embodiments In some embodiments the fourth peptide has the sequence said fifth peptide has the sequence VGGHEGAGVVV (SEQ GQWVAISGA(SEQID NO: 160). In some embodiments the ID NO: 157). fourth peptide has the sequence GQWVAISGAXGGLGSL In some embodiments, a gene having at least 40 percent (SEQID NO: 167). 
In some embodiments the fourth peptide sequence identity, at least 45 percent sequence identity, at has the sequence GQWVAISGAXGGLGSLXVQYA (SEQID 60 least 50 percent sequence identity, at least 55 percent NO: 168). In some embodiments, the fourth peptide has the sequence identity, at least 60 percent sequence identity, at sequence GQWVAISGAXGGLGSLXVQYAXAMG (SEQID least 65 percent sequence identity, at least 70 percent NO: 169). In some embodiments the fourth peptide has the sequence identity, at least 75 percent sequence identity, at Sequence GQWVAISGAxGGLGSLXVQYAXAM least 80 percent sequence identity, at least 85 percent GxRVxAIDGG (SEQID NO: 170). 65 sequence identity, at least 90 percent sequence identity, or at In some embodiments the disrupted alcohol dehydroge least 95 percent sequence identity to a gene listed in Table 4 is nase sequence, predicted from translation of the gene that integrated into a strain of Candida tropicalis in which endog US 9,359,581 B2 115 116 enous cytochrome P450 genes CYP52A17, CYP52A18, hydroxyl groups to aldehydes or acids more slowly than CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have strains of yeast in which these genes have not been disrupted. been disrupted. 
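By way of illustration only, the peptide motifs recited above can be checked against a predicted protein sequence computationally. The following Python sketch assumes that each lowercase x in a motif stands for any single amino acid. The function names and the test sequence are hypothetical and form no part of the disclosure.

```python
import re

# Motifs copied from the text; lowercase "x" is assumed to mean "any one residue".
MOTIFS = {
    "SEQ ID NO: 156": "VKYSGVCH",
    "SEQ ID NO: 162": "VKYSGVCHxxxxxWKGDW",
    "SEQ ID NO: 158": "QYATADAVQAA",
    "SEQ ID NO: 166": "APIxCAGVTVYKALK",
    "SEQ ID NO: 157": "VGGHEGAGVVV",
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif into a regex, turning each 'x' into a one-residue wildcard."""
    return re.compile("".join("[A-Z]" if c == "x" else re.escape(c) for c in motif))


def motifs_present(protein: str) -> list[str]:
    """Return the labels of every motif found anywhere in the protein sequence."""
    return [label for label, motif in MOTIFS.items()
            if motif_to_regex(motif).search(protein)]


# Hypothetical sequence assembled for illustration; it is not a real ADH.
toy = "MSA" + "VKYSGVCH" + "TDLHA" + "WKGDW" + "PLPT" + "QYATADAVQAA" + "GG"
print(motifs_present(toy))
```

A sequence predicted from translation of a candidate alcohol dehydrogenase gene could be passed to `motifs_present` in the same way; a non-empty result would indicate that at least one recited peptide is present.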
In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B, alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 and cytochrome P450 genes CYP52A17, CYP52A18, CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have been disrupted, for example strain DP421, in which the β-oxidation pathway has also been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which endogenous cytochrome P450s have been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida in which endogenous cytochrome P450s have been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a yeast strain, a species of Candida, or a strain of Candida tropicalis in which genes or pathways that cause further oxidation of a fatty acid substrate (e.g., an α-carboxyl-ω-hydroxy fatty acid having a carbon chain length in the range from C6 to C22, an α,ω-dicarboxylic fatty acid having a carbon chain length in the range from C6 to C22, or mixtures thereof) have been disrupted. In some embodiments, this strain of yeast is one in which one or more disrupted cytochrome P450s, or one or more disrupted alcohol oxidases, or one or more disrupted alcohol dehydrogenases present in the strain of yeast will oxidize hydroxyl groups to aldehydes or acids more slowly than strains of yeast in which these genes have not been disrupted. In some embodiments, this strain of yeast is one in which one or more disrupted cytochrome P450s, one or more disrupted alcohol oxidases, and one or more disrupted alcohol dehydrogenases will oxidize

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B have been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which endogenous alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 have been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which endogenous cytochrome P450 genes CYP52A17, CYP52A18, CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which fatty alcohol oxidase genes FAO1, FAO1B, FAO2 and FAO2B, alcohol dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 and cytochrome P450 genes CYP52A17, CYP52A18, CYP52A13, CYP52A14, CYP52A12 and CYP52A12B have been disrupted, for example strain DP421, in which the β-oxidation pathway has also been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida tropicalis in which endogenous cytochrome P450s have been disrupted.

In some embodiments, a gene having at least 40 percent sequence identity, at least 45 percent sequence identity, at least 50 percent sequence identity, at least 55 percent sequence identity, at least 60 percent sequence identity, at least 65 percent sequence identity, at least 70 percent sequence identity, at least 75 percent sequence identity, at least 80 percent sequence identity, at least 85 percent sequence identity, at least 90 percent sequence identity, or at least 95 percent sequence identity to a gene listed in Table 4 is integrated into a strain of Candida in which endogenous cytochrome P450s have been disrupted.

To achieve novel phenotypes of Candida, it may be advantageous to modify the activity of a polypeptide by altering its sequence and to test the effect of the polypeptide with altered sequence within the yeast. A preferred method for testing the effect of sequence changes in a polypeptide within yeast is to introduce a plurality of genes of known sequence, each encoding a unique modified polypeptide, into the same genomic location in a plurality of strains.

The isocitrate lyase promoter from Candida tropicalis has been shown to be an inducible promoter in both Saccharomyces cerevisiae and E. coli (Atomi et al., 1995, Arch Microbiol: 163, 322-8; Umemura et al., 1995, Appl Microbiol Biotechnol: 43, 489-92). When expressed in S. cerevisiae, the isocitrate lyase gene was found to be inducible by acetate, glycerol, lactate, ethanol, or oleate. Ethanol is interesting from the perspective that it is a relatively cheap inducer and oleate for the fact that it is a potential substrate for the system for converting fatty acids to omega-hydroxy fatty acids. Inducible expression of the Candida tropicalis ICL gene was found to be high in S. cerevisiae (as much as 30% of soluble protein), indicating that it may serve as a strong inducible promoter in C. tropicalis.

To insert genes under control of the isocitrate lyase promoter a genomic insertion construct of the form shown in FIG. 21 was synthesized. The sequence used for the sequence of promoter 1 was that of the Candida tropicalis isocitrate lyase promoter, given as SEQ ID NO: 62. This promoter has a BsiWI site that can be used to linearize the construct for subsequent insertion into the Candida tropicalis genome. The sequence used for transcription terminator 1 was that of the Candida tropicalis isocitrate lyase terminator, given as SEQ ID NO: 63. The sequence used for Promoter 2 was the TEF1 promoter, given as SEQ ID NO: 64. The sequence used for the bacterial promoter was the EM7 promoter, given as SEQ ID NO: 65. The sequence used for the selectable marker was the Zeocin resistance gene; a version optimized for expression in Candida tropicalis is given as SEQ ID NO: 66. The sequence used for Transcription terminator 2 was the CYC1 transcription terminator, given as SEQ ID NO: 67. The sequence used as the bacterial origin of replication was the pUC origin, given as SEQ ID NO: 68. A genomic integration vector with these components is represented graphically as FIG. 23.

7.6.1. Insertion of CYP52A17 Under Control of the Isocitrate Lyase Promoter

A construct for expressing Candida tropicalis cytochrome P450 CYP52A17 under the control of the isocitrate lyase promoter was made by cloning the sequence of a gene encoding Candida tropicalis cytochrome P450 CYP52A17 (given as SEQ ID NO: 69) into a vector of the form shown in FIG. 23. The sequence of the complete vector is given as SEQ ID NO: 70.

The vector was prepared as described in Section 7.1.1, except that the construct was linearized with BsiWI instead of BsmBI. Candida tropicalis strains were transformed with the construct as described in Section 7.1.2, except that 100 µg/ml of Zeocin was used instead of 200 µg/ml nourseothricin as the selective antibiotic. Genomic DNA was prepared and tested for the presence of the integrated DNA as described in Section 7.1.3.

Candida tropicalis strain DP201 was prepared by integration of the construct shown as SEQ ID NO: 70 into the genome of strain DP186 (Table 3) at the site of the genomic sequence of the gene for isocitrate lyase. DP428 was prepared by integration of the construct shown as SEQ ID NO: 70 into the genome of strain DP421 (Table 3) at the site of the genomic sequence of the gene for isocitrate lyase. Sequences of oligonucleotide primers for analysis of strains were:

ICL-IN-F1: GGATCCGTCTGAAGAAATCAAGAACC (SEQ ID NO: 124)
1758R2: TGGTGTAGGCCAATAATTGCTTAATGATATACAAAACTGGCACCACAA (SEQ ID NO: 125)
1758F2: GAGCAATTGTTGGAATATTGGTACGTTGTGGTGCCAGTTTTGTATATCA (SEQ ID NO: 126)
1758R34: CGAACTTAACAATAGCACCGTCTTGCAAACACATGGTCAAGTTAGTTAA (SEQ ID NO: 127)

For strains DP201 and DP428 (integrants of SEQ ID NO: 70), PCR with primers ICL-IN-F1 and 1758R2 produces a 1609 base pair amplicon indicating that the construct has been integrated in the ICL promoter region; PCR with primers 1758F2 and 1758R34 produces a 1543 base pair amplicon indicating that CYP52A17 has been integrated. Neither primer pair produces an amplicon from the parental strains DP186 or DP421.

7.6.2. Insertion of CYP52A13 Under Control of the Isocitrate Lyase Promoter

A construct for expressing Candida tropicalis cytochrome P450 CYP52A13 under the control of the isocitrate lyase promoter was made by cloning the sequence of a gene encoding Candida tropicalis cytochrome P450 CYP52A13 (given as SEQ ID NO: 71) into a vector of the form shown in FIG. 23. The sequence of the complete vector is given as SEQ ID NO: 72.

The vector was prepared as described in Section 7.1.1, except that the construct was linearized with BsiWI instead of BsmBI. Candida tropicalis strains were transformed with the construct as described in Section 7.1.2, except that 100 µg/ml of Zeocin was used instead of 200 µg/ml nourseothricin as the selective antibiotic. Genomic DNA was prepared and tested for the presence of the integrated DNA as described in Section 7.1.3.

Candida tropicalis strain DP522 was prepared by integration of the construct shown as SEQ ID NO: 72 into the genome of strain DP421 (Table 3) at the site of the genomic sequence of the gene for isocitrate lyase. Sequences of oligonucleotide primers for analysis of strains were:

ICL-IN-F1: GGATCCGTCTGAAGAAATCAAGAACC (SEQ ID NO: 124)
4082R2: CGATTAAGGCCAATGGAACAATGACGTACCACTTAGTAAAGTAGGTA (SEQ ID NO: 128)
4082F2: CATGACTGTTCACGACATTATTGCTACCTACTTTACTAAGTGGTACGTC (SEQ ID NO: 129)
4082R34: ACATTTCAATATTAGCACCGTCAAATAATGACATGGTCAAATGGGACA (SEQ ID NO: 130)

For strain DP522 (integration of SEQ ID NO: 72), PCR with primers ICL-IN-F1 and 4082R2 produces a 1600 base pair amplicon indicating that the construct has been integrated in the ICL promoter region; PCR with primers 4082F2 and 4082R34 produces a 1565 base pair amplicon indicating that CYP52A13 has been integrated. Neither primer pair produces an amplicon from the parental strain DP421.

7.6.3. Insertion of CYP52A12 Under Control of the Isocitrate Lyase Promoter

A construct for expressing Candida tropicalis cytochrome P450 CYP52A12 under the control of the isocitrate lyase promoter was made by cloning the sequence of a gene encoding Candida tropicalis cytochrome P450 CYP52A12 (given as SEQ ID NO: 73) into a vector of the form shown in FIG. 23. The sequence of the complete vector is given as SEQ ID NO: 74.

The vector was prepared as described in Section 7.1.1, except that the construct was linearized with BsiWI instead of BsmBI. Candida tropicalis strains were transformed with the construct as described in Section 7.1.2, except that 100 µg/ml of Zeocin was used instead of 200 µg/ml nourseothricin as the selective antibiotic. Genomic DNA was prepared and tested for the presence of the integrated DNA as described in Section 7.1.3.

Candida tropicalis strain DP526 was prepared by integration of the construct shown as SEQ ID NO: 74 into the genome of strain DP421 (Table 3) at the site of the genomic sequence of the gene for isocitrate lyase. Sequences of oligonucleotide primers for analysis of strains were:

ICL-IN-F1: GGATCCGTCTGAAGAAATCAAGAACC (SEQ ID NO: 124)
ATCAATAATTTCCTGGGTTGCCAT (SEQ ID NO: 131)
CYP52A12-F1: ATGGCAACCCAGGAAATTATTGAT (SEQ ID NO: 132)
CYP52A12-R1: CTACATCTTGACAAAAACACCATCATT (SEQ ID NO: 133)

For strain DP526 (integration of SEQ ID NO: 74), PCR with primers ICL-IN-F1 and 4082R2 produces a 1554 base pair amplicon indicating that the construct has been integrated in the ICL promoter region; PCR with primers 4082F2 and 4082R34 produces a 1572 base pair amplicon indicating that CYP52A12 has been integrated. Neither primer pair produces an amplicon from the parental strain DP421.

7.7. Deletion of POX Genes from Candida tropicalis

Picataggio et al., 1991, Mol Cell Biol: 11, 4333-9, describe a system for the sequential disruption of the Candida tropicalis chromosomal POX4 and POX5 genes, encoding distinct isozymes of the acyl coenzyme A (acyl-CoA) oxidase, which catalyze the first reaction in the β-oxidation pathway of fatty acids. An alternative method is to use the SAT-1 flipper.

7.7.1. Deletion of POX4 Alleles

The sequence of a gene encoding an acyl-coenzyme A oxidase II (PXP-4) of Candida tropicalis, POX4, is given as SEQ ID NO: 136. This sequence was used to design two "pre-targeting" constructs. The first pre-targeting construct is comprised of two targeting sequences from the 5' and 3' end of the structural gene. The targeting sequences are separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 bp stuffer fragment and an XhoI restriction site. The targeting sequences are flanked by BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the first POX4 pre-targeting construct is given as SEQ ID NO: 137. Not shown in SEQ ID NO: 137 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The first pre-targeting sequence can be synthesized using standard DNA synthesis techniques well known in the art.

The second pre-targeting construct is comprised of two targeting sequences from the 5' and 3' end of the structural gene that lie internal to the 5' and 3' targeting sequences of the first pre-targeting construct. The targeting sequences are separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 bp stuffer fragment and an XhoI restriction site. The targeting sequences are flanked by BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the second POX4 pre-targeting construct is given as SEQ ID NO: 138. Not shown in SEQ ID NO: 138 but also present in the pre-targeting construct are a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The second pre-targeting sequence can be synthesized using standard DNA synthesis techniques well known in the art.

Targeting sequences for deletion of the two POX4 alleles from the Candida tropicalis genome can be prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating into the POX4 pre-targeting constructs (SEQ ID NO: 137 or SEQ ID NO: 138) from which the 20 bp stuffer has been removed by digestion with restriction enzymes NotI and XhoI. The sequence of the resulting first targeting construct for the deletion of the first allele of POX4 is given as SEQ ID NO: 139. The sequence of the resulting second targeting construct for the deletion of the second allele of POX4 is given as SEQ ID NO: 140. Because the POX4 targeting sequences of the second targeting construct lie internal to the targeting sequences of the first targeting construct, use of the first targeting construct to delete the first POX4 allele assures that use of the second targeting construct is specific for the second POX4 allele since the targeting sequences of the second targeting construct no longer exist in the first deleted allele.

Analysis of integrants and excisants can be performed as described in Section 7.1. Sequences of oligonucleotide primers for the analysis of strains are:

POX4-IN-L: ATGACTTTTACAAAGAAAAACGTTAGTGTATCACAAG (SEQ ID NO: 141)
POX4-IN-R: TTACTTGGACAAGATAGCAGCGGTTTC (SEQ ID NO: 142)
SAT1-R: TGGTACTGGTTCTCGGGAGCACAGG (SEQ ID NO: 79)
SAT1-F: CGCTAGACAAATTCTTCCAAAAATTTTAGA (SEQ ID NO: 80)

7.7.2. Deletion of POX5 Alleles

The sequence of a gene encoding an acyl-coenzyme A oxidase I (PXP-5) of Candida tropicalis, POX5, is given as SEQ ID NO: 143. This sequence was used to design two "pre-targeting" constructs. The first pre-targeting construct is comprised of two targeting sequences from the 5' and 3' end of the structural gene. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 bp stuffer fragment and an XhoI restriction site. The targeting sequences are flanked by BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the first POX5 pre-targeting construct is given as SEQ ID NO: 144. Not shown in SEQ ID NO: 144 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The first pre-targeting sequence can be synthesized using standard DNA synthesis techniques well known in the art.

The second pre-targeting construct is comprised of two targeting sequences from the 5' and 3' end of the structural gene that lie internal to the 5' and 3' targeting sequences of the first pre-targeting construct. The 5' targeting sequence of the second pre-targeting construct is modified at position 248 (C248T) and 294 (G294A) to remove unwanted XhoI and BsmBI sites, respectively. The targeting sequences were separated by a sequence, given as SEQ ID NO: 12, comprising a NotI restriction site, a 20 bp stuffer fragment and an XhoI restriction site. The targeting sequences are flanked by BsmBI restriction sites, so that the final targeting construct can be linearized prior to transformation into Candida tropicalis. The sequence of the second POX5 pre-targeting construct is given as SEQ ID NO: 145. Not shown in SEQ ID NO: 145 but also present in the pre-targeting construct were a selective marker conferring resistance to kanamycin and a bacterial origin of replication, so that the pre-targeting construct can be grown and propagated in E. coli. The second pre-targeting sequence can be synthesized using standard DNA synthesis techniques well known in the art.

Targeting sequences for deletion of the two POX5 alleles from the Candida tropicalis genome were prepared by digesting the SAT-1 flipper (SEQ ID NO: 1) with restriction enzymes NotI and XhoI, and ligating into both of the POX5 pre-targeting constructs (SEQ ID NO: 144 or 145) from which the 20 bp stuffer had been removed by digestion with restriction enzymes NotI and XhoI. The sequence of the resulting first targeting construct for the deletion of the first allele of POX5 is given as SEQ ID NO: 146. The sequence of the resulting second targeting construct for the deletion of the second allele of POX5 is given as SEQ ID NO: 147. Because the POX5 targeting sequences of the second targeting construct lie internal to the targeting sequences of the first targeting construct, use of the first targeting construct to delete the first POX5 allele assures that use of the second targeting construct is specific for the second POX5 allele since the targeting sequences of the second targeting construct no longer exist in the first deleted allele.

Analysis of integrants and excisants can be performed as described in Section 7.1.

POX5-IN-L: ATGCCTACCGAACTTCAAAAAGAAAGAGAA (SEQ ID NO: 148)
POX5-IN-R: TTAACTGGACAAGATTTCAGCAGCTTCTTC (SEQ ID NO: 149)
SAT1-R: TGGTACTGGTTCTCGGGAGCACAGG (SEQ ID NO: 79)
SAT1-F: CGCTAGACAAATTCTTCCAAAAATTTTAGA (SEQ ID NO: 80)

7.8. Insertion of Genes into the Genome of Candida

To achieve novel phenotypes in yeasts of the genus Candida (e.g., Candida tropicalis), including biotransformations of compounds by Candida tropicalis, including chemical conversions not previously obtained, or increased rates of conversion of one or more substrates to one or more products, or increased specificity of conversion of one or more substrates to one or more products, or increased tolerance of a compound by the yeast, or increased uptake of a compound by the yeast, it may be advantageous to incorporate a gene encoding a polypeptide into the genome of the yeast. Expression of the polypeptide in the yeast then allows the phenotype of the yeast to be modified.

To achieve novel phenotypes of Candida, it may be advantageous to modify the activity of a polypeptide by altering its sequence and to test the effect of the polypeptide with altered sequence within the yeast. A preferred method for testing the effect of sequence changes in a polypeptide within yeast is to introduce a plurality of genes of known sequence, each encoding a unique modified polypeptide, into the same genomic location in a plurality of strains.

The isocitrate lyase promoter from Candida tropicalis has been shown to be an inducible promoter in both Saccharomyces cerevisiae and E. coli as described in Atomi H. et al., 1995 Arch Microbiol. 163:322-8; Umemura K. et al., 1995 Appl Microbiol Biotechnol. 43:489-92; Kanai T. et al., 1996 Appl Microbiol Biotechnol. 44:759-65. The paper by Atomi H. et al., 1995 Arch Microbiol. 163:322-8, identified the sequence between bases -394 and -379 of the promoter as a promoter that regulates the isocitrate lyase promoter in the yeast Saccharomyces cerevisiae. The DNA sequence of an isocitrate lyase promoter from Candida tropicalis from base -394 to -1 is given as SEQ ID NO: 161. Inducible expression of the Candida tropicalis ICL gene was found to be high in S. cerevisiae (as much as 30% of soluble protein), indicating that it may serve as a strong inducible promoter in C. tropicalis. The sequence of an isocitrate lyase promoter that has been used to drive expression of a protein in the yeast Saccharomyces cerevisiae is given as SEQ ID NO: 171. To insert genes under control of the isocitrate lyase promoter a genomic insertion construct of the form shown in FIG. 21 was synthesized. A genomic integration vector with these components is represented graphically as FIG. 23.

In some embodiments, a construct for integration of a gene to be expressed into the genome of a yeast of the genus Candida comprises an isocitrate lyase promoter, in some embodiments a construct for integration of a gene to be expressed into the genome of a yeast of the genus Candida comprises the sequence shown as SEQ ID NO: 62, in some embodiments a construct for integration of a gene to be expressed into the genome of a yeast of the genus Candida comprises the sequence shown as SEQ ID NO: 161, in some
Sequences of oligonucleotide prim embodiments a construct for integration of a gene to be ers for the analysis of Strains are: expressed into the genome of a yeast of the genus Candida US 9,359,581 B2 123 124 comprises a sequence that is 70%, 75%, 80%, 85%, 90%, or control of an isocitrate lyase promoter, an alcohol dehydro 95% identical to the sequence shown as SEQID 161. In some genase promoter, a fatty alcohol oxidase promoter or a cyto embodiments a construct for integration of a gene to be chrome P450 promoter into a yeast strain of the genus Can expressed into the genome of a yeast of the genus Candida dida in which one or more alcoholdehydrogenase genes have comprises a sequence of Sufficient length and identity to the 5 been disrupted, and wherein the disrupted alcohol dehydro isocitrate lyase promoter to ensure integration at that locus; in genase comprises a first peptide. In some embodiments the Some embodiments said construct comprises at least 100 first peptide has the sequence VKYSGVCH (SEQ ID NO: contiguous base pairs or at least 200 contiguous base pairs or 156). In some embodiments, the first peptide has the sequence at least 300 contiguous base pairs or at least 400 contiguous VKYSGVCHxxxxxWKGDW (SEQ ID NO: 162). In some base pairs or at least 500 contiguous base pairs of the 10 embodiments the first peptide has the sequence VKYS sequence shown as SEQID NO: 62 or to the sequence shown GVCHxxxxxWKGDWXXXXKLPxVGGHEGAGVVV (SEQ as SEQ ID NO: 171; in some embodiments the construct ID NO: 163). comprises at least 100 contiguous base pairs or at least 200 In some embodiments the disrupted alcohol dehydroge contiguous base pairs or at least 300 contiguous base pairs or nase sequence, predicted from translation of the gene that at least 400 contiguous base pairs or at least 500 contiguous 15 encodes it, comprises a second peptide. 
In some embodi base pairs that are at least 65%, at least 70%, at least 75%, at ments the second peptide has the sequence QYATADAVOAA least 80%, at least 85%, at least 90%, at least 95%, or at least (SEQID NO: 158). In some embodiments the second peptide 98% identical to the sequence shown as SEQID NO: 62 or to has the sequence SGYxHDGxFXOYATADAVQAA (SEQID the sequence shown as SEQID NO: 171. NO: 164). In some embodiments the second peptide has the Genes may also be inserted into the genome of yeasts of the Sequence GAEPNCXXADxSGYxHDGxFxOYATA genus Candida under control of other promoters by con DAVQAA (SEQ ID NO: 165). structing analogous constructs to the one shown Schemati In some embodiments the disrupted alcohol dehydroge cally in FIG. 21. Of particular utility may be the promoters for nase sequence, predicted from translation of the gene that alcohol dehydrogenase genes, which are known to be highly encodes it, comprises a third peptide. In some embodiments expressed in other yeasts such as Saccharomyces cerevisiae. 25 the third peptide has the sequence CAGVTVYKALK (SEQ A construct for integrating into an alcohol dehydrogenase ID NO: 159). In some embodiments the third peptide has the gene locus could also have an advantage in embodiments in sequence APIXCAGVTVYKALK (SEQID NO: 166). which it is desirable to disrupt the alcohol dehydrogenase In some embodiments the first genetic modification class gene itself. In these cases it would be unnecessary to know the comprises disruption of at least one alcohol dehydrogenase full sequence of the promoter: replacing all or a part of the 30 whose amino acid sequence, predicted from translation of the coding sequence of the gene to be disrupted with the coding gene that encodes it, comprises a fourth peptide. In some sequence of the gene to be inserted would be sufficient. embodiments the fourth peptide has the sequence In some embodiments a construct for integration of a gene GQWVAISGA(SEQID NO: 160). 
In some embodiments the into the Candida genome with the aim of expressing a protein fourth peptide has the sequence GQWVAISGAXGGLGSL from that gene comprises a promoter from an alcohol dehy 35 (SEQID NO: 167). In some embodiments the fourth peptide drogenase gene or a promoter from a cytochrome P450 gene, has the sequence GQWVAISGAXGGLGSLXVQYA (SEQID or a promoter for a fatty alcohol oxidase gene. NO: 168). In some embodiments, the fourth peptide has the In some embodiments of the invention a gene encoding a sequence GQWVAISGAXGGLGSLXVQYAXAMG (SEQID polypeptide is integrated under control of an isocitrate lyase NO: 169). In some embodiments the fourth peptide has the promoter, an alcoholdehydrogenase promoter, a fatty alcohol 40 Sequence GQWVAISGAxGGLGSLXVQYAXAM oxidase promoter or a cytochrome P450 promoter into a strain GxRVXAIDGG (SEQID NO: 170). of Candida tropicalis in which one or more of the alcohol In some embodiments the first genetic modification class dehydrogenase genes ADH-A4, ADH-A4B, ADH-B4, ADH comprises disruption of at least one alcohol dehydrogenase B4B, ADH-A10, ADH-A1 OB, ADH-B1B and ADH-B11 whose amino acid sequence, predicted from translation of the have been disrupted. In some embodiments of the invention a 45 gene that encodes it, comprises a fifth peptide. In some gene encoding a polypeptide is integrated under control of an embodiments the fifth peptide has the sequence VGGHE isocitrate lyase promoter, an alcohol dehydrogenase pro GAGVVV (SEQID NO: 157). 
moter, a fatty alcohol oxidase promoter or a cytochrome P450 Insertion of the Gene Encoding mocherry Under Control of promoter into a yeast strain of the genus Candida in which the Isocitrate Lyase Promoter one or more alcohol dehydrogenase genes have been dis 50 A construct for expressing mocherry (Shaner NC, Camp rupted, and wherein the disrupted alcohol dehydrogenase bell RE, Steinbach PA, Giepmans B N, Palmer A E, Tsien R gene shares at least 95% nucleotide identity, or at least 90% Y. (2004) Nat Biotechnol. 22:1567-72) under the control of nucleotide identity, or at least 85% nucleotide identity for a the C. tropicalis isocitrate lyase promoter (given as SEQID stretch of at least 100 contiguous nucleotides within the cod NO: 62) was made by cloning the sequence of a gene encod ing region, or at least 80% identical for a stretch of at least 100 55 ing mocherry (given as SEQID NO: 75) into a vector of the contiguous nucleotides of the coding sequence or at least 75% form shown in FIG. 23 with the mGherry open reading frame identical for a stretch of at least 100 contiguous nucleotides of in the position indicated by the element labeled “Gene for the coding sequence, or at least 70% identical for a stretch of expression'. The sequence of the complete vector is given as at least 100 contiguous nucleotides of the coding sequence, or SEQID NO: 76. at least 65% identical for a stretch of at least 100 contiguous 60 The vector was prepared as described in Section 7.1.1, nucleotides of the coding sequence, or at least 60% identical except that the construct was linearized with BsiWI instead of for a stretch of at least 100 contiguous nucleotides of the BsmBI. 
Candida tropicalis strain DP186 (Table 3) was trans coding sequence with one of the Candida tropicalis genes formed with the constructorano DNA control as described in ADH-A4 (SEQ ID NO:39), ADH-B4 (SEQ ID NO: 42), Section 7.1.2, except that 200, 400 or 600 ug/ml of Zeocin ADH-A10 (SEQID NO: 40), ADH-A10B (SEQID NO:56), 65 were used instead of 200ug/ml nourseothricinas the selective ADH-B11 (SEQ ID NO: 43). In some embodiments of the antibiotic. Following ~48 hours at 30° C. and an additional 24 invention a gene encoding a polypeptide is integrated under hours at room temperature, 10 large red colonies were US 9,359,581 B2 125 126 observed amongst a virtually confluent background of Small mobile phase used for separation contained 10% HO, 5% white colonies on YPD agar plates with 200 ug/ml Zeocin. acetonitrile, 5% Formic acid solution (1% in water) and 80% Likewise, following 48 hours at 30° C. and an additional 48 methanol. hours at room temperature, large red colonies were observed 8.1.3. NMR for Characterization of Omega-Hydroxyfatty on the 400 and 600 ug/ml ZeocinYPD agar plates amongst a Acids and Diacids background of smaller white colonies. No red colonies were Proton (H) and 'C-NMR spectra were recorded on a observed on plates transformed with the no DNA control. A Bruker DPX300NMR spectrometer at 300 MHz. The chemi total of 8 large, red colonies were isolated and selected for cal shifts (ppm) for "H-NMR were referenced relative to further characterization (see FIG. 27). Genomic DNA was tetramethylsilane (TMS, 0.00 ppm) as the internal reference. prepared from the isolates and tested for the presence of the 10 integrated mOherry DNA at the isocitriate lyase promoter as 8.2. Oxidation of Fatty Acids by Candida tropicalis described in Section 7.1.3. 
All 8 tested positive for mCherry Strains Lacking Four CYP52A P450S integration at the isocitrate lyase promoter demonstrating that We compared the Candida tropicalis Strain lacking expression of genes other than isocitrate lyase can be driven in 15 CYP52A13, CYP52A14, CYP52A17 and CYP52A18 C. tropicalis using this promoter (DP174) constructed in Section 7.2 with the starting strain One of the eight isolates, Candida tropicalis strain DP197 (DP1) for their abilities to oxidize fatty acids. To engineer (Table 3), was prepared by integration of the construct shown P450s for optimal oxidation of fatty acids or other substrates as SEQID NO: 75 into the genome of strain DP186(Table 3) it is advantageous to eliminate the endogenous P450s whose at the site of the genomic sequence of the gene for isocitrate activities may mask the activities of the enzymes being engi lyase. neered. We tested Candida tropicalis strains DP1 and DP174 (genotypes given in Table 3) to determine whether the dele tion of the four CYP52 P450S had affected the ability of the (SEQ ID NO: 124) ICL-IN-F1: GGATCCGTCTGAAGAAATCAAGAACC yeast to oxidize fatty acids. 25 Cultures of the yeast strains were grown at 30° C. and 250 (SEQ ID NO: 150) rpm for 16 hours in a 500 ml flask containing 30 ml of media 1759R33: ACCTTAAAACGCATAAATTCCTTGATGATTGCCATGTTGT F (media F is peptone 3 g/l. yeast extract 6 g/l. yeast nitrogen CTTCTTCA base 6.7 g/l, sodium acetate 3 g/l. KHPO, 7.2 g/l. KHPO, For strain DP197 (integrant of SEQID NO: 75), PCR with 9.3 g/l) plus 30 g/l glucose. After 16 hours 0.5 ml of culture primers ICL-IN-F1 and 1759R33 produces a 1592 base pair 30 was added to 4.5 ml fresh media Fplus 60 g/l glucose in a 125 amplicon indicating that the construct has been integrated in ml flask, and grown at 30° C. and 250 rpm for 12 hours. the ICL promoter region. The primer pair does not produces Substrates were added and shaking was continued at 30° C. and 250 rpm. 
We then tested the conversion of C14 fatty acid an amplicon from the parental strain DP 186. substrates as shown in FIG. 13. FIG. 13 parts A and B show 35 that the starting strain DP1 converts methyl myristate to 8. Conversion of Fatty Acids Using Modified Strains ()-hydroxy myristate and to the C14 diacid produced by oxi of Candida Tropicalis dation of the co-hydroxy myristate over a 48 hour time course, while the quadruple P450 deletion strain DP174 can effect 8.1. Analytical Methods almost no detectable conversion. FIG. 13 parts C and D show 40 that the starting strain DP1 converts methyl myristate and 8.1.1. GC-MS for Identification of Fatty Acids, Omega-Hy Sodium myristate to co-hydroxy myristate and to the C14 droxy Fatty Acids and Diacids diacid produced by oxidation of the co-hydroxy myristate Gas chromatography/mass spectrometry (GC/MS) analy after 48 hours, while the quadruple P450 deletion strain sis was performed at 70 eV with ThermoEinnigan TraceGC DP174 effects almost no detectable conversion of these sub Ultra gas chromatograph coupled with Trace DSQ mass spec 45 Strates. trometer. Products were esterified with BF in methanol These results confirm that at least one of the four Candida (10%, w/w) at 70° C. for 20 min, and further silylation of the tropicalis cytochrome P450 genes encoding CYP52A13, methyl esters with HMDS/TMCS/Pyridine at 70° C. for 10 CYP52A14, CYP52A17 and CYP52A18 is required for min when needed. The experiments were carried out with hydroxylation of fatty acids, consistent with the schematic injector, ion source and interface temperature of 200° C. 50 representation of Candida tropicalis fatty acid metabolism 250° C. and 280°C., respectively. Samples in hexane (1 ul) pathways shown in FIG. 12. Further it shows that strain were injected in PTV split mode and run on a capillary col DP174 is an appropriate strain to use for testing of engineered umn (Varian CP8944 VF-SMS, 0.25 mmx0.25 umx30 m). 
cytochrome P450s, since it has essentially no ability to oxi The oven temperature was programmed at 120° C. for one dize fatty acids without an added P450. minute increasing to 260° C. at the rate of 20°C/minute, and 55 then to 280° C. at the rate of 4.0° C.Aminute. 8.3. Oxidation of C2-Hydroxy Fatty Acids by 8.1.2. LC-MS for Measurement of Fatty Acids, Omega-Hy Candida tropicalis Strains Lacking Four CYP52A droxy Fatty Acids and Diacids P450S The concentration of omega-hydroxy fatty acids and diac ids during biotransformation was measured by liquid chro 60 We compared the Candida tropicalis Strain lacking matography/mass spectrometry (LC/MS) with purified prod CYP52A13, CYP52A14, CYP52A17 and CYP52A18 ucts as standards. The solvent delivery system was a Waters (DP174) constructed in Section 7.2 with the starting strain Alliance 2795 Separation Module (Milford, Mass., USA) (DP1) for their abilities to oxidize ()-hydroxy fatty acids. To coupled with a Waters 2996 photodiode array detector and engineer a strain for the production of ()-hydroxy fatty acids Waters ZQ detector with an electron spray ionization mode. 65 it is desirable to eliminate enzymes from the cell that can The separation was carried on a reversed-phase column with oxidize co-hydroxy fatty acids. It is possible to determine a dimension of 150x4.6 mm and particle size of 5 um. The whether other enzymes involved in oxidation of co-hydroxy US 9,359,581 B2 127 128 fatty acids are present in the Strain by feeding it ()-hydroxy base 6.7 g/l, sodium acetate 3 g/l. KHPO, 7.2 g/l. KHPO fatty acids in the media. If there are enzymes present that can 9.3 g/l) plus 20 g/l glycerol. After 16 hours 0.5 ml of culture oxidize ()-hydroxy fatty acids, then the strain will convert was added to 4.5 ml fresh media Fplus 20 g/l glycerolina 125 ()-hydroxy fatty acids fed in the media to C.()-dicarboxylic ml flask, and grown at 30° C. and 250 rpm for 12 hours. We acids. 
5 then tested the conversion of C12 and C16 ()-hydroxy fatty Cultures of the yeast strains were grown at 30° C. and 250 acid substrates by adding these Substrates to independent rpm for 16 hours in a 500 ml flask containing 30 ml of media flasks at final concentrations of 5 g/l and the pH was adjusted F (media F is peptone 3 g/l. yeast extract 6 g/l. yeast nitrogen to between 7.5 and 8 and shaking was continued at 30° C. and base 6.7 g/l, sodium acetate 3 g/l. KHPO, 7.2 g/l. KHPO 250 rpm. Samples were taken after 24 hours, cell culture was 9.3 g/l) plus 20 g/l glycerol. After 16 hours 0.5 ml of culture 10 acidified to pH ~1.0 by addition of 6 NHCl, products were was added to 4.5 ml fresh media Fplus 20 g/l glycerol in a 125 extracted from the cell culture by diethyl ether and the con ml flask, and grown at 30° C. and 280 rpm for 12 hours. We centrations of co-hydroxy fatty acids and C.()-diacids in the then tested the conversion of C12 and C16 ()-hydroxy fatty media were measured by LC-MS (liquid chromatography acid Substrates by adding these Substrates to independent 15 mass spectroscopy). As shown in FIG. 15 most of the hydroxy flasks at final concentrations of 5 g/l and the pH was adjusted fatty acids are converted to diacid after 24 hours. These results to between 7.5 and 8 and shaking was continued at 30° C. and show that at least one enzyme capable of oxidizing ()-hy 250 rpm. Samples were taken at the times indicated, cell droxy fatty acids is present in Candida tropicalis in addition culture was acidified to pH -1.0 bp addition of 6 NHCl, to the cytochrome P450 genes encoding CYP52A13, products were extracted from the cell culture by diethyl ether 20 CYP52A14, CYP52A17, CYP52A18, FAO1, FAO1B, and the concentrations of co-hydroxy fatty acids and C.()- FAO2A and FAO2B. diacids in the media were measured by LC-MS (liquid chro matography mass spectroscopy). The results are shown in 8.5. Oxidation of C2-Hydroxy Fatty Acids by Table 5. 
Candida tropicalis Strains Lacking Six CYP52A 25 P450S and Four Fatty Alcohol Oxidases TABLE 5 We compared the Candida tropicalis Strain lacking Oxidation of O-hydroxy fatty acids by Candida tropicalis CYP52A13, CYP52A14, CYP52A17, CYP52A18 and FAO1 -HYDROXY DIACID DIACID (DP186) constructed in Section 7.2 with the Candida tropi FATTY ACID PRODUCED PRODUCED 30 SUBSTRATE REACTION BY DP1 BY DP174 calis strain lacking CYP52A13, CYP52A14, CYP52A17, CHAINLENGTH TIME (G/L) (G/L) CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, CYP52A12 and CYP52A12B (DP283 and DP284) for their abilities to C12 60 hours S.6 5.2 C16 60 hours 1.4 O.8 oxidize ()-hydroxy fatty acids. To engineer a strain for the C12 24 hours 5.4 5 35 production of co-hydroxy fatty acids it is desirable to elimi C12 48 hours 6 6.7 nate enzymes from the cell that can oxidize ()-hydroxy fatty C12 72 hours 6.2 6.5 C16 24 hours 2.3 O.9 acids. It is possible to determine whether other enzymes C16 48 hours 2.4 1.7 involved in oxidation of ()-hydroxy fatty acids are present in C16 72 hours 2.8 1.8 the strain by feeding it co-hydroxy fatty acids in the media. If 40 there are enzymes present that can oxidize ()-hydroxy fatty These results show that at least one enzyme capable of acids, then the strain will convert ()-hydroxy fatty acids fed in oxidizing ()-hydroxy fatty acids is present in Candida tropi the media to C.O)-dicarboxylic acids. calis in addition to the cytochrome P450 genes encoding Cultures of the yeast strains were grown at 30° C. and 250 CYP52A13, CYP52A14, CYP52A17 and CYP52A18. rpm for 16 hours in a 500 ml flask containing 30 ml of media 45 F (media F is peptone 3 g/l. yeast extract 6 g/l. yeast nitrogen 8.4. Oxidation of C2-Hydroxy Fatty Acids by base 6.7 g/l, sodium acetate 3 g/l. KHPO, 7.2 g/l. KHPO Candida tropicalis Strains Lacking Four CYP52A 9.3 g/l) plus 20 g/l glycerol. 
After 16 hours 0.5 ml of culture P450S and Four Fatty Alcohol Oxidases was added to 4.5 ml fresh media Fplus 20 g/l glycerolina 125 ml flask, and grown at 30° C. and 250 rpm for 12 hours. We We compared the Candida tropicalis Strain lacking 50 then tested the conversion of C12 and C16 ()-hydroxy fatty CYP52A13, CYP52A14, CYP52A17, CYP52A18 and FAO1 acid substrates by adding these Substrates to independent (DP186) constructed in Section 7.3 with the Candida tropi flasks at final concentrations of 5 g/l and the pH was adjusted calis strain lacking CYP52A13, CYP52A14, CYP52A17, to between 7.5 and 8 and shaking was continued at 30° C. and CYP52A18, FAO1, FAO1B, FAO2A and FAO2B (DP258 250 rpm. Samples were taken after 24 hours, cell culture was and DP259) for their abilities to oxidize ()-hydroxy fatty 55 acidified to pH ~1.0 by addition of 6 NHCl, products were acids. To engineer a strain for the production of ()-hydroxy extracted from the cell culture by diethyl ether and the con fatty acids it is desirable to eliminate enzymes from the cell centrations of co-hydroxy fatty acids and C.()-diacids in the that can oxidize ()-hydroxy fatty acids. It is possible to deter media were measured by LC-MS (liquid chromatography mine whether other enzymes involved in oxidation of co-hy mass spectroscopy). As shown in FIG. 16 most of the C12 droxy fatty acids are present in the strain by feeding it ()-hy- 60 hydroxy fatty acids and a substantial fraction of the C16 droxy fatty acids in the media. If there are enzymes present hydroxy fatty acids are converted to diacid after 24 hours. that can oxidize co-hydroxy fatty acids, then the strain will These results show that at least one enzyme capable of oxi convert ()-hydroxy fatty acids fed in the media to C,c)-dicar dizing ()-hydroxy fatty acids is present in Candida tropicalis boxylic acids. in addition to the cytochrome P450 genes encoding Cultures of the yeast strains were grown at 30° C. 
and 250 65 CYP52A13, CYP52A14, CYP52A17, CYP52A18, rpm for 16 hours in a 500 ml flask containing 30 ml of media CYP52A12, CYP52A12B, FAO1, FAO1B, FAO2A and F (media F is peptone 3 g/l. yeast extract 6 g/l. yeast nitrogen FAO2B. US 9,359,581 B2 129 130 8.6. Oxidation of C2-Hydroxy Fatty Acids by A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10, ADH Candida tropicalis Strains Lacking Six CYP52A A1 OB and ADH-B11 (DP423), the Candida tropicalis strain P450S, Four Fatty Alcohol Oxidases and Five lacking CYP52A13, CYP52A14, CYP52A17, CYP52A18, Alcohol Dehydrogenases FAO1, FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10, We compared the Candida tropicalis strain DP1 with the ADH-A10B, ADH-B11 and ADH-B11B (DP434 and Candida tropicalis strain lacking CYP52A13, CYP52A14, DP436) for their abilities to oxidize co-hydroxy fatty acids. To CYP52A17, CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, engineer a strain for the production of ()-hydroxy fatty acids CYP52A12 and CYP52A12B (DP283) and the Candida it is desirable to eliminate enzymes from the cell that can tropicalis strain lacking CYP52A13, CYP52A14, 10 oxidize co-hydroxy fatty acids. It is possible to determine CYP52A17, CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, whether other enzymes involved in oxidation of co-hydroxy CYP52A12, CYP52A12B, ADH-A4, ADH-A4B, ADH-B4, fatty acids are present in the Strain by feeding it ()-hydroxy ADH-B4B and ADH-A10 (DP415) for their abilities to oxi fatty acids in the media. If there are enzymes present that can dize ()-hydroxy fatty acids. To engineer a strain for the pro oxidize ()-hydroxy fatty acids, then the strain will convert duction of co-hydroxy fatty acids it is desirable to eliminate 15 ()-hydroxy fatty acids fed in the media to C.()-dicarboxylic enzymes from the cell that can oxidize ()-hydroxy fatty acids. acids. It is possible to determine whether other enzymes involved in Cultures of the yeast strains were grown at 30° C. 
and 250 oxidation of co-hydroxy fatty acids are present in the strain by rpm for 18 hours in a 500 ml flask containing 30 ml of media feeding it w-hydroxy fatty acids in the media. If there are F (media F is peptone 3 g/l. yeast extract 6 g/l. yeast nitrogen enzymes present that can oxidize ()-hydroxy fatty acids, then base 6.7 g/l, sodium acetate 3 g/l. KHPO, 7.2 g/l. KHPO the strain will convert ()-hydroxy fatty acids fed in the media 9.3 g/l) plus 20 g/l glycerol. After 18 hours the preculture was to am-dicarboxylic acids. diluted in fresh media to A-1.0. This culture was shaken Cultures of the yeast strains were grown at 30° C. and 250 until the Asoo reached between 5.0 and 6.0. Biocatalytic con rpm for 18 hours in a 500 ml flask containing 30 ml of media version was initiated by adding 5 ml culture to a 125 ml flask F (media F is peptone 3 g/l. yeast extract 6 g/l. yeast nitrogen 25 together with 50 mg of co-hydroxy lauric acid, and pH base 6.7 g/l, sodium acetate 3 g/l. KHPO, 7.2 g/l. KHPO, adjusted to ~7.5 with 2M NaOH. Samples were taken at the 9.3 g/l) plus 20 g/l glycerol. After 18 hours the preculture was times indicated, cell culture was acidified to pH -1.0 by diluted in fresh media to A-1.0. This culture was shaken addition of 6 NHCl, products were extracted from the cell until the Asoo reached between 5.0 and 6.0. Biocatalytic con culture by diethyl ether and the concentrations of C.O)-diacids version was initiated by adding 5 ml culture to a 125 ml flask 30 in the media were measured by LC-MS (liquid chromatogra together with 50 mg of co-hydroxy lauric acid, and pH phy mass spectroscopy). As shown in FIG. 20, a significant adjusted to ~7.5 with 2M NaOH. 
Samples were taken at the reduction in the ability of Candida tropicalis to oxidize ()-hy times indicated, cell culture was acidified to pH -1.0 by droxy fatty acids can be obtained by deleting genes encoding addition of 6 NHCl, products were extracted from the cell alcohol dehydrogenases in Strains lacking some cytochrome culture by diethyl ether and the concentrations of C.O)-diacids 35 P450s and fatty alcohol oxidases. in the media were measured by LC-MS (liquid chromatogra phy mass spectroscopy). As shown in FIG. 19 Part A, the cell 8.8. Oxidation of Methyl Myristate by Candida growth was almost identical for the 3 strains. Strain DP415 tropicalis Strains Lacking Six CYP52A P450S, Four produced much less C.()-dicarboxy laurate than the other two Fatty Alcohol Oxidases and Six Alcohol strains, however, as shown in FIG. 19 part B. 40 Dehydrogenases with a Single CYP52A P450 Added These results show that a significant reduction in the ability Back Under Control of the ICL Promoter of Candida tropicalis to oxidize ()-hydroxy fatty acids can be reduced by deleting genes encoding CYP52A13, We compared the Candida tropicalis strain DP1 with the CYP52A14, CYP52A17, CYP52A18, FAO1, FAO1B, Candida tropicalis strain lacking CYP52A13, CYP52A14, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4, 45 CYP52A17, CYP52A18 and FAO1 and with CYP52A17 ADH-A4B, ADH-B4, ADH-B4B and ADH-A10. added back under control of the isocitrate lyase promoter (DP201) and with the Candida tropicalis strain lacking 8.7. Oxidation of C2-Hydroxy Fatty Acids by CYP52A13, CYP52A14, CYP52A17, CYP52A18, FAO1, Candida tropicalis Strains Lacking Six CYP52A FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH P450S, Four Fatty Alcohol Oxidases and Eight 50 A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH Alcohol Dehydrogenases B11 and with CYP52A17 added back under control of the isocitrate lyase promoter (DP428) for their abilities to oxidize We compared the Candida tropicalis strain DP1 with the methyl myristate. 
Candida tropicalis strain lacking CYP52A13, CYP52A14, CYP52A17, CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4 and ADH-A4B (DP390), the Candida tropicalis strain lacking CYP52A13, CYP52A14, CYP52A17, CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4, ADH-A4B, ADH-B4, ADH-B4B and ADH-A10 (DP415), the Candida tropicalis strain lacking CYP52A13, CYP52A14, CYP52A17, CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 (DP417 and DP421), the Candida tropicalis strain lacking CYP52A13, CYP52A14, CYP52A17, CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-

Cultures of the yeast strains were grown at 30° C. and 250 rpm for 18 hours in a 500 ml flask containing 30 ml of media F (media F is peptone 3 g/l, yeast extract 6 g/l, yeast nitrogen base 6.7 g/l, sodium acetate 3 g/l, K2HPO4 7.2 g/l, KH2PO4 9.3 g/l) plus 20 g/l glucose plus 5 g/l ethanol. After 18 hours, 3 ml of preculture was added to 27 ml fresh media F plus 20 g/l glucose plus 5 g/l ethanol in a 500 ml flask, and grown at 30° C. and 250 rpm for 20 hours before addition of substrate. Biocatalytic conversion was initiated by adding 40 g/l of methyl myristate, and the pH was adjusted to ~7.8 with 2 M NaOH. The culture was pH controlled by adding 2 mol/l NaOH every 12 hours, glycerol was fed as cosubstrate by adding 500 g/l glycerol and ethanol was fed as an inducer by adding 50% ethanol every 12 hours. Samples were taken at the times indicated, cell culture was acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of ω-hydroxy myristate and α,ω-dicarboxymyristate were measured by LC-MS (liquid chromatography mass spectroscopy).

As shown in FIG. 24, strains DP1 and DP201 both produce significant levels of tetradecanedioic acid (the α,ω-diacid) and negligible levels of ω-hydroxy myristic acid. In contrast, under these conditions strain DP428 produces approximately five-fold less tetradecanedioic acid, while converting nearly 70% of the methyl myristate to ω-hydroxy myristic acid after 60 hours. This shows that elimination of one or more of the genes FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 prevents the over-oxidation of the fatty acid myristic acid by Candida tropicalis, and that the presence of CYP52A17 under control of the isocitrate lyase promoter in this strain background produces a strain that can convert methyl myristate to ω-hydroxy myristic acid, but that does not over-oxidize the product to tetradecanedioic acid.

8.9. Oxidation of Methyl Myristate by an Engineered Candida tropicalis Strain in a Fermentor

We compared the production of ω-hydroxy myristic acid and α,ω-tetradecanoic acid by a Candida tropicalis strain lacking CYP52A13, CYP52A14, CYP52A17, CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 and with CYP52A17 added back under control of the isocitrate lyase promoter (DP428).

C. tropicalis DP428 was taken from a glycerol stock or fresh agar plate and inoculated into a 500 ml shake flask containing 30 ml of YPD medium (20 g/l glucose, 20 g/l peptone and 10 g/l yeast extract) and shaken at 30° C., 250 rpm for 20 h. Cells were collected by centrifugation and re-suspended in FM3 medium for inoculation. (FM3 medium is 30 g/l glucose, 7 g/l ammonium sulfate, 5.1 g/l potassium phosphate, monobasic, 0.5 g/l magnesium sulfate, 0.1 g/l calcium chloride, 0.06 g/l citric acid, 0.023 g/l ferric chloride, 0.0002 g/l biotin and 1 ml/l of a trace elements solution. The trace elements solution contains 0.9 g/l boric acid, 0.07 g/l cupric sulfate, 0.18 g/l potassium iodide, 0.36 g/l ferric chloride, 0.72 g/l manganese sulfate, 0.36 g/l sodium molybdate, 0.72 g/l zinc sulfate.) Conversion was performed by inoculating 15 ml of preculture into 135 ml FM3 medium, methyl myristate was added to 20 g/l and the temperature was kept at 30° C. The pH was maintained at 6.0 by automatic addition of 6 M NaOH or 2 M H2SO4 solution. Dissolved oxygen was kept at 70% by agitation and O2-cascade control mode. After 6 hours growth, ethanol was fed into the cell culture to 5 g/l. During the conversion phase, 80% glycerol was fed as co-substrate by dissolved oxygen-stat control mode (the high limit of dissolved oxygen was 75% and low limit of dissolved oxygen was 70%, which means glycerol feeding was initiated when dissolved oxygen was higher than 75% and stopped when dissolved oxygen was lower than 70%). Every 12 hours, ethanol was added into cell culture to 2 g/l, and methyl myristate was added to 40 g/l until the total methyl myristate added was 140 g/l (i.e. the initial 20 g/l plus 3 subsequent 40 g/l additions). Formation of products was measured at the indicated intervals by taking samples and acidifying to pH ~1.0 by addition of 6 N HCl; products were extracted from the cell culture by diethyl ether and the concentrations of ω-hydroxy myristate and α,ω-dicarboxymyristate were measured by LC-MS (liquid chromatography mass spectroscopy), as shown in FIG. 26. Under these conditions the strain produced a final concentration of 91.5 g/l ω-hydroxy myristic acid, with a productivity of 1.63 g/l/hr and a w/w ratio of ω-hydroxy myristic acid:tetradecanedioic acid of 20.3:1. This shows that elimination of one or more of the genes FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11 prevents the over-oxidation of the fatty acid myristic acid by Candida tropicalis, and that the presence of CYP52A17 under control of the isocitrate lyase promoter in this strain background produces a strain that can convert methyl myristate to ω-hydroxy myristic acid, but that does not over-oxidize the product to tetradecanedioic acid.

8.10. Oxidation of Methyl Myristate, Oleic Acid and Linoleic Acid by Engineered Candida tropicalis Strains

We compared the fatty acid oxidizing activities of two Candida tropicalis strains which lack CYP52A13, CYP52A14, CYP52A17, CYP52A18, FAO1, FAO1B, FAO2A, FAO2B, CYP52A12, CYP52A12B, ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10 and ADH-B11, one of which has CYP52A17 added back under control of the isocitrate lyase promoter (DP428) and one of which has CYP52A13 added back under control of the isocitrate lyase promoter (DP522).

Cultures of the yeast strains were grown at 30° C. in a DASGIP parallel fermentor containing 200 ml of media F (media F is peptone 3 g/l, yeast extract 6 g/l, yeast nitrogen base 6.7 g/l, sodium acetate 3 g/l, K2HPO4 7.2 g/l, KH2PO4 9.3 g/l) plus 30 g/l glucose. The pH was maintained at 6.0 by automatic addition of 6 M NaOH or 2 M H2SO4 solution. Dissolved oxygen was kept at 70% by agitation and O2-cascade control mode. After 6 hours growth, ethanol was fed into the cell culture to 5 g/l. After 12 h growth, biocatalytic conversion was initiated by adding methyl myristate to 60 g/l or oleic acid to 60 g/l or linoleic acid to 30 g/l. During the conversion phase, 80% glycerol was fed as co-substrate for conversion of methyl myristate and 500 g/l glucose was fed as co-substrate for conversion of oleic acid and linoleic acid by dissolved oxygen-stat control mode (the high limit of dissolved oxygen was 75% and low limit of dissolved oxygen was 70%, which means glycerol feeding was initiated when dissolved oxygen was higher than 75% and stopped when dissolved oxygen was lower than 70%). Every 12 hours, ethanol was added into cell culture to 2 g/l. Samples were taken at various times, cell culture was acidified to pH ~1.0 by addition of 6 N HCl, products were extracted from the cell culture by diethyl ether and the concentrations of ω-hydroxy fatty acids and α,ω-diacids in the media were measured by LC-MS (liquid chromatography mass spectroscopy). As shown in FIG. 25, strains DP428 and DP522 were both able to produce ω-hydroxy fatty acids from these substrates, as well as some α,ω-diacids. FIG. 25 also shows that the different P450s had different preferences for the fatty acid substrates, and different propensities to oxidize the ω-hydroxy group.

9. Deposit of Microorganisms

A living culture of strain DP421 has been deposited with American Type Culture Collection, 12301 Parklawn Drive, Rockville, Md. 20852, on May 4, 2009, under the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the purposes of patent procedure.

10. Equivalents

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

All publications, patents, patent applications, and databases mentioned in this specification are herein incorporated by reference into the specification to the same extent as if each individual publication, patent, patent application or database was specifically and individually indicated to be incorporated herein by reference.

11. Exemplary Embodiments

The following are nonlimiting exemplary embodiments in accordance with the disclosed application:

Embodiment 1. A substantially pure Candida host cell for the biotransformation of a substrate to a product, wherein the Candida host cell is characterized by a first genetic modification class that comprises one or more genetic modifications that collectively or individually disrupt an alcohol dehydrogenase gene in the substantially pure Candida host cell.

Embodiment 2. The substantially pure Candida host cell of embodiment 1, wherein the substantially pure Candida host cell is genetically modified Candida glabrata, Candida zeylenoides, Candida lipolytica, Candida guillermondii, Candida aaseri, Candida abiesophila, Candida africana, Candida aglyptinia, Candida agrestis, Candida akabanensis, Candida alai, Candida albicans, Candida alimentaria, Candida amapae, Candida ambrosiae, Candida amphixiae, Candida anatomiae, Candida ancudensis, Candida anglica, Candida anneliseae, Candida antarctica, Candida antillancae, Candida anutae, Candida apicola, Candida apis, Candida arabinofermentans, Candida arcana, Candida ascalaphidarum, Candida asparagi, Candida atakaporum, Candida atbi, Candida athensensis, Candida atlantica, Candida atmosphaerica, Candida auringiensis, Candida auris, Candida aurita, Candida austromarina, Candida azyma, Candida azymoides, Candida barrocoloradensis, Candida batistae, Candida beechii, Candida bentonensis, Candida bertae, Candida berthetii, Candida bituminiphila, Candida blankii, Candida blattae, Candida blattariae, Candida bohiensis, Candida boidinii, Candida bokatorum, Candida boleticola, Candida bolitotheri, Candida bombi, Candida bombiphila, Candida bondarzewiae, Candida bracarensis, Candida bribrorum, Candida bromeliacearum, Candida buenavistaensis, Candida buinensis, Candida butyri, Candida californica, Candida canberraensis, Candida cariosilignicola, Candida carpophila, Candida carvicola, Candida caseinolytica, Candida castrensis, Candida catenulata, Candida cellae, Candida cellulolytica, Candida cerambycidarum, Candida chauliodes, Candida chickasaworum, Candida chilensis, Candida choctaworum, Candida chodatii, Candida chrysomelidarum, Candida cidri, Candida cloacae, Candida coipomoensis, Candida conglobata, Candida corydali, Candida cylindracea, Candida davenportii, Candida davisiana, Candida deformans, Candida dendrica, Candida dendronema, Candida derodonti, Candida diddensiae, Candida digboiensis, Candida diospyri, Candida diversa, Candida dosseyi, Candida drimydis, Candida drosophilae, Candida dubliniensis, Candida easanensis, Candida edaphicus, Candida edax, Candida elateridarum, Candida emberorum, Candida endomychidarum, Candida entomophila, Candida ergastensis, Candida ernobii, Candida etchellsii, Candida ethanolica, Candida famata, Candida fennica, Candida fermenticarens, Candida flocculosa, Candida floricola, Candida floris, Candida flosculorum, Candida fluviatilis, Candida fragi, Candida freyschussii, Candida friedrichii, Candida frijolesensis, Candida fructus, Candida fukazawae, Candida fungicola, Candida galacta, Candida galis, Candida galli, Candida gatunensis, Candida gelsemii, Candida geochares, Candida germanica, Candida ghanaensis, Candida gigantensis, Candida glaebosa, Candida glucosophila, Candida glycerinogenes, Candida gorgasii, Candida gotoi, Candida gropengiesseri, Candida guaymorum, Candida haemulonii, Candida halonitratophila, Candida halophila, Candida hasegawae, Candida hawaiiana, Candida heliconiae, Candida hispaniensis, Candida homilentoma, Candida humicola, Candida humilis, Candida hungarica, Candida hyderabadensis, Candida incommunis, Candida inconspicua, Candida insectalens, Candida insectamans, Candida insectorum, Candida intermedia, Candida ipomoeae, Candida ishiwadae, Candida jaroonii, Candida jeffriesii, Candida kanchanaburiensis, Candida karawaiiewii, Candida kashinagacola, Candida kazuoi, Candida khmerensis, Candida kipukae, Candida kofuensis, Candida krabiensis, Candida kruisii, Candida kunorum, Candida labiduridarum, Candida lactis-condensi, Candida lassenensis, Candida laureliae, Candida leandrae, Candida lessepsii, Candida lignicola, Candida litsaeae, Candida litseae, Candida lanquihuensis, Candida lycoperdinae, Candida lyxosophila, Candida magnifica, Candida magnoliae, Candida maltosa, Candida mannitofaciens, Candida maris, Candida maritima, Candida maxii, Candida melibiosica, Candida membranifaciens, Candida mesenterica, Candida metapsilosis, Candida methanolophaga, Candida methanolovescens, Candida methanosorbosa, Candida methylica, Candida michaelii, Candida mogii, Candida montana, Candida multigeminis, Candida mycetangii, Candida naeodendra, Candida nakhonratchasimensis, Candida nanaspora, Candida natalensis, Candida neerlandica, Candida memodendra, Candida nitrativorans, Candida nitratophila, Candida nivariensis, Candida nodaensis, Candida norvegica, Candida novakii, Candida odintsovae, Candida oleophila, Candida ontarioensis, Candida ooitensis, Candida orba, Candida oregonensis, Candida orthopsilosis, Candida ortonii, Candida ovalis, Candida pallodes, Candida palmioleophila, Candida paludigena, Candida panamensis, Candida panamericana, Candida parapsilosis, Candida pararugosa, Candida pattaniensis, Candida peltata, Candida peoriaensis, Candida petrohuensis, Candida phangingensis, Candida picachoensis, Candida piceae, Candida picinguabensis, Candida pignaliae, Candida pimensis, Candida pini, Candida plutei, Candida pomicola, Candida ponderosae, Candida populi, Candida powellii, Candida prunicola, Candida pseudogliaebosa, Candida pseudohaemulonii, Candida pseudointermedia, Candida pseudolambica, Candida pseudorhagii, Candida pseudovanderkliftii, Candida psychrophila, Candida pyralidae, Candida qinlingensis, Candida quercitrusa, Candida quercuum, Candida railenensis, Candida ralunensis, Candida rancensis, Candida restingae, Candida rhagii, Candida riodocensis, Candida rugopelliculosa, Candida rugosa, Candida sagamina, Candida saitoana, Candida sake, Candida salmanticensis, Candida santamariae, Candida santjacobensis, Candida saopaulonensis, Candida savonica, Candida schatavi, Candida sequanensis, Candida sergipensis, Candida shehatae, Candida silvae, Candida silvanorum, Candida silvatica, Candida silvicola, Candida silvicultrix, Candida sinolaborantium, Candida sithepensis, Candida smithsonii, Candida sojae, Candida solani, Candida songkhlaensis, Candida sonorensis, Candida sophiae-reginae, Candida sorbophila, Candida sorbosivorans, Candida sorboxylosa, Candida spandovensis, Candida steatolytica, Candida stellata, Candida stellimalicola, Candida stri, Candida subhashii, Candida succiphila, Candida suecica, Candida suzuki, Candida takamatsuzukensis, Candida taliae, Candida tammamiensis, Candida tanzawaensis, Candida tartarivorans, Candida temnochilae, Candida tenuis, Candida tepae, Candida terraborum, Candida tetrigidarum, Candida thaimueangensis, Candida thermophila, Candida tilneyi, Candida tolerans, Candida torresii, Candida tritonae, Candida tropicalis, Candida trypodendroni, Candida tsuchiyae, Candida tumulicola, Candida ubatubensis, Candida ulmi, Candida vaccinii, Candida valdiviana, Candida vanderkliftii, Candida vanderwaltii, Candida vartiovaarae, Candida versatilis, Candida vini, Candida viswanathi, Candida wickerhamii, Candida wounanorum, Candida wyomingensis, Candida xylopsoci, Candida yuchorum, Candida zemplinina, or Candida zeylanoides.

Embodiment 3. The substantially pure Candida host cell of embodiment 2, wherein the substantially pure Candida host cell is genetically modified Candida tropicalis.

Embodiment 4. The substantially pure Candida host cell of embodiment 3, wherein the substantially pure Candida host cell is selected from the group consisting of DP428, DP522 and DP527.

Embodiment 5. The substantially pure Candida host cell of embodiment 1, wherein the substantially pure Candida host cell is genetically modified Candida tropicalis and wherein the alcohol dehydrogenase gene is selected from the group consisting of ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10, ADH-A10B, ADH-B11, and ADH-B11B.

Embodiment 6. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of high stringency to SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56.

Embodiment 7. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of moderate stringency to SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56.

Embodiment 8. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of low stringency to SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56.

Embodiment 9. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene encodes an amino acid sequence that has at least 90 percent sequence identity to a stretch of at least 100 contiguous residues of any one of SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, or SEQ ID NO: 155.

Embodiment 10. The substantially pure Candida host cell of embodiment 9, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of high stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 11. The substantially pure Candida host cell of embodiment 9, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of moderate stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 12. The substantially pure Candida host cell of embodiment 9, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of low stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 13. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene encodes an amino acid sequence that has 100 percent sequence identity to a stretch of at least 100 contiguous residues of any one of SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, or SEQ ID NO: 155.

Embodiment 14. The substantially pure Candida host cell of embodiment 13, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of high stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 15. The substantially pure Candida host cell of embodiment 13, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of moderate stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 16. The substantially pure Candida host cell of embodiment 13, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of low stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 17. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene encodes an amino acid sequence that comprises at least one peptide selected from the group consisting of SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, and SEQ ID NO: 160.

Embodiment 18. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene encodes an amino acid sequence that comprises at least two peptides selected from the group consisting of SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, and SEQ ID NO: 160.

Embodiment 19. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene encodes an amino acid sequence that comprises at least three peptides selected from the group consisting of SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, and SEQ ID NO: 160.

Embodiment 20. The substantially pure Candida host cell of embodiment 1, wherein the alcohol dehydrogenase gene encodes an amino acid sequence that comprises at least four peptides selected from the group consisting of SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, and SEQ ID NO: 160.

Embodiment 21. The substantially pure Candida host cell of embodiment 1, wherein the one or more genetic modifications in the first genetic modification class cause the alcohol dehydrogenase to have decreased function relative to the function of the wild-type counterpart, be nonfunctional, or have a modified activity spectrum relative to an activity spectrum of the wild-type counterpart.

Embodiment 22. The substantially pure Candida host cell of embodiment 1 that further comprises a second genetic modification class, wherein the second genetic modification class comprises an insertion of a first gene into the Candida host cell genome; wherein the first gene encodes
a protein that is not identical to a naturally occurring protein in the substantially pure Candida host cell, or
a protein that is identical to a naturally occurring protein in the substantially pure Candida host cell, but expression of the gene is controlled by a promoter that is different from the promoter that controls the expression of the naturally occurring protein.

Embodiment 23. The substantially pure Candida host cell of embodiment 22, wherein the first gene encodes a desaturase, a lipase, a fatty alcohol oxidase, an alcohol dehydrogenase, a glycosyltransferase, a cytochrome P450, a cellulase, an exoglucanase, a cellobiohydrolase, an endoglucanase, a β-glucosidase, an α-amylase, a β-amylase, a γ-amylase, a glucoamylase, a maltogenase, a pullulanase, an endo-β-xylanase, an α-glucuronidase, an α-arabinofuranosidase, a β-xylosidase, a β-mannanase, a β-mannosidase, a pectin lyase, an endo-polygalacturonase, an α-arabinofuranosidase, an α-galactosidase, a polymethylgalacturonase, a pectin depolymerase, a pectinase, an exopolygalacturanosidase hydrolase, an α-L-rhamnosidase, an α-L-arabinofuranosidase, a polymethylgalacturonate lyase, a polygalacturonate lyase, an exopolygalacturonate lyase, a peroxidase, a copper radical oxidase, an FAD-dependent oxidase, a multicopper oxidase, a lignin peroxidase or a manganese peroxidase that is
not identical to a naturally occurring protein in the substantially pure Candida host cell; or
identical to a naturally occurring protein in the substantially pure Candida host cell, but expression of the gene is controlled by a promoter that is different from the promoter that controls the expression of the naturally occurring protein.

Embodiment 24. The substantially pure Candida host cell of embodiment 22, wherein the first gene encodes a cytochrome P450 that is not identical to a naturally occurring cytochrome P450 in the substantially pure Candida host cell.

Embodiment 25. The substantially pure Candida host cell of embodiment 22, wherein the first gene is a gene listed in Table 4 other than a gene that naturally occurs in the substantially pure Candida host cell.

Embodiment 26. The substantially pure Candida host cell of embodiment 22, wherein the first gene has at least 40 percent sequence identity to a gene listed in Table 4, and wherein the first gene does not naturally occur in the substantially pure Candida host cell.

Embodiment 27. The substantially pure Candida host cell of embodiment 22, wherein the first gene has at least 60 percent sequence identity to a gene listed in Table 4, and wherein the first gene does not naturally occur in the substantially pure Candida host cell.

Embodiment 28. The substantially pure Candida host cell of embodiment 22, wherein the first gene has at least 80 percent sequence identity to a gene listed in Table 4, and wherein the first gene does not naturally occur in the substantially pure Candida host cell.

Embodiment 29. The substantially pure Candida host cell of embodiment 22, wherein the first gene has at least 95 percent sequence identity to a gene listed in Table 4, and wherein the first gene does not naturally occur in the substantially pure Candida host cell.

Embodiment 30. The substantially pure Candida host cell of embodiment 22, wherein the first gene is encoded by a nucleic acid that binds under conditions of high stringency to a nucleic acid that encodes a gene listed in Table 4, and wherein the first gene does not naturally occur in the substantially pure Candida host cell.

Embodiment 31. The substantially pure Candida host cell of embodiment 22, wherein the first gene is encoded by a nucleic acid that binds under conditions of moderate stringency to a nucleic acid that encodes a gene listed in Table 4, and wherein the first gene does not naturally occur in the substantially pure Candida host cell.

Embodiment 32. The substantially pure Candida host cell of embodiment 22, wherein the first gene is encoded by a nucleic acid that binds under conditions of low stringency to a nucleic acid that encodes a gene listed in Table 4, and wherein the first gene does not naturally occur in the substantially pure Candida host cell.

Embodiment 33. The substantially pure Candida host cell of embodiment 22, wherein the promoter is an isocitrate lyase promoter, a cytochrome P450 promoter, a fatty alcohol oxidase promoter or an alcohol dehydrogenase promoter in the Candida host cell genome.

Embodiment 34. The substantially pure Candida host cell of embodiment 33, wherein the promoter is an isocitrate lyase promoter.

Embodiment 35. The substantially pure Candida host cell of embodiment 1 that further comprises a third genetic modification class, wherein the third genetic modification class comprises one or more genetic modifications in the Candida host cell genome that collectively or individually disrupt
the β-oxidation pathway; or
a gene selected from the group consisting of a CYP52A type cytochrome P450 and a fatty alcohol oxidase.

Embodiment 36. A method of using a genetically modified Candida cell for the biotransformation of a substrate to a product, wherein the genetically modified Candida cell is characterized by a first genetic modification class that comprises one or more genetic modifications that collectively or individually disrupt an alcohol dehydrogenase gene; and the method comprises fermenting the genetically modified Candida cell in a culture medium comprising a nitrogen source and a carbon source.

Embodiment 37. The method of embodiment 36, wherein the culture medium further comprises the substrate.

Embodiment 38. The method of embodiment 36, wherein the genetically modified Candida cell is genetically modified Candida glabrata, Candida zeylenoides, Candida lipolytica, Candida guillermondii, Candida aaseri, Candida abiesophila, Candida africana, Candida aglyptinia, Candida agrestis, Candida akabanensis, Candida alai, Candida albicans, Candida alimentaria, Candida amapae, Candida ambrosiae, Candida amphixiae, Candida anatomiae, Candida ancudensis, Candida anglica, Candida anneliseae, Candida antarctica, Candida antillancae, Candida anutae, Candida apicola, Candida apis, Candida arabinofermentans, Candida arcana, Candida ascalaphidarum, Candida asparagi, Candida atakaporum, Candida atbi, Candida athensensis, Candida atlantica, Candida atmosphaerica, Candida auringiensis, Candida auris, Candida aurita, Candida austromarina, Candida azyma, Candida azymoides, Candida barrocoloradensis, Candida batistae, Candida beechii, Candida bentonensis, Candida bertae, Candida berthetii, Candida bituminiphila, Candida blankii, Candida blattae, Candida blattariae, Candida bohiensis, Candida boidinii, Candida bokatorum, Candida boleticola, Candida bolitotheri, Candida bombi, Candida bombiphila, Candida bondarzewiae, Candida bracarensis, Candida bribrorum, Candida bromeliacearum, Candida buenavistaensis, Candida buinensis, Candida butyri, Candida californica, Candida canberraensis, Candida cariosilignicola, Candida carpophila, Candida carvicola, Candida caseinolytica, Candida castrensis, Candida catenulata, Candida cellae, Candida cellulolytica, Candida cerambycidarum, Candida chauliodes, Candida chickasaworum, Candida chilensis, Candida choctaworum, Candida chodatii, Candida chrysomelidarum, Candida cidri, Candida cloacae, Candida coipomoensis, Candida conglobata, Candida corydali, Candida cylindracea, Candida davenportii, Candida davisiana, Candida deformans, Candida dendrica, Candida dendronema, Candida derodonti, Candida diddensiae, Candida digboiensis, Candida diospyri, Candida diversa, Candida dosseyi, Candida drimydis, Candida drosophilae, Candida dubliniensis, Candida easanensis, Candida edaphicus, Candida edax, Candida elateridarum, Candida emberorum, Candida endomychidarum, Candida entomophila, Candida ergastensis, Candida ernobii, Candida etchellsii, Candida ethanolica, Candida famata, Candida fennica, Candida fermenticarens, Candida flocculosa, Candida floricola, Candida floris, Candida flosculorum, Candida fluviatilis, Candida fragi, Candida freyschussii, Candida friedrichii, Candida frijolesensis, Candida fructus, Candida fukazawae, Candida fungicola, Candida galacta, Candida galis, Candida galli, Candida gatunensis, Candida gelsemii, Candida geochares, Candida germanica, Candida ghanaensis, Candida gigantensis, Candida glaebosa, Candida glucosophila, Candida glycerinogenes, Candida gorgasii, Candida gotoi, Candida gropengiesseri, Candida guaymorum, Candida haemulonii, Candida halonitratophila, Candida halophila, Candida hasegawae, Candida hawaiiana, Candida heliconiae, Candida hispaniensis, Candida homilentoma, Candida humicola, Candida humilis, Candida hungarica, Candida hyderabadensis, Candida incommunis, Candida inconspicua, Candida insectalens, Candida insectamans, Candida insectorum, Candida intermedia, Candida ipomoeae, Candida ishiwadae, Candida jaroonii, Candida jeffriesii, Candida kanchanaburiensis, Candida karawaiiewii, Candida kashinagacola, Candida kazuoi, Candida khmerensis, Candida kipukae, Candida kofuensis, Candida krabiensis, Candida kruisii, Candida kunorum, Candida labiduridarum, Candida lactis-condensi, Candida lassenensis, Candida laureliae, Candida leandrae, Candida lessepsii, Candida lignicola, Candida litsaeae, Candida litseae, Candida lanquihuensis, Candida lycoperdinae, Candida lyxosophila, Candida magnifica, Candida magnoliae, Candida maltosa, Candida mannitofaciens, Candida maris, Candida maritima, Candida maxii, Candida melibiosica, Candida membranifaciens, Candida mesenterica, Candida metapsilosis, Candida methanolophaga, Candida methanolovescens, Candida methanosorbosa, Candida methylica, Candida michaelii, Candida mogii, Candida montana, Candida multigeminis, Candida mycetangii, Candida naeodendra, Candida nakhonratchasimensis, Candida nanaspora, Candida natalensis, Candida neerlandica, Candida memodendra, Candida nitrativorans, Candida nitratophila, Candida nivariensis, Candida nodaensis, Candida norvegica, Candida novakii, Candida odintsovae, Candida oleophila, Candida ontarioensis, Candida ooitensis, Candida orba, Candida oregonensis, Candida orthopsilosis, Candida ortonii, Candida ovalis, Candida pallodes, Candida palmioleophila, Candida paludigena, Candida panamensis, Candida panamericana, Candida parapsilosis, Candida pararugosa, Candida pattaniensis, Candida peltata, Candida peoriaensis, Candida petrohuensis, Candida phangingensis, Candida picachoensis, Candida piceae, Candida picinguabensis, Candida pignaliae, Candida pimensis, Candida pini, Candida plutei, Candida pomicola, Candida ponderosae, Candida populi, Candida powellii, Candida prunicola, Candida pseudogliaebosa, Candida pseudohaemulonii, Candida pseudointermedia, Candida pseudolambica, Candida pseudorhagii, Candida pseudovanderkliftii, Candida psychrophila, Candida pyralidae, Candida qinlingensis, Candida quercitrusa, Candida quercuum, Candida railenensis, Candida ralunensis, Candida rancensis, Candida restingae, Candida rhagii, Candida riodocensis, Candida rugopelliculosa, Candida rugosa, Candida sagamina, Candida saitoana, Candida sake, Candida salmanticensis, Candida santamariae, Candida santjacobensis, Candida saopaulonensis, Candida savonica, Candida schatavi, Candida sequanensis, Candida sergipensis, Candida shehatae, Candida silvae, Candida silvanorum, Candida silvatica, Candida silvicola, Candida silvicultrix, Candida sinolaborantium, Candida sithepensis, Candida smithsonii, Candida sojae, Candida solani, Candida songkhlaensis, Candida sonorensis, Candida sophiae-reginae, Candida sorbophila, Candida sorbosivorans, Candida sorboxylosa, Candida spandovensis, Candida steatolytica, Candida stellata, Candida stellimalicola, Candida stri, Candida subhashii, Candida succiphila, Candida suecica, Candida suzuki, Candida takamatsuzukensis, Candida taliae, Candida tammamiensis, Candida tanzawaensis, Candida tartarivorans, Candida temnochilae, Candida tenuis, Candida tepae, Candida terraborum, Candida tetrigidarum, Candida thaimueangensis, Candida thermophila, Candida tilneyi, Candida tolerans, Candida torresii, Candida tritonae, Candida tropicalis, Candida trypodendroni, Candida tsuchiyae, Candida tumulicola, Candida ubatubensis, Candida ulmi, Candida vaccinii, Candida valdiviana, Candida vanderkliftii, Candida vanderwaltii, Candida vartiovaarae, Candida versatilis, Candida vini, Candida viswanathi, Candida wickerhamii, Candida wounanorum, Candida wyomingensis, Candida xylopsoci, Candida yuchorum, Candida zemplinina, or Candida zeylanoides.

Embodiment 39. The method of embodiment 36, wherein the genetically modified Candida cell is genetically modified Candida tropicalis.

Embodiment 40. The method of embodiment 36, wherein the genetically modified Candida cell is genetically modified Candida tropicalis and wherein the alcohol dehydrogenase is selected from the group consisting of ADH-A4, ADH-A4B, ADH-B4, ADH-B4B, ADH-A10, ADH-A10B, ADH-B11, and ADH-B11B.

Embodiment 41. The method of embodiment 36, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of high stringency to SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56.

Embodiment 42. The method of embodiment 36, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of moderate stringency to SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56.

Embodiment 43. The method of embodiment 36, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of low stringency to SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, or SEQ ID NO: 56.

Embodiment 44. The method of embodiment 36, wherein the alcohol dehydrogenase gene encodes an amino acid sequence that has at least 90 percent sequence identity to a stretch of at least 100 contiguous residues of any one of SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, or SEQ ID NO: 155.

Embodiment 45. The method of embodiment 44, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of high stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 46. The method of embodiment 44, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of moderate stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 47. The method of embodiment 44, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of low stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 48. The method of embodiment 36, wherein the alcohol dehydrogenase gene encodes an amino acid sequence that has 100 percent sequence identity to a stretch of at least 100 contiguous residues of any one of SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, or SEQ ID NO: 155.

Embodiment 49. The method of embodiment 48, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of high stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

Embodiment 50. The method of embodiment 48, wherein the alcohol dehydrogenase gene comprises a nucleic acid sequence that binds under conditions of moderate stringency to a first sequence selected from the group consisting of SEQ ID NO: 39, SEQ ID NO: 40, SEQ ID NO: 42, SEQ ID NO: 43, and SEQ ID NO: 56.

from the group consisting of SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, and SEQ ID NO: 160.

Embodiment 56. The method of embodiment 36, wherein the one or more genetic modifications in the first genetic modification class cause the alcohol dehydrogenase to have decreased function relative to the function of the wild-type counterpart, be nonfunctional, or have a modified activity spectrum relative to an activity spectrum of the wild-type counterpart.

Embodiment 57. The method of embodiment 36, wherein the genetically modified Candida cell further comprises a second genetic modification class, wherein the second genetic modification class comprises an insertion of a first gene into the Candida host cell genome; wherein the first gene encodes
a protein that is not identical to a naturally occurring protein in the Candida host cell, or
a protein that is identical to a naturally occurring protein in the Candida host cell, but expression of the gene is controlled by a promoter that is different from the promoter that controls the expression of the naturally occurring protein.

Embodiment 58. The method of embodiment 57, wherein the first gene encodes a desaturase, a lipase, a fatty alcohol oxidase, an alcohol dehydrogenase, a glycosyltransferase, a cytochrome P450, a cellulase, an exoglucanase, a cellobiohydrolase, an endoglucanase, a β-glucosidase, an α-amylase, a β-amylase, a γ-amylase, a glucoamylase, a maltogenase, a pullulanase, an endo-β-xylanase, an α-glucuronidase, an α-arabinofuranosidase, a β-xylosidase, a β-mannanase, a β-mannosidase, a pectin lyase, an endo-polygalacturonase, an α-arabinofuranosidase, an α-galactosidase, a polymethylgalacturonase, a pectin depolymerase, a pectinase, an exopolygalacturanosidase hydrolase, an α-L-rhamnosidase, an α-L-arabinofuranosidase, a polymethylgalacturonate lyase, a polygalacturonate lyase, an exopolygalacturonate lyase, a peroxidase, a copper radical oxidase, an FAD-dependent oxidase, a multicopper oxidase, a lignin peroxidase or a manganese peroxidase that is not identical to a naturally occurring protein in the Candida host cell or is identical to a naturally

Embodiment 51.
The method of embodiment 48, wherein the occurring protein in the Candida host cell, but expression alcohol dehydrogenase gene comprises a nucleic acid of the gene is controlled by a promoter that is different from sequence that binds under conditions of low stringency to 45 the promoter that controls the expression of the naturally a first sequence selected from the group consisting of SEQ occurring protein. ID NO:39, SEQID NO:40, SEQID NO:42, SEQID NO: Embodiment 59. The method of embodiment 57, wherein the 43, and SEQID NO: 56. first gene encodes a cytochrome P450 that is not identical Embodiment 52. The method of embodiment 36, wherein the to a naturally occurring cytochrome P450 in the Candida alcohol dehydrogenase gene encodes an amino acid 50 host cell. sequence that comprises at least one peptide selected from Embodiment 60. The method of embodiment 57, wherein the the group consisting of SEQID NO: 156, SEQIDNO: 157, first gene is a gene listed in Table 4 other than a gene that SEQID NO: 158, SEQID NO: 159, and SEQID NO: 160. naturally occurs in the Candida host cell. Embodiment 53. The method of embodiment 36, wherein the Embodiment 61. The method of embodiment 57, wherein the alcohol dehydrogenase gene encodes an amino acid 55 first gene has at least 40 percent sequence identity to a gene sequence that comprises at least two peptides selected from listed in Table 4, and wherein the first gene does not natu the group consisting of SEQID NO: 156, SEQIDNO: 157, rally occur in the Candida host cell. SEQID NO: 158, SEQID NO: 159, and SEQID NO: 160. Embodiment 62. The method of embodiment 57, wherein the Embodiment 54. The method of embodiment 36, wherein the first gene has at least 60 percent sequence identity to a gene alcohol dehydrogenase gene encodes an amino acid 60 listed in Table 4, and wherein the first gene does not natu sequence that comprises at least three peptides selected rally occur in the Candida host cell. 
from the group consisting of SEQ ID NO: 156, SEQ ID Embodiment 63. The method of embodiment 57, wherein the NO: 157, SEQID NO: 158, SEQID NO: 159, and SEQID first gene has at least 80 percent sequence identity to a gene NO: 16O. listed in Table 4, and wherein the first gene does not natu Embodiment 55. The method of embodiment 36, wherein the 65 rally occur in the Candida host cell. alcohol dehydrogenase gene encodes an amino acid Embodiment 64. The method of embodiment 57, wherein the sequence that comprises at least four peptides selected first gene has at least 95 percent sequence identity to a gene