US009399762B2

(12) United States Patent
     Farwell et al.
(10) Patent No.: US 9,399,762 B2
(45) Date of Patent: Jul. 26, 2016

(54) METHODS AND SYSTEMS FOR SULFIMIDATION OR SULFOXIMIDATION OF ORGANIC MOLECULES

(71) Applicant: The California Institute of Technology, Pasadena, CA (US)

(72) Inventors: Christopher C. Farwell, Thousand Oaks, CA (US); John A. McIntosh, Pasadena, CA (US); Frances H. Arnold, La Canada, CA (US)

(73) Assignee: California Institute of Technology, Pasadena, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 14/625,514

(22) Filed: Feb. 18, 2015

(65) Prior Publication Data
     US 2015/0232814 A1 Aug. 20, 2015

Related U.S. Application Data
(60) Provisional application No. 61/941,197, filed on Feb. 18, 2014, provisional application No. 61/976,927, filed on Apr. 8, 2014.

(51) Int. Cl.
     C12N 9/02 (2006.01)
     C12P 13/00 (2006.01)
     C07C 33/24 (2006.01)

(52) U.S. Cl.
     CPC: C12N 9/0042 (2013.01); C12P 13/00 (2013.01); C12Y 106/02004 (2013.01)

(58) Field of Classification Search
     CPC: C12N 9/0042; C12P 13/00; C12Y 106/02004
     See application file for complete search history.

Primary Examiner: Shailendra Kumar

(74) Attorney, Agent, or Firm: Joseph R. Baker, Jr.; Gavrilovich, Dodd & Lindsey LLP

(57) ABSTRACT

The disclosure generally relates to the field of synthetic organic chemistry. In particular, the present disclosure relates to methods and systems for the imidation of sulfides.

20 Claims, 27 Drawing Sheets

U.S. Patent    Jul. 26, 2016    Sheet 1 of 27    US 9,399,762 B2

[Drawing sheets 1 to 27, FIGURES 1A-B to 32. Labels recoverable from the sheets: FIGURE 1A-B (A: sulfoxide; B: sulfilimine, sulfoximine); FIGURES 13 and 14 (x-axis: wavelength); FIGURE 15 (x-axis: sulfide equivalents); FIGURES 17 to 20 (x-axis: concentration injected).]
??¿ US 9,399,762 B2 1. 2 METHODS AND SYSTEMIS FOR enzyme variants comprising at least one or more amino acid SULFIMIDATION OR SULFOXMIDATION mutations therein that catalyze sulfoxidation and/or Sulfimi OF ORGANIC MOLECULES dation, making products described herein with high Stereose lectivity. In some embodiments, the heme enzyme variants of CROSS REFERENCE TO RELATED the disclosure have the ability to catalyze nitrene transfer APPLICATIONS reactions efficiently, display increased total turnover num bers, and/or demonstrate highly regio- and/or enantioselec This application claims priority to U.S. Provisional Appli tive formation compared to the corresponding wild cation No. 61/941,197, filed Feb. 18, 2014 and claims priority type enzymes. to U.S. Provisional Application No. 61/976,927, filed Apr. 8, 10 The disclosure provides a method for catalyzing the inter 2014, the disclosures of each of the foregoing application are molecular insertion of nitrogen into thioethers, Sulfur-organo incorporated herein by reference in their entirety. compounds or Sulfoxides to produce a product having a new S—N bond, the method comprising providing a nitrene STATEMENT REGARDING FEDERALLY Source, a thioether or Sulfoxide precursor and a heme enzyme SPONSORED RESEARCH 15 or an engineered heme enzyme; and allowing the reaction to proceed for a time Sufficient to form a product having a new This invention was made with government Support under S—N bond. In one embodiment, the nitrene source is an GM101792 awarded by the National Institutes of Health and azide. In a further embodiment, the azide has the general under NO0014-11-1-0205 awarded by the Office of Naval formula R' N, wherein R' is (i) a substituted or unsubsti Research. The government has certain rights in the invention. 
tuted aryl, a substitute or unsubstituted alkyl, OR, or —NR, wherein R is a substituted or unsubstituted aryl, a TECHNICAL FIELD substitute or unsubstituted alkyl; (ii)—SO.R, wherein Ra substituted or unsubstituted aryl, a substitute or unsubstituted The present disclosure generally relates to the fields of alkyl, —OR or NR, wherein Rare any alkyl or aryl; (iii) synthetic organic chemistry. In particular, the present disclo 25 —COR wherein R is a substituted or unsubstituted aryl, a sure relates to methods and systems for the imidation of substitute or unsubstituted alkyl, -OR or NR, wherein R sulfides. are any alkyl or aryl; or (iv) -P(O)(OR)(OR), wherein R BACKGROUND and R are independently H, a substituted or unsubstituted 30 aryl, a substitute or unsubstituted alkyl. In a further embodi Enzymes offer appealing alternatives to traditional chemi ment, the azide has a structure selected from the group con cal catalysts due to their ability to function in aqueous media sisting of at ambient temperature and pressure. In addition, the ability of enzymes to orient binding for defined regio- and O O Stereo-chemical outcomes is highly valuable. This property is 35 exemplified by the cytochrome P450 monooxygenase family \/ O O O of enzymes that catalyze insertion of oxygenatoms into unac YN, l s \/ s tivated C-H bonds (P. R. O. d. Montellano, Cytochrome col1 No N, 1 YN, P450: Structure, Mechanism and Biochemistry. Kluwer Aca demic/Plenum Publishers, New York, ed. 3rd Edition, 2005). 40 O O N N Cytochrome P450s catalyze monooxygenation with high Sn s s degrees of regio- and stereo-Selectivity, a property that makes N YOF OMe O. them attractive for use in chemical synthesis. This broad enzyme class is capable of oxygenating a wide variety of O O OR O organic molecules including aromatic compounds, fatty 45 acids, alkanes and alkenes. 
Diverse Substrate selectivity is a R1 ul N R.No1 Y YN, and R1Y NN, hallmark of this enzyme family and is exemplified in the natural world by their importance in natural product oxida wherein R' is any alkyl, aryl, -OR, NR, wherein R, R and tion as well as xenobiotic metabolism (F. P. Guengerich, Rare any alkyl, or aryl. In another embodiment, the nitrene Chem. Res. Toxicol. 14, 611 (2001)). Limitations to this 50 enzyme class in synthesis include their large size, need for Source is selected from the group consisting of expensive reducing equivalents (e.g., NADPH) and cellular distribution many cytochrome P450s are membrane bound and therefore difficult to handle (Montellano, Cytochrome O V/ X P450: Structure, Mechanism and Biochemistry. Kluwer Aca 55 ls O 1SN / demic/Plenum Publishers, New York, ed. 3rd Edition, 2005). X, R N Several soluble bacterial cytochrome P450s have been iso R1 N1H R ls NEOTS Na lated, however, that show excellent properties and behavior O O O O O OR for chemical synthesis and protein engineering applications. \/ 1nY4 \/ x R31 YN=OTs R' NH, No1 NNH^ SUMMARY O OR O OR The disclosure provides method and compositions com R! \/ R \/n prising one or more heme enzymes that catalyze the nitrene No1 YN=OTs and No1 NH2, transfer or insertion into an organosulfur compounds com 65 prising an —S— target site to form a new S-N bond. In wherein R' is any alkyl, aryl, OR, NR, wherein R, R and particular embodiments, the disclosure provides heme Rare any alkyl, or aryl. In another embodiment, the S N US 9,399,762 B2 3 4 containing product is an aliphatic amine and the nitrene pre In anotherembodiment of Formula 1a, R' is a carbonyl group. cursor is tosyl azide. In yet another embodiment, the product In a further embodiment, R is any alkyl or any aryl. In a is generated through a nitrenoid intermediate. In still yet further embodiment, R is O. 
another embodiment, the engineered heme enzyme is a cyto The disclosure also provides products made by any of the chrome P450 enzyme or a variant thereof. In a further 5 foregoing methods. embodiment, the cytochrome P450 enzyme is expressed in a Also provided is a reaction mixture comprising a nitrene bacterial, archaeal or fungal host organism. In yet another Source, a thioether or Sulfoxide Substrate and an engineered embodiment, the cytochrome P450 enzyme is a P450 BM3 heme enzyme for producing a product having a new S-N enzyme or a variant thereof. In a further embodiment, the bond. In one embodiment, the nitrene Source is an azide. In a 10 further embodiment, the azide has the general formula cytochrome P450 BM3 enzyme comprises the amino acid R' N, wherein R' is (i) a substituted or unsubstituted aryl, sequence set forth in SEQID NO: 1 or a variant thereof. In yet a substitute or unsubstituted alkyl, OR, or NR, wherein other embodiments of the foregoing the cytochrome P450 R’ is a substituted or unsubstituted aryl, a substitute or unsub enzyme variant comprises a mutation at the axial position of stituted alkyl; (ii) - SOR, wherein R a substituted or the heme coordination site. In a further embodiment, the 15 unsubstituted aryl, a substitute or unsubstituted alkyl, —OR mutation is an amino acid Substitution of Cys with a member or NR, wherein R are any alkyl or aryl; (iii) -COR selected from the group consisting of Ala, Asp, Arg, ASn, Glu, wherein R is a substituted or unsubstituted aryl, a substitute Gin, Gly. His, He, Lys, Leu, Met, Phe, Pro, Ser. Thr, Trp, Tyr or unsubstituted alkyl, -OR or NR, wherein R are any and Val at the axial position. In still a further embodiment, the alkyl or aryl; or (iv) -P(O)(OR)(OR), wherein RandR mutation is an amino acid substitution of Cys with Asp or Ser are independently H, a substituted or unsubstituted aryl, a at the axial position. In another embodiment of any of the substitute or unsubstituted alkyl. 
Inafurther embodiment, the foregoing, the P450 BM3 enzyme variant comprises at least azide has a structure selected from the group consisting of: one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, or all thirteen of the following amino acid substitu tions in SEQID NO: 1: V78A, F87V, P142S, T175I, A184V, 25 V / O V / S S S226R, H236Q, E252G, T268A, A290V. L353V, I366V, and N E442K. In another embodiment of any of the foregoing the N3. --> s 1NN3 cytochrome P450 enzyme variant comprises a T268A muta s tion and/or a C400X mutation in SEQID NO: 1, wherein X is any amino acid other than Cys. In still another embodiment, 30 the cytochrome P450 enzyme variant comprises a T438S mutation and/or a C400X mutation in SEQID NO: 1, wherein X is any amino acid other than Cys. In yet another embodi F ment, the cytochrome P450 enzyme variant comprises a O O OR O O T268A mutation, a C400X mutation and a T438S mutation in 35 SEQID NO:1, wherein X is any amino acid other than Cys. In l R! \/ \/ another embodiment, the engineered heme enzyme com R1 N, No1 NN, and R11 YN, prises a fragment of the cytochrome P450 enzyme or variant thereof. In yet another embodiment, the engineered heme wherein R' is any alkyl, aryl, -OR, NR, wherein R, R and enzyme is a cytochrome P450 BM3 enzyme variant selected 40 Rare any alkyl, or aryl. In another embodiment, the nitrene from Table 4, Table 5 and Table 6. In yet another embodiment, Source is selected from the group consisting of the product is a compound of Formula la:

O O O R, 45 O \/ X N1 ls -X ls , R11 NN1, R N RI NEOTS Na R3 R2 O O O O OR 50 \/ \/ \/ wherein R' is a sulfoxide, a carbonyl or a phosphonate; R31 YNors. R111 YNH, No1 NNH1 wherein R is H or any alkyl or aryl; and wherein R is H, O O OR O OR or an optionally Substituted aryl group. In a further embodi Y4. Y4. ment, R' is a sulfoxide of formula SOR, wherein R any No1 YN=OTs and No1 YNH, alkyl, any aryl, -OR or NR", wherein R and R are any 55 alkyl or any aryl. In a further embodiment, R is any alkyl or aryl. In still a further embodiment, R is H. In yet another wherein R' is any alkyl, aryl, -OR, NR, wherein R, R and embodiment of Formula 1a R' is a phosphonate of formula Rare any alkyl, or aryl. In another embodiment, the S N P(O)(OR)(OR), wherein Rand Rare independently any containing product is an aliphatic amine and the nitrene pre aryl or any alkyl. In a further embodiment, R is any alkyl or 60 cursor is tosyl azide. In yet another embodiment, the product any aryl. In still a further embodiment, R is any alkyl or aryl. is generated through a nitrenoid intermediate. In still yet In yet another embodiment of Formula 1a, R' is a carbonyl another embodiment, the engineered heme enzyme is a cyto group. In a further embodiment, R is any alkylor any aryl. In chrome P450 enzyme or a variant thereof. In a further a further embodiment, R is any alkyl or any aryl. In another embodiment, the cytochrome P450 enzyme is expressed in a embodiment of Formula 1a, R is an optionally substituted 65 bacterial, archaeal or fungal host organism. In yet another aryl group. In a further embodiment, R' is any alkyl or any embodiment, the cytochrome P450 enzyme is a P450 BM3 aryl. Inafurther embodiment, R is Horany alkylor any aryl. enzyme or a variant thereof. In a further embodiment, the US 9,399,762 B2 5 6 cytochrome P450 BM3 enzyme comprises the amino acid substituent effects is greater for nitrene transfer, possibly sequence set forth in SEQID NO: 1 or a variant thereof. 
In yet owing to the less oxidizing nature of the presumed nitrenoid other embodiments of the foregoing the cytochrome P450 intermediate. enzyme variant comprises a mutation at the axial position of The details of one or more embodiments of the disclosure the heme coordination site. In a further embodiment, the 5 are set forth in the accompanying drawings and the descrip mutation is an amino acid Substitution of Cys with a member tion below. Other features, objects, and advantages will be selected from the group consisting of Ala, Asp, Arg, ASn, Glu, apparent from the description and drawings, and from the Gin, Gly. His, He, Lys, Leu, Met, Phe, Pro, Ser. Thr, Trp, Tyr claims. and Val at the axial position. In still a further embodiment, the mutation is an amino acid substitution of Cys with Asp or Ser 10 BRIEF DESCRIPTION OF THE DRAWINGS at the axial position. In another embodiment of any of the foregoing, the P450 BM3 enzyme variant comprises at least The accompanying drawings, which are incorporated into one, two, three, four, five, six, seven, eight, nine, ten, eleven, and constitute a part of this specification, illustrate one or twelve, or all thirteen of the following amino acid substitu 15 more embodiments of the present disclosure and, together tions in SEQID NO: 1: V78A, F87V, P142S, T175I, A184V, with the detailed description, serve to explain the principles S226R, H236Q, E252G, T268A, A290V. L353V, I366V, and and implementations of the disclosure. E442K. In another embodiment of any of the foregoing the FIG. 1A-B shows P450-catalyzed sulfoxidation and P450 cytochrome P450 enzyme variant comprises a T268A muta catalyzed sulfimination. (A) P450-catalyzed sulfoxidation, tion and/or a C400X mutation in SEQID NO: 1, wherein X is shown proceeding through compound I. This reaction can any amino acid other than Cys. In still another embodiment, also be mediated by compound 0 (hydroperoxy intermedi the cytochrome P450 enzyme variant comprises a T438S ate). 
(B) P411-catalyzed Sulfimination proceeding through a mutation and/or a C400X mutation in SEQID NO: 1, wherein nitrenoid intermediate formed from the azide with N as a X is any amino acid other than Cys. In yet another embodi byproduct. Subsequent oxidation of the sulfillimine can result ment, the cytochrome P450 enzyme variant comprises a 25 in Sulfoximine formation. T268A mutation, a C400X mutation and a T438S mutation in FIG.2 shows a plot of reaction rate versus Hammett param SEQID NO:1, wherein X is any amino acid other than Cys. In eter of substituted aryl sulfides in reactions using P411-CIS another embodiment, the engineered heme enzyme com enzyme and tosyl azide as nitrogen source. Data points are prises a fragment of the cytochrome P450 enzyme or variant labeled with aryl substituent and position (p-para, thereof. In yet another embodiment, the engineered heme 30 m--meta). enzyme is a cytochrome P450 BM3 enzyme variant selected FIG.3 shows proposed mechanisms of sulfimide (“produc from Table 4, Table 5 and Table 6. In yet another embodiment, tive') and sulfonamide (“unproductive') formation. the product is a compound of Formula la: FIG. 4 shows examples of azides tested as nitrene sources for sulfimidation of thioanisole using the P411-CIS 35 T438S enzyme. -R FIG. 5 shows examples of appropriate nitrene sources and N generalized reactions for the formation of Sulfillimines. S FIG. 6 shows examples of appropriate nitrene Sources and R2, 40 generalized reactions for the formation of Sulfoximines. FIG. 7 shows data used to determine initial rates of the wherein R' is a sulfoxide, a carbonyl or a phosphonate; reaction depicted in Scheme above the graph for the enzymes wherein R is H or any alkyl or aryl; and wherein R is H, O listed in figure legend. or an optionally Substituted aryl group. In a further embodi FIG. 
8A-D shows traces of (A) SFC trace of synthetic ment, R' is a sulfoxide of formula SO.R., wherein R any 45 standard of 8a using Chiralpak OD-H column with 25% iso alkyl, any aryl, —OR or NR", wherein R and R7 are any propanol/75% supercritical CO mobile phase. (B) Trace of P411-CIS T438S produced 8a under same conditions as alkyl or any aryl. In a further embodiment, R is any alkyl or synthetic standard. (C) Trace of P411-H2-5-F10 pro aryl. In still a further embodiment, R is H. In yet another duced 8a. (D) Trace of P411-CIS I263-AT438S produced embodiment of Formula 1a R' is a phosphonate of formula 50 8a, using 20% isopropanol/80% supercritical CO mobile P(O)(OR)(OR), wherein RandR are independently any phase. aryl or any alkyl. In a further embodiment, R is any alkyl or FIG. 9 shows: Top: SFC trace of 8b synthetic standard any aryl. In still a further embodiment, R is any alkyl or aryl. using Chiralpak OD-H column with 20% isopropanol/80% In yet another embodiment of Formula 1a, R' is a carbonyl supercritical CO mobile phase. Bottom: Trace of enzyme group. In a further embodiment, R is any alkyl or any aryl. In 55 produced 8b. a further embodiment, R is any alkyl or any aryl. In another FIG. 10 shows: Top: SFC trace of 8c synthetic standard embodiment of Formula 1a, R is an optionally substituted using Chiralpak OJ column with 15% isopropanol/85% aryl group. In a further embodiment, R' is any alkyl or any supercritical CO mobile phase. Bottom: Trace of enzyme aryl. In a further embodiment, R is Horanyalkylor any aryl. produced 8c. In anotherembodiment of Formula 1a, R' is a carbonyl group. 60 FIG. 11 shows: Top: SFC trace of 8d synthetic standard In a further embodiment, R is any alkyl or any aryl. In a using Chiralpak OJ column with 15% isopropanol/85% further embodiment, R is O. supercritical CO mobile phase. Bottom: Trace of enzyme The disclosure demonstrates the intermolecular nitrene produced 8d. 
transfer catalyzed by an enzyme, allowing for a mechanistic analysis of this new enzyme activity. Similar to P450-catalyzed sulfoxidation, the electronic properties of the sulfide substrates influence reactivity, though the magnitude of the

FIG. 12 shows data used to determine initial rates for reaction of substituted aryl sulfides with tosyl azide, using P411-CIS T438S as catalyst. Sulfimide products measured are as listed in Table 2.

FIG. 13 shows visible spectroscopy of Fe(III) and Fe(II)-P411-CIS I263A T438S in the presence of NADPH, followed by azide addition.

FIG. 14 shows the monitoring of the iron heme during the sulfimidation reaction with azide and sulfide. As shown in FIG. 13, the Soret peak for the Fe(III) state is obscured by NADPH. Therefore the Q-band region (enlarged box from 530-640 nm) was used to assess the transition from Fe(III) to Fe(II) as the reaction proceeds.

FIG. 15 shows the ratio of sulfimide TTN to sulfonamide TTN as sulfide concentration is varied, using P411-CIS T438S enzyme, as shown in the scheme. Reactions were prepared with azide concentration held constant at 1 mM and sulfide concentration varied from 0.5 mM to 4 mM.

FIG. 16 shows TTN values measured for slow addition vs. adding substrates simultaneously, using P411-CIS T438S enzyme and the substrates shown in Figure S10. Light gray bar shows TTN for the sulfonamide, 9; dark gray shows TTN for the sulfimide, 8a.

FIG. 17 shows a standard curve for product 8a, shown as response factor (ratio of product area to internal standard area).

FIG. 18 shows the standard curve for product 8b.

FIG. 19 shows the standard curve for product 8c.

FIG. 20 shows the standard curve for product 8d.

FIG. 21 shows a demonstration of enzymatic production of 8a. Top: LC-MS 230 nm chromatogram of synthetic standard of 8a, confirmed by NMR. Middle: Enzyme reaction containing putative 8a. Bottom: Mixture of enzyme reaction and synthetic 8a, showing coelution.

FIG. 22 shows LC runs from FIG. 21 showing ESI-MS-(+) detection of total ion chromatogram (TIC) (major peak=324, corresponding to 8a M+H+).

FIG. 23 shows a demonstration of enzymatic production of 8b. Top: LC-MS 230 nm chromatogram of synthetic standard of 8b, confirmed by NMR. Middle: Enzyme reaction containing putative 8b. Bottom: Mixture of enzyme reaction and synthetic 8b, showing coelution.

FIG. 24 shows LC runs from FIG. 23 showing ESI-MS-(+) detection of selected ions (mass window 307.5-308.5).

FIG. 25 shows a demonstration of enzymatic production of 8c. Top: LC-MS 230 nm chromatogram of synthetic standard of 8c, confirmed by NMR. Middle: Enzyme reaction containing putative 8c. Bottom: Mixture of enzyme reaction and synthetic 8c, showing coelution.

FIG. 26 shows LC runs from FIG. 25 showing ESI-MS-(+) detection of selected ions (mass window 307.5-308.5).

FIG. 27 shows a demonstration of enzymatic production of 8d. Top: LC-MS 230 nm chromatogram of synthetic standard of 8d, confirmed by NMR. Middle: Enzyme reaction containing putative 8d. Bottom: Mixture of enzyme reaction and synthetic 8d, showing coelution.

FIG. 28 shows LC runs from FIG. 27 showing ESI-MS-(+) detection of total ion chromatogram (TIC) (major peak=294 m/z, corresponding to 8d M+H+).

FIG. 29 shows LC runs from reaction with P411-CIS T438S with substrate 7e showing ESI-MS-(+) detection of selected ions (mass window 321.5-322.5 m/z), demonstrating production of 8e.

FIG. 30 shows a mass spectrum of the peak identified in FIG. 29, showing M+H+ of product 8e.

FIG. 31 shows an LC-MS chromatogram showing UV trace at 230 nm (top) and selected ions for 344 m/z, corresponding to the M+H+ mass of the product of azide 2 with sulfide 7a. Note TIC timescale differs due to instrument solvent delay.

FIG. 32 shows an LC-MS chromatogram showing UV trace at 230 nm (top) and selected ions for 262 m/z, corresponding to the M+H+ mass of the product of azide 4 with sulfide 7a. Note TIC timescale differs due to instrument solvent delay.

DETAILED DESCRIPTION

As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a species" includes a plurality of such species and reference to "the enzyme" includes reference to one or more enzymes and equivalents thereof, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this disclosure belongs. Although any methods and reagents similar or equivalent to those described herein can be used in the practice of the disclosed methods and compositions, the exemplary methods and materials are now described.

Also, the use of "or" means "and/or" unless stated otherwise. Similarly, "comprise," "comprises," "comprising," "include," "includes," and "including" are interchangeable and not intended to be limiting.

It is to be further understood that where descriptions of various embodiments use the term "comprising," those skilled in the art would understand that in some specific instances, an embodiment can be alternatively described using language "consisting essentially of" or "consisting of."

All publications mentioned herein are incorporated herein by reference in full for the purpose of describing and disclosing the methodologies, which are described in the publications, which might be used in connection with the description herein. The publications discussed above and throughout the text are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior disclosure.

Enzymes offer many advantages over traditional catalysts, such as selectivity, mild reaction conditions, convenient production, and use in whole cells. Cytochrome P450 enzymes are known to be able to carry out monooxygenations of diverse substrates, and exemplify the mild operating conditions that enzymes can afford. Many of the small molecule catalysts developed for C-H amination reactions have been designed in an effort to mimic these enzymes, but with the goal of activating nitrene equivalents rather than the oxene equivalents activated by cytochrome P450 enzymes (Bennett, R. D. & Heftmann, Phytochemistry 4, 873-879 (1965)). Cytochrome P450 enzymes bind to a cofactor comprising a catalytic transition metal (iron heme) that forms a reactive intermediate similar in electronic and steric features to metallonitrenoid intermediates used for synthetic C-N bond forming reactions.

The disclosure is based on the surprising discovery that engineered heme enzymes such as cytochrome P450BM3 enzymes, including a serine-heme-ligated P411 enzyme, efficiently catalyze nitrene insertion and transfer reactions. Suitable reactions include, but are not limited to, transfer of a nitrogen atom derived from an appropriate nitrene precursor to sulfur atoms with formation of an S-N bond. For example, in certain aspects, the present disclosure provides engineered heme enzymes such as cytochrome P450BM3 enzymes, including the serine-heme-ligated P411, which efficiently catalyze the sulfimidation of various organosulfur molecules. Significant enhancements in catalytic activity and enantioselectivity are observed in vivo, using intact bacterial cells expressing the engineered enzymes. The results presented here underscore the utility of enzymes in catalyzing new reaction types with the aid of synthetic reagents. The

The term "alkynyl" and "alkynyl group" as used herein refers to a linear, branched, or cyclic hydrocarbon group of 2 to 24 carbon atoms, preferably of 2 to 12 carbon atoms, containing at least one triple bond,
Such as ethynyl, n-propy ability to genetically encode catalysts for formal nitrene nyl, and the like. The term "heteroatom-containing alkynyl transferS opens up new biosynthetic pathways to amines and as used herein refer to an alkynyl moiety where at least one expands the scope of transformations accessible to biocataly carbon atom is replaced with a heteroatom. sis. The term “aryland “aryl group’ as used herein refers to an The term “S N sulfimidation' includes a transfer of a aromatic Substituent containing a single aromatic or multiple nitrogen atom derived from an appropriate nitrene precursor 10 aromatic rings that are fused together, directly linked, or to sulfur atoms with formation of an S N bond, yielding a indirectly linked (such as linked through a methylene or an sulfimide. ethylene moiety). Preferred aryl groups contain 5 to 24 car The term “S N sulfimidation (enzyme) catalyst” or bonatoms, and particularly preferred aryl groups contain 5 to “enzyme with S N suflimidation activity” includes any and 14 carbon atoms. The term "heteroatom-containing aryl as all chemical processes catalyzed by enzymes, by which Sub 15 used herein refer to an aryl moiety where at least one carbon strates containing at least one carbon-sulfur bond can be atom is replaced with a heteroatom. converted into Sulfimide products by using nitrene precursors The term “alkoxy' and “alkoxy group’ as used herein Such as Sulfonyl azides, carbonyl azides, aryl azides, azido refers to an aliphatic group or a heteroatom-containing ali formates, phosphoryl azides, azide phosphonates, imi phatic group bound through a single, terminal ether linkage. noiodanes, or haloamine derivatives. Preferred aryl alkoxy groups contain 1 to 24 carbon atoms, This disclosure describes enzyme catalysts based for the and particularly preferred alkoxy groups contain 1 to 14 car transfer of nitrogen atoms to aryl Sulfides and other organo bon atoms. Sulfur compounds. 
This reaction is presumed to take place The term “aryloxy” and “aryloxy group’ as used herein through a metal-nitrenoid intermediate, the reactivity of refers to an aryl group or a heteroatom-containing aryl group 25 bound through a single, terminal ether linkage. Preferred which is modulated by both the enzyme and substrates. aryloxy groups contain 5 to 24 carbon atoms, and particularly In some embodiments the organosulfur molecule has the preferred aryloxy groups contain 5 to 14 carbon atoms. structure of formula (I): The terms “halo' and “halogen' are used in the conven tional sense to refer to a fluoro, chloro, bromo or iodo Sub 30 stituent. (I) By “substituted it is intended that in the alkyl, alkenyl, alkynyl, aryl, or other moiety, at least one hydrogen atom is replaced with one or more non-hydrogenatoms. Examples of such substituents include, without limitation: functional 35 groups referred to herein as “FG'. Such as alkoxy, aryloxy, in which X-S atom is a target site for addition of a nitrogen, alkyl, heteroatom-containing alkyl, alkenyl, heteroatom-con and R, R2, and R are independently selected from the group taining alkenyl, alkynyl, heteroatom-containing alkynyl, aryl, consisting of hydrogen, oxygen, aliphatic, aryl, Substituted heteroatom-containing aryl, alkoxy, heteroatom-containing aliphatic, Substituted aryl, heteroatom-containing aliphatic, alkoxy, aryloxy, heteroatom-containing aryloxy, halo, heteroatom-containing aryl, Substituted heteroatom-contain 40 hydroxyl ( OH), sulfhydryl ( -SH), substituted sulfhydryl, ing aliphatic, Substituted heteroatom-containing aryl, alkoxy, carbonyl (-CO—), thiocarbonyl, (—CS ), carboxy aryloxy, and functional groups (FG) or are taken together to (—COOH), amino ( NH), substituted amino, nitro form a ring. 
(—NO), nitroso ( NO), sulfo (—SO OH), cyano The term “aliphatic' is used in the conventional sense to (—C=N), cyanato ( O C=N), thiocyanato ( S refer to an open-chain or cyclic, linear or branched, saturated 45 C=N), formyl ( CO-H), thioformyl ( CS H), or unsaturated hydrocarbon group, including but not limited phosphono ( -P(O)OH), Substituted phosphono, and phos to alkyl group, alkenyl group and alkynyl groups. The term pho ( PO). "heteroatom-containing aliphatic' as used herein refer to an In particular, the Substituents R. RandR of formula I can aliphatic moiety where at least one carbon atom is replaced be independently selected from hydrogen, C-C alkyl, with a heteroatom. 50 C-C substituted alkyl, C-C Substituted heteroatom-con The term “alkyl and “alkyl group’ as used herein refers to taining alkyl, C-C Substituted heteroatom-containing a linear, branched, or cyclic Saturated hydrocarbon typically alkyl, C-C alkenyl, C-C Substituted alkenyl, C-C, containing 1 to 24 carbon atoms, preferably 1 to 12 carbon Substituted heteroatom-containing alkenyl, C-C substi atoms, such as methyl, ethyl, n-propyl, isopropyl. n-butyl, tuted heteroatom-containing alkenyl, Cs-C aryl, Cs-Ca isobutyl, t-butyl, octyl, decyl and the like. The term "heteroa 55 Substituted aryl, Cs-C. Substituted heteroatom-containing tom-containing alkyl as used herein refers to an alkyl moiety aryl, Cs-C. Substituted heteroatom-containing aryl, C-C, where at least one carbonatom is replaced with a heteroatom, alkoxy, Cs-Caryloxy, carbonyl, thiocarbonyl, and carboxy. e.g. oxygen, nitrogen, Sulphur, phosphorus, or silicon, and More in particular, R. R. and R of formula I can be inde typically oxygen, nitrogen, or Sulphur. 
The term "alkenyl" and "alkenyl group" as used herein refers to a linear, branched, or cyclic hydrocarbon group of 2 to 24 carbon atoms, preferably of 2 to 12 carbon atoms, containing at least one double bond, such as ethenyl, n-propenyl, isopropenyl, n-butenyl, isobutenyl, octenyl, decenyl, and the like. The term "heteroatom-containing alkenyl" as used herein refers to an alkenyl moiety where at least one carbon atom is replaced with a heteroatom.

pendently selected from hydrogen, C-C alkyl, C-C substituted alkyl, C-C substituted heteroatom-containing alkyl, C-C substituted heteroatom-containing alkyl, C-C alkenyl, C-C substituted alkenyl, C-C substituted heteroatom-containing alkenyl, C-C substituted heteroatom-containing alkenyl, Cs-C aryl, Cs-C substituted aryl, Cs-C substituted heteroatom-containing aryl, Cs-C substituted heteroatom-containing aryl, C2-C alkoxy, Cs-C aryloxy, carbonyl, thiocarbonyl, and carboxy.

As used herein, the term "alkylthio" refers to an alkyl group having a sulfur atom that connects the alkyl group to the point of attachment: i.e., alkyl-S-. As for alkyl groups, alkylthio groups can have any suitable number of carbon atoms, such as Ce or C. Alkylthio groups include, for example, methoxy, ethoxy, propoxy, iso-propoxy, butoxy, 2-butoxy, iso-butoxy, sec-butoxy, tert-butoxy, pentoxy, hexoxy, etc. Alkylthio groups can be optionally substituted with one or more moieties selected from halo, hydroxy, amino, alkylamino, alkoxy, haloalkyl, carboxy, amido, nitro, oxo, and cyano. Thioethers have the general structure R-S-R, wherein R is any alkyl, alkynyl, alkenyl, aryl (substituted or unsubstituted).

As used herein, the term "haloalkyl" refers to an alkyl moiety as defined above substituted with at least one halogen atom.

As used herein, the term "acyl" refers to a moiety -C(O)R, wherein R is an alkyl group.

As used herein, the term "oxo" refers to an oxygen atom that is double-bonded to a compound (i.e., O=).

As used herein, the term "carboxy" refers to a moiety -C(O)OH. The carboxy moiety can be ionized to form the carboxylate anion.

As used herein, the term "amino" refers to a moiety -NR, wherein each R group is H or alkyl.

As used herein, the term "amido" refers to a moiety -NRC(O)R or C(O)NR', wherein each R group is H or alkyl.

The terms "engineered heme enzyme" and "heme enzyme variant" include any heme-containing enzyme comprising at least one amino acid mutation with respect to wild-type and also include any chimeric protein comprising recombined sequences or blocks of amino acids from two, three, or more different heme-containing enzymes that will improve its S-N sulfimidation activity or other reactions disclosed herein.

The terms "engineered cytochrome P450" and "cytochrome P450 variant" include any cytochrome P450 enzyme comprising at least one amino acid mutation with respect to wild-type and also include any chimeric protein comprising recombined sequences or blocks of amino acids from two, three, or more different cytochrome P450 enzymes.

As used herein, the term "whole cell catalyst" includes microbial cells expressing heme containing enzymes, engineered cytochrome P450, or a cytochrome P450 variant, where the whole cell displays sulfimidation activity or sulfoximidation activity.

As used herein, the term "nitrene equivalent" or "nitrene precursor" includes molecules that can be decomposed in the presence of metal (or enzyme) catalysts to structures that contain at least one monovalent nitrogen atom with only 6 valence shell electrons and that can be transferred to a sulfur to form sulfilimines or sulfoximines.

As used herein, the terms "microbial," "microbial organism" and "microorganism" include any organism that exists as a microscopic cell that is included within the domains of archaea, bacteria or eukarya. Therefore, the term is intended to encompass prokaryotic or eukaryotic cells or organisms having a microscopic size and includes bacteria, archaea and eubacteria of all species as well as eukaryotic microorganisms such as yeast and fungi. Also included are cell cultures of any species that can be cultured for the production of a chemical.

As used herein, the term "non-naturally occurring," when used in reference to a microbial organism or enzyme activity of the disclosure, is intended to mean that the microbial organism or enzyme has at least one genetic alteration not normally found in a naturally occurring strain of the referenced species, including wild-type strains of the referenced species. Genetic alterations include, for example, modifications introducing expressible nucleic acids encoding metabolic polypeptides, other nucleic acid additions, nucleic acid deletions and/or other functional disruption of the microbial organism's genetic material. Such modifications include, for example, coding regions and functional fragments thereof, for heterologous, homologous or both heterologous and homologous polypeptides for the referenced species. Additional modifications include, for example, non-coding regulatory regions in which the modifications alter expression of a gene or operon.

As used herein, the term "anaerobic", when used in reference to a reaction, culture or growth condition, is intended to mean that the concentration of oxygen is less than about 25 μM, typically less than about 5 μM, and commonly less than 1 μM. The term is also intended to include sealed chambers of liquid or solid medium maintained with an atmosphere of less than about 1% oxygen. Typically, anaerobic conditions are achieved by sparging a reaction mixture with an inert gas such as nitrogen or argon.

As used herein, the term "exogenous" is intended to mean that the referenced molecule or the referenced activity is introduced into the host microbial organism. The term as it is used in reference to expression of an encoding nucleic acid refers to the introduction of the encoding nucleic acid in an expressible form into the microbial organism using recombinant DNA techniques. When used in reference to a biosynthetic activity, the term refers to an activity that is introduced into the host reference organism.

The term "heterologous" as used herein with reference to molecules, and in particular enzymes and polynucleotides, indicates molecules that are expressed in an organism other than the organism from which they originated or are found in nature, independently of the level of expression that can be lower, equal or higher than the level of expression of the molecule in the native microorganism.

On the other hand, the term "native" or "endogenous" as used herein with reference to molecules, and in particular enzymes and polynucleotides, indicates molecules that are expressed in the organism in which they originated or are found in nature, independently of the level of expression that can be lower, equal or higher than the level of expression of the molecule in the native microorganism. It is understood that expression of native enzymes or polynucleotides may be modified in recombinant microorganisms.

The term "homolog," as used herein with respect to an original enzyme or gene of a first family or species, refers to distinct enzymes or genes of a second family or species which are determined by functional, structural and/or genomic analyses to be an enzyme or gene of the second family or species which corresponds to the original enzyme or gene of the first family or species. Homologs most often have functional, structural, or genomic similarities. Techniques are known by which homologs of an enzyme or gene can readily be cloned using genetic probes and PCR. Identity of cloned sequences as homolog can be confirmed using functional assays and/or by genomic mapping of the genes.

A protein has "homology" or is "homologous" to a second protein if the amino acid sequence encoded by a gene has a similar amino acid sequence to that of the second gene. Alternatively, a protein has homology to a second protein if the two proteins have "similar" amino acid sequences. Thus, the term "homologous proteins" is intended to mean that the two proteins have similar amino acid sequences. In particular embodiments, the homology between two proteins is indicative of its shared ancestry, related by evolution.

The terms "analog" and "analogous" include nucleic acid or protein sequences or protein structures that are related to one another in function only and are not from common descent or do not share a common ancestral sequence. Analogs may differ in sequence but may share a similar structure, due to convergent evolution.

including, but not limited to, cobalt, rhodium, copper, ruthenium, and manganese, which are active sulfimidation or sulfoximidation catalysts.

In some embodiments, the heme enzyme is a member of one of the enzyme classes set forth in Table 1. In other
For example, two enzymes are embodiments, the heme enzyme is a variant or homolog of a analogs or analogous if the enzymes catalyze the same reac member of one of the enzyme classes set forth in Table 1. In tion of conversion of a Substrate to a product, are unrelated in yet other embodiments, the heme enzyme comprises or con sequence, and irrespective of whether the two enzymes are sists of the heme domain of a member of one of the enzyme related in structure. 10 classes set forth in Table 1 or a fragment thereof (e.g., a The term “contact’ as used herein with reference to inter truncated heme domain) that is capable of carrying out the actions of chemical units indicates that the chemical units are nitrene insertion and nitrene transfer reactions described at a distance that allows short range non-covalent interactions herein. (such as Van der Waals forces, hydrogen bonding, hydropho 15 bic interactions, electrostatic interactions, dipole-dipole TABLE 1 interactions) to dominate the interaction of the chemical Heme enzymes identified by their enzyme classification units. For example, when an enzyme is contacted number (EC number) and classification name. with a target molecule, the enzyme is allowed to interact with and bind to the organic molecule through non-covalent inter ECNumber Name actions so that a reaction between the enzyme and the target .1.2.3 L-lactate dehydrogenase molecule can occur. .1.2.6 polyvinyl alcohol dehydrogenase (cytochrome) 12.7 methanol dehydrogenase (cytochrome c) The term “introducing as used herein with reference to the 1.S.S alcohol dehydrogenase (quinone) interaction between two chemical units, such as a functional 1.5.6 formate dehydrogenase-N: groups and a target site, indicates a reaction resulting in the 25 .1.9.1 alcohol dehydrogenase (aZurin): 1993 gluconate 2-dehydrogenase (acceptor) formation of a bond between the two chemical units, e.g. the 1.99.11 fructose 5-dehydrogenase functional group and the target site. 
1.99.18 cellobiose dehydrogenase (acceptor) The disclosure provides an enzymatic process for the 1.99.20 alkan-1-oldehydrogenase (acceptor) Suflimidation or Sulfoximidation of a target Sulfur atom b a 2.1.70 glutamyl-tRNA reductase 30 2.3.7 indole-3-acetaldehyde oxidase nitrene precursor. The methods and compositions of the dis 2.99.3 aldehyde dehydrogenase (pyrroloquinoline-quinone) closure includes a heme enzyme (e.g., an engineered .3.1.6 fumarate reductase (NADH): P450BM3 of variant) an nitrene precursor (e.g., an azide or 3.5.1 Succinate dehydrogenase (ubiquinone) 3.54 fumarate reductase (menaquinone) other compound that comprises a nitrogen atom that can be 3.99.1 Succinate dehydrogenase used as a nitrene) and a target organosulfur compound, 35 .4.9.1 methylamine dehydrogenase (amicyanin) wherein the compound comprises a Sulfur atom that is linked .4.9.2. aralkylamine dehydrogenase (aZurin) 5.1.20 methylenetetrahydrofolate reductase NAD(P)H) to a nitrogen to form a new N–C bond using the enzymatic 5.99.6 spermidine dehydrogenase process of the disclosure. 6.3.1 NAD(P)H oxidase In some embodiments, the enzyme is a heme-containing .7.1.1 reductase (NADH) enzyme or a variant thereof. The wording “heme' or “heme 40 .7.1.2 Nitrate reductase NAD(P)H) .7.1.3 nitrate reductase (NADPH) domain as used herein refers to an amino acid sequence .7.1.4 nitrite reductase NAD(P)H) within an enzyme, which is capable of binding an iron-com .7.1.14 reductase NAD(P), nitrous oxide-forming plexing structure Such as a porphyrin. Compounds of iron are 7.2.1 nitrite reductase (NO-forming) 7.2.2 nitrite reductase (cytochrome; ammonia-forming) typically complexed in a porphyrin (tetrapyrrole) ring that 7.2.3 trimethylamine-N-oxide reductase (cytochrome c) may differinside chain composition. Heme groups can be the 45 7.2.5 nitric oxide reductase (cytochrome c) prosthetic groups of cytochromes and are found in most oxy 7.2.6 hydroxylamine dehydrogenase gen carrier proteins. 
Exemplary heme domains include that of 7.3.6 hydroxylamine oxidase (cytochrome) 7.5.1 nitrate reductase (quinone) P450, as well as truncated or mutated versions of these that 7.5.2 nitric oxide reductase (menaquinol) retain the capability to bind the iron-complexing structure. A 7.6.1 nitrite dismutase skilled person can identify the heme domain of a specific 50 7.7.1 ferredoxin-nitrite reductase 7.7.2 ferredoxin-nitrate reductase protein using methods known in the art. 7.99.4 nitrate reductase The terms “heme enzyme” and “heme protein’ are used 7.99.8 hydrazine herein to include any member of a group of proteins contain .8.1.2 Sulfite reductase (NADPH) ing heme as a prosthetic group. Non-limiting examples of .8.2.1 Sulfite dehydrogenase heme enzymes include , cytochromes, oxidoreduc 55 .8.2.2 thiosulfate dehydrogenase 8.23 Sulfide-cytochrome-c reductase (flavocytochrome c) tases, any other protein containing a heme as a prosthetic .8.2.4 dimethylsulfide:cytochrome c2 reductase group, and combinations thereof. Heme-containing globins 8.31 sulfite oxidase include, but are not limited to, , , and 8.7.1 Sulfite reductase (ferredoxin) combinations thereof. Heme-containing cytochromes 8.98.1 CoB-CoM heterodisulfide reductase 8.99.1 sulfite reductase include, but are not limited to, cytochrome P450, cytochrome 60 8.99.2 adenylyl-sulfate reductase b, cytochrome cl, cytochrome c, and combinations thereof. 8.99.3 hydrogensulfite reductase Heme-containing include, but are not lim 9.3.1 cytochrome-c oxidase ited to, a catalase, an oxidase, an oxygenase, a haloperoxi 96.1 nitrate reductase (cytochrome) .10.2.2 ubiquinol-cytochrome-c reductase dase, a , and combinations thereof. 
103.1 catechol oxidase In certain instances, the heme enzymes are metal-Substi 65 10.3.B1 caldariellaquinol oxidase (H---transporting) tuted heme enzymes containing protoporphyrin IX or other 10.3.3 L-ascorbate oxidase porphyrin molecules containing metals other than iron, US 9,399,762 B2 15 16 TABLE 1-continued TABLE 1-continued Heme enzymes identified by their enzyme classification Heme enzymes identified by their enzyme classification number (EC number) and classification name. number (EC number) and classification name.

EC Number Name ECNumber Name 103.9 photosystem II 2.7.1.3.3 histidine kinase 10.3.10 ubiquinol oxidase (H+-transporting) 3.14.52 cyclic-guanylate-specific phosphodiesterase 10.3.11 ubiquinol oxidase 4.2.1.B9 colneleic acid etheroleic acid synthase 10.3-12 menaquinol oxidase (H+-transporting) 4.2.1.22 Cystathionine beta-synthase 10.9. plastoquinol-plastocyanin reductase 10 4.2.192 hydroperoxide dehydratase 1.5 cytochrome-c peroxidase 4.2.1.212 colneleate synthase .1.6 catalase 4.3.1.26 chromopyrrolate synthase 1.7 peroxidase 4.6.1.2 guanylate cyclase 1.B2 chloride peroxidase (vanadium-containing) 4.99.13 Sirohydrochlorin cobaltochelatase 1.B7 bromide peroxidase (heme-containing) 4.99.15 aliphatic aldoxime dehydratase 1.8 iodide peroxidase 15 4.99.17 phenylacetaldoxime dehydratase 1.10 chloride peroxidase 5.3.99.3 prostaglandin-E synthase 1.11 L-ascorbate peroxidase 5.3.99.4 prostaglandin-I synthase 1.13 manganese peroxidase 5.3.99.5 Thromboxane-A synthase .1.14 lignin peroxidase 5.445 9,12-octadecadienoate 8-hydroperoxide 8R- 1.16 versatile peroxidase 5.4.4.6 9,12-octadecadienoate 8-hydroperoxide 8S-isomerase 1.19 dye decolorizing peroxidase 6.6.1.2 cobaltochelatase 1.21 catalase-peroxidase 2. unspecific peroxygenase 2.2 myeloperoxidase In some embodiments, the heme enzyme is a variant or a 2.3 plant seed peroxygenase .11.2.4 atty-acid peroxygenase fragment thereof (e.g., a truncated variant containing the .12.2. cytochrome-c3 hydrogenase heme domain) comprising at least one mutation such as, e.g., 12.5. hydrogen:Quinone oxidoreductase 25 a mutation at the axial position of the heme coordination site. 
12.99.6 hydrogenase (acceptor) In some instances, the mutation is a Substitution of the native 13.11.9 2,5-dihydroxypyridine 5,6-dioxygenase .13.11.11 tryptophan 2,3-dioxygenase residue with Ala, Asp, Arg, ASn, Cys, Glu, Gln, Gly, His, Ile, 13.11.49 chlorite O2- Lys, Leu, Met, Phe, Pro, Ser, Thr, Trp, Tyr, or Val at the axial 13.11.50 acetylacetone-cleaving enzyme position. In certain instances, the mutation is a Substitution of 13.11.52 indoleamine 2,3-dioxygenase 30 CyS with any other amino acid Such as Ser at the axial posi 13.11.60 inoleate 8R-lipoxygenase tion. 13.99.3 tryptophan 2'-dioxygenase .14.11.9 lavanone 3-dioxygenase In certain embodiments, the invitro methods for producing 14.12.17 nitric oxide dioxygenase a product described herein comprise providing a heme 14.1339 nitric-oxide synthase (NADPH dependent) enzyme, variant, or homolog thereof with a reducing agent 14.13.17 cholesterol 7alpha-monooxygenase .14.13.41 tyrosine N-monooxygenase 35 such as NADPH oradithionite salt (e.g., NaSO). In certain 14.13.70 sterol 14alpha-demethylase other embodiments, the in vivo methods for producing a 14.13.71 N-methylcoclaurine 3'-monooxygenase reaction product provided herein comprise providing whole 14.13.81 magnesium-protoporphyrin IX monomethyl ester cells such as E. coli cells expressing a heme enzyme, variant, (oxidative) cyclase 14.13.86 2-hydroxyisoflavanone synthase or homolog thereof. 14.13.98 cholesterol 24-hydroxylase 40 In some embodiments, the heme enzyme, variant, or 14.13.119 5-epiaristolochene 1,3-dihydroxylase homolog thereof is recombinantly expressed and optionally 14.13.126 vitamin D3 24-hydroxylase isolated and/or purified for carrying out the in vitro sulfimi 14.13.129 beta-carotene 3-hydroxylase dation or sulfoximidation reactions of the disclosure. 
In other .14.13.141 cholest-4-en-3-one 26-monooxygenase .14.13.142 3-ketosteroid 9alpha-monooxygenase embodiments, the heme enzyme, variant, or homolog thereof 14.13.151 linalool 8-monooxygenase 45 is expressed in whole cells Such as E. coli cells, and these cells 14.13.156 1,8-cineole 2-endo-monooxygenase are used for carrying out the in vivo nitrene insertion activity 14.13.159 vitamin D 25-hydroxylase .14.14. unspecific monooxygenase and/or nitrene transfer activity of the disclosure. .14.15. camphor 5-monooxygenase In certain embodiments, the heme enzyme, variant, or 14.15.6 cholesterol monooxygenase (side-chain-cleaving) homolog thereof comprises or consists of the same number of 14.15.8 steroid 15beta-monooxygenase 50 amino acid residues as the wild-type enzyme (i.e., a full 14.15.9 spheroidene monooxygenase .14.18. length polypeptide). In some instances, the heme enzyme, .14.19. stearoyl-CoA 9-desaturase variant, or homolog thereof comprises or consists of an amino 14.19.3 linoleoyl-CoA desaturase acid sequence without the start methionine (e.g., P450BM3 421.7 biflaviolin synthase amino acid sequence set forth in SEQ ID NO:1). In other 4.99. prostaglandin-endoperoxide synthase 55 4.99.3 embodiments, the heme enzyme comprises or consists of a 4.99.9 steroid 17alpha-monooxygenase heme domain fused to a reductase domain. In yet other 4.99.10 steroid 21-monooxygenase embodiments, the heme enzyme does not contain a reductase 4.99.15 4-methoxybenzoate monooxygenase (O-demethylating) domain, e.g., the heme enzyme contains a heme domain only 4.99.45 caroteine epsilon-monooxygenase 6.5.1 ascorbate ferrireductase (transmembrane) or a fragment thereof Such as a truncated heme domain. 
1.16.9.1    iron:rusticyanin reductase
1.17.1.4    xanthine dehydrogenase
1.17.2.2    lupanine 17-hydroxylase (cytochrome c)
1.17.99.1   4-methylphenol dehydrogenase (hydroxylating)
1.17.99.2   ethylbenzene hydroxylase
1.97.1.1    chlorate reductase
1.97.1.9    selenate reductase
2.7.7.65    diguanylate cyclase

In some embodiments, the heme enzyme, variant, or homolog thereof has an enhanced nitrene insertion activity and/or nitrene transfer activity of about 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 fold compared to the corresponding wild-type heme enzyme.

In some embodiments, the heme enzyme comprises a heme domain fused to a reductase domain. In other embodiments, the heme enzyme does not comprise a reductase domain, e.g., a heme domain only or a fragment thereof.

In one embodiment, the disclosure provides a method for catalyzing a nitrene insertion into an S-bond to produce a product having a new S-N bond. The method comprises the steps of providing an -S- containing substrate, a nitrene precursor and an engineered heme enzyme; and allowing the reaction to proceed for a time sufficient to form a product having a new S-N bond.

In particular embodiments, the heme enzyme comprises a cytochrome P450 enzyme. Cytochrome P450 enzymes constitute a large superfamily of heme-thiolate proteins involved in the metabolism of a wide variety of both exogenous and endogenous compounds. Usually, they act as the terminal oxidase in multicomponent electron transfer chains, such as P450-containing monooxygenase systems. Members of the cytochrome P450 enzyme family catalyze myriad oxidative transformations, including, e.g., hydroxylation, epoxidation, oxidative ring coupling, heteroatom release, and heteroatom oxygenation (E. M. Isin et al., Biochim. Biophys. Acta 1770, 314 (2007)). The active site of these enzymes contains an Fe(III)-protoporphyrin IX cofactor (heme) ligated proximally by a conserved cysteine thiolate (M. T. Green, Current Opinion in Chemical Biology 13, 84 (2009)). The remaining axial iron coordination site is occupied by a water molecule in the resting enzyme, but during native catalysis, this site is capable of binding molecular oxygen. In the presence of an electron source, typically provided by NADH or NADPH from an adjacent fused reductase domain or an accessory cytochrome P450 reductase enzyme, the heme center of cytochrome P450 activates molecular oxygen, generating a high valent iron(IV)-oxo porphyrin cation radical species intermediate and a molecule of water.

In some embodiments, the engineered heme enzyme is a cytochrome P450 enzyme or a variant thereof. In certain embodiments, the P450 enzyme is a member of one of the classes shown in Table 2 (see, e.g. http://www.icegeb.org/~p450SrV/P450enzymes.html), the disclosure of which is incorporated herein by reference in its entirety.

TABLE 2
Heme enzymes identified by their enzyme classification number (EC number) and classification name.

EC          Recommended name                                  Family/gene
1.3.3.9     secologanin synthase                              CYP72A1
1.14.13.11  trans-cinnamate 4-monooxygenase                   CYP73
1.14.13.12  benzoate 4-monooxygenase                          CYP53
1.14.13.13  calcidiol 1-monooxygenase                         CYP27
1.14.13.15  cholestanetriol 26-monooxygenase                  CYP27

TABLE 2-continued
Heme enzymes identified by their enzyme classification number (EC number) and classification name.

EC          Recommended name                                  Family/gene
1.14.13.71  N-methylcoclaurine 3'-monooxygenase               CYP80B1
1.14.13.73  tabersonine 16-hydroxylase                        CYP71D12
1.14.13.74  7-deoxyloganin 7-hydroxylase
1.14.13.75  vinorine hydroxylase
1.14.13.76  taxane 10β-hydroxylase                            CYP725A1
1.14.13.77  taxane 13α-hydroxylase                            CYP725A2
1.14.13.78  ent-kaurene oxidase                               CYP701
1.14.13.79  ent-kaurenoic acid oxidase                        CYP88A
1.14.14.1   unspecific monooxygenase                          multiple
1.14.15.1   camphor 5-monooxygenase                           CYP101
1.14.15.3   alkane 1-monooxygenase                            CYP4A
1.14.15.4   steroid 11β-monooxygenase                         CYP11B
1.14.15.5   corticosterone 18-monooxygenase                   CYP11B
1.14.15.6   cholesterol monooxygenase (side-chain-cleaving)   CYP11A
1.14.21.1   (S)-stylopine synthase
1.14.21.2   (S)-cheilanthifoline synthase
1.14.21.3   berbamunine synthase                              CYP80
1.14.21.4   salutaridine synthase
1.14.21.5   (S)-canadine synthase
1.14.99.9   steroid 17α-monooxygenase                         CYP17
1.14.99.10  steroid 21-monooxygenase                          CYP21
1.14.99.22  ecdysone 20-monooxygenase
1.14.99.28  linalool 8-monooxygenase                          CYP111
4.2.1.92    hydroperoxide dehydratase                         CYP74
5.3.99.4    prostaglandin-I synthase                          CYP8
5.3.99.5    thromboxane-A synthase                            CYP5

In some embodiments, the heme enzyme variant comprises a mutation at the axial position of the heme coordination site. In some instances, the mutation is an amino acid substitution of the naturally occurring residue at this position with Ala, Asp, Arg, Asn, Cys, Glu, Gln, Gly, His, Ile, Lys, Leu, Met, Phe, Pro, Ser, Thr, Trp, Tyr or Val at the axial position. In other instances, the mutation is an amino acid substitution of Cys with Asp or Ser at the axial position.

In some embodiments, the engineered heme enzyme is expressed in a bacterial, archaeal or fungal host organism.

In some embodiments, the cytochrome P450 enzyme is a P450 BM3 enzyme or a variant thereof. In some instances, the P450 BM3 enzyme comprises the amino acid sequence set forth in SEQ ID NO: 1 or a variant thereof.

In some embodiments, the P450 enzyme variant comprises a mutation at the axial position of the heme coordination site. In some instances, the mutation is an amino acid substitution of Cys with Ala, Asp, Arg, Asn, Glu, Gln, Gly, His, Ile, Lys, Leu, Met, Phe, Pro, Ser, Thr, Trp, Tyr or Val at the axial position. In other instances, the mutation is an amino acid substitution of Cys with Asp or Ser at the axial position.
14.13.17 cholesterol 7C.-monooxygenase CYP7 .14.13.21 flavonoid 3'-monooxygenase CYP75 In some embodiments, the P450BM3 enzyme comprises at 14.13.28 3,9-dihydroxypterocarpan 6a- CYP93A1 least one, two, three, four, five, six, seven, eight, nine, ten, monooxygenase eleven, twelve, or all thirteen of the following amino acid 14.1330 eukotriene-Ba 20-monooxygenase CYP4F substitutions in SEQID NO:1: V78A, F87V, P142S, T175I, 14.1337 methyltetrahydroprotoberberine 14- CYP93A1 55 monooxygenase A184V, S226R, H236Q, E252G, T268A, A290V. L353V, .14.13.41 tyrosine N-monooxygenase CYP79 I366V, and E442K. .14.13.42 hydroxyphenylacetonitrile 2 In some embodiments, the cytochrome P450BM3 enzyme monooxygenase variant comprises a T268A mutation and/or a C400X muta 14.1347 (-)-limonene 3-monooxygenase .14.13.48 (-)-limonene 6-monooxygenase tion in SEQID NO:1, wherein X is any amino acid other than 14.13.49 (-)-limonene 7-monooxygenase 60 Cys. In another embodiment, the cytochrome P450BM3 14.1352 isoflavone 3'-hydroxylase enzyme variant comprises a T438S mutation and/or a C400X 1413.53 isoflavone 2'-hydroxylase mutation in SEQID NO:1, wherein X is any amino acid other 14.13.SS protopine 6-monooxygenase 1413.56 dihydrosanguinarine 10-monooxygenase — than CyS. 14.13.57 dihydrochelirubine 12-monooxygenase In one embodiment, the heme enzyme variant for use in the 14.13.60 27-hydroxycholesterol 7C-monooxygenase — 65 catalysis of a nitrene insertion into a —S—bond to produce 14.13.70 sterol 14-demethylase CYP51 a product having a new S N bond is a P450BM3 variant comprising the following amino acid substitutions to SEQID US 9,399,762 B2 19 20 NO: 1: V78A, F87V, P142S, T175I, A184V, S226R, H236Q, 2003/008563; EP 1270722; US 20020187538; WO 2002/ E252G, T268A, A29OV, L353V, I366V, and E442K. 
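The substitution labels used above (e.g., V78A, T268A, and C400X, where X is any amino acid other than Cys) follow the usual wild-type residue, position, new residue pattern. A minimal sketch of how such labels could be parsed and applied is given below. The helper names and the five-residue toy peptide are hypothetical illustrations only; the toy peptide is not SEQ ID NO:1, and 1-based residue numbering is assumed.

```python
import re

# One-letter codes of the 20 natural amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_substitution(label):
    """Split a label such as 'V78A' into (wild-type residue, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_substitutions(sequence, labels):
    """Return a copy of `sequence` with every labeled substitution applied.

    Raises ValueError if the residue found at a position does not match
    the wild-type residue named in the label.
    """
    residues = list(sequence)
    for label in labels:
        wild_type, position, new = parse_substitution(label)
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(f"{label}: expected {wild_type} at {position}, found {found}")
        residues[position - 1] = new
    return "".join(residues)


def expand_wildcard(wild_type, position):
    """List every label covered by a wildcard such as C400X (X is any residue except wild type)."""
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]


# Hypothetical toy peptide, used only to exercise the helpers.
toy_peptide = "MTCKV"
print(apply_substitutions(toy_peptide, ["C3S", "V5A"]))
print(len(expand_wildcard("C", 400)))
```

The wild-type check is a design choice of this sketch: it catches numbering mismatches, such as a sequence that includes or omits the start methionine, before a substitution is silently applied at the wrong position.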
In 092801; WO 2002/08.8341; US 2002.0160950, WO 2002/ another embodiment, the heme variant optionally comprises 083868; US 20020142379; WO 2002/072758; WO 2002/ the following additional amino acid substitutions to SEQID 064765; US 20020076777; US 20020076774; US NO:1: L75A, I263A and L437A. In yet another embodiment, 20020076774; WO 2002/046386; WO 2002/044213; US the heme variant optionally comprises the additional amino 20020061566; CN 1315335; WO 2002/034922; WO 2002/ acid substitution C400S to SEQID NO:1. In some embodi 033057; WO 2002/029018: WO 2002/018558; JP ments, the heme enzyme variant is the H2-5-F10 variant (see, 2002058490; US 20020022254; WO 2002/008269; WO Table 7). In other embodiments, the heme enzyme variant is 2001/098461; WO 2001/081585; WO 2001/051622; WO the P411-CIS variant (see, Table 4). 10 Table 3 below lists additional cyctochrome P450 enzymes 2001/034780: CN 1271005; WO 2001/011071: WO 2001/ that are suitable for use in the sulfimidation or sulfoximida 0.07630; WO 2001/007574; WO 2000/078973: U.S. Pat. No. tion reactions of the disclosure. The accession numbers in 6,130,077; JP 2000152788; WO 2000/031273; WO 2000/ Table 3 are incorporated herein by reference in their entirety 020566; WO 2000/000585; DE 19826821; JP 11235174; for all purposes. The cytochrome P450 gene and/or protein 15 U.S. Pat. No. 5,939,318: WO 99/19493: WO 99/18224; U.S. sequences disclosed in the following patent documents are Pat. No. 5,886, 157; WO 99/08812; U.S. Pat. No. 5,869,283; hereby incorporated by reference in their entirety for all pur JP 10262665; WO98/40470; EP 776974; DE 19507546; GB poses: WO 2013/076258; CN103160521; CN 103223219; 2294692; U.S. Pat. No. 5,516,674; JP 07147975; WO KR 2013081394.JP5222410; WO 2013/073775; WO 2013/ 94/29434; JP 06205685; JP 05292.959; JP 04144680, DD 054890, WO 2013/048.898; WO 2013/031975; WO 2013/ 298820; EP 477961; SU 1693043;JP 0104.7375; EP 281245; 064.41 1; U.S. Pat. No. 
8,361,769; WO 2012/150326, CN JP 62104583; JP 63.044888; JP 62236485; JP 62104582; and 102747053; CN 102747052; JP 2012170409, WO 2013/1 JP 62O19084. 15484; CN 103223219; KR 2013081394; CN 103194461; JP 5222410; WO 2013/086499; WO 2013/076258; WO 2013/ TABLE 3 073775; WO 2013/064.41 1: WO 2013/054890, WO 2013/ 25 03.1975; U.S. Pat. No. 8,361,769; WO 2012/156976: WO Additional cytochrome P450 enzymes of the disclosure. 2012/150326; CN 102747053; CN 102747052; US SEQ 20120258938; JP 2012170409; CN 102399796; JP Species Cyp No. Accession No. ID NO 2012055274; WO 2012/029914; WO 2012/028709; WO 201 Bacilius negaterium 1O2A1 AAA876O2 1 1/154523; JP 201 1234631; WO 2011/121456; EP2366782: 30 Bacilius negaterium 1O2A1 ADAS7069 2 WO 2011/105241; CN 102154234: WO 2011/093185; WO Bacilius negaterium 1O2A1 ADAS7068 3 2011/093.187; WO 2011/093186: DE 102010000168; CN Bacillus megaterium 102A1 ADAS7062 4. 1021 15757; CN 102093984; CN 102080069; JP Bacilius negaterium 1O2A1 ADAS7061 5 Bacilius negaterium 1O2A1 ADAS7059 6 2011103864; WO 2011/042143: WO 2011/038313; JP Bacilius negaterium 1O2A1 ADAS7058 7 2011055721; WO 2011/025203; JP 2011024534; WO 2011/ 35 Bacilius negaterium 1O2A1 ADAS70SS 8 008231; WO 2011/008232; WO 2011/005786; IN Bacilius negaterium 1O2A1 ACZ37122 9 2009DE01216; DE 102009025996; WO 2010/134096; JP Bacilius negaterium 1O2A1 ADAS7057 10 2010233523; JP 2010220609; WO 2010/095721; WO 2010/ Bacilius negaterium 1O2A1 ADAS70S6 11 064764; US 20100136595; JP 2010051 174; WO 2010/ Mycobacterium sp. 
HXN-1500 153A6 CAHO4396 12 40 Tetrahymena thermophile SO13 C2 ABYS9989 13 024437; WO 2010/01 1882; WO 2009/108388: US Nonomiraea dietziae AGE14547.1 14 20090209010; US 20090124515; WO 2009/041470; KR Homo sapiens 2R1 NP 078790 15 2009028942: WO 2009/039487; WO 2009/020231; JP Macca mulatia 2R1 NP 001180887.1 16 2009005687; CN 101333520; CN 101333521; US Canis familiaris 2R1 XP 85.4533 17 20080248545; JP 20082371.10: CN 101275141; WO 2008/ Mits musculus 2R1 AAIO8963 18 45 Bacilius halodurans C-125 52A6 NP 242623 19 118545; WO 2008/115844; CN 101255408; CN 101250506; Streptomyces parvus aryC AFM80O22 2O CN 101250505: WO 2008/098198: WO 2008/096695; WO Pseudomonas puttida O1A1 POO183 21 2008/071673; WO 2008/073498: WO 2008/065370; WO Homo sapiens 2D7 AAO49806 22 2008/067070; JP 2008127301; JP 2008.054644; KR 794395; Rattus norvegicus C27 AABO2287 23 EP 1881066: WO 2007/147827; CN 101078014; JP Oryctolagus cliniculus 2B4 AAA65840 24 2007300852; WO 2007/048235; WO 2007/044688; WO 50 Bacilius subtiis O2A2 OO8394 25 Bacilius subtiis O2A3 OO8336 26 2007/032540: CN 1900286; CN 1900285; JP 2006340611; B. megaterium DSM32 O2A1 P14779 27 WO 2006/126723; KR 2006029792; KR 2006029795; WO B. ceretts ATCC14579 O2AS AAP1 O153 28 2006/105082: WO 2006/076094; US 2006/0156430; WO B. licheniformis ATTC1458 O2A7 YPO79990 29 2006/065126; JP 2006129836; CN 1746293.; WO 2006/ B. thuringiensis serovar X YPO37304 30 029398:JP 2006034215:JP 2006034214; WO 2006/009334; 55 konkukian WO 2005/111216; WO 2005/080572; US 2005/015.0002: Str. 97-27 R. metailidurans CH34 O2E1 YP 585608 31 WO 2005/061699, WO 2005/052152; WO 2005/038033; A. fumigatus Af293 505X EAL92660 32 WO 2005/038018: WO 2005/030944; JP 2005065618: WO A. nidulians FGSC A4 SOSA8 EAA58234 33 2005/017106; WO 2005/017105: US 20050037411; WO A. oryzae ATCC42149 SOSA3 Q2U4F1 34 2005/010166; JP 2005021106; JP 2005021104; JP 60 A. oryzae ATCC42149 X Q2UNA2 35 2005021105, WO 2004/113527; CN 1472323; JP Foxysportin SOSA1 Q9Y8G7 36 G. 
moniiformis X AAG27132 37 2004261121; WO 2004/013339; WO 2004/011648; DE G. Zege PH1 505A7 EAA67736 38 10234126; WO 2004/003190; WO 2003/087381; WO 2003/ G. Zege PH1 505C2 EAAA7183 39 078577; US 20030170627; US 20030166176; US M. grisea 70-15 syn 505A5 XP365223 40 2003015.0025, WO 2003/057830; WO 2003/052050: CN 65 N. crassa OR74A SOSA2 XP 96.1848 41 1358756; US 20030092658; US 20030078404; US Oryza sativa 97A 20030066103; WO 2003/014341; US 2003.0022334; WO US 9,399,762 B2 21 22 TABLE 3-continued through detailed mutagenesis studies in a conserved region of the protein (see, e.g., Shimizu et al., Biochemistry 27, 4138 Additional cytochrome P450 enzymes of the disclosure. 4141, 1988). In other instances, the conserved amino acid SEQ residue is identified through crystallographic study (see, e.g., Species Cyp No. Accession No. ID NO 5 Poulos et al., J. Mol. Biol 195:687-700, 1987). In yet other Oryza sativa 97B instances, protein sequence alignment algorithms can be used Oryza sativa 97C ABB47954 42 to identify the conserved amino acid residue. For example, BLAST alignment with the P450 BM3 amino acid sequence Note: as the query sequence can be used to identify the heme axial the start methionine (“M”) may be present or absent from these sequences, 10 ligand site and/or the equivalent T268 residue in other cyto In some embodiments, the heme enzyme variant comprises chrome P450 enzymes. afragment of the cytochrome P450 enzyme or variant thereof. Table 5 below provides non-limiting examples of cyto In some embodiments, the heme enzyme variant is a cyto chrome P450 BM3 variants of the disclosure. Each P450 chrome P450 BM3 enzyme variant selected from Table 4, BM3 variant comprises one or more of the listed mutations Table 5 and Table 6. (Variant Nos. 1-31), wherein a "+" indicates the presence of TABLE 4 Examplary cytochrome P450BM3 enzymes variants of the disclosure. 
P450 variants                  Mutations compared to wild-type P450 (SEQ ID NO: 1)
P450 (WT-BM3: SEQ ID NO: 1)    None
P450-C400A (WT-C400A)          C400A
P450-T268A (BM3-T268)          T268A
P411 (ABC)                     C400S
P411-T268A (ABC-T268A)
P411-T438S (ABC-T438A)

BM3-CIS + C400S + A268T (9-10ATS + P87V, C400S)

WT-BM3 (heme)    WT heme domain (amino acids 1-463 of SEQ ID NO: 1)
WTAxA (heme)     WT heme domain (amino acids 1-463 of SEQ ID NO: 1) + C400A
WTAxD (heme)     WT heme domain (amino acids 1-463 of SEQ ID NO: 1) + C400D
WTAxH (heme)     WT heme domain (amino acids 1-463 of SEQ ID NO: 1) + C400H
WTAxK (heme)     WT heme domain (amino acids 1-463 of SEQ ID NO: 1) + C400K
WTAxM (heme)     WT heme domain (amino acids 1-463 of SEQ ID NO: 1) + C400M
WTAxN (heme)     WT heme domain (amino acids 1-463 of SEQ ID NO: 1) + C400N
WTAxS (heme)     WT heme domain (amino acids 1-463 of SEQ ID NO: 1) + C400S
WTAxY (heme)     WT heme domain (amino acids 1-463 of SEQ ID NO: 1) + C400Y
BM3-CIS-T438S + C400A
BM3-CIS-T438S + C400D
BM3-CIS-T438S + C400M
BM3-CIS-T438S + C400Y
BM3-CIS-T438S + C400T

One skilled in the art will understand that any of the mutations listed in Table 4 can be introduced into any cytochrome P450 enzyme of interest by locating the segment of the DNA sequence in the corresponding cytochrome P450 gene which encodes the conserved amino acid residue as described above for identifying the conserved cysteine residue in a cytochrome P450 enzyme of interest that serves as the heme axial ligand. In certain instances, this DNA segment is identified

that particular mutation in the variant. Any of the variants listed in Table 4 can further comprise an I263A and/or an A328G mutation and/or at least one, two, three, four, or five of the following alanine substitutions, in any combination, in the P450 BM3 enzyme active site: L75A, M177A, L181A, I263A, and L437A. In particular embodiments, the P450 BM3 variant comprises or consists of the heme domain of any one of Variant Nos. 1-31 listed in Table 5 or a fragment thereof, wherein the fragment is capable of carrying out the nitrene transfer/sulfimidation of the disclosure.

block 5 (aa 217-268) from CYP102A2, and so on. Non-limiting examples of chimeric P450 proteins include those set

TABLE 5
Exemplary cytochrome P450 BM3 enzyme variants of the disclosure.
P450 variant Mutation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

C400X
T268A
F87V
9-10A-TS
T438Z

P450 variant Mutation 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

C400X
T268A
F87V
9-10A-TS
T438Z

Mutations relative to the wild-type P450 BM3 amino acid sequence (SEQ ID NO: 1); “X” is selected from Ala, Asp, Arg, Asn, Glu, Gln, Gly, His, Ile, Lys, Leu, Met, Phe, Pro, Ser, Thr, Trp, Tyr, and Val; “Z” is selected from Ala, Ser, and Pro; “9-10A-TS” includes the following amino acid substitutions in SEQ ID NO: 1: V78A,

In other aspects, the disclosure provides chimeric heme enzymes such as, e.g., chimeric P450 polypeptides comprised of recombined sequences from P450 BM3 and at least two, or more, distantly related P450 enzymes from Bacillus subtilis or variants. As a non-limiting example, site-directed recombination of three bacterial cytochrome P450s can be performed with sequence crossover sites selected to minimize the number of disrupted contacts within the protein structure. In some embodiments, seven crossover sites can be chosen, resulting in eight sequence blocks. One skilled in the art will understand that the number of crossover sites can be chosen to produce the desired number of sequence blocks, e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9 crossover sites for 2, 3, 4, 5, 6, 7, 8, 9, or 10 sequence blocks, respectively. In other embodiments, the numbering used for the chimeric P450 refers to the identity of the parent sequence at each block. For example, “12312312” refers to a sequence containing block 1 from P450 #1, block 2 from P450 #2, block 3 from P450 #3, block 4 from P450 #1, block 5 from P450 #2, and so on. A chimeric library useful for generating the chimeric heme enzymes of the invention can be constructed as described in U.S. Pat. Publ. No. US-2012-0171693-A1 to Arnold et al., the disclosure of which is incorporated herein for all purposes.

As a non-limiting example, chimeric P450 proteins comprising recombined sequences or blocks of amino acids from CYP102A1 (Accession No. J04832), CYP102A2 (Accession No. CAB12544), and CYP102A3 (Accession No. U93874) can be constructed. In certain instances, the CYP102A1 parent sequence is assigned “1”, the CYP102A2 parent sequence is assigned “2”, and the CYP102A3 parent sequence is assigned “3”. In some instances, each parent sequence is divided into eight sequence blocks containing the following amino acids (aa): block 1: aa 1-64; block 2: aa 65-122; block 3: aa 123-166; block 4: aa 167-216; block 5: aa 217-268; block 6: aa 269-328; block 7: aa 329-404; and block 8: aa 405-end. Thus, in this example, there are eight blocks of amino acids and three fragments are possible at each block. For instance, “12312312” refers to a chimeric P450 protein of the invention containing block 1 (aa 1-64) from CYP102A1, block 2 (aa 65-122) from CYP102A2, block 3 (aa 123-166)

forth in Table 6 (C2G9, X7, X7-12, C2E6, X7-9, C2B12, TSP234). In some embodiments, the chimeric heme enzymes of the invention can comprise at least one or more of the mutations described herein.

TABLE 6
Exemplary preferred chimeric cytochrome P450 enzymes of the invention.

Chimeric P450s    Heme domain block sequence    SEQ ID NO
C2G9              22223132                      43
X7                22312333                      44
X7-12             12112333                      45
C2E6              11113311                      46
X7-9              32312333                      47
C2B12             32313233                      48
TSP234            22313333                      49

In another embodiment, the disclosure provides a method for catalyzing a nitrene insertion or transfer into a —S— bond to produce a product with a new S—N bond. The method comprises: providing a —S— containing substrate, a nitrene precursor and an engineered P450 enzyme as described herein and above; and allowing the reaction to proceed for a time sufficient to form a product having a new S—N bond.

In some embodiments, the engineered P450 enzyme is expressed in a bacterial, archaeal or fungal host organism.

In some embodiments, the cytochrome P450 enzyme is a P450 BM3 enzyme or a variant thereof. In some instances, the P450 BM3 enzyme comprises the amino acid sequence set forth in SEQ ID NO: 1 or a variant thereof.

An enzyme's total turnover number (or TTN) refers to the maximum number of molecules of a substrate that the enzyme can convert before becoming inactivated. In general, the TTN for the heme enzymes of the disclosure ranges from about 1 to about 100,000 or higher. For example, the TTN can be from about 1 to about 1,000, or from about 1,000 to about 10,000, or from about 10,000 to about 100,000, or from about 50,000 to about 100,000, or at least about 100,000.
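The block-numbering convention for chimeric P450s described above (e.g., “12312312”) can be illustrated with a short decoding sketch. The block boundaries and the parent assignments (1 = CYP102A1, 2 = CYP102A2, 3 = CYP102A3) are taken from the text; the function name and the tuple layout of the result are assumptions made for illustration only.

```python
# Eight sequence blocks as (first residue, last residue); None marks "to the end".
BLOCKS = [
    (1, 64), (65, 122), (123, 166), (167, 216),
    (217, 268), (269, 328), (329, 404), (405, None),
]

# Digit used in the chimera code -> parent sequence it designates.
PARENTS = {"1": "CYP102A1", "2": "CYP102A2", "3": "CYP102A3"}


def decode_chimera(code):
    """Translate a code such as '12312312' into one
    (block number, first aa, last aa, parent) tuple per block."""
    if len(code) != len(BLOCKS):
        raise ValueError(f"expected {len(BLOCKS)} digits, got {len(code)}")
    decoded = []
    for block_number, (digit, (first, last)) in enumerate(zip(code, BLOCKS), start=1):
        if digit not in PARENTS:
            raise ValueError(f"unknown parent digit {digit!r} in block {block_number}")
        decoded.append((block_number, first, last, PARENTS[digit]))
    return decoded


# Print the parent of each block for the example code used in the text.
for block_number, first, last, parent in decode_chimera("12312312"):
    span = f"aa {first}-{last if last is not None else 'end'}"
    print(f"block {block_number} ({span}): {parent}")
```

The same function could be applied to the heme domain block sequences listed for the named chimeras, for example `decode_chimera("22223132")` for C2G9.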
In par from CYP102A3, block 4 (aa 167-216) from CYP102A1, ticular embodiments, the TTN can be from about 100 to about US 9,399,762 B2 25 26 10,000, or from about 10,000 to about 50,000, or from about polypeptides expressed in the cell as it exists in nature. A 5,000 to about 10,000, or from about 1,000 to about 5,000, or “wild-type cell refers instead to a cell which has not been from about 100 to about 1,000, or from about 250 to about engineered and displays the genotype and phenotype of said 1,000, or from about 100 to about 500, or at least about 10, 15, cell as found in nature. 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,90, 95, 5 The term “engineer refers to any manipulation of a mol 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, ecule or cell that result in a detectable change in the molecule 700, 750, 800, 850, 900, 950, 1000, 1500, 2000, 2500, 3000, or cell, wherein the manipulation includes but is not limited to 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, inserting a polynucleotide and/or polypeptide heterologous 8500, 9000, 9500, 10,000, 15,000, 20,000, 25,000, 30,000, to the cell and mutating a polynucleotide and/or polypeptide 35,000, 40,000, 45,000, 50,000, 55,000, 60,000, 65,000, 10 native to the cell. Engineered cells can also be obtained by 70,000, 75,000, 80,000, 85,000, 90,000, 95,000, 100,000, or modification of the cell' genetic material, lipid distribution, or more. In certain embodiments, the variant or chimeric heme protein content. In addition to recombinant production, the enzymes of the disclosure have higher TTNs compared to the enzymes may be produced by direct peptide synthesis using wild-type sequences. In some instances, the variant or chi solid-phase techniques, such as Solid-Phase Peptide Synthe meric heme enzymes have TTNs greater than about 100 (e.g., 15 sis. 
Peptide synthesis may be performed using manual tech at least about 100, 150, 200, 250, 300, 325, 350, 400, 450, niques or by automation. Automated synthesis may be 500, or more) in carrying out in vitro sulfimidation reactions. achieved, for example, using Applied Biosystems 431 A Pep In other instances, the variant orchimeric heme enzymes have tide Synthesizer (PerkinElmer, Foster City, Calif.) in accor TTNs greater than about 1000 (e.g., at least about 1000, 2500, dance with the instructions provided by the manufacturer 5000, 10,000, 25,000, 50,000, 75,000, 100,000, or more) in Variants of naturally-occurring sequences can be generated carrying out in vivo whole cell reactions. by site-directed mutagenesis (Botstein and Shortle 1985; In general, the term “mutant” or “variant' as used herein Smith 1985; Carter 1986; Dale and Felix 1996; Ling and with reference to a molecule Such as polynucleotide or Robinson 1997), mutagenesis using uracil containing tem polypeptide, indicates that has been mutated from the mol plates (Kunkel, Roberts etal. 1987: Bass, Sorrells etal. 1988), ecule as it exits in nature. In particular, the term "mutate' and 25 oligonucleotide-directed mutagenesis (Zoller and Smith “mutation” as used herein indicates any modification of a 1983; Zoller and Smith 1987; Zoller 1992), phosphorothio nucleic acid and/or polypeptide which results in an altered ate-modified DNA mutagenesis (Taylor, Schmidt et al. 1985; nucleic acid or polypeptide. Mutations include any process or Nakamaye and Eckstein 1986; Sayers, Schmidt et al. 1988), mechanism resulting in a mutant protein, enzyme, polynucle mutagenesis using gapped duplex DNA (Kramer, Drutsaetal. otide, gene, or cell. 
This includes any mutation in which a 30 1984: Kramer and Fritz 1987), point mismatch, mutagenesis polynucleotide or polypeptide sequence is altered, as well as using repair-deficient host strains, deletion mutagenesis any detectable change in a cell wherein the mutant polynucle (Eghtedarzadeh and Henikoff 1986), restriction-selection otide or polypeptide is expressed arising from Such a muta and restriction-purification (Braxton and Wells 1991), tion. Typically, a mutation occurs in a polynucleotide or gene mutagenesis by total gene synthesis (Nambiar, Stackhouse et sequence, by point mutations, deletions, or insertions of 35 al. 1984; Grundstrom, Zenke et al. 1985; Wells, Vasser et al. single or multiple nucleotide residues. A mutation in a poly 1985), double-strand break repair (Mandecki 1986), and the nucleotide includes mutations arising withina protein-encod like. Additional details on many of the above methods can be ing region of a gene as well as mutations in regions outside of found in Methods in Enzymology Volume 154, which also a protein-encoding sequence, Such as, but not limited to, describes useful controls for trouble-shooting problems with regulatory or promoter sequences. A mutation in a coding 40 various mutagenesis methods. polynucleotide Such as a gene can be 'silent’, i.e., not Additional details regarding the methods to generate vari reflected in an amino acid alteration upon expression, leading ants of naturally-occurring sequences can be found in the to a “sequence-conservative' variant of the gene. A mutation following U.S. patents, PCT publications, and EPO publica in a polypeptide includes but is not limited to mutation in the tions: U.S. Pat. No. 5,605,793 to Stemmer (Feb. 25, 1997), polypeptide sequence and mutation resulting in a modified 45 “Methods for In vitro Recombination: U.S. Pat. No. 5,811, amino acid. Non-limiting examples of a modified amino acid 238 to Stemmer et al. (Sep. 
22, 1998) “Methods for Generat include a glycosylated amino acid, a Sulfated amino acid, a ing Polynucleotides having Desired Characteristics by Itera prenylated (e.g., farnesylated, geranylgeranylated) amino tive Selection and Recombination: U.S. Pat. No. 5,830,721 acid, an acetylated amino acid, an acylated amino acid, a to Stemmer et al. (Nov. 3, 1998), “DNA Mutagenesis by PEGylated amino acid, a biotinylated amino acid, a carboxy 50 Random Fragmentation and Reassembly: U.S. Pat. No. lated amino acid, a phosphorylated amino acid, and the like. 5,834,252 to Stemmer, et al. (Nov. 10, 1998) “End-Comple References adequate to guide one of skill in the modification mentary Polymerase Reaction:” U.S. Pat. No. 5,837,458 to of amino acids are replete throughout the literature. Example Minshull, et al. (Nov. 17, 1998), "Methods and Compositions protocols are found in Walker (1998) Protein Protocols on for Cellular and Metabolic Engineering:” WO95/22625, CD-ROM (Humana Press, Towata, N.J.). 55 Stemmer and Crameri, “Mutagenesis by Random Fragmen A mutant or engineered protein or enzyme is usually, tation and Reassembly:” WO 96/33207 by Stemmer and Lip although not necessarily, expressed from a mutant polynucle schutz "End Complementary Polymerase Chain Reaction: otide or gene. Engineered cells can be obtained by introduc WO97/20078 by Stemmer and Crameri “Methods for Gen tion of an engineered gene or part of it in the cell. The terms erating Polynucleotides having Desired Characteristics by “engineered cell”, “mutant cell' or “recombinant cell as 60 Iterative Selection and Recombination:” WO 97/35966 by used herein refer to a cell that has been altered or derived, or Minshull and Stemmer, “Methods and Compositions for Cel is in Some way different or changed, from a parent cell, lular and Metabolic Engineering:” WO 99/41402 by Pun including a wild-type cell. The term “recombinant as used nonen et al. 
herein with reference to a cell in alternative to “wild-type” or “native”, indicates a cell that has been engineered to modify the genotype and/or the phenotype of the cell as found in nature, e.g., by modifying the polynucleotides and/or

“Targeting of Genetic Vaccine Vectors;” WO 99/41383 by Punnonen et al. “Antigen Library Immunization;” WO 99/41369 by Punnonen et al. “Genetic Vaccine Vector Engineering;” WO 99/41368 by Punnonen et al. “Optimization of Immunomodulatory Properties of Genetic Vaccines;” EP 752008 by Stemmer and Crameri, “DNA Mutagenesis by Random Fragmentation and Reassembly;” EP 0932670 by Stemmer “Evolving Cellular DNA Uptake by Recursive Sequence Recombination;” WO 99/23107 by Stemmer et al., “Modification of Virus Tropism and Host Range by Viral Genome Shuffling;” WO 99/21979 by Apt et al., “Human Papillomavirus Vectors;” WO 98/31837 by del Cardayre et al. “Evolution of Whole Cells and Organisms by Recursive Sequence Recombination;” WO 98/27230 by Patten and Stemmer, “Methods and Compositions for Polypeptide Engineering;” WO 98/13487 by Stemmer et al., “Methods for Optimization of Gene Therapy by Recursive Sequence Shuffling and Selection;” WO 00/00632, “Methods for Generating Highly Diverse Libraries;” WO 00/09679, “Methods for Obtaining in vitro Recombined Polynucleotide Sequence Banks and Resulting Sequences;” WO 98/42832 by Arnold et al., “Recombination of Polynucleotide Sequences Using Random or Defined Primers;” WO 99/29902 by Arnold et al., “Method for Creating Polynucleotide and Polypeptide Sequences;” WO 98/41653 by Vind, “An in vitro Method for Construction of a DNA Library;” WO 98/41622 by Borchert et al., “Method for Constructing a Library Using DNA Shuffling;” WO 98/42727 by Pati and Zarling, “Sequence Alterations using Homologous Recombination;” WO 00/18906 by Patten et al., “Shuffling of Codon-Altered Genes;” WO 00/04190 by del Cardayre et al. “Evolution of Whole Cells and Organisms by Recursive Recombination;” WO 00/42561 by Crameri et al., “Oligonucleotide Mediated Nucleic Acid Recombination;” WO 00/42559 by Selifonov and Stemmer “Methods of Populating Data Structures for Use in Evolutionary Simulations;” WO 00/42560 by Selifonov et al., “Methods for Making Character Strings, Polynucleotides & Polypeptides Having Desired Characteristics;” WO 01/23401 by Welch et al., “Use of Codon-Varied Oligonucleotide Synthesis for Synthetic Shuffling;” and WO 01/64864 “Single-Stranded Nucleic Acid Template-Mediated Recombination and Nucleic Acid Fragment Isolation” by Affholter.

In particular, in some embodiments, site-directed mutagenesis can be performed on predetermined residues of a heme containing enzyme or P450 polypeptide. These predetermined sites can be identified using the crystal structure of the heme or P450 enzyme if available or a crystal structure of a homologous protein that shares at least 20% sequence identity with a heme or P450 enzyme of the disclosure and an

metric, enzymatic, or luminescence assay and the like. For example, one method for making libraries for directed evolution to obtain P450s with new or altered properties is recombination, or chimeragenesis, in which portions of homologous P450s are swapped to form functional chimeras. Recombining equivalent segments of homologous proteins generates variants in which every amino acid substitution has already proven to be successful in one of the parents. Therefore, the amino acid mutations made in this way are less disruptive, on average, than random mutations. A structure based algorithm, such as SCHEMA, can be used to identify fragments of proteins that can be recombined to minimize disruptive interactions that would prevent the protein from folding into its active form.

In some embodiments, activation of a target site in an organic molecule can be performed in a whole-cell system. To prepare the whole-cell system, the encoding sequence can be introduced into a host cell using a suitable vector, such as a plasmid, a cosmid, a phage, a virus, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), or the like, into which the said sequence of the disclosure has been inserted, in a forward or reverse orientation. In some embodiments, the construct further comprises regulatory sequences, including, for example, a promoter linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available.

Accordingly, in other embodiments, vectors that include a nucleic acid molecule of the disclosure are provided. In other embodiments, host cells transfected with a nucleic acid molecule of the disclosure, or a vector that includes a nucleic acid molecule of the disclosure, are provided. Host cells include eucaryotic cells such as yeast cells, insect cells, or animal cells. Host cells also include procaryotic cells such as bacterial cells.

In other embodiments, methods for producing a cell for carrying out or producing an enzyme catalyst of the disclosure are provided. Such methods generally include: (a) transforming a cell with an isolated nucleic acid molecule encoding a polypeptide having the enzymatic activity that transfers or inserts a nitrene into a —S— target site; (b) transforming a cell with an isolated nucleic acid molecule encoding a polypeptide of the disclosure; or (c) transforming a cell with an isolated nucleic acid molecule of the disclosure.
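The idea of a structure based algorithm that favors recombination fragments with few disruptive interactions can be sketched as a simple contact count. The code below is a simplified, SCHEMA-style illustration and not the published SCHEMA implementation: it assumes pre-aligned parent sequences of equal length and a precomputed list of residue pairs in contact, and it counts a contact as disrupted when the residue pair found in the chimera occurs in none of the parents. The function name, toy sequences, and contact list are hypothetical.

```python
def count_disrupted_contacts(parent_of_position, parents, contacts):
    """Count contacts whose residue pair in the chimera is absent from every parent.

    parent_of_position: parent index chosen for each aligned position.
    parents: aligned parent sequences of equal length.
    contacts: (i, j) pairs of 0-based positions that touch in the structure.
    """
    # Build the chimera by taking each position from its assigned parent.
    chimera = [parents[p][k] for k, p in enumerate(parent_of_position)]
    disrupted = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        # The contact is preserved if any parent already has this residue pair.
        if not any((seq[i], seq[j]) == pair for seq in parents):
            disrupted += 1
    return disrupted


# Hypothetical toy data: two aligned four-residue parents and three contacts.
toy_parents = ["ACDE", "AGDK"]
toy_contacts = [(0, 1), (1, 3), (2, 3)]

# First two positions from parent 0, last two from parent 1 (chimera "ACDK").
print(count_disrupted_contacts([0, 0, 1, 1], toy_parents, toy_contacts))
```

In this toy case only the contact between positions 1 and 3 pairs residues (C and K) that never occur together in a parent, so a crossover placed there would be penalized relative to one that keeps contacting residues within the same block.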
alignment of the polynucleotide or amino acid sequences and The terms “vector”, “vector construct” and “expression its homologous protein. Mutagenesis of the predetermined vector” as used herein refer to a vehicle by which a DNA or sites can be performed changing one, two or three of the RNA sequence (e.g. a foreign gene) can be introduced into a nucleotides in the codon that encodes for each of the prede host cell, so as to transform the host and promote expression termined amino acids. Mutagenesis of the predetermined 50 (e.g. transcription and translation) of the introduced sites can be performed in the described way so that each of the sequence. Vectors typically comprise the DNA of a transmis predetermined amino acid is mutated to any of the other 19 sible agent, into which foreign DNA encoding a protein is natural amino acids. Substitution of the predetermined sites inserted by restriction enzyme technology. A common type of with unnatural amino acids can be performed using methods vector is a "plasmid', which generally is a self-contained established in vivo (Wang, Xie et al. 2006), in vitro (Shimizu, 55 molecule of double-stranded DNA that can readily accept Kuruma et al. 2006), semisynthetic (Schwarzer and Cole additional (foreign) DNA and which can readily introduced 2005) or synthetic methods (Camarero and Mitchell 2005) for into a suitable host cell. A large number of vectors, including incorporation of unnatural amino acids into polypeptides. plasmid and fungal vectors, have been described for replica In still further embodiments, libraries of engineered vari tion and/or expression in a variety of eukaryotic and prokary ants can be obtained by laboratory evolutionary methods 60 otic hosts. 
Non-limiting examples include pKK plasmids and/or rational design methods, using one or a combination of (Clonetech), puC plasmids, pET plasmids (Novagen, Inc., techniques such as random mutagenesis, site-saturation Madison, Wis.), pRSET or pREP plasmids (Invitrogen, San mutagenesis, site-directed mutagenesis, DNA shuffling, Diego, Calif.), or pMAL plasmids (New England Biolabs, DNA recombination, and the like and targeting one or more of Beverly, Mass.), and many appropriate host cells, using meth the amino acid residues, one at a time or simultaneously. Said 65 ods disclosed or cited herein or otherwise known to those libraries can be arrayed on multi-well plates and screened for skilled in the relevant art. Recombinant cloning vectors will activity on the target molecule using a colorimetric, fluori often include one or more replication systems for cloning or US 9,399,762 B2 29 30 expression, one or more markers for selection in the host, e.g., In another embodiment, the azide has a structure selected antibiotic resistance, and one or more expression cassettes. from the group consisting of The terms “express' and “expression” refers to allowing or causing the information in a gene or DNA sequence to become manifest, for example producing a protein by acti Vating the cellular functions involved in transcription and translation of a corresponding gene or DNA sequence. A DNA sequence is expressed in or by a cell to forman “expres sion product” such as a protein. The expression product itself, e.g. the resulting protein, may also be said to be “expressed 10 O O by the cell. A polynucleotide or polypeptide is expressed \/ recombinantly, for example, when it is expressed or produced YN, N3, N3, in a foreign host cell under the control of a foreign or native promoter, or in a native host cell under the control of a foreign 15 promoter. 
F OMe Polynucleotides provided herein can be incorporated into O O OR O O any one of a variety of expression vectors Suitable for express ing a polypeptide. Suitable vectors include chromosomal, R1ul NN, R.No1'N, Y R1'N,\/ nonchromosomal and synthetic DNA sequences, e.g., deriva tives of SV40; bacterial plasmids; phage DNA; baculovirus: yeast plasmids; vectors derived from combinations of plas wherein R' is any alkyl, aryl, -OR, NR, wherein R, R and mids and phage DNA, viral DNA such as vaccinia, adenovi Rare any alkyl, or aryl. In a specific embodiment, the azide rus, fowlpox virus, pseudorabies, adenovirus, adeno-associ is a tosyl azide. ated viruses, retroviruses and many others. Any vector that 25 The nitrene precursor can also be is selected from the group transduces genetic material into a cell, and, if replication is consisting of: desired, which is replicable and viable in the relevant host can be used. Vectors can be employed to transform an appropriate host O to permit the host to express a protein or polypeptide. 30 O l V/ Examples of appropriate expression hosts include: bacterial RI N1 X, RI N=OTs, R11 S NN1 X, cells, such as E. coli, B. subtilis, Streptomyces, and Salmo H nella typhimurium; fungal cells, such as Saccharomyces cer Na evisiae, Pichia pastoris, and Neurospora crassa; insect cells O O 35 V/ V/ O OR Such as Drosophila and Spodoptera frugiperda; mammalian S S V/ cells such as CHO, COS, BHK, HEK 293 br Bowes mela R3 1n N=OTs, R 1n NH2, R P X, noma; or plant cells or explants, etc. No1 NNE1 In bacterial systems, a number of expression vectors may O OR O OR be selected depending upon the use intended for the polypep R \/ n1\/ N. tide. For example, such vectors include, but are not limited to, 40 Yo1 YN=OTs and O NH2, multifunctional E. 
coli cloning and expression vectors such as BLUESCRIPT (Stratagene), in which the coding sequence may be ligated into the vector in-frame with sequences for the wherein R' is any alkyl, aryl, OR, NR, wherein R, R and amino-terminal Met and the subsequent 7 residues of beta Rare any alkyl, or aryland wherein—OTs can be ITs. In galactosidase so that a hybrid protein is produced; plN vec 45 certain aspects, the nitrene precursor contains a leaving tors; pl.T vectors; and the like. group. Suitable leaving groups include, but are not limited to, Similarly, in the yeast Saccharomyces cerevisiae a number OTs (tosylates), OMs (mesylates), halogen, N, H. and ITs of vectors containing constitutive or inducible promoters (N-tosylimine). such as alpha factor, alcohol oxidase and PGH may be used In certain aspects, the disclosure provides methods and for production of an enzyme catalyst of the disclosure. 50 systems for heme-containing enzymes to catalyze nitrogen In order to performe the sulfimidation reactions described transfer to a Sulfur in an organosulfur compound, also known herein, a nitrene precursor is used. The nitrene precursor can as Sulfimidation or Sulfoximidation. The reactions can be be an azide. For example, the nitrene precursor can have the intermolecular, intramolecular and a combination thereof. general formula: R' N, wherein R is: These heme containing enzymes catalyze the Sulfimidation or (i) a substituted or unsubstituted aryl, a substitute or unsub 55 sulfoximidation via nitrene transfer or insertion, which stituted alkyl, —OR, or—NR, wherein R is a substi allows the generation of a new S N bond. The reactions tuted or unsubstituted aryl, a substitute or unsubstituted proceed with high regio, chemo, and/or diastereoselectivity alkyl: as a result of uing a heme containing enzyme. 
(ii) - SOR, wherein R a substituted or unsubstituted In one embodiment, the disclosure provides a method for aryl, a substitute or unsubstituted alkyl, -OR or NR, 60 catalyzing a nitrene insertion or transfer to a Sulfur atom targe wherein Rare any alkyl or aryl; in an oranosulfur compound to produce a product having a (iii) —COR wherein R is a substituted or unsubstituted new S N bond. The method comprises: providing a sulfur aryl, a substitute or unsubstituted alkyl, —OR or NR, containing Substrate, a nitrene precursor and an engineered wherein R are any alkyl or aryl; or heme enzyme; and allowing the reaction to proceed for a time (iv) -P(O)(OR)(OR), wherein R and Rare indepen 65 sufficient to form a product having a new S-N bond. In other dently H, a substituted or unsubstituted aryl, a substitute embodiments, the disclosure provides a product of the meth or unsubstituted alkyl. ods herein. US 9,399,762 B2 31 32 In certain embodiments, the nitrene precursor contains an precursor can be about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, azide functional group. In one embodiment, a product 55, 60, 65,70, 75,80, 85,90,95, 100, 150, 200,250,300,350, obtained from the methods of the disclosure comprises a 400, 450, or 500 mM. compound of Formula la: Reaction mixtures can contain additional components. As non-limiting examples, the reaction mixtures can contain buffers (e.g., 2-(N-morpholino)ethanesulfonic acid (MES), Formula 1 a 2-4-(2-hydroxyethyl)piperazin-1-ylethanesulfonic acid R1 (HEPES), 3-morpholinopropane-1-sulfonic acid (MOPS), N1 2-amino-2-hydroxymethyl-propane-1,3-diol (TRIS), potas 2S 10 sium phosphate, Sodium phosphate, phosphate-buffered R321 N2 saline, Sodium citrate, Sodium acetate, and sodium borate), cosolvents (e.g., dimethylsulfoxide, dimethylformamide, ethanol, methanol, isopropanol, glycerol, tetrahydrofuran, wherein R' is a sulfoxide, a carbonyl or a phosphonate: acetone, acetonitrile, and acetic acid), salts (e.g., NaCl. 
KCl, wherein R is H or any alkyl or aryl; and wherein R is H, O 15 CaCl2, and salts of Mn" and Mg"), denaturants (e.g., urea or an optionally Substituted aryl group. In one embodiment, and guandinium hydrochloride), detergents (e.g., Sodium R" is a sulfoxide of formula SOR, wherein Rany alkyl, any dodecylsulfate and Triton-X 100), chelators (e.g., ethylene aryl, —OR or NR", wherein Rand R7 are any alkyl or any glycol-bis(2-aminoethylether)-N.N.N',N'-tetraacetic acid aryl. IN a further embodiment, R is any alkyl or aryl. In still (EGTA), 2-(2-Bis(carboxymethyl)aminoethyl (car a further or alternative embodiment, R is H. In another boxymethyl)amino)acetic acid (EDTA), and 1.2-bis(o-ami embodiment, R' is a phosphonate of formula P(O)(OR) nophenoxy)ethane-N,N,N',N'-tetraacetic acid (BAPTA)), (OR), wherein Rand Rare independently any aryl or any Sugars (e.g., glucose, Sucrose, and the like), and reducing alkyl. In a further embodiment, R is any alkyl or any aryl. In agents (e.g., sodium dithionite, NADPH, NADH, dithiothrei still a further embodiment, R is any alkyl or aryl. In another tol (DTT), B-mercaptoethanol (BME), and tris(2-carboxy embodiment, R' is a carbonyl group. In a further embodi 25 ethyl)phosphine (TCEP)). Buffers, cosolvents, salts, denatur ment, R is any alkyl or any aryl. Instillafurther embodiment, ants, detergents, chelators, Sugars, and reducing agents can be R is any alkyl or any aryl. In another embodiment, R is an used at any Suitable concentration, which can be readily deter optionally Substituted aryl group. In a further embodiment, mined by one of skill in the art. In general, buffers, cosolvents, R" is any alkyl or any aryl. In still a further embodiment, R is salts, denaturants, detergents, chelators, Sugars, and reducing H or any alkyl or any aryl. In another embodiment, R' is a 30 agents, if present, are included in reaction mixtures at con carbonyl group. 
In a further embodiment, R is any alkyl or centrations ranging from about 1 uM to about 1 M. For any aryl. In still a further embodiment, R is O. example, a buffer, a cosolvent, a salt, a denaturant, a deter The methods of the disclosure include forming reaction gent, a chelator, a Sugar, or a reducing agent can be included mixtures that contain the heme enzymes described herein. in a reaction mixture at a concentration of about 1 uM, or 35 about 10 uM, or about 100 uM, or about 1 mM, or about 10 The heme enzymes can be, for example, purified prior to mM, or about 25 mM, or about 50 mM, or about 100 mM, or addition to a reaction mixture or secreted by a cell present in about 250 mM, or about 500 mM, or about 1 M. In some the reaction mixture. The reaction mixture can contain a cell embodiments, a reducing agent is used in a Sub-stoichiomet lysate including the enzyme, as well as other proteins and ric amount with respect to the organosulfur Substrate and other cellular materials. Alternatively, a heme enzyme can 40 nitrene precursor. CoSolvents, in particular, can be included in catalyze the reaction within a cell expressing the heme the reaction mixtures in amounts ranging from about 1% V/v enzyme. Any suitable amount of heme enzyme can be used in to about 75% V/V, or higher. A cosolvent can be included in the the methods of the disclosure. In general, the reaction mix reaction mixture, for example, in an amount of about 5, 10. tures contain from about 0.01 mol% to about 10 mol%heme 20, 30, 40, or 50% (v/v). enzyme with respect to the nitrene precursor and/or substrate. 45 Reactions are conducted under conditions sufficient to The reaction mixtures can contain, for example, from about catalyze the formation of the desired products. The reactions 0.01 mol % to about 0.1 mol % heme enzyme, or from about can be conducted at any Suitable temperature. 
In general, the 0.1 mol % to about 1 mol % heme enzyme, or from about 1 reactions are conducted at a temperature of from about 4°C. mol % to about 10 mol % heme enzyme. The reaction mix to about 40°C. The reactions can be conducted, for example, tures can contain from about 0.05 mol % to about 5 mol % 50 at about 25° C. or about 37° C. The reactions can be con heme enzyme, or from about 0.05 mol % to about 0.5 mol% ducted at any suitable pH. In general, the reactions are con heme enzyme. The reaction mixtures can contain about 0.1. ducted at a pH of from about 6 to about 10. The reactions can 0.2,0.3, 0.4,0.5,0.6, 0.7, 0.8, 0.9, or about 1 mol % heme be conducted, for example, at a pH of from about 6.5 to about enzyme. 9. The reactions can be conducted for any suitable length of 55 time. In general, the reaction mixtures are incubated under The concentration of the organosulfur Substrate and nitrene suitable conditions for anywhere between about 1 minute and precursor are typically in the range of from about 100 uM to several hours. The reactions can be conducted, for example, about 1 M. The concentration can be, for example, from about for about 1 minute, or about 5 minutes, or about 10 minutes, 100 uM to about 1 mM, or about from 1 mM to about 100mM, or about 30 minutes, or about 1 hour, or about 2 hours, or or from about 100 mM to about 500 mM, or from about 500 60 about 4 hours, or about 8 hours, or about 12 hours, or about 24 mM to 1 M. The concentration can be from about 500 uM to hours, or about 48 hours, or about 72 hours. Reactions can be about 500 mM,500 uM to about 50 mM, or from about 1 mM conducted under aerobic conditions or anaerobic conditions. to about 50 mM, or from about 15 mM to about 45 mM, or Reactions can be conducted under an inert atmosphere. Such from about 15 mM to about 30 mM. The concentration of as a nitrogen atmosphere or argon atmosphere. 
In some organosulfur Substrate and nitrene precursor can be, for 65 embodiments, a solvent is added to the reaction mixture. In example, about 100, 200,300,400, 500, 600, 700, 800, or 900 Some embodiments, the solventforms a second phase, and the uM. The concentration of organosulfur Substrate and nitrene cyclopropanation occurs in the aqueous phase. In some US 9,399,762 B2 33 34 embodiments, the heme enzyme is located in the aqueous posable mirror images of each other. Similarly, the two trans layer whereas the Substrates and/or products occur in an isomers are enantiomers. One of skill in the art will appreciate organic layer. Other reaction conditions may be employed in that the absolute stereochemistry of a product—that is, the methods of the disclosure, depending on the identity of a whether a given chiral center exhibits the right-handed “R” particular heme enzyme, organosulfur Substrate and nitrene configuration or the left-handed “S” configuration will precursor. depend on factors including the structures of the particular Reactions can be conducted in vivo with intact cells Substrate and nitrene precursor used in the reaction, as well as expressing a heme enzyme of the disclosure. The in vivo the identity of the enzyme. The relative stereochemistry— reactions can be conducted with any of the host cells used for that is, whether a product exhibits a cis or trans configura expression of the heme enzymes, as described herein. A sus 10 pension of cells can be formed in a Suitable medium Supple tion—as well as for the distribution of product mixtures will mented with nutrients (such as mineral micronutrients, glu also depend on Such factors. cose and other fuel sources, and the like). Nitrene transfer or In certain instances, the product mixtures have cis:trans insertion yields from reactions in vivo can be controlled, in ratios ranging from about 1:99 to about 99:1. The cis:trans part, by controlling the cell density in the reaction mixtures. 
15 ratio can be, for example, from about 1:99 to about 1:75, or Cellular Suspensions exhibiting optical densities ranging from about 1:75 to about 1:50, or from about 1:50 to about from about 0.1 to about 50 at 600 nm can be used for nitrene 1:25, or from about 99:1 to about 75:1, or from about 75:1 to transfer reactions. Other densities can be useful, depending about 50:1, or from about 50:1 to about 25:1. The cis:trans on the cell type, specific heme enzymes, or other factors. ratio can be from about 1:80 to about 1:20, or from about 1:60 The methods of the disclosure can be assessed in terms of to about 1:40, or from about 8.0:1 to about 20:1 or from about the diastereoselectivity and/or enantioselectivity of sulfimi 60:1 to about 40:1. The cis:trans ratio can be about 1:5, 1:10, dation or Sulfoximidatino reaction—that is, the extent to 1:15, 1:20, 1:25, 1:30, 1:35, 1:40, 1:45, 1:50, 1:55, 1:60, 1:65, which the reaction produces a particular isomer, whether a 1:70, 1:75, 1:80, 1:85, 1:90, or about 1:95. The cis:trans ratio diastereomer or enantiomer. A perfectly selective reaction can be about 5:1, 10:1, 15:1, 20:1, 25:1, 30:1, 35:1, 40:1, 45:1, produces a single isomer, such that the isomer constitutes 25 50:1, 55:1, 60:1, 65:1, 70:1, 75:1, 80:1, 85:1, 90:1, or about 100% of the product. As another non-limiting example, a 95:1. reaction producing a particular enantiomer constituting 90% The distribution of a product mixture can be assessed in of the total product can be said to be 90% enantioselective. A terms of the enantiomeric excess, or “9% ee.' of the mixture. reaction producing a particular diastereomer constituting The enantiomeric excess refers to the difference in the mole 30% of the total product, meanwhile, can be said to be 30%d 30 fractions of two enantiomers in a mixture. In certain diastereoselective. 
In general, the methods of the invention include reactions instances, as a non-limiting example, for instance, the enan that are from about 1% to about 99% diastereoselective. The tiomeric excess of the “E” or trans (R,R) and (S,S) enanti reactions are from about 1% to about 99% enantioselective. omers can be calculated using the formula: %>eeE=(% The reaction can be, for example, from about 10% to about 35 R.R-% s.sy(% R.R+% s,s)x100%), wherein X is the mole 90% diastereoselective, or from about 20%) to about 80%) fraction for a given enantiomer. The enantiomeric excess of diastereoselective, or from about 40% to about 60%) dias the “Z” or cis enantiomers (% eez) can be calculated in the tereoselective, or from about 1% to about 25% diastereose Saale. lective, or from about 25% o to about 50% diastereoselective, In certain instances, product mixtures exhibit 96 ee values or from about 50% to about 75% diastereoselective. The 40 ranging from about 1% to about 99%, or from about -1% to reaction can be about 10%, 15%, 20%, 25%, 30%, 35%, 40%, about -99%. The closer a given '% ee value is to 99% (or 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or -99%), the purer the reaction mixture is. The % ee can be, for about 95% diastereoselective. The reaction can be from about example, from about -90% to about 90%), or from about 10% to about 90% enantioselective, from about 20% to about -80% to about 80%, or from about -70% to about 70%, or 80% enantioselective, or from about 40% to about 60% enan 45 from about-60%) to about 60%, or from about-40% to about tioselective, or from about 1% to about 25% enantioselective, 40%, or from about -20% to about 20%). The % ee can be or from about 25% to about 50% enantioselective, or from from about 1% to about 99%, or from about 20% to about about 50% to about 75% enantioselective. 
The reaction can be 80%, or from about 40% to about 60%, or from about 1% to about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, about 25%, or from about 25% to about 50%), or from about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or about 95% 50 50% to about 75%. The % ee can be from about -196 to about enantioselective. Accordingly some embodiments of the dis closure provide methods wherein the reaction is at least 30% -99%, or from about -20% to about -80%, or from about to at least 90% diastereoselective. In some embodiments, the -40% to about-60%, or from about -1% to about -25%), or reaction is at least 30% to at least 90% enantioselective. As from about -25% to about -50%, or from about -50% to about -75%. The % ee can be about -99%, -95%, -90%, described herein, the ratios and reactants (e.g., the type of 55 nitrene precursor the heme variant and cofactors) can be -85%, -80%, -75%, -70%, -65%, -60%, -55%, -50%, modified to yield a desired ratio of enantiomers. -45%, -40%, -35%, -30%, -25% -20%, -15%, -10%, One of skill in the art will appreciate that stereochemical 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, configuration of certain of the products herein will be deter 60%. 65%, 70%, 75%, 80%, 85%, 90%, or about 95%. Any of mined in part by the orientation of the product of the enzy 60 these values can be '% eeE values or % eez values. matic step. Certain of the products herein will be “cis' com Accordingly, Some embodiments of the disclosure provide pounds or “Z” compounds. Other products will be “trans’ methods for producing a plurality of products having a '% eez compounds or 'E' compounds. of from about -90% to about 90%. In some embodiments, In certain instances, two cis isomers and two trans isomers the '% eez is at least 90%. In some embodiments, the '% eez is can arise from the reaction of an organosulfur Substrate and a 65 at least-99%. In some embodiments, the '% eeE is from about nitrene precursor. 
The two cis isomers are enantiomers with -90% to about 90%. In some embodiments, the '% eeE is at respect to one another, in that the structures are non-Superim least 90%. In some embodiments, the '% eeE is at least-99%. US 9,399,762 B2 35 36 EXAMPLES TABLE B Sulfimidation Activity and Selectivity of BM3 Variants The present disclosure is further illustrated in the following using Substrates and Reaction Conditions Shown examples, which are provided by way of illustration and are n not intended to be limiting. Me Work on intramolecular C H amination was limited to aryl Sulfonylazide Substrates as nitrene precursors. Despite -- the Success with this Substrate class, research was performed MeO 7a. to assess the influence of the R-group on the nitrenoid transfer 10 and thus a series of Substrates displaying a range of stereo O O electronic properties that have been shown to be effective \/ nitrene precursors in other contexts were tested (FIG. 5-6). NN, Enzyme (0.2 mol%) For the thioether acceptor substrate thioanisole was cho 0.1 M KPi, pH = 8.0 15 RT, 4 hours sen, which has been used in enzymatic Sulfoxidation by cyto MeO chrome P450s and other oxygenase enzymes. As a catalyst, P411-CIST438S, a variant of cytochrome P450s, pos 1 sessing the aforementioned C400S mutation was used. This Ts enzyme, which contains 14 mutations relative to wild-type P450 (Table A), was previously shown to be a good cata S n lyst in the activation of azides for intramolecular C H inser Me tion. Reaction conditions were similar to those reported for intramolecular C-H amination (see, e.g., International MeO Application Publication No. WO 2014/058729, incorporated 25 herein by reference for all purposes) under anaerobic condi 8a. tions with nicotine adenine dinucleotide phosphate entry enzyme TTN e (NADPH) supplied as a reductant. 
1 P411-CIS T438S 3OO 74:26 TABLE A 30 2 P450-CIS T438S 7 ind 3 P411-CIS A268T T438S 19 ind 4 P411-H2-5-F10 140 29:71 Mutations present in P450 BMS variants used in the disclosure 5 P411-H2-A-10 84 57:43 6 P411-H2-4-D4 32 70:30 Enzyme Mutations relative to wild-type P450 7 P450 10 ind P450.3 Ole 35 8 P411 Bas 11 ind P450-T268A T268A 9 P450-T268A 19 ind P411 Bas C4OOS 10 P411-T268A 17 ind P411-T268A 11 P411-CIS I263A T438S 320 1882 P450-CIS T438S “P411” denotes Ser-ligated (C400S) variant of cytochrome P450BM3. Variant IDs and specific amino acid substitutions in each can be found in Table A. 40 TTN – total turnover number, er = enantiomeric ratio, nd = not determined P411-CIS T438S P411-CIS A268T Since activating mutations T268A and C400S were already T438S P411, H2-A-10 present in P411-CIS T438S, the effects of reverting each P411, H2-5-F10 mutation to the wild-type residue (Table B, entries 1-3) were P411 H2-4-D4 tested. Each revertant was much less active than the parent, 45 supporting the benefit of having the C400S and T268A muta tions for effective nitrene-transfer chemistry. Given the bulky Considering the Small size of reactive oxygen species natu nature of the aryl Sulfonylazide nitrene sources and aryl thio rally produced by P450s, it was anticipated that smaller ethers, the C400S mutants of several P450BM3 variants that azides, such as mesyl azide would be less sterically demand 50 had been engineered via combinatorial alanine Scanning to ing than aryl or arylsulfonyl azides and thus a more Suitable hydroxylate large substrates were tested (Table B, entries partner for reaction with thioanisole. Tosyl azide was shown 4-6). While P411-H2-5-F10 displayed comparably high to be an exception precursor for Sulfimidation. levels of activity to P411-CIS T438S (>100 TTN), the other mutants we tested from this library were less produc Control experiments confirmed that enzyme was necessary 55 tive. 
The effects of introducing the activating mutations into for sulfimide formation (Table 7). Free hemin showed no wild-type P450 was also tested. Although these wild-type activity in this transformation. derivatives were highly active and stereoselective for In other non-natural P450 reactions reported to date, it was intramolecular C-H amination, neither single mutant shown thatamino acid substitutions could alter both the activ (T268A or C400S) nor the double mutant (T268A+C400S) ity and stereoselectivity of the enzymes. Thus, mutation of 60 were as active for intermolecular sulfimidation. conserved residues C400 and T268 and other active-site resi The turnover data presented above demonstrate the dues were tested to determine their effect on sulfimidation sequence dependence of Sulfimidation productivity. The activity (Table B). For these experiments we used the more effects, however, could be due to changes in stability of the reactive sulfide 4-methoxythioanisole, for which we mea enzymes that lead to degradation over the course of the reac sured 300 TTN with P411 BM3-CIS T438S (see below for 65 tion. To address this possibility, the initial rates of reaction more discussion of the effect of sulfide substituents on reac using the most productive enzyme in terms of total turnover tivity). were compared, P411 BM3-CIS T438S, and the less produc US 9,399,762 B2 37 38 tive P411 BM3-H2-A-10 and P411 BM3-H2-4-D4 enzymes TABLE C (FIG. 7). Differences in the total productivity (i.e., TTN) for each enzyme are mirrored in the initial rates of reaction, Impact of Sulfide Substituents on Sulfimidation Suggesting that specific amino acids proximal to the heme Activity with P411-CIS T438S. influence binding and orientation of the substrates to effect catalytic rate enhancement. S YMe The key role of active site architecture in guiding reaction trajectory is further supported by the effects of amino acid R -- Substitutions on the reaction stereochemistry. 
Experiments show that the enzymes are capable of producing an excess of either sulfimide enantiomer; e.g., P411 BM3-CIS T438S gave an er of 74:26, while expanded active site variant P411 BM3-H2-5-F10 exhibited the opposite selectivity, giving 29:71 (FIG. 8). Among the H2 mutants (which differ from P411 BM3-CIS T438S by 3-5 amino acid substitutions, Table A), H2-5-F10 was alone in containing the I263A mutation, suggesting this mutation is relevant for the enantiomeric inversion observed in the P411 BM3-H2-5-F10 variant. When the I263A mutation was placed in the P411 BM3-CIS background, an even more pronounced inversion in selectivity was observed (er = 18:82 for the P411 CIS I263A T438S variant, compared to 74:26 for P411 CIS T438S). This enzymatic system not only induces asymmetry in sulfimide products but also provides tunability in which selectivity can be switched with just a few mutations.

Previous studies of P450-catalyzed sulfoxidation as well as rhodium-catalyzed C-H amination suggest that the electronic properties of sulfide or alkyl acceptor substrates significantly impact reactivity. Thus, to better understand the mechanism of this new enzyme reaction, experiments were performed to establish how thioether electronic properties affected enzyme-catalyzed sulfimidation. A set of aryl sulfide substrates with substituents encompassing a range of electronic properties, from strongly donating to weakly withdrawing, was selected. As a first approximation of the effect of sulfide electronics, the total number of turnovers catalyzed by P411-CIS T438S was determined in the reactions of different sulfides with tosyl azide (Table C). In general, sulfides containing electron-donating substituents on the aryl sulfide ring were better substrates for sulfimidation. For example, the enzyme reaction containing 4-methoxythioanisole (7a) gave the highest levels of activity (300 TTN). In contrast, the electron-deficient p-aldehyde substrate (7e) gave only trace amounts of sulfimide product. Further, some azides that initially appeared entirely inactive gave small amounts of sulfimide products when reacted with 4-methoxythioanisole, underscoring the importance of sulfide electronics in this reaction (Table S4). The identity of substrates also exerted a modest influence on the enantioselectivity of sulfimidation. In particular, P411 BM3-CIS T438S gave er values for substrates 8a-8d that ranged from 59:41 for 8c to 87:13 for 8d (Table S5, Figures S4-S6). While it is possible that some sulfides were poorer substrates due to the steric influence of the para substituent, the overall trend is strongly suggestive of electron induction to the aryl sulfide being a major contributor to activity. One notable aspect of these reactions is that significantly more sulfonamide (9) was produced when less reactive sulfides were used.

[Table C reaction scheme: aryl sulfides 7a-e react with tosyl azide in the presence of P411 BM3-CIS T438S (0.2 mol%), 0.1 M KPi, pH = 8.0, RT, 4 hours, to give sulfimides 8a-e and sulfonamide 9.]

entry   R1 in 7   R2 in 7   TTN 8    TTN 9
a       -OMe      -H        300      270
b       -Me       -H        190      400
c       -H        -Me       100      390
d       -H        -H        30       500
e       -CHO      -H        >1 (a)   510

(a) Trace product observed by liquid chromatography-mass spectrometry (LC-MS).

Although the total turnover data suggest that sulfide electronics influence reactivity, this result could also be due to other factors, such as substrate-dependent enzyme inactivation. To assess the effect of sulfide substituents on reactivity more directly, the initial rates of reaction of tosyl azide with the sulfides 7a-7d in Table 2 were measured. The initial rates correlated well with the total turnover data presented above, with p-OMe showing the highest rate of reaction (FIG. 12). Given this correlation, more mechanistic information was obtained by fitting the data to a Hammett plot that correlates the observed rates with each substituent's Hammett parameter. There was a strong, linear relationship with a Hammett value of ρ = -4.0 (FIG. 2), which suggests that during the rate-limiting step there is a buildup of positive charge on the sulfide that is stabilized by electron-donating substituents. This observation is consistent with Hammett values obtained for the oxidation of thioanisoles in P450-catalyzed sulfoxidation reactions, though the magnitude of ρ for sulfimidation is significantly greater than for sulfoxidation (-4.0 versus -0.2). One possible explanation for this difference is that the presumed nitrenoid intermediate of this reaction (FIG. 1) is a weaker oxidant than compound I, making the nucleophilicity of sulfur an important contributor to the rate of nitrenoid transfer. The large difference in the magnitude of ρ could also indicate a change in mechanism relative to P450 sulfoxidation: sulfimidation may occur via direct nucleophilic attack of the thioether on the nitrenoid intermediate.

As noted above, a greater proportion of sulfonamide side product was formed when less reactive sulfides were used. The varying amounts of this side product prompted an examination of how sulfonamide might be produced. Experiments were performed to test the possibility that azide is reduced by some additive in the reactions (i.e., glucose oxidase, catalase, NADPH, etc.) by simply omitting the P450 enzyme from the reactions (Table 7). No-enzyme controls yielded very little reduced sulfonamide product (more than 10-fold lower than with enzyme present). While these experiments showed that enzyme was likely involved in azide reduction, this still left several possibilities. Since P411's heme domain is fused to a reductase, there was the possibility that azide reduction occurs via direct hydride transfer from the reductase, as has been observed for aldehyde reductions. Thus, a carbon monoxide-inhibited reaction was used to investigate this possibility, since CO binding to the heme iron should have no effect on the reductase domain. In the presence of CO, there was a significant decrease in the sulfonamide produced, suggesting that azide reduction occurs at the heme. Furthermore, only trace sulfonamide was observed when reactions were conducted in the presence of oxygen, further supporting the involvement of reduced heme in azide reduction. Since all the available evidence suggests that azide reduction and sulfimide formation both occur at the heme, the most parsimonious explanation is that both reactions stem from a common intermediate that can give rise to both sulfonamide and sulfimide products.

A proposed mechanism of sulfonamide and sulfimide formation begins with the iron(III) heme gaining an electron from NADPH via the flavin cofactors of the reductase domain (FIG. 3). Addition of azide substrate results in formation of a formal iron(IV) nitrenoid, which can either be reduced by subsequent electron transfer or "trapped" by sulfide to form sulfimide product. A second electron transfer followed by protonation of the nitrenoid is proposed to generate sulfonamide, which restores the heme iron to its ferric state, and additional reductant is required to return to the catalytically active ferrous state (FIG. 3, unproductive pathway).

To test whether ferric heme is involved in the unproductive pathway, the change in the visible absorbance spectrum of the reduced holoenzyme P411-CIS I263A T438S was monitored upon addition of NADPH followed by azide. The Ser-ligated P411 proteins exhibit different absorbance properties in the ferric and ferrous states compared with their Cys-ligated counterparts, such that the ferric, ferrous, and CO-ferrous Soret bands are shifted from 418, 408, and 450 nm to 405, 422, and 411 nm, respectively (FIG. 12). When NADPH was added to a solution of enzyme under an anaerobic atmosphere, a reduction of the heme from the ferric to the ferrous state was observed. When a degassed solution of azide was added to the ferrous protein, an immediate shift back to the ferric state was observed, with concomitant production of sulfonamide, verified by high-performance liquid chromatography (HPLC). This observation suggests that the unproductive azide reduction pathway occurs readily in the absence of sulfide and that, when provided only with azide, the catalyst rests in the ferric state.

To determine the resting state of the P411 catalyst in sulfimidation, the above experiments were repeated in the presence of both sulfide and azide. Addition of sulfide to a solution of enzyme and NADPH results in no change in the Q or Soret bands, with the iron heme remaining in the ferrous state. However, addition of azide to this solution causes the iron heme to shift to the ferric state. After 10 min, peaks corresponding to the ferrous heme begin to grow until the ferrous heme becomes the dominant species at 30 min (FIG. 14). Both sulfonamide and sulfimide products are formed throughout the course of the reaction. The observations are consistent with the competing reaction pathways outlined in FIG. 3 and suggest that the catalyst rests as a mixture of ferric and ferrous hemes during sulfimidation. When the concentration of azide is high, as it is at the beginning of the reaction, the unproductive pathway is favored, and a ferric resting state dominates. As azide is consumed, both the ferric and ferrous resting states can be observed.

P450 monooxygenases are known to undergo an "oxidase uncoupling" side reaction in which compound I is reduced by two electrons to give water, which bears some similarity to the process of azide reduction observed here. One difference, however, is that only a single electron transfer is required to attain a reactive state in nitrene-transfer chemistry. This stands in contrast to P450 monooxygenase chemistry, where the generation of compound I from O2 requires the transfer of two electrons. Thus, one explanation for the relatively high proportion of reduced azide in these reactions is that the electron-transfer machinery in P450 is evolved to carry out two-electron reductions. In the case of nitrene transfer chemistry, reducing the ferric heme to the +2 state allows nitrenoid formation, but a second electron transfer would generate an unreactive iron(III) sulfonamide complex, as proposed for intramolecular C-H amination. Coupled with the fact that lower sulfide concentrations and less-reactive sulfides lead to increased azide reduction, these observations are consistent with the mechanism discussed above in which sulfimide formation competes with azide reduction. Since electron transfers from the reductase domain are quite rapid, relatively reactive sulfides can successfully compete with reduction to form sulfimide.

The mechanistic picture described above suggests that achieving higher levels of sulfide occupancy in the active site should favor sulfimide formation and inhibit azide reduction. This could be achieved with tighter binding of the sulfide acceptor substrate or by increasing the concentration of sulfide relative to azide. Accordingly, experiments were performed with excess sulfide or slow addition of azide to determine if increased sulfimide formation could be obtained relative to sulfonamide. Increasing the sulfide concentration decreased reduction of azide to sulfonamide and improved the ratio of sulfimide to sulfonamide, from 0.6 (with 0.5 equiv sulfide) to 1.8 (with 4 equiv sulfide) (FIG. 15, Table D). Slow azide addition slightly increased the TTN for sulfimide and decreased sulfonamide formation in a 2 h reaction (FIG. 16). That higher concentrations of sulfide substrate improve sulfimide production suggests that protein engineering can be used to improve the binding of sulfide acceptor substrates and could also produce strong gains in the desired activity. Indeed, the specific activities of the enzyme catalysts reported here compare favorably with enantioselective iron-based catalysts, which routinely require catalyst loadings of 10 mol%. Furthermore, engineering the holoenzyme or reductase domain to favor one-electron transfers might improve the proportion of desired product relative to azide reduction, which could allow reaction with more challenging organic acceptor substrates.

TABLE D

Production of sulfimide 8a and sulfonamide 9 in the presence of varying levels of sulfide acceptor substrate.

Sulfide   Azide   TTN 8a   TTN 9
0.5 eq    1 eq    71       110
1 eq      1 eq    97       100
2 eq      1 eq    110      93
3 eq      1 eq    130      85
4 eq      1 eq    140      80

A C400S mutation (as described herein) for sulfimidation can be rationalized in that the less electron-donating axial serine ligand in P411 enzymes likely makes the nitrenoid species a more potent oxidant. The impact of sulfide substituents on sulfimide formation is also reflected in the generation of sulfonamide side product, suggesting the nitrenoid undergoes rapid reduction and can be productively inserted into reactive sulfides. Characterization of the redox state of the heme iron in the presence and absence of nitrene source and sulfide acceptor supports the proposal that nitrenoid "overreduction" competes with productive sulfimide formation and that the former is a two-electron process resulting in regeneration of ferric heme. Another interesting aspect of this enzyme reaction is the use of an aryl sulfonylazide nitrene source. The ability of the enzyme to accept larger aryl substrates may be beneficial for development of enantioselective intermolecular nitrene-transfer catalysts. Intermolecular nitrene transfer in the form of sulfimidation can now be added to the impressive array of cytochrome P450 enzymes.

General. Unless otherwise noted, all chemicals and reagents for chemical reactions were obtained from commercial suppliers (Sigma-Aldrich, VWR, Alfa Aesar) and used without further purification. Silica gel chromatography purifications were carried out using AMD Silica Gel 60, 230-400 mesh. 1H NMR spectra were recorded on a Varian Inova 500 MHz instrument in CDCl3, and are referenced to the residual solvent peak. Synthetic reactions were monitored using thin layer chromatography (Merck 60 gel plates) using a UV lamp for visualization.

Chromatography. Analytical high-performance liquid chromatography (HPLC) was carried out using an Agilent 1200 series, and a Kromasil 100 C18 column (Peeke Scientific, 4.6×50 mm, 5 µm). Semi-preparative HPLC was performed using an Agilent XDB-C18 (9.4×250 mm, 5 µm). Analytical chiral HPLC was conducted using a supercritical fluid chromatography (SFC) system with isopropanol and liquid CO2 as the mobile phase. Chiralcel OB-H and OJ columns were used to separate sulfimide enantiomers (4.6×150 mm, 5 µm). Sulfides were all commercially available and sulfimide standards were prepared as reported. e.r. values were determined by dividing the major peak area by the sum of the peak areas determined by SFC chromatography.

Cloning and site-directed mutagenesis. pET22b(+) was used as a cloning and expression vector for all enzymes described in this study. Site-directed mutagenesis on variant P411 BM3-CIS T438S to generate P411 BM3-CIS I263A T438S was performed using a modified QuickChange™ mutagenesis protocol. The PCR products were gel purified, digested with DpnI, and directly transformed into E. coli strain BL21(DE3).

Determination of P450 concentration. Concentration of P450/P411 enzymes was accomplished by quantifying the amount of free hemin present in purified protein using the pyridine/hemochrome assay.

Protein expression and purification. Enzymes used in purified protein experiments were expressed in BL21(DE3) E. coli cultures transformed with plasmid encoding P450 or P411 variants. Expression and purification were performed as described elsewhere, except that the shake rate was lowered to 130 RPM during expression. Following expression, cells were pelleted and frozen at -20° C. For purification, frozen cells were resuspended in buffer A (20 mM tris, 20 mM , 100 mM NaCl, pH 7.5, 4 mL/g of cell wet weight) and disrupted by sonication (2×1 min, output control 5, 50% duty cycle; Sonicator 3000, Misonix, Inc.). To pellet insoluble material, lysates were centrifuged at 24,000×g for 0.5 h at 4° C. Proteins were expressed in a construct containing a 6×-His tag and were consequently purified using a nickel NTA column (5 mL HisTrap HP, GE Healthcare, Piscataway, N.J.) using an AKTAxpress purifier FPLC system (GE Healthcare). P450 or P411 enzymes were then eluted on a linear gradient from 100% buffer A, 0% buffer B (20 mM tris, 300 mM imidazole, 100 mM NaCl, pH 7.5) to 100% buffer B over 10 column volumes (P450/P411 enzymes elute at around 80 mM imidazole). Fractions containing P450 or P411 enzymes were pooled, concentrated, and subjected to three exchanges of phosphate buffer (0.1 M KPi, pH 8.0) to remove excess salt and imidazole. Concentrated proteins were aliquoted, flash frozen on powdered dry ice, and stored at -20° C. until later use.

Typical procedure for small-scale sulfimidation bioconversions under anaerobic conditions using purified enzymes. Small-scale reactions (400 µL) were conducted anaerobically in 2 mL crimp vials. A solution of aryl sulfide in DMSO or methanol (100 mM, 10 µL) was added to the reaction vial via syringe, followed by arylsulfonyl azide (100 mM, 10 µL, DMSO). Final concentrations of the reagents were typically: 2.5 mM aryl sulfide, 2.5 mM arylsulfonyl azide, 10 mM NADPH, 25 mM glucose, 5-20 µM P450. To the vials were then added acetonitrile (460 µL) and internal standard (1,3,5-trimethoxybenzene, 10 mM in 10% DMSO/90% acetonitrile, 1 mM final concentration). This mixture was then transferred to a microcentrifuge tube, and centrifuged at 17,000×g for 5 minutes. A portion (20 µL) of the supernatant was then analyzed by HPLC. Sulfimide formation was quantified by comparison of integrated peak areas of internal standard (1,3,5-trimethoxybenzene, 1 mM or 1,3,5-trichlorobenzene, 1 mM) and sulfimide at 220 nm to a calibration curve made using synthetically produced sulfimide and internal standard. Coefficients determined from standard curves were multiplied by a dilution factor in order to obtain sulfimide concentrations in the reaction mixture. Standard curves and response factors for products 8a-8d are presented in FIGS. 17-20. For chiral HPLC, the quenched reaction mixture was extracted twice with ethyl acetate (2×350 µL), dried under a light argon stream and resuspended in acetonitrile (100 µL).

Controls to confirm the enzymatic sulfimidation activity of variant P411 CIS T438S. Small-scale reactions (400 µL total volume) were set up and worked up as described above. For the reaction containing hemin as catalyst, 10 µL of a hemin solution (1 mM in 50% DMSO/H2O) was added to a final concentration of 25 µM. TTNs were determined as described above and are presented in Table 7. CS denotes the complete system, in which all components of the reactions as described above are present. Variations from the complete system are denoted with a "-X" where X is the component removed.
TABLE 7

Control experiments using substrate 7c yielding products 8c (sulfimide) and 9 (sulfonamide).

Condition                TTN 8c   TTN 9
P411-CIS T438S (CS)      90       300
CS - NADPH               1        2
CS + Na2S2O4 + CO        2        40
CS boiled enzyme         1        53
CS aerobic               2.3      16
Hemin + Na2S2O4          0        42
CS - P411-CIS T438S      0        21

CS = complete system.

Synthesis of substrates and standards. All sulfides presented in Table 8 were obtained from commercial sources (Sigma Aldrich, Alfa Aesar). Sulfimide standards were synthesized using known techniques.

TABLE 8

Impact of sulfide substituents on sulfimidation activity with P411-CIS T438S.

[Reaction scheme: aryl sulfides 7a-e react with tosyl azide in the presence of P411 BM3-CIS T438S (0.2 mol%), 0.1 M KPi, pH = 8.0, RT, 4 hours, to give sulfimides 8a-e and sulfonamide 9.]

entry   R1 in 7   R2 in 7   TTN 8    TTN 9
a       -OMe      -H        300      270
b       -Me       -H        190      400
c       -H        -Me       100      390
d       -H        -H        30       500
e       -CHO      -H        >1 (a)   510

(a) Trace product observed by liquid chromatography-mass spectrometry (LC-MS).

8a: 1H NMR (500 MHz, CDCl3): δ = 7.71 (d, J = 8.0 Hz, 2H), 7.61 (d, J = 8.9 Hz, 2H), 7.15 (d, J = 8.35 Hz, 2H), 6.96 (d, J = 8.9 Hz, 2H), 3.83 (s, 3H), 2.81 (s, 3H), 2.34 (s, 3H).

8b: 1H NMR (500 MHz, CDCl3): δ = 7.72 (d, J = 8.2 Hz, 2H), 7.57 (d, J = 8.2 Hz, 2H), 7.28 (d, J = 7.3 Hz, 2H), 7.16 (d, J = 8.1 Hz, 2H), 2.82 (s, 3H), 2.39 (s, 3H), 2.35 (s, 3H).

8c: 1H NMR (500 MHz, CDCl3): δ = 7.73 (d, J = 8.0 Hz, 2H), 7.48-7.33 (m, 4H), 7.17 (d, J = 8.1 Hz, 2H), 2.82 (s, 3H), 2.36 (s, 3H), 2.35 (s, 3H).

8d: 1H NMR (500 MHz, CDCl3): δ = 7.72 (d, J = 7.8 Hz, 2H), 7.55-7.46 (m, 5H), 7.16 (d, J = 8.1 Hz, 2H), 2.83 (s, 3H), 2.34 (s, 3H).

The 1H NMR listings above for products 8a-8d matched those of characterized compounds.

Determination of initial rates. Four 2-mL vials were charged with a stir bar, 10× oxygen depletion system (40 µL), and a solution of enzyme prior to crimp sealing with a silicon septum. Once sealed, the headspace was flushed with argon for at least 10 minutes. Concurrently, a sealed 6-mL vial charged with glucose (250 mM, 400 µL), NADPH (20 mM, 400 µL), and KPi (pH = 8.0, 0.1 M, 2.6 mL) was sparged for 10 minutes with argon. After degassing was complete, 340 µL of the reaction solution was transferred to the 2-mL vial via syringe. Sulfide (100 mM, 10 µL) was added to all four 2-mL vials followed quickly (less than 20 seconds) by tosyl azide (100 mM, 10 µL).
The reactions were quenched at 1-2 minute intervals over 5-10 minutes by decapping and adding acetonitrile (460 µL). After 5 minutes of stirring, the vials were charged with internal standard and the reaction mixtures were transferred to 1.8 mL tubes, which were vortexed and centrifuged (14,000×g, 5 min). The supernatant was transferred to a vial for analysis by HPLC. Initial rates are plotted for individual enzymes referenced in FIG. 7, and for various substituted aryl sulfides in FIG. 12.

Visible absorbance spectroscopy and observation of resting states. To a semi-micro anaerobic cuvette, 8 µL of P411 CIS I263A T438S (400 µM) was added. To obtain a spectrum of the ferric protein, 0.5 mL of degassed phosphate buffer was added to the cuvette and the visible spectrum was recorded from 650 to 400 nm. To obtain a spectrum of the ferrous protein, the cuvette was sealed with a cap equipped with rubber septa and the headspace of the cuvette was purged with a gentle stream of Ar for 3 min. A solution of NADPH (5 mM) was added to a 6 mL crimp vial and made anaerobic by sparging with Ar for 5 min. The NADPH solution (0.5 mL) was then added to the anaerobic cuvette containing protein. Visible spectra of the protein sample were recorded until a stable ferrous state was reached. Representative spectra of the Fe(III)- and Fe(II)-protein are shown below (FIG. 13).

To determine the resting state of the protein in the unproductive catalytic cycle, a degassed solution of tosyl azide (2 µL, 400 mM in DMSO) was added to a fully reduced sample of ferrous protein. The visible spectrum of the protein shifted to the ferric heme immediately and remained unchanged for 20 min. Addition of an aliquot of organic solvent of similar volume did not cause the observed change in iron oxidation state. At the end of 20 minutes (FIG. 13), the cuvette was uncapped and the reaction mixture was worked up following the general procedure for small-scale reactions. HPLC of the resulting solution confirmed that a substantial amount of azide is reduced to the corresponding sulfonamide.

To determine the resting state of the protein during the sulfimidation reaction, a degassed solution of sulfide 7a (2 µL, 400 mM in DMSO) was added to the cuvette containing P411 BM3-CIS I263A T438S in the presence of NADPH. A visible absorbance spectrum of the mixture was recorded to ensure that the oxidation state of iron heme is unchanged. Next, a degassed solution of tosyl azide (2 µL, 400 mM in DMSO) was added to the cuvette. Visible absorbance spectra of the solution were recorded at 5, 7, 12, 18, 22, 25 and 30 min (FIG. 14). The appearance of ferrous heme is observed over time. HPLC confirmed formation of both sulfimide and sulfonamide.

Excess sulfide and azide slow addition experiments. To assess the impact of sulfide concentration on overall productivity of the reaction, sulfide was added to reactions at levels ranging from 0.5 eq to 4 eq relative to azide. 1 eq of azide denotes 1 mM in the small-scale reactions described above. Results are plotted in Figure S10 below as a ratio of the TTN for sulfimide vs. TTN for sulfonamide. Slow addition was accomplished by adding 1 µL of a 100 mM (100 nmol) tosyl azide solution (DMSO) at 15 minute intervals to a reaction set up as described previously with 0.4% catalyst loading, containing 2.5 mM 7a. Azide was added over 150 minutes until equimolar final concentrations of sulfide and azide were achieved. Results of the slow addition are presented in FIG. 16.

A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. Accordingly, other embodiments are within the scope of the following claims.
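As a check on the excess-sulfide and slow-addition experiments described above, the arithmetic can be sketched in Python. This is a minimal sketch under stated assumptions: the TTN values are transcribed from Table D, the helper names are hypothetical, ten additions are assumed over the 150 minutes (one per 15 minute interval), and the small volume change from the added azide solution is ignored.

```python
# TTN values transcribed from Table D: sulfide equivalents -> (TTN 8a, TTN 9)
TABLE_D = {
    0.5: (71, 110),
    1.0: (97, 100),
    2.0: (110, 93),
    3.0: (130, 85),
    4.0: (140, 80),
}


def sulfimide_to_sulfonamide_ratio(ttn_sulfimide, ttn_sulfonamide):
    """Ratio of productive (sulfimide) to unproductive (sulfonamide) turnovers."""
    return ttn_sulfimide / ttn_sulfonamide


def slow_addition_final_mM(stock_mM, aliquot_uL, interval_min, total_min, reaction_uL):
    """Final azide concentration after repeated aliquots.

    Assumptions: one aliquot per interval (total_min // interval_min additions)
    and a constant reaction volume. mM * uL gives nmol; nmol / uL gives mM.
    """
    n_additions = total_min // interval_min
    total_nmol = stock_mM * aliquot_uL * n_additions
    return total_nmol / reaction_uL


if __name__ == "__main__":
    for equiv, (ttn_8a, ttn_9) in sorted(TABLE_D.items()):
        ratio = sulfimide_to_sulfonamide_ratio(ttn_8a, ttn_9)
        print(f"{equiv} eq sulfide: ratio = {ratio:.2f}")
    print(slow_addition_final_mM(100.0, 1.0, 15, 150, 400.0), "mM azide")
```

Under these assumptions the ratios at 0.5 eq and 4 eq sulfide come out near the 0.6 and 1.8 quoted in the text, and ten 100 nmol aliquots in a 400 µL reaction give 2.5 mM azide, matching the 2.5 mM of sulfide 7a.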

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 49

<210> SEQ ID NO 1 <211> LENGTH: 1048 <212> TYPE: PRT <213> ORGANISM: Bacillus megaterium <220> FEATURE: <223> OTHER INFORMATION: cytochrome P450:NADPH-P-450 reductase precursor, cytochrome P450

<400> SEQUENCE: 1

Thr Ile Lys Glu Met Pro Gln Pro Thr Phe Gly Glu Leu Lys Asn 1 5 15

Leu Pro Leu Leu Asn Thr Asp Pro Val Gln Ala Leu Met Lys Ile 25 30

Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg Val 35 40 45

Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Glu Ala Asp Glu 50 55 60

Ser Arg Phe Asp Asn Leu Ser Gln Ala Leu Lys Phe Val Arg Asp 65 70

Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Asn Trp 85 90 95

Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala Met 105 110

Gly Tyr His Ala Met Met Val Asp Ile Ala Val Gln Leu Val Gln 115 120 125

Trp Glu Arg Leu Asn Ala Asp Glu His Ile Glu Val Pro Glu Asp 130 135 140

Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Gly Phe Asn Tyr 145 150 155 160

Arg Phe Asn Ser Phe Arg Asp Gln Pro His Pro Phe Ile Thr Ser 165 170 175

Met Val Arg Ala Leu Glu Ala Met Asn Leu Gln Arg Ala Asn 180 185 190

Pro Asp Asp Pro Ala Asp Glu Asn Arg Gln Phe Gln Glu Asp 195

Ile Lys Val Met Asn Asp Leu Val Asp Ile Ile Ala Asp Arg 210 215

Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn Gly 225 230 235 240

Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly Leu 260 265 270

Leu Ser Phe Ala Leu Phe Leu Val Asn Pro His Val Leu Gln 275 285

Ala Ala Glu Glu Ala Ala Arg Val Leu Val Asp Pro Val Pro Ser 290 295 300

Tyr Gln Val Lys Gln Leu Val Gly Met Val Leu Asn Glu 305 310 315

Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Tyr Pro Leu Glu Gly Asp Glu 340 345 350

Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Thr Ile Trp Gly 355 360 365

Asp Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser Ala 370 375

Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala Cys 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly Met 405 415

Met Leu His Phe Asp Phe Glu Asp His Thr Asn Glu Leu Asp 425 430

Ile Glu Thr Leu Thr Leu Lys Pro Glu Gly Phe Val Val Ala 435 440 445

Ser Ile Pro Leu Gly Gly Ile Pro Ser Pro Ser Thr Glu 450 455 460

Gln Ser Ala Lys Val Arg Ala Glu Asn Ala His Asn Thr 465 470 475

Pro Leu Leu Val Leu Gly Ser Asn Met Gly Thr Ala Glu Gly Thr 485 490 495

Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro Gln 500 505 510

Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly Ala 515 525

Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn Ala 530 535 540

Lys Gln Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Glu Val Lys 545 550 555 560

Gly Val Arg Ser Val Phe Gly Gly Asp Lys Asn Trp Ala Thr 565 570 575

Thr Gln Lys Val Pro Ala Phe Ile Asp Glu Thr Leu Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Asp Arg Gly Glu Ala Asp Ala Ser Asp Asp 595 605

Phe Glu Gly Thr Tyr Glu Glu Trp Arg Glu His Met Trp Ser Asp Val 610 615

Ala Ala Phe Asn Leu Asp Ile Glu Asn Ser Glu Asp Asn Ser 625 630 635 640

Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu Ala 645 650 655

Met His Gly Ala Phe Ser Thr Asn Val Val Ala Ser Lys Glu Leu 660 665 670

Gln Gln Pro Gly Ser Ala Arg Ser Thr Arg His Leu Glu Ile Glu Leu 675 685

Pro Lys Glu Ala Ser Gln Glu Gly Asp His Leu Gly Val Ile Pro 690 695 700

Arg Asn Glu Gly Ile Val Asn Arg Val Ala Arg Phe Gly Leu 705 710

Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Leu Ala 725 730 735

His Leu Pro Leu Ala Thr Val Ser Val Glu Leu Leu Gln 740 745 750

Val Glu Leu Gln Asp Val Thr Arg Thr Leu Arg Ala Met Ala 760 765

Ala Lys Thr Val Cys Pro His Val Leu Glu Ala Leu Leu 770 775

Glu Gln Ala Tyr Lys Glu Gln Val Leu Arg Leu Thr Met 790

Leu Glu Leu Leu Glu Pro Ala Cys Met Phe Ser Glu 805 810 815

Phe Ile Ala Leu Leu Ser Ile Arg Pro Arg Ser Ile Ser 825 830

Ser Ser Pro Arg Val Asp Glu Lys Gln Ala Ser Ile Thr Val Ser Val 835 840 845

Val Ser Gly Glu Ala Trp Ser Gly Gly Glu Tyr Gly Ile Ala 850 855 860

Ser Asn Leu Ala Glu Leu Gln Glu Gly Asp Thr Ile Thr Phe 865

Ile Ser Thr Pro Gln Ser Glu Phe Thr Leu Pro Asp Pro Glu Thr 885 890 895

Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg Gly 900 905 910

Phe Val Gln Ala Arg Gln Leu Glu Gln Gly Gln Ser Leu Gly 915 920 925

Glu Ala His Leu Tyr Phe Gly Arg Ser Pro His Glu Asp Leu 930 935 940

Tyr Gln Glu Glu Leu Glu Asn Ala Gln Ser Glu Gly Ile Ile Thr Leu 945 950 955 960

His Thr Ala Phe Ser Arg Met Pro Asn Gln Pro Thr Val Gln 965 970 975

His Val Met Glu Gln Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp Gln 980 985 990

Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro Ala 995 1000 1005

Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Asp Val His Gln Val Ser 1010 1015 1020

Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly Arg 1025 1030 1035 1040

Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 2 <211> LENGTH: 1049 <212> TYPE: PRT <213> ORGANISM: Bacillus megaterium <220> FEATURE: <223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V9, CYP102A1

<400> SEQUENCE: 2

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Ile Gln Thr Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Cys Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Lys Phe Val Arg 65 70 75 80

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Lys Lys Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 100 105 110

Met Lys Gly Tyr His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Thr Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Cys Gly Phe Asn 145 150 155 160

Tyr Arg Phe Asn Ser Phe Tyr Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Asp Glu Ala Met Asn Lys Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Tyr Asp Glu Asn Lys Arg Gln Phe Gln Glu 195 200 205

Asp Ile Lys Val Met Asn Asp Leu Val Asp Lys Ile Ile Ala Asp Arg 210 215 220

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Lys Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Tyr Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Tyr Phe Leu Val Lys Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Ala Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Gln Val Lys Gln Leu Val Gly Met Val Leu Asn 305 310 315

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Asp Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val Lys 435 440 445

Ala Lys Ser Gln Ile Pro Leu Gly Ile Pro Ser Pro Ser Arg 450 455 460

Glu Gln Ser Ala Lys Lys Glu Arg Thr Val Glu Asn Ala His Asn 465 470

Thr Pro Leu Leu Val Leu Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro 500 505

Gln Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Glu Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Glu Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Lys Val Pro Ala Phe Ile Asp Glu Thr Leu Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Glu Arg Gly Glu Ala Asp Ala Ser Asp 595 605

Asp Phe Glu Gly Thr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615

Leu Ala Ala Phe Asn Leu Asp Ile Glu Asn Ser Glu Glu Asn Ala 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Arg Ala Phe Ser Ala Asn Val Val Ala Ser Glu 660 665 670

Leu Gln Lys Pro Gly Ser Ala Arg Ser Thr Arg His Leu Glu Ile Glu 675 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Tyr Glu Gly Ile Val Asn Arg Val Ala Thr Arg Phe Gly 705 710 715 720

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Lys Thr Val Ser Val Glu Glu Leu Leu Gln 740 745 750

Tyr Val Glu Leu Gln Asp Pro Val Thr Arg Thr Gln Leu Arg Ala Met 755 760 765

Ala Ala Lys Thr Val Cys Pro Pro His Lys Val Glu Leu Glu Val Leu 770 775 780

Leu Glu Lys Gln Ala Tyr Lys Glu Gln Val Leu Ala Lys Arg Leu Thr 785 790 795 800

Met Leu Glu Leu Leu Glu Lys Tyr Pro Ala Cys Glu Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Pro Ser Met Arg Pro Arg Tyr Tyr Ser Ile 820 825 830

Ser Ser Ser Pro Arg Val Asp Glu Lys Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Tyr Gly Glu Tyr Lys Gly Ile 850 855 860

Ala Ser Asn Tyr Leu Ala Asn Leu Gln Glu Gly Asp Thr Ile Thr Cys 865 870 875 880

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Lys Gly Pro Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Tyr Gln Lys Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Lys Thr Tyr Val 965 970 975

Gln His Val Met Glu Gln Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Glu Val His Gln Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 3
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V10, CYP102A1

<400> SEQUENCE: 3

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Ile Gln Thr Leu Met Lys 20 25

Ile Ala Asp Glu Leu Gly Glu Ile Phe Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Phe Val Arg 65 70

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 105 110

Met Gly His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Thr Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Gly Phe Asn 145 150 155 160

Arg Phe Asn Ser Phe Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Glu Ala Met Asn Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Asp Glu Asn Lys Arg Gln Phe Gln Glu 195

Asp Ile Val Met Asn Asp Leu Val Asp Ile Ile Ala Asp Arg 210 215

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Tyr Phe Leu Val Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Ala Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Gln Val Lys Gln Leu Val Gly Met Val Leu Asn 305 310 315

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Tyr Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Asp Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 410 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 420 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val Lys 435 440 445

Ala Lys Ser Gln Ile Pro Leu Gly Ile Pro Ser Pro Ser Arg 450 455 460

Glu Gln Ser Ala Lys Lys Glu Arg Thr Val Glu Asn Ala His Asn 465 470

Thr Pro Leu Leu Val Leu Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro 500 505

Gln Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Glu Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Glu Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Lys Val Pro Ala Phe Ile Asp Glu Thr Phe Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Glu Arg Gly Glu Ala Asp Ala Ser Asp 595 605

Asp Phe Glu Gly Thr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615

Leu Ala Ala Phe Asn Leu Asp Ile Glu Asn Ser Glu Glu Asn Ala 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Arg Ala Phe Ser Ala Asn Val Val Ala Ser Glu 660 665 670

Leu Gln Lys Pro Gly Ser Ala Arg Ser Thr Arg His Leu Glu Ile Glu 675 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Glu Gly Ile Val Asn Arg Val Thr Arg Phe Gly 705

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Thr Val Ser Val Glu Leu Leu Gln 740 745 750

Val Glu Leu Gln Asp Val Thr Arg Thr Leu Arg Ala Met 755 760 765

Ala Ala Thr Val Pro His Val Leu Glu Val Leu 770

Leu Glu Gln Ala Tyr Glu Gln Val Leu Arg Leu Thr 790 795

Met Leu Glu Leu Leu Glu Pro Ala Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Ser Met Arg Pro Arg Tyr Ser Ile 825 830

Ser Ser Ser Pro Arg Val Asp Glu Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Gly Glu Gly Ile 850 855 860

Ala Ser Asn Tyr Leu Ala Asn Leu Gln Glu Gly Asp Thr Ile Thr Cys 865 870 875 880

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Lys Gly Pro Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Tyr Gln Lys Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Lys Thr Tyr Val 965 970 975

Gln His Val Met Glu Gln Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Glu Val His Gln Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 4
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V4, CYP102A1

<400> SEQUENCE: 4

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Ile Gln Thr Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Cys Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Lys Phe Val Arg 65 70 75 80

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Lys Lys Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 100 105 110

Met Lys Gly Tyr His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Thr Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Cys Gly Phe Asn 145 150 155 160

Arg Phe Asn Ser Phe Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Glu Ala Met Asn Lys Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Asp Glu Asn Lys Arg Gln Phe Gln Glu 195

Asp Ile Val Met Asn Asp Leu Val Asp Ile Ile Ala Asp Arg 210 215

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Tyr Phe Leu Val Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Thr Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Gln Val Lys Gln Leu Tyr Val Gly Met Val Leu Asn 305 310 315

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Tyr Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Glu Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val 435 440 445

Ala Lys Ser Lys Ile Pro Leu Gly Ile Pro Ser Pro Ser Thr 450 455 460

Glu Gln Ser Ala Lys Val Arg Val Glu Asn Ala His Asn 465 470

Thr Pro Leu Leu Val Leu Tyr Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro 500 505

Gln Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Gln Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Asp Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Val Pro Ala Phe Ile Asp Glu Thr Leu Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Asp Arg Gly Glu Ala Asp Ala Ser Asp 595 605

Asp Phe Glu Gly Thr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615

Val Ala Ala Phe Asn Leu Asp Ile Glu Asn Ser Glu Asp Asn Lys 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Gly Ala Phe Ser Ala Asn Val Val Ala Ser Glu 660 665 670

Leu Gln Gln Pro Gly Ser Glu Arg Ser Thr Arg His Leu Glu Ile Ala 675 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Glu Gly Ile Val Asn Arg Val Thr Ala Arg Phe Gly 705

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Thr Val Ser Val Glu Leu Leu Gln 740 745 750

Val Glu Leu Gln Asp Val Thr Arg Thr Leu Arg Ala Met 755 760 765

Ala Ala Thr Val Pro His Val Leu Glu Ala Leu 770

Leu Glu Gln Ala Tyr Glu Gln Val Leu Arg Leu Thr 790 795

Met Leu Glu Leu Leu Glu Pro Ala Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Ser Ile Arg Pro Arg Tyr Ser Ile 825 830

Ser Ser Ser Pro Arg Val Asp Glu Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Gly Glu Gly Ile 850 855 860

Ala Ser Asn Leu Ala Asn Leu Gln Glu Gly Asp Thr Ile Thr Cys 865

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Asp Ser Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Ser Phe Val Gln Ala Arg Gln Leu Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Phe Gly Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Gln Glu Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Thr Tyr Val 965 970 975

Gln His Val Met Glu Gln Asp Gly Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Asp Val Tyr Glu Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 5
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V8, CYP102A1

<400> SEQUENCE: 5

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Ile Gln Thr Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Cys Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Lys Phe Val Arg 65 70 75 80

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Lys Lys Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 100 105 110

Met Lys Gly Tyr His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Thr Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Cys Gly Phe Asn 145 150 155 160

Tyr Arg Phe Asn Ser Phe Tyr Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Asp Glu Ala Met Asn Lys Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Tyr Asp Glu Asn Lys Arg Gln Phe Gln Glu 195 200 205

Asp Ile Lys Val Met Asn Asp Leu Val Asp Lys Ile Ile Ala Asp Arg 210 215 220

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Lys Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Tyr Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Tyr Phe Leu Val Lys Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Ala Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Tyr Lys Gln Val Lys Gln Leu Lys Tyr Val Gly Met Val Leu Asn 305 310 315

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Tyr Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Asp Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val 435 440 445

Ala Lys Ser Gln Ile Pro Leu Gly Ile Pro Ser Pro Ser Arg 450 455 460

Glu Gln Ser Ala Lys Glu Arg Thr Val Glu Asn Ala His Asn 465 470

Thr Pro Leu Leu Val Leu Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro 500 505

Arg Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Glu Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Glu Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Val Pro Ala Phe Ile Asp Glu Thr Leu Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Glu Arg Gly Glu Ala Asp Ala Ser Asp 595 605

Asp Phe Glu Gly Thr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615

Leu Ala Ala Phe Asn Leu Asp Ile Glu Asn Ser Glu Glu Asn Ala 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Arg Ala Phe Ser Ala Asn Val Val Ala Ser Glu 660 665 670

Leu Gln Lys Pro Gly Ser Ala Arg Ser Thr Arg His Leu Glu Ile Glu 675 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Tyr Glu Gly Ile Val Asn Arg Val Ala Thr Arg Phe Gly 705

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Lys Thr Val Ser Val Glu Glu Leu Leu Gln 740 745 750

Tyr Val Glu Leu Gln Asp Pro Val Thr Arg Thr Gln Leu Arg Ala Met 755 760 765

Ala Ala Lys Thr Val Cys Pro Pro His Lys Val Glu Leu Glu Val Leu 770 775 780

Leu Glu Lys Gln Ala Tyr Lys Glu Gln Val Leu Ala Lys Arg Leu Thr 785 790 795 800

Met Leu Glu Leu Leu Glu Lys Tyr Pro Ala Cys Glu Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Pro Ser Met Arg Pro Arg Tyr Tyr Ser Ile 820 825 830

Ser Ser Ser Pro Arg Val Asp Glu Lys Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Tyr Gly Glu Tyr Lys Gly Ile 850 855 860

Ala Ser Asn Tyr Leu Ala Asn Leu Gln Glu Gly Asp Thr Ile Thr Cys 865 870 875 880

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Lys Gly Pro Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Tyr Gln Lys Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Lys Thr Tyr Val 965 970 975

Gln His Val Met Glu Gln Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Glu Val His Gln Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 6
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V3, CYP102A1

<400> SEQUENCE: 6

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Val Gln Ala Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Phe Val Arg 65 70

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 105 110

Met Gly His Ala Met Met Val Asp Ile Ala Val Gln Leu Val 115 120 125

Gln Lys Trp Glu Arg Leu Asn Ala Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Gly Phe Asn 145 150 155 160

Arg Phe Asn Ser Phe Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Glu Ala Met Asn Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Asp Glu Asn Arg Gln Phe Gln Glu 195

Asp Ile Val Met Asn Asp Leu Val Asp Ile Ile Ala Asp Arg 210 215 220

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Phe Leu Val Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Ala Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Gln Val Lys Gln Leu Val Gly Met Val Leu Asn 305 310 315

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Asp Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 410 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val Lys 435 440 445

Ala Lys Ser Lys Ile Pro Leu Gly Gly Ile Pro Ser Pro Ser Thr 450 455 460

Glu Gln Ser Ala Lys Lys Val Arg Lys Val Glu Asn Ala His Asn 465 470

Thr Pro Leu Leu Val Leu Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro 500 505

Gln Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Gln Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Asp Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Lys Val Pro Ala Phe Ile Asp Glu Thr Leu Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Asp Arg Gly Glu Ala Asp Ala Ser Asp 595 605

Asp Phe Glu Gly Thr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615

Val Ala Ala Phe Asn Leu Asp Ile Glu Asn Ser Glu Asp Asn Lys 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Gly Ala Phe Ser Ala Asn Val Val Ala Ser Glu 660 665 670

Leu Gln Gln Leu Gly Ser Glu Arg Ser Thr Arg His Leu Glu Ile Ala 675 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Glu Gly Ile Val Asn Arg Val Thr Ala Arg Phe Gly 705

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Thr Val Ser Val Glu Leu Leu Gln 740 745 750

Val Glu Leu Gln Asp Val Thr Arg Thr Leu Arg Ala Met 755 760 765

Ala Ala Thr Val Pro His Val Leu Glu Ala Leu 770

Leu Glu Gln Ala Tyr Glu Gln Val Leu Arg Leu Thr 785 790 795

Met Leu Glu Leu Leu Glu Pro Ala Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Ser Ile Ser Pro Arg Tyr Ser Ile 820 825 830

Ser Ser Ser Pro His Val Asp Glu Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Gly Glu Gly Ile 850 855 860

Ala Ser Asn Leu Ala Asn Leu Gln Glu Gly Asp Thr Ile Thr 865 870 875

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Lys Asp Ser Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Tyr Gln Glu Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Lys Thr Tyr Val 965 970 975

Gln His Val Met Glu Arg Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Asp Val Tyr Glu Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 7
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V7, CYP102A1

<400> SEQUENCE: 7

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Ile Gln Thr Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Cys Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Lys Phe Val Arg 65 70 75 80

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Lys Lys Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 100 105 110

Met Lys Gly Tyr His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Thr Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Cys Gly Phe Asn 145 150 155 160

Tyr Arg Phe Asn Ser Phe Tyr Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Asp Glu Ala Met Asn Lys Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Asp Glu Asn Lys Arg Gln Phe Gln Glu 195

Asp Ile Val Met Asn Asp Leu Val Asp Lys Ile Ile Ala Asp Arg 210 215

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Tyr Phe Leu Val Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Ala Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Gln Val Lys Gln Leu Val Gly Met Val Leu Asn 305 310 315

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Tyr Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Asp Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val Lys 435 440 445

Ala Lys Ser Gln Ile Pro Leu Gly Ile Pro Ser Pro Ser Arg 450 455 460

Glu Gln Ser Ala Lys Glu Arg Thr Val Glu Asn Ala His Asn 465 470

Thr Pro Leu Leu Val Leu Tyr Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro 500 505

Gln Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Pro Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Glu Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Glu Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Val Pro Ala Phe Ile Asp Glu Thr Leu Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Glu Arg Gly Glu Ala Asp Ala Ser Asp 595 600 605

Asp Phe Glu Gly Thr Tyr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615 620

Leu Ala Ala Tyr Phe Asn Leu Asp Ile Glu Asn Ser Glu Glu Asn Ala 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Lys Met His Arg Ala Phe Ser Ala Asn Val Val Ala Ser Lys Glu 660 665 670

Leu Gln Lys Pro Gly Ser Ala Arg Ser Thr Arg His Leu Glu Ile Glu 675 680 685

Leu Pro Lys Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Tyr Glu Gly Ile Val Asn Arg Val Ala Thr Arg Phe Gly 705 710 715 720

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Lys Thr Val Ser Val Glu Glu Leu Leu Gln 740 745 750

Tyr Val Glu Leu Gln Asp Pro Val Thr Arg Thr Gln Leu Arg Ala Met 755 760 765

Ala Ala Lys Thr Val Cys Pro Pro His Lys Val Glu Leu Glu Val Leu 770 775 780

Leu Glu Lys Gln Ala Tyr Lys Glu Gln Val Leu Ala Lys Arg Leu Thr 785 790 795 800

Met Leu Glu Leu Leu Glu Lys Tyr Pro Ala Cys Glu Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Pro Ser Met Arg Pro Arg Tyr Tyr Ser Ile 820 825 830

Ser Ser Ser Pro Arg Val Asp Glu Lys Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Tyr Gly Glu Tyr Lys Gly Ile 850 855 860

Ala Ser Asn Tyr Leu Ala Asn Leu Gln Glu Gly Asp Thr Ile Thr Cys 865 870 875 880

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Lys Gly Pro Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Tyr Gln Lys Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Glu Pro Lys Thr Tyr Val 965 970 975

Gln His Val Met Glu Gln Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Glu Val His Gln Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 8
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V2, CYP102A1

<400> SEQUENCE: 8

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Ile Gln Thr Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Cys Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Lys Phe Val Arg 65 70 75 80

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Lys Lys Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 100 105 110

Met Lys Gly Tyr His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Thr Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Cys Gly Phe Asn 145 150 155 160

Tyr Arg Phe Asn Ser Phe Tyr Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Asp Glu Ala Met Asn Lys Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Tyr Asp Glu Asn Lys Arg Gln Phe Gln Glu 195 200 205

Asp Ile Lys Val Met Asn Asp Leu Val Asp Lys Ile Ile Ala Asp Arg 210 215 220

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Lys Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Tyr Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Tyr Phe Leu Val Lys Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Thr Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Tyr Lys Gln Val Lys Gln Leu Lys Tyr Val Gly Met Val Leu Asn 305 310 315 320

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Tyr Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Glu Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val 435 440 445

Ala Lys Ser Lys Ile Pro Leu Gly Ile Pro Ser Pro Ser Thr 450 455 460

Glu Gln Ser Ala Lys Val Arg Val Glu Asn Ala His Asn 465 470

Thr Pro Leu Leu Val Leu Tyr Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro 500 505

Gln Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Gln Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Asp Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Val Pro Ala Phe Ile Asp Glu Thr Leu Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Asp Arg Gly Glu Ala Asp Ala Ser Asp 595 605

Asp Phe Glu Gly Thr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615

Val Ala Ala Phe Asn Leu Asp Ile Glu Asn Ser Glu Asp Asn Lys 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Gly Ala Phe Ser Ala Asn Val Val Ala Ser Glu 660 665 670

Leu Gln Gln Leu Gly Ser Glu Arg Ser Thr Arg His Leu Glu Ile Ala 675 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Glu Gly Ile Val Asn Arg Val Thr Ala Arg Phe Gly 705 715

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Thr Val Ser Val Glu Glu Leu Leu Gln 740 745 750

Val Glu Leu Gln Asp Pro Val Thr Arg Thr Gln Leu Arg Ala Met 755 760 765

Ala Ala Lys Thr Val Cys Pro Pro His Lys Val Glu Leu Glu Ala Leu 770 775 780

Leu Glu Lys Gln Ala Tyr Lys Glu Gln Val Leu Ala Lys Arg Leu Thr 785 790 795 800

Met Leu Glu Leu Leu Glu Lys Tyr Pro Ala Cys Glu Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Pro Ser Ile Ser Pro Arg Tyr Tyr Ser Ile 820 825 830

Ser Ser Ser Pro His Val Asp Glu Lys Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Tyr Gly Glu Tyr Lys Gly Ile 850 855 860

Ala Ser Asn Tyr Leu Ala Asn Leu Gln Glu Gly Asp Thr Ile Thr Cys 865 870 875 880

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Lys Asp Ser Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Tyr Gln Glu Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Lys Thr Tyr Val 965 970 975

Gln His Val Met Glu Arg Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Asp Val Tyr Glu Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 9
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: cytochrome P450:NADPH P450 reductase, CYP102A1

<400> SEQUENCE: 9

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Ile Gln Thr Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Cys Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Lys Phe Val Arg 65 70

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 105 110

Met Gly His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Thr Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Gly Phe Asn 145 150 155 160

Arg Phe Asn Ser Phe Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Glu Ala Met Asn Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Asp Glu Asn Arg Gln Phe Gln Glu 195

Asp Ile Val Met Asn Asp Leu Val Asp Ile Ile Ala Asp Arg 210 215

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Phe Leu Val Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Ala Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Gln Val Lys Gln Leu Val Gly Met Val Leu Asn 305 310 315

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Glu Asp Thr Val Leu Gly Gly Glu Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Asp Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 420 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val 435 440 445

Ala Lys Ser Gln Ile Pro Leu Gly Ile Pro Ser Pro Ser Arg 450 455 460

Glu Gln Ser Ala Lys Lys Glu Arg Thr Val Glu Asn Ala His Asn 465 470 475 480

Thr Pro Leu Leu Val Leu Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Lys Gly Phe Ala Pro 500 505

Gln Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Glu Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Glu Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Lys Val Pro Ala Phe Ile Asp Glu Thr Leu Ala Ala 585 590

Gly Ala Glu Asn Ile Ala Glu Arg Gly Glu Ala Asp Ala Ser Asp 595 605

Asp Phe Glu Gly Thr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615

Leu Ala Ala Phe Asn Leu Asp Ile Glu Asn Ser Glu Glu Asn Ala 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Arg Ala Phe Ser Ala Asn Val Val Ala Ser Glu 660 665 670

Leu Gln Lys Pro Gly Ser Ala Arg Ser Thr Arg His Leu Glu Ile Glu 675 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Glu Gly Ile Val Asn Arg Val Thr Arg Phe Gly 705

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Thr Val Ser Val Glu Leu Leu Gln 740 745 750

Val Glu Leu Gln Asp Val Thr Arg Thr Leu Arg Ala Met 755 760 765

Ala Ala Thr Val Pro His Val Leu Glu Val Leu 770

Leu Glu Gln Ala Tyr Glu Gln Val Leu Arg Leu Thr 790 795

Met Leu Glu Leu Leu Glu Pro Ala Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Ser Met Arg Pro Arg Tyr Ser Ile 825 830

Ser Ser Ser Pro Arg Val Asp Glu Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Gly Glu Gly Ile 850 855 860

Ala Ser Asn Leu Ala Asn Leu Gln Glu Gly Asp Thr Ile Thr Cys 865

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Gly Pro Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Tyr Gln Lys Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Lys Thr Tyr Val 965 970 975

Gln His Val Met Glu Gln Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Glu Val His Gln Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Gly 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 10
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V6, CYP102A1

<400> SEQUENCE: 10

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Val Gln Ala Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Cys Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Lys Phe Val Arg 65 70 75 80

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Lys Lys Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 100 105 110

Met Lys Gly Tyr His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Ala Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Cys Gly Phe Asn 145 150 155 160

Tyr Arg Phe Asn Ser Phe Tyr Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Asp Glu Ala Met Asn Lys Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Tyr Asp Glu Asn Lys Arg Gln Phe Gln Asp 195 200 205

Asp Ile Lys Val Met Asn Asp Leu Val Asp Lys Ile Ile Ala Asp Arg 210 215 220

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Gly Ala Phe Ser Ala Asn Val Val Ala Ser Lys Glu 660 665 670

Leu Gln Gln Pro Gly Ser Ala Arg Ser Thr Arg His Leu Glu Ile Glu 675 680 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Glu Gly Ile Val Asn Arg Val Thr Thr Arg Phe Gly 705

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Lys Thr Val Ser Val Glu Glu Leu Leu Gln 740 745 750

Val Glu Leu Gln Asp Pro Val Thr Arg Thr Gln Leu Arg Ala Met 755 760 765

Ala Ala Thr Val Pro Pro His Val Glu Leu Glu Ala Leu 770 780

Leu Glu Gln Ala Tyr Lys Glu Gln Val Leu Thr Lys Arg Leu Thr 790 795

Met Leu Glu Leu Leu Glu Lys Tyr Pro Ala Glu Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Pro Ser Met Arg Pro Arg Tyr Tyr Ser Ile 825 830

Ser Ser Ser Pro Arg Val Asp Glu Lys Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Gly Glu Tyr Lys Gly Ile 850 855 860

Ala Ser Asn Leu Ala Glu Leu Gln Glu Gly Asp Thr Ile Thr Cys 865

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Lys Asp Pro Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Gln Glu Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Lys Thr Tyr Val 965 970 975

Gln His Val Val Glu Gln Asp Gly Lys Leu Ile Glu Leu Leu Asp 985 990

Gln Gly Ala His Phe Ile Cys Gly Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Ala Glu Val His Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Ser 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 11
<211> LENGTH: 1049
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium
<220> FEATURE:
<223> OTHER INFORMATION: NADPH-cytochrome P450 reductase 102A1V5, CYP102A1

<400> SEQUENCE: 11

Met Thr Ile Lys Glu Met Pro Gln Pro Lys Thr Phe Gly Glu Leu Lys 1 5 10 15

Asn Leu Pro Leu Leu Asn Thr Asp Lys Pro Val Gln Ala Leu Met Lys 20 25 30

Ile Ala Asp Glu Leu Gly Glu Ile Phe Lys Phe Glu Ala Pro Gly Arg 35 40 45

Val Thr Arg Tyr Leu Ser Ser Gln Arg Leu Ile Lys Glu Ala Cys Asp 50 55 60

Glu Ser Arg Phe Asp Lys Asn Leu Ser Gln Ala Leu Lys Phe Val Arg 65 70 75 80

Asp Phe Ala Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Lys Asn 85 90 95

Trp Lys Lys Ala His Asn Ile Leu Leu Pro Ser Phe Ser Gln Gln Ala 100 105 110

Met Lys Gly Tyr His Ala Met Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Gln Lys Trp Glu Arg Leu Asn Ala Asp Glu His Ile Glu Val Pro Glu 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Cys Gly Phe Asn 145 150 155 160

Tyr Arg Phe Asn Ser Phe Tyr Arg Asp Gln Pro His Pro Phe Ile Thr 165 170 175

Ser Met Val Arg Ala Leu Asp Glu Ala Met Asn Lys Leu Gln Arg Ala 180 185 190

Asn Pro Asp Asp Pro Ala Tyr Asp Glu Asn Lys Arg Gln Phe Gln Asp 195 200 205

Asp Ile Lys Val Met Asn Asp Leu Val Asp Lys Ile Ile Ala Asp Arg 210 215 220

Lys Ala Ser Gly Glu Gln Ser Asp Asp Leu Leu Thr His Met Leu Asn 225 230 235 240

Gly Lys Asp Pro Glu Thr Gly Glu Pro Leu Asp Asp Glu Asn Ile Arg 245 250 255

Tyr Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser Gly 260 265 270

Leu Leu Ser Phe Ala Leu Tyr Phe Leu Val Lys Asn Pro His Val Leu 275 280 285

Gln Lys Ala Ala Glu Glu Ala Ala Arg Val Leu Val Asp Pro Val Pro 290 295 300

Ser Tyr Lys Gln Val Lys Gln Leu Lys Tyr Val Gly Met Val Leu Asn 305 310 315 320

Glu Ala Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu Tyr Ala 325 330 335

Lys Glu Asp Thr Val Leu Gly Gly Glu Tyr Pro Leu Glu Lys Gly Asp 340 345 350

Glu Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Lys Thr Ile Trp 355 360 365

Gly Asp Asp Val Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser 370 375

Ala Ile Pro Gln His Ala Phe Pro Phe Gly Asn Gly Gln Arg Ala 385 390 395 400

Ile Gly Gln Gln Phe Ala Leu His Glu Ala Thr Leu Val Leu Gly 405 415

Met Met Leu Lys His Phe Asp Phe Glu Asp His Thr Asn Tyr Glu Leu 425 430

Asp Ile Lys Glu Thr Leu Thr Leu Pro Glu Gly Phe Val Val Lys 435 440 445

Ala Lys Ser Gln Ile Pro Leu Gly Ile Pro Ser Pro Ser Arg 450 455 460

Glu Gln Ser Ala Lys Lys Glu Arg Thr Val Glu Asn Ala His Asn 465 470

Thr Pro Leu Leu Val Leu Gly Ser Asn Met Gly Thr Ala Glu Gly 485 490 495

Thr Ala Arg Asp Leu Ala Asp Ile Ala Met Ser Gly Phe Ala Pro 500 505

Gln Val Ala Thr Leu Asp Ser His Ala Gly Asn Leu Pro Arg Glu Gly 515 525

Ala Val Leu Ile Val Thr Ala Ser Asn Gly His Pro Pro Asp Asn 530 535 540

Ala Gln Phe Val Asp Trp Leu Asp Gln Ala Ser Ala Asp Glu Val 545 550 555 560

Gly Val Arg Tyr Ser Val Phe Gly Cys Gly Asp Asn Trp Ala 565 570 575

Thr Thr Gln Lys Val Pro Ala Phe Ile Asp Glu Thr Leu Ser Ala 585 590

Gly Ala Glu Asn Ile Ala Glu Arg Gly Glu Ala Asp Ala Ser Asp 595 605

Asp Phe Glu Gly Thr Glu Glu Trp Arg Glu His Met Trp Ser Asp 610 615

Leu Ala Ala Phe Asn Leu Asn Ile Glu Asn Ser Glu Asp Asn Ala 625 630 635 640

Ser Thr Leu Ser Leu Gln Phe Val Asp Ser Ala Ala Asp Met Pro Leu 645 650 655

Ala Met His Gly Ala Phe Ser Ala Asn Val Val Ala Ser Glu 660 665 670

Leu Gln Gln Pro Gly Ser Ala Arg Ser Thr Arg His Leu Glu Ile Glu 675 685

Leu Pro Glu Ala Ser Tyr Gln Glu Gly Asp His Leu Gly Val Ile 690 695 700

Pro Arg Asn Glu Gly Ile Val Asn Arg Val Thr Thr Arg Phe Gly 705

Leu Asp Ala Ser Gln Gln Ile Arg Leu Glu Ala Glu Glu Glu Lys Leu 725 730 735

Ala His Leu Pro Leu Gly Thr Val Ser Val Glu Glu Leu Leu Gln 740 745 750

Val Glu Leu Gln Asp Pro Val Thr Arg Thr Gln Leu Arg Ala Met 755 760 765

Ala Ala Thr Val Pro Pro His Lys Val Glu Leu Glu Ala Leu 770 775

Leu Glu Gln Ala Glu Gln Val Leu Thr Arg Leu Thr 785 790 795

Met Leu Glu Leu Leu Glu Lys Tyr Pro Ala Cys Glu Met Glu Phe Ser 805 810 815

Glu Phe Ile Ala Leu Leu Pro Ser Met Arg Pro Arg Tyr Tyr Ser Ile 820 825 830

Ser Ser Ser Pro Arg Val Asp Glu Lys Gln Ala Ser Ile Thr Val Ser 835 840 845

Val Val Ser Gly Glu Ala Trp Ser Gly Tyr Gly Glu Tyr Lys Gly Ile 850 855 860

Ala Ser Asn Tyr Leu Ala Glu Leu Gln Glu Gly Asp Thr Ile Thr Cys 865 870 875 880

Phe Val Ser Thr Pro Gln Ser Gly Phe Thr Leu Pro Lys Asp Pro Glu 885 890 895

Thr Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg 900 905 910

Gly Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu 915 920 925

Gly Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr 930 935 940

Leu Tyr Gln Glu Glu Leu Glu Asn Ala Gln Asn Glu Gly Ile Ile Thr 945 950 955 960

Leu His Thr Ala Phe Ser Arg Val Pro Asn Gln Pro Lys Thr Tyr Val 965 970 975

Gln His Val Val Glu Gln Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp 980 985 990

Gln Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro 995 1000 1005

Asp Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Glu Val His Lys Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys Ser 1025 1030 1035 1040

Arg Tyr Ala Lys Asp Val Trp Ala Gly 1045

<210> SEQ ID NO 12
<211> LENGTH: 420
<212> TYPE: PRT
<213> ORGANISM: Mycobacterium sp. HXN-1500
<220> FEATURE:
<223> OTHER INFORMATION: cytochrome P450 alkane hydroxylase, CYP153A6

<400> SEQUENCE: 12

Met Thr Glu Met Thr Val Ala Ala Ser Asp Ala Thr Asn Ala Ala Tyr 1 5 10 15

Gly Met Ala Leu Glu Asp Ile Asp Val Ser Asn Pro Val Leu Phe Arg 20 25 30

Asp Asn Thr Trp His Pro Tyr Phe Lys Arg Leu Arg Glu Glu Asp Pro 35 40 45

Val His Tyr Cys Lys Ser Ser Met Phe Gly Pro Tyr Trp Ser Val Thr 50 55 60

Lys Tyr Arg Asp Ile Met Ala Val Glu Thr Asn Pro Lys Val Phe Ser 65 70 75 80

Ser Glu Ala Lys Ser Gly Gly Ile Thr Ile Met Asp Asp Asn Ala Ala 85 90 95

Ala Ser Leu Pro Met Phe Ile Ala Met Asp Pro Pro Lys His Asp Val 105 110

Gln Arg Lys Thr Val Ser Pro Ile Val Ala Pro Glu Asn Leu Ala Thr 115 120 125

Met Glu Ser Val Ile Arg Gln Arg Thr Ala Asp Leu Leu Asp Gly Leu 130 135 140

Pro Ile Asn Glu Glu Phe Asp Trp Val His Arg Val Ser Ile Glu Leu 145 150 155 160

Thr Thr Met Leu Ala Thr Leu Phe Asp Phe Pro Trp Asp Asp Arg 165 170 175

Ala Leu Thr Trp Ser Asp Val Thr Thr Ala Leu Pro Gly Gly 180 185 190

Gly Ile Ile Asp Ser Glu Glu Gln Arg Met Ala Glu Leu Met Glu 195

Ala Thr Phe Thr Glu Leu Trp Asn Gln Arg Val Asn Ala Glu Pro 210 215 220

Lys Asn Asp Leu Ile Ser Met Met Ala His Ser Glu Ser Thr Arg His 225 230 235 240

Met Ala Pro Glu Glu Leu Gly Asn Ile Val Leu Leu Ile Val Gly 245 250 255

Gly Asn Asp Thr Thr Arg Asn Ser Met Thr Gly Gly Val Leu Ala Leu 260 265 270

Asn Glu Phe Pro Asp Glu Arg Leu Ser Ala Asn Pro Ala Leu 275 285

Ile Ser Ser Met Val Ser Glu Ile Ile Arg Trp Gln Thr Pro Leu Ser 290 295 300

His Met Arg Arg Thr Ala Leu Glu Asp Ile Glu Phe Gly Gly His 305 310 315

Ile Arg Gln Gly Asp Val Val Met Trp Val Ser Gly Asn Arg 325 330 335

Asp Pro Glu Ala Ile Asp Asn Pro Asp Thr Phe Ile Ile Asp Arg Ala 340 345 350

Pro Arg Gln His Leu Ser Phe Gly Phe Gly Ile His Arg Val 355 360 365

Gly Asn Arg Leu Ala Glu Leu Gln Leu Asn Ile Leu Trp Glu Glu Ile 370 375

Leu Arg Trp Pro Asp Pro Leu Gln Ile Gln Val Leu Gln Glu Pro 385 390 395 400

Thr Arg Val Leu Ser Pro Phe Val Gly Glu Ser Leu Pro Val 405 410 415

Arg Ile Asn Ala

<210> SEQ ID NO 13
<211> LENGTH: 496
<212> TYPE: PRT
<213> ORGANISM: Tetrahymena thermophila
<220> FEATURE:
<223> OTHER INFORMATION: cytochrome P450 monooxygenase CYP5013C2

<400> SEQUENCE: 13

Met Ile Phe Glu Leu Ile Leu Ile Ala Val Ala Leu Phe Ala Tyr Phe 1 5 15

Lys Ile Ala Lys Pro Tyr Phe Ser Tyr Leu Lys Tyr Arg Lys Tyr Gly 25

Lys Gly Phe Tyr Tyr Pro Ile Leu Gly Glu Met Ile Glu Gln Glu Gln 35 40 45

Asp Leu Gln His Ala Asp Ala Asp Ser Val His His Ala Leu 50 55 60

Asp Asp Pro Asp Gln Leu Phe Val Thr Asn Leu Gly Thr Lys 65 70 80

Val Leu Arg Leu Ile Glu Pro Glu Ile Ile Asp Phe Phe Ser 85 90 95

Ser Gln Tyr Tyr Gln Asp Gln Thr Phe Ile Gln Asn Ile Thr 100 105 110

Arg Phe Leu Asn Gly Ile Val Phe Ser Glu Gly Asn Thr Trp 115 120 125

Glu Ser Arg Leu Phe Ser Pro Ala Phe His Tyr Glu Tyr Ile Gln 130 135 140

Lys Leu Thr Pro Leu Ile Asn Asp Ile Thr Asp Thr Ile Phe Asn Leu 145 150 155 160

Ala Val Asn Gln Glu Leu Asn Phe Asp Pro Ile Ala Gln Ile 165 170 175

Gln Glu Ile Thr Gly Arg Val Ile Ile Ala Ser Phe Phe Gly Glu Val 180 185 190

Ile Glu Gly Glu Phe Gln Gly Leu Thr Ile Ile Gln Leu Ser 195

His Ile Ile Asn Thr Leu Gly Asn Gln Thr Ser Ile Met Phe 210 215 220

Leu Phe Gly Ser Tyr Phe Glu Leu Gly Val Thr Glu Glu His Arg 225 230 235 240

Phe Asn Phe Ile Ala Glu Phe Asn Lys Leu Leu Gln 245 250 255

Ile Asp Gln Gln Ile Glu Ile Met Ser Asn Glu Leu Gln Thr 260 265 270

Tyr Ile Gln Asn Pro Ile Leu Ala Gln Leu Ile Ser Thr His 275 285

Ile Asp Glu Ile Thr Arg Asn Gln Leu Phe Gln Asp Phe Thr Phe 290 295 300

Ile Ala Gly Met Asp Thr Thr His Leu Leu Gly Met Thr Ile 310 315

Val Ser Gln Asn Asp Tyr Thr Leu Gln Ser Glu 325 330 335

Ile Ser Asn Thr Asp Gln Ser His Gly Leu Ile Lys Asn Leu 340 350

Pro Leu Asn Ala Val Ile Lys Thr Leu Arg Tyr Gly Pro 355 360 365

Gly Asn Ile Leu Phe Asp Arg Ile Ile Asp His Glu Leu Ala 370 375

Gly Ile Pro Ile Lys Gly Thr Val Thr Pro Ala Met Ser 385 390 395 400

Met Gln Arg Asn Ser Asp Pro His Asn Pro 405 410 415

Ser Arg Trp Leu Glu Gln Ser Ser Asp Leu His Pro Asp Ala Asn 425 430

Ile Pro Phe Ser Ala Gly Gln Arg Ile Gly Glu Gln Leu Ala 435 440 445

Leu Leu Glu Ala Arg Ile Ile Leu Asn Phe Ile Met Phe Asp 450 455 460

Phe Thr Cys Pro Gln Asp Tyr Lys Leu Met Met Asn Tyr Lys Phe Leu 465 470 480

Ser Glu Pro Val Asn Pro Leu Pro Leu Gln Leu Thr Leu Arg Lys Gln 485 490 495

<210> SEQ ID NO 14
<211> LENGTH: 394
<212> TYPE: PRT
<213> ORGANISM: Nonomuraea dietziae
<220> FEATURE:
<223> OTHER INFORMATION: cytochrome P450 hydroxylase sh3

<400> SEQUENCE: 14

Val Asn Ile Asp Leu Val Asp Gln Asp His Tyr Ala Thr Phe Gly Pro 1 5 15

Pro His Glu Gln Met Arg Trp Leu Arg Glu His Ala Pro Val Trp 25

His Glu Gly Glu Pro Gly Phe Trp Ala Val Thr Arg His Glu Asp Val 35 40 45

Val His Val Ser Arg His Ser Asp Leu Phe Ser Ser Ala Arg Arg Leu 50 55 60

Ala Leu Phe Asn Glu Met Pro Glu Glu Gln Arg Glu Leu Gln Arg Met 65 70

Met Met Leu Asn Gln Asp Pro Pro Glu His Thr Arg Arg Arg Ser Leu 85 90 95

Val Asn Arg Gly Phe Thr Pro Arg Thr Ile Arg Ala Leu Glu Gln His 105 110

Ile Arg Asp Ile Cys Asp Asp Leu Leu Asp Gln Ser Gly Glu Gly 115 120 125

Asp Phe Val Thr Asp Leu Ala Ala Pro Leu Pro Leu Val Ile 130 135 140

Glu Leu Leu Gly Ala Pro Val Ala Asp Arg Asp Ile Phe Ala Trp 145 150 155 160

Ser Asn Arg Met Ile Gly Ala Gln Asp Pro Asp Ala Ala Ser Pro 165 170 175

Glu Glu Gly Gly Ala Ala Ala Met Glu Val Ala Ala Ser Glu 180 185 190

Leu Ala Ala Gln Arg Arg Ala Ala Pro Arg Asp Asp Ile Val Thr 195

Leu Leu Gln Ser Asp Glu Asn Gly Glu Ser Leu Thr Glu Asn Glu Phe 210 215

Glu Leu Phe Val Leu Leu Leu Val Val Ala Gly Asn Glu Thr Thr Arg 225 230 235 240

Asn Ala Ala Ser Gly Gly Met Leu Thr Leu Phe Glu His Pro Asp Gln 245 250 255

Trp Asp Arg Leu Val Ala Asp Pro Ser Leu Ala Ala Thr Ala Ala Asp 260 265 270

Glu Ile Val Arg Trp Val Ser Pro Val Asn Leu Phe Arg Arg Thr Ala 275 280 285

Thr Ala Asp Leu Thr Leu Gly Gly Gln Gln Val Lys Ala Asp Asp 290 295 300

Val Val Val Phe Tyr Ser Ser Ala Asn Arg Asp Ala Ser Val Phe Ser 305 310 315 320

Asp Pro Glu Val Phe Asp Ile Gly Arg Ser Pro Asn Pro His Ile Gly 325 330 335

Phe Gly Gly Gly Gly Ala His Phe Cys Leu Gly Asn His Leu Ala Lys 340 345 350

Leu Glu Leu Arg Val Leu Phe Glu Gln Leu Ala Arg Arg Phe Pro Arg 355 360 365

Met Arg Gln Thr Gly Glu Ala Arg Arg Leu Arg Ser Asn Phe Ile Asn 370 375

Gly Ile Lys Thr Leu Pro Val Thr Leu Gly 385 390

<210> SEQ ID NO 15
<211> LENGTH:
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: human vitamin D 25-hydroxylase, CYP2R1.

<400> SEQUENCE: 15

Met Trp Lys Leu Trp Arg Ala Glu Glu Gly Ala Ala Ala Leu Gly Gly 1 5 15

Ala Leu Phe Leu Leu Leu Phe Ala Leu Gly Val Arg Gln Leu Leu 20 25

Gln Arg Arg Pro Met Gly Phe Pro Pro Gly Pro Pro Gly Leu Pro Phe 35 40 45

Ile Gly Asn Ile Tyr Ser Leu Ala Ala Ser Ser Glu Leu Pro His Val 50 55 60

Tyr Met Arg Gln Ser Gln Val Gly Glu Ile Phe Ser Leu Asp 65 70 75

Leu Gly Gly Ile Ser Thr Val Val Leu Asn Gly Asp Val Val Lys 85 90 95

Glu Leu Val His Gln Ser Glu Ile Phe Ala Asp Arg Pro Leu 105 110

Pro Leu Phe Met Lys Met Thr Lys Met Gly Gly Leu Leu Asn Ser Arg 115 120 125

Gly Arg Gly Trp Val Asp His Arg Arg Leu Ala Val Asn Ser Phe 130 135 140

Arg Phe Gly Tyr Gly Gln Ser Phe Glu Ser Ile Leu Glu 145 150 155 160

Glu Thr Phe Phe Asn Asp Ala Ile Glu Thr Gly Arg Pro 165 170

Phe Asp Phe Lys Gln Leu Ile Thr Asn Ala Val Ser Asn Ile Thr Asn 180 185 190

Leu Ile Ile Phe Gly Glu Arg Phe Thr Glu Asp Thr Asp Phe Gln 195 205

His Met Ile Glu Leu Phe Ser Glu Asn Val Glu Leu Ala Ala Ser Ala 210 215

Ser Val Phe Leu Tyr Asn Ala Phe Pro Trp Ile Gly Ile Leu Pro Phe 225 230 235 240

Gly His Gln Gln Leu Phe Arg Asn Ala Ala Val Val Asp Phe 245 250 255

Leu Ser Arg Leu Ile Glu Ala Ser Val Asn Arg Pro Gln Leu 260 265 270

Pro Gln His Phe Val Asp Ala Tyr Leu Asp Glu Met Asp Gln Gly 275 285

Asn Asp Pro Ser Ser Thr Phe Ser Glu Asn Leu Ile Phe Ser Val 290 295 300

Gly Glu Leu Ile Ile Ala Gly Thr Glu Thr Thr Thr Asn Val Leu Arg 305 310 315

Trp Ala Ile Leu Phe Met Ala Leu Tyr Pro Asn Ile Gln Gly Gln Val 325 330 335

Gln Glu Ile Asp Leu Ile Met Gly Pro Asn Gly Lys Pro Ser Trp 340 345 350

Asp Asp Lys Lys Met Pro Tyr Thr Glu Ala Val Leu His Glu Val 355 360 365

Leu Arg Phe Asn Ile Val Pro Leu Gly Ile Phe His Ala Thr Ser 370 375 380

Glu Asp Ala Val Val Arg Gly Tyr Ser Ile Pro Gly Thr Thr Val 385 390 395 400

Ile Thr Asn Leu Tyr Ser Val His Phe Asp Glu Tyr Trp Arg Asp 405 415

Pro Glu Val Phe His Pro Glu Arg Phe Leu Asp Ser Ser Gly Tyr Phe 425 430

Ala Lys Glu Ala Leu Val Pro Phe Ser Leu Gly Arg Arg His 435 440 445

Leu Gly Glu His Leu Ala Arg Met Glu Met Phe Leu Phe Phe Thr Ala 450 455 460

Leu Leu Gln Arg Phe His Leu His Phe Pro His Glu Leu Val Pro Asp 465 470 480

Leu Pro Arg Leu Gly Met Thr Leu Gln Pro Gln Pro Tyr Leu Ile 485 490 495

Ala Glu Arg Arg 500

<210> SEQ ID NO 16
<211> LENGTH:
<212> TYPE: PRT
<213> ORGANISM: Macaca mulatta
<220> FEATURE:
<223> OTHER INFORMATION: Rhesus monkey vitamin D 25-hydroxylase, CYP2R1.

<400> SEQUENCE: 16

Met Trp Lys Leu Trp Gly Gly Glu Glu Gly Ala Ala Ala Leu Gly Gly 1 5 15

Ala Leu Phe Leu Leu Leu Phe Ala Leu Gly Val Arg Gln Leu Leu 20 25 30

Leu Arg Arg Pro Met Gly Phe Pro Pro Gly Pro Pro Gly Leu Pro Phe 35 40 45

Ile Gly Asn Ile Tyr Ser Leu Ala Ala Ser Ala Glu Leu Pro His Val 50 55 60

Tyr Met Arg Gln Ser Gln Val Gly Glu Ile Phe Ser Leu Asp 65 70

Leu Gly Gly Ile Ser Thr Val Val Leu Asn Gly Tyr Asp Val Val Lys 85 90 95

Glu Leu Val His Gln Ser Gly Ile Phe Ala Asp Arg Pro Leu 105 110

Pro Leu Phe Met Lys Met Thr Lys Met Gly Gly Leu Leu Asn Ser Arg 115 120 125

Gly Gln Gly Trp Val Glu His Arg Arg Leu Ala Val Asn Ser Phe 130 135 140

Arg Phe Gly Tyr Gly Gln Ser Phe Glu Ser Lys Ile Leu Glu 145 150 155 160

Glu Thr Phe Phe Thr Asp Ala Ile Glu Thr Tyr Lys Gly Arg Pro 165 170

Phe Asp Phe Lys Gln Leu Ile Thr Ser Ala Val Ser Asn Ile Thr Asn 180 185 190

Leu Ile Ile Phe Gly Glu Arg Phe Thr Glu Asp Thr Asp Phe Gln 195

His Met Ile Glu Leu Phe Ser Glu Asn Val Glu Leu Ala Ala Ser Ala 210 215

Ser Val Phe Leu Tyr Asn Ala Phe Pro Trp Ile Gly Ile Leu Pro Phe 225 230 235 240

Gly His Gln Gln Leu Phe Arg Asn Ala Ser Val Val Asp Phe 245 250 255

Leu Ser Arg Leu Ile Glu Ala Ser Val Asn Arg Pro Gln Leu 260 265 270

Pro Gln His Phe Val Asp Ala Tyr Phe Asp Glu Met Asp Gln Gly 275 285

Asn Asp Pro Ser Ser Thr Phe Ser Glu Asn Leu Ile Phe Ser Val 290 295 300

Gly Glu Leu Ile Ile Ala Gly Thr Glu Thr Thr Thr Asn Val Leu Arg 305 310 315 320

Trp Ala Ile Leu Phe Met Ala Leu Tyr Pro Asn Ile Gln Gly Gln Val 325 330 335

Gln Glu Ile Asp Leu Ile Met Gly Pro Asn Gly Pro Ser Trp 340 345 350

Asp Asp Lys Phe Met Pro Tyr Thr Glu Ala Val Leu His Glu Val 355 360 365

Leu Arg Phe Asn Ile Val Pro Leu Gly Ile Phe His Ala Thr Ser 370 375 380

Glu Asp Ala Val Val Arg Gly Tyr Ser Ile Pro Gly Thr Thr Val 385 390 395 400

Ile Thr Asn Leu Tyr Ser Val His Phe Asp Glu Trp Arg Asp 405 415

Pro Glu Val Phe His Pro Glu Arg Phe Leu Asp Ser Ser Gly Tyr Phe 425 430

Ala Lys Glu Ala Leu Val Pro Phe Ser Leu Gly Arg Arg His 435 440 445

Leu Gly Glu Gln Leu Ala Arg Met Glu Met Phe Leu Phe Phe Thr Ala 450 455 460

Leu Leu Gln Arg Phe His Leu His Phe Pro His Glu Leu Val Pro Asp 465 470 480

Leu Pro Arg Leu Gly Met Thr Leu Gln Pro Gln Pro Leu Ile 485 490 495

Ala Glu Arg Arg 500

<210> SEQ ID NO 17
<211> LENGTH:
<212> TYPE: PRT
<213> ORGANISM: Canis familiaris
<220> FEATURE:
<223> OTHER INFORMATION: dog vitamin D 25-hydroxylase, CYP2R1.

<400> SEQUENCE: 17

Met Arg Gly Pro Pro Gly Ala Glu Ala Cys Ala Ala Gly Leu Gly Ala 1 5 15

Ala Leu Leu Leu Leu Leu Phe Val Leu Gly Val Arg Gln Leu Leu Lys 20 25

Gln Arg Arg Pro Ala Gly Phe Pro Pro Gly Pro Ser Gly Leu Pro Phe 35 40 45

Ile Gly Asn Ile Tyr Ser Leu Ala Ala Ser Gly Glu Leu Ala His Val 50 55 60

Tyr Met Arg Gln Ser Arg Val Gly Glu Ile Phe Ser Leu Asp 65 70

Leu Gly Gly Ile Ser Ala Val Val Leu Asn Gly Tyr Asp Val Val Lys 85 90 95

Glu Leu Val His Gln Ser Glu Ile Phe Ala Asp Arg Pro Leu 105 110

Pro Leu Phe Met Lys Met Thr Lys Met Gly Gly Leu Leu Asn Ser Arg 115 120 125

Gly Arg Gly Trp Val Asp His Arg Leu Ala Val Asn Ser Phe 130 135 140

Arg Phe Gly Tyr Gly Gln Ser Phe Glu Ser Ile Leu Glu 145 150 155 160

Glu Thr Asn Phe Phe Ile Asp Ala Ile Glu Thr Gly Arg Pro 165 170

Phe Asp Leu Lys Gln Leu Ile Thr Asn Ala Val Ser Asn Ile Thr Asn 180 185 190

Leu Ile Ile Phe Gly Glu Arg Phe Thr Glu Asp Thr Asp Phe Gln 195

His Met Ile Glu Leu Phe Ser Glu Asn Val Glu Leu Ala Ala Ser Ala 210 215

Ser Val Phe Leu Tyr Asn Ala Phe Pro Trp Ile Gly Ile Ile Pro Phe 225 230 235 240

Gly His Gln Gln Leu Phe Arg Asn Ala Ala Val Val Asp Phe 245 250 255

Leu Ser Arg Leu Ile Glu Ala Ser Ile Asn Arg Pro Gln Ser 260 265 270

Pro Gln His Phe Val Asp Ala Tyr Leu Asn Glu Met Asp Gln Gly 275 285

Asn Asp Pro Ser Thr Phe Ser Glu Asn Leu Ile Phe Ser Val 290 295 300

Gly Glu Leu Ile Ile Ala Gly Thr Glu Thr Thr Thr Asn Val Leu Arg 305 310 315

Trp Ala Ile Leu Phe Met Ala Leu Tyr Pro Asn Ile Gln Gly Gln Val 325 330 335

Gln Glu Ile Asp Leu Ile Met Gly Pro Thr Gly Pro Ser Trp 340 345 350

Asp Asp Lys Lys Met Pro Tyr Thr Glu Ala Val Leu His Glu Val 355 360 365

Leu Arg Phe Asn Ile Val Pro Leu Gly Ile Phe His Ala Thr Ser 370 375 380

Glu Asp Ala Val Val Arg Gly Ser Ile Pro Gly Thr Thr Val 385 390 395 400

Ile Thr Asn Leu Tyr Ser Val His Phe Asp Glu Trp Arg Asn 405 415

Pro Glu Ile Phe Tyr Pro Glu Arg Phe Leu Asp Ser Ser Gly Tyr Phe 420 425 430

Ala Lys Glu Ala Leu Val Pro Phe Ser Leu Gly Lys Arg His Cys 435 440 445

Leu Gly Glu Gln Leu Ala Arg Met Glu Met Phe Leu Phe Phe Thr Ala 450 455 460

Leu Leu Gln Arg Phe His Leu His Phe Pro His Gly Leu Val Pro Asp 465 470

Leu Pro Arg Leu Gly Met Thr Leu Gln Pro Gln Pro Leu Ile 485 490 495

Ala Glu Arg Arg 500

<210> SEQ ID NO 18
<211> LENGTH: 222
<212> TYPE: PRT
<213> ORGANISM: Mus musculus
<220> FEATURE:
<223> OTHER INFORMATION: mouse Cyp2r1 protein, CYP2R1.

<400> SEQUENCE: 18

Met Gly Asp Glu Met Asp Gln Gly Gln Asn Asp Pro Leu Ser Thr Phe 1 5 15

Ser Lys Glu Asn Leu Ile Phe Ser Val Gly Glu Leu Ile Ile Ala Gly 25 30

Thr Glu Thr Thr Thr Asn Val Leu Arg Trp Ala Ile Leu Phe Met Ala 35 40 45

Leu Tyr Pro Asn Ile Gln Gly Gln Val His Glu Ile Asp Leu Ile 50 55 60

Val Gly His Asn Arg Arg Pro Ser Trp Glu Tyr Met Pro 65 70

Thr Glu Ala Val Leu His Glu Val Leu Arg Phe Asn Ile Val 85 90 95

Pro Leu Gly Ile Phe His Ala Thr Ser Glu Asp Ala Val Val Arg Gly 105 110

Ser Ile Pro Gly Thr Thr Val Ile Thr Asn Leu Tyr Ser Val 115 120 125

His Phe Asp Glu Trp Asp Pro Asp Met Phe Pro Glu 130 135 140

Arg Phe Leu Asp Ser Asn Gly Tyr Phe Thr Lys Glu Ala Leu Ile 145 150 155 160

Pro Phe Ser Leu Gly Arg Arg His Leu Gly Glu Gln Leu Ala Arg 165 170

Met Glu Met Phe Leu Phe Phe Thr Ser Leu Leu Gln Gln Phe His Leu 180 185 190

His Phe Pro His Glu Leu Val Pro Asn Leu Lys Pro Arg Leu Gly Met 195

Thr Leu Gln Pro Gln Pro Tyr Leu Ile Ala Glu Arg Arg 210 215 220

<210> SEQ ID NO 19
<211> LENGTH: 422
<212> TYPE: PRT
<213> ORGANISM: Bacillus halodurans
<220> FEATURE:
<223> OTHER INFORMATION: Bacillus halodurans strain C-125 fatty acid alpha hydroxylase,

<400> SEQUENCE: 19

Met Lys Ser Asn Asp Pro Ile Pro Lys Asp Ser Pro Leu Asp His Thr 15

Met Asn Leu Met Arg Glu Gly Glu Phe Leu Ser His Arg Met Glu 25 30

Arg Phe Gln Thr Asp Leu Phe Glu Thr Arg Val Met Gly Gln Val 35 40 45

Leu Cys Ile Arg Gly Ala Glu Ala Val Lys Leu Phe Asp Pro Glu 50 55 60

Arg Phe Arg His Arg Ala Thr Pro Arg Ile Gln Ser Leu 65 70

Phe Gly Glu Asn Ala Ile Gln Thr Met Asp Asp Ala His Leu His 85 90 95

Arg Gln Leu Phe Leu Ser Met Met Lys Pro Glu Asp Glu Gln Glu 105 110

Leu Ala Arg Leu Thr His Glu Thr Trp Arg Arg Val Ala Glu Gly Trp 115 120 125

Lys Ser Arg Pro Ile Val Leu Phe Asp Glu Ala Arg Val Leu 130 135 140

Cys Gln Val Ala Glu Trp Ala Glu Val Pro Leu Ser Thr Glu 145 150 155 160

Ile Asp Arg Arg Ala Glu Asp Phe His Ala Met Val Asp Ala Phe Gly 165 170 175

Ala Val Gly Pro Arg His Trp Arg Gly Arg Gly Arg Arg Arg Thr 180 185 190

Glu Arg Trp Ile Gln Ser Ile Ile His Gln Val Arg Thr Gly Ser Leu 195

Gln Ala Arg Glu Gly Ser Pro Leu Tyr Val Ser His Arg Glu 210 215 220

Leu Asn Gly Leu Leu Asp Glu Arg Met Ala Ala Ile Glu Leu Ile 225 230 235 240

Asn Val Leu Arg Pro Ile Val Ala Ile Ala Thr Phe Ile Ser Phe Ala 245 250 255

Ala Ile Ala Leu Gln Glu His Pro Glu Trp Gln Glu Arg Leu Asn 260 265 270

Gly Ser Asn Glu Glu Phe His Met Phe Val Gln Glu Val Arg Arg 275 285

Pro Phe Ala Pro Leu Ile Gly Ala Val Arg Ser Phe Thr 290 295 300

Trp Gly Val Arg Phe Gly Arg Leu Val Phe Leu Asp Met 305 310 315

Gly Thr Asn His Asp Pro Leu Trp Asp Glu Pro Asp Ala Phe 325 330 335

Arg Pro Glu Arg Phe Gln Glu Arg Lys Asp Ser Leu Tyr Asp Phe Ile 340 345 350

Pro Gln Gly Gly Gly Asp Pro Thr Gly His Arg Cys Pro Gly Glu 355 360 365

Gly Ile Thr Val Glu Val Met Thr Thr Met Asp Phe Leu Val Asn 370 375

Asp Ile Asp Asp Val Pro Asp Gln Asp Ile Ser Ser Leu Ser 385 390 395 400

Arg Met Pro Thr Arg Pro Glu Ser Gly Tyr Ile Met Ala Asn Ile Glu 405 410 415

Arg Glu His Ala 420

<210> SEQ ID NO 20
<211> LENGTH: 389
<212> TYPE: PRT
<213> ORGANISM: Streptomyces parvus
<220> FEATURE:
<223> OTHER INFORMATION: cytochrome P450, alry C

<400> SEQUENCE: 20

Met Tyr Leu Gly Gly Arg Arg Gly Thr Glu Ala Val Gly Glu Ser Arg 1 5 15

Glu Pro Gly Val Trp Glu Val Phe Arg Asp Glu Ala Val Gln Val 25

Leu Gly Asp His Arg Thr Phe Ser Ser Asp Met Asn His Phe Ile Pro 35 40 45

Glu Glu Gln Arg Gln Leu Ala Arg Ala Ala Arg Gly Asn Phe Val Gly 50 55 60

Ile Asp Pro Pro Asp His Thr Gln Leu Arg Gly Leu Val Ser Gln Ala 65 70

Phe Ser Pro Arg Val Thr Ala Ala Leu Glu Pro Arg Ile Gly Arg Leu 85 90 95

Ala Glu Gln Leu Leu Asp Asp Ile Val Ala Glu Arg Gly Asp Ala 105 110

Ser Asp Leu Val Gly Glu Phe Ala Gly Pro Leu Ser Ala Ile Val 115 120 125

Ile Ala Glu Leu Phe Gly Ile Pro Glu Ser Asp His Thr Met Ile Ala 130 135 140

Glu Trp Ala Ala Leu Leu Gly Ser Arg Pro Ala Gly Glu Leu Ser 145 150 155 160

Ile Ala Asp Glu Ala Ala Met Gln Asn Thr Ala Asp Leu Val Arg Arg 165

Ala Gly Glu Tyr Leu Val His His Ile Thr Glu Arg Arg Ala Arg Pro 180 185 190

Gln Asp Asp Leu Thr Ser Arg Leu Ala Thr Thr Glu Val Asp Gly Lys 195

Arg Leu Asp Asp Glu Glu Ile Val Gly Val Ile Gly Met Phe Leu Ile 210 215 220

Ala Gly Leu Pro Ala Ser Val Leu Thr Ala Asn Thr Val Met Ala 225 230 235 240

Leu Asp Glu His Pro Ala Ala Leu Ala Glu Val Arg Ser Asp Pro Ala 245 250 255

Leu Leu Pro Gly Ala Ile Glu Glu Val Leu Arg Trp Arg Pro Pro Leu 260 265 270

Val Arg Asp Gln Arg Leu Thr Thr Arg Asp Ala Asp Leu Gly Gly Arg 275 285

Thr Val Pro Ala Gly Ser Met Val Val Trp Leu Ala Ser Ala His 290 295 300

Arg Asp Pro Phe Phe Glu Asn Pro Asp Leu Phe Asp Ile His Arg 305 310 315 320

Asn Ala Gly Arg His Leu Ala Phe Gly Lys Gly Ile His Cys Leu 325 330 335

Gly Ala Pro Leu Ala Arg Leu Glu Ala Arg Ile Ala Val Glu Thr Leu 340 345 350

Leu Arg Arg Phe Glu Arg Ile Glu Ile Pro Arg Asp Glu Ser Val Glu 355 360 365

Phe His Glu Ser Ile Gly Val Leu Gly Pro Val Arg Leu Pro Thr Thr 370 375

Leu Phe Ala Arg Arg 385

<210> SEQ ID NO 21
<211> LENGTH: 414
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas putida
<220> FEATURE:
<223> OTHER INFORMATION: camphor 5-monooxygenase, camC, Cyp101, locus CPXA_PSEPU, CYP101A1

<400> SEQUENCE: 21

Thr Thr Glu Thr Ile Gln Ser Asn Ala Asn Leu Ala Pro Leu Pro Pro 1 5 10 15

His Val Pro Glu His Leu Val Phe Asp Phe Asp Met Asn Pro Ser 25

Asn Leu Ser Ala Gly Val Gln Glu Ala Trp Ala Val Leu Gln Glu Ser 35 40 45

Asn Val Pro Asp Leu Val Trp Thr Arg Asn Gly Gly His Trp Ile 50 55 60

Ala Thr Arg Gly Gln Leu Ile Arg Glu Ala Tyr Glu Asp Arg His 65 70

Phe Ser Ser Glu Cys Pro Phe Ile Pro Arg Glu Ala Gly Glu Ala Tyr 85 90 95

Asp Phe Ile Pro Thr Ser Met Asp Pro Pro Glu Gln Arg Phe Arg 105 110

Ala Leu Ala Asn Gln Val Val Gly Met Pro Val Val Asp Leu Glu 115 120 125

Asn Arg Ile Gln Glu Leu Ala Ser Leu Ile Glu Ser Leu Arg Pro 130 135 140

Gln Gly Gln Asn Phe Thr Glu Asp Ala Glu Pro Phe Pro Ile 145 150 155 160

Arg Ile Phe Met Leu Leu Ala Gly Leu Pro Glu Glu Asp Ile Pro His 165 170 175

Leu Leu Thr Asp Gln Met Thr Arg Pro Asp Gly Ser Met Thr 180 185 190

Phe Ala Glu Ala Glu Ala Leu Tyr Asp Leu Ile Pro Ile Ile 195

Glu Gln Arg Arg Gln Pro Gly Thr Asp Ala Ile Ser Ile Val Ala 210 215

Asn Gly Gln Val Asn Gly Arg Pro Ile Thr Ser Asp Glu Ala Arg 225 230 235 240

Met Gly Leu Leu Leu Val Gly Gly Leu Asp Thr Val Val Asn Phe 245 250 255

Leu Ser Phe Ser Met Glu Phe Leu Ala Ser Pro Glu His Arg Gln 260 265 270

Glu Leu Ile Glu Arg Pro Glu Arg Ile Pro Ala Ala Cys Glu Glu Leu 275 280 285

Leu Arg Arg Phe Ser Leu Val Ala Asp Gly Arg Ile Leu Thr Ser Asp 290 295 300

Tyr Glu Phe His Gly Val Gln Leu Gly Asp Gln Ile Leu Leu 305 310 315

Pro Gln Met Leu Ser Gly Leu Asp Glu Arg Glu Asn Ala Pro Met 325 330 335

His Val Asp Phe Ser Arg Gln Val Ser His Thr Thr Phe Gly His 340 345 350

Gly Ser His Leu Cys Leu Gly Gln His Leu Ala Arg Arg Glu Ile Ile 355 360 365

Val Thr Leu Glu Trp Leu Thr Arg Ile Pro Asp Phe Ser Ile Ala 370 375

Pro Gly Ala Gln Ile Gln His Ser Gly Ile Val Ser Gly Val Gln 385 390 395 400

Ala Leu Pro Leu Val Trp Asp Pro Ala Thr Thr Ala Val 405 410

<210> SEQ ID NO 22
<211> LENGTH: 515
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<223> OTHER INFORMATION: human cytochrome P450,

<400> SEQUENCE: 22

Gly Leu Glu Ala Leu Val Pro Leu Ala Met Ile Val Ala Ile Phe Leu 1 5 15

Leu Leu Val Asp Leu Met His Arg His Gln Arg Trp Ala Ala Arg 25

Pro Pro Gly Pro Leu Pro Leu Pro Gly Leu Gly Asn Leu Leu His Val 35 40 45

Asp Phe Gln Asn Thr Pro Tyr Phe Asp Gln Leu Arg Arg Arg Phe 50 55 60

Gly Asp Val Phe Asn Leu Gln Leu Ala Trp Thr Pro Val Val Val Leu 65 70

Asn Gly Leu Ala Ala Val Arg Glu Ala Met Val Thr Arg Gly Glu Asp 85 90 95

Thr Ala Asp Arg Pro Pro Ala Pro Ile Gln Val Leu Gly Phe Gly 105 110

Pro Arg Ser Gln Gly Val Ile Leu Ser Arg Gly Pro Ala Trp Arg 115 120 125

Glu Gln Arg Arg Phe Ser Val Ser Thr Leu Arg Asn Leu Gly Leu Gly 130 135 140

Lys Ser Leu Glu Gln Trp Val Thr Glu Glu Ala Ala Leu Cys 145 150 155 160

Ala Ala Phe Ala Asp Gln Ala Gly Arg Pro Phe Arg Pro Asn Gly Leu 165 170 175

Leu Asp Ala Val Ser Asn Val Ile Ala Ser Leu Thr Cys Gly Arg 180 185 190

Arg Phe Glu Asp Asp Pro Arg Phe Leu Arg Leu Leu Asp Leu Ala 195 205

Gln Glu Gly Leu Lys Glu Glu Ser Gly Phe Leu Arg Glu Val Leu Asn 210 215

Ala Val Pro Val Leu Pro His Ile Pro Ala Leu Ala Gly Val Leu 225 230 235 240

Arg Phe Gln Ala Phe Leu Thr Gln Leu Asp Glu Leu Leu Thr Glu 245 250 255

His Arg Met Thr Trp Asp Pro Ala Gln Pro Pro Arg Asp Leu Thr Glu 260 265 270

Ala Phe Leu Ala Lys Glu Ala Lys Gly Ser Pro Glu Ser Ser 275 280 285

Phe Asn Asp Glu Asn Leu Arg Ile Val Val Gly Asn Leu Phe Leu Ala 290 295 300

Gly Met Val Thr Thr Leu Thr Thr Leu Ala Trp Gly Leu Leu Leu Met 305 310 315

Ile Leu His Leu Asp Val Gln Arg Gly Arg Arg Val Ser Pro Gly 325 330 335

Ser Pro Ile Val Gly Thr His Val Cys Pro Val Arg Val Gln Gln Glu 340 345 350

Ile Asp Asp Val Ile Gly Gln Val Arg Arg Pro Glu Met Gly Asp Gln 355 360 365

Val His Met Pro Tyr Thr Thr Ala Val Ile His Glu Val Gln Arg Phe 370 375 380

Gly Asp Ile Val Pro Leu Gly Val Thr His Met Thr Ser Arg Asp Ile 385 390 395 400

Glu Val Gln Gly Phe Arg Ile Pro Lys Gly Thr Thr Leu Ile Thr Asn 405 410 415

Leu Ser Ser Val Leu Lys Asp Glu Ala Val Trp Glu Lys Pro Phe Arg 420 425 430

Phe His Pro Glu His Phe Leu Asp Ala Gln Gly His Phe Val Pro 435 440 445

Glu Ala Phe Leu Pro Phe Ser Ala Gly Arg Arg Ala Cys Leu Gly Glu 450 455 460

Pro Leu Ala Arg Met Glu Leu Phe Leu Phe Phe Thr Ser Leu Leu Gln 465 470 475

His Phe Ser Phe Ser Val Ala Ala Gly Gln Pro Arg Pro Ser His Ser 485 490 495

Arg Val Val Ser Phe Leu Val Thr Pro Ser Pro Tyr Glu Leu Ala 500 505 510

Val Pro Arg 515

<210> SEQ ID NO 23
<211> LENGTH: 532
<212> TYPE: PRT
<213> ORGANISM: Rattus norvegicus
<220> FEATURE:
<223> OTHER INFORMATION: rat cytochrome P450, CYPC27

<400> SEQUENCE: 23

Ala Val Leu Ser Arg Met Arg Leu Arg Trp Ala Leu Leu Asp Thr Arg 1 5 10 15

Val Met Gly His Gly Leu Cys Pro Gln Gly Ala Arg Ala Lys Ala Ala 20 25 30

Ile Pro Ala Ala Leu Arg Asp His Glu Ser Thr Glu Gly Pro Gly Thr 35 40 45

Gly Gln Asp Arg Pro Arg Leu Arg Ser Leu Ala Glu Leu Pro Gly Pro 50 55 60

Gly Thr Leu Arg Phe Leu Phe Gln Leu Phe Leu Arg Gly Tyr Val Leu 65 70 75

His Leu His Glu Leu Gln Ala Leu Asn Lys Ala Lys Tyr Gly Pro Met 85 90 95

Trp Thr Thr Thr Phe Gly Thr Arg Thr Asn Val Asn Leu Ala Ser Ala 100 105 110

Pro Leu Leu Glu Gln Val Met Arg Gln Glu Gly Lys Tyr Pro Ile Arg 115 120 125

Asp Ser Met Glu Gln Trp Lys Glu His Arg Asp His Gly Leu Ser 130 135 140

Tyr Gly Ile Phe Ile Thr Gln Gly Gln Gln Trp His Leu Arg His 145 150 155 160

Ser Leu Asn Gln Arg Met Leu Pro Ala Glu Ala Ala Leu Tyr Thr 165 175

Asp Ala Leu Asn Glu Val Ile Ser Asp Phe Ile Ala Arg Leu Asp Gln 180 185 190

Val Arg Thr Glu Ser Ala Ser Gly Asp Gln Val Pro Asp Val Ala His 195

Leu Leu His Leu Ala Leu Glu Ala Ile Tyr Ile Leu Phe Glu 210 215 220

Lys Arg Val Gly Cys Leu Glu Pro Ser Ile Pro Glu Asp Thr Ala Thr 225 230 235 240

Phe Ile Arg Ser Val Gly Leu Met Phe Lys Asn Ser Val Val Thr 245 250 255

Phe Leu Pro Lys Trp Ser Arg Pro Leu Leu Pro Phe Trp Lys Arg Tyr 260 265 270

Met Asn Asn Trp Asp Asn Ile Phe Ser Phe Gly Glu Lys Met Ile His 275 280 285

Gln Lys Val Gln Glu Ile Glu Ala Gln Leu Gln Ala Ala Gly Pro Asp 290 295 300

Gly Val Gln Val Ser Gly Leu His Phe Leu Leu Thr Glu Leu 305 310 315

Leu Ser Pro Gln Glu Thr Val Gly Thr Phe Pro Glu Leu Ile Leu Ala 325 330 335

Gly Val Asp Thr Thr Ser Asn Thr Leu Thr Trp Ala Leu Tyr His Leu 340 345 350

Ser Asn Pro Glu Ile Gln Glu Ala Leu His Glu Val Thr Gly 355 360 365

Val Val Pro Phe Gly Val Pro Gln Asn Asp Phe Ala His Met 370 375

Pro Leu Leu Ala Val Ile Glu Thr Leu Arg Leu Pro Val 385 390 395 400

Val Pro Thr Asn Ser Arg Ile Ile Thr Glu Glu Thr Glu Ile Asn 405 415

Gly Phe Leu Phe Pro Asn Thr Gln Phe Val Leu Thr Val 425 430

Val Ser Arg Asp Pro Ser Val Phe Pro Glu Pro Glu Ser Phe Gln Pro 435 440 445

His Arg Trp Leu Arg Arg Glu Asp Asp Asn Ser Gly Ile Gln His 450 455 460

Pro Phe Gly Ser Val Pro Phe Gly Gly Val Arg Ser Leu Gly 465 470 475 480

Arg Arg Ile Ala Glu Leu Glu Met Gln Leu Leu Leu Ser Arg Leu Ile 485 490 495

Gln Glu Val Val Leu Ser Pro Gly Met Gly Glu Val Ser 500 505 510

Val Ser Arg Ile Val Leu Val Pro Ser Val Ser Leu Arg Phe 515 520 525

Leu Gln Arg Gln 530

<210> SEQ ID NO 24
<211> LENGTH: 491
<212> TYPE: PRT
<213> ORGANISM: Oryctolagus cuniculus
<220> FEATURE:
<223> OTHER INFORMATION: rabbit cytochrome P450,

<400> SEQUENCE: 24

Met Glu Phe Ser Leu Leu Leu Leu Leu Ala Phe Leu Ala Gly Leu Leu 1 5 10 15

Leu Leu Leu Phe Arg Gly His Pro Lys Ala His Gly Arg Leu Pro Pro 20 25 30

Gly Pro Ser Pro Leu Pro Val Leu Gly Asn Leu Leu Gln Met Asp Arg 35 40 45

Gly Leu Leu Arg Ser Phe Leu Arg Leu Arg Glu Gly Asp 50 55 60

Val Phe Thr Val Tyr Leu Gly Ser Arg Pro Val Val Val Leu Gly 65 70 75

Thr Asp Ala Ile Arg Glu Ala Leu Val Asp Gln Ala Glu Ala Ser 85 90

Gly Arg Gly Lys Ile Ala Val Val Asp Pro Ile Phe Gln Gly Gly 100 105 110

Val Ile Phe Ala Asn Gly Glu Arg Trp Arg Ala Leu Arg Arg Phe Ser 115 120 125

Leu Ala Thr Met Arg Asp Phe Gly Met Gly Lys Arg Ser Val Glu Glu 130 135 140

Arg Ile Gln Glu Glu Ala Arg Cys Leu Val Glu Glu Leu Arg Ser 145 150 155 160

Gly Ala Leu Leu Asp Asn Thr Leu Leu Phe His Ser Ile Thr Ser 165 170 175

Asn Ile Ile Cys Ser Ile Val Phe Gly Lys Arg Phe Asp Tyr Asp 180 185 190

Pro Val Phe Leu Arg Leu Leu Asp Leu Phe Phe Gln Ser Phe Ser Leu 195 200

Ile Ser Ser Phe Ser Ser Gln Val Phe Glu Leu Phe Pro Gly Phe Leu 210 215 220

Lys His Phe Pro Gly Thr His Arg Gln Ile Tyr Arg Asn Leu Gln Glu 225 230 235 240

Ile Asn Thr Phe Ile Gly Gln Ser Val Glu Lys His Arg Ala Thr Leu 245 250 255

Asp Pro Ser Asn Pro Arg Asp Phe Ile Asp Val Tyr Leu Leu Arg Met 260 265 270

Glu Lys Asp Lys Ser Asp Pro Ser Ser Glu Phe His His Gln Asn Leu 275 280 285

Ile Leu Thr Val Leu Ser Leu Phe Phe Ala Gly Thr Glu Thr Thr Ser 290 295 300

Thr Thr Leu Arg Tyr Gly Phe Leu Leu Met Leu Lys Pro His Val 305 310 315 320

Thr Glu Arg Val Gln Lys Glu Ile Glu Gln Val Ile Gly Ser His Arg 325 330 335

Pro Pro Ala Leu Asp Asp Arg Ala Lys Met Pro Tyr Thr Asp Ala Val 340 345 350

Ile His Glu Ile Gln Arg Leu Gly Asp Leu Ile Pro Phe Gly Val Pro 355 360 365

His Thr Val Thr Lys Asp Thr Gln Phe Arg Gly Tyr Val Ile Pro 370 375

Asn Thr Glu Val Phe Pro Val Leu Ser Ser Ala Leu His Asp Pro Arg 385 390 395 400

Phe Glu Thr Pro Asn Thr Phe Asn Pro Gly His Phe Leu Asp Ala 405 415

Asn Gly Ala Leu Lys Arg Asn Glu Gly Phe Met Pro Phe Ser Leu Gly 425 430

Arg Ile Leu Gly Glu Gly Ile Ala Arg Thr Glu Leu Phe Leu 435 440 445

Phe Phe Thr Thr Ile Leu Gln Asn Phe Ser Ile Ala Ser Pro Val Pro 450 455 460

Pro Glu Asp Ile Asp Leu Thr Pro Arg Glu Ser Gly Val Gly Asn Val 465 470 480

Pro Pro Ser Gln Ile Arg Phe Leu Ala Arg 485 490

<210> SEQ ID NO 25
<211> LENGTH: 1061
<212> TYPE: PRT
<213> ORGANISM: Bacillus subtilis
<220> FEATURE:
<223> OTHER INFORMATION: Bacillus subtilis strain 168 probable bifunctional P-450/NADPH-P450 reductase 1, CypD, locus CYPD_BACSU, CYP102A2

<400> SEQUENCE: 25

Met Lys Glu Thr Ser Pro Ile Pro Gln Pro Lys Thr Phe Gly Pro Leu 1 5 15

Gly Asn Leu Pro Leu Ile Asp Asp Pro Thr Leu Ser Leu Ile 25

Lys Leu Ala Glu Glu Gln Gly Pro Ile Phe Gln Ile His Thr Pro Ala 35 40 45

Gly Thr Thr Ile Val Val Ser Gly His Glu Leu Val Glu Val Cys 50 55 60

Asp Glu Glu Arg Phe Asp Ser Ile Glu Gly Ala Leu Glu Val 65 70

Arg Ala Phe Ser Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Pro 85 90 95

Asn Trp Arg Lys Ala His Asn Ile Leu Met Pro Thr Phe Ser Gln Arg 105 110

Ala Met Lys Asp Tyr His Glu Lys Met Val Asp Ile Ala Val Gln Leu 115 120 125

Ile Gln Trp Ala Arg Leu Asn Pro Asn Glu Ala Val Asp Val Pro 130 135 140

Gly Asp Met Thr Leu Thr Leu Asp Thr Ile Gly Leu Gly Phe 145 150 155 160

Asn Tyr Arg Phe Asn Ser Arg Glu Thr Pro His Pro Phe Ile 165 170 175

Asn Ser Met Val Arg Ala Leu Asp Glu Ala Met His Gln Met Gln Arg 180 185 190

Leu Asp Val Gln Asp Leu Met Val Arg Thr Arg Gln Phe Arg 195 200 205

His Asp Ile Gln Thr Met Phe Ser Leu Val Asp Ser Ile Ile Ala Glu 210 215 220

Arg Arg Ala Asn Gly Asp Gln Asp Glu Lys Asp Leu Leu Ala Arg Met 225 230 235 240

Leu Asn Val Glu Asp Pro Glu Thr Gly Glu Leu Asp Asp Glu Asn 245 250 255

Ile Arg Phe Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr 260 265 270

Ser Gly Leu Leu Ser Phe Ala Thr Phe Leu Leu Lys His Pro Asp 285

Leu Ala Glu Glu Val Asp Arg Val Leu Thr Asp Ala 290 295 300

Ala Pro Thr Lys Gln Val Leu Glu Leu Thr Ile Arg Met Ile 305 310 315

Leu Asn Glu Ser Leu Arg Leu Trp Pro Thr Ala Pro Ala Phe Ser Leu 325 330 335

Pro Glu Asp Thr Val Ile Gly Gly Phe Pro Ile Thr Thr 340 345 350

Asn Asp Arg Ile Ser Val Leu Ile Pro Gln Leu His Arg Asp Arg Asp 355 360 365

Ala Trp Gly Asp Ala Glu Glu Phe Arg Pro Glu Arg Phe Glu His 370 375

Gln Asp Gln Val Pro His His Ala Pro Phe Gly Asn Gly Gln 385 390 395 400

Arg Ala Ile Gly Met Gln Phe Ala Leu His Glu Ala Thr Leu Val 405 415

Leu Gly Met Ile Leu Phe Thr Leu Ile Asp His Glu Asn 425 430

Glu Leu Asp Ile Lys Gln Thr Leu Thr Leu Pro Gly Asp Phe His 435 440 445

Ile Arg Val Gln Ser Arg Asn Gln Asp Ala Ile His Ala Asp Val Gln 450 455 460

Ala Val Glu Ala Ala Ser Asp Glu Gln Lys Glu Thr Glu Ala 465 470

Gly Thr Ser Val Ile Gly Leu Asn Asn Arg Pro Leu Leu Val Leu 485 490 495

Gly Ser Asp Thr Gly Thr Ala Glu Gly Val Ala Arg Glu Leu Ala 500 505

Asp Thr Ala Ser Leu His Gly Val Arg Thr Glu Thr Ala Pro Leu Asn 515 525

Asp Arg Ile Gly Lys Leu Pro Glu Gly Ala Val Val Ile Val Thr 530 535 540

Ser Ser Asn Gly Lys Pro Pro Ser Asn Ala Gly Gln Phe Val Gln 545 550 555 560

Trp Leu Gln Glu Ile Pro Gly Glu Leu Glu Gly Val His Tyr Ala 565 570 575

Val Phe Gly Cys Gly Asp His Asn Trp Ala Ser Thr Gln Val 580 585 590

Pro Arg Phe Ile Asp Glu Gln Leu Ala Glu Gly Ala Thr Arg Phe 595 600 605

Ser Ala Arg Gly Glu Gly Asp Val Ser Gly Asp Phe Glu Gly Gln Leu 610 615 620

Asp Glu Trp Lys Ser Met Trp Ala Asp Ala Ile Ala Phe Gly 625 630 635 640

Leu Glu Leu Asn Glu Asn Ala Asp Glu Arg Ser Thr Leu Ser Leu 645 650 655

Gln Phe Val Arg Gly Leu Gly Glu Ser Pro Leu Ala Arg Ser Glu 660 665 670

Ala Ser His Ala Ser Ile Ala Glu Asn Arg Glu Leu Gln Ser Ala Asp 675 680 685

Ser Asp Arg Ser Thr Arg His Ile Glu Ile Ala Leu Pro Pro Asp Val 690 695 700

Glu Tyr Gln Glu Gly Asp His Leu Gly Val Leu Pro Lys Asn Ser Gln 705 710 715

Thr Asn Val Ser Arg Ile Leu His Arg Phe Gly Leu Lys Gly Thr Asp 725 730 735

Gln Val Thr Leu Ser Ala Ser Gly Arg Ser Ala Gly His Leu Pro Leu 740 745 750

Gly Arg Pro Val Ser Leu His Asp Leu Leu Ser Tyr Ser Val Glu Val 755 760 765

Gln Glu Ala Ala Thr Arg Ala Gln Ile Arg Glu Leu Ala Ala Phe Thr 770 775 780

Val Cys Pro Pro His Arg Arg Glu Leu Glu Glu Leu Ser Ala Glu Gly 785 790 795

Val Tyr Gln Glu Gln Ile Leu Lys Lys Arg Ile Ser Met Leu Asp Leu 805 810 815

Leu Glu Lys Tyr Glu Ala Cys Asp Met Pro Phe Glu Arg Phe Leu Glu 820 825 830

Leu Leu Arg Pro Leu Lys Pro Arg Tyr Tyr Ser Ile Ser Ser Ser Pro 835 840 845

Arg Val Asn Pro Arg Gln Ala Ser Ile Thr Val Gly Val Val Arg Gly 850 855 860

Pro Ala Trp Ser Gly Arg Gly Glu Tyr Arg Gly Val Ala Ser Asn Asp 865 870 875

Leu Ala Glu Arg Gln Ala Gly Asp Asp Val Val Met Phe Ile Arg Thr 885 890 895

Pro Glu Ser Arg Phe Gln Leu Pro Lys Asp Pro Glu Thr Pro Ile Ile 900 905 910

Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg Gly Phe Leu Gln 915 920 925

Ala Arg Asp Val Leu Lys Arg Glu Gly Lys Thr Leu Gly Glu Ala His 930 935 940

Leu Tyr Phe Gly Cys Arg Asn Asp Arg Asp Phe Ile Tyr Arg Asp Glu 945 950 955 960

Leu Glu Arg Phe Glu Lys Asp Gly Ile Val Thr Val His Thr Ala Phe 965 970 975

Ser Arg Lys Glu Gly Met Pro Lys Thr Tyr Val Gln His Leu Met Ala 980 985 990

Asp Gln Ala Asp Thr Leu Ile Ser Ile Leu Asp Arg Gly Gly Arg Leu 995 1000 1005

Tyr Val Cys Gly Asp Gly Ser Lys Met Ala Pro Asp Val Glu Ala Ala 1010 1015 1020

Leu Gln Lys Ala Tyr Gln Ala Val His Gly Thr Gly Glu Gln Glu Ala 1025 1030 1035 1040

Gln Asn Trp Leu Arg His Leu Gln Asp Thr Gly Met Tyr Ala Lys Asp 1045 1050 1055

Val Trp Ala Gly Ile 1060

<210> SEQ ID NO 26
<211> LENGTH: 1054
<212> TYPE: PRT
<213> ORGANISM: Bacillus subtilis
<220> FEATURE:
<223> OTHER INFORMATION: Bacillus subtilis strain 168 probable bifunctional P-450/NADPH-P450 reductase 2, CypE,

<400> SEQUENCE: 26

Met Lys Gln Ala Ser Ala Ile Pro Gln Pro Lys Thr Gly Pro Leu 1 5 15

Lys Asn Leu Pro His Leu Glu Glu Gln Leu Ser Gln Ser Leu Trp 25

Arg Ile Ala Asp Glu Leu Gly Pro Ile Phe Arg Phe Asp Phe Pro Gly 35 40 45

Val Ser Ser Val Phe Val Ser Gly His Asn Leu Val Ala Glu Val Cys 50 55 60

Asp Glu Ser Arg Phe Asp Asn Leu Gly Lys Gly Leu Gln Val 65 70

Arg Glu Phe Gly Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Pro 85 90 95

Asn Trp Gln Lys Ala His Arg Ile Leu Leu Pro Ser Phe Ser Gln 105 110

Ala Met Lys Gly Tyr His Ser Met Met Leu Asp Ile Ala Thr Gln Leu 115 120 125

Ile Trp Ser Arg Leu Asn Pro Asn Glu Glu Ile Asp Val Ala 130 135 140

Asp Asp Met Thr Leu Thr Leu Asp Thr Ile Gly Leu Gly Phe 145 150 155 160

Asn Arg Phe Asn Ser Phe Arg Asp Ser Gln His Pro Phe Ile 165 170 175

Thr Ser Met Leu Arg Ala Leu Glu Ala Met Asn Gln Ser 180 185 190

Leu Gly Leu Gln Asp Met Met Val Lys Thr Leu Gln Phe Gln 195

Asp Ile Glu Val Met Asn Ser Leu Val Asp Arg Met Ile Ala Glu 210 215

Arg Ala Asn Pro Asp Asp Asn Ile Asp Leu Leu Ser Leu Met 225 230 235 240

Leu Ala Asp Pro Val Thr Gly Glu Thr Leu Asp Asp Glu Asn 245 250 255

Ile Arg Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr 260 265 270

Ser Gly Leu Leu Ser Phe Ala Ile Leu Leu Thr His Pro Glu 275 280 285

Leu Ala Gln Glu Glu Ala Asp Arg Val Leu Thr Asp Asp 290 295 300

Thr Pro Glu Tyr Lys Gln Ile Gln Gln Leu Lys Thr Arg Met Val 305 310 315

Leu Asn Glu Thr Leu Arg Leu Pro Thr Ala Pro Ala Phe Ser Leu 325 330 335

Ala Glu Asp Thr Val Leu Gly Gly Glu Pro Ile Ser 340 345 350

Gly Gln Pro Val Thr Val Leu Ile Pro Leu His Arg Asp Gln Asn