
(12) United States Patent
Arnold et al.
(10) Patent No.: US 8,715,988 B2
(45) Date of Patent: May 6, 2014

(54) ALKANE OXIDATION BY MODIFIED HYDROXYLASES

(75) Inventors: Frances Arnold, Pasadena, CA (US); Peter Meinhold, Pasadena, CA (US); Matthew W. Peters, Pasadena, CA (US); Rudi Fasan, Brea, CA (US); Mike M. Y. Chen, Pasadena, CA (US)

(73) Assignee: California Institute of Technology, Pasadena, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 999 days.

(21) Appl. No.: 11/697,404

(22) Filed: Apr. 6, 2007

(65) Prior Publication Data
US 2008/0057577 A1 Mar. 6, 2008

Related U.S. Application Data

(63) Continuation-in-part of application No. PCT/US2006/011273, filed on Mar. 28, 2006.

(60) Provisional application No. 60/665,903, filed on Mar. 28, 2005, provisional application No. 60/698,872, filed on Jul. 13, 2005, provisional application No. 60/700,781, filed on Jul. 20, 2005, provisional application No. 60/900,243, filed on Feb. 8, 2007.

(51) Int. Cl.
C12N 9/02 (2006.01)

(52) U.S. Cl.
USPC ...... 435/189

(58) Field of Classification Search
None
See application file for complete search history.

Primary Examiner: Tekchand Saidha
Assistant Examiner: Younus Meah

(74) Attorney, Agent, or Firm: Gavrilovich Dodd & Lindsey LLP; Joseph R. Baker, Jr.

(57) ABSTRACT

This invention relates to modified hydroxylases. The invention further relates to cells expressing such modified hydroxylases and methods of producing hydroxylated alkanes by contacting a suitable substrate with such cells.

7 Claims, 40 Drawing Sheets


(56) References Cited Paulsen, M. et al., “Dramatic Differences in the Motions of the Mouth of Open and Closed Cytochrome P450BM-3 by Molecular Dynamics OTHER PUBLICATIONS Simulations.” Proteins. Structure, Function and Genetics, 1995, pp. 237-243, Wiley-Liss, Inc. Munro, A. et al., “Alkane Metabnolism by Cytochrome P450 BM3.” Peterson, J. et al., “Chapter 5 Bacterial P450s,” Cytochrome P450. Biochemical Society Transactions, 1993, p. 412S, 21. Structure, Mechanism, and Biochemistry, Second Ed., 1995, pp. 151 Murrell, J. et al., “Molecular biology and regulation of methane 180, Plenum Press, New York. monooxygenase.” Arch. Microbiol., 2000, pp. 325-332, 173o. Rathore, D., et al., “Expression of Ribonucleolytic Toxin Nagayama, Y. et al., “Role of Asparagine-linked Oligosaccharides in Restrictocin in Escherichia coli: Purification and Characterization.” Protein Folding, Membrane Targeting, and Thyrotropin and FEBS Letters, 1996, pp. 259-262, vol. 392, Federation of European Autoantibody Binding of the Human Thyrotropin Receptor.” Journal Biochemical Societies. of Biological Chemistry, Dec. 1998, pp. 33423-33428, vol. 273, No. Reynolds, M., et al., “Structure and Mechanism of Galactose 5. The American Society for Biochemistry and Molecular Biology, Oxidase: Catalytic Role of Tyrosine 495.”JBIC, 1997, pp. 327-335, Inc. vol. 2. Nakagawa, S. et al., “Construction of Catalase Deficient Escherichia Rodriguez-Lopez, J., et al., “Role of Arginine 38 in Horseradish coli Strains for the Production of Uricase.” Biosci. Biotech. Biochem. Peroxidase—A Critical Residue for Substrate Binding and Cataly 1996, pp. 415-420, 60 (3), Japanese Society for Bioscience, sis.” The Journal of Biological Chemistry, Feb. 23, 1996, pp. 4023 Biotechnology and Agrochemistry. 4030, vol. 271, No. 8. The American Society for Biochemistry and Nakajima, H. et al., “ Industrial Application of Adenosine Molecular Biology. 
5'-Triphosphate Regeneration to Synthesis of Sugar Phosphates.” Romanos, M., et al., “Foreign Gene Expression in Yeast: a Review.” ACS Symposium Series 466, Enzymes in Carbohydrate Synthesis, Yeast, Jun. 1992, pp. 423-488, vol. 8, No. 6, John Wiley & Sons Ltd. Chapter 9, pp. 110-120, American Chemical Society, Washington Root, R., et al., “Enzymatic Synthesis of Unusual Sugars: Galactose DC, 1991, Bednarski & Simon, Editors. Oxidase Catalyzed Stereospecific Oxidation of Polyols,” Journal of Narhi, L. et al., “Identification and Characterization of Two Func the American Chemical Society, 1985, pp. 2997-2999, vol. 107, No. tional Domains in Cytochrome P-450, a Catalytically Self-suf 10, American Chemical Society. ficient Monooxygenase Induced by Barbiturates in Bacillus Ruettinger, R., et al., "Coding Nucleotide, 5' Regulatory, and megaterium," The Journal of Biological Chemistry, May 1987, pp. Deduced Amino Acid Sequences of P-450, a Single Peptide 6683-6690, vol. 262, No. 14. The American Society of Biological Chemists, Inc. Cytochrome P-450:NADPH-P-450 Reductase from Bacillus Narhi, L. et al., "Characterization of a Catalytically Self-sufficient megaterium," The Journal of Biological Chemistry, Jul. 5, 1989, pp. 199,000-Dalton Cytochrome P-450 Monooxygenase Induced by 10987-10995, vol. 264, No. 19, The American Society for Biochem Barbiturates in Bacillus megaterium," The Journal of Biological istry and Molecular Biology, Inc. Chemistry, Jun. 1986, pp. 7160-7169, vol.261, No. 16. The American Ruettinger, R., et al., “Epoxidation of Unsaturated Fatty Acids by a Society of Biological Chemists, Inc. Soluble Cytochrome P-450-dependent System from Bacillus Nelson, D., "Appendix A Cytochrome P450 Nomenclature and megaterium," The Journal of Biological Chemistry, Jun. 10, 1981, Alignment of Selected Sequences.” Cytochrome P450. Structure, pp. 5728-5734, vol. 256, No. 11. Mechanism, and Biochemistry, Second Ed., 1995, pp. 
575-606, Ple Said, I.T., et al., “Comparison of Different Techniques for Detection num Press, NY. of Gal-GalNAc, an Early Marker of Colonic Neoplasia.” Histology Ness, J. et al., “DNA shuffling of subgenomic sequences of and Histopathology, Apr. 1999, pp. 351-357, vol. 14, No. 2, Jiménez subtilisin.” Nature Biotechnology, Sep. 1999, pp. 893-896, vol. 17. Godoy, S.A. No. 9, Nature Publishing Group. Savenkova, M., et al. “Improvement of Peroxygenase Activity by Noble, M. et al., “Roles of key active-site residues in Relocation of a Catalytic Histidine within the Active Site of Horse flavocytochrome P450 BM3.” Biochem. J., 1999, pp. 371-379,339, radish Peroxidase.” Biochemistry, 1998, pp. 10828-10836, vol. 37, Biochemical Society. American Chemical Society. Oliphant, A. et al., “Cloning of random-sequence Saysell, C., et al., “Properties of the Trp290His Variant of Fusarium oligodeoxynucleotides.” Gene, 1986, pp. 177-183, 44. Elsevier Sci NRRL 2903 Galactose Oxidase: Interactions of the GOase State ence Publishers B.V. with Different Buffers. Its Redox Activity and Ability to Bind Azide.” Oliver, C. et al., “Engineering the substrate specificity of Bacillus JBIC, 1997, pp. 702-709, vol. 2. megaterium cytochrome P-450 BM3: hydroxylation of alkyl Schatz, P, et al., “Genetic Analysis of Protein Export in Escherichia trimethylammonium compounds.” Biochem. J., 1997, pp. 537-544, coli,'Annual Review of Genetics, 1990, pp. 215-248, vol. 24. Annual 327. The Biochemical Society, London, England. Reviews, Inc., Palo Alto, CA. Omura, T. et al., “The Carbon Monoxide-binding Pigment of Liver Schein C. “Solubility as a Function of Protein Structure and Solvent Microsomes.” The Journal of Biological Chemistry, Jul. 1964, pp. Components.” Bio/Technology, Apr. 1990, pp. 308-317, vol. 8, No. 4. 2370-2378, vol. 239, No. 7. The American Society for Biochemistry Scheller, U., et al., “Characterization of the n-Alkane and Fatty Acid and Molecular Biology. 
Hydroxylating Cytochrome P450 Forms 52A3 and 52A4.” Archives Ortlepp, S. et al., “Expression and characterization of a protein speci of Biochemistry and Biophysics, Apr. 15, 1996, pp. 245-254, vol.328, fied by a synthetic horseradish peroxidase gene in Escherichia coli.” No. 2, Academic Press, Inc. Journal of Biotechnology, 1989, pp. 353-364. 11, Elsevier Science Schlegel, R., et al., “Substrate Specificity of Galactose Oxidase.” Publishers B.V. Carbohydrate Research, Jun. 1968, pp. 193-199, vol. 7, No. 2, Ost, T. et al., “Rational re-design of the substrate binding site of Elsevier Publishing Company, Amsterdam. flavocytochrome P450 BM3.” FEBS Letters, 2000, pp. 173-177,486, Schmid, A., et al., “Industrial Biocatalysis Today and Tomorrow.” Elsevier Science B.V. Nature, Jan. 11, 2001, pp. 258-268, vol. 409, Macmillian Magazines Ostermeier, M. et al., “Incremental Truncation as a Strategy in the Ltd. Engineering of Novel Biocatalysts.” Bioorganic & Medicinal Chem Schneider, S., et al., “Controlled Regioelectivity of Fatty Acid Oxi istry, 1999, pp. 2139-2144, 7, Elsevier Science Ltd. dation by Whole Cells Producing Cytochrome P450 Parekh, R. et al., “Multicopy Overexpression of Bovine Pancreatic Monooxygenase Under Varied Dissolved Oxygen Concentrations.” Trypsin Inhibitor Saturates the Protein Folding and Secretory Capac Biotechnology and Bioengineering, Aug. 5, 1999, pp. 333-341, vol. ity of Saccharomyces cerevisiae,” Protein Expression and Purifica 64, No. 3, John Wiley & Sons, Inc. tion, 1995, pp. 537-545, 6, Academic Press. Schwaneberg, U., et at... “A Continuous Spectrophotometric Assay Patten, P. et al., “Applications of DNA shuffling to pharmaceuticals for P450 BM-3, a Fatty Acid Hydroxylating Enzyme, and Its Mutant and vaccines.” Biotechnology, 1997, pp. 724-733, vol. 8, Elsevier F87A.” Analytical Biochemistry, 1999, pp. 359-366, vol. 269, Aca Science Ltd. demic Press. US 8,715,988 B2 Page 7

(56) References Cited Tams, J., et al., “Glycosylation and Thermodynamic Versus Kinetic Stability of Horseradish Peroxidase.” FEBS Letters, 1998, pp. 234 OTHER PUBLICATIONS 236, vol. 421, Federation of European Biochemical Societies. Thatcher, D., et al., “Protein Folding in Biotechnology.” Mechanisms Schwaneberg, U., et al., “Cost-Effective Whole-Cell Assay for Labo of Protein Folding, 1994, pp. 229-261, IRL Press, Oxford. ratory Evolution of Hydroxylases in Escherichia coli,” Journal of Tkac, J., et al., “Rapid and Sensitive Galactose Oxidase-Peroxidase Biomolecular Screening, 2001, pp. 111-117, vol. 6, No. 2. The Soci Biosensor for Galactose Detection with Prolonged Stability.” ety for Biomolecular Screening. Biotechnology Techniques, 1999, pp. 931-936, Kluwer Academic Schwaneberg, U., et al., “P450 Monooxygenase in Biotechnology— Publishers. Single-Step, Large-Scale Purification Method for Cytochrome P450 Tonge, G. et al., “Purification and Properties of the Methane Mono BM-3 by Anion-Exchange Chromatography.” Journal of Chroma oxygenasenzyme System from Methylosinus trichosporium OB3b.” tography, 1999, pp. 149-159, vol. 848, Elsevier Science B.V. Biochem. J., 1977, pp. 333-344, vol. 161. Shafikhani, S., et al., “Generation of Large Libraries of Random Tressel, P. et al., “A Simplified Purification Procedure for Galactose Mutants in Bacillus subtilis by PCR-Based Plasmid Multimeriza Oxidase.” Analytical Biochemistry, Jun. 1980, pp. 150-153, vol. 105. tion.” BioTechniques, Aug. 1997, pp. 304-310, vol. 23, No. 2. No. 1, Academic Press, Inc. Shanklin, J., et al., “Mossbauer Studies of Alkane (D-Hydroxylase: Tressel, P. et al., “Galactose Oxidase from Dactylium dendroides, s Evidence for a Diiron Cluster in an Integral-Membrane Enzyme.” Methods in Enzymology, 1982, pp. 163-171, vol.89, Academic Press. Proc. Natl. Acad. Sci. USA, Apr. 1997, pp. 2981-2986, vol. 94. Truan, G., et al., “Thr268 in Substrate Binding and Catalysis in Shao, Z. 
et al., “Random-priming in Vitro Recombination: An Effec P450BM-3.” Archives of iochemistry and Biophysics, Jan. 1, 1998, tive Tool for Directed Evolution.” Nucleic Acids Research, Jan. 15, pp. 53-64, vol. 349, No. 1, Academic Press. 1998, pp. 681-683, vol. 26, No. 2, Oxford University Press. Vega, F., et al., “On-line Monitoring of Galactoside Conjugates and Shilov, A., et al., “Activation of C-H Bonds by Metal Complexes.” Glycerol by Flow Injection Analysis.” Analytica Chimica Acta, 1998, Chem. Rev., 1997, pp. 2879-2932, vol.97, American Chemical Soci pp. 57-62, vol. 373, Elsevier Science B.V. ety. Vrbová, E., et al., “Preparation and Utilization of a Biosensor Based Shindler, J., et al., “ Peroxidase from Human Cervical Mucus—The on Galactose Oxidase.” Collect. Czech. Chem. Commun., 1992, pp. Isolation and Characterisation.” European Journal of Biochemistry, 2287-2294, vol. 57. Jun. 1976, pp. 325-331, vol. 65, No. 2. Wachter, R., et al., “Molecular Modeling Studies on Oxidation of Sirotkin, K., “Advantages to Mutagenesis Techniques Generating Hexopyranoses by Galactose Oxidase. An Active Site Topology Populations Containing the Complete Spectrum of single Codon Apparently Designed to Catalyze Radical Reactions, Either Con Changes.” J. Theor: Biol., 1986, pp. 261-279, vol. 123, Academic certed or Stepwise.” Journal of the American Chemical Society, Mar. Press Inc. (London) Ltd. 9, 1996, pp. 2782-2789, vol. 118, No. 9. Smith, A., et al., “Expression of a Synthetic Gene for Horseradish Watkinson, R., et al., “Physiology of Aliphatic Hydrocarbon-Degrad Peroxidase C in Escherichia coli and Folding and Activation of the ing Microorganisms,” Biodegradation, 1990, pp. 79-92, vol. 1, Nos. Recombinant Enzyme with Ca" and Heme.” The Journal of Biologi 2/3, Kluwer Academic Publishers. cal Chemistry, Aug. 5, 1990, pp. 13335-13343, vol. 265, No. 22. The Welinder, K., “Amino Acid Sequence Studies of Horseradish American Society for Biochemistry and Molecular Biology. 
Peroxidase.” European Journal of Biochemistry, 1979, pp. 483-502. Smith, A., et al., “Substrate Binding and Catalysis in Heme Welinder, K., “Supplement to Amino Acid Sequence Studies of Peroxidases.” Current Opinion in Chemical Biology, (1998), pp. Horseradish Peroxidase.” pp. 495-502. 269-278, vol. 2. Wetzel, R., et al., “Mutations in Human Interferon Gamma Affecting Spiro, T., et al., “Is the CO Adduct of Myoglobin Bent, and Does It Inclusion Body Formation Identified by a General Immunochemical Matter?.” Accounts of Chemical Research, 2001, pp. 137-144, vol. Screen.” Bio/Technology, Aug. 1991, pp. 731-737, vol. 9. 34, No. 2, American Chemical Society. Whittaker, M., et al., “The Active Site of Galactose Oxidase.” The Staijen, I., et al., “Expression, Stability and Performance of the Journal of Biological Chemistry, 1988, pp. 6074-6080, vol. 263, No. Three-Component Alkane Mono-oxygenase of Pseudomonas 13, The American Society for Biochemistry and Molecular Biology, oleovorans in Escherichia coli,' Eur: J. Biochem., 2000, pp. 1957 Inc. 1965, vol. 267. Whittaker, M., et al., “Kinetic Isotope Effects as probes of the Mecha Stemmer, W., “DNA Shuffling by Random Fragmentation and Reas nism of Galactose Oxidase.” Biochemistry, 1998, pp. 8426-8436, vol. sembly. In Vitro Recombination for Molecular Evolution.” Proc. 37. American Chemical Society. Natl. Acad. Sci. USA, Oct. 25, 1994, pp. 10747-10751, vol. 91, No. Wilkinson, D., et al., “Structural and Kinetic Studies of a Series of 22. Mutants of Galactose Oxidase Identified by Directed Evolution.” Stemmer, W., “Rapid Evolution of a Protein In Vitro by DNA Shuf Protein Engineering, Design & Selection, Jan. 12, 2004, pp. 141-148, fling.” Nature, Aug. 4, 1994, pp. 389-391, vol. 370, No. 6488. vol. 17. No. 2, Oxford University Press. 
Stemmer, W., et al., “Selection of an Active Single Chain Fv Antibody Yang, G., et al., “Gal-GalNAc: A biomarker of Colon from a Protein Linker Library Prepared by Enzymatic Inverse PCR.” Carcinogenesis.” Histology and Histopathology, 1996, pp. 801-806, BioTechniques, 1993, pp. 256-265, vol. 14, No. 2. vol. 11. Stevenson, J., et al., “The Catalytic Oxidation of Linear and Branched Yano, T., et al., “Directed Evolution of an Aspartate Alkanes by Cytochrome P450.” J. Am. Chem. Soc., 1996, pp. Aminotransferase with New Substrate Specificities.” Proc. Natl. 12846-12847. vol. 118, No. 50, American Chemical Society. Acad. Sci. USA, May 1998, pp. 551 1-5515, vol. 95. Studier, F., et al., “Use of T7 RNA Polymerase to Direct Expression Yeom, H., et al., "Oxygen Activation by Cytochrome P450: of Cloned Genes.” Methods in Enzymology, 1990, pp. 60-89, vol. Effects of Mutating an Active Site Acidic Residue.” Archieves of 185, Academic Press, Inc. Biochemistry and Biophysics, Jan. 15, 1997, pp. 209-216, vol. 337. Sun, L., et al., “Expression and Stabilization of Galactose Oxidase in No. 2, Academic Press. Escherichia coli by Directed Evolution.” Protein Engineering, Sep. You, L., et al., “Directed Evolution of Subtilisin E in Bacillus subtilis 2001, pp. 699-704, vol. 14, No. 9, Oxford University Press. to Enhance Total Activity in Aqueous Dimethylformamide.” Protein Sun, L., et al., “Modification of Galactose Oxidase to Introduce Engineering, 1996, pp. 77-83, vol. 9, Oxford University Press. Glucose 6-Oxidase Activity.” ChembioChem. A European Journal of Zhang, J., et al., “Directed Evolution of a Fucosidase from a Chemical Biology, Aug. 2, 2002, pp. 781-783, vol. 3, No. 8, Wiley Galactosidase by DNA Shuffling and Screening.” Proc. Natl. Acad. VCH-Vertag GmbH. Weinheim, Germany. Sci. USA, Apr. 1997, pp. 4504-4509, vol. 94. 
Szabó, E., et al., “Application of Biosensor for Monitoring Galactose Zhang, T., et al., “Circular Permutation of T4 Lysozyme.” Biochem Content.” Biosensors & Bioelectronics, 1996, pp. 1051-1058, vol. 11, istry, 1993, pp. 12311-12318, vol. 32, No. 46, American Chemical No. 10, Elsevier Science Limited. Society. US 8,715,988 B2 Page 8

(56) References Cited Zhao, H., et al., “Optimization of DNA Shuffling for High Fidelity Recombination.” Nucleic Acids Research, 1997, pp. 1307-1308, vol. OTHER PUBLICATIONS 25, No. 6, Oxford University Press. Zimmer, T., et al., “The CYP52 Multigene Family of Candida Zhao, H., et al., “Directed Evolution Converts Subtilisin E into a Functional Equivalent of Thermitase.” Protein Engineering, 1999, maltosa Encodes Functionally Diverse n-Alkane-Inducible pp. 47-53, vol. 12, No. 1, Oxford University Press. Cytochromes P450.” Biochemical and Biophysical Research Com Zhao, H., et al., “Methods for Optimizing Industrial Enzymes by munications, 1996, pp. 784-789, vol. 224, No. 3, Academic Press, Directed Evolution.” Manual of Industrial Microbiology and Inc. Biotechnology (2d Ed.), 1999, pp. 597-604, ASM Press, Washington, D.C. XP-002298548, “Protein Sequence.” Database accession No. Zhao, H., et al., “Molecular Evolution by Staggered Extension Pro 355884-87-6. cess (StEP) in Vitro Recombination.” Nature Biotechnology, Mar. 1998, pp. 258-261, vol. 16. * cited by examiner

U.S. Patent May 6, 2014 Sheet 2 of 40 US 8,715,988 B2

Figure 2: Current chromatograms (ECD) for ethane reactions with 500 nM, 200 nM, and 100 nM protein, and for a control reaction without ethane.


Figure 4 (horizontal axis: ppm)

Figure 5 (horizontal axis: wavelength (nm), 350 to 500)

Figure 7: Improved Methane Hydroxylation Activity in a 35-E11 Mutant (trace labeled 35-E11-E464R; horizontal axis: Wavelength (nm), 500 to 600)


Figure 31C: Alignment of heme domain sequences, SEQ ID NOs: 1-7 (alignment positions 401-472)

Alignment positions 401-450:
SEQ1 HEME (384) AQHAEKBEGNGQRNSGQQEALHEATLVLGMRKHEDFEDHTNSELDI
SEQ2 HEME (386) QWEHHAYKBEGNGQRNGOGMQEALHEATLVLGMEKSETLIDHENSELDI
SEQ3 HEME (386) SHHaYKRFGNGORNGcGMQEALQESTVLGEKHEELINHTGSELKI
SEQ4 HEME (389) KVEHHaYKBFGNGQRACGMOFALHEATLVMGMEQHFEFIDEDSQLDS
SEQ5 HEME (393) KDEAHAYMEGGGERNGEGROFALTENKLALAIMERNEAFQDPHDSQFR.
SEQ6 HEME (396) KRINAWKBFGNGQRAGGRGFAHENALALGMEQREKLIDHQRSQMHP
SEQ7 HEME (384) As QHAKEEGNGORASGQQEALHEATLVLGMMKHEDFEDHTNSELDIAs
Consensus (401) IP HAYKPFGNGORACIG OFALHEATLWLGMMLKHFDFIDH YELDI

Alignment positions 451-472:
SEQ1 HEME (434) KETELKEEGFVVKAKSKKIPL
SEQ2 HEME (436) KQTELKGDFHSVQSRHQ--
SEQ3 HEME (436) KEAEKEEDFKTVKPRKT--
SEQ4 HEME (439) KQTELKEGDFKRIVP- - - - -
SEQ5 HEME (443) KETERKPDQFV------
SEQ6 HEME (446) KETDMKBEG------
SEQ7 HEME (434) KETELKEEGFVNKAKSKKIPL
Consensus (451) KETLTLKPE F I K

Figure 32: Protein Similarity Scores

Panel A: Full Length Protein (rows and columns: SEQ ID NO: 1-7; each row starts at the diagonal)
1: 5457 3303 3272 3375 1860 2400 5352
2: 5536 3395 4208 1812 25.4 325
3: 5480 3347 1931 2386 3226
4: 5548 1848 2437 330
5: 5523 1964 1831
6: 5607 2372
7: 5453

Panel B: Heme Domain (amino acids 1-455) (rows and columns: SEQ ID NO: 1-7; each row starts at the diagonal)
1: 2386 1610 985 143 2293
2: 2397 1675 lig77 969 1233 551
3: 2386 1.65 997 1158 568
4: 2386 1007 214 SS3
5: 2361 979 959
6: 2386 118
7: 2390

Figure 33: Protein % Identity

Panel A: Full Length Protein
SEQ ID NO:    1    2    3    4    5    6    7
1           100   59   58   60   40   47   98
2                100   60   75   39   48   58
3                     100   59   4.   46   57
4                          100   38   46   59
5                               100   40   39
6                                    100   46
7                                         100

Panel B: Heme Domain (amino acids 1-455)
SEQ ID NO:    1    2    3    4    5    6    7
1           100   66   67   67   45   50   97
2                100   69   82   45   53   65
3                     100   58   46   50   66
4                          100   47   52   64
5                               100   45   44
6                                    100   49
7                                         100
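The percent identity values tabulated in Figure 33 are pairwise comparisons of aligned sequences. The following is a minimal sketch of how such a value can be computed from two already-aligned sequences. It is illustrative only: the function name `percent_identity`, the toy sequences, and the convention of dividing by the number of gap-free alignment columns are assumptions, and the document does not state which alignment program or identity convention produced the figure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention (not taken from the patent): columns in which either
    sequence has a gap are ignored, and identity is the number of matching
    columns divided by the number of remaining columns, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # gap-free alignment columns
    matches = 0   # gap-free columns with identical residues
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment (not taken from SEQ ID NOs: 1-7).
toy_a = "ACDEFGHIK-"
toy_b = "ACDQFGHLKM"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give different values for the same alignment, so figures from different tools are not directly comparable.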


Figure 40 (panel titles: pMMO, PMO, BMO, cyp153(A6); horizontal axis of each panel: alkane chain length, 0 to 12)


Figure 42

ALKANE OXIDATION BY MODIFIED HYDROXYLASES

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. 120 as a continuation-in-part of PCT/US2006/011273, filed Mar. 28, 2006, which application claims priority to U.S. Provisional Application Ser. No. 60/665,903 filed Mar. 28, 2005, and to U.S. Provisional Application Ser. No. 60/698,872 filed Jul. 13, 2005, and also to U.S. Provisional Application Ser. No. 60/700,781 filed Jul. 20, 2005. The application also claims priority to U.S. Provisional Application Ser. No. 60/900,243, filed Feb. 8, 2007, entitled, "Engineered Cytochromes P450 with Tailored Propane Specificity and Haloalkanes Dechalogenase Activity." All of the above disclosures are incorporated herein by reference.

STATEMENTS REGARDING FEDERALLY SPONSORED RESEARCH

The invention was funded in part by Grant No. BES-0313567 awarded by National Science Foundation (NSF) and Grant No. DAAD19-03-0004 awarded by the U.S. Army. The government may have certain rights in the invention.

TECHNICAL FIELD

This invention relates to modified hydroxylases. The invention further relates to cells expressing such modified hydroxylases and methods of producing hydroxylated alkanes by contacting a suitable substrate with such modified hydroxylases and/or such cells. Also included are modified hydroxylases comprising unique regioselectivities.

BACKGROUND

Hydroxylation of linear alkanes has the important practical implication of providing valuable intermediates for chemical synthesis. Nevertheless, selective oxyfunctionalization of hydrocarbons remains one of the great challenges for contemporary chemistry. Many chemical methods for hydroxylation require severe conditions of temperature or pressure, and the reactions are prone to over-oxidation, producing a range of products, many of which are not desired.

Enzymes are an attractive alternative to chemical catalysts. In particular, monooxygenases have the ability to catalyze the specific hydroxylation of non-activated C-H bonds. These cofactor-dependent oxidative enzymes have multiple domains and function via complex electron transfer mechanisms to transport a reduction equivalent to the catalytic center. Exemplary monooxygenases include the cytochrome P450 monooxygenases ("P450s"). The P450s are a group of widely-distributed heme-containing enzymes that insert one oxygen atom from diatomic oxygen into a diverse range of hydrophobic substrates, often with high regio- and stereoselectivity. Their ability to catalyze these reactions with high specificity and selectivity makes P450s attractive catalysts for chemical synthesis and other applications, including oxidation chemistry.

Despite the ability of these enzymes to selectively hydroxylate a wide range of compounds, including fatty acids, aromatic compounds, alkanes, alkenes, and natural products, only a few members of this large superfamily of proteins are capable of hydroxylating alkanes. Accordingly, there is a need for modified hydroxylases that have the ability to efficiently hydroxylate alkanes in vivo. In addition, there is a need for cells that can express such modified hydroxylases while producing recoverable quantities of alkane-derived alcohols.

SUMMARY

Provided herein are polypeptides that convert alkanes to alcohols. Also provided are nucleic acid molecules that encode such polypeptides, cells expressing such polypeptides, and methods of synthesizing alcohols from a suitable alkane substrate. Accordingly, in various embodiments, isolated or recombinant polypeptides comprising residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:1, 2, 3, 4, 5 or 6, are provided. The polypeptides include up to 50, 25, 10, or 5 conservative amino acid substitutions excluding residues: (a) 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, and 353 of SEQ ID NO:1; (b) 48, 79, 83, 95, 143, 176, 185, 206, 227, 238, 254, 257, 292, 330, and 355 of SEQ ID NO:2 or SEQ ID NO:3; (c) 49, 80, 84, 96, 144, 177, 186, 207, 228, 239, 255, 258, 293, 331, and 357 of SEQ ID NO:4; (d) 54, 85, 89, 101, 149, 182, 191, 212, 232, 243, 259, 262, 297, 336, and 361 of SEQ ID NO:5; and (e) 51, 82, 86, 98, 146, 179, 188, 208, 231, 242, 258, 262, 296, 337, and 363 of SEQ ID NO:6. The amino acid sequence includes the following residues:

(1) a Z1 amino acid residue at positions: (a) 47, 82, 142, 205, 236, 252, and 255 of SEQ ID NO:1; (b) 48, 83, 143, 206, 238, 254, and 257 of SEQ ID NO:2 or SEQ ID NO:3; (c) 49, 84, 144, 207, 239, 255, and 258 of SEQ ID NO:4; (d) 54, 89, 149, 212, 243, 259, and 262 of SEQ ID NO:5; and (e) 51, 86, 146, 208, 242, 258, and 262 of SEQ ID NO:6;

(2) a Z2 amino acid residue at positions: (a) 94, 175, 184, 290, and 353 of SEQ ID NO:1; (b) 95, 176, 185, 292, and 355 of SEQ ID NO:2 or SEQ ID NO:3; (c) 96, 177, 186, 293, and 357 of SEQ ID NO:4; (d) 101, 182, 191, 297, and 361 of SEQ ID NO:5; and (e) 98, 179, 188, 296, and 363 of SEQ ID NO:6;

(3) a Z3 amino acid residue at position: (a) 226 of SEQ ID NO:1; (b) 227 of SEQ ID NO:2 or SEQ ID NO:3; (c) 228 of SEQ ID NO:4; (d) 232 of SEQ ID NO:5; and (e) 231 of SEQ ID NO:6; and

(4) a Z4 amino acid residue at positions: (a) 78 and 328 of SEQ ID NO:1; (b) 79 and 330 of SEQ ID NO:2 or SEQ ID NO:3; (c) 80 and 331 of SEQ ID NO:4; (d) 85 and 336 of SEQ ID NO:5; and (e) 82 and 337 of SEQ ID NO:6.

In general, a Z1 amino acid residue includes glycine (G), asparagine (N), glutamine (Q), serine (S), threonine (T), tyrosine (Y), or cysteine (C). A Z2 amino acid residue includes alanine (A), valine (V), leucine (L), isoleucine (I), proline (P), or methionine (M). A Z3 amino acid residue includes lysine (K), or arginine (R). A Z4 amino acid residue includes tyrosine (Y), phenylalanine (F), tryptophan (W), or histidine (H).

In other embodiments, the polypeptide further includes a Z3 amino acid residue at position: (a) 285 of SEQ ID NO:1; (b) 287 of SEQ ID NO:2 or 3; (c) 288 of SEQ ID NO:4; (d) 292 of SEQ ID NO:5; and (e) 291 of SEQ ID NO:6. A Z3 amino acid residue includes lysine (K), arginine (R), or histidine (H). In some aspects, the amino acid residue at this position is an arginine (R).

In yet another embodiment, the polypeptide comprises, in addition to one or more of the residue replacements above, an isoleucine at position 52, a glutamic acid at position 74, proline at position 188, a valine at position 366, an alanine at position 443, a glycine at position 698 of SEQ ID NO:1; isoleucine at position 53, a glutamic acid at position 75, proline at position 189, a valine at position 368 of SEQ ID NO:2; an isoleucine at position 53, a glutamic acid at position 75, proline at position 189, a valine at position 368 of SEQ ID NO:3; an isoleucine at position 55, a glutamic acid at position 77, proline at position 191, a valine at position 371 of SEQ ID NO:4; an isoleucine at position 59, a glutamic acid at position 81, proline at position 195, a valine at position 374 of SEQ ID NO:5; an isoleucine at position 56, a glutamic acid at position 78, proline at position 192, a valine at position 376 of SEQ ID NO:6; and an isoleucine at position 52, a glutamic acid at position 74, proline at position 188, a valine at position 366 of SEQ ID NO:7.

In general, polypeptides provided herein display hydroxylase activity that converts an alkane to an alcohol. In general, alkanes include methane (CH4), ethane (C2H6), propane (C3H8), butane (C4H10), pentane (C5H12), hexane (C6H14), heptane (C7H16), octane (C8H18), nonane (C9H20), decane (C10H22), undecane (C11H24), and dodecane (C12H26). Also in general, alcohols include methanol, ethanol, propanol, butanol, pentanol, hexanol, heptanol, octanol, nonanol, decanol, undecanol, and dodecanol.

In other embodiments, the amino acid sequence of the polypeptide includes residues at the following positions:

(1) a glycine (G), glutamine (Q), serine (S), threonine (T), or cysteine (C) amino acid residue at position: (a) 47, 82, 142, 205, 236, 252, and 255 of SEQ ID NO:1; (b) 48, 83, 143, 206, 238, 254, and 257 of SEQ ID NO:2 or SEQ ID NO:3; (c) 49, 84, 144, 207, 239, 255, and 258 of SEQ ID NO:4; (d) 54, 89, 149, 212, 243, 259, and 262 of SEQ ID NO:5; and (e) 51, 86, 146, 208, 242, 258, and 262 of SEQ ID NO:6;

(2) a valine (V) or isoleucine (I) amino acid residue at position: (a) 94, 175, 184, 290, and 353 of SEQ ID NO:1; (b) 95, 176, 185, 292, and 355 of SEQ ID NO:2 or SEQ ID NO:3; (c) 96, 177, 186, 293, and 357 of SEQ ID NO:4; (d) 101, 182, 191, 297, and 361 of SEQ ID NO:5; and (e) 98, 179, 188, 296, and 363 of SEQ ID NO:6;

(3) an arginine amino acid residue at position: (a) 226 of SEQ ID NO:1; (b) 227 of SEQ ID NO:2 or SEQ ID NO:3; (c) 228 of SEQ ID NO:4; (d) 232 of SEQ ID NO:5; and (e) 231 of SEQ ID NO:6; and

(4) a phenylalanine (F) or histidine (H) amino acid residue at position: (a) 78 and 328 of SEQ ID NO:1; (b) 79 and 330 of SEQ ID NO:2 or SEQ ID NO:3; (c) 80 and 331 of SEQ ID NO:4; (d) 85 and 336 of SEQ ID NO:5; and (e) 82 and 337 of SEQ ID NO:6.

In other embodiments, the amino acid sequence of the polypeptide includes residues at the following positions:

(1) a serine (S) residue at position: (a) 82, 142, and 255 of SEQ ID NO:1; (b) 83, 143 and 257 of SEQ ID NO:2 or SEQ ID NO:3; (c) 84, 144, and 258 of SEQ ID NO:4; (d) 89, 149, and 262 of SEQ ID NO:5; (e) 86, 146 and 262 of SEQ ID NO:6;

(2) a cysteine (C) amino acid residue at position: (a) 47 and 205 of SEQ ID NO:1; (b) 48 and 206 of SEQ ID NO:2 or SEQ ID NO:3; (c) 49 and 207 of SEQ ID NO:4; (d) 54 and 212 of SEQ ID NO:5; (e) 51 and 208 of SEQ ID NO:6;

(3) a glutamine (Q) amino acid residue at position: (a) 236 of SEQ ID NO:1; (b) 238 of SEQ ID NO:2 or SEQ ID NO:3; (c) 239 of SEQ ID NO:4; (d) 243 of SEQ ID NO:5; (e) 242 of SEQ ID NO:6;

(4) a glycine (G) amino acid residue at position: (a) 252 of SEQ ID NO:1; (b) 254 of SEQ ID NO:2 or SEQ ID NO:3; (c) 255 of SEQ ID NO:4; (d) 259 of SEQ ID NO:5; (e) 258 of SEQ ID NO:6;

(5) a valine (V) amino acid residue at position: (a) 184, 290 and 353 of SEQ ID NO:1; (b) 185, 292, and 355 of SEQ ID NO:2 or SEQ ID NO:3; (c) 186, 293 and 357 of SEQ ID NO:4; (d) 191, 297, and 361 of SEQ ID NO:5; (e) 188, 296, and 363 of SEQ ID NO:6;

(6) an isoleucine (I) amino acid residue at position: (a) 94 and 175 of SEQ ID NO:1; (b) 95 and 176 of SEQ ID NO:2 or SEQ ID NO:3; (c) 96 and 177 of SEQ ID NO:4; (d) 101 and 182 of SEQ ID NO:5; (e) 98 and 179 of SEQ ID NO:6; and

(7) a phenylalanine (F) amino acid residue at position: (a) 78 and 328 of SEQ ID NO:1; (b) 79 and 330 of SEQ ID NO:2 or SEQ ID NO:3; (c) 80 and 331 of SEQ ID NO:4; (d) 85 and 336 of SEQ ID NO:5; (e) 82 and 337 of SEQ ID NO:6.

In another embodiment, an isolated or recombinant polypeptide that includes residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:1, 2, 3, 4, 5 or 6 is provided. The polypeptide includes, at least, 80%, 85%, 90%, or 95% of the amino acid residues in the amino acid sequence at the following positions:

(1) a serine (S) residue at position: (a) 82, 142, and 255 of SEQ ID NO:1; (b) 83, 143 and 257 of SEQ ID NO:2 or SEQ ID NO:3; (c) 84, 144, and 258 of SEQ ID NO:4; (d) 89, 149, and 262 of SEQ ID NO:5; (e) 86, 146 and 262 of SEQ ID NO:6;

(2) a cysteine (C) amino acid residue at position: (a) 47 and 205 of SEQ ID NO:1; (b) 48 and 206 of SEQ ID NO:2 or SEQ ID NO:3; (c) 49 and 207 of SEQ ID NO:4; (d) 54 and 212 of SEQ ID NO:5; (e) 51 and 208 of SEQ ID NO:6;

(3) a glutamine (Q) amino acid residue at position: (a) 236 of SEQ ID NO:1; (b) 238 of SEQ ID NO:2 or SEQ ID NO:3; (c) 239 of SEQ ID NO:4; (d) 243 of SEQ ID NO:5; (e) 242 of SEQ ID NO:6;

(4) a glycine (G) amino acid residue at position: (a) 252 of SEQ ID NO:1; (b) 254 of SEQ ID NO:2 or SEQ ID NO:3; (c) 255 of SEQ ID NO:4; (d) 259 of SEQ ID NO:5; (e) 258 of SEQ ID NO:6;

(5) a valine (V) amino acid residue at position: (a) 184, 290 and 353 of SEQ ID NO:1; (b) 185, 292, and 355 of SEQ ID NO:2 or SEQ ID NO:3; (c) 186, 293 and 357 of SEQ ID NO:4; (d) 191, 297, and 361 of SEQ ID NO:5; (e) 188, 296, and 363 of SEQ ID NO:6;

(6) an isoleucine (I) amino acid residue at position: (a) 94 and 175 of SEQ ID NO:1; (b) 95 and 176 of SEQ ID NO:2 or SEQ ID NO:3; (c) 96 and 177 of SEQ ID NO:4; (d) 101 and 182 of SEQ ID NO:5; (e) 98 and 179 of SEQ ID NO:6; and

(7) a phenylalanine (F) amino acid residue at position: (a) 78 and 328 of SEQ ID NO:1; (b) 79 and 330 of SEQ ID NO:2 or SEQ ID NO:3; (c) 80 and 331 of SEQ ID NO:4; (d) 85 and 336 of SEQ ID NO:5; (e) 82 and 337 of SEQ ID NO:6; and

(8) an arginine (R) amino acid residue at position: (a) 226 of SEQ ID NO:1; (b) 227 of SEQ ID NO:2 or SEQ ID NO:3; (c) 228 of SEQ ID NO:4; (d) 232 of SEQ ID NO:5; (e) 231 of SEQ ID NO:6.

In other embodiments, the polypeptide further includes a Z3 amino acid residue at position: (a) 285 of SEQ ID NO:1; (b) 287 of SEQ ID NO:2 or 3; (c) 288 of SEQ ID NO:4; (d) 292 of SEQ ID NO:5; and (e) 291 of SEQ ID NO:6. A Z3 amino acid residue includes lysine (K), arginine (R), or histidine (H). In some aspects, the amino acid residue at this position is an arginine (R).

In yet another embodiment, an isolated or recombinant polypeptide that includes residues 456-1048 of SEQ ID NO:1, 456-1059 of SEQ ID NO:2, 456-1053 of SEQ ID NO:3, 456-1064 of SEQ ID NO:4, 456-1063 of SEQ ID NO:5, or 456-1077 of SEQ ID NO:6, is provided. The polypeptide includes an amino acid sequence with up to 65, 40, 25, or 10 conservative amino acid substitutions excluding residues: (a) 464, 631, 645, 710 and 968 of SEQ ID NO:1; (b) 475, 641, 656, 721 and 980 of SEQ ID NO:2; (c) 467, 634, 648, 713 and 972 of SEQ ID NO:3; (d) 477, 644, 659, 724 and 983 of SEQ ID NO:4; (e) 472, 640, 656, 723 and 985 of SEQ ID NO:5; and (f) 480, 648, 664, 733 and 997 of SEQ ID NO:6.

alanine (A), serine (S), and glutamic acid (E); wherein Z1 is an amino acid residue selected from the group consisting of
glycine (G), asparagine (N), glutamine (Q), serine (S), threo The amino acid sequence includes the following residues: nine (T), tyrosine (Y), and cysteine (C); wherein Z2 is an (1) a Z1, Z3, Z4, or Z5 amino acid residue at position: (a) 5 amino acid residue selected from the group consisting of 464 of SEQ ID NO:1; (b) 475 of SEQ ID NO:2; (c) 467 of alanine (A), Valine (V), leucine (L), isoleucine (I), proline (P), SEQID NO:3; (d) 477 of SEQID NO:4; (e) 472 of SEQ ID and methionine (M); wherein Z3 is an amino acid residue NO:5; and (f) 480 of SEQID NO:6; selected from the group consisting of lysine (K), and arginine (2) a Z1 amino acid residue at position: (a) 631 and 710 of (R); and wherein Z4 is an amino acid residue selected from SEQID NO1; (b) 641 and 721 of SEQID NO:2; (c) 634 and 10 the group consisting of tyrosine (Y), phenylalanine (F), tryp 713 of SEQID NO:3; (d) 644 and 724 of SEQID NO:4; (e) tophan (W), and histidine (H), and wherein the polypeptide 640 and 723 of SEQID NO:5; and (f) 648 and 733 of SEQID catalyzes the conversion of an alkane to an alcohol. NO:6; In another embodiment, an isolated or recombinant (3) a Z3 amino acid residue at position: (a) 645 of SEQID polypeptide that includes the amino acid sequence set forth in NO:1; (b) 656 of SEQID NO:2; (c) 648 of SEQID NO:3; (d) 15 SEQID NO:7 with up to 75,50, 25, or 10 conservative amino 659 of SEQID NO:4; (e) 656 of SEQID NO:5; and (f) 664 of acid substitutions excluding residues 47,78, 82.94, 142, 175, SEQID NO:6; and 184,205, 226, 236,252, 255, 290, 328, 353,464, and 710, is (4) a Z2 amino acid residue at position: (a) 968 of SEQID provided. The polypeptide can display hydroxylase activity NO:1; (b) 980 of SEQID NO:2; (c) 972 of SEQID NO:3; (d) that converts an alkane to an alcohol. 983 of SEQID NO:4; (e) 985 of SEQID NO:5; and (f) 997 of In yet another embodiment, an isolated or recombinant SEQID NO:6. 
Z1 is an amino acid residue selected from the polypeptide that includes the amino acid sequence set forth in group consisting of glycine (G), asparagine (N), glutamine SEQID NO:8 with up to 75,50, 25, or 10 conservative amino (Q), serine (S), threonine (T), tyrosine (Y), and cysteine (C); acid substitutions excluding residues 47,78, 82.94, 142, 175, Z2 is an amino acid residue selected from the group consist 184,205, 226, 236,252, 255, 290, 328, 353,464, and 710, is ing of alanine (A), Valine (V), leucine (L), isoleucine (I), 25 provided. proline (P), and methionine (M); Z3 is an amino acid residue In another embodiment, an isolated or recombinant selected from the group consisting of lysine (K), and arginine polypeptide that includes the amino acid sequence set forth in (R); Z4 is an amino acid residue selected from the group SEQID NO:9 with up to 75,50, 25, or 10 conservative amino consisting of tyrosine (Y), phenylalanine (F), tryptophan acid substitutions excluding residues 47,78, 82.94, 142, 175, (W), and histidine (H); and is an amino acid residue selected 30 184,205, 226, 236,252, 255, 290, 328, 353,464, and 710, is from the group consisting of threonine (T), Valine (V), and provided. isoleucine (I). In another embodiment, an isolated or recombinant In other embodiments, the amino acid sequence of the polypeptide that includes the amino acid sequence set forth in polypeptide includes residues at the following positions: SEQ ID NO:10 with up to 75, 50, 25, or 10 conservative (1) a glycine (G), arginine (R), tyrosine (Y), or threonine 35 amino acid substitutions excluding residues 47, 78, 82, 94. (T) amino acid residue at position: (a) 464 of SEQID NO:1; 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, (b) 475 of SEQID NO:2; (c)467 of SEQID NO:3; (d) 477 of and 710, is provided. 
SEQID NO:4; (e) 472 of SEQID NO:5; and (f) 480 of SEQ The polypeptides of SEQ ID NO:7, 8, 9 or 10, further ID NO:6; optionally include anarginine (R) at amino acid residue posi (2) an asparagine (N) amino acid residue at position: (a) 40 tion 285. 631 of SEQ ID NO:1; (b) 641 of SEQ ID NO:2; (c) 634 of As previously noted, polypeptides provided herein can SEQID NO:3; (d) 644 of SEQID NO:4; (e) 640 of SEQ ID display hydroxylase activity that converts an alkane to an NO:5; and (f) 648 of SEQID NO:6; alcohol. In general analkane includes methane (CH), ethane (3) an arginine (R) amino acid residue at position: (a) 645 (C2H), propane (CHs), butane (CHo), pentane (C5H12), of SEQID NO:1; (b) 656 of SEQID NO:2; (c) 648 of SEQID 45 hexane (CH4), heptane (C7H6) octane (CHs), nonane NO:3; (d) 659 of SEQID NO:4; (e) 656 of SEQID NO:5; and (CoHo), decane (CoH), undecane (CH4), and dodecane (f) 664 of SEQID NO:6; (CH). In general, an alcohol includes methanol, ethanol, (4) a threonine (T) amino acid residue at position: (a) 710 propanol, butanol, pentanol, hexanol, heptanol, octanol, of SEQID NO:1; (b) 721 of SEQID NO:2; (c) 713 of SEQID nonanol, decanol, undecanol, and dodecanol. NO:3; (d) 724 of SEQID NO:4; (e) 723 of SEQID NO:5; and 50 In another embodiment, an isolated or recombinant (f) 733 of SEQID NO:6; and polypeptide that includes the amino acid sequence set forth in (5) a lysine (L) amino acid residue at position: (a) 968 of SEQ ID NO:11 with up to 75, 50, 25, or 10 conservative SEQID NO:1; (b) 980 of SEQID NO:2; (c) 972 of SEQ ID amino acid substitutions excluding residues 47, 78, 82, 94. NO:3:(d)983 of SEQID NO:4; (e)985 of SEQID NO:5; and 142, 175, 184, 205, 226, 236, 252, 255, 285,290, 328, 353, (f) 997 of SEQ ID NO:6. 55 464, and 710, is provided. 
An isolated or recombinant polypeptide comprising resi In another embodiment, an isolated or recombinant dues 1 to about 455 of the amino acid sequence set forth in polypeptide that includes the amino acid sequence set forth in SEQID NO:1 with up to 50 conservative amino acid substi SEQ ID NO:12 with up to 75, 50, 25, or 10 conservative tutions excluding residues 47, 52, 74, 78, 82, 94, 142, 175, amino acid substitutions excluding residues 47, 78, 82, 94. 184, 188,205, 226,236,252,255,290, 328,353,366 and 443 60 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, wherein the amino acid sequence comprises residues selected 645, and 710, is provided. from the group consisting of: (a) at positions 47, 82, 142,205, In another embodiment, an isolated or recombinant 236, 252, and 255, a Z1 amino acid residue; (b) at positions polypeptide that includes the amino acid sequence set forth in 52,94, 175, 184, 188,290,353,366 and 443, a Z2 amino acid SEQ ID NO:13 with up to 75, 50, 25, or 10 conservative residue; (c) at position 226, a Z3 amino acid residue; (d) at 65 amino acid substitutions excluding residues 47, 78, 82, 94. positions 78 and 328, a Z4 amino acid residue, (e) at position 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, 74, an amino acid selected from the group consisting of 631, and 710, is provided. 
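The substitution limits recited above ("up to 75, 50, 25, or 10 conservative amino acid substitutions excluding residues ...") can be pictured with a short sketch. The Python below is illustrative only and is not part of the disclosure. It assumes, as a simplification, that a substitution counts as conservative when both residues fall in at least one common Z class (Z1 to Z5 as listed above), that the two sequences have equal length with no insertions or deletions, and that positions are numbered from 1. The names `Z_CLASSES`, `is_conservative`, and `within_substitution_limit`, and the six-residue toy sequences, are hypothetical.

```python
# Residue classes Z1-Z5 as listed in the text (one-letter codes).
Z_CLASSES = {
    "Z1": set("GNQSTYC"),
    "Z2": set("AVLIPM"),
    "Z3": set("KR"),
    "Z4": set("YFWH"),
    "Z5": set("TVI"),
}


def is_conservative(ref, var):
    """Assumption: conservative means both residues share at least one Z class."""
    return any(ref in members and var in members for members in Z_CLASSES.values())


def within_substitution_limit(reference, variant, excluded_positions, max_substitutions):
    """Check a variant against a reference sequence.

    Returns True only if every difference lies outside the excluded
    (fixed) positions, every difference is conservative, and the number
    of differences does not exceed max_substitutions.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (indels are not handled)")
    count = 0
    for position, (ref, var) in enumerate(zip(reference, variant), start=1):
        if ref == var:
            continue
        if position in excluded_positions or not is_conservative(ref, var):
            return False
        count += 1
    return count <= max_substitutions


if __name__ == "__main__":
    # Toy example: position 1 is fixed; A->L and K->R are both conservative.
    print(within_substitution_limit("GAVKYT", "GLVRYT", {1}, 2))
```

For a real sequence, the excluded set would hold the recited positions (for example 47, 78, 82, and so on for SEQ ID NO:7) and the limit would be one of the recited values.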
In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:7; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:7; (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:7 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, and 710 of SEQ ID NO:7; or (d) a polypeptide comprising an amino acid sequence that can be optimally aligned with the sequence of SEQ ID NO:7 to generate a similarity score of at least 1830, using the BLOSUM62 matrix, a gap existence penalty of 11, and a gap extension penalty of 1, excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, and 710 of SEQ ID NO:7.
In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:8; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:8; or (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:8 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, and 710 of SEQ ID NO:8.
In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:9; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:9; or (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:9 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, and 710 of SEQ ID NO:9.
In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:10; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:10; or (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:10 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, and 710 of SEQ ID NO:10.
In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:11; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:11; or (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:11 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 285, 290, 328, 353, 464, and 710 of SEQ ID NO:11.
In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:12; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:12; or (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:12 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, 645, and 710 of SEQ ID NO:12.
In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:13; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:13; or (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:13 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, 631, and 710 of SEQ ID NO:13.
In other embodiments, isolated nucleic acid molecules are provided. Such nucleic acid molecules include: (a) a nucleic acid molecule which encodes a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:7, 8, 9, 10, 11, 12, 13 or 125; (b) a nucleic acid molecule which encodes a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:7, 8, 9, 10, 11, 12, 13 or 125; (c) a nucleic acid molecule which encodes a polypeptide comprising residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:7 or 11; (d) a nucleic acid molecule which encodes a polypeptide comprising residues about 456 to about 1088 of the amino acid sequence set forth in SEQ ID NO:7, 8, 9, 10, 12, 13 or 125; (e) a nucleic acid molecule comprising the nucleotide sequence set forth in SEQ ID NO:15; (f) a nucleic acid molecule consisting of the nucleotide sequence set forth in SEQ ID NO:15; (g) a nucleic acid molecule comprising the nucleotide sequence set forth in SEQ ID NO:17; (h) a nucleic acid molecule consisting of the nucleotide sequence set forth in SEQ ID NO:17.
In other embodiments, nucleic acid molecules of the invention include:
(a) a nucleic acid molecule which encodes a polypeptide comprising residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:2 or 3 with the following amino acid residues: 48 and 206 are cysteine (C); 79 and 330 are phenylalanine (F); 83, 143, and 257 are serine (S); 95 and 176 are isoleucine (I); 185, 292, and 355 are valine (V); 227 is arginine (R); 238 is glutamine (Q); and 254 is glycine (G);
(b) a nucleic acid molecule which encodes a polypeptide consisting of residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:2 or 3 with the following amino acid residues: 48 and 206 are cysteine (C); 79 and 330 are phenylalanine (F); 83, 143, and 257 are serine (S); 95 and 176 are isoleucine (I); 185, 292, and 355 are valine (V); 227 is arginine (R); 238 is glutamine (Q); and 254 is glycine (G);
(c) a nucleic acid molecule which encodes a polypeptide comprising residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:4 with the following amino acid residues: 49 and 207 are cysteine (C); 80 and 331 are phenylalanine (F); 84, 144, and 258 are serine (S); 96 and 177 are isoleucine (I); 186, 293, and 357 are valine (V); 228 is arginine (R); 239 is glutamine (Q); and 255 is glycine (G);
(d) a nucleic acid molecule which encodes a polypeptide consisting of residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:4 with the following amino acid residues: 49 and 207 are cysteine (C); 80 and 331 are phenylalanine (F); 84, 144, and 258 are serine (S); 96 and 177 are isoleucine (I); 186, 293, and 357 are valine (V); 228 is arginine (R); 239 is glutamine (Q); and 255 is glycine (G);
(e) a nucleic acid molecule which encodes a polypeptide comprising residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:5 with the following amino acid residues: 54 and 212 are cysteine (C); 85 and 336 are phenylalanine (F); 89, 149, and 262 are serine (S); 101 and 182 are isoleucine (I); 191, 297, and 361 are valine (V); 232 is arginine (R); 243 is glutamine (Q); and 259 is glycine (G);
(f) a nucleic acid molecule which encodes a polypeptide consisting of residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:5 with the following amino acid residues: 54 and 212 are cysteine (C); 85 and 336 are phenylalanine (F); 89, 149, and 262 are serine (S); 101 and 182 are isoleucine (I); 191, 297, and 361 are valine (V); 232 is arginine (R); 243 is glutamine (Q); and 259 is glycine (G);
(g) a nucleic acid molecule which encodes a polypeptide comprising residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:6 with the following amino acid residues: 51 and 208 are cysteine (C); 82 and 337 are phenylalanine (F); 86, 146, and 262 are serine (S); 98 and 179 are isoleucine (I); 188, 296, and 363 are valine (V); 231 is arginine (R); 243 is glutamine (Q); and 258 is glycine (G); and
(h) a nucleic acid molecule which encodes a polypeptide consisting of residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:6 with the following amino acid residues: 51 and 208 are cysteine (C); 82 and 337 are phenylalanine (F); 86, 146, and 262 are serine (S); 98 and 179 are isoleucine (I); 188, 296, and 363 are valine (V); 231 is arginine (R); 243 is glutamine (Q); and 258 is glycine (G).
In other embodiments, nucleic acid molecules of the invention include:
(a) a nucleic acid molecule which encodes a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:2 with the following amino acid residues: 48 and 206 are cysteine (C); 79 and 330 are phenylalanine (F); 83, 143, and 257 are serine (S); 95 and 176 are isoleucine (I); 185, 292, and 355 are valine (V); 227, 287, and 656 are arginine (R); 238 is glutamine (Q); 254 is glycine (G); 475 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 641 is asparagine (N); 721 is threonine (T); and 980 is leucine;
(b) a nucleic acid molecule which encodes a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:2 with the following amino acid residues: 48 and 206 are cysteine (C); 79 and 330 are phenylalanine (F); 83, 143, and 257 are serine (S); 95 and 176 are isoleucine (I); 185, 292, and 355 are valine (V); 227, 287, and 656 are arginine (R); 238 is glutamine (Q); 254 is glycine (G); 475 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 641 is asparagine (N); 721 is threonine (T); and 980 is leucine;
(c) a nucleic acid molecule which encodes a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:3 with the following amino acid residues: 48 and 206 are cysteine (C); 79 and 330 are phenylalanine (F); 83, 143, and 257 are serine (S); 95 and 176 are isoleucine (I); 185, 292, and 355 are valine (V); 227, 287, and 648 are arginine (R); 238 is glutamine (Q); 254 is glycine (G); 467 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 634 is asparagine (N); 713 is threonine (T); and 972 is leucine;
(d) a nucleic acid molecule which encodes a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:3 with the following amino acid residues: 48 and 206 are cysteine (C); 79 and 330 are phenylalanine (F); 83, 143, and 257 are serine (S); 95 and 176 are isoleucine (I); 185, 292, and 355 are valine (V); 227, 287, and 648 are arginine (R); 238 is glutamine (Q); 254 is glycine (G); 467 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 634 is asparagine (N); 713 is threonine (T); and 972 is leucine;
(e) a nucleic acid molecule which encodes a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:4 with the following amino acid residues: 49 and 207 are cysteine (C); 80 and 331 are phenylalanine (F); 84, 144, and 258 are serine (S); 96 and 177 are isoleucine (I); 186, 293, and 357 are valine (V); 228, 288, and 659 are arginine (R); 239 is glutamine (Q); and 255 is glycine (G); 477 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 644 is asparagine (N); 724 is threonine (T); and 983 is leucine;
(f) a nucleic acid molecule which encodes a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:4 with the following amino acid residues: 49 and 207 are cysteine (C); 80 and 331 are phenylalanine (F); 84, 144, and 258 are serine (S); 96 and 177 are isoleucine (I); 186, 293, and 357 are valine (V); 228, 288, and 659 are arginine (R); 239 is glutamine (Q); and 255 is glycine (G); 477 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 644 is asparagine (N); 724 is threonine (T); and 983 is leucine;
(g) a nucleic acid molecule which encodes a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:5 with the following amino acid residues: 54 and 212 are cysteine (C); 85 and 336 are phenylalanine (F); 89, 149, and 262 are serine (S); 101 and 182 are isoleucine (I); 191, 297, and 361 are valine (V); 232, 292, and 656 are arginine (R); 243 is glutamine (Q); and 259 is glycine (G); 472 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 640 is asparagine (N); 723 is threonine (T); and 985 is leucine;
(h) a nucleic acid molecule which encodes a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:5 with the following amino acid residues: 54 and 212 are cysteine (C); 85 and 336 are phenylalanine (F); 89, 149, and 262 are serine (S); 101 and 182 are isoleucine (I); 191, 297, and 361 are valine (V); 232, 292, and 656 are arginine (R); 243 is glutamine (Q); and 259 is glycine (G); 472 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 640 is asparagine (N); 723 is threonine (T); and 985 is leucine;
(i) a nucleic acid molecule which encodes a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:6 with the following amino acid residues: 51 and 208 are cysteine (C); 82 and 337 are phenylalanine (F); 86, 146, and 262 are serine (S); 98 and 179 are isoleucine (I); 188, 296, and 363 are valine (V); 231, 291, and 664 are arginine (R); 243 is glutamine (Q); and 258 is glycine (G); 480 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 648 is asparagine (N); 733 is threonine (T); and 997 is leucine; and
(j) a nucleic acid molecule which encodes a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:6 with the following amino acid residues: 51 and 208 are cysteine (C); 82 and 337 are phenylalanine (F); 86, 146, and 262 are serine (S); 98 and 179 are isoleucine (I); 188, 296, and 363 are valine (V); 231, 291, and 664 are arginine (R); 243 is glutamine (Q); and 258 is glycine (G); 480 is glycine (G), arginine (R), tyrosine (Y), or threonine (T); 648 is asparagine (N); 733 is threonine (T); and 997 is leucine.
In one embodiment, an isolated nucleic acid molecule that includes a nucleic acid molecule of the invention and a nucleotide sequence encoding a heterologous polypeptide, is provided.
In other embodiments, vectors that include a nucleic acid molecule of the invention are provided.
In other embodiments, host cells transfected with a nucleic acid molecule of the invention, or a vector that includes a nucleic acid molecule of the invention, are provided. Host cells include eucaryotic cells such as yeast cells, insect cells, or animal cells. Host cells also include procaryotic cells such as bacterial cells.
In other embodiments, methods for producing a cell that converts an alkane to alcohol, are provided. Such methods generally include: (a) transforming a cell with an isolated nucleic acid molecule encoding a polypeptide that includes an amino acid sequence set forth in SEQ ID NO: 7, 8, 9, 10, 11, 12, 13 or 125; (b) transforming a cell with an isolated nucleic acid molecule encoding a polypeptide of the invention; or (c) transforming a cell with an isolated nucleic acid molecule of the invention.
In other embodiments, methods for selecting a cell that converts an alkane to an alcohol, are provided. The methods generally include: (a) providing a cell containing a nucleic acid construct that includes a nucleotide sequence that encodes a modified cytochrome P450 polypeptide, the nucleotide sequence selected from: (i) a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in SEQ ID NO: 7, 8, 9, 10, 11, 12, 13 or 125; (ii) a nucleic acid molecule encoding a polypeptide of the invention; or (iii) a nucleic acid molecule of the invention. The methods further include (b) culturing the cell in the presence of a suitable alkane and under conditions where the modified cytochrome P450 is expressed at a level sufficient to convert an alkane to an alcohol. Such conditions are met when an alcohol is produced at a level detectable by a method provided herein, or a method known to one skilled in the art of enzymology.
In other embodiments, methods for producing an alcohol, are provided. The methods include: (a) providing a cell containing a nucleic acid construct comprising a nucleotide sequence that encodes a modified cytochrome P450 polypeptide, the nucleotide sequence selected from: (i) a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in SEQ ID NO: 7, 8, 9, 10, 11, 12, 13 or 125; (ii) a nucleic acid molecule encoding a polypeptide of the invention; or (iii) a nucleic acid molecule of the invention. The methods further include (b) culturing the cell in the presence of a suitable alkane and under conditions where the modified cytochrome P450 is expressed at an effective level; and (c) producing an alcohol by hydroxylation of the suitable alkane such as methane (CH4), ethane (C2H6), propane (C3H8), butane (C4H10), pentane (C5H12), hexane (C6H14), heptane (C7H16), octane (C8H18), nonane (C9H20), decane (C10H22), undecane (C11H24), and dodecane (C12H26). In general an alcohol produced by a method of the invention can include methanol, ethanol, propanol, butanol, pentanol, hexanol, heptanol, octanol, nonanol, decanol, undecanol, and dodecanol.
In yet another embodiment, methods for producing a cytochrome P450 variant that hydroxylates an alkane are provided. Such methods include selecting a parent cytochrome P450 polypeptide and modifying at least one amino acid residue positioned in or near the active site of the heme domain of the parent cytochrome P450 polypeptide. In general the modification reduces the volume of the active site. The method further includes contacting the polypeptide comprising the modified amino acid with at least one alkane under conditions suitable for hydroxylation of the alkane and detecting a hydroxylated alkane. The modification may include a substitution of the parent amino acid for a different amino acid or it may include modifying the parent amino acid to include an additional group. If the modification is a substitution, such substitutions can include a phenylalanine, tyrosine, histidine, or serine for the parent amino acid residue positioned in or near the active site of the heme domain of the parent cytochrome P450 polypeptide.
The details of one or more embodiments of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 depicts the reduction of substrate size and increase in C-H bond dissociation energy achieved by directed evolution. Directed evolution was used to convert wild-type P450 BM-3 stepwise from a fatty acid hydroxylase into an enzyme capable of activating the higher-energy C-H bond of ethane. The figure shows the substrate range of wild-type P450 BM-3 and its mutant 53-5H and, for comparison, the range of substrates of other alkane monooxygenases.
FIG. 2 depicts a gas chromatogram of ethanol reaction mixtures using variant 53-5H as the catalyst. Prior to analysis, ethanol was derivatized to form ethyl nitrite for detection via an electron capture detector (ECD).
FIG. 3 depicts the conversion of ethane to ethanol. The conversion obtained using 200 nM protein corresponds to 50 and 250 turnovers per enzyme active site of 53-5H and 35-E11, respectively. Halving or doubling the enzyme concentration yielded approximately correspondingly lower or higher product concentrations, respectively. Control reactions, either without ethane or with inactivated protein, did not contain ethanol concentrations above the background level of 1 µM. Error bars are the standard deviation of three experiments.
FIG. 4 depicts a 13C-NMR spectrum of singly 13C-labeled ethanol incubated with 53-5H and NADPH regeneration system. No 13C-labeled carbonyl peaks due to overoxidation of the ethanol were detectable.
FIG. 5 depicts an absorption spectrum of 53-5H. The ferric heme iron is in low-spin form. A shift to high-spin ferric iron upon addition of ethane could not be detected. Addition of octane did not induce an increase at 390 nm. The spectrum of 35-E11 is substantially similar.
FIGS. 6A-6F depict an amino acid sequence alignment of 6 CYP102 homologs and BM-3 mutant 35-E11. Corresponding mutations in homologs are provided in Table 4.
FIG. 7 depicts the spectra of methane reactions of BM-3 variant 35-E11 and the variant 35-E11-E464R after treatment with alcohol oxidase and Purpald. The peak at 550 nm corresponds to methanol produced. Solutions containing all reactants except methane do not absorb in this region when similarly treated.
FIG. 8 depicts SEQ ID NO: 1 (CYP102A1 from Bacillus megaterium).
FIG. 9 depicts SEQ ID NO: 2 (CYP102A2 from Bacillus subtilis, 59% identity to CYP102A1).
FIG. 10 depicts SEQ ID NO: 3 (CYP102A3 from Bacillus subtilis, 58% identity to CYP102A1).
FIG. 11 depicts SEQ ID NO: 4 (CYP102A5 from Bacillus cereus, 60% identity to CYP102A1).
FIG. 12 depicts SEQ ID NO: 5 (CYP102E1 from Ralstonia metallidurans, 38% identity to CYP102A1).
FIG. 13 depicts SEQ ID NO: 6 (CYP102A6 from Bradyrhizobium japonicum, 46% identity to CYP102A1).
FIG. 14 depicts SEQ ID NO: 7 (35-E11).
FIG. 15 depicts SEQ ID NO: 8 (35-E11-E464R).
FIG. 16 depicts SEQ ID NO: 9 (35-E11-E464Y).
FIG. 17 depicts SEQ ID NO: 10 (35-E11-E464T).
FIG. 18 depicts SEQ ID NO: 11 (20-D3).
FIG. 19 depicts SEQ ID NO: 12 (23-1D).
FIG. 20A-B depicts SEQ ID NO: 13 (21-4G) and 125 (P450PMO).
FIG. 21 depicts SEQ ID NO: 15 (53-5H).
FIGS. 22A and 22B depict SEQ ID NO: 16 (expression plasmid pCWORI-53-H containing the heme and reductase domain of variant 53-5H).
FIG. 23 depicts SEQ ID NO: 17 (nucleic acid sequence encoding variant 35-E11).
FIGS. 24A and 24B depict SEQ ID NO: 18 (expression plasmid pCWORI-35-E11 containing heme and reductase domain of variant 35-E11).
FIG. 25 depicts alkyl methyl ether assays. Panel a) shows that hydroxylation of the propane surrogate dimethyl ether produces formaldehyde, which can easily be detected using the dye Purpald. Panel b) shows that hexyl methyl ether can be used as an octane surrogate. Panel c) shows a typical screening plate with P450-containing cell lysate, which, upon hydroxylation of the methoxy group of DME or HME and addition of Purpald, forms a purple color. Each well represents a different BM-3 variant.
FIG. 26 depicts standard curves for formaldehyde and hexanal using Purpald. The assay was performed in a 96-well microtiter plate and the final volume was 250 µL per well.
FIG. 27 depicts a colorimetric reaction after addition of Purpald to reaction mixtures containing BM-3 mutants, chloromethane and cofactor regeneration system. Panel A and Panel B refer to independent experiments.
FIG. 28 depicts total turnover numbers for the reaction of CM dehalogenation catalyzed by BM-3 mutants.
FIG. 29 depicts a colorimetric response for the reaction of BM-3 variants substituted at position A328.
FIG. 30 depicts the chemical structures of products from the hydroxylation/dehalogenation reaction catalyzed by P450 mutants on different kinds of halogenated alkanes.
FIGS. 31A-31C depict an amino acid sequence alignment of the heme domain (residues 1-455) of 6 CYP102 homologs and BM-3 mutant 35-E11. Corresponding mutations in homologs are provided in Table 4.
FIG. 32 depicts protein similarity scores for polypeptides provided herein. Panel A shows exemplary similarity scores for full length polypeptides (e.g., SEQ ID NOs: 1-7). Panel B shows exemplary similarity scores for residues 1-455 of SEQ ID NOs: 1-7 corresponding to the heme domain of P450 polypeptides.
FIG. 33 depicts protein percent (%) identities for polypeptides provided herein. Panel A shows exemplary percent (%) identities for full length polypeptides (e.g., SEQ ID NOs: 1-7). Panel B shows exemplary percent (%) identities for residues 1-455 of SEQ ID NOs: 1-7 corresponding to the heme domain of P450 polypeptides.
FIG. 34A-B shows P450BM3 and naturally-occurring nonheme gaseous alkane monooxygenases. (A) P450BM3 (from Bacillus megaterium) catalyzes sub-terminal hydroxylation of long-chain fatty acids and propane monooxygenase from Gordonia sp. TY-5 catalyzes hydroxylation of propane to 2-propanol. P450BM3 has no measurable activity on propane, while TY-5 PMO specificity is restricted to propane. (B)
recombination of the most beneficial reductase domain mutations obtained from these libraries using the heme domain of 7-7.
FIG. 36 shows the thermostability of P450 variants. Curves corresponding to 35E11, ETS8, and wild-type P450BM3 are indicated. The T50 value corresponds to the midpoint denaturation temperature.
FIG. 37A-F shows the functional and stability properties of variants on the lineage from P450BM3 to P450PMO. (A) Total turnover numbers for propane hydroxylation (mol propanol/mol P450). (B) Coupling efficiency as given by propanol formation rate/NADPH oxidation rate. (C) Km values for propanol formation. (D) Catalytic rate constants (kcat) for propanol formation. (E) Enzyme thermostability, reported as T50 values from heat inactivation of heme domain. (F) Stability penalty for each round of directed evolution, normalized by the number of mutations accumulated in each round.
FIG. 38 depicts re-specialization of P450 for activity on propane. Activities of the engineered variants on C to Co alkanes, reported relative to activity on the substrate preferred by each variant (=100%). The dashed line indicates the profile of the preceding variant to highlight the change in substrate specificity in each directed evolution step.
FIG. 39 shows active site engineering. Map of active site residues targeted for saturation mutagenesis on the palmitate-bound structure of P450BM3 heme domain (PDB entry 1FAG).
FIG. 40A-B shows (A) a comparison between the catalytic rate of P450PMO on propane and those of naturally-occurring alkane monooxygenases (pMMO, sMMO, BMO, AlkB, cyp153(A6)) on their preferred substrates, and (B) reported substrate specificity profiles of pMMO (based on in vitro activity), PMO from Gordonia sp. TY-5 (based on growth rate), BMO from Nocardioides sp. CF8 (based on growth rate), and cyp153(A6) from Mycobacterium sp. HXN-1500 (based on in vitro activity).
FIG. 41 shows C-H bond dissociation energy (A) and molecular size (B) of C-C alkanes and halogenated methane derivatives. Experimental C-H BDE values are derived from Cherkasov et al.
FIG. 42 is a GC chromatogram of PFHA-derivatized formaldehyde from reaction with P450PMO and halogenated methane derivatives, CH3Cl, CH3Br and CH3I.
Structures of P450 and soluble gaseous alkane monooxy FIG. 43A-C shows convergent evolution in propane genase (represented by the Soluble methane monoxygenase, 45 hydroxylation and emergence of new functions. (A) k val sMMO) redox systems. Single-component P450 (119 ues for alkanes as measured from Michaelis-Menten plots. kDa) consists of three functional sub-domains, heme-, FMN-, (B) Overlap of substrate range of the evolving P450 with and FAD-binding domain (modeled on rat CPR structure). those of naturally-occurring non-heme oxygenases (AlkB: sMMO (and presumably other bacterial multi-component integral membrane alkane hydroxylase). P450, has re monooxygenases including PMO and BMO) comprises a 250 50 specialized for propane hydroxylation with complete loss of kDa CBy hydroxylase, a 15 kDa cofactorless effector pro the original activity on fatty acids (AA: arachidonic acid, PA: tein, and a 40 kDa 2Fe-2S)- and FAD-containing reductase. palmitic acid, MPA: 13-methyl-PA). Overall, 34 mutational FIG. 35A-C depicts directed evolution of P450, from events separate P450 and P450 leading to 23 amino wild-type P450. (A) Protein engineering steps and acid differences. (C) Halomethane dehalogenase activity of selected variants from each round. (B) Map of mutated resi 55 P450, and its precursors, as measured by the rate of form dues (sphere models) on P450, heme domain crystal struc aldehyde formation. ture (PDB entry 1FAG). L188 (mutated to proline in Like reference symbols in the various drawings indicate ETS8--> 19A12 transition) is shown. Heme (sphere models) is like elements. shown while bound palmitate (stick model) is shown. (C) shows domain-based engineering strategy. The heme-, FMN 60 DETAILED DESCRIPTION and FAD-binding domains of P450 variant 35E11 were independently evolved preparing separate mutagenesis Before describing the invention in detail, it is to be under libraries of each region. 
The heme domain was optimized for stood that this invention is not limited to particular composi propane activity over several rounds of directed evolution. tions or biological systems, which can, of course, vary. It is Upon screening of 35E11 reductase domain libraries, sites 65 also to be understood that the terminology used herein is for important for activity were identified and further optimized the purpose of describing particular embodiments only, and is using the heme domain of 11-3. P450, was obtained by not intended to be limiting. As used in this specification and US 8,715,988 B2 15 16 the appended claims, the singular forms “a,” “an and “the onstrate that the homologous P450 enzymes CYP102A2 and include plural referents unless the content clearly dictates CYP102A3 are similar enough to BM-3 to recombine in this otherwise. Unless defined otherwise, all technical and scien fashion and still retain a high probability of folding. tific terms used herein have the same meaning as commonly Provided herein are variants of cytochrome P450 BM-3 understood by one of ordinary skill in the art to which the that catalyze the direct conversion of methane to methanol. invention pertains. Although any methods and materials simi The reaction uses dioxygen from the air and the reduced form lar or equivalent to those described herein can be used in the of the biological cofactor nicotinamide adenine dinucleotide practice for testing of the invention(s), specific examples of phosphate (NADPH). The stoichiometry of the reactions appropriate materials and methods are described herein. catalyzed by the BM-3 variant is CH+O+NADPH-i-H" World reserves of methane in natural gas are more abun 10 ->CHOH+HO+NADP". dant than oil. Methane in methane hydrates increases this by The term “total turnover number (TTN) is the total num orders of magnitude. 
Methane is also emitted from coal beds ber of substrate molecules converted to product (or turned and renewably produced by the anaerobic decomposition of over) by a single enzyme overits lifetime or during a specified organic matter in landfills and human and animal waste time period. TTN is an important figure of merit for a catalyst streams. Most of these sources, deemed “stranded by the 15 because it allows for the calculation of the total amount of energy industries, are not used because they are either too product that can be made from a given quantity of catalyst. remote to transport the gas to a power generating facility or The term "coupling' means the ratio, in percent, of product too small to economically convert with existing technology formed to cofactor (NAD(P)H) consumed during the into the more easily transported methanol. A practical catalyst enzyme-catalyzed reaction. If 2 moles of cofactor are con for the conversion of methane to methanol could allow these Sumed for every mole of product made, then the coupling for Sources of methane gas to be converted into easily transported the reaction is 50%. Coupling is also an important figure of liquid fuel. Accessing these reserves with a new technology merit because it allows one to assess how much cofactor is will have a dramatic effect on both the energy and chemical required to produce a given amount of product. commodity industries. The BM-3-based catalyst functions at ambient temperature The wide scale production of methanol from these sources 25 and pressure. Unlike methane monooxygenase enzymes, this would also have a favorable impact on the concentration of BM-3 based biocatalyst can be expressed in host organisms greenhouse gases in the atmosphere. Methane has a 20 fold that do not consume the methanol as it is produced. Thus it higher global warming potential than carbon dioxide. 
Human may be incorporated into a whole-cell biocatalysis system activities release about 300 MMT/year of methane into the that can accumulate methanol in the presence of methane and atmosphere, accounting for about a quarter of the measured 30 a1. increase in global temperatures. Conversion of this wasted Cytochrome P450 enzymes (P450s) are exceptional oxi methane into methanol could have a dramatic effect on the dizing catalysts, effecting highly selective transformations concentrations of global warming gases in the atmosphere, that are sometimes impossible to achieve by chemical meth since in addition to removing methane from the atmosphere ods under similarly mild conditions. These versatile enzymes by this process, the methanol when burned as a fuel releases 35 have enormous potential for applications in drug discovery, less (~/2) carbon dioxide than is released by burning an chemical synthesis, bioremediation, and biotechnology. equivalent amount of gasoline. Converting methane released However, tailoring P450s to accept nonnatural substrates, as into the environment from human activities into a usable required by many applications, is exceptionally difficult in alternative fuel is therefore one of the most effective means of this complex catalytic system, which involves multiple cofac curbing global warming. 40 tors, Substrates and protein domains. Compared to their natu The engineered cytochromes P450 mutants described ral counterparts, engineered P450s exhibit poor catalytic and throughout the present disclosure were acquired by accumu coupling efficiencies; obtaining native-like proficiencies is a lating point mutations in directed evolution experiments. An mandatory first step towards utilizing the catalytic power of alternative method for making libraries for directed evolution these versatile oxygenases in chemical synthesis. 
to obtain P450s with new or altered properties is recombina 45 The P450 catalytic cycle is initiated by a substrate binding tion, or chimeragenesis, in which portions of homologous event that is accompanied by large conformational changes P450s are swapped to form functional chimeras. Recombin and a shift in the heme redox potential. This induces electron ing equivalent segments of homologous proteins generates transfer from the NAD(P)H cofactor to the heme, resulting in variants in which every amino acid Substitution has already the formation of the highly reactive iron-oxo species that proven to be successful in one of the parents. Therefore, the 50 activates the substrate. Mechanisms controlling efficient amino acid mutations made in this way are less disruptive, on catalysis are disrupted when P450s bind normative substrates average, than random mutations. A structure-based algo or when amino acid substitutions are made, both of which rithm, such as SCHEMA, identifies fragments of proteins that result in uncoupling of cofactor utilization and product for can be recombined to minimize disruptive interactions that mation and rapid enzyme inactivation due to the formation of would prevent the protein from folding into its active form. 55 reactive oxygen species. SCHEMA has been used to design chimeras of P450 BM-3 P450s insert oxygen into a broad range of compounds. No and its homolog CYP102A2, sharing 63% amino acid P450, however, is known to have specialized in reactions on sequence identity. Fourteen of the seventeen constructed Small alkanes. Gaseous alkanes (methane, propane, butane) hybrid proteins were able to fold correctly and incorporate the and halomethanes (CHC1, CHBr) serve as sole carbon and heme cofactor, as determined by CO difference spectra. 
Half 60 energy sources for numerous aerobic microorganisms, where of the chimeras had altered substrate specificities, while three the first oxidative step is catalyzed by non-heme methane, mutants acquired activities towards a new substrate not propane and butane monooxygenases (MMO, PMO, BMO). hydroxylated by either parent. A large library of chimeras Accordingly, the term "hydroxylase' or "monooxygenase' made by SCHEMA-guided recombination of BM-3 with its should be considered to include any enzyme that can insert homologs CYP102A2 and CYP102A3 contains more than 65 one oxygen atom from diatomic oxygen into a Substrate. 3000 new properly folded variants of BM-3. In addition to Exemplary enzymes include the cytochrome P450 monooxy creating new biocatalysts, these chimeragenesis studies dem genases. "Cytochrome P450 monooxygenase' or “P450 US 8,715,988 B2 17 18 enzyme” means an enzyme in the superfamily of P450 heme ity and catalytic efficiency on this Small alkane, the active site thiolate proteins, which are widely distributed in bacteria, was progressively optimized in the evolving P450, as judged fungi, plants and animals. The unique feature which defines by narrowing of the Substrate range and decreasing K. This whether an enzyme is a cytochrome P450 enzyme is tradi is consistent with the sequence of mutational events leading tionally considered to be the characteristic absorption maxi 5 to P450, which indicate a continuous re-arrangement of mum (“Soret band') near 450 nm observed upon binding of active site residues in the search of a configuration optimal for carbon monoxide (CO) to the reduced form of the heme iron the new substrate. For example, the evolution of P450 into of the enzyme. Reactions catalyzed by cytochrome P450 P450, shows that a specialized member of the P450 enzymes include epoxidation, N-dealkylation, O-dealkyla enzyme family can evolve by point mutations and selection tion, S-oxidation and hydroxylation. 
The most common reac 10 for function on progressively different Substrates to a new, tion catalyzed by P450 enzymes is the monooxygenase reac re-specialized function. Particularly noteworthy is that the tion, i.e., insertion of one atom of oxygen into a substrate native enzymatic function eventually disappeared solely while the other oxygen atom is reduced to water. under the pressure of positive selection for the new function; The term “substrate' or "suitable substrate” means any no negative selection was needed. The disclosure provides an Substance or compound that is converted or meant to be 15 engineered P450, P450, comprising a catalytic rate and converted into another compound by the action of an enzyme specificity comparable to natural alkane monooxygenases catalyst. The term includes alkanes, and includes not only a (FIG. 40). single compound, but also combinations of compounds. Such A "mutation” means any process or mechanism resulting as solutions, mixtures and other materials which contain at in a mutant protein, enzyme, polynucleotide, gene, or cell. least one substrate. Substrates for hydroxylation using the This includes any mutation in which a protein, enzyme, poly cytochrome P450 enzymes of the invention include para nucleotide, or gene sequence is altered, and any detectable nitrophenoxycarboxylic acids (“pNCAs) such as 12-pNCA, change in a cell arising from Such a mutation. Typically, a as well as decanoic acid, myristic acid, lauric acid, and other mutation occurs in a polynucleotide or gene sequence, by fatty acids and fatty acid-derivatives. For alkane/alkene-sub point mutations, deletions, or insertions of single or multiple strates, propane, propene, ethane, ethene, butane, butene, 25 nucleotide residues. 
A mutation includes polynucleotide pentane, pentene, hexane, hexene, cyclohexane, octane, alterations arising within a protein-encoding region of a gene octene, styrene, p-nitrophenoxyoctane (8-pnpane), and vari as well as alterations in regions outside of a protein-encoding ous derivatives thereof, can be used. The term "derivative' sequence. Such as, but not limited to, regulatory or promoter refers to the addition of one or more functional groups to a sequences. A mutation in a gene can be 'silent”, i.e., not Substrate, including, but not limited, alcohols, amines, halo 30 reflected in an amino acid alteration upon expression, leading gens, thiols, amides, carboxylates, etc. to a “sequence-conservative' variant of the gene. This gener As will be described in more detail below, the invention is ally arises when one amino acid corresponds to more than one based, at least in part, on the generation and expression of codon. novel enzymes that catalyze the hydroxylation of alkanes to Non-limiting examples of a modified amino acid include a alcohols. In one embodiment, novel polypeptides that have 35 glycosylated amino acid, a Sulfated amino acid, a prenylated been engineered to convert an alkane to an alcohol are pro (e.g., farnesylated, geranylgeranylated) amino acid, an acety vided. Such polypeptides include P450 variants that have lated amino acid, an acylated amino acid, a pegylated amino been altered to include amino acid substitutions at specified acid, a biotinylated amino acid, a carboxylated amino acid, a residues. phosphorylated amino acid, and the like. References While these variants will be described in more detail below, 40 adequate to guide one of skill in the modification of amino it is understood that polypeptides of the invention may con acids are replete throughout the literature. Example protocols tain one or more modified amino acids. 
The presence of are found in Walker (1998) Protein Protocols on CD-ROM modified amino acids may be advantageous in, for example, (Humana Press, Towata, N.J.). (a) increasing polypeptide in vivo half-life, (b) reducing or Recombinant methods for producing and isolating modi increasing polypeptide antigenicity, and (c) increasing 45 fied P450 polypeptides of the invention are described herein. polypeptide storage stability. Amino acid(s) are modified, for In addition to recombinant production, the polypeptides may example, co-translationally or post-translationally during be produced by direct peptide synthesis using Solid-phase recombinant production (e.g., N-linked glycosylation at techniques (e.g., Stewart et al. (1969) Solid-Phase Peptide N—X—S/T motifs during expression in mammaliancells) or Synthesis (WH Freeman Co., San Francisco); and Merrifield modified by Synthetic means. Accordingly, A "mutant. 50 (1963) J. Am. Chem. Soc. 85: 2149-2154; each of which is “variant' or “modified protein, enzyme, polynucleotide, incorporated by reference). Peptide synthesis may be per gene, or cell, means a protein, enzyme, polynucleotide, gene, formed using manual techniques or by automation. Auto or cell, that has been altered or derived, or is in some way mated synthesis may beachieved, for example, using Applied different or changed, from a parent protein, enzyme, poly Biosystems 431 A Peptide Synthesizer (PerkinElmer, Foster nucleotide, gene, or cell. A mutant or modified protein or 55 City, Calif.) in accordance with the instructions provided by enzyme is usually, although not necessarily, expressed from a the manufacturer. mutant polynucleotide or gene. 
“Cytochrome P450 monooxygenase” or “P450 enzyme” A "parent protein, enzyme, polynucleotide, gene, or cell, means an enzyme in the superfamily of P450 heme-thiolate is any protein, enzyme, polynucleotide, gene, or cell, from proteins, which are widely distributed in bacteria, fungi, which any other protein, enzyme, polynucleotide, gene, or 60 plants and animals. The enzymes are involved in metabolism cell, is derived or made, using any methods, tools or tech of a plethora of both exogenous and endogenous compounds. niques, and whether or not the parent is itself native or mutant. Usually, P450 enzymes act as terminal oxidases in multicom A parent polynucleotide or gene encodes for a parent protein ponent electron transfer chains, i.e., P450-containing or enzyme. monooxygenase systems. The unique feature which defines The disclosure demonstrates, for example, that directed 65 whether an enzyme is a cytochrome P450 enzyme is tradi evolution converted a long-chain fatty acid P450 hydroxylase tionally considered to be the characteristic absorption maxi into a proficient propane hydroxylase. To achieve high activ mum (“Soret band') near 450 nm observed upon binding of US 8,715,988 B2 19 20 carbon monoxide (CO) to the reduced form of the heme iron Bacillus subtilis strain 1A1.15). This wild-type P450 is des of the enzyme. Reactions catalyzed by cytochrome P450 ignated CYP102A3 and shares 58% amino acid sequence enzymes include epoxidation, N-dealkylation, O-dealkyla identity to CYP102A1 (SEQ ID NO:1). SEQ ID NO:4 tion, S-oxidation and hydroxylation. The most common reac includes the amino acid sequence of wild-type cytochrome tion catalyzed by P450 enzymes is the monooxygenase reac P450 from Bacillus cereus. This wild-type P450 is also des tion, i.e., insertion of one atom of oxygen into a substrate ignated CYP102A5 and shares 60% amino acid sequence while the other oxygen atom is reduced to water. identity to CYP102A1 (SEQ ID NO:1). 
SEQ ID NO:5 "Heme domain refers to anamino acid sequence within an includes the amino acid sequence of wild-type cytochrome oxygen carrier protein, which sequence is capable of binding P450 from Ralstonia metallidurans. This wild-type P450 is an iron-complexing structure Such as a porphyrin. Com 10 designated CYP102E1 and shares 38% amino acid sequence pounds of iron are typically complexed in a porphyrin (tet identity to CYP102A1 (SEQ ID NO:1). SEQ ID NO:6 rapyrrole) ring that may differ in side chain composition. includes the amino acid sequence of wild-type cytochrome Heme groups can be the prosthetic groups of cytochromes P450 from Bradyrhizobium japonicum. This wild-type P450 and are found in most oxygen carrier proteins. Exemplary is also designated CYP102A6 and shares 38% amino acid heme domains include that of P450 BM-3 (P450-P), SEQ 15 sequence identity to CYP102A1 (SEQID NO:1). ID NO:3, as well as truncated or mutated versions that retain The cytochromes P450 set forth in SEQ ID NOs: 1-6 are the capability to bind the iron-complexing structure. The closely related to one another and show a high degree of skilled artisan can readily identify the heme domain of a sequence identity. The sequences can be aligned based on the specific protein using methods known in the art. sequence homology. The alignment provided in FIG. 6A-6F As noted above, polypeptides provided herein generally identifies “equivalent positions' in the sequences. An equiva include a “heme domain which includes amino acid residues lent position denotes a position which, on the basis of the 1 to about 455 of the various P450 sequences provided in, for alignment of the sequence of the parent cytochrome P450 in example, SEQID NO:1 through SEQID NO:13 and SEQID question with the “reference’ cytochrome P450 amino acid NO:125. As used herein, the term “heme domain refers to a sequence in question (e.g. 
SEQ ID NO: 1) so as to achieve catalytically functional region of the polypeptide that exhibits 25 juxtapositioning of amino acid residues which are common to monooxygenase/hydroxylase activity. A heme domain is a both, corresponds most closely to a particular position in the redox domain capable of binding an iron-complexing struc reference sequence in question. This process can cause gaps ture Such as a porphyrin. Compounds of iron are typically or insertions to appear in the sequences. In the alignment of complexed in a porphyrin (tetrapyrrole) ring that may differin FIGS. 6A-6F or FIGS. 31A-31C, equivalent positions are side chain composition. Heme groups can be the prosthetic 30 shown lined up vertically with one another. For example, groups of cytochromes and are found in most oxygen carrier position 87 in SEQID NO: 1 is equivalent to position 88 in proteins. Exemplary heme domains include that of P450 SEQID NO: 2 and 3 and position 89 in SEQ ID NO: 4. As BM-3 (e.g., amino acid residues 1 to about 455 of SEQ ID shown in FIG. 6A-6F and more particularly in FIGS. 31A NO:1), as well as truncated or mutated versions of these that 31C, the identity in the heme domain is often higher than retain the capability to bind the iron-complexing structure. 35 reported because of large differences in the linker region The skilled artisan can readily identify the heme domain of a (-residue 470) between the heme and reductase domains (see specific protein using methods known in the art. Accordingly, Gonindaraj & Poulos, J. Biol. Chem. 272:7915). the skilled artisan will recognize that amino acid residues 1 to Provided herein are novel modified cytochrome P450 “about' or “approximately 430 means within an acceptable enzymes capable of hydroxylating various alkanes to alco error range for the particular value as determined by one of 40 hols (e.g., methane to produce methanol). 
Because the heme ordinary skill in the art, which will depend in part on how the domain is capable of this reaction, protein engineering of value is measured or determined, i.e., the limitations of the cytochrome P450s from other sources can be expected to lead measurement system. For example, "about can mean a range to a similar result. It is well known in the art that amino acid of up to 20%, up to 10%, up to 5%, up to 1% of a given value, Substitutions having a particular effect (e.g. that confer activ Such as the number of amino acid residues in a heme domain 45 ity towards a new substrate) can have the same effect in or a reductase domain. closely related proteins. For example, the alignment of these A “protein’ or “polypeptide', which terms are used inter four homologs and 35-E11 shown in FIGS. 6A-6F illustrate changeably herein, comprises one or more chains of chemical the high degree of sequence similarity among the four cyto building blocks called amino acids that are linked together by chromes P450, particularly in the heme domain. It has been chemical bonds called peptide bonds. An "enzyme” means 50 shown on multiple occasions that amino acid Substitutions at any Substance, composed wholly or largely of protein, that equivalent positions in these enzymes have equivalent effects catalyzes or promotes, more or less specifically, one or more on function. For example, the substitution of F87 by A in chemical or biochemical reactions. The term “enzyme” can CYP102A1 increases the peroxygenase activity of the also refer to a catalytic polynucleotide (e.g., RNA or DNA). A enzyme. The same Substitution of the equivalent position in “native' or “wild-type' protein, enzyme, polynucleotide, 55 CYP102A2 and CYP102A3, which is F88A, has the same gene, or cell, means a protein, enzyme, polynucleotide, gene, effect: it increases the peroxygenase. 31 Appel et al. showed or cell that occurs in nature. 
Referring to the sequence comparison of various cytochromes P450 in FIGS. 6A-6F, SEQ ID NO:1 includes the amino acid sequence of cytochrome P450 BM-3 isolated from Bacillus megaterium. This wild-type P450 BM-3 is also designated CYP102A1. SEQ ID NO:2 provides the amino acid sequence of wild-type cytochrome P450 from Bacillus subtilis strain 1A1.15. This wild-type P450 is designated CYP102A2 and shares 59% amino acid sequence identity to CYP102A1 (SEQ ID NO:1). SEQ ID NO:3 includes the amino acid sequence of wild-type cytochrome P450 from

that the substitutions A74G, F87A and L188Q in CYP102A1 together broaden the enzyme's substrate range, so that it hydroxylates octane and naphthalene. Lentz et al. demonstrated that substituting the equivalent positions in CYP102A3 (no substitution was required at A74G because there was a G at this position already), F88 to V and S188 to Q, had a similar effect of conferring the ability to hydroxylate octane and naphthalene. Thus on the basis of this experience, it is reasonable to assume that substitutions equivalent to those described here in CYP102A2 as conferring methane hydroxylation activity will have the same effect in CYP102A2, CYP102A3 as well as other cytochromes P450 that show high identity (55% or greater) to CYP102A such as the one from B. cereus. Additionally, these P450 enzymes can be subjected to rounds of directed evolution using the techniques and screens described herein to obtain and/or increase methane hydroxylation activity. Less closely-related cytochromes P450 with similar domain architecture to BM-3 have also been reported in the literature, including P450s from the unrelated β-proteobacterium Ralstonia metallidurans (CYP102E1) and the gram negative bacterium Bradyrhizobium japonicum (CYP102A6), and could potentially be similarly engineered for methane oxidation (see FIGS. 6A-6F and Table 4 for corresponding mutations in these two P450s).

Accordingly, in various embodiments, isolated or recombinant polypeptides comprising residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:1, 2, 3, 4, 5 or 6, are provided. The polypeptides include up to 50, 25, 10, or 5 conservative amino acid substitutions excluding residues: (a) 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, and 353 of SEQ ID NO:1; (b) 48, 79, 83, 95, 143, 176, 185, 206, 227, 238, 254, 257, 292, 330, and 355 of SEQ ID NO:2 or SEQ ID NO:3; (c) 49, 80, 84, 96, 144, 177, 186, 207, 228, 239, 255, 258, 293, 331, and 357 of SEQ ID NO:4; (d) 54, 85, 89, 101, 149, 182, 191, 212, 232, 243, 259, 262, 297, 336, and 361 of SEQ ID NO:5; and (e) 51, 82, 86, 98, 146, 179, 188, 208, 231, 242, 258, 262, 296, 337, and 363 of SEQ ID NO:6. The amino acid sequence includes the following residues at the following positions:

(1) a Z1 amino acid residue at positions: (a) 47, 82, 142, 205, 236, 252, and 255 of SEQ ID NO:1; (b) 48, 83, 143, 206, 238, 254, and 257 of SEQ ID NO:2 or SEQ ID NO:3; (c) 49, 84, 144, 207, 239, 255, and 258 of SEQ ID NO:4; (d) 54, 89, 149, 212, 243, 259, and 262 of SEQ ID NO:5; and (e) 51, 86, 146, 208, 242, 258, and 262 of SEQ ID NO:6;

(2) a Z2 amino acid residue at positions: (a) 94, 175, 184, 290, and 353 of SEQ ID NO:1; (b) 95, 176, 185, 292, and 355 of SEQ ID NO:2 or SEQ ID NO:3; (c) 96, 177, 186, 293, and 357 of SEQ ID NO:4; (d) 101, 182, 191, 297, and 361 of SEQ ID NO:5; and (e) 98, 179, 188, 296, and 363 of SEQ ID NO:6;

(3) a Z3 amino acid residue at position: (a) 226 of SEQ ID NO:1; (b) 227 of SEQ ID NO:2 or SEQ ID NO:3; (c) 228 of SEQ ID NO:4; (d) 232 of SEQ ID NO:5; and (e) 231 of SEQ ID NO:6; and

(4) a Z4 amino acid residue at positions: (a) 78 and 328 of SEQ ID NO:1; (b) 79 and 330 of SEQ ID NO:2 or SEQ ID NO:3; (c) 80 and 331 of SEQ ID NO:4; (d) 85 and 336 of SEQ ID NO:5; and (e) 82 and 337 of SEQ ID NO:6.

In other embodiments, the polypeptide further includes a Z3 amino acid residue at position: (a) 285 of SEQ ID NO:1; (b) 287 of SEQ ID NO:2 or 3; (c) 288 of SEQ ID NO:4; (d) 292 of SEQ ID NO:5; and (e) 291 of SEQ ID NO:6. A Z3 amino acid residue includes lysine (K), arginine (R), or histidine (H). In some aspects, the amino acid residue at this position is an arginine (R).

In some embodiments, the polypeptide can further comprise, in addition to one or more of the residue replacements above, an isoleucine at position 52, a glutamic acid at position 74, proline at position 188, valine at position 366, an alanine at position 443, a glycine at position 698 of SEQ ID NO:1; isoleucine at position 53, a glutamic acid at position 75, proline at position 189, a valine at position 368, an alanine at position 445, a glycine at position 709 of SEQ ID NO:2; an isoleucine at position 53, a glutamic acid at position 75, proline at position 189, a valine at position 368, an alanine at position 445, a glycine at position 701 of SEQ ID NO:3; an isoleucine at position 55, a glutamic acid at position 77, proline at position 191, a valine at position 371, an alanine at position 448, a glycine at position 813 of SEQ ID NO:4; an isoleucine at position 59, a glutamic acid at position 81, proline at position 195, a valine at position 374, an alanine at position 452, a glycine at position 811 of SEQ ID NO:5; an isoleucine at position 56, a glutamic acid at position 78, proline at position 192, a valine at position 376, an alanine at position 455, a glycine at position 821 of SEQ ID NO:6; and an isoleucine at position 52, a glutamic acid at position 74, proline at position 188, a valine at position 366, an alanine at position 443, a glycine at position 698 of SEQ ID NO:7.

“Conservative amino acid substitution” or, simply, “conservative variations” of a particular sequence refers to the replacement of one amino acid, or series of amino acids, with essentially identical amino acid sequences. One of skill will recognize that individual substitutions, deletions or additions which alter, add or delete a single amino acid or a percentage of amino acids in an encoded sequence result in “conservative variations” where the alterations result in the deletion of an amino acid, addition of an amino acid, or substitution of an amino acid with a chemically similar amino acid.

Conservative substitution tables providing functionally similar amino acids are well known in the art. For example, one conservative substitution group includes Alanine (A), Serine (S), and Threonine (T). Another conservative substitution group includes Aspartic acid (D) and Glutamic acid (E). Another conservative substitution group includes Asparagine (N) and Glutamine (Q). Yet another conservative substitution group includes Arginine (R) and Lysine (K). Another conservative substitution group includes Isoleucine (I), Leucine (L), Methionine (M), and Valine (V). Another conservative substitution group includes Phenylalanine (F), Tyrosine (Y), and Tryptophan (W).

Thus, “conservative amino acid substitutions” of a listed polypeptide sequence (e.g., SEQ ID NOs: 1-6) include substitutions of a percentage, typically less than 10%, of the amino acids of the polypeptide sequence, with a conservatively selected amino acid of the same conservative substitution group. Accordingly, a conservatively substituted variation of a polypeptide of the invention can contain 100, 75, 50, 25, or 10 substitutions with a conservatively substituted variation of the same conservative substitution group.

It is understood that the addition of sequences which do not alter the encoded activity of a nucleic acid molecule, such as the addition of a non-functional or non-coding sequence, is a conservative variation of the basic nucleic acid. The “activity” of an enzyme is a measure of its ability to catalyze a reaction, i.e., to “function”, and may be expressed as the rate at which the product of the reaction is produced. For example, enzyme activity can be represented as the amount of product produced per unit of time or per unit of enzyme (e.g., concentration or weight), or in terms of affinity or dissociation constants. As used interchangeably herein a “cytochrome P450 activity”, “biological activity of cytochrome P450” or “functional activity of cytochrome P450”, refers to an activity exerted by a cytochrome P450 protein, polypeptide or nucleic acid molecule on a cytochrome P450 polypeptide substrate, as determined in vivo, or in vitro, according to standard techniques. The biological activity of cytochrome P450 is described herein.

One of skill in the art will appreciate that many conservative variations of the nucleic acid constructs which are disclosed yield a functionally identical construct. For example, owing to the degeneracy of the genetic code, “silent substitutions” (i.e., substitutions in a nucleic acid sequence which do not result in an alteration in an encoded polypeptide) are an implied feature of every nucleic acid sequence which encodes an amino acid. Similarly, “conservative amino acid substitutions,” in one or a few amino acids in an amino acid sequence are substituted with different amino acids with highly similar properties, are also readily identified as being highly similar to a disclosed construct. Such conservative variations of each disclosed sequence are a feature of the polypeptides provided herein.

It will be appreciated by those skilled in the art that due to the degeneracy of the genetic code, a multitude of nucleotide sequences encoding modified P450 polypeptides of the invention may be produced, some of which bear substantial identity to the nucleic acid sequences explicitly disclosed herein. For instance, codons AGA, AGG, CGA, CGC, CGG, and CGU all encode the amino acid arginine. Thus, at every position in the nucleic acids of the invention where an arginine is specified by a codon, the codon can be altered to any of the corresponding codons described above without altering the encoded polypeptide. It is understood that U in an RNA sequence corresponds to T in a DNA sequence.

“Conservative variants” are proteins or enzymes in which a given amino acid residue has been changed without altering overall conformation and function of the protein or enzyme, including, but not limited to, replacement of an amino acid with one having similar properties, including polar or non-polar character, size, shape and charge. Amino acids other than those indicated as conserved may differ in a protein or enzyme so that the percent protein or amino acid sequence similarity between any two proteins of similar function may vary and can be, for example, at least 30%, at least 50%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98% or at least 99%, as determined according to an alignment scheme. As referred to herein, “sequence similarity” means the extent to which nucleotide or protein sequences are related. The extent of similarity between two sequences can be based on percent sequence identity and/or conservation.

“Sequence identity” herein means the extent to which two nucleotide or amino acid sequences are invariant. “Sequence alignment” means the process of lining up two or more sequences to achieve maximal levels of identity (and, in the case of amino acid sequences, conservation) for the purpose of assessing the degree of similarity. Numerous methods for aligning sequences and assessing similarity/identity are known in the art such as, for example, the Cluster Method, wherein similarity is based on the MEGALIGN algorithm, as well as BLASTN, BLASTP, and FASTA (Lipman and Pearson, 1985; Pearson and Lipman, 1988). When using all of these programs, the preferred settings are those that result in the highest sequence similarity.

Non-conservative modifications of a particular polypeptide are those which substitute any amino acid not characterized as a conservative substitution. For example, any substitution which crosses the bounds of the six groups set forth above. These include substitutions of basic or acidic amino acids for neutral amino acids (e.g., Asp, Glu, Asn, or Gln for Val, Ile, Leu or Met), aromatic amino acid for basic or acidic amino acids (e.g., Phe, Tyr or Trp for Asp, Asn, Glu or Gln) or any other substitution not replacing an amino acid with a like amino acid. Basic side chains include lysine (K), arginine (R), histidine (H); acidic side chains include aspartic acid (D), glutamic acid (E); uncharged polar side chains include glycine (G), asparagine (N), glutamine (Q), serine (S), threonine (T), tyrosine (Y), cysteine (C); nonpolar side chains include alanine (A), valine (V), leucine (L), isoleucine (I), proline (P), phenylalanine (F), methionine (M), tryptophan (W); beta-branched side chains include threonine (T), valine (V), isoleucine (I); aromatic side chains include tyrosine (Y), phenylalanine (F), tryptophan (W), histidine (H).

Accordingly, some amino acid residues at specific positions in a polypeptide are “excluded” from conservative amino acid substitutions. Instead, these restricted amino acids are generally chosen from a “Z” group. “Z” groups are amino acid groups designated Z1, Z2, Z3, Z4, Z5, and Z6. These amino acid residues can be substituted at a designated position to obtain a modified or variant polypeptide. While some overlap may occur, the members of each “Z” group are not “conservative amino acid substitutions” as defined above. The “Z” group members are established based upon the substitutions set forth in Table 4. In general, these mutations represent non-conservative substitutions at the indicated position in the designated sequence. For example, as shown in the first column at the first row, an arginine (R) to cysteine (C) substitution is made in SEQ ID NO:1 at position 47. This substitution is generally not considered a “conservative substitution.” Similar substitutions are made throughout the various sequences at the indicated positions in order to modify the activity of the polypeptide. Such modifications are made in order to generate polypeptides that are active on substrates such as alkanes. Referring again to Table 4, a parent polypeptide (e.g., SEQ ID NO:1, 2, 3, 4, 5 or 6) can be modified to include an amino acid substitution at a particular residue by modifying the nucleic acid sequence that encodes the parent polypeptide (see below for additional information regarding modifying nucleic acid sequences). The region of P450 modified is identified on the left side of Table 4. The specific substitutions are identified throughout Table 4. Accordingly, “Z” groups are identified as follows: a Z1 amino acid residue includes glycine (G), asparagine (N), glutamine (Q), serine (S), threonine (T), tyrosine (Y), or cysteine (C); a Z2 amino acid residue includes alanine (A), valine (V), leucine (L), isoleucine (I), proline (P), or methionine (M); a Z3 amino acid residue includes lysine (K), or arginine (R); a Z4 amino acid residue includes tyrosine (Y), phenylalanine (F), tryptophan (W), or histidine (H); a Z5 amino acid residue includes threonine (T), valine (V), and isoleucine (I); and a Z6 amino acid residue includes aspartic acid (D) and glutamic acid (E).

Accordingly, a polypeptide provided herein can include amino acids that are “restricted” to particular amino acid substitutions. For example, residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, and 353 can be restricted to substitutions set forth in a “Z” group as defined below and throughout the specification. It is understood that not all of the identified restricted residues need be altered in the same polypeptide. In some embodiments, the invention encompasses polypeptides where only about 80%, 85%, 90% or 95% of the restricted amino acid residues are altered in a given polypeptide.

Member residues of each “Z” group can be included at the appropriate position in a designated polypeptide. Accordingly, in other embodiments, the amino acid sequence of the polypeptide includes residues at the following positions:

(1) a glycine (G), glutamine (Q), serine (S), threonine (T), or cysteine (C) amino acid residue at position: (a) 47, 82, 142, 205, 236, 252, and 255 of SEQ ID NO:1; (b) 48, 83, 143, 206, 238, 254, and 257 of SEQ ID NO:2 or SEQ ID NO:3; (c) 49, 84, 144, 207, 239, 255, and 258 of SEQ ID NO:4; (d) 54, 89, 149, 212, 243, 259, and 262 of SEQ ID NO:5; and (e) 51, 86, 146, 208, 242, 258, and 262 of SEQ ID NO:6;

(2) a valine (V) or isoleucine (I) amino acid residue at position: (a) 94, 175, 184, 290, and 353 of SEQ ID NO:1; (b) 95, 176, 185, 292, and 355 of SEQ ID NO:2 or SEQ ID NO:3; (c) 96, 177, 186, 293, and 357 of SEQ ID NO:4; (d) 101, 182, 191, 297, and 361 of SEQ ID NO:5; and (e) 98, 179, 188, 296,
and 363 of SEQID NO:6; US 8,715,988 B2 25 26 (3) an arginine amino acid residue at position: (a) 226 of genome of a recombinant organism, such that it is not asso SEQID NO:1; (b) 227 of SEQID NO:2 or SEQID NO:3; (c) ciated with nucleotide sequences that normally flank the 228 of SEQID NO:4; (d) 232 of SEQID NO:5; and (e) 231 polynucleotide as it is found in nature is a recombinant poly of SEQID NO:6; and nucleotide. A protein expressed in vitro or in vivo from a (4) a phenylalanine (F) or histidine (H)amino acid residue 5 recombinant polynucleotide is an example of a recombinant at position: (a)78 and 328 of SEQID NO:1; (b)79 and 330 of polypeptide. Likewise, a polynucleotide sequence that does SEQID NO:2 or SEQID NO:3; (c) 80 and 331 of SEQ ID not appear in nature, for example a variant of a naturally NO:4; (d) 85 and 336 of SEQID NO:5; and (e) 82 and 337 of occurring gene, is recombinant. For example, an "isolated SEQID NO:6. nucleic acid molecule is one which is separated from other In other embodiments, the amino acid sequence of the 10 polypeptide includes residues at the following positions: nucleic acid molecules which are present in the natural Source (1) a serine (S) residue at position: (a) 82, 142, and 255 of of the nucleic acid. For example, with regards to genomic SEQID NO:1; (b) 83, 143 and 257 of SEQID NO:2 or SEQ DNA, the term "isolated includes nucleic acid molecules ID NO:3; (c) 84, 144, and 258 of SEQID NO:4; (d) 89, 149, which are separated from the chromosome with which the and 262 of SEQ ID NO:5; (e) 86, 146 and 262 of SEQ ID 15 genomic DNA is naturally associated. 
Typically, an “iso NO:6; lated nucleic acid is free of sequences which naturally flank (2) a cysteine (C) amino acid residue at position: (a) 47 and the nucleic acid (i.e., sequences located at the 5' and 3' ends of 205 of SEQID NO:1; (b) 48 and 206 of SEQID NO:2 or SEQ the nucleic acid) in the genomic DNA of the organism from ID NO:3; (c) 49 and 207 of SEQID NO:4; (d) 54 and 212 of which the nucleic acid is derived. For example, in various SEQID NO:5; (e) 51 and 208 of SEQID NO:6; embodiments, the isolated nucleic acid molecule can contain (3) a glutamine (Q) amino acid residue at position: (a) 236 less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of of SEQID NO:1; (b) 238 of SEQID NO:2 or SEQID NO:3: nucleotide sequences which naturally flank the nucleic acid (c) 239 of SEQID NO:4; (d) 243 of SEQID NO:5; (e) 242 of molecule in genomic DNA of the cell from which the nucleic SEQID NO:6; acid is derived. Moreover, an "isolated nucleic acid mol (4) a glycine (G)amino acid residue at position: (a) 252 of 25 ecule, such as a cDNA molecule, can be substantially free of SEQID NO:1; (b) 254 of SEQID NO:2 or SEQID NO:3; (c) other cellular material, or culture medium when produced by 255 of SEQ ID NO:4; (d) 259 of SEQ ID NO:5; (e) 258 of recombinant techniques, or Substantially free of chemical SEQID NO:6; precursors or other chemicals when chemically synthesized. (5) a valine (V) amino acid residue at position: (a) 184,290 It is envisioned that modifying the regions of a P450 and 353 of SEQID NO:1; (b) 185,292, and 355 of SEQID 30 polypeptide associated with the “linker region and “reduc NO:2 or SEQ ID NO:3; (c) 186, 293 and 357 of SEQ ID NO:4; (d) 191,297, and 361 of SEQID NO:5; (e) 188,296, tase' domain. The “linker region generally includes those and 363 of SEQID NO:6; residues from positions about 450 to about 480 of, for (6) an isoleucine (I) amino acid residue at position: (a) 94 example, SEQ ID NO:1-6. 
In addition, the “reductase' and 175 of SEQID NO:1; (b) 95 and 176 of SEQID NO:2 or 35 domain generally includes those residues from positions SEQID NO:3; (c) 96 and 177 of SEQID NO:4; (d) 101 and about 470 to about 1050 of SEQ ID NOs: 1-6. As shown in 182 of SEQID NO:5; (e) 98 and 179 of SEQID NO:6; and Table 4. polypeptides that include amino acid Substitutions at (7) a phenylalanine (F) amino acid residue at position: (a) these residues are provided. These substitutions can be made 78 and 328 of SEQID NO:1; (b)79 and 330 of SEQID NO:2 in conjunction with substitutions in the “heme' domain or or SEQID NO:3; (c) 80 and 331 of SEQID NO:4; (d) 85 and 40 they can be made in isolation from those in the heme domain. 336 of SEQID NO:5; (e) 82 and 337 of SEQID NO:6. Accordingly, in yet another embodiment, an isolated or In some embodiments, the polypeptide can further com recombinant polypeptide that includes residues 456-1048 of prise, in addition to one or more of the residue replacements SEQID NO:1, 456-1059 of SEQID NO:2, 456-1053 of SEQ above, an isoleucine at position 52, a glutamic acid at position ID NO:3, 456-1064 of SEQID NO:4, 456-1063 of SEQ ID 74, proline at position 188, valine at position 366, an alanine 45 NO:5, or 456-1077 of SEQ ID NO:6, is provided. The at position 443, a glycine at position 698 of SEQ ID NO:1; polypeptide includes an amino acid sequence with up to 65. 
isoleucine at position 53, a glutamic acid at position 75, 40, 25, or 10 conservative amino acid substitutions excluding proline at position 189, a valine at position 368 of SEQ ID residues: (a) 464, 631, 645,710 and 968 of SEQID NO:1; (b) NO:2; an isoleucine at position 53, a glutamic acid at position 475, 641, 656, 721 and 980 of SEQID NO:2; (c) 467, 634, 75, proline at position 189, a valine at position 368 of SEQID 50 NO:3: an isoleucine at position 55, a glutamic acid at position 648, 713 and 972 of SEQID NO:3; (d) 477,644,659,724 and 77, proline at position 191, a valine at position 371 of SEQID 983 of SEQID NO:4; (e) 472, 640, 656,723 and 985 of SEQ NO:4; an isoleucine at position 59, a glutamic acid at position ID NO:5; and (f)480,648,664,733 and 997 of SEQID NO:6. 81, proline at position 195, a valine at position 374 of SEQID The amino acid sequence includes the following residues: NO:5; an isoleucine at position 56, a glutamic acid at position 55 (1) a Z1, Z3, Z4, or Z5 amino acid residue at position: (a) 78, proline at position 192, a valine at position 376 of SEQID 464 of SEQ ID NO:1; (b) 475 of SEQ ID NO:2; (c) 467 of NO:6; and an isoleucine at position 52, a glutamic acid at SEQID NO:3; (d) 477 of SEQID NO:4; (e) 472 of SEQ ID position 74, proline at position 188, a valine at position 366 of NO:5; and (f) 480 of SEQID NO:6; SEQID NO:7. (2) a Z1 amino acid residue at position: (a) 631 and 710 of A polynucleotide, polypeptide, or other component is "iso 60 SEQID NO1; (b) 641 and 721 of SEQID NO:2; (c) 634 and lated when it is partially or completely separated from com 713 of SEQID NO:3; (d) 644 and 724 of SEQID NO:4; (e) ponents with which it is normally associated (other proteins, 640 and 723 of SEQID NO:5; and (f) 648 and 733 of SEQID nucleic acids, cells, synthetic reagents, etc.). 
A nucleic acid or NO:6; polypeptide is “recombinant' when it is artificial or engi (3) a Z3 amino acid residue at position: (a) 645 of SEQID neered, or derived from an artificial or engineered protein or 65 NO:1; (b) 656 of SEQID NO:2; (c) 648 of SEQID NO:3; (d) nucleic acid. For example, a polynucleotide that is inserted 659 of SEQID NO:4; (e) 656 of SEQID NO:5; and (f) 664 of into a vector or any other heterologous location, e.g., in a SEQID NO:6; and US 8,715,988 B2 27 28 (4) a Z2 amino acid residue at position: (a) 968 of SEQID ID NO:3; (c) 84, 144, and 258 of SEQID NO:4; (d) 89, 149, NO:1; (b) 980 of SEQID NO:2; (c) 972 of SEQID NO:3; (d) and 262 of SEQ ID NO:5; (e) 86, 146 and 262 of SEQ ID 983 of SEQID NO:4; (e) 985 of SEQID NO:5; and (f) 997 of NO:6; SEQID NO:6. (2) a cysteine (C) amino acid residue at position: (a) 47 and Z1 is an amino acid residue selected from the group con- 5 205 of SEQID NO:1; (b) 48 and 206 of SEQID NO:2 or SEQ sisting of glycine (G), asparagine (N), glutamine (Q), ID NO:3; (c) 49 and 207 of SEQID NO:4; (d) 54 and 212 of serine (S), threonine (T), tyrosine (Y), and cysteine (C); Z2 is SEQID NO:5; (e) 51 and 208 of SEQID NO:6; an amino acid residue selected from the group consisting of (3) a glutamine (Q) amino acid residue at position: (a) 236 alanine (A), Valine (V), leucine (L), isoleucine (I), proline (P), of SEQID NO:1; (b) 238 of SEQID NO:2 or SEQID NO:3: and methionine (M); Z3 is an amino acid residue selected 10 (c) 239 of SEQID NO:4; (d) 243 of SEQID NO:5; (e) 242 of from the group consisting of lysine (K), and arginine (R); Z4 SEQID NO:6; is an amino acid residue selected from the group consisting of (4) a glycine (G)amino acid residue at position: (a) 252 of tyrosine (Y), phenylalanine (F), tryptophan (W), and histi SEQID NO:1; (b) 254 of SEQID NO:2 or SEQID NO:3; (c) dine (H); and is an amino acid residue selected from the group 255 of SEQ ID NO:4; (d) 259 of SEQ ID NO:5; (e) 258 of consisting of threonine (T), Valine (V), 
and isoleucine (I). 15 SEQID NO:6; In other embodiments, the amino acid sequence of the (5) a valine (V) amino acid residue at position: (a) 184,290 polypeptide includes residues at the following positions: and 353 of SEQID NO:1; (b) 185,292, and 355 of SEQID (1) a glycine (G), arginine (R), tyrosine (Y), or threonine NO:2 or SEQ ID NO:3; (c) 186, 293 and 357 of SEQ ID (T) amino acid residue at position: (a) 464 of SEQID NO:1; NO:4; (d) 191,297, and 361 of SEQID NO:5; (e) 188,296, (b) 475 of SEQID NO:2; (c)467 of SEQID NO:3; (d) 477 of 20 and 363 of SEQID NO:6; SEQID NO:4; (e) 472 of SEQID NO:5; and (f) 480 of SEQ (6) an isoleucine (I) amino acid residue at position: (a) 94 ID NO:6; and 175 of SEQID NO:1; (b) 95 and 176 of SEQID NO:2 or (2) an asparagine (N) amino acid residue at position: (a) SEQID NO:3; (c) 96 and 177 of SEQID NO:4; (d) 101 and 631 of SEQ ID NO:1; (b) 641 of SEQ ID NO:2; (c) 634 of 182 of SEQID NO:5; (e) 98 and 179 of SEQID NO:6; and SEQID NO:3; (d) 644 of SEQID NO:4; (e) 640 of SEQ ID 25 (7) a phenylalanine (F) amino acid residue at position: (a) NO:5; and (f) 648 of SEQID NO:6; 78 and 328 of SEQID NO:1; (b)79 and 330 of SEQID NO:2 (3) an arginine (R) amino acid residue at position: (a) 645 or SEQID NO:3; (c) 80 and 331 of SEQID NO:4; (d) 85 and of SEQID NO:1; (b) 656 of SEQID NO:2; (c) 648 of SEQID 336 of SEQID NO:5; (e) 82 and 337 of SEQID NO:6; and NO:3; (d) 659 of SEQID NO:4; (e) 656 of SEQID NO:5; and (8) an arginine (R) amino acid residue at position: (a) 226 (f) 664 of SEQID NO:6; 30 of SEQID NO:1; b) 227 of SEQID NO:2 or SEQID NO:3: (4) a threonine (T) amino acid residue at position: (a) 710 (c) 228 of SEQID NO:4; (d) 232 of SEQID NO:5; (e) 231 of of SEQID NO:1; (b)721 of SEQID NO:2; (c)713 of SEQID SEQID NO:6. 
NO:3; (d) 724 of SEQID NO:4; (e) 723 of SEQID NO:5; and In other embodiments, the polypeptide further includes a (f) 733 of SEQID NO:6; and Z3 amino acid residue at position: (a) 285 of SEQID NO:1; (5) a lysine (L) amino acid residue at position: (a) 968 of 35 (b) 287 of SEQID NO:2 or 3; (c) 288 of SEQID NO:4; (d) SEQID NO:1; (b) 980 of SEQID NO:2; (c) 972 of SEQ ID 292 of SEQ ID NO:5; and (e) 291 of SEQ ID NO:6. A Z3 NO:3:(d)983 of SEQID NO:4; (e)985 of SEQID NO:5; and amino acid residue includes lysine (K), arginine (R), or his (f) 997 of SEQ ID NO:6. tidine (H). In some aspects, the amino acid residue at this The invention envisions the synthesis of chimeric polypep position is an arginine (R). tides that include a heme catalytic domain derived from a first 40 In some embodiments, the polypeptide can further com Source and an electron transfer domain (e.g., a reductase prise, in addition to one or more of the residue replacements domain) derived from a second source. The first source and above, an isoleucine at position 52, a glutamic acid at position the second source may differ by the genus, or the species from 74, proline at position 188, valine at position 366, an alanine which they are derived, or they may be derived from the same at position 443, a glycine at position 698 of SEQ ID NO:1; species as one another but from different organelles or com- 45 isoleucine at position 53, a glutamic acid at position 75, partments in the same species. Generally the electron transfer proline at position 189, a valine at position 368 of SEQ ID domain is a heme reductase domain. 
NO:2; an isoleucine at position 53, a glutamic acid at position Accordingly, in addition to providing variants of full 75, proline at position 189, a valine at position 368 of SEQID length P450 polypeptides, chimeric polypeptides that com NO:3: an isoleucine at position 55, a glutamic acid at position prise: 1) a variantheme domain isolated from a first organism 50 77, proline at position 191, a valine at position 371 of SEQID and modified to include a new activity; and 2) a variant NO:4; an isoleucine at position 59, a glutamic acid at position reductase domain isolated from a second organism and modi 81, proline at position 195, a valine at position 374 of SEQID fied to include a new activity oran activity that a complements NO:5; an isoleucine at position 56, a glutamic acid at position the heme domain, are provided. Methods for engineering a 78, proline at position 192, a valine at position 376 of SEQID chimeric polypeptide of the invention are disclosed in U.S. 55 NO:6; and an isoleucine at position 52, a glutamic acid at Patent Application Publication Number 20050124025, the position 74, proline at position 188, a valine at position 366 of contents of which are incorporated herein by reference. SEQID NO:7. In another embodiment, an isolated or recombinant Accordingly, in some embodiments, a polypeptide pro polypeptide that includes residues 1 to about 455 of the amino vided herein includes amino acid residue substitutions that acid sequence set forth in SEQ ID NO:1, 2, 3, 4, 5 or 6 is 60 correspond to positions in a particular sequence at least 80%, provided. The polypeptide comprises amino acid residues in 85%, 90%, or 95% of the time. 
In other words, the invention the amino acid sequence that correspond to the following encompasses polypeptides that contain the recited amino acid positions, at least 80%, 85%, 90%, or 95% of the amino acid substitutions at 80%, 85%, 90%, or 95% of the recited posi residues in the amino acid sequence at the positions specified tions in a given sequence. The skilled artisan will recognize below are present: 65 that not every Substitution from a group of Substitutions is (1) a serine (S) residue at position: (a) 82, 142, and 255 of necessary to obtain a modified polypeptide that is active on an SEQID NO:1; (b) 83, 143 and 257 of SEQID NO:2 or SEQ alkane Substrate. US 8,715,988 B2 29 30 In another embodiment, an isolated or recombinant substitutions excluding residues 47, 52, 74, 78, 82,94, 142, polypeptide that includes the amino acid sequence set forth in 175, 184, 188, 205, 226, 236, 252, 255, 290, 328, 353, 366, SEQ ID NO:7, is provided. SEQ ID NO:7 (see FIG. 14) 443, 464, 698 and 710. provides an amino acid sequence of a P450 BM-3 variant. In other embodiments, isolated or recombinant polypep This variant is also designated 35-E11. The polypeptide may tides of the invention include: (a) a polypeptide comprising contain up to 75, 50, 25, or 10 conservative amino acid sub the amino acid sequence set forth in SEQ ID NO:7; (b) a stitutions excluding residues 47, 78, 82, 94, 142, 175, 184, polypeptide consisting of the amino acid sequence set forth in 205, 226, 236, 252, 255, 290, 328, 353, 464, and 710. SEQ ID NO:7; (c) a polypeptide comprising an amino acid In yet another embodiment, an isolated or recombinant sequence having at least 60%, 70%, 80%, 90%, or 98% polypeptide that includes the amino acid sequence set forth in 10 sequence identity to the amino acid sequence set forth in SEQ SEQ ID NO:8, is provided. SEQ ID NO:8 (see FIG. 15) ID NO:7 excluding amino acid residues 47, 78, 82,94, 142, provides an amino acid sequence of a P450 BM-3 variant. 
175, 184,205, 226,236,252,255,290,328,353,464, and 710 This variant is also designated 35-E11-E464R. The polypep of SEQID NO:7; or (d) a polypeptide comprising an amino tide may contain up to 75, 50, 25, or 10 conservative amino 15 acid sequence that can be optimally aligned with the sequence acid substitutions excluding residues 47,78, 82,94, 142, 175, of SEQIDNO:7 to generate a similarity score of at least 1830, 184, 205, 226, 236,252, 255, 290, 328, 353, 464, and 710. using the BLOSUM62 matrix, a gap existence penalty of 11, In another embodiment, an isolated or recombinant and a gap extension penalty of 1, excluding amino acid resi polypeptide that includes the amino acid sequence set forth in dues 47,78, 82,94, 142, 175, 184, 205, 226, 236, 252, 255, SEQ ID NO:9, is provided. SEQ ID NO:9 (see FIG. 16) 290, 328, 353,464, and 710 of SEQID NO:7. provides an amino acid sequence of a P450 BM-3 variant. In other embodiments, isolated or recombinant polypep This variant is also designated 35-E11-E464Y. The polypep tides of the invention include: (a) a polypeptide comprising tide may contain up to 75, 50, 25, or 10 conservative amino the amino acid sequence set forth in SEQ ID NO:8; (b) a acid substitutions excluding residues 47,78, 82,94, 142, 175, polypeptide consisting of the amino acid sequence set forth in 184, 205, 226, 236,252, 255, 290, 328, 353, 464, and 710. 25 SEQID NO:8; or (c) a polypeptide comprising an amino acid In another embodiment, an isolated or recombinant sequence having at least 60%, 70%, 80%, 90%, or 98% polypeptide that includes the amino acid sequence set forth in sequence identity to the amino acid sequence set forth in SEQ SEQ ID NO:10, is provided. SEQ ID NO:10 (see FIG. 17) ID NO:8 excluding amino acid residues 47, 78, 82,94, 142, provides an amino acid sequence of a P450 BM-3 variant. 175, 184,205, 226,236,252,255,290,328,353,464, and 710 This variant is also designated 35-E11-E464T. The polypep 30 of SEQID NO:8. 
In other embodiments, isolated or recombinant polypep tide may contain up to 75, 50, 25, or 10 conservative amino tides of the invention include: (a) a polypeptide comprising acid substitutions excluding residues 47,78, 82,94, 142, 175, the amino acid sequence set forth in SEQ ID NO:9; (b) a 184, 205, 226, 236,252, 255, 290, 328, 353, 464, and 710. polypeptide consisting of the amino acid sequence set forth in In yet another embodiment, the polypeptides of SEQ ID 35 SEQID NO:9; or (c) a polypeptide comprising an amino acid NO:7, 8, 9 or 10, further optionally include an arginine (R) at sequence having at least 60%, 70%, 80%, 90%, or 98% amino acid residue position 285. sequence identity to the amino acid sequence set forth in SEQ In another embodiment, an isolated or recombinant ID NO:9 excluding amino acid residues 47, 78, 82,94, 142, polypeptide that includes the amino acid sequence set forth in 175, 184,205, 226,236,252,255,290,328,353,464, and 710 SEQ ID NO:11, is provided. SEQ ID NO:11 (see FIG. 18) 40 of SEQ ID NO:9. provides an amino acid sequence of a P450 BM-3 variant. In other embodiments, isolated or recombinant polypep This variant is also designated 20-D3. The polypeptide may tides of the invention include: (a) a polypeptide comprising contain up to 75, 50, 25, or 10 conservative amino acid sub the amino acid sequence set forth in SEQ ID NO:10; (b) a stitutions excluding residues 47, 78, 82, 94, 142, 175, 184, polypeptide consisting of the amino acid sequence set forth in 205, 226, 236, 252, 255, 285,290, 328, 353,464, and 710. 45 SEQ ID NO:10; or (c) a polypeptide comprising an amino In another embodiment, an isolated or recombinant acid sequence having at least 60%, 70%, 80%, 90%, or 98% polypeptide that includes the amino acid sequence set forth in sequence identity to the amino acid sequence set forth in SEQ SEQ ID NO:12, is provided. SEQ ID NO:12 (see FIG. 
19) ID NO:10 excluding amino acid residues 47, 78, 82,94, 142, provides an amino acid sequence of a P450 BM-3 variant. 175, 184,205, 226,236,252,255,290,328,353,464, and 710 This variant is also designated 23-1D. The polypeptide may 50 of SEQID NO:10. contain up to 75, 50, 25, or 10 conservative amino acid sub In other embodiments, isolated or recombinant polypep stitutions excluding residues 47, 78, 82, 94, 142, 175, 184, tides of the invention include: (a) a polypeptide comprising 205, 226, 236, 252, 255, 290, 328, 353, 464, 645, and 710. the amino acid sequence set forth in SEQ ID NO:11; (b) a In another embodiment, an isolated or recombinant polypeptide consisting of the amino acid sequence set forth in polypeptide that includes the amino acid sequence set forth in 55 SEQ ID NO:11; or (c) a polypeptide comprising an amino SEQ ID NO:13, is provided. SEQ ID NO:13 (see FIG. 20) acid sequence having at least 60%, 70%, 80%, 90%, or 98% provides an amino acid sequence of a P450 BM-3 variant. sequence identity to the amino acid sequence set forth in SEQ This variant is also designated 21-4G. The polypeptide may ID NO:11 excluding amino acid residues 47, 78, 82,94, 142, contain up to 75, 50, 25, or 10 conservative amino acid sub 175, 184, 205, 226, 236, 252, 255, 285,290, 328, 353, 464, stitutions excluding residues 47, 78, 82, 94, 142, 175, 184, 60 and 710 of SEQID NO:11. 205, 226, 236, 252, 255, 290, 328, 353, 464, 631, and 710. In other embodiments, isolated or recombinant polypep In another embodiment, an isolated or recombinant tides of the invention include: (a) a polypeptide comprising polypeptide that includes the amino acid sequence set forth in the amino acid sequence set forth in SEQ ID NO:12; (b) a SEQ ID NO:125, is provided. SEQ ID NO:125 (see FIG. polypeptide consisting of the amino acid sequence set forth in 21B) provides an amino acid sequence of a P450 BM-3 vari 65 SEQ ID NO:12; or (c) a polypeptide comprising an amino ant. 
This variant is also designated P450. The polypeptide may contain up to 75, 50, 25, or 10 conservative amino acid

acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:12 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, 645, and 710 of SEQ ID NO:12.

In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising the amino acid sequence set forth in SEQ ID NO:13; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:13; or (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:13 excluding amino acid residues 47, 78, 82, 94, 142, 175, 184, 205, 226, 236, 252, 255, 290, 328, 353, 464, 631, and 710 of SEQ ID NO:13.

In other embodiments, isolated or recombinant polypeptides of the invention include: (a) a polypeptide comprising

additional empty amino acid position inserted into an already opened gap. The alignment is defined by the amino acid positions of each sequence at which the alignment begins and ends, and optionally by the insertion of a gap or multiple gaps in one or both sequences so as to arrive at the highest possible score. While optimal alignment and scoring can be accomplished manually, the process is facilitated by the use of a computer-implemented alignment algorithm, e.g., gapped BLAST 2.0, described in Altschul et al. (1997) Nucl. Acids Res. 25:3389-3402 (incorporated by reference herein), and made available to the public at the National Center for Biotechnology Information (NCBI) website (www.ncbi.nlm.nih.gov). Optimal alignments, including multiple alignments, can be prepared using, e.g., PSI-BLAST, available through the NCBI website and described by Altschul et al.
(1997) Nucl. Acids Res. 25:3389-3402 (incorporated by reference herein).

With respect to an amino acid sequence that is optimally aligned with a reference sequence, an amino acid residue “corresponds to” the position in the reference sequence with which the residue is paired in the alignment. The “position” is denoted by a number that sequentially identifies each amino acid in the reference sequence based on its position relative to the N-terminus. For example, in SEQ ID NO:1, position 1 is T, position 2 is I, position 3 is K, etc. When a test sequence is optimally aligned with SEQ ID NO:1, a residue in the test sequence that aligns with the K at position 3 is said to “correspond to” position 3 of SEQ ID NO:1. Owing to deletions,

the amino acid sequence set forth in SEQ ID NO:125; (b) a polypeptide consisting of the amino acid sequence set forth in SEQ ID NO:125; or (c) a polypeptide comprising an amino acid sequence having at least 60%, 70%, 80%, 90%, or 98% sequence identity to the amino acid sequence set forth in SEQ ID NO:125 excluding amino acid residues 47, 52, 74, 78, 82, 94, 142, 175, 184, 188, 205, 226, 236, 252, 255, 290, 328, 353, 366, 443, 464, 698 and 710.

“Sequence identity” herein means the extent to which two nucleotide or amino acid sequences are invariant. “Sequence alignment” means the process of lining up two or more sequences to achieve maximal levels of identity (and, in the case of amino acid sequences, conservation) for the purpose of assessing the degree of similarity.
Numerous methods for insertion, truncations, fusions, etc., that must be taken into aligning sequences and assessing similarity/identity are 30 account when determining an optimal alignment, in general known in the art such as, for example, the Cluster Method, the amino acid residue number in a test sequence as deter wherein similarity is based on the MEGALIGNalgorithm, as mined by simply counting from the N-terminal will not nec well as BLASTN, BLASTP, and FASTA (Lipman and Pear essarily be the same as the number of its corresponding posi son, 1985; Pearson and Lipman, 1988). When using all of tion in the reference sequence. For example, in a case where these programs, the preferred settings are those that results in 35 there is a deletion in an aligned test sequence, there will be no the highest sequence similarity. For example, the “identity” or amino acid that corresponds to a position in the reference "percent identity” with respect to a particular pair of aligned sequence at the site of deletion. Where there is an insertion in amino acid sequences can refer to the percent amino acid an aligned reference sequence, that insertion will not corre sequence identity that is obtained by ClustalW analysis (ver spond to any amino acid position in the reference sequence. In sion W 1.8 available from European Bioinformatics Institute, 40 the case of truncations or fusions there can be stretches of Cambridge, UK), counting the number of identical matches amino acids in either the reference oraligned sequence that do in the alignment and dividing such number of identical not correspond to any amino acid in the corresponding matches by the greater of (i) the length of the aligned Sequence. sequences, and (ii) 96, and using the following default Clust In other embodiments, isolated nucleic acid molecules are alW parameters to achieve slow/accurate pairwise align 45 provided. 
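By way of illustration only, the identity calculation and the notion of a residue “corresponding to” a reference position can be sketched in Python for a pair of sequences that has already been aligned (gaps written as “-”). The sketch assumes that “the length of the aligned sequences” means the number of alignment columns; the function names and the toy sequences are hypothetical and are not taken from the disclosure.

```python
GAP = "-"
MIN_DENOMINATOR = 96  # floor on the divisor, per the ClustalW-based definition


def percent_identity(aligned_ref, aligned_test):
    """Identical matches divided by the greater of (i) alignment length and (ii) 96.

    Assumption: alignment length is taken as the number of alignment columns.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for r, t in zip(aligned_ref, aligned_test) if r == t and r != GAP
    )
    return 100.0 * matches / max(len(aligned_ref), MIN_DENOMINATOR)


def corresponding_positions(aligned_ref, aligned_test):
    """Map 1-based test residue numbers to the 1-based reference positions they pair with.

    Test residues paired with a gap in the reference have no entry, and
    reference positions paired with a gap in the test sequence are never mapped.
    """
    mapping = {}
    ref_pos = 0
    test_pos = 0
    for r, t in zip(aligned_ref, aligned_test):
        if r != GAP:
            ref_pos += 1
        if t != GAP:
            test_pos += 1
        if r != GAP and t != GAP:
            mapping[test_pos] = ref_pos
    return mapping


# Toy example: the test sequence lacks the residue at reference position 2,
# so its second residue (K) corresponds to reference position 3.
print(corresponding_positions("TIKEM", "T-KEM"))
```

In this toy case the residue number obtained by counting from the N-terminus of the test sequence (2) differs from the number of the corresponding reference position (3), which is the situation the text describes for deletions.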
In one aspect, the invention provides a novel family ments—Gap Open Penalty: 10; Gap Extension Penalty: 0.10; of isolated or recombinant polynucleotides referred to herein Protein weight matrix: Gonnet series; DNA weight matrix: as “P450 polynucleotides” or “P450 nucleic acid molecules.” IUB; Toggle Slow/Fast pairwise alignments=SLOW or P450 polynucleotide sequences are characterized by the abil FULL Alignment. ity to encode a P450 polypeptide. In general, the invention Two sequences are "optimally aligned when they are 50 includes any nucleotide sequence that encodes any of the aligned for similarity scoring using a defined amino acid novel P450 polypeptides described herein. In some aspects of substitution matrix (e.g., BLOSUM62), gap existence pen the invention, a P450 polynucleotide that encodes a P450 alty and gap extension penalty so as to arrive at the highest variant polypeptide with activity on alkanes is provided. The score possible for that pair of sequences. Amino acid Substi terms “polynucleotide.” “nucleotide sequence, and “nucleic tution matrices and their use in quantifying the similarity 55 acid molecule' are used to refer to a polymer of nucleotides between two sequences are well-known in the art and (A, C, T, U.G., etc. or naturally occurring or artificial nucle described, e.g., in Dayhoff et al. (1978) “A model of evolu otide analogues), e.g., DNA or RNA, or a representation tionary change in proteins’ in Atlas of Protein Sequence and thereof, e.g., a character string, etc., depending on the relevant Structure.” Vol. 5, Suppl. 3 (ed. M. O. Dayhoff), pp. 345-352. context. A given polynucleotide or complementary poly Natl. Biomed. Res. Found. Washington, D.C. and Henikoffet 60 nucleotide can be determined from any specified nucleotide al. (1992) Proc. Natl. Acad. Sci. USA89: 10915-10919 (each Sequence. of which is incorporated by reference). 
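The similarity score of a given gapped alignment can be sketched as follows. This is a minimal illustration under stated assumptions, not an alignment program: it scores one alignment rather than searching for the optimal one, the substitution table holds only a few pairs whose values are intended to match BLOSUM62 (the full matrix of FIG. 10 would be needed in practice), and the penalties of 11 and 1 are the gap existence and gap extension values recited elsewhere in this description. The existence penalty is charged for the first position of a gap and the extension penalty for each additional position, following the wording of the text. All names are hypothetical.

```python
# Excerpt only: a handful of substitution scores intended to match BLOSUM62.
SUBSTITUTION = {
    ("T", "T"): 5, ("I", "I"): 4, ("K", "K"): 5,
    ("I", "V"): 3, ("K", "R"): 2,
}
GAP_EXISTENCE = 11  # charged for the first position of a gap
GAP_EXTENSION = 1   # charged for each additional position in an opened gap


def pair_score(a, b):
    """Look up a symmetric substitution score; raises KeyError for pairs not in the excerpt."""
    if (a, b) in SUBSTITUTION:
        return SUBSTITUTION[(a, b)]
    return SUBSTITUTION[(b, a)]


def similarity_score(aligned_a, aligned_b):
    """Score one gapped alignment with a substitution matrix and affine gap penalties."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    open_gap_side = None  # "a" or "b" while inside a gap, else None
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            score -= GAP_EXTENSION if open_gap_side == side else GAP_EXISTENCE
            open_gap_side = side
        else:
            score += pair_score(a, b)
            open_gap_side = None
    return score


print(similarity_score("TIK", "TVK"))    # 5 + 3 + 5
print(similarity_score("TIKK", "T--K"))  # 5 - 11 - 1 + 5
```

Alignment programs differ in whether the first gapped position also incurs the extension penalty, so a score reported by a particular tool may differ from this sketch by a constant amount per gap.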
The BLOSUM62 In one aspect, the P450 polynucleotides comprise recom matrix (FIG.10) is often used as a default scoring substitution binant or isolated forms of naturally occurring nucleic acids matrix in sequence alignment protocols such as Gapped isolated from an organism, e.g., a bacterial strain. Exemplary BLAST 2.0. The gap existence penalty is imposed for the 65 P450 polynucleotides include those that encode the wild-type introduction of a single amino acid gap in one of the aligned polypeptides set forth in SEQ ID NO: 1, 2, 3, 4, 5, or 6. In sequences, and the gap extension penalty is imposed for each another aspect of the invention, P450 polynucleotides are US 8,715,988 B2 33 34 produced by diversifying, e.g., recombining and/or mutating ray et al. (1989) Nucl. Acids Res. 17:477-508; incorporated one or more naturally occurring, isolated, or recombinant by reference herein) can be prepared, for example, to increase P450 polynucleotides. As described in more detail elsewhere the rate of translation or to produce recombinant RNA tran herein, it is often possible to generate diversified P450 poly Scripts having desirable properties, such as a longer half-life, nucleotides encoding P450 polypeptides with superior func as compared with transcripts produced from a non-optimized tional attributes, e.g., increased catalytic function, increased sequence. Translation stop codons can also be modified to stability, or higher expression level, than a P450 polynucle reflect host preference. For example, preferred stop codons otide used as a Substrate or parent in the diversification pro for S. cerevisiae and mammals are UAA and UGA, respec cess. Exemplary polynucleotides include those that encode tively. The preferred stop codon for monocotyledonous plants the P450 variant polypeptides set forth in SEQID NO: 7,8,9, 10 is UGA, whereas insects and E. coli prefer to use UAA as the 10, 11, 12, 13 or 125. stop codon (Dalphin et al. (1996) Nucl. Acids Res. 
24: 216 The polynucleotides of the invention have a variety of uses 218; incorporated by reference herein). Methodology for in, for example recombinant production (i.e., expression) of optimizing a nucleotide sequence for expression in a plant is the P450 polypeptides of the invention and as substrates for provided, for example, in U.S. Pat. No. 6,015,891, and the further diversity generation, e.g., recombination reactions or 15 references cited therein (incorporated herein by reference). mutation reactions to produce new and/or improved P450 Accordingly, in some embodiments, nucleic acid mol homologues, and the like. ecules of the invention include: (a) a nucleic acid molecule It is important to note that certain specific, Substantial and which encodes a polypeptide comprising the amino acid credible utilities of P450 polynucleotides do not require that sequence set forthin SEQID NO:7,8,9, 10, 11, 12, 13 or 125; the polynucleotide encode a polypeptide with Substantial (b) a nucleic acid molecule which encodes a polypeptide P450 activity or even variant P450 activity. 
For example, P450 polynucleotides that do not encode active enzymes can be valuable sources of parental polynucleotides for use in diversification procedures to arrive at P450 polynucleotide variants, or non-P450 polynucleotides, with desirable functional properties (e.g., high kcat or kcat/Km, low Km, high stability towards heat or other environmental factors, high transcription or translation rates, resistance to proteolytic cleavage, etc.).

P450 polynucleotides, including nucleotide sequences that encode P450 polypeptides and variants thereof, fragments of P450 polypeptides, related fusion proteins, or functional equivalents thereof, are used in recombinant DNA molecules that direct the expression of the P450 polypeptides in appropriate host cells, such as bacterial cells.

consisting of the amino acid sequence set forth in SEQ ID NO:7, 8, 9, 10, 11, 12, 13 or 125; (c) a nucleic acid molecule which encodes a polypeptide comprising residues 1 to about 455 of the amino acid sequence set forth in SEQ ID NO:7 or 11; (d) a nucleic acid molecule which encodes a polypeptide comprising residues about 456 to about 1088 of the amino acid sequence set forth in SEQ ID NO:7, 8, 9, 10, 12, 13 or 125; (e) a nucleic acid molecule comprising the nucleotide sequence set forth in SEQ ID NO:15; (f) a nucleic acid molecule consisting of the nucleotide sequence set forth in SEQ ID NO:15; (g) a nucleic acid molecule comprising the nucleotide sequence set forth in SEQ ID NO:17; (h) a nucleic acid molecule consisting of the nucleotide sequence set forth in SEQ ID NO:17.
Due to the inherent 35 In other embodiments, nucleic acid molecules of the inven degeneracy of the genetic code, other nucleic acid sequences tion include: (a) a nucleic acid molecule which encodes a which encode Substantially the same or a functionally equiva polypeptide comprising residues 1 to about 455 of the amino lent amino acid sequence can also be used to clone and acid sequence set forth in SEQID NO:2 or 3 with the follow express the P450 polynucleotides. The term “host cell’, as ing amino acid residues: 48 and 206 are cysteine (C); 79 and used herein, includes any cell type which is Susceptible to 40 330 are phenylalanine (F): 83, 143, and 257 are serine (S);95 transformation with a nucleic acid construct. The term “trans and 176 are isoleucine (I); 185,292, and 355 are valine (V); formation” means the introduction of a foreign (i.e., extrinsic 227 is arginine (R); 238 is glutamine (Q); and 254 is glycine or extracellular) gene, DNA or RNA sequence to a host cell, (G); (b) a nucleic acid molecule which encodes a polypeptide so that the host cell will express the introduced gene or consisting of residues 1 to about 455 of the amino acid sequence to produce a desired substance, typically a protein 45 sequence set forth in SEQ ID NO:2 or 3 with the following or enzyme coded by the introduced gene or sequence. The amino acid residues: 48 and 206 are cysteine (C); 79 and 330 introduced gene or sequence may include regulatory or con are phenylalanine (F): 83, 143, and 257 are serine (S);95 and trol sequences, such as start, stop, promoter, signal, secretion, 176 are isoleucine (I); 185,292, and 355 are valine (V): 227 or other sequences used by the genetic machinery of the cell. 
is arginine (R); 238 is glutamine (Q); and 254 is glycine (G); A host cell that receives and expresses introduced DNA or 50 (c) a nucleic acid molecule which encodes a polypeptide RNA has been “transformed and is a “transformant’ or a comprising residues 1 to about 455 of the amino acid "clone.” The DNA or RNA introduced to a host cell can come sequence set forth in SEQID NO:4 with the following amino from any source, including cells of the same genus or species acid residues: 49 and 207 are cysteine (C); 80 and 331 are as the host cell, or cells of a different genus or species. phenylalanine (F);84, 144, and 258 are serine (S); 96 and 177 As will be understood by those of skill in the art, it can be 55 are isoleucine (I); 186, 293, and 357 are valine (V): 228 is advantageous to modify a coding sequence to enhance its arginine (R): 239 is glutamine (Q); and 255 is glycine (G); (d) expression in a particular host. The genetic code is redundant a nucleic acid molecule which encodes a polypeptide consist with 64 possible codons, but most organisms preferentially ing of residues 1 to about 455 of the amino acid sequence set use a subset of these codons. The codons that are utilized most forth in SEQID NO:4 with the following amino acid residues: often in a species are called optimal codons, and those not 60 49 and 207 are cysteine (C); 80 and 331 are phenylalanine (F); utilized very often are classified as rare or low-usage codons 84, 144, and 258 are serine (S); 96 and 177 are isoleucine (I); (see, e.g., Zhang et al. (1991) Gene 105:61-72; incorporated 186, 293, and 357 are valine (V): 228 is arginine (R): 239 is by reference herein). 
Codons can be substituted to reflect the glutamine (Q); and 255 is glycine (G); (e) a nucleic acid preferred codon usage of the host, a process Sometimes called molecule which encodes a polypeptide comprising residues 1 “codon optimization” or “controlling for species codon bias.” 65 to about 455 of the amino acid sequence set forth in SEQID Optimized coding sequences containing codons preferred NO:5 with the following amino acid residues: 54 and 212 are by a particular prokaryotic or eukaryotic host (see also, Mur cysteine (C); 85 and 336 are phenylalanine (F); 89, 149, and US 8,715,988 B2 35 36 262 are serine (S); 101 and 182 are isoleucine (I); 191, 297, (N): 724 is threonine (T); and 983 is leucine; (f) a nucleic acid and 361 are valine (V): 232 is arginine (R): 243 is glutamine molecule which encodes a polypeptide consisting of the (Q); and 259 is glycine (G), (f) a nucleic acid molecule which amino acid sequence set forth in SEQ ID NO:4 with the encodes a polypeptide consisting of residues 1 to about 455 of following amino acid residues: 49 and 207 are cysteine (C); the amino acid sequence set forth in SEQID NO:5 with the 5 80 and 331 are phenylalanine (F): 84, 144, and 258 are following amino acid residues: 54 and 212 are cysteine (C); serine (S);96 and 177 are isoleucine (I); 186,293, and 357 are 85 and 336 are phenylalanine (F); 89, 149, and 262 are valine (V); 228, 288, and 659 are arginine (R): 239 is serine (S); 101 and 182 are isoleucine (I); 191, 297, and 361 glutamine (Q); and 255 is glycine (G); 477 is glycine (G), are valine (V); 232 is arginine (R): 243 is glutamine (Q); and arginine (R), tyrosine (Y), or threonine (T), 644 is asparagine 259 is glycine (G); (g) a nucleic acid molecule which encodes 10 a polypeptide comprising residues 1 to about 455 of the amino (N): 724 is threonine (T); and 983 is leucine; (g) a nucleic acid acid sequence set forth in SEQ ID NO:6 with the following molecule which encodes a polypeptide comprising the amino amino 
acid residues: 51 and 208 are cysteine (C); 82 and 337 acid sequence set forth in SEQ ID NO:5 with the following are phenylalanine (F): 86, 146, and 262 are serine (S); 98 and amino acid residues: 54 and 212 are cysteine (C); 85 and 336 179 are isoleucine (I); 188,296, and 363 are valine (V): 231 15 are phenylalanine (F);89, 149, and 262 are serine (S); 101 and is arginine (R): 243 is glutamine (Q); and 258 is glycine (G); 182 are isoleucine (I); 191,297, and 361 are valine (V): 232, and (h) a nucleic acid molecule which encodes a polypeptide 292, and 656 are arginine (R): 243 is glutamine (Q); and 259 consisting of residues 1 to about 455 of the amino acid is glycine (G); 472 is glycine (G), arginine (R), tyrosine (Y), sequence set forth in SEQID NO:6 with the following amino or threonine (T); 640 is asparagine (N): 723 is threonine (T): acid residues: 51 and 208 are cysteine (C); 82 and 337 are and 985 is leucine; (h) a nucleic acid molecule which encodes phenylalanine (F): 86, 146, and 262 are serine (S);98 and 179 a polypeptide consisting of the amino acid sequence set forth are isoleucine (I); 188, 296, and 363 are valine (V): 231 is in SEQID NO:5 with the following amino acid residues: 54 arginine (R): 243 is glutamine (Q); and 258 is glycine (G). 
and 212 are cysteine (C); 85 and 336 are phenylalanine (F); In other embodiments, nucleic acid molecules of the inven 89, 149, and 262 are serine (S); 101 and 182 are isoleucine (I); tion include: (a) a nucleic acid molecule which encodes a 25 191, 297, and 361 are valine (V): 232, 292, and 656 are polypeptide comprising the amino acid sequence set forth in arginine (R): 243 is glutamine (Q); and 259 is glycine (G); SEQID NO:2 with the following amino acid residues: 48 and 472 is glycine (G), arginine (R), tyrosine (Y), or threonine 206 are cysteine (C); 79 and 330 are phenylalanine (F): 83, (T); 640 is asparagine (N): 723 is threonine (T); and 985 is 143, and 257 are serine (S);95 and 176 are isoleucine (I); 185, leucine: (i) a nucleic acid molecule which encodes a polypep 292, and 355 are valine (V); 227, 287, and 656 are arginine 30 tide comprising the amino acid sequence set forth in SEQID (R); 238 is glutamine (Q); 254 is glycine (G); 475 is glycine NO:6 with the following amino acid residues: 51 and 208 are (G), arginine (R), tyrosine (Y), or threonine (T): 641 is aspar cysteine (C); 82 and 337 are phenylalanine (F): 86, 146, and agine (N): 721 is threonine (T); and 980 is leucine; (b) a 262 are serine (S);98 and 179 are isoleucine (I); 188,296, and nucleic acid molecule which encodes a polypeptide consist 363 are valine (V): 231, 291, and 664 are arginine (R): 243 is ing of the amino acid sequence set forth in SEQID NO:2 with 35 glutamine (Q); and 258 is glycine (G), 480 is glycine (G), the following amino acid residues: 48 and 206 are cysteine arginine (R), tyrosine (Y), or threonine (T), 648 is asparagine (C); 79 and 330 are phenylalanine (F): 83, 143, and 257 are (N): 733 is threonine (T); and 997 is leucine; and () a nucleic serine (S);95 and 176 are isoleucine (I); 185,292, and 355 are acid molecule which encodes a polypeptide consisting of the valine (V); 227, 287, and 656 are arginine (R): 238 is amino acid sequence set forth in SEQ ID NO:6 with the 
glutamine (Q); 254 is glycine (G); 475 is glycine (G), arginine 40 following amino acid residues: 51 and 208 are cysteine (C); (R), tyrosine (Y), or threonine (T): 641 is asparagine (N): 721 82 and 337 are phenylalanine (F): 86, 146, and 262 are is threonine (T); and 980 is leucine; (c) a nucleic acid mol serine (S);98 and 179 are isoleucine (I); 188,296, and 363 are ecule which encodes a polypeptide comprising the amino valine (V): 231, 291, and 664 are arginine (R): 243 is acid sequence set forth in SEQ ID NO:3 with the following glutamine (Q); and 258 is glycine (G), 480 is glycine (G), amino acid residues: 48 and 206 are cysteine (C); 79 and 330 45 arginine (R), tyrosine (Y), or threonine (T), 648 is asparagine are phenylalanine (F): 83, 143, and 257 are serine (S);95 and (N): 733 is threonine (T); and 997 is leucine. 176 are isoleucine (I); 185,292, and 355 are valine (V): 227, In one embodiment, an isolated nucleic acid molecule that 287, and 648 are arginine (R): 238 is glutamine (Q); 254 is includes a nucleic acid molecule of the invention and a nucle glycine (G), 467 is glycine (G), arginine (R), tyrosine (Y), or otide sequence encoding a heterologous polypeptide, is pro threonine (T): 634 is asparagine (N); 713 is threonine (T); and 50 vided. 972 is leucine; (d) a nucleic acid molecule which encodes a “Silent variations are one species of “conservatively polypeptide consisting of the amino acid sequence set forth in modified variations.” One of skill will recognize that each SEQID NO:3 with the following amino acid residues: 48 and codon in a nucleic acid (except AUG, which is ordinarily the 206 are cysteine (C); 79 and 330 are phenylalanine (F): 83, only codon for methionine) can be modified by standard 143, and 257 are serine (S);95 and 176 are isoleucine (I); 185, 55 techniques to encode a functionally identical polypeptide. 
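The number of distinct nucleic acid sequences that encode one and the same polypeptide follows directly from the degeneracy of the standard genetic code: it is the product, over all residues, of the number of codons available for each residue. The Python sketch below makes this concrete; the function name and the example peptides are illustrative assumptions, and stop codons are ignored.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons in total).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_silent_variants(peptide):
    """Count the coding sequences (stop codon excluded) that encode the given peptide."""
    return prod(CODON_DEGENERACY[residue] for residue in peptide)


print(count_silent_variants("TIK"))  # 4 * 3 * 2
print(count_silent_variants("MW"))   # methionine and tryptophan each have a single codon
```

Methionine contributes a factor of one, consistent with the remark that AUG is ordinarily the only codon for methionine; for a polypeptide of roughly a thousand residues the product becomes astronomically large, which is why the variations are described implicitly rather than listed.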
292, and 355 are valine (V); 227, 287, and 648 are arginine Accordingly, each silent variation of a nucleic acid which (R); 238 is glutamine (Q); 254 is glycine (G); 467 is glycine encodes a polypeptide is implicit in any described sequence. (G), arginine (R), tyrosine (Y), or threonine (T): 634 is aspar The invention provides each and every possible variation of agine (N); 713 is threonine (T); and 972 is leucine; (e) a nucleic acid sequence encoding a polypeptide of the inven nucleic acid molecule which encodes a polypeptide compris 60 tion that could be made by selecting combinations based on ing the amino acid sequence set forth in SEQID NO:4 with possible codon choices. These combinations are made in the following amino acid residues: 49 and 207 are cysteine accordance with the standard triplet genetic code as applied to (C); 80 and 331 are phenylalanine (F); 84, 144, and 258 are the nucleic acid sequence encoding a P450 homologue serine (S);96 and 177 are isoleucine (I); 186,293, and 357 are polypeptide of the invention. All such variations of every valine (V); 228, 288, and 659 are arginine (R): 239 is 65 nucleic acid herein are specifically provided and described by glutamine (Q); and 255 is glycine (G); 477 is glycine (G), consideration of the sequence in combination with the genetic arginine (R), tyrosine (Y), or threonine (T), 644 is asparagine code. Any variant can be produced as noted herein. US 8,715,988 B2 37 38 In general, the invention includes any polypeptide encoded Accordingly, in another embodiment, an isolated nucleic by a modified P450 polynucleotide derived by mutation, acid molecule of the invention hybridizes under stringent recursive sequence recombination, and/or diversification of conditions to a nucleic acid molecule comprising the nucle the polynucleotide sequences described herein. In some otide sequence encoding a polypeptide set forth in any of SEQ aspects of the invention, a P450 polypeptide. 
Such as a heme domain and/or a reductase domain, is modified by single or multiple amino acid substitutions, a deletion, an insertion, or a combination of one or more of these types of modifications. Substitutions can be conservative or non-conservative, can alter function or not, and can add new function. Insertions and deletions can be substantial, such as the case of a truncation of a substantial fragment of the sequence, or in the fusion of additional sequence, either internally or at N or C terminal.

One aspect of the invention pertains to isolated nucleic acid molecules that encode modified P450 polypeptides or biologically active portions thereof. As used herein, the term “nucleic acid molecule” is intended to include DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be

NOs: 1-13 and 125, or having the nucleotide sequence set forth in any of SEQ ID NOs: 15-18. In other embodiments, the nucleic acid is at least 30, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, or 600 nucleotides in length. Nucleic acid molecules are “hybridizable” to each other when at least one strand of one polynucleotide can anneal to another polynucleotide under defined stringency conditions. Stringency of hybridization is determined, e.g., by (a) the temperature at which hybridization and/or washing is performed, and (b) the ionic strength and polarity (e.g., formamide) of the hybridization and washing solutions, as well as other parameters. Hybridization requires that the two polynucleotides contain substantially complementary sequences; depending on the stringency of hybridization, however, mismatches may be tolerated. Typically, hybridization of two sequences at high stringency (such as, for example, in an aqueous solution of 0.5xSSC at 65° C.)
requires that the sequences exhibit some single-stranded or double-stranded, but preferably is double high degree of complementarity over their entire sequence. stranded DNA. Conditions of intermediate stringency (such as, for example, A nucleic acid molecule of the present invention, e.g., a an aqueous solution of 2xSSC at 65°C.) and low stringency nucleic acid molecule that encodes a polypeptide set forth in 25 (such as, for example, an aqueous solution of 2xSSC at 55° any of SEQ NOs: 1-13 and 125, or having the nucleotide C.), require correspondingly less overall complementarity sequence of set forth in any of SEQ ID NOs: 15-18, or a between the hybridizing sequences (1xSSC is 0.15 MNaCl, portion thereof, can be isolated using standard molecular 0.015 M Na citrate). Nucleic acid molecules that hybridize biology techniques and the sequence information provided include those which anneal under Suitable stringency condi herein. 30 tions and which encode polypeptides or enzymes having the A nucleic acid of the invention can be amplified using same function, such as the ability to catalyze the conversion cDNA, mRNA or alternatively, genomic DNA, as a template of an alkane (e.g., methane) to an alcohol (e.g., methanol), of and appropriate oligonucleotide primers according to stan the invention. Further, the term “hybridizes under stringent dard PCR amplification techniques. The nucleic acid so conditions” is intended to describe conditions for hybridiza amplified can be cloned into an appropriate vector and char 35 tion and washing under which nucleotide sequences at least acterized by DNA sequence analysis. Furthermore, oligo 30%, 40%, 50%, or 60% homologous to each other typically nucleotides corresponding to nucleotide sequences can be remain hybridized to each other. Preferably, the conditions prepared by standard synthetic techniques, e.g., using an are such that sequences at least about 70%, more preferably at automated DNA synthesizer. 
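The salt concentrations implied by the example stringency conditions follow from the stated composition of 1xSSC (0.15 M NaCl, 0.015 M Na citrate) by simple scaling. The short Python sketch below tabulates them; the dictionary layout and function name are illustrative assumptions, while the SSC strengths and temperatures are the example values given in the text.

```python
# Composition of 1xSSC in mol/L, as stated in the text.
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}

# Example conditions from the text: (SSC strength, temperature in degrees C).
STRINGENCY_EXAMPLES = {
    "high": (0.5, 65),
    "intermediate": (2.0, 65),
    "low": (2.0, 55),
}


def ssc_molarity(strength):
    """Return the molar concentration of each SSC component at the given strength."""
    return {component: molar * strength for component, molar in SSC_1X.items()}


for level, (strength, temperature) in STRINGENCY_EXAMPLES.items():
    salts = ssc_molarity(strength)
    print(
        f"{level}: {strength}xSSC at {temperature} C -> "
        f"{salts['NaCl']:.3f} M NaCl, {salts['Na citrate']:.4f} M Na citrate"
    )
```

Lower ionic strength and higher temperature both raise stringency, which is why the 0.5xSSC, 65° C. example demands the highest degree of complementarity.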
In some embodiments, an iso least about 80%, even more preferably at least about 85% or lated nucleic acid molecule of the invention comprises a 40 90% homologous to each other typically remain hybridized to nucleic acid molecule which is a complement of a nucleotide each other. In some cases, an isolated nucleic acid molecule of sequence encoding a polypeptide set forth in any of SEQ the invention that hybridizes under Stringent conditions to a NOs: 1-13 and 125, or having the nucleotide sequence of set nucleic acid sequence encoding a polypeptide set forth in any forth in any of SEQID NOs: 15-18. In still another embodi of SEQ NOs: 1-13, and 125, or having the nucleotide ment, an isolated nucleic acid molecule of the invention com 45 sequence set forth in any of SEQID NOs: 15-18, corresponds prises a nucleotide sequence which is at least about 50%, to a naturally-occurring nucleic acid molecule. As used 54%, 55%, 60%, 62%, 65%, 70%, 75%, 78%, 80%, 85%, herein, a “naturally-occurring nucleic acid molecule refers 86%, 90%, 95%, 97%, 98% or more identical to the nucle to an RNA or DNA molecule having a nucleotide sequence otide sequence encoding a polypeptide set forth in any of SEQ that occurs in nature (e.g., encodes a natural protein). NOs: 1-13 and 125, or having the nucleotide sequence set 50 The skilled artisan will appreciate that changes can be forth in any of SEQID NOS:15-18, or a portion of any of these introduced by mutation into the nucleotide sequences of any nucleotide sequences. nucleic acid sequence encoding a polypeptide set forth in any In addition to the nucleotide sequences encoding a of SEQ NOs: 1-13, and 125, or having the nucleotide polypeptide set forth in any of SEQ NOs: 1-13, and 125, or sequence set forth in any of SEQ ID NOs: 15-18, thereby having the nucleotide sequence set forth in any of SEQ ID 55 leading to changes in the amino acid sequence of the encoded NOs: 15-18, it will be appreciated by those skilled in the art proteins. 
that DNA sequence polymorphisms that lead to changes in the amino acid sequences of the proteins may exist within a population. Such genetic polymorphisms may exist among individuals within a population due to natural allelic variation. Such natural allelic variations include both functional and non-functional proteins and can typically result in 1-5% variance in the nucleotide sequence of a gene. Any and all such nucleotide variations and resulting amino acid polymorphisms in genes that are the result of natural allelic variation and that do not alter the functional activity of a protein are intended to be within the scope of the invention.

In some cases the alteration will lead to altered function of the polypeptide. In other cases the change will not alter the functional ability of the encoded polypeptide. In general, substitutions that do not alter the function of a polypeptide include nucleotide substitutions leading to amino acid substitutions at “non-essential” amino acid residues. Generally these substitutions can be made, for example, in the sequence encoding a polypeptide set forth in any of SEQ ID NOs: 7-13, and 125, or having the nucleotide sequence set forth in any of SEQ ID NOs: 15-18, without altering the ability of the enzyme to catalyze the oxidation of an alkane to an alcohol. A “non-essential” amino acid residue is a residue that can be altered from the parent sequence without altering the biological activity of the resulting polypeptide, e.g., catalyzing the conversion of methane to methanol.

Also contemplated are those situations where it is desirable to alter the activity of a parent polypeptide such that the polypeptide has new or increased activity on a particular substrate. It is understood that these amino acid substitutions will generally not constitute “conservative” substitutions. Instead, these substitutions constitute non-conservative substitutions introduced into a sequence in order to obtain a new or improved activity. For example, the polypeptides set forth in Table 4 (SEQ ID NOs: 1-6) describe specific amino acid substitutions that contribute to the alteration of the activity of a parent polypeptide. SEQ ID NOs: 1-6 are P450 wild-type polypeptides. These polypeptides may also be referred to as “parent” amino acid sequences because substitutions made to their amino acid sequence give rise to modified P450 polypeptides that can have altered or modified activity as compared to the parent sequence (see Table 4). For example, SEQ ID NO: 1 provided the parent sequence for the modified P450 polypeptide set forth in SEQ ID NO: 7. SEQ ID NO: 7 includes amino acid substitutions that impart activities to the SEQ ID NO: 7 polypeptide not measurable in the parent polypeptide. Accordingly, the nucleic acid molecule encoding the amino acid sequence of SEQ ID NO: 1 provides a “parent” nucleic acid molecule from which mutations can be made to obtain a nucleic acid molecule that encodes a modified polypeptide that includes amino acid substitutions. In general, these mutations provide non-conservative amino acid substitutions at indicated position(s) in a designated sequence.

It is also understood that a modified polypeptide can constitute a “parent” polypeptide from which additional substitutions can be made. Accordingly, a parent polypeptide, and a nucleic acid molecule that encodes a parent polypeptide, includes modified polypeptides and not just “wild-type” sequences. For example, the polypeptide of SEQ ID NO: 7 is a modified polypeptide with respect to SEQ ID NO: 1 (i.e., the “parent” polypeptide). Similarly, the polypeptide of SEQ ID NO: 8 is a modified polypeptide with respect to SEQ ID NO: 7. Accordingly, SEQ ID NO: 7 is the parent sequence of SEQ ID NO: 8. Further, the nucleic acid sequence encoding the polypeptide of SEQ ID NO: 7 provides the parent sequence from which mutations can be made to obtain a nucleic acid sequence that encodes the polypeptide of SEQ ID NO: 8.

It is also understood that an isolated nucleic acid molecule encoding a polypeptide homologous to the polypeptides of SEQ ID NOs: 1-13, and 125 can be created by introducing one or more nucleotide substitutions, additions or deletions into the nucleotide sequence encoding the particular polypeptide, such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein. Mutations can be introduced into the nucleic acid sequence by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. In contrast to those positions where it may be desirable to make non-conservative amino acid substitutions (see above), in some positions it is preferable to make conservative amino acid substitutions. A “conservative amino acid substitution” is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine).

Mutational methods of generating diversity include, for example, site-directed mutagenesis (Ling et al. (1997) “Approaches to DNA mutagenesis: an overview” Anal Biochem. 254(2): 157-178; Dale et al. (1996) “Oligonucleotide-directed random mutagenesis using the phosphorothioate method” Methods Mol. Biol. 57:369-374; Smith (1985) “In vitro mutagenesis” Ann. Rev. Genet. 19:423-462; Botstein & Shortle (1985) “Strategies and applications of in vitro mutagenesis” Science 229:1193-1201; Carter (1986) “Site-directed mutagenesis” Biochem. J. 237:1-7; and Kunkel (1987) “The efficiency of oligonucleotide directed mutagenesis” in Nucleic Acids & Molecular Biology (Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin)); mutagenesis using uracil containing templates (Kunkel (1985) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) “Rapid and efficient site-specific mutagenesis without phenotypic selection” Methods in Enzymol. 154:367-382; and Bass et al. (1988) “Mutant Trp repressors with new DNA-binding specificities” Science 242:240-245); oligonucleotide-directed mutagenesis (Methods in Enzymol. 100:468-500 (1983); Methods in Enzymol. 154:329-350 (1987); Zoller & Smith (1982) “Oligonucleotide-directed mutagenesis using M13-derived vectors: an efficient and general procedure for the production of point mutations in any DNA fragment” Nucleic Acids Res. 10:6487-6500; Zoller & Smith (1983) “Oligonucleotide-directed mutagenesis of DNA fragments cloned into M13 vectors” Methods in Enzymol. 100:468-500; and Zoller & Smith (1987) “Oligonucleotide-directed mutagenesis: a simple method using two oligonucleotide primers and a single-stranded DNA template” Methods in Enzymol. 154:329-350); phosphorothioate-modified DNA mutagenesis (Taylor et al. (1985) “The use of phosphorothioate-modified DNA in restriction enzyme reactions to prepare nicked DNA” Nucl. Acids Res. 13:8749-8764; Taylor et al. (1985) “The rapid generation of oligonucleotide-directed mutations at high frequency using phosphorothioate-modified DNA” Nucl. Acids Res. 13:8765-8787; Nakamaye & Eckstein (1986) “Inhibition of restriction endonuclease Nci I cleavage by phosphorothioate groups and its application to oligonucleotide-directed mutagenesis” Nucl. Acids Res. 14:9679-9698; Sayers et al. (1988) “5'-3' Exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis” Nucl. Acids Res. 16:791-802; and Sayers et al. (1988) “Strand specific cleavage of phosphorothioate-containing DNA by reaction with restriction endonucleases in the presence of ethidium bromide” Nucl. Acids Res. 16:803-814); mutagenesis using gapped duplex DNA (Kramer et al. (1984) “The gapped duplex DNA approach to oligonucleotide-directed mutation construction” Nucl. Acids Res. 12:9441-9456; Kramer & Fritz (1987) “Oligonucleotide-directed construction of mutations via gapped duplex DNA” Methods in Enzymol. 154:350-367; Kramer et al. (1988) “Improved enzymatic in vitro reactions in the gapped duplex DNA approach to oligonucleotide-directed construction of mutations” Nucl. Acids Res. 16:7207; and Fritz et al. (1988) “Oligonucleotide-directed construction of mutations: a gapped duplex DNA procedure without enzymatic reactions in vitro” Nucl. Acids Res. 16:6987-6999) (each of which is incorporated by reference).

Additional suitable methods include point mismatch repair (Kramer et al. (1984) “Point Mismatch Repair” Cell 38:879-887), mutagenesis using repair-deficient host strains (Carter et al. (1985) “Improved oligonucleotide site-directed mutagenesis using M13 vectors” Nucl. Acids Res. 13:4431-4443; and Carter (1987) “Improved oligonucleotide-directed mutagenesis using M13 vectors” Methods in Enzymol. 154:382-403), deletion mutagenesis (Eghtedarzadeh & Henikoff (1986) “Use of oligonucleotides to generate large deletions” Nucl. Acids Res. 14:5115), restriction-selection and restriction-purification (Wells et al. (1986) “Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin” Phil. Trans. R. Soc. Lond. A 317:415-423), mutagenesis by total gene synthesis (Nambiar et al. (1984) “Total synthesis and cloning of a gene coding for the ribonuclease S protein” Science 223:1299-1301; Sakmar and Khorana (1988) “Total synthesis and expression of a gene for the α-subunit of bovine rod outer segment guanine nucleotide-binding protein (transducin)” Nucl. Acids Res. 14:6361-6372; Wells et al. (1985) “Cassette mutagenesis: an efficient method for generation of multiple mutations at defined sites” Gene 34:315-323; and Grundstrom et al. (1985) “Oligonucleotide-directed mutagenesis by microscale shot-gun gene synthesis” Nucl. Acids Res. 13:3305-3316); double-strand break repair (Mandecki (1986); Arnold (1993) “Protein engineering for unusual environments” Current Opinion in Biotechnology 4:450-455; and “Oligonucleotide-directed double-strand break repair in plasmids of Escherichia coli: a method for site-specific mutagenesis” Proc. Natl. Acad. Sci. USA 83:7177-7181) (each of which is incorporated by reference). Additional details on many of the above methods can be found in Methods in Enzymology Volume 154, which also describes useful controls for trouble-shooting problems with various mutagenesis methods.

Additional details regarding various diversity generating methods can be found in the following U.S. patents, PCT publications, and EPO publications: U.S. Pat. No. 5,605,793 to Stemmer (Feb. 25, 1997), “Methods for In vitro Recombination;” U.S. Pat. No. 5,811,238 to Stemmer et al. (Sep. 22,

Recombination;” WO 98/27230 by Patten and Stemmer, “Methods and Compositions for Polypeptide Engineering;” WO 98/13487 by Stemmer et al., “Methods for Optimization of Gene Therapy by Recursive Sequence Shuffling and Selection;” WO 00/00632, “Methods for Generating Highly Diverse Libraries;” WO 00/09679, “Methods for Obtaining in vitro Recombined Polynucleotide Sequence Banks and Resulting Sequences;” WO 98/42832 by Arnold et al., “Recombination of Polynucleotide Sequences Using Random or Defined Primers;” WO 99/29902 by Arnold et al., “Method for Creating Polynucleotide and Polypeptide Sequences;” WO 98/41653 by Vind, “An in vitro Method for Construction of a DNA Library;” WO 98/41622 by Borchert et al., “Method for Constructing a Library Using DNA Shuffling;” WO 98/42727 by Pati and Zarling, “Sequence Alterations using;” WO 00/18906 by Patten et al., “Shuffling of Codon-Altered Genes;” WO 00/04190 by del Cardayre et al. “Evolution of Whole Cells and Organisms by Recursive Recombination;” WO 00/42561 by Crameri et al., “Oligonucleotide Mediated Nucleic Acid Recombination;” WO 00/42559 by Selifonov and Stemmer “Methods of Populating Data Structures for Use in Evolutionary Simulations;” WO 00/42560 by Selifonov et al., “Methods for Making Character Strings, Polynucleotides & Polypeptides Having Desired Characteristics;” WO 01/23401 by Welch et al., “Use of Codon-Varied Oligonucleotide Synthesis for Synthetic Shuffling;” and WO 01/64864 “Single-Stranded Nucleic Acid Template-Mediated Recombination and Nucleic Acid Fragment Isolation” by Affholter (each of which is incorporated by reference).

Also provided are recombinant constructs comprising one or more of the nucleic acid sequences as broadly described above. The constructs comprise a vector, such as a plasmid, a cosmid, a phage, a virus, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), or the like, into which a nucleic acid sequence of the invention has been inserted, in a forward or reverse orientation.
1998) “Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination;” U.S. Pat. No. 5,830,721 to Stemmer et al. (Nov. 3, 1998), “DNA Mutagenesis by Random Fragmentation and Reassembly;” U.S. Pat. No. 5,834,252 to Stemmer, et al. (Nov. 10, 1998) “End-Complementary Polymerase Reaction;” U.S. Pat. No. 5,837,458 to Minshull, et al. (Nov. 17, 1998), “Methods and Compositions for Cellular and Metabolic Engineering;” WO 95/22625, Stemmer and Crameri, “Mutagenesis by Random Fragmentation and Reassembly;” WO 96/33207 by Stemmer and Lipschutz “End Complementary Polymerase Chain Reaction;” WO 97/20078 by Stemmer and Crameri “Methods for Generating Polynucleotides having Desired Characteristics by Iterative Selection and Recombination;” WO 97/35966 by Minshull and Stemmer, “Methods and Compositions for Cellular and Metabolic Engineering;” WO 99/41402 by Punnonen et al. “Targeting of Genetic Vaccine Vectors;” WO 99/41383 by Punnonen et al. “Antigen Library Immunization;” WO 99/41369 by Punnonen et al. “Genetic Vaccine Vector Engineering;” WO 99/41368 by Punnonen et al. “Optimization of Immunomodulatory Properties of Genetic Vaccines;” EP 752008 by Stemmer and Crameri, “DNA Mutagenesis by Random Fragmentation and Reassembly;” EP 0932670 by Stemmer “Evolving Cellular DNA Uptake by Recursive Sequence Recombination;” WO 99/23107 by Stemmer et al., “Modification of Virus Tropism and Host Range by Viral Genome Shuffling;” WO 99/21979 by Apt et al., “Human Papillomavirus Vectors;” WO 98/31837 by del Cardayre et al. “Evolution of Whole Cells and Organisms by Recursive Sequence

In a preferred aspect of this embodiment, the construct further comprises regulatory sequences including, for example, a promoter operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available.

Accordingly, in other embodiments, vectors that include a nucleic acid molecule of the invention are provided. In other embodiments, host cells transfected with a nucleic acid molecule of the invention, or a vector that includes a nucleic acid molecule of the invention, are provided. Host cells include eucaryotic cells such as yeast cells, insect cells, or animal cells. Host cells also include procaryotic cells such as bacterial cells.

The terms “vector”, “vector construct” and “expression vector” mean the vehicle by which a DNA or RNA sequence (e.g. a foreign gene) can be introduced into a host cell, so as to transform the host and promote expression (e.g. transcription and translation) of the introduced sequence. Vectors typically comprise the DNA of a transmissible agent, into which foreign DNA encoding a protein is inserted by restriction enzyme technology. A common type of vector is a “plasmid”, which generally is a self-contained molecule of double-stranded DNA that can readily accept additional (foreign) DNA and which can readily be introduced into a suitable host cell. A large number of vectors, including plasmid and fungal vectors, have been described for replication and/or expression in a variety of eukaryotic and prokaryotic hosts. Non-limiting examples include pKK plasmids (Clontech), pUC plasmids, pET plasmids (Novagen, Inc., Madison, Wis.), pRSET or pREP plasmids (Invitrogen, San Diego, Calif.), or pMAL plasmids (New England Biolabs, Beverly, Mass.), and many appropriate host cells, using methods disclosed or cited herein or otherwise known to those skilled in the relevant art. Recombinant cloning vectors will often include one or more replication systems for cloning or expression, one or more markers for selection in the host, e.g., antibiotic resistance, and one or more expression cassettes.

The terms “express” and “expression” mean allowing or causing the information in a gene or DNA sequence to become manifest, for example producing a protein by activating the cellular functions involved in transcription and translation of a corresponding gene or DNA sequence. A DNA sequence is expressed in or by a cell to form an “expression product” such as a protein. The expression product itself, e.g. the resulting protein, may also be said to be “expressed” by the cell. A polynucleotide or polypeptide is expressed recombinantly, for example, when it is expressed or produced in a foreign host cell under the control of a foreign or native promoter, or in a native host cell under the control of a foreign promoter.

techniques. The vector may be, for example, a plasmid, a viral particle, a phage, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants, or amplifying the P450 homologue gene. Culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to those skilled in the art and in the references cited herein, including, e.g., Sambrook, Ausubel and Berger, as well as e.g., Freshney (1994) Culture of Animal Cells: A Manual of Basic Technique, 3rd ed. (Wiley-Liss, New York) and the references cited therein.

In other embodiments, methods for producing a cell that converts an alkane to an alcohol are provided. Such methods generally include: (a) transforming a cell with an isolated nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in SEQ ID NO: 7, 8, 9, 10, 11, 12, 13 or 125; (b) transforming a cell with an isolated nucleic acid molecule encoding a polypeptide of the invention; or (c)
Polynucleotides provided herein can be incorporated into any one of a variety of expression vectors suitable for expressing a polypeptide. Suitable vectors include chromosomal, nonchromosomal and synthetic DNA sequences, e.g., derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowlpox virus, pseudorabies, adenovirus, adeno-associated viruses, retroviruses and many others. Any vector that transduces genetic material into a cell, and, if replication is desired, which is replicable and viable in the relevant host can be used.

Vectors can be employed to transform an appropriate host to permit the host to express an inventive protein or polypeptide. Examples of appropriate expression hosts include: bacterial cells, such as E. coli, B. subtilis, Streptomyces, and Salmonella typhimurium; fungal cells, such as Saccharomyces cerevisiae, Pichia pastoris, and Neurospora crassa; insect cells such as Drosophila and Spodoptera frugiperda; mammalian cells such as CHO, COS, BHK, HEK 293 or Bowes melanoma; or plant cells or explants, etc.

In bacterial systems, a number of expression vectors may be selected depending upon the use intended for the P450 polypeptide. For example, when large quantities of P450 polypeptide or fragments thereof are needed for commercial production or for induction of antibodies, vectors which direct high level expression of fusion proteins that are readily purified can be desirable. Such vectors include, but are not limited to, multifunctional E. coli cloning and expression vectors such as BLUESCRIPT (Stratagene), in which the P450 polypeptide coding sequence may be ligated into the vector in-frame with sequences for the amino-terminal Met

transforming a cell with an isolated nucleic acid molecule of the invention.

In other embodiments, methods for selecting a cell that converts an alkane to an alcohol, are provided. The methods generally include: (a) providing a cell containing a nucleic acid construct that includes a nucleotide sequence that encodes a modified cytochrome P450 polypeptide, the nucleotide sequence selected from: (i) a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in SEQ ID NO: 7, 8, 9, 10, 11, 12, 13 or 125; (ii) a nucleic acid molecule encoding a polypeptide of the invention; or (iii) a nucleic acid molecule of the invention. The methods further include (b) culturing the cell in the presence of a suitable alkane and under conditions where the modified cytochrome P450 is expressed at an effective level; and (c) detecting the production of an alcohol.

In other embodiments, methods for producing an alcohol, are provided. The methods include: (a) providing a cell containing a nucleic acid construct comprising a nucleotide sequence that encodes a modified cytochrome P450 polypeptide, the nucleotide sequence selected from: (i) a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence set forth in SEQ ID NO: 7, 8, 9, 10, 11, 12, 13 or 125; (ii) a nucleic acid molecule encoding a polypeptide of the invention; or (iii) a nucleic acid molecule of the invention. The methods further include (b) culturing the cell in the presence of a suitable alkane and under conditions where the modified cytochrome P450 is expressed at an effective level; and (c) producing an alcohol by hydroxylation of the suitable alkane such as methane (CH4), ethane (C2H6), propane (C3H8), butane (C4H10), pentane (C5H12), hexane (C6H14), heptane (C7H16), octane (C8H18), nonane (C9H20), decane (C10H22), undecane (C11H24), and dodecane (C12H26).
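The twelve straight-chain alkanes listed as substrates all follow the general formula CnH2n+2, and the name of each corresponding alcohol product is formed by replacing the final “e” of the alkane name with “ol”. A short sketch makes the pairing explicit. The function names are illustrative only, and the sketch does not distinguish positional isomers of the alcohol (for example 1-propanol versus 2-propanol).

```python
# Straight-chain alkanes named in the text, ordered by carbon count (C1 to C12).
ALKANES = [
    "methane", "ethane", "propane", "butane", "pentane", "hexane",
    "heptane", "octane", "nonane", "decane", "undecane", "dodecane",
]


def alkane_formula(n_carbons: int) -> str:
    """Molecular formula of the acyclic alkane with n carbons: C(n)H(2n+2)."""
    if n_carbons < 1:
        raise ValueError("an alkane needs at least one carbon")
    carbon = "C" if n_carbons == 1 else f"C{n_carbons}"
    return f"{carbon}H{2 * n_carbons + 2}"


def alcohol_name(alkane: str) -> str:
    """Generic name of the alcohol derived from an alkane (methane -> methanol)."""
    if not alkane.endswith("ane"):
        raise ValueError(f"not an alkane name: {alkane}")
    return alkane[:-1] + "ol"


for n, name in enumerate(ALKANES, start=1):
    print(f"{name:9s} {alkane_formula(n):7s} -> {alcohol_name(name)}")
```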
and the subsequent 7 residues of beta-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke & Schuster (1989) J. Biol. Chem. 264:5503-5509); pET vectors (Novagen, Madison Wis.); and the like.

Similarly, in the yeast Saccharomyces cerevisiae a number of vectors containing constitutive or inducible promoters such as alpha factor, alcohol oxidase and PGH may be used for production of the P450 polypeptides of the invention. For reviews, see Ausubel (supra) and Grant et al. (1987) Methods in Enzymology 153:516-544 (incorporated herein by reference).

Also provided are engineered host cells that are transduced (transformed or transfected) with a vector provided herein (e.g., a cloning vector or an expression vector), as well as the production of polypeptides of the invention by recombinant

In general an alcohol produced by a method of the invention can include methanol, ethanol, propanol, butanol, pentanol, hexanol, heptanol, octanol, nonanol, decanol, undecanol, and dodecanol.

The cytochrome P450 methane hydroxylase is expressed in its functional form in Escherichia coli and the whole cell conversion of methane to methanol using optimized versions of these enzymes in E. coli will be possible. As demonstrated in the examples provided herein, Escherichia coli overexpressing BM-3 enzymes will act as whole cell biocatalysts with relative activities equivalent to the activities of the individual BM-3 catalyst. Substrates used for these whole cell experiments include alkanes, alkenes, and heterocyclic compounds. In the case of methane hydroxylation, the maintenance of the gaseous substrate levels in cultures of whole cells will complicate the process and constitute a large portion of future work with this system. Further, the metabolism of host organisms can be engineered to further increase the levels of NADPH (or NADH for BM-3 variants that accept NADH) available to biocatalysts provided herein.

Examples of whole cell systems that could be used to convert methane to methanol will now be described. The alkane content of natural gas generally comprises about 80% methane and 20% other small alkanes, primarily ethane and propane. Naturally occurring strains of gram negative (e.g. Pseudomonas) and gram positive (e.g. Rhodococcus) bacteria, and even engineered laboratory strains of E. coli, can grow using small alcohols as their sole carbon source. Expression of optimized BM-3 based biocatalysts in one of these bacterial strains will allow the bacteria to use the alkane impurities in natural gas as their sole source of carbon and energy (and cofactor regeneration) while converting the methane into methanol. Expression of our BM-3 enzymes in any of these bacteria can be done by standard in the art techniques using multi-host plasmids and shuttle vectors.

Methane generated by the decomposition of biomass in landfills and agricultural waste does not contain other alkanes. To utilize these methane sources with a whole cell system containing our BM-3 biocatalysts, an external carbon and energy source such as glucose must be provided to the host cells to regenerate NADPH. Glucose or similar compounds (with greater NADPH-producing potentials) can be added directly to these whole cell reactions or generated from biomass present in the methane-producing waste. The amount of cofactor a host organism generates from these carbon sources can also be increased by streamlining the cell strain for methane production. Expression of optimized BM-3 enzymes in these bacteria can be done by standard in the art techniques using strain specific plasmids. Modification of E. coli strains to optimize NADPH availability can also be performed using standard in the art metabolic engineering techniques such as chromosomal insertion of phage-delivered DNA and recombination of homologous gene fragments from plasmids.

Alternatively, methylotrophs, i.e. methanol-metabolizing yeast (e.g. strains of Pichia) and bacterial cell strains (e.g. strains of Methylobacterium), can serve as hosts for BM-3 biocatalysts. In this scenario, the engineered BM-3 enzymes supply all of the methanol the hosts need to grow and live while generating excess methanol for production. These cell strains could also be modified genetically to optimize the amount of methanol they can produce. Additionally, since the growth of these methylotrophic strains is dependent on methanol, these strains can be used to identify new enzyme variants with improved activity towards methane. Expression of BM-3 in Pichia is possible with standard in the art Pichia specific gene expression plasmids. Expression in methylotrophs is possible with standard in the art techniques using multi-host shuttle vectors.

The methods of the invention can utilize liquid cultures of specific microorganisms for the whole cell bioconversion of methane to methanol, but other host organisms including but not limited to other bacteria, other yeast, and even plants, algae, and fungi could also be used. Plants, algae, and even photosynthesizing bacteria such as cyanobacteria that can convert carbon dioxide and light into sugars (and then NAD(P)H) would be especially valuable as hosts for a BM-3 methane oxidation biocatalyst since these organisms would not require any carbon source other than carbon dioxide. Because methane streams derived from biomass contain about 50% carbon dioxide, these hosts are ideal for methane to methanol processes using landfill and waste generated methane. Expression of BM-3 in plant and fungi is possible with standard in the art DNA transfer techniques specific to these types of cells, such as the use of viral vectors for plants.

As previously discussed, general texts which describe molecular biological techniques useful herein, including the use of vectors, promoters and many other relevant topics, include Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology Volume 152 (Academic Press, Inc., San Diego, Calif.) (“Berger”); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989 (“Sambrook”) and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 1999) (“Ausubel”) (each of which is incorporated by reference). Examples of protocols sufficient to direct persons of skill through in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), e.g., for the production of the homologous nucleic acids of the invention are found in Berger, Sambrook, and Ausubel, as well as in Mullis et al. (1987) U.S. Pat. No. 4,683,202; Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press Inc. San Diego, Calif.) (“Innis”); Arnheim & Levinson (Oct. 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3:81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874; Lomell et al. (1989) J. Clin. Chem. 35:1826; Landegren et al. (1988) Science 241:1077-1080; Van Brunt (1990) Biotechnology 8:291-294; Wu and Wallace (1989) Gene 4:560; Barringer et al. (1990) Gene 89:117; and Sooknanan and Malek (1995) Biotechnology 13:563-564 (each of which is incorporated by reference). Improved methods for cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods for amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369:684-685 and the references cited therein (incorporated by reference herein), in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, e.g., Ausubel, Sambrook and Berger, all supra.

The invention is illustrated in the following examples, which are provided by way of illustration and are not intended to be limiting.

EXAMPLES

Example 1

Ethane Conversion

A catalyst for the selective oxidation under mild conditions of gaseous alkanes into easily transported alcohols allows for the economical exploitation of both local, small natural gas resources and vast, remote reserves. Prior to the present studies, the selective conversion of ethane and methane mainly to their corresponding alcohols was not demonstrated. Provided herein are enzyme-based catalysts that convert ethane to ethanol and methane to methanol with high selectivity and productivity. The enzymes function at ambient temperature and pressure, use dioxygen from the air as the oxidant, and produce little or no hazardous wastes.

Biological systems have evolved efficient and productive metalloenzymes that convert alkanes into alcohols using dioxygen from the air. Methane monooxygenases (MMO) catalyze the conversion of methane to methanol and thereby

based on the hydroxylation of the octane surrogate substrate p-nitrophenyl octyl ether. The resulting mutant, 139-3, contained 11 amino acid substitutions distributed throughout the hydroxylase domain and exhibited good activity on octane.
It enable methanotrophic bacteria to use methane as a source of 5 had also acquired a small but appreciable activity on propane, carbon and energy. Microorganisms that grow on larger gas producing 2-propanol and a very small (less than 3%) amount eous alkanes (ethane, propane, butane) have also been of 1-propanol. Subsequently, the propane-hydroxylating reported and putatively use similar enzymes for the first, activity of this mutant was increased by randomly mutating alkane oxidation step. The relatively well-studied MMOs its heme domain and screening the resulting libraries for have long been a source of inspiration for designers of chemi 10 activity towards the (propane Surrogate) Substrate dimethyl cal catalysts. Unfortunately, these structurally complex ether. Mutant 9-10A obtained in this way was much more enzymes have never been functionally expressed in a heter active towards propane than 139-3 and produced significantly ologous organism suitable for bioconversion and process more 1-propanol (8%). The increase in 1-propanol was par optimization and therefore have proven to be of little practical ticularly interesting because the propane terminal C-H bond use themselves for production of alcohols. 15 energy (100.4 kcal/mol) is similar to the C H bondenergy of The present studies use directed evolution techniques to ethane (101.1 kcal/mol). convert heme-containing medium-chain fatty acid hydroxy To obtain an ethane-hydroxylating P450, the directed evo lases in to hydroxylating catalysts for Small alkanes. An lution of 9-10A was initiated. The mutations were targeted to exemplary hydroxylase includes cytochrome P450 BM-3 the active site rather than the entire heme domain. Most of (CYP102A1) which is the third P450 identified in the bacte these targeted Substitutions were not accessible by random rium Bacillus megaterium. 
P450 BM-3 is a soluble, catalyti point mutagenesis due to the conservative nature of the cally self-sufficient fatty acid monoxygenase isolated from genetic code. The BM-3 active site consists of a hydrophobic Bacillus megaterium. It has a multi-domain structure, com channel located directly above the heme. Guided by the high posed of three domains: one FAD, one FMN and one heme resolution crystal structure of P450 BM-3 with bound palmi domain, fused on the same 119 kDa polypeptide chain of 25 toylglycine Substrate eleven amino acid residues in this chan 1048 residues. Furthermore, despite its bacterial origin, P450 nel that lie within five A of the terminal eight carbons of the BM-3 has been classified as a class II P450 enzyme, typical of bound Substrate were chosen for mutagenesis. These residues microsomal eukaryotic P450s: it shares 30% sequence iden were targeted because they are likely to contact Small alkanes tity with microsomal fatty acid ()-hydroxylase, 35% in the active site during catalysis. Saturation mutagenesis sequence identity with microsomal NADPH:P450 reductase, 30 libraries constructed for each of these residues were screened and only 20% homology with other bacterial P450s. Homo for mutants showing improved activity towards dimethyl logues of P450 BM-3 (CYP102A1) have been recognized in ether. Single mutants selected from these libraries were then other bacteria, including two characterized systems in Bacil recombined to form a large library containing all possible Ius Subtilis: CYP102A2 and CYP102A3. combinations of the beneficial active-site mutations, and this Communication between the heme and reductase domains 35 library was screened for activity towards dimethyl ether. 
The of BM-3 is substrate-dependent and fine-tuned to transfer a mutant with the highest activity on propane, 53-5H, was pair of electrons from an NADPH cofactor through two isolated and purified and shown to catalyse thousands of reductase-bound flavins and then shuttle the electrons, one at turnovers of propane to propanol at a rate of 370 min' (see a time, into the hydroxylase domain during catalysis. Engi Table 1). neering the protein for activity on a new Substrate Such as 40 Notably, P450 BM-3 mutant 53-5H also hydroxylates ethane and/or methane must conserve this regulated commu ethane to generate ethanol (see Table 1).53-5H contains three nication, and poses significant technical challenges. The active-site mutations, A78F. A82S, and A328F, all of which engineered enzyme must bind ethane and/or methane, a Sub replace alanine with a larger side chain which reduce the strate considerably smaller than a fatty acid, directly above volume of the active site and position small alkanes above the the heme in the active site of the hydroxylase domain. Sub 45 heme during catalysis. In addition to its activity towards strate binding must initiate electron transfer from the reduc ethane, 53-5H exhibits the highest regioselectivity (89% tase domain to the heme domain during catalysis. Once elec 2-octanol) and enantioselectivity (65% S-2-octanol) towards tron transfer occurs, the active iron-oxo species must be octane yet encountered in any BM-3 variant (see Table 1), capable of breaking the high energy C-H bond in ethane further evidence of tighter substrate binding in the engineered (101.1 kcal mol-1 vs. 95-99 kcal mol-1 for the secondary 50 active site. C—H bonds of the fatty acid substrates of wild type BM-3). Ethane hydroxylation by 53-5H was measured after Finally, the singly-hydroxylated product must be released 12-hour incubations of the enzyme in ethane-saturated buffer from the active site before further oxidation can occur. 
containing an ethanol-free NADPH regeneration system Crystal structures of BM-3 heme domain with and without (commercial NADPH contains 2-3% ethanol). While the substrate are available while a structure of the reductase 55 reaction is uncoupled to NADPH oxidation in the presence of domain is not. Furthermore, large conformational changes in ethane (660 min', see Table 2), 53-5H nevertheless consis both domains during catalysis make it difficult to identify tently produces at least 50 equivalents of ethanol independent amino acid Substitutions which can address all these issues. In of the starting concentration of enzyme (see FIGS. 2 and 3). the present studies, ethane and/or methane hydroxylases were Overoxidation was tested by Supplying ethanol as a substrate engineered using an evolutionary strategy in which muta 60 and monitoring ethanol depletion by gas chromatography and tions are accumulated in multiple generations of random the production of acetaldehyde and acetic acid by C NMR mutagenesis and screening. The approach has been to use using "C-labeled ethanol. No (i.e. less than 10%, the detec directed evolution methods to adapt the enzyme to exhibit tion limit of these experiments) overoxidation products were higher turnover on Smaller and Smaller alkanes (see e.g., FIG. detected in these experiments (see FIG. 4). In general, the 1). Through Successive rounds of mutagenesis of the heme 65 hydrophobic active sites in P450s assist in the release of domain (residues 1-429 of SEQ ID NO: 1) the activity of singly-hydroxylated products. Alkane-hydroxylating BM-3 towards octane was increased. Screening was initially mutants overoxidize longer-chain alkanes Such as decane to a US 8,715,988 B2 49 50 higher extent (<7%) than shorter ones such as hexane F205C, S226R, H236Q, E252G, R255S, A29OV, A328F, (<1%). 
Mutant 53-5H also catalyses the rapid (660 min-1) and efficient (80% coupled) conversion of octane to primarily 2-octanol in the presence of 1% ethanol, demonstrating that ethanol does not inhibit the catalyst. These results demonstrate that the 101.1 kcal/mol C-H bond dissociation energy of ethane does not pose a fundamental barrier to a cytochrome P450 and that a P450-based biocatalyst can cleanly convert ethane into ethanol without undesired side reactions.

A further round of directed evolution was performed to increase the ethane activity of 53-5H, this time targeting mutations to the reductase domain, including the polypeptide linker that connects it to the heme domain. The amino acid substitution E464G in the linker region increases total turnover in selected BM-3 mutants. Alone this mutation does not enhance the production of ethanol by 53-5H, but further improvement was found upon screening a library of 53-5H containing this mutation and random mutations in the reductase domain for high activity towards dimethyl ether accompanied by reduced NADPH consumption rates in the absence of substrate. The resulting mutant 35-E11 contains all 15 of the hydroxylase domain substitutions found in 53-5H, the linker mutation E464G, and a new mutation, leading to the substitution I710T, located in the reductase domain. Compared to 53-5H, mutant 35-E11 exhibits a five-fold increase in total turnover of ethane to ethanol (total turnover number (ttn)=250). In a typical reaction with ethane, 200 nM of mutant 35-E11 produces over 50 µM of ethanol (FIG. 3). The rate of product formation by 35-E11 equals that of 53-5H, while the NADPH consumption rate in the presence of ethane has dropped to approximately 520 min-1 (Table 2). The increased productivity likely reflects a prolonged catalyst lifetime, achieved by reducing non-productive cofactor oxidation, which inactivates the protein by forming various reactive species. Amino acid residue I710, by comparison to the crystal structure of the homologous rat P450 reductase, is located near the FAD cofactor.

Accordingly, provided herein are variant P450s that have been adapted to hydroxylate ever smaller substrates. For example, the wildtype cytochrome P450 BM-3 enzyme is inactive on propane. Evolving it to become an octane hydroxylase, however, generated a small amount of activity towards propane. Similarly, increasing the newly acquired propane activity produced measurable activity towards ethane. Directed evolution has generated a biocatalyst with activity towards ethane. Further, as provided below, additional rounds of directed evolution have generated a BM-3 variant with the ability to convert methane to methanol. This enzyme provides the foundation for a novel biocatalytic route to methanol production from methane.

Reagents: Unless noted otherwise, chemicals were purchased from Sigma-Aldrich (St. Louis, Mo.). Ethane (99.95%), propane and dimethyl ether (DME) were purchased from Special Gas Services, Inc. (Warren, N.J.) or Advanced Gas Technologies. Ethanol was purchased from AAPER (Shelbyville, Ky.). NADP+ and NADPH were purchased from Roche (Indianapolis, Ind.) or Biocatalytics, Inc. (Pasadena, Calif.). Hexanes (99.94%) were purchased from EMD (Gibbstown, N.J.). Restriction enzymes were purchased from Roche (Indianapolis, Ind.) or New England Biolabs (Beverly, Mass.), Taq DNA polymerase from Roche (Indianapolis, Ind.), Pfu turbo DNA polymerase from Stratagene (La Jolla, Calif.), and T4 DNA ligase from Invitrogen (Carlsbad, Calif.).

Expression and Purification of P450 BM-3: The genes encoding P450 BM-3 mutants 53-5H (with amino acid substitutions R47C, V78F, A82S, K94I, P142S, T175I, A184V, F205C, S226R, H236Q, E252G, R255S, A290V, A328F, L353V) and 35-E11 (with additional amino acid substitutions E464G and I710T) were expressed and purified as described (Peters et al., J. Am. Chem. Soc. 125:13442 (2003); incorporated by reference herein). Enzyme concentrations were measured in triplicate from the CO-difference spectra.

Construction of saturation mutagenesis libraries: Mutations were introduced into mutant 9-10A (with amino acid substitutions R47C, V78A, K94I, P142S, T175I, A184V, F205C, S226R, H236Q, E252G, R255S, A290V, L353V) by PCR overlap extension mutagenesis using Pfu turbo DNA polymerase (Stratagene). The primers for each saturation library contained all possible combinations of bases, NNN (N=A, T, G, or C), at the codon for a particular residue. The recitation of "for" indicates a "forward" primer. Accordingly, the primers in the forward direction for each library were:

74NNN-for (5'-GTCAANNNCTTAAATTTGCACG-3') (SEQ ID NO:19)
75NNN-for (5'-GTCAAGCGNNNAAATTTGCACG-3') (SEQ ID NO:20)
78NNN-for (5'-GCTTAAATTTNNNCGTGATTTTGCAGG-3') (SEQ ID NO:21)
81NNN-for (5'-CGTGATNNNGCAGGAGAC-3') (SEQ ID NO:22)
82NNN-for (5'-CGTGATTTTNNNGGAGAC-3') (SEQ ID NO:23)
87NNN-for (5'-GAGACGGGTTANNNACAAGCTGGAC-3') (SEQ ID NO:24)
88NNN-for (5'-GGAGACGGGTTATTTNNNAGCTGGACG-3') (SEQ ID NO:25)
260NNN-for (5'-CAAATTATTNNNTTCTTAATTGCGGGAC-3') (SEQ ID NO:26)
263NNN-for (5'-ACATTCTTANNNGCGGGACACGAAAC-3') (SEQ ID NO:27)
264NNN-for (5'-ACATTCTTAATTNNNGGACACGAAAC-3') (SEQ ID NO:28)
328NNN-for (5'-CCAACTNNNCCTGCGTTTTCC-3') (SEQ ID NO:29)

The reverse primers for each of these libraries complement their corresponding forward primers. For each mutation, two separate PCRs were performed, each using a perfectly complementary primer, BamHI-forw (5'-GGAAACAGGATCCATCGATGC-3') (SEQ ID NO:30) and SacI-rev (5'-GTGAAGGAATACCGCCAAGC-3') (SEQ ID NO:31), at the end of the sequence and a mutagenic primer. The resulting two overlapping fragments that contain the mutations were then annealed in a second PCR to amplify the complete mutated gene. The full gene was then cut with BamHI and SacI restriction enzymes and ligated with T4 ligase (Invitrogen) into pBM-3-WT18-6, previously cut with BamHI and SacI to remove the wild-type gene. The ligation mixtures were then transformed into E. coli DH5alpha electrocompetent cells and plated onto Luria-Bertani (LB) agar plates to form single colonies for picking.

Construction of the active site recombination library: Improved mutations that were picked up from the saturation mutagenesis libraries were recombined in a combinatorial fashion. Saturation libraries of amino acid positions 74, 81, 263 and 264 did not yield improved mutants so only beneficial mutations at the remaining seven sites were recombined. Mutations were added sequentially using overlap extension PCR with degenerate primers (see above) or a mixture of degenerate primers (mutation sites in the primers are in bold letters). Starting with mutant 9-10A, mutations at amino acid position 78 and 82 were introduced using primers:

78A/T 82A/T/S/F/I/C/G/L/V-for (5' GCTTAAATTCRCGCGTGATTTTDBCGGAGACG) (SEQ ID NO:32)
and
78F 82A/T/S/F/I/C/G/L/V-for (5' CTTAAATTCTTTCGTGATTTTDBCGGAGACG) (SEQ ID NO:33).

Next, mutations at position 260 were introduced using primers:

260T-for (5' CAAATTATTACATTCTTAATTGCGGGAC), (SEQ ID NO:34)
260N-for (5' CAAATTATTAACTTCTTAATTGCGGGAC) (SEQ ID NO:35)
and
260L-for (5' CAAATTATTCTTTTCTTAATTGCGGGAC). (SEQ ID NO:36)

Mutations at position 87 and 88 were introduced using primers:

87F/I/V/L 88T-for (5' GAGACGGGTTANTYACAAGCTGGAC) (SEQ ID NO:37)
and
87F/I/V/L 88C-for (5' GAGACGGGTTANTYTGTAGCTGGAC). (SEQ ID NO:38)

Mutations at position 328 were introduced using primers:

328A-for (5' CCAACTGCTCCTGCGTTTTCC), (SEQ ID NO:39)
328L/F/V-for (5' CCAACTBTTCCTGCGTTTTCC), (SEQ ID NO:40)
and
328M-for (5' CCAACTATGCCTGCGTTTTCC) (SEQ ID NO:41)

Finally, mutations at position 75 were introduced using primers:

75L/I-for (5' CTTAAGTCAAGCGMTTAAATTC), (SEQ ID NO:42)
and
75W-for (5' CTTAAGTCAAGCGTGGAAATTC). (SEQ ID NO:43)

not reported in this work. To make the reductase library of 53-5H containing this mutation, a random library of the reductase portion of the E465G BM-3 gene (1285 bp to 3147 bp) was PCR amplified using the primers Sac1-for (5'-GACTTTGAAGATCATACAAACTACGAGCTCG-3') (SEQ ID NO:44) and EcoRI-rev (5'-AGATCTGCTCATGTTTGACAGCTTATCATCGATGAATTC-3') (SEQ ID NO:45), with various concentrations of MnCl2 (0, 50, 100, and 150 µM) added. The amplified DNA was restriction digested using Sac1 and EcoRI and then ligated with T4 ligase into the pCWori plasmid containing the gene for 53-5H with its reductase portion removed with Sac1 and EcoRI. A small library (~100 mutants) from each PCR condition was tested for mutation rate. The 100 µM MnCl2 condition produced the optimal mutation rate of 1-2 mutations per gene, and a larger library containing 2610 members from this condition was picked and screened.

Nucleic acid sequences of resulting ethane hydroxylating mutant 35-E11 (positions given with regard to the coding sequence of wild-type P450 BM-3 nucleic acid sequence) include C142T, T234C, G235T, A237T, G247A, C248G, A249C, A284T, C324T, C427T, C527T, C554T, T617G, C681G, T711G, A758G, C766A, C872T, G985T, C986T, C1060G, A1119G, A1394G, T2132C, and A2313G. Amino acid sequence mutations compared to wild-type P450 BM-3 include R47C, V78F, A82S, K94I, P142S, T175I, A184V, F205C, S226R, H236Q, E252G, R255S, A290V, A328F, L353V, E464G, I710T.

Cell lysates for high throughput screening were prepared as described previously (Peters et al., J. Am. Chem. Soc. 125:13442 (2003); incorporated by reference herein). High throughput dimethyl ether (DME) screens were performed in 96 well plates as described previously. Improved mutants were selected based on an increased signal compared to 9-10A (for the saturation mutagenesis libraries) and 1-12G (for the recombination library). Reactions on octane and propane were analyzed by gas chromatography as described.

Reactions on ethane were carried out similarly to those described for propane (Peters et al., J. Am. Chem. Soc. 125:13442 (2003); incorporated by reference herein). Because commercially-available NADPH contains 2-3% ethanol, an NADPH regeneration system was used. Thereby, the background ethanol concentration was reduced to approximately 1 µM. Ethane hydroxylation reactions were performed in 20 mL glass vials. A 5.0 mL reaction mixture contained 100, 200 or 500 nM purified protein in 0.1 M potassium phosphate buffer (pH 8.0) saturated with ethane and dioxygen. 300 µL of the regeneration system (1.66 mM NADP+, 167 U/mL isocitrate dehydrogenase, 416 mM isocitrate) was added. The vial was immediately topped with a septum, pressurized with 20 psi of ethane and stirred at 25° C. For determination of total turnover, the reaction was allowed to proceed for four hours. For rate determination the reaction was stopped at 0, 15 and 30 min. 50 µL of 5 mM 1-butanol was added to the reaction mixture as an internal standard. Ethanol and 1-butanol were subsequently derivatized to their corresponding alkyl nitrites.
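The mixed-base positions in the primers listed above (NNN in the saturation primers; B, M, Y, R and D in the recombination primers) follow the standard IUPAC nucleotide codes. As a minimal sketch, assuming the standard genetic code, the following Python expands a degenerate codon into the amino acids it can encode and derives a reverse primer as the reverse complement of a forward primer. The helper names (`expand`, `encoded_residues`, `reverse_complement`) are illustrative assumptions and do not come from the patent.

```python
from itertools import product

# IUPAC nucleotide codes -> the bases each symbol stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC symbol (used to derive reverse primers).
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}

# Standard genetic code, codons ordered T, C, A, G at each position.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def expand(seq):
    """Return every concrete DNA sequence matched by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[s] for s in seq.upper()))]


def encoded_residues(codon):
    """Return the one-letter amino acids ('*' = stop) a degenerate codon encodes."""
    return {CODON_TABLE[c] for c in expand(codon)}


def reverse_complement(seq):
    """Return the reverse complement, keeping degenerate symbols degenerate."""
    return "".join(COMPLEMENT[s] for s in reversed(seq.upper()))


if __name__ == "__main__":
    print(sorted(encoded_residues("BTT")))  # codon in primer 328L/F/V-for
    print(sorted(encoded_residues("MTT")))  # codon in primer 75L/I-for
    print(len(expand("NNN")))               # codons covered by a saturation site
    print(reverse_complement("CCAACTNNNCCTGCGTTTTCC"))  # from 328NNN-for
```

Under the standard code, BTT covers Leu, Phe and Val and MTT covers Leu and Ile, which agrees with the primer names 328L/F/V-for and 75L/I-for.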
Sequences of resulting ethane hydroxylating mutant 53-5H: Nucleic acid mutations compared to wild-type P450 BM-3 include C142T, T234C, G235T, A237T, G247A, C248G, A249C, A284T, C324T, C427T, C527T, C554T, T617G, C681G, T711G, A758G, C766A, C872T, G985T, C986T, C1060G, and A1119G. Amino acid mutations compared to wild-type P450 BM-3 include R47C, V78F, A82S, K94I, P142S, T175I, A184V, F205C, S226R, H236Q, E252G, R255S, A290V, A328F, and L353V.

Construction of the reductase library: The mutation E465G was found by directed evolution of a different BM-3 variant

Control reactions were performed by repeating these steps without substrate and with inactivated protein, to correct for background levels of ethanol. Reaction buffer saturated with ethane and oxygen did not contain ethanol concentrations above this background.

Reactions with ethane and corresponding control reactions without ethane or with inactivated protein were carried out in triplicate. Taking into account the background ethanol concentration of around 1 µM (1.2±0.1 µM for the control reaction without ethane or 0.8±0.1 µM for the control reaction with inactivated P450), the reactions yielded final ethanol concentrations corresponding to total turnover numbers of 50 and 250 for 53-5H and 35-E11, respectively (see e.g., FIGS. 2 and 3).

Gas chromatographic analysis of the ethanol was performed using purchased standards and a five-point calibration curve with 1-butanol as an internal standard. The samples were injected at a volume of 1 µL and analyses were performed at least in triplicate. Ethanol analysis was performed on a Hewlett-Packard 5890 Series II Plus gas chromatograph using an HP-1 column (Agilent Technologies, 30 m length, 0.53 mm ID, 2.65 µm film thickness) connected to an electron capture detector (ECD). The temperature program was: 60° C. injector, 150° C. detector, 35° C. oven for 3 min, 10° C./min gradient to 60° C., 60° C. gradient to 200° C., and then 200° C. for 5 min.

In addition, overoxidation of ethanol by 53-5H was tested. First, ethanol (20 µM), instead of ethane, was used as the substrate under the reaction conditions described above, and no decrease in ethanol concentration was observed within experimental error (±3 µM) over the course of the reaction. Second, NMR (Varian Mercury 300 MHz) spectroscopy was used to monitor reactions of 53-5H with singly 13C-labeled ethanol (HO-13CH2CH3, 57 ppm). Concentrations of 13C-labeled ethanol as low as 50 µM in the presence of 10 µM P450 (in 0.1 M potassium phosphate buffer, pH=8, with 10% deuterium oxide added) could be detected using the 13C channel of the NMR. The longer relaxation rates of the carbonyl carbon atoms in acetaldehyde (206 ppm) and acetic acid (176 ppm) decrease their carbon NMR signal heights by 25-40% in the experiments, making their detection limit approximately 100 µM. Reactions using 10 µM 53-5H, 1 mM 13C-ethanol, 100 µM NADP+, 50 mM isocitrate, and 10 U/mL isocitrate dehydrogenase were stirred 12 hours at room temperature. Deuterium oxide (10% final concentration) was then added to each reaction, and their NMR spectra (averaging ~20,000 scans each) were recorded. Control reactions using 50 nM of 53-5H from the same batch, 4 mM of octane, 100 µM NADP+, 50 mM isocitrate, and 10 U/mL isocitrate dehydrogenase were also performed and the products characterized to verify that the P450 was fully active. No overoxidation products were detected in these experiments, indicating that, in the presence of 1 mM ethanol, 53-5H catalyzes less than 10 total turnovers of ethanol (see FIG. 4).

The absorption spectra for 53-5H and 35-E11 were taken using a 3 µM solution of purified P450 in a quartz cuvette. Upon incubation with ethane, no shift in equilibrium from low- (absorbance maximum at 418 nm) to high-spin (absorbance maximum at 390 nm) configuration of the ferric P450 was observed. Octane, a good substrate, also did not induce a shift. Mutant 35-E11 behaved identically (see FIG. 5). Previously presented mutants, e.g. 9-10A, exhibit a significant spin shift upon incubation with good substrates, such as octane.

NADPH oxidation rates were measured over 20 s at 25° C. using a BioSpec-1601 UV-Vis spectrophotometer (Shimadzu, Columbia, Md.) and 1 cm pathlength cuvettes. A 1 mL reaction mixture contained 100 nM P450, 166 µM NADPH and either methane-, ethane- or propane-saturated potassium phosphate buffer (0.1 M, pH 8.0) or 4 mM octane and 1% ethanol in potassium phosphate buffer (0.1 M, pH 8.0). The reaction was initiated by the addition of NADPH and the decrease in absorption at 340 nm was monitored. NADPH consumption rates were also measured without substrate (see Table 2).

Octane reactions were performed in the presence of 1% ethanol and the results are shown in Table 1. Rates of ethanol formation were measured over 30 min using gas chromatography (GC) coupled to an electron capture detector (ECD) and are reported as nmol ethanol/min/nmol enzyme. Errors are at most 10%. Total turnover number (ttn) was measured using GC after completion of the reaction and is reported as nmol product/nmol protein. Errors are at most 10%. Initial rates of propanol and octanol formation were measured over 1 min by GC and are reported as nmol product/min/nmol protein. Errors are at most 15%. The "ee" of 2-octanol (main product) was measured by GC; the favored enantiomer is listed in parentheses. Wild-type P450 BM-3 primarily produces 3- and 4-octanol. The yields were not sufficient for the determination of ee for wt BM-3. Mutant 9-10A contains amino acid substitutions R47C, V78A, K94I, P142S, T175I, A184V, F205C, S226R, H236Q, E252G, R255S, A290V, and L353V.

TABLE 1

                 Number of      Active site
                 amino acid     amino acid          Ethane       Propane       Octane
Enzyme           substitutions  substitutions       Rate  ttn    Rate  ttn     Rate  ttn    % 2-octanol  % ee
Wild-type BM-3   -              -                   -     -      -     -       30    150    17           n.d.
9-10A            13             V78A                -     -      23    1100    540   3000   82           50 (S)
53-5H            15             V78F, A82S, A328F   0.4   50     370   5000    660   8000   89           65 (S)
35-E11           17             V78F, A82S, A328F   0.4   250    210   6000    420   8000   89           65 (S)

Table 2 provides NADPH oxidation rates of 53-5H and 35-E11 in the presence and absence of substrate. The coupling efficiency is estimated from product formation rate/NADPH oxidation rate. In general, NADPH oxidation rates cannot be measured on the same time scale as ethanol formation rates.

TABLE 2

                 53-5H                                35-E11
                 NADPH oxidation   Coupling           NADPH oxidation   Coupling
Substrate        rate (min-1)      efficiency (%)     rate (min-1)      efficiency (%)
no substrate     540               -                  390               -
ethane           660               0.06               520               0.08
propane          930               40                 800               26
octane           820               80                 660               64
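The coupling efficiencies in Table 2 follow from the definition given with that table: product formation rate divided by NADPH oxidation rate. A minimal sketch in Python, using the rates reported in Tables 1 and 2, is shown here together with a background-corrected total turnover calculation. The function names are illustrative assumptions, and the 51 µM product concentration used in the turnover example is a hypothetical value chosen only to show the arithmetic (the text reports "over 50 µM" ethanol from 200 nM enzyme and a background near 1 µM).

```python
def coupling_efficiency(product_rate, nadph_rate):
    """Percent of NADPH consumption that ends up as product.

    Both rates must be in the same units (here min^-1).
    """
    if not nadph_rate > 0:
        raise ValueError("NADPH oxidation rate must be positive")
    return 100.0 * product_rate / nadph_rate


def total_turnover_number(product_uM, background_uM, enzyme_uM):
    """Moles of product per mole of enzyme after background correction."""
    if not enzyme_uM > 0:
        raise ValueError("enzyme concentration must be positive")
    return (product_uM - background_uM) / enzyme_uM


# (product formation rate from Table 1, NADPH oxidation rate from Table 2), min^-1
RATES_53_5H = {"ethane": (0.4, 660), "propane": (370, 930), "octane": (660, 820)}
RATES_35_E11 = {"ethane": (0.4, 520), "propane": (210, 800), "octane": (420, 660)}

if __name__ == "__main__":
    for name, rates in (("53-5H", RATES_53_5H), ("35-E11", RATES_35_E11)):
        for substrate, (v_product, v_nadph) in rates.items():
            pct = coupling_efficiency(v_product, v_nadph)
            print(f"{name} {substrate}: {pct:.2g}% coupled")
    # Hypothetical illustration: 51 uM ethanol, 1 uM background, 0.2 uM enzyme.
    print(total_turnover_number(51.0, 1.0, 0.2))
```

Rounded, these ratios reproduce the coupling efficiencies listed in Table 2; for example, 370/930 gives about 40% for 53-5H on propane.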
Example 2

Methane Conversion

Methane hydroxylase variants were generated by an evolutionary strategy in which mutations were accumulated in multiple generations of random mutagenesis and screening, in essence adapting the enzyme to exhibit higher turnover on smaller and smaller alkanes. In the first generations, the activity of BM-3 towards octane was increased and a small but appreciable activity on propane was acquired. Further evolution led to a variant which is much more active towards propane and, significantly, produced more 1-propanol. The increase in the terminal hydroxylation product was particularly interesting because the propane terminal C-H bond energy (100.4 kcal/mol) is similar to the C-H bond energy of ethane (101.1 kcal/mol).

Continued directed evolution then led to the first ethane hydroxylating P450 BM-3 as described above. The ethane hydroxylating P450 BM-3 variant 35-E11 was obtained by improving the electron transfer in the triple active site variant 53-5H as described. Briefly, the active site mutations in 53-5H (V78F, A82S, and A328F) were obtained by screening a library containing all possible combinations of single-site mutations of a BM-3 variant previously evolved for high activity towards propane. The 15 mutations in 53-5H (12 are not in the active site) are located in the heme domain: R47C, V78F, A82S, K94I, P142S, T175I, A184V, F205C, S226R, H236Q, E252G, R255S, A290V, A328F, L353V. While not bound to a particular mechanism of action, it is believed that the active site mutations serve to decrease the size of the active site so that ethane is bound and hydroxylated. Mutant 53-5H has only limited activity towards ethane, producing only about 50 total turnovers before it is inactivated. Variant 35-E11 was obtained by modifying the reductase domain of 53-5H to more efficiently transfer electrons to the active site during catalysis. This variant contains two mutations, one in the linker region connecting the two domains (E464G) and one in the reductase domain (I710T), that improve coupling and increase the total turnovers on ethane to more than 250.

The 35-E11 variant was tested for the ability to convert methane to methanol using the method of Anthon and Barrett (Agric. Food Chem., 52:3749 (2004)). The method was optimized so that it could detect methanol concentrations in water as low as 10 µM. Using this assay the 35-E11 variant was identified as producing methanol from methane.

Libraries of mutants of 35-E11 were screened for phenotypes indicative of increased product formation on small gaseous substrates slightly larger than methane. Additionally, they were screened for increased coupling of the consumption of NADPH reducing equivalents to product formation. Product formation was determined by measuring total amount of dimethyl ether hydroxylation while increased coupling was determined primarily by decreased background NADPH consumption rates (measured spectrophotometrically at 340 nm in the absence of substrate).

Four libraries of mutants of 35-E11 were constructed. Two saturation mutagenesis libraries were prepared in 35-E11, in which all possible amino acid substitutions were introduced at the positions of two non-heme domain mutations, E464G and I710T. Additionally, two random mutagenesis libraries of 35-E11 were prepared by error-prone PCR such that each library contained an average of 1-2 amino acid mutations. One of these random mutagenesis libraries contained only mutations in the heme domain (residues 1 to about 455) while the other library contained mutations only in the linker between the two domains and the reductase domain (residues about 470 to 1048). These four libraries were screened as described above, and several improved variants (described below) were isolated, sequenced, and purified for further analysis.

The mutant BM-3 enzymes described below were purified using published procedures and used as catalysts in small reactions with methane. In glass head space vials containing magnetic stirring bars, BM-3 catalyst was added to 0.1 M potassium phosphate buffer (pH=8) saturated with methane. The vials were then crimped shut and charged to 20 psi with methane. To start the biotransformation, an NADPH regeneration system consisting of NADP+, isocitrate, and isocitric dehydrogenase was added through a syringe. The reaction was allowed to stir 12-15 hours. Methanol production was measured using a coupled enzyme assay. Aliquots of the reaction were transferred to 96-well microtitre plates and reacted with the enzyme alcohol oxidase for 5 minutes at room temperature. Alcohol oxidase converts the methanol produced by the BM-3 variants into formaldehyde which can be detected with the dye Purpald. To detect formaldehyde and quench the alcohol oxidase reaction, a solution of Purpald in 2 M NaOH was added and allowed to react for one hour. The purple color that Purpald forms in the presence of formaldehyde was quantified one hour later by spectrophotometrically reading the well at 550 nm and comparing the absorbance against similarly-treated solutions of BM-3, NADPH regeneration system, and methane-free buffer spiked with known concentrations of methanol. The results of these biotransformations are reported in Table 3 and sample spectra from the colorimetric assay are shown in FIG. 7.

As expected from their relative increase in activity towards the screening substrates, several of the mutants isolated from these libraries have increased activity towards methane. Three of these mutants were isolated from the saturation library of the linker position 464: E464R, E464Y, and E464T. None of these mutants exhibited higher activity towards the screening substrate DME, but all of them had a 30-40% lower NADPH background consumption rate in the screen, indicating that the roughly equivalent production of oxidized DME they produced compared to the parent 35-E11 required less NADPH. This increase in coupling on DME translated to an increase in TTN of methane (from 8 to 11-13). From the heme domain library, mutant 20-D3 was identified for its increased activity towards DME. Even though the background consumption rate of NADPH was increased by an equivalent amount in the screen, more of these reducing equivalents are transferred to the heme domain of this mutant in the presence of substrate, leading to an increase in methanol formation (from 8 to 10 TTN). This mutation, H285R, is located on the surface of the heme domain, near the putative interface between the heme and reductase domains. From the reductase domain library, two mutants, 23-1D and 21-4G, were identified for their decreased background NADPH consumption rates, leading to an increase in methanol formation. The mutations in these variants, Q645R/P968L and D631N, respectively, are located outside of the electron transfer pathway. Their presence, however, gives 35-E11 increased TTN on methane (from 8 to 10-11).

The methane hydroxylation activity reported here is not sufficiently high for a practical process for converting methane to methanol. TTN must be increased, and coupling of methanol production to consumption of the cofactor must be increased, or the enzyme will use too much. However, the methods of making mutations and screening the libraries described here can be used to further improve the activity, TTN and coupling.

Whole cell catalysis. Cell mass is generated using a carbon source such as glucose and nitrogen and other nutrients supplied to the cells. At an optimum cell density, expression of the P450 biocatalyst is induced using strategies well known in the art. After optimum biocatalyst expression levels are reached, nitrogen is removed from the medium to prevent further growth of the cell, ensuring that any additional carbon source supplied to the non-growing cells is converted into reducing equivalents for methanol production. Methane and oxygen are then supplied to the culture at saturating levels assisted by mixing with either propeller-type mixing or air lift mixing. As methanol is accumulated in the culture, portions of the culture are separated from the main reactor, centrifuged to remove cells (which are added back to the main reactor), and the supernatant treated to recover the methanol. The process can be constructed as either a batch or continuous process.

Specifically, the first demonstration of whole cell methanol production using a catalyst based on BM-3 or one of its homologs expressed in a laboratory strain of E. coli will occur in a small fermentor where oxygen and glucose levels can be easily monitored and adjusted. Oxygen levels are particularly important, because in the absence of oxygen, the facultative aerobe E. coli will begin to degrade glucose into ethanol and carbon dioxide. This causes the cell to up-regulate the production of alcohol dehydrogenases that will consume the trace levels of methanol produced by our catalyst. Additionally, the ethanol that accumulates from anaerobic metabolism interferes with our methanol detection assays. To perform the whole cell reaction in the fermentor, a 1 L culture of cells will be grown to high density (OD600 = 1-2) in Terrific Broth containing 1 µg/mL ampicillin (for retention of the pCWori plasmid containing the biocatalyst gene). Catalyst expression will

added to the vial. The vial was immediately topped with a septum, pressurized with 30 psi of methane and shaken at 4° C. for 1 hour. Then 30 µL of an NADPH regeneration system (1.66 mM NADP+, 167 U/mL isocitric dehydrogenase, 416 mM isocitrate) was added to initiate the reaction. For determination of total turnover, the reaction was allowed to proceed for 20 hours at room temperature.

The amount of methanol product formed during the 20 hr methane reaction was quantified colorimetrically with the use of alcohol oxidase and Purpald. 190 µL of the reaction mixture was transferred into a single well of a 96 well plate. 10 µL of a 0.05 U/µL solution of alcohol oxidase (from Pichia pastoris, purchased from Sigma-Aldrich) was added to the well and allowed to incubate at room temperature for 5 minutes. During this time, the alcohol oxidase converts the methanol into formaldehyde. Purpald (168 mM in 2 M NaOH) was then added to form a purple product with the formaldehyde in solution. The purple color was read approximately 1 hour later at 550 nm using a Spectramax Plus microtiter plate reader. The concentration of methanol was determined based on the measured absorbance in comparison to a methanol calibration curve. The methanol calibration curve was generated with several standards. Each standard contained 100 µL of a methanol solution (10, 20, 40, 60, 100 µM) and 90 µL of the methane reaction mixture described above, without the methane pressurization.

Table 3 provides amino acid substitution, screening and activity data for second-generation 35-E11 variants exhibiting methane hydroxylation activity. The relative dimethyl ether (DME) activity was measured, 1 hour after treatment with Purpald, as the average net absorbance at 550 nm from 8 microtitre plate DME reactions of the mutant compared to 8 equivalent reactions of 35-E11.
All values were be induced by the addition of 1 mM IPTG and 0.5 mM normalized by measuring the P450 concentration in each well Ö-aminolevulinic acid. After 12-24 hours, the culture will be with CO binding. “Asteriks' indicate criteria used to select centrifuged to collect the cells containing biocatalyst. After mutants from libraries for further characterization. The oxi removal of the supernatant, the cells will be resuspended and dation rate is measured as absorbance change at 340 nm over washed in a nitrogen free minimal medium to remove traces 40 1 minute upon the addition of NADPH to 8 microtitre plate of the terrific broth. The cells in nitrogen free minimal wells containing P450-containing cell lysate and compared to medium will then be added to a fermentor and fed 0.5% 8 equivalent reactions of 35-E11. All values were normalized glucose. During the reaction, additional glucose will be added by measuring the P450 concentration in each well with CO to maintain this level. In the fermentor, oxygen will be added 45 binding. The total turnover number (TTN) was measured as to the culture at saturating levels and will be monitored and the absorbance at 550 nm of methane reactions with each added during the reaction as needed to maintain these levels. mutant after treatment with alcohol oxidase and Purpald. To initiate the reaction, methane will be bubbled through the culture. The lack of nitrogen in the culture forces the cells into TABLE 3 50 a non-growing state, allowing most of the glucose aerobically Relative Relative consumed to be converted into NADPH for the biocatalyst.39 AA DME background An E. coli cell functioning as an aerobe also has a limited Substitutions activity in NADPH Methane ability to metabolize alcohols, including the methanol prod Mutant to 35-E11 SCCEl oxidation rate TTN uct.40 Methanol will be measured in this culture as described 55 for the in vitro reactions in this work. 
3S-E11 1 1 8 The present data demonstrates that methane hydroxylation 35-E11-E464R E464R O.9 0.6* 12 activity can beachieved through modification of various cyto 3S-E11-E464Y E464Y O.8 0.7% 13 chrome P450 enzymes by directed evolution. Since this 3S-E11-E464T E464T O.8 0.7% 11 enzyme can be expressed in heterologous hosts that do not 60 2O-D3 H28SR 1.7% 1.6 10 metabolize methanol as it is produced, the invention of this 23-1D Q645R, P968L 0.7 O.8% 11 biocatalyst also makes the production of methanol using a 21-4G D631N 1.1 0.7% 10 whole cell process possible. Reactions for enzyme-catalyzed conversion of methane to methanol were carried out in 10 mL. glass headspace vials. A 65 Table 4 provides an extensive list of amino acid substitu 470 uL reaction mixture containing 5 uM purified BM-3 tions in various parental P450 sequences (e.g., SEQID NO: 1. variant in 0.1 M potassium phosphate buffer (pH 8.0) was first 2, 3, 4, 5, and 6). US 8,715,988 B2 59 TABLE 4 Seq. ID Seq. ID Seq. ID Seq. ID Mutation No. 1 No. 2 No. 3 No. 4

Heme Domain Mutation in 35-E11       R47C     G48C     V48C     D49C
Domain                               P142S    P143S    A143S    P144S
                                     A186V
Heme Domain Mutation in 20-D3        H285R    D287R    E287R
Active Site Mutation in 35-E11
Linker Mutation in 35-E11 and RYT Derivatives
Reductase Domain Mutation in 21-4G
Reductase Domain Mutation in 23-1D
Reductase Domain Mutation in 35-E11
Reductase Domain Mutation in 23-1D
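The methanol quantification described for the methane reactions above reads a concentration off a linear calibration curve built from standards of 10, 20, 40, 60 and 100 µM. The following is a minimal sketch of that calculation, not the procedure actually used: the function names are hypothetical, the absorbance values are invented placeholders (no calibration readings are reported in the text), and a straight-line response over this range is an assumption.

```python
# Sketch of a linear methanol calibration, assuming absorbance at 550 nm
# responds linearly to methanol concentration over the standard range.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def methanol_um(a550, slope, intercept):
    """Invert the calibration line to get a concentration in micromolar."""
    return (a550 - intercept) / slope


# Standard concentrations as listed in the text (micromolar).
STANDARDS_UM = [10, 20, 40, 60, 100]
# Placeholder absorbances, invented for illustration only.
STANDARD_A550 = [0.09, 0.13, 0.21, 0.29, 0.45]

slope, intercept = fit_line(STANDARDS_UM, STANDARD_A550)
sample_um = methanol_um(0.25, slope, intercept)
print(f"slope={slope:.4f} A/uM, intercept={intercept:.3f}, sample={sample_um:.1f} uM")
```

The concentrations here are those of the standard solutions as listed; any dilution introduced when the standard is mixed with reaction mixture would have to be applied consistently to standards and samples.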

Example 3

Regiospecific Activity

Biocatalysis provides a useful alternative to classic chemical synthesis due to the inherent specificity of enzymes. Enantioselective catalysts that accept only a single enantiomer or regioisomer of a substrate have been created using directed evolution. The enzymatic, regio- and enantioselective functionalization of achiral compounds, however, is a greater challenge. In particular, the regio- and enantioselective hydroxylation of saturated hydrocarbons remains a problem in chemical synthesis.

As previously noted, cytochrome P450 BM-3 is a soluble protein that contains a hydroxylase domain and a reductase domain on a single polypeptide. Also as previously noted, cytochrome P450 BM-3 variants are provided that exhibit efficient alkane hydroxylase activity. One of these variants, 9-10A, was further modified at eleven active site residues that are in close contact to the substrate and screened for activity. Beneficial mutations were recombined, generating variants that catalyze the hydroxylation of linear alkanes with varying regioselectivities.

Mutant 9-10A exhibits high activity towards alkanes as small as propane. The alkane hydroxylation properties of this mutant are detailed in Tables 5 and 6.

In Table 5 the product distribution is determined as the ratio of a specific alcohol product to the total amount of all alcohol products (given in %). The product distribution for ketones was similar to alcohol product distribution. The numbers reported here are the (%) total of all ketones relative to total products (alcohols+ketones). The favored enantiomer is listed in parentheses.

TABLE 5

Mutant  Product    Propane  Hexane  % ee  Heptane  % ee  Octane  % ee  Nonane  % ee  Decane  % ee
9-10A   1-alcohol  8  0  1  1  0  1
        2-alcohol  92  6  14(S)  26  7(S)  53  50(S)  39  60(S)  16
        3-alcohol  95  41(S)  41  8(S)  20  59  32
        4-alcohol  33  26  3  51
        Ketones    <1  3  5  5  6

For the data shown in Table 6 the initial NADPH consumption rates were measured over 15 seconds at 340 nm as nmol NADPH/min/nmol protein. Octane reactions contained 100 nM P450, 166 µM NADPH, and 4 mM octane in 1% ethanol and potassium phosphate buffer. Propane reactions contained 200 nM P450, 166 µM NADPH, and propane-saturated potassium phosphate buffer. Errors are at most 10%. Initial rates were measured by GC over 1 minute as nmol total products/min/nmol protein. Octane reactions contained 100 nM P450, 500 µM NADPH, and 4 mM octane in 1% ethanol and potassium phosphate buffer. Propane reactions contained 1 µM P450, 500 µM NADPH, and propane-saturated potassium phosphate buffer. Coupling was determined by the ratio of product formation rate to NADPH consumption rate. Total turnover numbers were determined as nmol product/nmol enzyme. Octane reactions contained 10-25 nM P450, 500 µM NADPH, and 4 mM octane in 1% ethanol and potassium phosphate buffer. Propane reactions contained 10-25 nM protein, potassium phosphate buffer saturated with propane, and an NADPH regeneration system containing 100 µM NADP+, 2 U/mL isocitrate dehydrogenase, and 10 mM isocitrate.

TABLE 6

Mutant   Substrate   Rate of NADPH          Rate of Product       Coupling to   Total
                     Consumption (min⁻¹)    Formation (min⁻¹)     NADPH (%)     Turnovers
9-10A    octane      2600                   540                   21            3000
         propane     420                    23                    5             1100

As shown in Table 5 mutant 9-10A hydroxylates octane into a mixture of octanols and ketones. The distribution of these products is different from the distribution produced by wild type BM-3. 9-10A contains 14 mutations in the heme domain, only one of which, V78A, is located in the active site. This data indicates that active site changes, especially multiple changes, may alter product distributions and increase activity. For example, two other active-site mutations, A82L and A328V, shifted the regioselectivity of alkane hydroxylation predominantly to a single, subterminal (ω-1) position. Based on these results, active site variants were generated in order to obtain enzymes with altered regioselectivity of hydroxylation and altered substrate specificities.

Screening libraries by spectroscopically monitoring NADPH consumption rates in the presence of the alkane often selects for mutants with NADPH oxidation rates highly uncoupled to product formation. This can be overcome through the use of surrogate substrates that mimic desired substrates, yet yield species when hydroxylated that are easily detected either directly or with the addition of a dye. For example, the screen for the high-throughput identification of mutant BM-3 enzymes with activity towards propane as described previously uses dimethyl ether (DME) as a propane surrogate substrate. Upon hydroxylation, it decomposes into methanol and dye-sensitive formaldehyde. In addition, hexyl methyl ether (HME) may be used as a surrogate for octane. It shares most of the physical properties of octane, including solubility in buffer, size, shape, and strength of its C-H bonds. When HME is hydroxylated at its methoxy carbon, it also produces an equivalent of formaldehyde which is visually detected at concentrations as low as 10 µM with the dye Purpald (see FIG. 25). Hexanal, the aldehyde product formed when the α-carbon of the hexyl group is hydroxylated, is practically unreactive towards Purpald at some concentrations used in the present screens. Formaldehyde is approximately 42 times more sensitive to Purpald than hexanal (see FIG. 26). This disproportionate response makes HME, in contrast to the symmetric ether DME, a regio-selective screen for terminal alkane hydroxylation.

Small libraries of changes at each individual active site position in 9-10A were made to identify changes that either increase its activity towards alkanes or alter its regioselectivity. The 11 key active site residues were chosen for their proximity, i.e. within 5 Angstroms of the 10 carbon atoms closest to the heme cofactor, to the bound fatty acid substrates co-crystallized with BM-3 heme domain in published crystal structures. These active site residues are A74, L75, V78, F81, A82, F87, T88, T260, I263, A264, and A328. For example, crystal structures of wildtype P450 BM-3 with and without substrate reveal large conformational changes upon substrate binding at the active site. The substrate free structure displays an open access channel with 17 to 21 ordered water molecules. Substrate recognition serves as a conformational trigger to close the channel, which dehydrates the active site, increases the redox potential, and allows dioxygen to bind to the heme. In the present studies saturation mutagenesis libraries were made at each of these 11 positions. The libraries were screened for activity towards dimethyl ether (DME) and hexyl methyl ether (HME).

Twenty-one single active site mutants with increased activity towards either of these substrates were isolated. Mutations that improved the activity were found at amino acid positions L75 (I, W), A78 (T, F, S), A82 (T, S, F, I, C, G, L), F87 (I, V, L), T88 (C), T260 (N, L, S) and A328 (L, F, M). The single active site mutants of 9-10A were characterized by their altered regioselectivity towards octane. Each of these mutations produces a characteristic octanol product distribution as a result of their presence altering the active site geometry above the heme cofactor. These 21 different mutants and their regioselectivity towards octane are listed in Table 7. Transformations were carried out using cell lysates. The mutations were generated in variant 9-10A (with amino acid mutations R47C, V78A, K94I, P142S, T175I, A184V, F205C, S226R, H236Q, E252G, R255S, A290V, L353V). "Others" indicates overoxidation products (ketones and aldehydes) with a similar product distribution as the alcohols. "Rate" was measured as nmol octanol/nmol P450/minute. "TTN" was measured as nmol octanol/nmol P450 produced in a 4 hour reaction.

TABLE 7

mutation   1-ol [%]   2-ol [%]   3-ol [%]   4-ol [%]   Others   rate   ttn
9-10A      0          51         21         27         1        540    3000
L75I       3          42         8          17         20       210    4500
L75W       3          6          8          24         9        200    4000
A78T       4          22         27         45         2        500    4000
A78F       4          37         7          31         11       380    3000
A78S       5          48         8          20         9        260    2500
A82T       5          23         25         44         3        340    2700
A82S       6          40         9          26         9        300    2600
A82F       4          26         20         48         2        140    3000
A82I       4          11         21         59         5        600    3500
A82C       5          26         26         42         1        200    3000
A82G       7          52         7          23         1        360    2000
F87I       8          70         6          3          13       60     2600
F87V       5          52         3          4          26       240    3600
F87L       7          55         22         12         4        60     2300
T88C       4          51         21         22         2        460    4000
T260L      7          67         1          10         5        260    2500
T260S      6          39         26         29         0        160    3000
T260N      6          29         23         42         0        300    2500
A328F      6          88         4          0          2        40     2800
A328M      24         71         0          0          5        10     700
A328L      13         87         0          0          0        20     1000

The original amino acids and the beneficial mutations described above were recombined to form a library of mutants containing all possible combinations of these mutations. Mutations A82L and A328V were included in this library as they had proven to be important determinants of regioselectivity in previous experiments. Many of the active site mutations are too close to each other on the encoding gene such that standard methods such as DNA shuffling or StEP could not be used to recombine them. Therefore, the recombination library was assembled stepwise using designed oligonucleotide primers containing the mutations. Mutant 1-12G2 was used as a control in the screening procedure. A total of 16 mutants with improved activity towards DME or HME relative to 1-12G were selected for purification and further characterization. Octane was used as a model substrate for alkane hydroxylation and rates of product formation, total turnover number and regioselectivity were determined (Table 8). Mutations were found at amino acid positions 75, 78, 82, 87 and 328. While building the library ten improved members of a partial library were screened and sequenced. All possible amino acids at position 75 and 260 were identified, and four out of eight amino acids at position 78.

Table 8 provides a listing of mutations, product distribution, rates and total turnover numbers of octane hydroxylation reactions catalyzed by BM-3 active site mutants. Mutations are relative to variant 9-10A (described above). Wild-type has a valine at position 78. The product distribution is determined as the ratio of a specific alcohol product to the total amount of alcohol products (given in %). The formation of ketones was also observed but is less than 5% of the total amount of product. The initial rates of product formation were measured by GC over 60 s as nmol total products/min/nmol protein. The reactions contained P450 (100 nM), NADPH (500 µM), and octane (4 mM) in ethanol (1%) and potassium phosphate buffer (0.1 M, pH 8.0). The total turnover numbers are determined as nmol product/nmol protein. The reactions contained P450 (25 nM), NADPH (500 µM), and octane (4 mM) in ethanol (1%) and potassium phosphate buffer (0.1 M, pH 8.0).

TABLE 8

9-10A    L75  A78  A82  F87  A328    4-ol [%]  3-ol [%]  2-ol [%]  1-ol [%]  ttn  rate
9-10A    2  2  51  0  3000  540
77-9H    T  45  52  1300  160
1-7D     I  F  55  38  1300  220
68-8F    F  58  31  1400  180
49-9B    76  20  3800  310
35-7F    F  74  18  2000  200
1-5G     I  F  74  17  2200  380
13-7C    T  74  17  2600  310
49-1A    T  81  15  3700  350
2-4B     I  T  76  14  3900  320
12-10C   77  12  3100  240
11-8E    y  87  8  5700  550
41-5B    F  68  1800  370
29-3E    F  79  3200  160
7-11D    78  2200  150
29-10E   84  3200  140
53-5H    F  87  7400  660

The HME screening procedure was used to identify BM-3 variants with a regioselectivity shifted towards the terminal methyl carbon of the alkane chain. In general, mutants with high total turnover of HME in the screen produced more 1-octanol than 1-12G. The total amount of 1-octanol formed (percent of 1-octanol multiplied by ttn) correlates well with the amount of HME converted. For example, mutant 77-9H (see Table 8 above) is not improved relative to 1-12G when considering rates and total turnovers on octane, but this mutant converts octane to 52% 1-octanol, making it highly active towards HME in the screen. The proportion of 1-octanol in the product mixture varies from three to 51%. Single active site mutants that showed significantly different regioselectivities were recombined. Upon recombination and screening for shifts towards terminal hydroxylation all the identified variants are selective for the terminal methyl or penultimate subterminal methylene group. Screening the recombination library e.g. for hydroxylation of the third or fourth carbon of a linear alkane chain may identify variants that exhibit this specific regioselectivity.

The recombination library contains approximately 9,000 different mutants of P450 BM-3, each of which contains a characteristic active site. More than 70% of these were active towards DME or HME. Data provided herein indicate that mutations in the active site can yield variants that hydroxylate complex substrates such as 2-cyclopentyl-benzoxazole with high regio- and enantioselectivity, yielding potentially valuable intermediates for chemical synthesis (see example below).

Application of active site mutations to homologous proteins: Mutant 9-10A described above was acquired by accumulating point mutations in directed evolution experiments. An alternative method for making libraries for directed evolution to obtain P450s with new or altered properties is recombination, or chimeragenesis, in which portions of homologous P450s are swapped to form functional chimeras. Recombining equivalent segments of homologous proteins generates variants in which every amino acid substitution has already proven to be successful in one of the parents. Therefore, the amino acid mutations made in this way are less disruptive, on average, than random mutations. Recently, the structure-based algorithm "SCHEMA" was developed. The algorithm identifies fragments of proteins that can be recombined to minimize disruptive interactions that would prevent the protein from folding into its active form. P450 BM-3 and homologs thereof are soluble fusion proteins consisting of a catalytic heme-domain and an FAD- and FMN-containing NADPH reductase. They require dioxygen and an NADPH cofactor to catalyze substrate hydroxylation. However, the heme domain can utilize hydrogen peroxide via the peroxide "shunt" pathway for catalysis. While this peroxygenase activity is low in CYP102A1, it is enhanced by the amino acid substitution F87A. A similar effect has been shown for the equivalent F88A mutation in CYP102A2.

SCHEMA was used to design chimeras of P450 BM-3 and its homolog CYP102A2 sharing 63% amino acid sequence identity. Fourteen of the seventeen constructed hybrid proteins were able to fold correctly and incorporate the heme cofactor, as determined by CO difference spectra. The folded chimeras exhibited peroxygenase activity. A large library of chimeras made by SCHEMA-guided recombination of BM-3 with its homologs CYP102A2 and CYP102A3 contains more than 3000 new properly folded variants of BM-3. In addition to creating new biocatalysts, these chimeragenesis studies demonstrate that the homologous P450 enzymes CYP102A2 and CYP102A3 are similar enough to BM-3 to recombine in this fashion and still retain a high probability of folding.

Regioselectivity of variant 77-9H for the hydroxylation of n-alkanes: Mutant 77-9H with the highest selectivity for terminal hydroxylation was characterized in reactions with alkanes of chain length C6 to C10. The product distributions summarized in Table 9 show that mutant 77-9H hydroxylates only octane primarily at the terminal position. Alkanes shorter or longer than octane are hydroxylated mainly at the 2-position. Wildtype BM-3 hydroxylates lauric and palmitic acid at subterminal positions. Lauric acid is hydroxylated mainly (48%) at the ω-1 position, while palmitic acid is hydroxylated at the ω-2 position (48%). On fatty acids, mutant 77-9H shows a similar trend, with selectivity for specific subterminal positions (see Table 10). Residues that contact the terminal eight carbons of the substrate were altered based on the crystal structure with bound palmitoylglycine. Amino acid residues outside this region have not been modified, particularly those interacting with the carboxyl moiety.

TABLE 9

Substrate   1-ol [%]   2-ol [%]   3-ol [%]   4-ol [%]   5-ol [%]
hexane      11.4       83.8       4.9
heptane     21.3       74.3       4.2        0.2
octane      51.8       45.1       2.8        0.3
nonane      9.7        88.1       1.5        0.7        0.0
decane      5.5        91.8       2.0        0.7        0.0

Table 9 shows the product distribution determined as the ratio of a specific alcohol product to the total amount of alcohol products (given in %). The formation of ketones was also observed but is less than 10% of the total amount of product.

TABLE 10

Substrate       ω [%]   ω-1 [%]   ω-2 [%]   ω-3 [%]
lauric acid     7.5     52.1      2.8       37.5
palmitic acid   8.9     5.2       83.0      3.0

Table 10 shows product distributions for hydroxylation of fatty acids catalyzed by cytochrome P450 BM-3 77-9H. Product distribution was determined as the ratio of a specific hydroxylation product to the total amount of hydroxylation products (given in %).

P450 BM-3 mutant F87A has been reported to support ω-hydroxylation of lauric acid. However, data provided herein shows that the F87A mutation broadens regioselectivity and shifts hydroxylation away from the terminal position. In the present studies, the F87A substitution was engineered into wild-type P450 BM-3 and 9-10A. The distribution of alkane and fatty acid hydroxylation product regioisomers was analyzed. Neither variant supports terminal hydroxylation activity on alkanes (Table 11) or fatty acids (Table 12).

Table 11 shows product distributions for hydroxylation of alkanes catalyzed by cytochrome P450 BM-3 variants wt F87A and 9-10A F87A.

TABLE 11

Mutant       Substrate   1-ol [%]   2-ol [%]   3-ol [%]   4-ol [%]   5-ol [%]
wt F87A      hexane      0          54         46
             heptane     0          36         61         3
             octane      0          14         43         44
             nonane      0          7          20         2          72
             decane      0          10         9          0          81
9-10A F87A   hexane      0          42         57
             heptane     2          62         34         3
             octane      1          55         37         7
             nonane      0          24         46         5          25
             decane      0          14         23         1          62

Table 12 shows product distributions for fatty acid hydroxylation catalyzed by cytochrome P450 BM-3 variants wt F87A and 9-10A F87A.

TABLE 12

Mutant       Substrate       ω [%]  ω-1 [%]  ω-2 [%]  ω-3 [%]  ω-4 [%]  ω-5 [%]  ω-6 [%]  ω-7 [%]
wt-F87A      lauric acid     0  5  9  39  20  27  0
             palmitic acid   0  11  36  18  22  13  0
9-10A-F87A   lauric acid     0  5  6  28  14  47  0
             palmitic acid   0  0  1  1  2  8  20  6
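The text above estimates the total amount of 1-octanol formed as the percent of 1-octanol multiplied by ttn, and relates it to HME conversion in the screen. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the three input pairs are one reading of the partly garbled Table 8 rows (52% and 1300 for 77-9H, 38% and 1300 for 1-7D, 0% and 3000 for 9-10A), so they should be checked against the published table before being relied on.

```python
# Sketch: turnovers that end up as terminal alcohol, assuming the product
# fraction reported in percent applies uniformly over the whole reaction.

def terminal_alcohol_turnovers(percent_1_ol, ttn):
    """Return the share of total turnovers (ttn) that yield 1-octanol."""
    if not 0 <= percent_1_ol <= 100:
        raise ValueError("percent_1_ol must be between 0 and 100")
    return percent_1_ol * ttn / 100


# (percent 1-octanol, ttn on octane) as read from Table 8; see caveat above.
VARIANTS = {
    "9-10A": (0, 3000),
    "77-9H": (52, 1300),
    "1-7D": (38, 1300),
}

for name, (pct, ttn) in VARIANTS.items():
    amount = terminal_alcohol_turnovers(pct, ttn)
    print(f"{name}: {amount:.0f} turnovers to 1-octanol")
```

On these inputs 77-9H ranks above 9-10A despite its lower ttn, which is the point the text makes about variants selected by the HME screen.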

10 Total turnover numbers (ttn) and rates of product formation enzymes and nonnatural substrates, NADPH consumption is were measured for hydroxylation of the different alkanes by not necessarily coupled to the formation of product. Instead, 77-9H (see Table 13). The rates of product formation range electrons from NADPH reduce heme-bound dioxygen to from 112 min' (for heptane) to 742 min' (for nonane). The water or the reactive oxygen species peroxide and Superoxide. highest turnover rates of cytochromes P450 that carry out 15 To determine the coupling efficiency (the fraction of NADPH terminal hydroxylation reactions, however, are an order of used for product formation) of mutant 77-9H, rates of magnitude lower than those of 77-9H. Wildtype BM-3 NADPH oxidation upon addition of the alkane substrates hydroxylates alkanes at rates that are an order of magnitude were measured and compared to product formation rates lower (e.g., 30 min-1 for octane). To determine TTN, the (Table 13). The rate of product formation is correlated to enzyme was diluted to a concentration at which the P450 is coupling efficiency, i.e. the more the reaction is coupled to neither oxygen-limited nor inactivated due to dialysis of the NADPH oxidation, the higher the rate of product formation. flavins. The total turnover number was found to be substrate The highest coupling efficiency (66%) was observed for dependent and generally increases with the alkane chain Oa. length. Rates of substrate hydroxylation appear to follow the Linear alkanes are challenging to hydroxylate selectively same trend. It is well known that unproductive dissociation of 25 because they contain no functional groups that help fix their reduced oxygen species can lead to the formation of Super orientation for reaction with the active heme iron-oxo species oxide or peroxide, which inactivate the enzyme. Consistent of cytochrome P450. 
Model studies with metalloporphyrins with this is the fact that the enzyme supports the most turn show that, absent substrate binding effects, the regioselectiv overs on nonane, the Substrate for which the coupling effi ity of n-alkane hydroxylation is determined by the relative ciency is the highest (66%, Table 13). bond dissociation energies. Engineering a P450 for terminal TABLE 13 rate of product rate of NADPH Substrate tnal formation min oxidation min' coupling efficiency (%) hexane 18OO 130 251 - 29 1560 - 60 16.12.O heptane 1730 - 110 1129 1630 - 20 6.9 O.6 Octane 3040 150 236 - 24 163O3O 14.5 1.7 Ol8le 4240 680 74230 1150 100 65.6 9.3 decane 2831 - 225 318: 47 94OSO 33.47.2

For Table 13, total turnover numbers were determined as hydroxylase activity must therefore override the inherent nmol product/nmol protein. The reactions contained P450 (25 specificity of the catalytic species for the methylenegroups so nM), an NADPH regeneration system (166 uM NADP+, 6.6 that only the terminal methyl carbon is activated. U/mL isocitrate dehydrogenase, 41.6 mM isocitrate) and 45 Wildtype cytochrome P450 BM-3 hydroxylates its pre octane (4 mM) in ethanol (2%) and potassium phosphate ferred Substrates, medium-chain fatty acids, at Subterminal buffer (0.1 M. pH 8.0). The rates of product formation were positions. The terminal carbon is not hydroxylated. By measured by GC over 20s as nmol total products/min/nmol screening Saturation mutagenesis libraries at 11 active site protein. The reactions contained P450 (200 nM), NADPH positions and recombining beneficial mutations a set of P450 (500 uM) and octane (2 mM) in ethanol (2%) and potassium 50 BM-3 mutants that exhibit a range of regioselectivities for phosphate buffer (0.1 M, pH 8.0). The rates of NADPH oxi fatty acid and alkane hydroxylation are provided herein. One dation were measured over 20s at 340 nm as nmol NADPH/ mutant, 77-9H, hydroxylates octane at the terminal position min/nmol protein. The reactions contained P450 (200 nM), with 51% selectivity, but exhibits this selectivity only for NADPH (160 uM), and octane (2 mM) in ethanol (2%) and octane. The active site of 77-9H is therefore not restricted potassium phosphate buffer (0.1 M, pH 8.0). The background 55 near the activated oxygen to prevent Subterminal hydroxyla NADPH oxidation rate (without substrate) rate was 220/min. tion. The specificity for the terminal methyl group of octane The coupling efficiency is the ratio of product formation rate probably reflects specific interactions between active site to NADPH oxidation rate. 
Wildtype P450 BM-3 tightly regulates electron transfer from the cofactor (NADPH) to the heme. In the absence of substrate, a weakly-bound water molecule acts as the sixth, axial ligand of the heme iron. Substrate replaces this water molecule, perturbing the spin-state equilibrium of the heme iron in favor of the high-spin form, and also increases the heme iron reduction potential by approximately 130 mV. These events trigger electron transfer to the substrate-bound heme and start the catalytic cycle. However, with mutant BM-3

residues and octane methylene groups and/or the methyl group at the unreacted end. The hexyl methyl ether substrate used for screening terminal hydroxylation activity has the same chain length as octane. The mutations that enable terminal hydroxylation of HME likely act in a similar fashion to promote terminal hydroxylation of octane.

Recombining active site mutations for linear, terminal alkene epoxidation: The target reaction was the selective epoxidation of linear, terminal alkenes. The same saturation mutagenesis libraries were screened to identify single active site mutants that exhibit this activity. Upon recombination of beneficial active site mutations, two mutants were identified that epoxidize linear terminal alkenes with high regioselectivity and opposite enantioselectivity.

Potential uses of active site recombination variants for the regioselective hydroxylation of larger compounds: The target compound 2-cyclopentylbenzoxazole was hydroxylated by active site mutant 1-12G with high regio- and enantioselectivity (see Regio- and Enantio-Selective Alkane Hydroxylation with Engineered Cytochromes P450, PCT/US04/18832, incorporated herein by reference). The work presented herein demonstrates that active site mutants are well suited for the preparation of chiral intermediates for chemical synthesis. In this case, the hydroxylated product can be used as an intermediate in the synthesis of, for example, Carbovir, a carbocyclic nucleoside potentially active against human HIV.

The active site mutants 53-5H and 77-9H, as provided herein, were tested for the hydroxylation of 2-cyclopentylbenzoxazole. The regioselectivity for the formation of 3-(benzoxazol-2-yl)cyclopentanol was even higher than reported for 1-12G. The observed selectivities of 53-5H and 77-9H were 99.4% and 99.9%, respectively. The enantioselectivities were not as high, with measured ee values of 55.8% for 53-5H and 1.5% for 77-9H. However, mutants were selected for high regioselectivity and not enantioselectivity.

Direct conversion of gaseous alkanes, such as ethane or propane, to their corresponding alcohols: The mutants provided herein indicate that single active site mutants, and combinations thereof, may be used to generate active sites in P450 BM-3 capable of binding a diverse array of substrates, including gaseous alkanes.

The mutant with the highest activity on propane, 53-5H, was shown to catalyze thousands of turnovers of propane to propanol at a rate of 370 min⁻¹. This mutant also hydroxylates ethane to generate ethanol. 53-5H contains three active-site mutations, A78F, A82S, and A328F, all of which replace alanine with a larger side chain and presumably reduce the volume of the active site and position small alkanes above the heme during catalysis. In addition to its activity towards ethane, 53-5H exhibits the highest regioselectivity (89% 2-octanol) and enantioselectivity (65% S-2-octanol) towards octane of any BM-3 variant identified herein. This information further indicates that tighter substrate binding in the engineered active site occurs in the modified enzymes.

Dehalogenation of halogenated alkanes by P450 BM-3 mutants: The hydroxylation activity of BM-3 active site mutants on short- and medium-chain alkanes could be potentially useful to carry out the dehalogenation of halogenated alkanes. Accordingly, wild-type P450 BM-3 was used as the starting point for generating active site variants. Alanine at amino acid position 328 was substituted with asparagine, leucine, isoleucine, proline, and valine using standard DNA manipulation techniques. The amino acid sequence of the wild-type enzyme is provided in SEQ ID NO: 1 (see FIG. 8).

The dehalogenation reaction is based on the hydroxylation of the carbon atom carrying the halogen followed by a spontaneous (base catalyzed) alpha-elimination reaction which leads to the formation of an aldehyde or ketone depending on the starting haloalkane. The expected reaction mechanism for the dehalogenation of a general alpha-haloalkane is described by the scheme:

R-CH2-X  --([O], P450)-->  [R-CH(OH)-X] (unstable)  --(-HX)-->  R-CH=O

To test the feasibility of this approach, chloromethane was chosen as substrate for the dehalogenation reaction. Compared to other haloalkanes, chloromethane (CM) has the advantage of being moderately soluble in water (0.5 g/100 ml) and of leading to formaldehyde as end product of the hydroxylation/alpha-elimination reaction. Formaldehyde can then be readily detected by colorimetric reaction with Purpald as described previously. To test the reaction, two BM-3 mutants from later rounds of random mutagenesis, 35-E11 and 20-D3, were incubated in the presence of CM and a cofactor regeneration system based on NADP+, glucose-6-phosphate and glucose-6-phosphate dehydrogenase. Reactions with wild-type P450 BM-3 and without enzyme were included as controls. A specific colorimetric response (see FIG. 27A) was observed only in the presence of the BM-3 mutants.

The reaction was repeated in the presence of additional BM-3 mutants, including 1-8H, 1-12G2 and the wild-type P450 BM-3 in which A328 was substituted with valine (BM-3-A328V) (FIG. 27B). The total turnover number of BM-3-A328V on CM is considerably higher than that of the other mutants considered in the study, while the wild-type enzyme exhibits no detectable activity (FIG. 28). Wild-type BM-3 was also substituted with other amino acid residues at position 328. These mutants, however, are much less active than BM-3-A328V on CM (FIG. 29).

The activity of BM-3-A328V on CM and other substrates such as HME, DME, and propane was further investigated by measuring product formation rate, cofactor consumption and coupling efficiency (Table 14). This mutant was also found to be capable of hydroxylating ethane to ethanol, although only for a total of about 5-10 turnovers. In Table 14, "consumption" rates are expressed as nmol NADPH/nmol enzyme/min and "product formation" rates are expressed as nmol product/nmol enzyme/min.

TABLE 14

rates                no substrate   HME    DME    Propane   CH3Cl   ethane
NADPH consumption    19.9           36.3   12.5   n.d.      16.2    n.d.
product formation                   4.3    3.0    7.0       1.1     n.d.
coupling (%)                        11.8   24.2   8.0       6.8     n.d.
TTN                                 n.d.   n.d.   2300      130     7

It is envisioned that the type of dehalogenation reaction described for chloromethane is extendable to other halomethanes such as bromo- and iodomethane as well as longer haloalkanes (e.g., haloethane, 1- and 2-halopropane). The resulting haloalcohols are reactive intermediates which can form the corresponding carbonyl compounds (aldehydes, ketones) or epoxides depending on the regioselectivity of the hydroxylation reaction and the nature of the starting compounds (FIG. 30). These reactions are useful for bioremediation as well as for synthetic purposes.
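The coupling efficiencies in Table 14 can be cross-checked from the two rate rows, since coupling is the ratio of product formation rate to NADPH consumption rate. The following is a minimal sketch rather than part of the disclosed method. It assumes that the first NADPH value in the table belongs to a no-substrate control and that the remaining values line up with HME, DME, propane, CH3Cl and ethane in that order; the function and dictionary names are illustrative only. Under that assumption the sketch gives about 11.8% for HME and 6.8% for CH3Cl, matching the table, and 24.0% for DME against the tabulated 24.2%, a gap that is presumably due to rounding of the tabulated rates.

```python
# Rates transcribed from Table 14 (BM-3-A328V), in nmol per nmol enzyme per min.
# ASSUMPTION: the substrate-to-column assignment is inferred from the flattened
# table; only substrates with both rates reported are included.
RATES = {
    "HME": {"nadph": 36.3, "product": 4.3},
    "DME": {"nadph": 12.5, "product": 3.0},
    "CH3Cl": {"nadph": 16.2, "product": 1.1},
}


def coupling_percent(product_rate, nadph_rate):
    """Coupling efficiency: percentage of consumed NADPH that ends up in product."""
    if nadph_rate <= 0:
        raise ValueError("NADPH consumption rate must be positive")
    return 100.0 * product_rate / nadph_rate


if __name__ == "__main__":
    for substrate, rates in RATES.items():
        value = coupling_percent(rates["product"], rates["nadph"])
        print(f"{substrate}: {value:.1f}% coupling")
```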

-continued

#    Primer name     (SEQ ID NO:)   Sequence
74   E1037NNK for    (119)          5'-CAG CAG CTA GAA NNK AAA GGC CG-3'
75   E1037NNK rev    (120)          5'-CGG CCT TTM NNT TCT AGC TGC TG-3'
76   BMfor1504       (121)          5'-GCA GAT ATT GCA ATG AGC AAA GG-3'
77   BMrev1504       (122)          5'-CCT TTG CTC ATT GCA ATA TCT GC-3'
78   BMfor2315       (123)          5'-CGG TCT GCC CGC CGC ATA AAG-3'
79   BMrev2315       (124)          5'-CTT TAT GCG GCG GGC AGA CCG-3'
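The forward and reverse primers listed above come in complementary pairs, and the NNK primers carry a degenerate codon (N = A/C/G/T, K = G/T, with M = A/C as the complement of K). The short sketch below is only an illustrative consistency check under standard IUPAC nucleotide codes, not a procedure taken from this document; the helper names are hypothetical. It verifies that a reverse primer is the reverse complement of its forward partner and enumerates the codons that NNK can encode.

```python
from itertools import product

# IUPAC complements for the symbols that occur in the primers above.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N", "K": "M", "M": "K"}

# Bases that each (possibly degenerate) symbol stands for.
EXPAND = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "M": "AC"}

STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of a primer; spaces between triplets are ignored."""
    bases = seq.replace(" ", "").upper()
    return "".join(COMPLEMENT[b] for b in reversed(bases))


def expand_codon(codon):
    """List every concrete codon encoded by a degenerate codon such as NNK."""
    return ["".join(p) for p in product(*(EXPAND[b] for b in codon.upper()))]


if __name__ == "__main__":
    forward = "CAG CAG CTA GAA NNK AAA GGC CG"   # E1037NNK forward primer
    reverse = "CGG CCT TTM NNT TCT AGC TGC TG"   # E1037NNK reverse primer
    print(reverse_complement(forward) == reverse.replace(" ", ""))
    codons = expand_codon("NNK")
    print(len(codons), [c for c in codons if c in STOP_CODONS])
```

Restricting the third position to G or T is what keeps the library small: NNK spans 32 codons instead of 64 while retaining only one stop codon (TAG).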

Construction of thermostability library (HL1): L52I, S106R, M145A, L324I, V340M, I366V, E442K substitutions were introduced in 35E11 sequence by PCR overlap extension mutagenesis. Multiple mutants were prepared by gene SOEing using PCR products from single mutants. pCWori 35E11 was used as template, BamHI fwd and SacI rev as megaprimers, and #1 to #14 as mutagenizing primers. The amplified region (1.5 Kbp) was cloned in pCWori 35E11 using BamHI and SacI restriction enzymes.

Construction of random mutagenesis library (HL2): Random mutagenesis library of ETS8 heme domain (BamHI-SacI region) was created by error-prone PCR using Taq polymerase (Roche, Indianapolis, Ind.), pCWori ETS8 as template, BamHI fwd and SacI rev primers, and different concentrations of MnCl2 (50, 100, 150, 200, 300 µM). Mutation rate was estimated by sequencing of 8 randomly chosen colonies.

Construction of active site libraries (HL3): Site-saturation (NNK) libraries of positions 437 and 438 were prepared by PCR overlap extension mutagenesis using pCWori 19A12 as template, BamHI fwd and EcoRI rev megaprimers, and #17 to #20 primers. Amplified region (~3.2 Kbp) was cloned in p19A12 using BamHI and EcoRI restriction enzymes. Site-saturation (NNK) libraries of positions 74, 75, 78, 82, 87, 88, 181, 184, 188, 260, 264, 265, 268, 328, and 401 were prepared using BamHI fwd and SacI rev megaprimers, and #22 to #53 primers. The amplified region (1.5 Kbp) was cloned in pCWori 35E11 using BamHI and SacI restriction enzymes.

Construction of heme recombination library (HL4): Recombination library was prepared by recombining single mutants (e.g., A74S, A74Q, V184S, V184T, V184A) from HL3 libraries by SOEing. Sequences of single point mutants from RF-g3 generation served as templates. BamHI fwd and SacI rev were used as megaprimers. The amplified region (1.5 Kbp) was cloned in pCWori 35E11 using BamHI and SacI restriction enzymes.

Construction of heme saturation-recombination libraries (HL5): These libraries were prepared by SOEing using gene fragments containing S/NNK in position 74, S/G in position 82, S/T/NNK in position 184 in combinatorial manner. To introduce the 82G mutation, primers #54 and #55 were used. To randomize positions 74 and 184, primers #26 and #27 and primers #34 and #35, respectively, were used. To introduce the remaining mutations, plasmids of the corresponding variants were used as templates. For multiple mutants, a second PCR was carried out using purified product from the first PCR as template. BamHI fwd and SacI rev served as megaprimers. The amplified region (1.5 Kbp) was cloned in pCWori 35E11 using BamHI and SacI restriction enzymes.

Construction of reductase domain random mutagenesis library (RL1-2). Random mutagenesis libraries of 35E11 FMN-binding domain (432-720) and FAD-binding domain regions (724-1048) were created by error-prone PCR using Taq polymerase (Roche, Indianapolis, Ind.), pCWori 35E11 as template, FMN for/FMN rev and FAD for/FAD rev primer pairs, and different concentrations of MnCl2 (100, 200, 300 µM). The amplified region (1.9 Kbp) was cloned in pCWori 35E11 using SacI and EcoRI restriction enzymes. Mutation rate was estimated by sequencing of 8 randomly chosen colonies.

Construction of reductase domain saturation mutagenesis library (RL3-4). Site-saturation (NNK) libraries of positions 443, 445, 515, 580, 654, 664, 698, 1037 were prepared by PCR overlap extension mutagenesis using pCWori 11-3 as template, FMN for and EcoRI rev megaprimers, and #60 to #75 primers. Amplified region (1.9 Kbp) was cloned in p11-3 using SacI and EcoRI restriction enzymes.

Construction of domain recombination library (L9). A FMN/FAD recombination library was prepared by SOEing, recombining G443A, P654K, T664G, D698G, and E1037G mutations from RDL3 library, primers #76 to #79, and FMN for and EcoRI rev as megaprimers. The reductase domain library was combined with the 7-7 heme domain by cloning the amplified region (1.9 Kbp) in pCWori 7-7 using SacI and EcoRI restriction enzymes.

High-throughput screening. Single colonies were picked and inoculated by a Qpix robot (Genetix, Beaverton, Oreg.) into 1-mL deep-well plates containing LB (Amp+) medium (400 µL). The plates were incubated at 30°C, 250 rpm, and 80% relative humidity. After 24 h, 50 µL from this pre-culture were used to inoculate 2-mL deep-well plates containing 900 µL TB (Amp+) medium per well. Cultures were grown at 37°C, 220 rpm, and 80% relative humidity. When the OD at 600 nm reached 1.5, 50 µL TB medium containing 10 mM IPTG and 10 mM δ-aminolevulinic acid hydrochloride was added to the cells, followed by growth at 30°C. After 24 h, plates were centrifuged at 5000 rpm and stored at -20°C. Cells were resuspended in 500 µL potassium phosphate (KPi) buffer (0.1 M, pH 8.0) containing 0.5 mg/mL lysozyme, 2 units/mL DNaseI, and 10 mM MgCl2. After incubation at 37°C for 60 min, lysates were centrifuged at 1500 g for 10 minutes. Screening of dimethyl ether (DME) activity was carried out by transferring 20 µL supernatant to 96-well microtiter plates, adding 130 µL DME-saturated KPi buffer and starting the reaction with 50 µL NADPH (1.0 mM). Formaldehyde produced in the reaction was determined by addition of Purpald (168 mM in 2 M NaOH) and by monitoring absorbance at 550 nm using a Spectramax Plus microtiter plate reader. Mutants with improved DME activity were re-screened in 96-well plates using 8 replicates for each positive clone.

Protein expression and purification. All media and agar were supplemented with 100 µg/ml ampicillin. E. coli DH5α cells were transformed with plasmids encoding wild-type P450BM3 gene and mutants thereof and grown in TB medium (Amp+). At OD600 = 1.5, cultures were induced with δ-aminolevulinic acid hydrochloride (ALA; 0.5 mM) and IPTG (0.5 mM) and grown at 30°C. Cells were harvested after 24 hours by centrifugation at 4000 g. Cell pellets were resuspended in 25 mM Tris pH 8.0 and lysed by sonication. After centrifugation, the supernatant was applied to an ion-exchange column (Toyopearl Super Q-650 resin). After the column was washed with 3 column volumes of 25 mM Tris pH 8.0 and 3 column volumes of 25 mM Tris 150 mM NaCl pH 8.0, bound protein was eluted with 25 mM Tris 340 mM NaCl pH 8.0. P450-containing fractions were collected and concentrated using a Centriprep 30 (Millipore). The concentrated protein was desalted using a PD-10 desalting column (Pharmacia) and 100 mM KPi pH 8.0 as running buffer. P450 concentration in the purified sample was measured in triplicate from the CO-difference spectra. Protein samples were aliquoted and stored at -80°C.

T50 determination. A solution of purified enzyme (1-3 µM) in 100 mM KPi pH 8.0 buffer was transferred to a PCR 96-well plate and incubated for 10 minutes at different temperatures (from 37°C to 60°C) in a PCR thermocycler. The plate was then cooled on ice and centrifuged at 1500 g for 10 min. 160 µL of the protein solution was then transferred to a 96-well microtiter plate with 40 µL of sodium hydrosulfite 0.1 M in 1 M KPi pH 8.0. A protein sample incubated at room temperature and one incubated on ice were included in the series. Concentration of folded protein was then determined by measuring CO-binding difference spectra using absorption at 450 and 490 nm and extinction coefficient ε(450-490) = 91,000 M⁻¹ cm⁻¹. CO-binding signals were normalized to protein concentration of the sample incubated at 4°C. T50 values were calculated as midpoint inactivation temperatures from nonlinear fitting to the heat-inactivation curves. Experiments were carried out in triplicate.

Determination of NADPH oxidation rates. Initial rates of NADPH oxidation were measured using propane-saturated 100 mM KPi pH 8.0 buffer, purified enzyme at 50-200 nM, and a saturating concentration of NADPH (200 µM). The absorbance decrease at 340 nm was monitored at 25°C in a BioSpec-1601 UV/VIS spectrophotometer (Shimadzu, Columbia, Md.) for 1 min. Rates were calculated over the first 20 seconds using NADPH extinction coefficient ε(340) = 6210 M⁻¹ cm⁻¹. Rates were determined in triplicate. Coupling values were calculated from the ratio of propanol formation rate to NADPH oxidation rate in the presence of propane.

Reactions on alkanes. Reactions with alkanes from pentane to decane were carried out at 3 mL-scale using 100 mM KPi pH 8.0 buffer containing 2% ethanol (to allow alkane solubilization) and purified protein. To 2.5 mL KPi buffer were added 60 µL of 90 mM alkane solution in ethanol (final conc. = 1.8 mM), 150 µL enzyme solution at 1 µM (final conc. = 50 nM), and 300 µL NADPH regeneration solution (100 units/mL isocitrate dehydrogenase, 70 mg/mL isocitrate, 1.5 mg/ml NADP+). Vials were sealed and stirred for 24 hours at 4°C. A 1 mL-aliquot of the reaction was removed, 10 µL internal standard (25 mM 1-bromohexane in ethanol) was added, and the mixture was extracted with 200 µL chloroform (30 sec vortexing, 1 min centrifugation at 10000 g). The organic phase was removed and analyzed by gas chromatography. Reactions with ethane, propane, and butane were carried out as described for longer alkanes with the only difference that alkane-saturated KPi buffer was used. Propane reactions with variants from HL5 and L9 libraries were performed on 1 mL-scale reaction using 20 nM enzyme to reduce oxygen limitation. Propane and butane reactions were analyzed by direct injection of reaction solution on the GC using 1-pentanol as internal standard. Ethane reaction products were functionalized (see below) and analyzed by GC using 1-butanol as internal standard. All measurements were performed in triplicate.

Reactions on laurate. P450PMO activity on laurate was tested by mixing the substrate (2 mM) with purified protein (up to 0.5 µM) in 100 mM KPi pH 8.0 buffer, and adding NADPH to a final concentration of 1 mM. Reaction mixtures were incubated for 5 hours at room temperature under gentle stirring. Analysis of the hydroxylated products was carried out by derivatization with N-methyl-N-trimethylsilylheptafluorobutyramide (Aldrich) followed by GC analysis according to a published procedure. A control experiment was carried out using wild-type P450BM3. From the reaction with wild-type P450BM3, multiple hydroxylation products were detected, whereas no hydroxylation product could be detected from P450PMO reactions.

Rate measurements. Initial rates of product formation were determined at 25°C in 100 mM KPi pH 8.0 buffer using purified protein (100 nM) and a substrate concentration of 1.6 mM. In a 5 mL-vial, 20 µL of alkane solution (in ethanol) were dissolved in 400 µL KPi buffer, to which 500 µL enzyme solution at 100-200 nM was added. Reaction was initiated by adding 100 µL of 5 mM NADPH solution, followed by stirring at 300 rpm for 1 minute, and quenched by adding 200 µL chloroform and stirring at 800 rpm for 10 sec. Solutions were transferred to an Eppendorf tube, to which internal standard (10 µL 25 mM 1-bromohexane in ethanol) was added. Tubes were vortexed for 10 sec and centrifuged for 30 sec at 10,000 g. Organic phase was removed and analyzed by GC. For propane, reactions were carried out in a 5 mL-scale using 100-400 nM purified enzyme, propane-saturated buffer (propane concentration 1.6 mM) and a final NADPH concentration of 500 µM. Reactions were stopped by addition of 200 µL conc. H2SO4 and analyzed by derivatization to alkyl nitrites followed by GC-ECD analysis as described below.

Kinetic measurements. A propane concentration range from 1.6 mM to 50 µM (6 data points) was prepared by dilution of a propane-saturated buffer solution (propane solubility in water = 1.6 mM). Reactions were carried out in a 5 mL-scale using 100-400 nM purified enzyme and a final NADPH concentration of 1 mM. After 30 seconds, reactions were stopped by addition of 200 µL conc. H2SO4 and analyzed by GC as described below. Propane leakage during the experiment was neglected because of the short measuring time. All measurements were performed in triplicate. Kinetic parameters were estimated from least-squares nonlinear regression to the Michaelis-Menten equation,

$$v = \frac{V_{\max}\, S}{K_m + S}$$

where v is the initial velocity, S is the substrate concentration, V_max is the maximum velocity, and K_m is the apparent Michaelis constant for the substrate.

Products from alkane reactions were identified and quantified by gas chromatography using authentic standards and calibration curves with internal standards. For the reactions with alkanes from pentane to decane, GC analyses were carried out using a Shimadzu GC-17A gas chromatograph, Agilent HP5 column (30 m × 0.32 mm × 0.1 µm film), 1 µL injection, FID detector, and the following separation program: 250°C inlet, 250°C detector, 50°C oven for 3 min, 5°C/min gradient to 100°C, 25°C/min gradient to 230°C, and 230°C for 3 min. For butane and propane (TTN) reactions, GC analyses were carried out using a Hewlett-Packard 5890 Series II Plus gas chromatograph, Supelco SPB-1 column (60 m × 0.52 mm × 0.5 µm film), 0.5 µL injection, FID detector, and the following separation program: 250°C inlet, 275°C detector, 80°C oven for 2 min, 10°C/min gradient to 110°C, 25°C/min gradient to 275°C, and 275°C for 2.5 min. For ethane and propane (rate) reactions, analysis of reaction mixtures was carried out by derivatization to alkyl nitrites followed by GC-ECD analysis. For laurate, reaction product analysis was carried out by derivatization with N-methyl-N-trimethylsilylheptafluorobutyramide followed by extraction with hexane. Organic solutions were analyzed by FID-GC.

Halomethane reactions: Reactions with chloromethane, bromomethane, and iodomethane were carried out in 1 mL scale using halomethane-saturated buffer (100 mM phosphate pH 8.0) and 50-500 nM purified protein. Protein was added to 900 µL buffer, the mixture was incubated for 15 seconds, and 100 µL NADPH regeneration solution (100 units/mL IDH, 70 mg/mL isocitrate, 1.5 mg/ml NADP+) was then added. Solutions were sealed and stirred for 30 minutes at 25°C. To reduce background levels of formaldehyde to a minimum, buffers were prepared using HPLC-grade water and autoclaved twice prior to use. Formaldehyde concentration in the reaction solutions was determined through derivatization with o-(2,3,4,5,6-pentafluorobenzyl)hydroxylamine hydrochloride (PFBHA) and gas chromatography. Reaction solutions (1 mL) were transferred to Eppendorf tubes and 50 µL of PFBHA (Fluka) in formaldehyde-free water (17 mg/mL) was added. Tubes were incubated for 15 minutes at 25°C (1000 rpm) in a thermoblock. To the solutions were added 10 µL internal standard (25 mM 1-bromohexane) and, after mixing, 500 µL hexane; tubes were vortexed for 10 sec and centrifuged at 13,000 rpm for 30 sec. Organic phase was removed, diluted 1/100 v/v, and analyzed on a Shimadzu GC-17A gas chromatograph. Analytical conditions were: Agilent HP1 column (30 m × 0.53 mm × 2.65 µm film), 1 µL injection, ECD detector. The separation program was: 70°C inlet, 250°C detector, 45°C oven for 1 min, 15°C/min gradient to 200°C, 50°C/min gradient to 220°C, and 220°C for 3 min. Under these conditions, derivatized HCHO (PFBHA carbonyl oxime) elutes at 6.35 min, unreacted PFBHA at 6.88 min. Quantification was carried out using a calibration curve and solutions at known concentration of formaldehyde. Rates were estimated from 30 minute-reactions. Measurements were performed in triplicate.

In multi-component redox systems such as soluble methane monooxygenase and class I cytochrome P450s, the rate of product formation and the extent of coupling efficiency in product oxidation is profoundly influenced by the relative ratio of the individual components in the in vitro reconstituted systems. With sMMO from Methylosinus trichosporium OB3b, product yield in propene oxidation varies from 10% to ~100% as the relative concentration of MMOB and MMOR (with respect to MMOH) is changed. With sMMO from Methylococcus capsulatus (Bath), coupling in methane oxidation varies according to hydroxylase:reductase ratio, with significant uncoupling occurring at stoichiometric concentrations. This relationship was also found to change with the nature of the substrate. Similar observations were made with P450cam, a prototypical member of class I P450 systems. When putidaredoxin is added in large stoichiometric excess as compared to the other redox partners, up to 10-fold enhancements of P450cam catalytic rates can be achieved. The kcat/Km for oxygen consumption in P450cam was also found to be hyperbolically dependent on putidaredoxin concentration. Belonging to prokaryotic class II P450 systems, P450BM3 incorporates the hydroxylase domain and the reductase partners in a single polypeptide. The catalytic function of P450BM3 is finely regulated through conformational re-arrangements in the heme and reductase domains, possibly involving hinged domain motions. This system enables an easier and unbiased analysis of the evolution of the engineered function without the confounding effects of arbitrarily chosen parameters (e.g. redox system composition), whose role in the natural evolution of enzymes remains unclear.

P450BM3 fold comprises a heme-, an FMN-, and a FAD-binding domain. Besides the heme domain, mutations in the reductase domains and linker regions are also known to affect the catalytic properties of the enzyme. In order to identify beneficial mutations in each of these enzyme regions, a domain-based engineering strategy was employed (FIG. 35C). While the heme domain was being optimized for propane activity, beneficial mutations in the FMN- and FAD-binding domains were first identified (using 35E11 as background) and subsequently optimized (using heme domain variant 11-3 as background). In the last step, activity-enhancing mutations in the reductase domains were recombined and joined to the heme domain of the most active heme domain variant (7-7). The overall directed evolution process that led to the development of P450PMO from wild-type P450BM3 is summarized in FIG. 35A.

The engineering strategy that led to the development of P450PMO is summarized in FIG. 35. As a first step, we aimed at increasing the thermostability of 35E11 heme domain in order to obtain a more evolvable parent. This choice was motivated by the repeated failure of random and focused mutagenesis libraries to yield improved variants starting from this enzyme and by the analysis of the stability trend from wild-type to 35E11, which demonstrated a considerable reduction in stability as a consequence of the accumulated mutations (FIG. 37E). To regain stability and allow further evolution, stabilizing mutations, derived from a thermostable peroxygenase, were tested singly and in combination (Table 16). The effect of each single mutation or combination thereof on thermostability was evaluated by monitoring temperature-dependent inactivation of the enzyme's CO-binding (FIG. 36C). Among 35E11 derivatives, ETS8 was selected as the best candidate for further evolution experiments, in that its higher thermostability (ΔT50 = +5.1°C) was accompanied by only a 20% decrease in propane activity. Thus, almost half of the stability loss from wild-type to 35E11 (ΔT50 = -11.6°C) could be recovered in a single step and with only two mutations (L52I, I366V). ETS8 was then used as parent for generating a set of heme domain random mutagenesis libraries with mutation rates varying from 2.0 to 3.5 bp mutations/gene. Screening of 3000 members led to two improved variants, 19A12 and 12G7, each carrying a single amino acid mutation (L188P and D182Y, respectively), both located in helix F of P450BM3 heme domain (FIG. 35B). 19A12 served as parental sequence for the preparation of a pool of active site libraries, where 17 positions within and in proximity of the enzyme's active site were targeted for saturation mutagenesis (FIG. 39). These positions include residues 74, 75, 78, and 82 in helix B1', 87 and 88 in the helix B1'-helix C linker, 181, 184 and 188 in helix F, 260, 264, 265, and 268 in helix I, 328 in helix K/β-2 linker, 401 in helix L, and 437 and 438 in the β-4 hairpin. Eleven of these positions were already saturated in a previous round (J→35E11). From the active site libraries, further improvement of propane-hydroxylating activity was achieved in multiple variants, exemplified by variant 11-3. By recombination of the beneficial mutations in these variants, a remarkably more active variant was obtained, i.e. 1-3. Analysis of the mutational pattern in the selected variants from this and the previously screened active site libraries revealed identical positions exploring alternative or novel solutions, suggesting that fine tuning of the active site after each mutational step may be required for best function on propane. To account for this, a recombination/site-saturation strategy was employed using 1-3 as parent and targeting positions 74, 82, and 184. From these libraries, 7-7 emerged as the most active P450 variant, with three different solutions for each of the targeted positions as compared to 1-3. 7-7 is capable of catalyzing ~20,500 turnovers on propane, showing a kcat of 185 min⁻¹ and coupling efficiency of about 90% for propane hydroxylation. A list of all the amino acid mutations present in P450PMO and its intermediates is reported in Table 17.

TABLE 16

Thermostabilization of 35E11. S106R and M145A mutations were introduced in ETS2 and ETS7, respectively, resulting in unfolded enzymes. These mutations were therefore not considered for recombination.

Variant   Mutations marked (among L52I, L324I, V340M, I366V, E442K)   T50 (°C)      ΔT50 (°C)
35E11                                                                  43.4 ± 0.6    0
ETS1      X                                                            44.5 ± 0.3    1.1
ETS3      X                                                            43.2 ± 0.1    -0.3
ETS4      X                                                            46.0 ± 0.1    2.6
ETS5      X                                                            47.1 ± 0.1    3.7
ETS6      X                                                            45.0 ± 1.4    1.6
ETS8      X X                                                          48.5 ± 0.2    5.1
ETS9      X X                                                          46.8 ± 0.6    3.4
ETS10     X X                                                          44.2 ± 0.1    0.8
ETS11     X X                                                          46.6 ± 0.2    3.2
ETS12     X X X                                                        misfolded     -
ETS13     X X                                                          misfolded     -
ETS14     X X X                                                        46.1 ± 0.6    2.7
ETS15     X X X X                                                      47.8 ± 1.3    4.4
ETS16     X X X                                                        45.9 ± 0.1    2.5
ETS17     X X X X                                                      47.2 ± 0.2    3.8
ETS18     X X X                                                        45.5 ± 0.2    2.1
ETS19     X X X X                                                      45.6 ± 0.3    2.2
ETS20     X X X                                                        44.6 ± 0.2    1.2
ETS21     X X X X                                                      45.8 ± 0.1    2.4
ETS22     X X X X X                                                    47.0 ± 0.3    3.6

TABLE 17

Amino acid mutations in the P450 variants.

P450 variants (columns): 139-3, J, 35E11, ETS8, 19A12, 11-3, 1-3, 7-7, P450PMO

Position   Substitutions as legible (column positions not preserved; ? = illegible entry)
R47        C C C C C
L52        I I I I
A74        E E
V78        A F F F F F
A82        S S S G G
K94        I I I I I
H138
P142
T175       I I I I ? ? ? ?
V178
A184       V V V
L188
F205
S226
H236
E252
R255
A290
A295
A328
L353       V V ?
I366       ? ? ? ? ? ?
G443       A
E464       G G G G G G G
D698       G
I710       T T T T T T T

Two random mutagenesis libraries were constructed targeting the FMN- and FAD-binding domain regions of 35E11 (RL1 and RL2, respectively; FIG. 35C). Mutation rates ranged from 1.8 to 3.0 bp mutations/gene. Screening of about 5,000 members from each library led to the identification of eight beneficial mutations (G443D, V445M, T480M, T515M, P654Q, T664M, D698G, and E1037G). Using 11-3 as parent, these positions were subjected to saturation mutagenesis. Variants with improved activity on propane (as compared to 11-3) were identified in all the libraries except those saturating positions 480 and 515. The amino acid substitutions in the selected variants were G443A, V445R, P654K, T664G, D698G, and E1037G. TTN with propane in these mutants ranged from 20,000 to 25,000 with little or no change in propane oxidation rate. As compared to the parent, most mutants, and in particular 11-3 (D698G), show enhanced coupling efficiency for propane hydroxylation. As the last directed evolution step, a recombination library of the mutations in the FMN- and FAD-binding domains was prepared by SOEing and fused to the heme domain of the most active heme domain variant (7-7). The most active variant (P450PMO) isolated from this library (L9) contained G443A and D698G mutations in addition to those already present in 7-7. A map of P450PMO mutated sites in the heme domain is given in FIG. 35B.

The structures of P450BM3 heme- and FMN-binding domains were obtained using pdb entry 1BU7. P450BM3 FAD-binding domain was modeled using rat cytochrome P450 reductase structure, pdb 1AMO. Structural similarity between the two structures is supported by the alignment of the solved structure of P450BM3 FAD-binding domain and rat CPR structure. The structures of sMMO components (hydroxylase, 2Fe-2S N-terminal domain of MMO-reductase, C-terminal FAD-binding domain of MMO-reductase, and MMO regulatory protein) were obtained using pdb entries 1FZH, 1JQ4, 1TVC, and 2MOB, respectively.

Directed evolution was performed on a variant of cytochrome P450BM3 (139-3), a self-sufficient hydroxylase from B. megaterium that catalyzes the sub-terminal hydroxylation of long-chain (C12-C20) fatty acids. As described herein, the 139-3 P450 was modified to use propane and ethane (35E11). Reflecting the dissimilarity in physico-chemical properties between these small alkanes and the substrates of P450BM3 (FIG. 34), coupling in propane and ethane oxidation in the engineered variants was useful.

Because 35E11 was only marginally stable, the engineering strategy described herein (see below) involved an initial stabilization step followed by iterative rounds of mutagenesis and screening. Random, saturation and site-directed mutagenesis methods were used to create libraries of mutants (FIG. 35), which were screened for activity on a propane surrogate, dimethyl ether. Positives were confirmed in a re-screen and purified for further analysis. Reactions with purified enzymes were carried out in sealed vials in the presence of propane-saturated buffer and a cofactor regeneration system. Improvement in propane total turnover number (moles of propanol produced per mole of enzyme) was a selection criterion for taking a mutant to the next round.

Five rounds of directed evolution generated P450PMO, a cytochrome P450 that is capable of supporting more than 20,000 turnovers on propane (FIG. 37A) and more than 2,500 turnovers on ethane (Table 18). P450PMO differs from 35E11 at 7 amino acid positions and from wild-type at a total of 23 (~4% of its hydroxylase sequence). To evaluate the overall process of activity refinement from wild-type P450BM3 to P450PMO, total turnover numbers (TTN), catalytic efficiency (kcat/Km) and coupling efficiencies were measured for P450PMO and all its evolutionary precursors (FIG. 37). Compared to 139-3, where activity on propane emerged (wild-type P450BM3 has no detectable propane activity), total turnovers supported by P450PMO increased 100-fold. Kinetic constants (Km, kcat) were determined from Michaelis-Menten plots of initial rates of activity on propane at different substrate concentrations. Throughout the variant series, the increase in turnovers on propane accompanied a large decrease in Km values that went from >30 mM in 139-3 to the ~100-µM range (1-3: 250 µM, P450PMO: 170 µM) (FIG. 37C). Despite the much reduced size and polar nature of propane, the P450PMO Km for propane is similar to that of wild-type enzyme for laurate (Km ~210 µM). Catalytic constant kcat increased by a factor of 45 from 139-3 (10 min⁻¹) to 19A12 (450 min⁻¹) and varied between 400 and 200 min⁻¹ in subsequent generations (FIG. 37D). P450PMO has a catalytic efficiency (kcat/Km) of 1.8×10⁴ M⁻¹ sec⁻¹, or about three orders of magnitude higher than that of the propane oxidizer 139-3 and at least four orders of magnitude better than wild-type BM3.

comparison among the variants, TTNs were normalized to the highest value in the series (MTN = Maximal Turnover Number) and plotted for the increasing-sized substrates (FIG. 38). A gradual shift of the enzyme's substrate preference during laboratory evolution is evident from these profiles. The marginal activity of wild-type BM3 on these substrates (MTN = 150) is restricted to the long-chain alkanes, where it is a promiscuous function of BM3's high C12-C20 fatty acid hydroxylase activity. In the first evolutionary intermediates, substrate preference centers on medium-chain alkanes (C8-C7), reflecting their selection for activity on octane. In 35E11, selected for activity on propane, preference still centers on C6-C8 alkanes but now extends towards the shorter ones, including propane. Propane activity becomes prevalent with the transition from variant ETS8 to 19A12. This turning point also coincides with the highest per-mutation stability penalty (FIG. 37F). For example, a single mutation (L188P) separates ETS8 from 19A12. Leucine 188 is located along helix F (FIG. 35B), which together with helix G forms a lid covering the active site, in one of the regions (SRS-2) associated with substrate binding in P450s. Due to its particular conformational requirements and lack of H-bonding amide, proline acts as a helix-breaker. Although not wishing to be bound by any particular theory of operation, the L188P mutation may be associated with a perturbation of the surrounding helical region, which destabilizes the protein fold but at the same time induces a marked change of the enzyme's substrate recognition properties. The variants isolated after 19A12 are characterized by higher TTNs on propane and a gradual decrease in relative activity towards the longer alkanes. In P450PMO, specificity along the alkane series is increased around propane, showing a 90% drop in activity when even a single methylene unit is added to the substrate.

TABLE 18

Total turnovers with P450 variants with alkanes from ethane to decane.

Alkane, Catoms TTN % rel. act. TTN % rel. act. TTN % rel. act. TTN % rel. act.

WT 139-3 J 35E11

2 O O.O O O.O 2O 5 O6 22O3O 3.5 3 O O.O 2OOSO 9.4 7SO 110 21.6 45SO 350 71.7 4 O O.O 68O 190 32.1 279O 900 80.2 5150 104 81.1 5 O O.O 102O 140 48.1 348O 412 1OOO 4800 - 174 75.6 6 10 - 4 6.3 1940, 73 91.5 329O 642 94.5 63SO 384 1OO.O 7 2O 5 12.5 2120 70 1OOO 23SO 220 67.5 S17O 148 81.4 8 3O42 18.8 2O6O 33 97.2 24OO 119 69.0 5950 323 93.7 9 110 - 10 68.8 111034 52.4 13OO 320 37.4 33SO 133 52.8 10 16095 1OO.O 730 - 49 34.4 780 - 65 22.4 860 - 223 13.5 19A12 11-3 1-3 P450Po.

2 7SO 80 7.1 13SO 80 10.2 17SO 190 1O.S 2SSO 280 12.4 3 1OSOO 1100 1OO.O 132OO 1200 1OOO 166OO 15OO 1OOO 2OSOO 1900 1OO.O 4 88SO 850 84.3 1OSOO 1500 79.5 82OO 900 49.4 62OO 400 30.2 5 8S90 S17 81.8 9370 268 71.O 6090 i 300 36.7 1OOO 110 4.9 6 747O 631 71.1 663O8O SO.2 63SO - 480 38.3 2600 205 12.7 7 S360 1027 S1.O S1SO 263 39.0 SO80 115 30.6 2990 16S 14.6 8 2710 108 25.8 229O 154 17.3 2O2O290 12.2 1880 - 60 9.2 9 2470 - 273 23.5 3330598 25.2 2870 S10 17.3 2390 30 11.7 10 1770 - S18 16.9 S1591 3.9 1OSO 100 6.3 108O3O 5.3
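The normalization of TTNs to the series maximum (the MTN), which yields the "% rel. act." columns of Table 18, can be sketched in a few lines of Python. This is an illustrative sketch only: the function and variable names are hypothetical, and the inputs are the TTN values listed for the final evolved variant in the last column of Table 18.

```python
# Illustrative sketch (hypothetical names): normalize TTNs to the maximal
# turnover number (MTN) of a series, as done for the "% rel. act." columns.

# TTN values for the final evolved variant, read from the last column of
# Table 18 (key = number of carbon atoms in the alkane).
ttn_final_variant = {
    2: 2550, 3: 20500, 4: 6200, 5: 1000, 6: 2600,
    7: 2990, 8: 1880, 9: 2390, 10: 1080,
}


def relative_activity(ttn_by_carbons):
    """Return each TTN as a percentage of the series maximum (MTN)."""
    mtn = max(ttn_by_carbons.values())
    return {
        carbons: round(100.0 * ttn / mtn, 1)
        for carbons, ttn in ttn_by_carbons.items()
    }


rel_act = relative_activity(ttn_final_variant)
for carbons in sorted(rel_act):
    print(f"C{carbons}: {rel_act[carbons]:5.1f} % of MTN")
```

Applied to the other columns of Table 18, the same calculation gives one substrate profile per variant, which is the form in which the shift in substrate preference is compared.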

TTNs for each variant-substrate pair were determined using standardized reaction conditions (identical enzyme and substrate concentration, incubation time, cofactor regeneration system) (Table 18). TTN values were used to assess performance because TTN is a cumulative measure of both catalytic and coupling efficiency and because mutants were selected based on improvement in this property.

In the presence of ideal substrates, coupling of product formation to NADPH oxidation approaches 100%, i.e., no side reactions are detected in the P450 active site. Coupling is therefore often used to define how well the substrate fits into the active site. Along the variant series, coupling for propane oxidation increases from 5-15% in the earliest rounds to 90% in P450PMO (FIG. 37B). This is similar to the efficiencies with which wild-type P450BM3 oxidizes myristate and palmitate (88% and 93%, respectively).

T50 values determined from heat-inactivation curves of the heme-dependent CO-binding spectral shift show how mutations accumulated during laboratory evolution affected enzyme stability (FIG. 37E) and enable an estimation of per-mutation stability penalties for each directed evolution round (FIG. 37F). A gradual but continuous decrease in stability both before and after the stabilization step is evident, reflecting the often-observed trade-off between mutational effects on activity and stability. The average impact of each mutational event on enzyme stability varies considerably, however, with the highest stability cost observed for the transition from ETS8 to 19A12. These analyses highlight the importance of the stabilization step after 35E11 and support the proposed relationship between stability and evolvability.

To further investigate the evolution of substrate specificity, kinetic constants for the C5 to C8 alkanes were determined from Michaelis-Menten plots (Table 19). The preference of 35E11 and its evolutionary predecessors for medium-chain alkanes is indicated by their high C8/C3 specificities ((kcat/Km)8/(kcat/Km)3 ~100), where Km appears to be the major contributing factor. C8/C3 specificity decreases to ~5 in the immediate precursors of P450PMO and then switches altogether to the smaller alkane, becoming 0.4 in P450PMO. In P450PMO, catalytic efficiency is highest on propane and decreases along the alkane series, driven by decreasing kcat.

TABLE 19

Kinetic constants for n-alkanes.

                        WT          139-3       J           35E11       19A12       11-3        1-3         P450PMO

Propane
  Km (µM)               n.a.        >30000      1450 ± 180  820 ± 95    475 ± 82    340 ± 41    250 ± 33    170 ± 35
  kcat (min^-1)         n.a.        <10         45 ± 8      245 ± 25    450 ± 54    455 ± 75    320 ± 46    185 ± 35
  kcat/Km (M^-1 s^-1)               <1E+01      5.17E+02    4.98E+03    1.58E+04    2.23E+04    2.13E+04    1.81E+04
Pentane
  Km (µM)               n.a.        1410 ± 50   1290 ± 105  1030 ± 57   675 ± 74    740 ± 69    385 ± 43    835 ± 117
  kcat (min^-1)         n.a.        98 ± 11     215 ± 39    760 ± 52    1090 ± 125  845 ± 95    460 ± 85    150 ± 14
  kcat/Km (M^-1 s^-1)               1.12E+03    2.78E+03    1.23E+04    2.69E+04    1.90E+04    1.99E+04    2.99E+03
Hexane
  Km (µM)               n.a.        435 ± 30    295 ± 20    370 ± 33    665 ± 118   395 ± 81    150 ± 24    430 ± 91
  kcat (min^-1)         n.a.        210 ± 11    410 ± 57    710 ± 69    1310 ± 222  910 ± 117   315 ± 17    85 ± 17
  kcat/Km (M^-1 s^-1)               8.05E+03    2.32E+04    3.20E+04    3.28E+04    3.84E+04    3.50E+04    3.29E+03
Heptane
  Km (µM)               n.m.        60 ± 8      79 ± 19     200 ± 46    175 ± 17    105 ± 29    72 ± 23     190 ± 40
  kcat (min^-1)         n.m.        145 ± 12    275 ± 57    635 ± 83    580 ± 125   420 ± 76    160 ± 35    32 ± 5
  kcat/Km (M^-1 s^-1)               4.03E+04    5.73E+04    5.29E+04    5.52E+04    6.67E+04    3.81E+04    2.75E+03
Octane
  Km (µM)               >5000       19 ± 6      130 ± 27    85 ± 25     74 ± 21     81 ± 16     22 ± 6      185 ± 40
  kcat (min^-1)

n.a. = not active. n.m. = not measurable.

To evaluate the fate of the enzyme's original function, P450PMO activity was tested on the long-chain fatty acids laurate and palmitate. Despite the high catalytic efficiency of wild-type P450BM3 with these substrates, no detectable amounts of hydroxylated fatty acids were observed in the presence of P450PMO, demonstrating a loss of these original functions upon re-specialization for propane hydroxylation.

Along with activity on propane, P450PMO also hydroxylates ethane (TTN ~2,500), despite ethane's higher C-H bond energy (101.0 kcal mol^-1 versus 98.6 kcal mol^-1 for propane). P450PMO activity was examined on halogenated derivatives of methane: CH3Cl, CH3Br and CH3I. Halomethanes have molecular sizes and C-H bond energies intermediate between those of methane and propane (FIG. 41), but different reactivity and polarity. All the halomethanes studied except iodomethane are substrates for methane monooxygenase (MMO), which converts them to formaldehyde and HX. Reactions with purified P450 variants and the halomethanes were analyzed for formaldehyde formation (FIG. 42).

P450PMO and four of its evolutionary predecessors oxidized bromomethane to formaldehyde, with bromomethane dehalogenase activity that amounts to ~1-5% of the propane activity. Activity on chloromethane was detected at lower levels (0.5-1% of propane activity) and in fewer P450 variants. Iodomethane dehalogenase activity could be detected only in P450PMO and its immediate precursor 1-3 (FIG. 43B). Despite the low turnover rates (0.5-5 min^-1), the emergence of these activities in P450PMO has important implications. Particularly noteworthy is the iodomethane dehalogenase activity, which is not supported by natural P450s or methane monooxygenase. This compound is only known to be degraded by a small subset of methylotrophic species using enzymes, such as corrinoid-dependent methyltransferases, that are unrelated to P450s and MMO. With P450PMO, an alternative enzymatic strategy for the degradation of this compound has thus emerged, expanding even further the range of known P450 substrates.

The examples set forth above are provided to give those of ordinary skill in the art a complete disclosure and description of how to make and use the embodiments of the devices, systems and methods of the invention, and are not intended to limit the scope of what the inventors regard as their invention. Modifications of the above-described modes for carrying out the invention that are obvious to persons of skill in the art are intended to be within the scope of the following claims. All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the invention pertains. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually.

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.
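As a cross-check on the kinetic data, the catalytic efficiencies in Table 19 can be recomputed from the tabulated Km and kcat values. The sketch below is a hypothetical helper, not part of the methods described here. It assumes Km is reported in micromolar and kcat in min^-1, and converts the ratio to M^-1 s^-1. The two example inputs are the propane and hexane entries for the final evolved variant (last column of Table 19).

```python
# Illustrative sketch (hypothetical helper): catalytic efficiency kcat/Km
# in M^-1 s^-1 from kcat in min^-1 and Km in micromolar.

def catalytic_efficiency(kcat_per_min, km_micromolar):
    """Convert units and return kcat/Km in M^-1 s^-1."""
    kcat_per_s = kcat_per_min / 60.0   # min^-1 -> s^-1
    km_molar = km_micromolar * 1e-6    # micromolar -> molar
    return kcat_per_s / km_molar


# Final evolved variant, values read from Table 19.
eff_propane = catalytic_efficiency(kcat_per_min=185, km_micromolar=170)
eff_hexane = catalytic_efficiency(kcat_per_min=85, km_micromolar=430)

print(f"propane kcat/Km = {eff_propane:.2E} M^-1 s^-1")
print(f"hexane  kcat/Km = {eff_hexane:.2E} M^-1 s^-1")
print(f"hexane/propane specificity = {eff_hexane / eff_propane:.2f}")
```

The ratio of two such efficiencies is the kind of substrate specificity figure used in the text to compare activity on a longer alkane with activity on propane.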

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 125

<210> SEQ ID NO 1
<211> LENGTH: 1048
<212> TYPE: PRT
<213> ORGANISM: Bacillus megaterium

<400> SEQUENCE: 1

Thir Ile Lys Glu Met Pro Glin Pro Thir Phe Gly Glu Luell Lys Asn 1. 15

Lell Pro Luell Luell Asn Thir Asp Wall Glin Ala Lell Met Ile 3O

Ala Asp Glu Luell Gly Glu Ile Phe Phe Glu Ala Pro Gly Arg Wall 35 4 O 45

Thir Arg Luell Ser Ser Glin Arg Luell Ile Lys Glu Ala Asp Glu SO 55 6 O

Ser Arg Phe Asp Asn Lell Ser Glin Ala Luell Phe Wall Arg Asp 65 70

Phe Ala Gly Asp Gly Lell Phe Thir Ser Trp Thir His Glu Asn Trp 85 90 95

Ala His Asn Ile Lell Luell Pro Ser Phe Ser Glin Glin Ala Met 105 11 O

Gly Tyr His Ala Met Met Wall Asp Ile Ala Wall Glin Luell Wall Glin 115 12 O 125

Trp Glu Arg Lell Asn Ala Asp Glu His Ile Glu Wall Pro Glu Asp 13 O 135 14 O

Met Thir Arg Luell Thir Lell Asp Thir Ile Gly Luell Gly Phe Asn Tyr 145 150 155 160

Arg Phe Asn Ser Phe Arg Asp Glin Pro His Pro Phe Ile Thir Ser 1.65 17O 17s

Met Wall Arg Ala Lell Glu Ala Met Asn Lys Lell Glin Arg Ala Asn 18O 185 19 O

Pro Asp Asp Pro Ala Asp Glu Asn Arg Glin Phe Glin Glu Asp 195

Ile Lys Wall Met Asn Asp Lell Wall Asp Ile Ile Ala Asp Arg 21 O 215

Ala Ser Gly Glu Glin Ser Asp Asp Luell Luell Thir His Met Luell Asn Gly 225 23 O 235 24 O

Asp Pro Glu Thir Gly Glu Pro Luell Asp Asp Glu Asn Ile Arg 245 250 255

Glin Ile Ile Thir Phe Lell Ile Ala Gly His Glu Thir Thir Ser Gly Luell 26 O 265 27 O

Lell Ser Phe Ala Lell Tyr Phe Luell Wall Lys ASn Pro His Wall Luell Glin 27s 285

Ala Ala Glu Glu Ala Ala Arg Wall Luell Wall Asp Pro Wall Pro Ser 29 O 295 3 OO

Tyr Glin Wall Lys Glin Lell Wall Gly Met Wall Luell Asn Glu 3. OS 310 315 32O

Ala Luell Arg Luell Trp Pro Thir Ala Pro Ala Phe Ser Lell Tyr Ala 3.25 330 335

Glu Asp Thir Wall Lell Gly Gly Glu Tyr Pro Luell Glu Gly Asp Glu 34 O 345 35. O

Leu Met Val Leu Ile Pro Gln Leu His Arg Asp Thr Ile Trp Gly 355 360 365

Asp Asp Wall Glu Glu Phe Arg Pro Glu Arg Phe Glu Asn Pro Ser Ala 37 O 375

Ile Pro Glin His Ala Phe Pro Phe Gly ASn Gly Glin Arg Ala Cys 385 390 395 4 OO

Ile Gly Glin Glin Phe Ala Lell His Glu Ala Thir Lell Wall Luell Gly Met 4 OS 415

Met Luell His Phe Asp Phe Glu Asp His Thir Asn Tyr Glu Luell Asp 425 43 O

Ile Glu Thir Lell Thir Lell Lys Pro Glu Gly Phe Wall Wall Ala 435 44 O 445

Ser Ile Pro Lell Gly Gly Ile Pro Ser Pro Ser Thir Glu 450 45.5 460

Glin Ser Ala Lys Wall Arg Ala Glu Asn Ala His Asn Thir 465 470

Pro Luell Luell Wall Lell Tyr Gly Ser Asn Met Gly Thir Ala Glu Gly Thir 485 490 495

Ala Arg Asp Luell Ala Asp Ile Ala Met Ser Lys Gly Phe Ala Pro Glin SOO 505

Wall Ala Thir Luell Asp Ser His Ala Gly Asn Luell Pro Arg Glu Gly Ala 515 525

Wall Luell Ile Wall Thir Ala Ser Asn Gly His Pro Pro Asp Asn Ala 53 O 535 54 O

Lys Glin Phe Wall Asp Trp Lell Asp Glin Ala Ser Ala Asp Glu Wall Lys 5.45 550 555 560

Gly Wall Arg Ser Wall Phe Gly Gly Asp Asn Trp Ala Thir 565 st O sts

Thir Glin Lys Wall Pro Ala Phe Ile Asp Glu Thir Lell Ala Ala 585 59 O

Gly Ala Glu Asn Ile Ala Asp Arg Gly Glu Ala Asp Ala Ser Asp Asp 595 605

Phe Glu Gly Thir Tyr Glu Glu Trp Arg Glu His Met Trp Ser Asp Wall 610 615

Ala Ala Phe Asn Lell Asp Ile Glu Asn Ser Glu Asp Asn Ser 625 630 635 64 O

Thir Luell Ser Luell Glin Phe Wall Asp Ser Ala Ala Asp Met Pro Luell Ala 645 650 655

Met His Gly Ala Phe Ser Thir Asn Wall Wall Ala Ser Lys Glu Luell 660 665 67 O

Glin Glin Pro Gly Ser Ala Arg Ser Thir Arg His Lell Glu Ile Glu Luell 675 685

Pro Lys Glu Ala Ser Glin Glu Gly Asp His Lell Gly Wall Ile Pro 69 O. 695 7 OO

Arg Asn Tyr Glu Gly Ile Wall Asn Arg Wall Thir Ala Arg Phe Gly Luell 7 Os 71O

Asp Ala Ser Glin Glin Ile Arg Luell Glu Ala Glu Glu Glu Luell Ala 72 73 O 73

His Luell Pro Luell Ala Thir Wall Ser Wall Glu Glu Lell Luell Glin Tyr 740 74. 7 O

Wall Glu Luell Glin Asp Pro Wall Thir Arg Thir Glin Lell Arg Ala Met Ala 760 765

Ala Lys Thir Wall Cys Pro Pro His Wall Glu Lell Glu Ala Luell Luell 770 775 78O

Glu Gln Ala Tyr Lys Glu Gln Val Leu Ala Arg Leu Thr Met 785 790 795 800

Leu Glu Leu Leu Glu Lys Tyr Pro Ala Cys Glu Met Lys Phe Ser Glu 805 810 815

Phe Ile Ala Leu Leu Pro Ser Ile Arg Pro Arg Tyr Tyr Ser Ile Ser 820 825 830

Ser Ser Pro Arg Val Asp Glu Lys Gln Ala Ser Ile Thr Val Ser Val 835 840 845

Val Ser Gly Glu Ala Trp Ser Gly Tyr Gly Glu Tyr Lys Gly Ile Ala 850 855 860

Ser Asn Tyr Leu Ala Glu Leu Gln Glu Gly Asp Thr Ile Thr Cys Phe 865 870 875 880

Ile Ser Thr Pro Gln Ser Glu Phe Thr Leu Pro Lys Asp Pro Glu Thr 885 890 895

Pro Leu Ile Met Val Gly Pro Gly Thr Gly Val Ala Pro Phe Arg Gly 900 905 910

Phe Val Gln Ala Arg Lys Gln Leu Lys Glu Gln Gly Gln Ser Leu Gly 915 920 925

Glu Ala His Leu Tyr Phe Gly Cys Arg Ser Pro His Glu Asp Tyr Leu 930 935 940

Tyr Gln Glu Glu Leu Glu Asn Ala Gln Ser Glu Gly Ile Ile Thr Leu 945 950 955 960

His Thr Ala Phe Ser Arg Met Pro Asn Gln Pro Lys Thr Tyr Val Gln 965 970 975

His Val Met Glu Gln Asp Gly Lys Lys Leu Ile Glu Leu Leu Asp Gln 980 985 990

Gly Ala His Phe Tyr Ile Cys Gly Asp Gly Ser Gln Met Ala Pro Ala 995 1000 1005

Val Glu Ala Thr Leu Met Lys Ser Tyr Ala Asp Val His Gln Val 1010 1015 1020

Ser Glu Ala Asp Ala Arg Leu Trp Leu Gln Gln Leu Glu Glu Lys 1025 1030 1035

Gly Arg Tyr Ala Lys Asp Val Trp Ala Gly 1040 1045

<210> SEQ ID NO 2
<211> LENGTH: 1060
<212> TYPE: PRT
<213> ORGANISM: Bacillus subtilis

<400> SEQUENCE: 2

Lys Glu Thr Ser Pro Ile Pro Gln Pro Lys Thr Phe Gly Pro Leu Gly 1 5 10 15

Asn Leu Pro Leu Ile Asp Lys Asp Lys Pro Thr Leu Ser Leu Ile Lys 20 25 30

Leu Ala Glu Glu Gln Gly Pro Ile Phe Gln Ile His Thr Pro Ala Gly 35 40 45

Thr Thr Ile Val Val Ser Gly His Glu Leu Val Lys Glu Val Cys Asp 50 55 60

Glu Glu Arg Phe Asp Lys Ser Ile Glu Gly Ala Leu Glu Lys Val Arg 65 70 75 80

Ala Phe Ser Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Pro Asn 85 90 95

Trp Arg Lys Ala His Asn Ile Leu Met Pro Thr Phe Ser Gln Arg Ala 100 105 110

Met Lys Asp Tyr His Glu Lys Met Val Asp Ile Ala Val Gln Leu Ile 115 120 125

Glin Lys Trp Ala Arg Lell Asn Pro Asn Glu Ala Wall Asp Val Pro Gly 13 O 135 14 O

Asp Met Thir Arg Lell Thir Lell Asp Thir Ile Gly Lell Cys Gly Phe Asn 145 150 155 160

Arg Phe Asn Ser Arg Glu Thir Pro His Pro Phe Ile Asn 1.65 17s

Ser Met Wall Arg Ala Lell Asp Glu Ala Met His Glin Met Glin Arg Luell 18O 185 19 O

Asp Wall Glin Asp Lys Lell Met Wall Arg Thir Arg Glin Phe Arg 195

Asp Ile Glin Thir Met Phe Ser Luell Wall Asp Ser Ile Ile Ala Glu Arg 21 O 215

Arg Ala Asn Gly Asp Glin Asp Glu Asp Luell Lell Ala Arg Met Luell 225 23 O 235 24 O

Asn Wall Glu Asp Pro Glu Thir Gly Glu Lys Luell Asp Asp Glu Asn Ile 245 250 255

Arg Phe Glin Ile Ile Thir Phe Luell Ile Ala Gly His Glu Thir Thir Ser 26 O 265 27 O

Gly Luell Luell Ser Phe Ala Thir Tyr Phe Luell Luell His Pro Asp 27s 285

Lell Lys Ala Tyr Glu Glu Wall Asp Arg Wall Lell Thir Asp Ala Ala 29 O 295 3 OO

Pro Thir Glin Wall Lell Glu Luell Thir Tyr Ile Arg Met Ile Luell 3. OS 310 315

Asn Glu Ser Luell Arg Lell Trp Pro Thir Ala Pro Ala Phe Ser Luell 3.25 330 335

Pro Glu Asp Thir Wall Ile Gly Gly Phe Pro Ile Thir Thir Asn 34 O 345 35. O

Asp Arg Ile Ser Wall Lell Ile Pro Glin Luell His Arg Asp Arg Asp Ala 355 360 365

Trp Gly Asp Ala Glu Glu Phe Arg Pro Glu Arg Phe Glu His Glin 37 O 375

Asp Glin Wall Pro His His Ala Pro Phe Gly Asn Gly Glin Arg 385 390 395 4 OO

Ala Ile Gly Met Glin Phe Ala Luell His Glu Ala Thir Luell Wall Luell 4 OS 415

Gly Met Ile Luell Lys Phe Thir Luell Ile Asp His Glu Asn Glu 425 43 O

Lell Asp Ile Glin Thir Lell Thir Luell Pro Gly Asp Phe His Ile 435 44 O 445

Ser Wall Glin Ser Arg His Glin Glu Ala Ile His Ala Asp Wall Glin Ala 450 45.5 460

Ala Glu Ala Ala Pro Asp Glu Glin Glu Thir Glu Ala Lys 465 470

Gly Ala Ser Wall Ile Gly Lell Asn Asn Arg Pro Lell Lell Wall Luell 485 490 495

Gly Ser Asp Thir Gly Thir Ala Glu Gly Wall Ala Arg Glu Luell Ala Asp SOO 505

Thir Ala Ser Luell His Gly Wall Arg Thir Thir Ala Pro Luell Asn Asp 515 525

Arg Ile Gly Lell Pro Lys Glu Gly Ala Wall Wall Ile Wall Thir Ser 53 O 535 54 O

Ser Asn Gly Lys Pro Pro Ser Asn Ala Gly Gln Phe Val Gln Trp 545 550 555 560

Lell Glin Glu Ile Lys Pro Gly Glu Luell Glu Gly Wall His Ala Wall 565 st O sts

Phe Gly Gly Asp His Asn Trp Ala Ser Thir Glin Tyr Wall Pro 58O 585 59 O

Arg Phe Ile Asp Glu Glin Lell Ala Glu Gly Ala Thir Arg Phe Ser 595 605

Ala Arg Gly Glu Gly Asp Wall Ser Gly Asp Phe Glu Gly Glin Luell Asp 610 615

Glu Trp Ser Met Trp Ala Asp Ala Ile Ala Phe Gly Luell 625 630 635 64 O

Glu Luell Asn Glu Asn Ala Asp Glu Arg Ser Thir Lell Ser Luell Glin 645 650 655

Phe Wall Arg Gly Lell Gly Glu Ser Pro Luell Ala Arg Ser Tyr Glu Ala 660 665 67 O

Ser His Ala Ser Ile Ala Glu Asn Arg Glu Luell Glin Ser Ala Asp Ser 675 685

Asp Arg Ser Thir His Ile Glu Ile Ala Luell Pro Pro Asp Wall Glu 69 O. 695 7 OO

Tyr Glin Glu Gly Asp His Lell Gly Wall Luell Pro Asn Ser Glin Thir 7 Os

Asn Wall Ser Arg Ile Lell His Arg Phe Gly Luell Gly Thir Asp Glin 72 73 O 73

Wall Thir Luell Ser Ala Ser Gly Arg Ser Ala Gly His Lell Pro Luell Gly 740 74. 7 O

Arg Pro Wall Ser Lell His Asp Luell Luell Ser Ser Wall Glu Wall Glin 760 765

Glu Ala Ala Thir Ala Glin Ile Arg Glu Luell Ala Ser Phe Thir Wall 770 775

Cys Pro Pro His Arg Arg Glu Luell Glu Glu Luell Ser Ala Glu Gly Wall 79 O 79.

Glin Glu Glin Ile Lell Arg Ile Ser Met Lell Asp Luell Luell 805 810 815

Glu Glu Ala Asp Met Pro Phe Glu Arg Phe Luell Glu Luell 825 83 O

Lell Arg Pro Luell Pro Arg Tyr Ser Ile Ser Ser Ser Pro Arg 835 84 O 845

Wall Asn Pro Arg Glin Ala Ser Ile Thir Wall Gly Wall Wall Arg Gly Pro 850 855 860

Ala Trp Ser Gly Arg Gly Glu Arg Gly Wall Ala Ser Asn Asp Luell 865

Ala Glu Arg Glin Ala Gly Asp Wall Wall Met Phe Ile Arg Thir Pro 885 890 895

Glu Ser Arg Phe Glin Lell Pro Asp Pro Glu Thir Pro Ile Ile Met 9 OO 905 91 O

Wall Gly Pro Gly Thir Gly Wall Ala Pro Phe Arg Gly Phe Luell Glin Ala 915 92 O 925

Arg Asp Wall Luell Arg Glu Gly Thir Luell Gly Glu Ala His Luell 93 O 935 94 O

Tyr Phe Gly Asn Asp Arg Asp Phe Ile Arg Asp Glu Luell 945 950 955 96.O

Glu Arg Phe Glu Lys Asp Gly Ile Val Thr Val His Thr Ala Phe Ser 965 970 975

Arg Lys Glu Gly Met Pro Lys Thr Tyr Val Gln His Leu Met Ala Asp 980 985 990

Gln Ala Asp Thr Leu Ile Ser Ile Leu Asp Arg Gly Gly Arg Leu Tyr 995 1000 1005

Val Cys Gly Asp Gly Ser Lys Met Ala Pro Asp Val Glu Ala Ala 1010 1015 1020

Leu Gln Lys Ala Tyr Gln Ala Val His Gly Thr Gly Glu Gln Glu 1025 1030 1035

Ala Gln Asn Trp Leu Arg His Leu Gln Asp Thr Gly Met Tyr Ala 1040 1045 1050

Lys Asp Val Trp Ala Gly Ile 1055 1060

<210> SEQ ID NO 3
<211> LENGTH: 1053
<212> TYPE: PRT
<213> ORGANISM: Bacillus subtilis

<400> SEQUENCE: 3

Lys Gln Ala Ser Ala Ile Pro Gln Pro Lys Thr Tyr Gly Pro Leu Lys 1 5 10 15

Asn Leu Pro His Leu Glu Lys Glu Gln Leu Ser Gln Ser Leu Trp Arg 20 25 30

Ile Ala Asp Glu Leu Gly Pro Ile Phe Arg Phe Asp Phe Pro Gly Val 35 40 45

Ser Ser Val Phe Val Ser Gly His Asn Leu Val Ala Glu Val Cys Asp 50 55 60

Glu Lys Arg Phe Asp Lys Asn Leu Gly Lys Gly Leu Gln Lys Val Arg 65 70 75 80

Glu Phe Gly Gly Asp Gly Leu Phe Thr Ser Trp Thr His Glu Pro Asn 85 90 95

Trp Gln Lys Ala His Arg Ile Leu Leu Pro Ser Phe Ser Gln Lys Ala 100 105 110

Met Lys Gly Tyr His Ser Met Met Leu Asp Ile Ala Thr Gln Leu Ile 115 120 125

Gln Lys Trp Ser Arg Leu Asn Pro Asn Glu Glu Ile Asp Val Ala Asp 130 135 140

Asp Met Thr Arg Leu Thr Leu Asp Thr Ile Gly Leu Cys Gly Phe Asn 145 150 155 160

Tyr Arg Phe Asn Ser Phe Tyr Arg Asp Ser Gln His Pro Phe Ile Thr 165 170 175

Ser Met Leu Arg Ala Leu Lys Glu Ala Met Asn Gln Ser Lys Arg Leu 180 185 190

Gly Leu Gln Asp Lys Met Met Val Lys Thr Lys Leu Gln Phe Gln Lys 195 200 205

Asp Ile Glu Val Met Asn Ser Leu Val Asp Arg Met Ile Ala Glu Arg 210 215 220

Lys Ala Asn Pro Asp Glu Asn Ile Lys Asp Leu Leu Ser Leu Met Leu 225 230 235 240

Tyr Ala Lys Asp Pro Val Thr Gly Glu Thr Leu Asp Asp Glu Asn Ile 245 250 255

Arg Tyr Gln Ile Ile Thr Phe Leu Ile Ala Gly His Glu Thr Thr Ser 260 265 270

Gly Leu Leu Ser Phe Ala Ile Tyr Cys Leu Leu Thr His Pro Glu Lys 275 280 285

Lell Lys Ala Glin Glu Glu Ala Asp Arg Wall Lell Thir Asp Asp Thir 29 O 295 3 OO

Pro Glu Glin Ile Glin Glin Luell Tyr Ile Arg Met Wall Luell 3. OS 310 315

Asn Glu Thir Luell Arg Lell Pro Thir Ala Pro Ala Phe Ser Luell 3.25 330 335

Ala Glu Asp Thir Wall Lell Gly Gly Glu Pro Ile Ser Gly 34 O 345 35. O

Glin Pro Wall Thir Wall Lell Ile Pro Luell His Arg Asp Glin Asn Ala 355 360 365

Trp Gly Pro Asp Ala Glu Asp Phe Pro Glu Arg Phe Glu Asp Pro 37 O 375

Ser Ser Ile Pro His His Ala Pro Phe Gly Asn Gly Glin Arg 385 390 395 4 OO

Ala Ile Gly Met Glin Phe Ala Luell Glin Glu Ala Thir Met Wall Luell 4 OS 415

Gly Luell Wall Luell Lys His Phe Glu Luell Ile ASn His Thir Gly Glu 425 43 O

Lell Ile Glu Ala Lell Thir Ile Pro Asp Asp Phe Ile 435 44 O 445

Thir Wall Pro Arg Thir Ala Ala Ile ASn Wall Glin Arg Glu 450 45.5 460

Glin Ala Asp Ile Lys Ala Glu Thir Pro Lys Glu Thir Pro Lys 465 470

His Gly Thir Pro Lell Lell Wall Luell Phe Gly Ser Asn Lell Gly Thir Ala 485 490 495

Glu Gly Ile Ala Gly Glu Lell Ala Ala Glin Gly Arg Glin Met Gly Phe SOO 505

Thir Ala Glu Thir Ala Pro Lell Asp Asp Ile Gly Lys Luell Pro Glu 515 525

Glu Gly Ala Wall Wall Ile Wall Thir Ala Ser Asn Gly Ala Pro Pro 53 O 535 54 O

Asp Asn Ala Ala Gly Phe Wall Glu Trp Luell Lys Glu Lell Glu Glu Gly 5.45 550 555 560

Glin Luell Gly Wall Ser Ala Wall Phe Gly Gly Asn Arg Ser 565 st O sts

Trp Ala Ser Thir Tyr Glin Arg Ile Pro Arg Luell Asp Asp Met Met 585 59 O

Ala Lys Gly Ala Ser Arg Luell Thir Ala Ile Glu Gly Asp Ala 595 605

Ala Asp Asp Phe Glu Ser His Arg Glu Ser Trp Asn Arg Phe Trp 610 615

Lys Glu Thir Met Asp Ala Phe Asp Ile Asn Glu Ala Glin Glu 625 630 635 64 O

Asp Arg Pro Ser Lell Ser Ile Thir Phe Luell Ser Ala Thir Glu Thir 645 650 655

Pro Wall Ala Lys Ala Gly Ala Phe Glu Gly Wall Luell Glu Asn 660 665 67 O

Arg Glu Luell Glin Thir Ala Ala Ser Thir Arg Ser Thir Arg His Ile Glu 675 685

Lell Glu Ile Pro Ala Gly Lys Thir Lys Glu Gly Asp His Ile Gly 69 O. 695 7 OO

Ile Leu Pro Asn Ser Arg Glu Leu Val Gln Arg Val Leu Ser Arg 705 710 715 720

Phe Gly Luell Glin Ser Asn His Wall Ile Lys Wall Ser Gly Ser Ala His 72 73 O 73

Met Ala His Luell Pro Met Asp Arg Pro Ile Wall Wall Asp Lieu. Luell 740 74. 7 O

Ser Ser Tyr Wall Glu Lell Glin Glu Pro Ala Ser Arg Lell Glin Lieu. Arg 760 765

Glu Luell Ala Ser Tyr Thir Wall Pro Pro His Glin Glu Lieu. Glu 770 775

Glin Luell Wall Ser Asp Asp Gly Ile Glu Glin Wall Luell Ala Lys 79 O 79. 8OO

Arg Luell Thir Met Lell Asp Phe Luell Glu Asp Pro Ala Glu Met 805 810 815

Pro Phe Glu Arg Phe Lell Ala Luell Luell Pro Ser Lell Pro Arg Tyr 825 83 O

Ser Ile Ser Ser Ser Pro Lys Wall His Ala Asn Ile Wall Ser Met 835 84 O 845

Thir Wall Gly Wall Wall Ala Ser Ala Trp Ser Gly Arg Gly Glu Tyr 850 855 860

Arg Gly Wall Ala Ser Asn Luell Ala Glu Luell Asn Thir Gly Asp Ala 865 88O

Ala Ala Phe Ile Arg Thir Pro Glin Ser Gly Phe Glin Met Pro Asn 885 890 895

Asp Pro Glu Thir Pro Met Ile Met Wall Gly Pro Gly Thir Gly Ile Ala 9 OO 905 91 O

Pro Phe Arg Gly Phe Ile Glin Ala Arg Ser Wall Lell Lys Glu Gly 915 92 O 925

Ser Thir Luell Gly Glu Ala Lell Luell Phe Gly Cys Arg Arg Pro Asp 93 O 935 94 O

His Asp Asp Luell Tyr Arg Glu Glu Luell Asp Glin Ala Glu Glin Asp Gly 945 950 955 96.O

Lell Wall Thir Ile Arg Arg Tyr Ser Arg Wall Glu Asn Glu Pro Llys 965 97.

Gly Wall Glin His Lell Lell Lys Glin Asp Thir Glin Lys Luell Met Thr 985 99 O

Lell Ile Glu Lys Gly Ala His Ile Tyr Val Cys Gly Asp Gly Ser Glin 995 1OOO

Met Ala Pro Asp Val Glu Arg Thir Lieu. Arg Lieu Ala Tyr Glu Ala 1010 5

Glu Lys Ala Ala Ser Glin Glu Glu Ser Ala Val Trp Lieu. Glin Llys 1025 103 O 1035

Lell Glin Asp Glin Arg Arg Tyr Val Lys Asp Val Trp Thr Gly Met 104 O 104 5 1 OSO

<210> SEQ ID NO 4
<211> LENGTH: 1065
<212> TYPE: PRT
<213> ORGANISM: Bacillus cereus

<400> SEQUENCE: 4

Met Asp Lys Lys Val Ser Ala Ile Pro Gln Pro Lys Thr Tyr Gly Pro 1 5 10 15

Leu Gly Asn Leu Pro Leu Ile Asp Lys Asp Lys Pro Thr Leu Ser Phe 20 25 30

Ile Lys Ile Ala Glu Glu Tyr Gly Pro Ile Phe Gln Ile Gln Thr Leu 35 40 45

Ser Asp Thir Ile Ile Wall Ile Ser Gly His Glu Lell Wall Ala Glu Wall SO 55 6 O

Cys Asp Glu Thr Arg Phe Asp Ser Ile Glu Gly Ala Luell Ala Lys 65 70

Wall Arg Ala Phe Ala Gly Asp Gly Luell Phe Thir Ser Glu Thir Glin Glu 85 90 95

Pro Asn Trp Llys Llys Ala His Asn Ile Luell Met Pro Thir Phe Ser Glin 1OO 105 11 O

Arg Ala Met His Ala Met Met Wall Asp Ile Ala Wall Glin 115 12 O 125

Lell Wall Glin Ala Arg Luell Asn Pro ASn Glu Asn Wall Asp Wall 13 O 135 14 O

Pro Glu Asp Met Thir Arg Lell Thir Luell Asp Thir Ile Gly Luell Gly 145 150 155 160

Phe Asn Arg Phe Asn Ser Phe Arg Glu Thir Pro His Pro Phe 1.65 17s

Ile Thir Ser Met Thir Arg Ala Luell Asp Glu Ala Met His Glin Luell Glin 18O 185 19 O

Arg Luell Asp Ile Glu Asp Luell Met Trp Arg Thir Lys Arg Glin Phe 195

Glin His Asp Ile Glin Ser Met Phe Ser Luell Wall Asp Asn Ile Ile Ala 21 O 215

Glu Arg Ser Ser Gly Asn Glin Glu Glu ASn Asp Lell Luell Ser Arg 225 23 O 235 24 O

Met Luell His Wall Glin Asp Pro Glu Thir Gly Glu Lell Asp Asp Glu 245 250 255

Asn Ile Arg Phe Glin Ile Ile Thir Phe Luell Ile Ala Gly His Glu Thir 26 O 265 27 O

Thir Ser Gly Lieu. Luell Ser Phe Ala Ile Phe Lell Lell Asn Pro 285

Asp Lys Luell Ala Tyr Glu Glu Wall Asp Arg Wall Luell Thir Asp 29 O 295 3 OO

Pro Thir Pro Thr Tyr Glin Glin Wall Met Luell Ile Arg Met 3. OS 310 315

Ile Luell Asn Glu Ser Lell Arg Luell Trp Pro Thir Ala Pro Ala Phe Ser 3.25 330 335

Lell Ala Lys Glu Asp Thir Wall Ile Gly Gly Pro Ile 34 O 345 35. O

Gly Glu Asp Arg Ile Ser Wall Luell Ile Pro Glin Lell His Arg Asp 355 360 365

Asp Ala Trp Gly Asp Asn Wall Glu Glu Phe Glin Pro Glu Arg Phe 37 O 375

Glu Asp Luell Asp Llys Wall Pro His His Ala Tyr Pro Phe Gly Asn 385 390 395 4 OO

Gly Glin Arg Ala Cys Ile Gly Met Glin Phe Ala Lell His Glu Ala Thir 4 OS 415

Lell Wall Met Gly Met Lell Lell Glin His Phe Glu Phe Ile Asp Glu 42O 425 43 O

Asp Glin Lieu. Asp Wall Glin Thir Luell Thir Lell Lys Pro Gly Asp 435 44 O 445

Phe Lys Ile Arg Ile Wall Pro Arg Asn Glin ASn Ile Ser His Thir Thir 450 45.5 460

Wall Luell Ala Pro Thir Glu Glu Luell Lys ASn His Glu Ile Glin