US 20090163376A1
(19) United States
(12) Patent Application Publication
Li et al.
(10) Pub. No.: US 2009/0163376 A1
(43) Pub. Date: Jun. 25, 2009

(54) KETOL-ACID REDUCTOISOMERASE USING NADH
(75) Inventors: Yougen Li, Lawrenceville, NJ (US); Der-Ing Liao, Wilmington, DE (US); Mark J. Nelson, Newark, DE (US); Daniel P. Okeefe, Ridley Park, PA (US)
Correspondence Address: E. I. DUPONT DE NEMOURS AND COMPANY, LEGAL PATENT RECORDS CENTER, BARLEY MILL PLAZA 25/1122B, 4417 LANCASTER PIKE, WILMINGTON, DE 19805 (US)
(73) Assignee: E. I. DUPONT DE NEMOURS AND COMPANY
(21) Appl. No.: 12/337,736
(22) Filed: Dec. 18, 2008

Related U.S. Application Data
(60) Provisional application No. 61/015,346, filed on Dec. 20, 2007, provisional application No. 61/109,297, filed on Oct. 29, 2008.

Publication Classification
(51) Int. Cl.
C40B 30/04 (2006.01)
C12N 9/90 (2006.01)
C12N 15/11 (2006.01)
C12N 5/06 (2006.01)
C12P 7/16 (2006.01)
(52) U.S. Cl. 506/9; 435/233; 536/23.2; 435/325; 435/160

(57) ABSTRACT
Methods for the evolution of NADPH binding ketol-acid reductoisomerase to acquire NADH binding functionality are provided. Specific mutant ketol-acid reductoisomerase enzymes isolated from Pseudomonas that have undergone co-factor switching to bind NADH are described.

Patent Application Publication Jun. 25, 2009 Sheet 1 of 15 US 2009/0163376 A1

Figure 1

(Pathway diagram. Glucose is converted to pyruvate, then to (S)-acetolactate by acetolactate synthase, to 2,3-dihydroxyisovalerate by acetohydroxy acid reductoisomerase (KARI, consuming NADPH + H+), to α-ketoisovalerate by acetohydroxy acid dehydratase, to isobutyraldehyde by branched chain keto acid decarboxylase, and to isobutanol by branched chain alcohol dehydrogenase. Other labels in the drawing include valine, valine dehydrogenase or transaminase, valine decarboxylase, isobutylamine, omega transaminase, branched chain keto acid dehydrogenase, aldehyde dehydrogenase, isobutyryl-CoA and butyryl-CoA. The conversion steps carry letter labels.)

Figure 2

(a)
PF5-KARI      (44)   VGLRKGSATVAKA
PAO-KARI      (44)   VGLRSGSATVAKA
Spinach KARI  (162)  IGLRKGSNSFAEA

(b) (column ruler 59 to 72)
MMC5       (44)  VGLRKNGASWENAK
MMS2       (44)  VGLRKNGASWNNAK
MVSB       (44)  VGLRKNGASWENAK
PF5-ilvC   (44)  VGLRKGSATVAKAE
KARI-S1    (44)  VGLRKNGASWNKAV
ilv5       (59)  IGVRKDGASWKAAI
KARI-D1    (48)  VGLEREGKSWELAK
KARI-D2    (45)  IGLRRGGKSWELAT
Consensus  (59)  VGLRKN AK

Figure 3


Figure 4

(Bar chart. Samples: C3A10, C3B11, C3C8, C4D12, Wt, Negative. Series: NADH w/ substrate, NADH w/o substrate, NADPH w/ substrate, NADPH w/o substrate. Vertical axis ticks run from -0.05 to -0.50.)

Figure 5

(Bar charts comparing activity with NADH and NADPH. Vertical axis ticks run from -0.1 to -0.5.)

Figure 7

(Line plot. Horizontal axis: Temperature (°C), ticks at 20, 40, 60 and 80.)

Figure 9

(Multiple sequence alignment of twenty-four KARI sequences, printed in blocks of 60 columns across several sheets. The residue rows are not recoverable from the extracted text. The sequences are labeled as follows.)

gi|70732562|ref|YP_262325.1
gi|28868203|ref|NP_790822.1
gi|26991362|ref|NP_746787.1
gi|15599888|ref|NP_253382.1
gi|114319705|ref|YP_741388.1
gi|83648555|ref|YP_436990.1
gi|67159493|ref|ZP_00420011.1
gi|71065099|ref|YP_263826.1
gi|17546794|ref|NP_520196.1
gi|74318007|ref|YP_315747.1
gi|66044103|ref|YP_233944.1
gi|120553816|ref|YP_958167.1
gi|146306044|ref|YP_001186509.
gi|56552037|ref|YP_162876.1
gi|57240359|ref|ZP_00368308.1
gi|6225553|sp|O32414|ILVC_RHOM
gi|15897495|ref|NP_342100.1
gi|76801743|ref|YP_326751.1
gi|18313972|ref|NP_560639.1
gi|19552493|ref|NP_600495.1
gi|16079881|ref|NP_390707.1
gi|42780593|ref|NP_977840.1
gi|42781005|ref|NP_978252.1
gi|266346|sp|Q01292|ILV5_SPIOL


KETOL-ACID REDUCTOISOMERASE USING NADH

[0001] This application claims the benefit of U.S. Provisional Applications 61/015,346, filed Dec. 20, 2007, and 61/109,297, filed Oct. 29, 2008.

FIELD OF THE INVENTION

[0002] The invention relates to protein evolution. Specifically, ketol-acid reductoisomerase enzymes have been evolved to use the cofactor NADH instead of NADPH.

BACKGROUND OF THE INVENTION

[0003] Ketol-acid reductoisomerase enzymes are ubiquitous in nature and are involved in the production of valine and isoleucine, pathways that may affect the biological synthesis of isobutanol. Isobutanol is specifically produced from catabolism of L-valine as a by-product of yeast fermentation. It is a component of "fusel oil" that forms as a result of incomplete metabolism of amino acids by yeasts. After the amine group of L-valine is harvested as a nitrogen source, the resulting α-keto acid is decarboxylated and reduced to isobutanol by enzymes of the Ehrlich pathway (Dickinson, et al., J. Biol. Chem. 273, 25752-25756, 1998).

[0004] Addition of exogenous L-valine to the fermentation increases the yield of isobutanol, as described by Dickinson et al., supra, wherein it is reported that a yield of isobutanol of 3 g/L is obtained by providing L-valine at a concentration of 20 g/L in the fermentation. In addition, production of n-propanol, isobutanol and isoamylalcohol has been shown by calcium alginate immobilized cells of Zymomonas mobilis (Oaxaca, et al., Acta Biotechnol., 11, 523-532, 1991).

[0005] An increase in the yield of C3-C5 alcohols from carbohydrates was shown when amino acids leucine, isoleucine, and/or valine were added to the growth medium as the nitrogen source (WO 2005040392).

[0006] While methods described above indicate the potential of isobutanol production via biological means, these methods are cost prohibitive for industrial scale isobutanol production. The biosynthesis of isobutanol directly from sugars would be economically viable and would represent an advance in the art. However, to date the only ketol-acid reductoisomerase (KARI) enzymes known are those that bind NADPH in its native form, reducing the energy efficiency of the pathway. A KARI that would bind NADH would be beneficial and enhance the productivity of the isobutanol biosynthetic pathway by capitalizing on the NADH produced by the existing glycolytic and other metabolic pathways in most commonly used microbial cells. The discovery of a KARI enzyme that can use NADH as a cofactor as opposed to NADPH would be an advance in the art.

[0007] The evolution of enzymes having specificity for the NADH cofactor as opposed to NADPH is known for some enzymes and is commonly referred to as "cofactor switching". See for example Eppink, et al., J. Mol. Biol. (1999), 292, 87-96, describing the switching of the cofactor specificity of strictly NADPH-dependent p-hydroxybenzoate hydroxylase (PHBH) from Pseudomonas fluorescens by site-directed mutagenesis; and Nakanishi, et al., J. Biol. Chem. (1997), 272, 2218-2222, describing the use of site-directed mutagenesis on a mouse lung carbonyl reductase in which Thr-38 was replaced by Asp (T38D), resulting in an enzyme having a 200-fold increase in the Km values for NADP(H) and a corresponding decrease of more than 7-fold in those for NAD(H). Co-factor switching has been applied to a variety of enzymes including monooxygenases (Kamerbeek, et al., Eur. J. Biochem. (2004), 271, 2107-2116); dehydrogenases (Nishiyama, et al., J. Biol. Chem. (1993), 268, 4656-4660); ferredoxin-NADP reductase (Martinez-Julvez, et al., Biophys. Chem. (2005), 115, 219-224); and (US2004/0248250).

[0008] Rane et al. (Arch. Biochem. Biophys. (1997), 338, 83-89) discuss cofactor switching of a ketol acid reductoisomerase isolated from E. coli by targeting four residues (R68, K69, K75, and R76) for mutagenesis; however, the effectiveness of this method is in doubt.

[0009] Although the above cited methods suggest that it is generally possible to switch the cofactor specificity between NADH and NADPH, the methods are enzyme specific and the outcomes unpredictable. The development of a ketol-acid reductoisomerase having a high specificity for NADH as opposed to NADPH would greatly enhance its effectiveness in the isobutanol biosynthetic pathway; however, no such KARI enzyme has been reported.

[0010] Applicants have solved the stated problem by identifying a number of mutant ketol-acid reductoisomerase enzymes that have a preference for binding NADH as opposed to NADPH.

SUMMARY OF THE INVENTION

[0011] The invention relates to a method for the evolution of ketol-acid reductoisomerase (KARI) enzymes from binding the cofactor NADPH to binding NADH. The method involves mutagenesis of certain specific residues in the KARI enzyme to produce the co-factor switching.

[0012] Accordingly the invention provides a mutant ketol-acid reductoisomerase enzyme comprising the amino acid sequence as set forth in SEQ ID NO: 29.

[0013] Alternatively the invention provides a mutant ketol-acid reductoisomerase enzyme having the amino acid sequence selected from the group consisting of SEQ ID NO: 19, 24, 25, 26, 27, 28, 67, 68, 69, and 70.

[0014] In a preferred embodiment a mutant ketol-acid reductoisomerase enzyme is provided as set forth in SEQ ID NO:17 comprising at least one mutation at a residue selected from the group consisting of 24, 33, 47, 50, 52, 53, 61, 80, 115, 156, 165, and 170.

[0015] In a specific embodiment the invention provides a mutant ketol-acid reductoisomerase enzyme as set forth in SEQ ID NO:17 wherein:
[0016] a) the residue at position 47 has an amino acid substitution selected from the group consisting of A, C, D, F, G, I, L, N, P, and Y;
[0017] b) the residue at position 50 has an amino acid substitution selected from the group consisting of A, C, D, E, F, G, M, N, V, W;
[0018] c) the residue at position 52 has an amino acid substitution selected from the group consisting of A, C, D, G, H, N, S;
[0019] d) the residue at position 53 has an amino acid substitution selected from the group consisting of A, H, I, W;
[0020] e) the residue at position 156 has an amino acid substitution of V;
[0021] f) the residue at position 165 has an amino acid substitution of M;

[0022] g) the residue at position 61 has an amino acid substitution of F;
[0023] h) the residue at position 170 has an amino acid substitution of A;
[0024] i) the residue at position 24 has an amino acid substitution of F;
[0025] j) the residue at position 33 has an amino acid substitution of L;
[0026] k) the residue at position 80 has an amino acid substitution of I; and
[0027] l) the residue at position 115 has an amino acid substitution of L.

[0028] In another embodiment the invention provides a method for the evolution of an NADPH binding ketol-acid reductoisomerase enzyme to an NADH using form comprising:
[0029] a) providing a ketol-acid reductoisomerase enzyme which uses NADPH having a specific native amino acid sequence;
[0030] b) identifying the cofactor switching residues in the enzyme of a) based on the amino acid sequence of the Pseudomonas fluorescens ketol-acid reductoisomerase enzyme as set forth in SEQ ID NO:17, wherein the cofactor switching residues are at positions selected from the group consisting of 24, 33, 47, 50, 52, 53, 61, 80, 115, 156, 165, and 170;
[0031] c) creating mutations in at least one of the cofactor switching residues of b) to create a mutant enzyme wherein said mutant enzyme binds NADH.

[0032] In an alternate embodiment the invention provides a method for the production of isobutanol comprising:
[0033] a) providing a recombinant microbial host cell comprising the following genetic constructs:
[0034] i) at least one genetic construct encoding an acetolactate synthase enzyme for the conversion of pyruvate to acetolactate;
[0035] ii) at least one genetic construct encoding a mutant ketol-acid reductoisomerase enzyme of the invention;
[0036] iii) at least one genetic construct encoding an acetohydroxy acid dehydratase for the conversion of 2,3-dihydroxyisovalerate to α-ketoisovalerate (pathway step c);
[0037] iv) at least one genetic construct encoding a branched-chain keto acid decarboxylase for the conversion of α-ketoisovalerate to isobutyraldehyde (pathway step d);
[0038] v) at least one genetic construct encoding a branched-chain alcohol dehydrogenase for the conversion of isobutyraldehyde to isobutanol (pathway step e); and
[0039] b) growing the host cell of (a) under conditions where isobutanol is produced.

BRIEF DESCRIPTION OF THE FIGURES AND SEQUENCE DESCRIPTIONS

[0040] The invention can be more fully understood from the following detailed description, the Figures, and the accompanying sequence descriptions, which form part of this application.

[0041] FIG. 1: Shows four different isobutanol biosynthetic pathways. The steps labeled "a", "b", "c", "d", "e", "f", "g", "h", "i", "j" and "k" represent the substrate to product conversions described below.

[0042] FIG. 2: Multiple sequence alignment (MSA) of KARI enzymes from different sources. (a) MSA among three NADPH-requiring KARI enzymes; (b) MSA among PF5-KARI and other KARI enzymes with promiscuous nucleotide specificity, where MMC5 is from Methanococcus maripaludis C5; MMS2 is from Methanococcus maripaludis S2; MVSB is from Methanococcus vannielii SB; ilv5 is from Saccharomyces cerevisiae ilv5; KARI-D1 is from Sulfolobus solfataricus P2 ilvC; KARI-D2 is from Pyrobaculum aerophilum P2 ilvC; and KARI-S1 is from Ralstonia solanacearum GMI1000 ilvC.

[0043] FIG. 3: Interaction of phosphate binding loop with NADPH based on homology modeling.

[0044] FIG. 4: KARI activities of top performers from library C using cofactors NADH versus NADPH. Activity and standard deviation were derived from triple experiments. The mutation information is as follows: C3A7=R47Y/S50A/T52D/V53W; C3A10=R47Y/S50A/T52G/V53W; C3B11=R47F/S50A/T52D/V53W; C3C8=R47G/S50M/T52D/V53W; and C4D12=R47C/S50M/T52D/V53W.

[0045] FIG. 5: (a) KARI activities of top performers from libraries E, F and G using cofactors NADH versus NADPH. (b) KARI activities of positive control versus wild type Pf5-ilvC using cofactor NADH. Activity and standard deviation were derived from at least three parallel experiments. "Wt" represents the wild type of Pf5-ilvC and "Neg" means negative control. Experiments for NADH and NADPH reactions in (a) were 30 minutes; in (b) were 10 minutes.

[0046] FIG. 6: Activities of top performers from library H using cofactors NADH versus NADPH. Activity and standard deviation were derived from triple experiments. Mutation information is as follows: 24F9=R47P/S50G/T52D; 68F10=R47P/T52S; 83G10=R47P/S50D/T52S; 39G4=R47P/S50C/T52D; 91A9=R47P/S50C/T52D; and C3B11=R47F/S50A/T52D/V53W.

[0047] FIG. 7: Thermostability of PF5-ilvC. The remaining activity of the enzyme after heating at certain temperatures for 10 min was the average of triple experiments, normalized to the activity measured at room temperature.

[0048] FIG. 8: Multiple sequence alignment among 5 naturally existing KARI molecules. The positions bolded and grey highlighted were identified by error prone PCR and the positions only grey highlighted were targeted for mutagenesis.

[0049] FIG. 9: Alignment of the twenty-four functionally verified KARI sequences. The GxGxx(G/A) motif involved in the binding of NAD(P)H is indicated below the alignment.

[0050] FIG. 10: An example of the alignment of Pseudomonas fluorescens Pf-5 KARI to the profile HMM of KARI. The eleven positions that are responsible for co-factor switching are bolded and shaded in grey.

[0051] Table 9 is a table of the Profile HMM of the KARI enzymes described in Example 5. The eleven positions in the profile HMM representing the columns in the alignment which correspond to the eleven cofactor switching positions in Pseudomonas fluorescens Pf-5 KARI are identified as positions 24, 33, 47, 50, 52, 53, 61, 80, 115, 156, and 170. The lines corresponding to these positions in the model file are highlighted in yellow. Table 9 is submitted herewith electronically and is incorporated herein by reference.

[0052] The following sequences conform with 37 C.F.R. 1.821-1.825 ("Requirements for Patent Applications Containing Nucleotide Sequences and/or Amino Acid Sequence Disclosures - the Sequence Rules") and are consistent with the World Intellectual Property Organization (WIPO) Standard ST.25 (1998) and the sequence listing requirements of the EPO and PCT (Rules 5.2 and 49.5(a-bis), and Section 208 and Annex C of the Administrative Instructions). The symbols and format used for nucleotide and amino acid sequence data comply with the rules set forth in 37 C.F.R. §1.822.
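The mutation shorthand used in the figure descriptions above (for example C3A7 = R47Y/S50A/T52D/V53W) names the wild-type residue, its position, and the replacement residue. As a minimal illustrative sketch, and not part of the disclosed method, such a string could be parsed and applied to a sequence fragment as shown below. The function names are hypothetical, and the 13-residue fragment is the PF5-KARI segment printed in FIG. 2(a), assumed here to begin at position 44 as its "(44)" label suggests.

```python
import re

# One substitution looks like "R47Y": wild-type residue, 1-based position, new residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(shorthand):
    """Split e.g. 'R47Y/S50A' into [('R', 47, 'Y'), ('S', 50, 'A')]."""
    parsed = []
    for token in shorthand.split("/"):
        match = _MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        wild_type, position, replacement = match.groups()
        parsed.append((wild_type, int(position), replacement))
    return parsed


def apply_mutations(fragment, first_position, shorthand):
    """Apply substitutions to a fragment whose first residue has the given number."""
    residues = list(fragment)
    for wild_type, position, replacement in parse_mutations(shorthand):
        index = position - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} lies outside the fragment")
        if residues[index] != wild_type:
            # Guards against numbering mistakes: the stated wild-type residue
            # must actually be present at that position.
            raise ValueError(
                f"expected {wild_type} at {position}, found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


# PF5-KARI segment as printed in FIG. 2(a); the start position 44 is an
# assumption read from the "(44)" label in that figure.
PF5_SEGMENT = "VGLRKGSATVAKA"
C3A7 = "R47Y/S50A/T52D/V53W"

if __name__ == "__main__":
    print(apply_mutations(PF5_SEGMENT, 44, C3A7))
```

The wild-type check is a design choice of this sketch: it makes a numbering offset fail loudly instead of silently mutating the wrong residue.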

TABLE 1

OLIGONUCLEOTIDE PRIMERS USED IN THIS INVENTION

SEQ ID NO.   SEQUENCE                                Description

1            TGA TGAACA CTTCGCGTATTCGCCGTCCT         Reverse Primer for pBAD vector
             TAGACG GACTG TGGCCTGNNTAAAGGCNN         Forward primer library C
             CNNCTGGGCCAAGGCT GAAGCCCACGGCTTG
             TAGACG GACTG TGGCCTGNNTAAAGGCTCG        Forward primer for library E
             GCTACCGTTGCCAAGGCTGAAGCCCACGGCTTG
             TAGACG GACTG TGGCCTGCGTAAAGGCNNT        Forward primer for library
             GCTACCGTTGCCAAGGCTGAAGCCCACGGCTTG
             TAGACG GACTG TGGCCTGCGTAAAGGCTCG        Forward primer for library G
             GCTNNTGTTGCCAAGGCTGAAGCCCACGGCTTG
             TAGACG GACTG TGGCCTGNNTAAAGGCNNT        Forward primer for library H
             GCTNNTGTTGCCAAGGCTGAAGCCCACGGCTTG
             AAGATTAGCGGATCC ACCT                    Sequencing primer (forward)
             AACAGCCAAGCTTTTAGTTC                    Sequencing primer (reverse)
             CTCTCTACTGTTTCTCCATACCCG                pBAD 266-021308f
21           CAAGCCGTGGGCTTCAGCCTTGGCKNN             PF5 53Mt022908r
22           CGGTTTCAGTCTCGTCCTTGAAG                 pBAD 866-021308
49           GCTCAAGCANNKAACCTGAAGG
50           CCTTCAGGTTKNNTGCTTGAGC                  pBAD-427 C33 090808r
51           GTAGACGTGNNKGTTGGCCTG
52           CAGGCCAACKNNCACGTCTAC
53           CTGAAGCCNNKGGCNNKAAAGTGAC               pBAD-484-H59L61 090808f
54           GTCACTTTKNNGCCKNNGGCTTCAG
55           GCAGCCGTTNNKGGTGCCGACT
56           AGTCGGCACCKNNAACGGCTGC
57           CATGATCCTGNNKCCGGAGGAG
58           CTCGTCCGGKNNCAGGATCATG
59           CAAGAAGGGCNNKACTCTGGCCT
60           AGGCCAGAGTKNNGCCCTTCTTG
61           GTTGTGCCTNNKGCCGACCTCG                  pBAD-663
62           CGAGGTCGGCKNNAGGCACAAC                  pBAD-685
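Several primers in Table 1 contain degenerate triplets such as NNT and NNK. As a rough sketch, assuming the standard genetic code and the standard IUPAC meanings of the symbols (N = A, C, G or T; K = G or T), the amino acids that such a degenerate codon can encode may be enumerated as below. The helper names are illustrative only, and whether a given triplet in a primer is in the reading frame is not checked here.

```python
from itertools import product

# IUPAC nucleotide codes needed here (standard meanings).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "N": "ACGT"}

# Standard genetic code, with codons ordered T, C, A, G at each position;
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def expand_codon(degenerate):
    """List every concrete codon matched by a 3-letter degenerate codon."""
    if len(degenerate) != 3:
        raise ValueError("a codon has exactly three letters")
    return ["".join(c) for c in product(*(IUPAC[base] for base in degenerate))]


def encoded_residues(degenerate):
    """Return the distinct amino acids ('*' for stop) a degenerate codon encodes."""
    return sorted({CODON_TABLE[codon] for codon in expand_codon(degenerate)})


if __name__ == "__main__":
    for scheme in ("NNT", "NNK"):
        codons = expand_codon(scheme)
        residues = encoded_residues(scheme)
        print(scheme, len(codons), "codons ->", "".join(residues))
```

Under these assumptions NNT expands to 16 codons covering 15 amino acids with no stop codon, while NNK expands to 32 codons covering all 20 amino acids plus one stop codon.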

[0053] Additional sequences used in the application are listed below. The abbreviated gene names in brackets are used in this disclosure.

SEQ ID NO: 9 is the Methanococcus maripaludis C5-ilvC (MMC5) GenBank Accession Number NC_009135.1 Region: 901034..902026
SEQ ID NO: 10 is the Methanococcus maripaludis S2-ilvC (MMS2) GenBank Accession Number NC_005791.1 Region: 645729..646721
SEQ ID NO: 11 is the Methanococcus vannielii SB-ilv5 (MVSB) GenBank Accession Number NZ_AAWX01000002.1 Region: 302214..303206
SEQ ID NO: 12 is the Saccharomyces cerevisiae ilv5 (ilv5) GenBank Accession Number NC_001144.4 Region: 838065..839252
SEQ ID NO: 13 is the Sulfolobus solfataricus P2 ilvC (KARI-D1) GenBank Accession Number NC_002754.1 Region: 506253..507260
SEQ ID NO: 14 is the Pyrobaculum aerophilum str. IM2 ilvC (KARI-D2) GenBank Accession Number NC_003364.1 Region: 1976281..1977267
SEQ ID NO: 15 is the Ralstonia solanacearum GMI1000 ilvC (KARI-S1) GenBank Accession Number NC_003295.1 Region: 2248264..2249280
SEQ ID NO: 16 is the Pseudomonas aeruginosa PAO1 ilvC GenBank Accession Number NC_002516 Region: 5272455..5273471
SEQ ID NO: 17 is the Pseudomonas fluorescens PF5 ilvC GenBank Accession Number NC_004129 Region: 6017379..6018395
SEQ ID NO: 18 is the Spinacia oleracea ilvC (Spinach KARI) GenBank Accession Number NC_002516 Region: 12050.
SEQ ID NO: 19 is the amino acid sequence of the mutant (Y24F/R47Y/S50A/T52D/V53A/L61F/G170A) of the ilvC native protein of Pseudomonas fluorescens.
SEQ ID NO: 23 is the DNA sequence of the mutant (Y24F/R47Y/S50A/T52D/V53A/L61F/G170A) of the ilvC native protein of Pseudomonas fluorescens.
SEQ ID NO: 24 is the amino acid sequence of the mutant ZB1 (Y24F/R47Y/S50A/T52D/V53A/L61F/A156V)
SEQ ID NO: 25 is the amino acid sequence of the mutant ZF3 (Y24F/C33L/R47Y/S50A/T52D/V53A/L61F)
SEQ ID NO: 26 is the amino acid sequence of the mutant ZF2 (Y24F/C33L/R47Y/S50A/T52D/V53A/L61F/A156V)
SEQ ID NO: 27 is the amino acid sequence of the mutant ZB3 (Y24F/C33L/R47Y/S50A/T52D/V53A/L61F/G170A)
SEQ ID NO: 28 is the amino acid sequence of the mutant Z4B8 (C33L/R47Y/S50A/T52D/V53A/L61F/T80I/A156V/G170A)
SEQ ID NO: 29 is a consensus amino acid sequence comprising all experimentally verified KARI point mutations as based on SEQ ID NO:17.
SEQ ID NO: 30 is the amino acid sequence for KARI from Natronomonas pharaonis DSM 2160
SEQ ID NO: 31 is the amino acid sequence for KARI from Bacillus subtilis subsp. subtilis str. 168
SEQ ID NO: 32 is the amino acid sequence for KARI from Corynebacterium glutamicum ATCC13032
SEQ ID NO: 33 is the amino acid sequence for KARI from Phaeospirillum molischianum
SEQ ID NO: 34 is the amino acid sequence for KARI from Zymomonas mobilis subsp. mobilis ZM4
SEQ ID NO: 35 is the amino acid sequence for KARI from Alkalilimnicola ehrlichei MLHE-1
SEQ ID NO: 36 is the amino acid sequence for KARI from Campylobacter lari RM2100
SEQ ID NO: 37 is the amino acid sequence for KARI from Marinobacter aquaeolei VT8
SEQ ID NO: 38 is the amino acid sequence for KARI from Psychrobacter arcticus 273-4
SEQ ID NO: 39 is the amino acid sequence for KARI from Hahella chejuensis KCTC2396
SEQ ID NO: 40 is the amino acid sequence for KARI from Thiobacillus denitrificans ATCC25259
SEQ ID NO: 41 is the amino acid sequence for KARI from Azotobacter vinelandii AvOP
SEQ ID NO: 42 is the amino acid sequence for KARI from Pseudomonas syringae pv. syringae B728a
SEQ ID NO: 43 is the amino acid sequence for KARI from Pseudomonas syringae pv. tomato str. DC3000
SEQ ID NO: 44 is the amino acid sequence for KARI from Pseudomonas putida KT2440
SEQ ID NO: 45 is the amino acid sequence for KARI from Pseudomonas entomophila L48
SEQ ID NO: 46 is the amino acid sequence for KARI from Pseudomonas mendocina ymp
SEQ ID NO: 47 is the amino acid sequence for KARI from Bacillus cereus ATCC10987 NP_977840.1
SEQ ID NO: 48 is the amino acid sequence for KARI from Bacillus cereus ATCC10987 NP_978252.1
SEQ ID NO: 63 is the amino acid sequence for KARI from Escherichia coli, GenBank Accession Number P05793
SEQ ID NO: 64 is the amino acid sequence for KARI from Marine Gamma Proteobacterium HTCC2207, GenBank Accession Number ZP_01224863.1
SEQ ID NO: 65 is the amino acid sequence for KARI from Desulfuromonas acetoxidans, GenBank Accession Number ZP_01313517.1

SEQ ID NO: 66 is the amino acid sequence for KARI from Pisum sativum (Pea), GenBank Accession Number O82043
SEQ ID NO: 67 is the amino acid sequence for the mutant 3361G8 (C33L/R47Y/S50A/T52D/V53A/L61F/T80I)
SEQ ID NO: 68 is the amino acid sequence for the mutant 2H10 (Y24F/C33L/R47Y/S50A/T52D/V53I/L61F/T80I/A156V)
SEQ ID NO: 69 is the amino acid sequence for the mutant 1D2 (Y24F/R47Y/S50A/T52D/V53A/L61F/T80I/A156V)
SEQ ID NO: 70 is the amino acid sequence for the mutant 3F12 (Y24F/C33L/R47Y/S50A/T52D/V53A/L61F/T80I/A156V).

DETAILED DESCRIPTION OF THE INVENTION

[0054] The present invention relates to the generation of mutated KARI enzymes to use NADH as opposed to NADPH. These co-factor switched enzymes function more effectively in microbial systems designed to produce isobutanol. Isobutanol is an important industrial commodity chemical with a variety of applications, where its potential as a fuel or fuel additive is particularly significant. Although only a four-carbon alcohol, butanol has an energy content similar to that of gasoline and can be blended with any fossil fuel. Isobutanol is favored as a fuel or fuel additive as it yields only CO2 and little or no SOx or NOx when burned in the standard internal combustion engine. Additionally butanol is less corrosive than ethanol, the most preferred fuel additive to date.

[0055] The following definitions and abbreviations are to be used for the interpretation of the claims and the specification.

[0056] The term "invention" or "present invention" as used herein is meant to apply generally to all embodiments of the invention as described in the claims as presented or as later amended and supplemented, or in the specification.

[0057] The term "isobutanol biosynthetic pathway" refers to the enzymatic pathway to produce isobutanol. Preferred isobutanol biosynthetic pathways are illustrated in FIG. 1 and described herein.

[0058] The term "NADPH consumption assay" refers to an enzyme assay for the determination of the specific activity of the KARI enzyme, involving measuring the disappearance of the KARI cofactor, NADPH, from the enzyme reaction.

[0059] "KARI" is the abbreviation for the enzyme ketol-acid reductoisomerase.

[0060] The term "close proximity" when referring to the position of various amino acid residues of a KARI enzyme with respect to the adenosyl 2'-phosphate of NADPH means amino acids in the three-dimensional model for the structure of the enzyme that are within about 4.5 Å of the phosphorus atom of the adenosyl 2'-phosphate of NADPH bound to the enzyme.

[0061] The terms "ketol-acid reductoisomerase" (abbreviated "KARI") and "acetohydroxy acid isomeroreductase" will be used interchangeably and refer to the enzyme having the EC number EC 1.1.1.86 (Enzyme Nomenclature 1992, Academic Press, San Diego). Ketol-acid reductoisomerase catalyzes the reaction of (S)-acetolactate to 2,3-dihydroxyisovalerate, as more fully described below. These enzymes are available from a number of sources, including, but not limited to, E. coli GenBank Accession Number NC_000913 REGION: 3955993..3957468, Vibrio cholerae GenBank Accession Number NC_002505 REGION: 157441..158925, Pseudomonas aeruginosa GenBank Accession Number NC_002516 (SEQ ID NO:16) REGION: 5272455..5273471, and Pseudomonas fluorescens GenBank Accession Number NC_004129 (SEQ ID NO:17) REGION: 6017379..6018395. As used herein the term "Class I ketol-acid reductoisomerase enzyme" means the short form that typically has between 330 and 340 amino acid residues, and is distinct from the long form, called class II, that typically has approximately 490 residues.

[0062] The term "acetolactate synthase" refers to an enzyme that catalyzes the conversion of pyruvate to acetolactate and CO2. Acetolactate has two stereoisomers ((R)- and (S)-); the enzyme prefers the (S)-isomer, which is made by biological systems. Preferred acetolactate synthases are known by the EC number 2.2.1.6 (Enzyme Nomenclature 1992, Academic Press, San Diego). These enzymes are available from a number of sources, including, but not limited to, Bacillus subtilis (GenBank Nos: CAB15618, Z99122, NCBI (National Center for Biotechnology Information) amino acid sequence, NCBI nucleotide sequence, respectively), Klebsiella pneumoniae (GenBank Nos: AAA25079 (SEQ ID NO:2), M73842 (SEQ ID NO:1)), and Lactococcus lactis (GenBank Nos: AAA25161, L16975).

[0063] The term "acetohydroxy acid dehydratase" refers to an enzyme that catalyzes the conversion of 2,3-dihydroxyisovalerate to α-ketoisovalerate. Preferred acetohydroxy acid dehydratases are known by the EC number 4.2.1.9. These enzymes are available from a vast array of microorganisms, including, but not limited to, E. coli (GenBank Nos: YP_026248, NC_000913), S. cerevisiae (GenBank Nos: NP_012550, NC_001142), M. maripaludis (GenBank Nos: CAF29874, BX957219), and B. subtilis (GenBank Nos: CAB14105, Z99115).

[0064] The term "branched-chain α-keto acid decarboxylase" refers to an enzyme that catalyzes the conversion of α-ketoisovalerate to isobutyraldehyde and CO2. Preferred branched-chain α-keto acid decarboxylases are known by the EC number 4.1.1.72 and are available from a number of sources, including, but not limited to, Lactococcus lactis (GenBank Nos: AAS49166, AY548760; CAG34226, AJ746364), Salmonella typhimurium (GenBank Nos: NP_461346, NC_003197), and Clostridium acetobutylicum (GenBank Nos: NP_149189, NC_001988).

[0065] The term "branched-chain alcohol dehydrogenase" refers to an enzyme that catalyzes the conversion of isobutyraldehyde to isobutanol. Preferred branched-chain alcohol dehydrogenases are known by the EC number 1.1.1.265, but may also be classified under other alcohol dehydrogenases (specifically, EC 1.1.1.1 or 1.1.1.2). These enzymes utilize NADH (reduced nicotinamide adenine dinucleotide) and/or NADPH as electron donor and are available from a number of sources, including, but not limited to, S. cerevisiae (GenBank Nos: NP_010656, NC_001136; NP_014051, NC_001145), E. coli (GenBank Nos: NP_417484), and C. acetobutylicum (GenBank Nos: NP_349892, NC_003030).

[0066] The term "branched-chain keto acid dehydrogenase" refers to an enzyme that catalyzes the conversion of α-ketoisovalerate to isobutyryl-CoA (isobutyryl-coenzyme A), using NAD+ (nicotinamide adenine dinucleotide) as electron acceptor. Preferred branched-chain keto acid dehydrogenases are known by the EC number 1.2.4.4. These branched-chain keto acid dehydrogenases comprise four subunits, and sequences from all subunits are available from a vast array of microorganisms, including, but not limited to, B. subtilis (GenBank Nos: CAB14336, Z99116; CAB14335, Z99116;

CAB14334, Z99116; and CAB14337, Z99116) and Pseudomonas putida (GenBank Nos: AAA65614, M57613; AAA65615, M57613; AAA65617, M57613; and AAA65618, M57613).

[0067] The terms “kcat” and “Km” are known to those skilled in the art and are described in Enzyme Structure and Mechanism, 2nd ed. (Fersht; W.H. Freeman: NY, 1985; pp 98-120). The term “kcat”, often called the “turnover number”, is defined as the maximum number of substrate molecules converted to products per active site per unit time, or the number of times the enzyme turns over per unit time. kcat = Vmax/[E], where [E] is the enzyme concentration (Fersht, supra). The terms “total turnover” and “total turnover number” are used herein to refer to the amount of product formed by the reaction of a KARI enzyme with substrate.

[0068] The term “catalytic efficiency” is defined as the kcat/Km of an enzyme. Catalytic efficiency is used to quantify the specificity of an enzyme for a substrate.

[0069] The terms “isolated nucleic acid molecule”, “isolated nucleic acid fragment” and “genetic construct” will be used interchangeably and will mean a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. An isolated nucleic acid fragment in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.

[0070] The term “amino acid” refers to the basic chemical structural unit of a protein or polypeptide. The following abbreviations are used herein to identify specific amino acids:

Amino Acid       Three-Letter Abbreviation   One-Letter Abbreviation
Alanine          Ala                         A
Arginine         Arg                         R
Asparagine       Asn                         N
Aspartic acid    Asp                         D
Cysteine         Cys                         C
Glutamine        Gln                         Q
Glutamic acid    Glu                         E
Glycine          Gly                         G
Histidine        His                         H
Isoleucine       Ile                         I
Leucine          Leu                         L
Lysine           Lys                         K
Methionine       Met                         M
Phenylalanine    Phe                         F
Proline          Pro                         P
Serine           Ser                         S
Threonine        Thr                         T
Tryptophan       Trp                         W
Tyrosine         Tyr                         Y
Valine           Val                         V

[0071] The term “Gene” refers to a nucleic acid fragment that is capable of being expressed as a specific protein, optionally including regulatory sequences preceding (5' non-coding sequences) and following (3' non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences. “Chimeric gene” refers to any gene that is not a native gene, comprising regulatory and coding sequences that are not found together in nature. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. “Endogenous gene” refers to a native gene in its natural location in the genome of an organism. A “foreign” gene refers to a gene not normally found in the host organism, but that is introduced into the host organism by gene transfer. Foreign genes can comprise native genes inserted into a non-native organism, or chimeric genes. A “transgene” is a gene that has been introduced into the genome by a transformation procedure.

[0072] As used herein the term “Coding sequence” refers to a DNA sequence that codes for a specific amino acid sequence. “Suitable regulatory sequences” refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters, translation leader sequences, introns, polyadenylation recognition sequences, RNA processing site, effector binding site and stem-loop structure.

[0073] The term “Promoter” refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3' to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental or physiological conditions. Promoters which cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of different lengths may have identical promoter activity.

[0074] The term “operably linked” refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is affected by the other. For example, a promoter is operably linked with a coding sequence when it is capable of effecting the expression of that coding sequence (i.e., that the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in sense or antisense orientation.

[0075] The term “expression”, as used herein, refers to the transcription and stable accumulation of sense (mRNA) or antisense RNA derived from the nucleic acid fragment of the invention. Expression may also refer to translation of mRNA into a polypeptide.

[0076] As used herein the term “transformation” refers to the transfer of a nucleic acid fragment into the genome of a host organism, resulting in genetically stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as “transgenic” or “recombinant” or “transformed” organisms.

[0077] The terms “plasmid”, “vector” and “cassette” refer to an extra chromosomal element often carrying genes which are not part of the central metabolism of the cell, and usually in the form of circular double-stranded DNA fragments. Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear or circular, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3' untranslated sequence into a cell. “Transformation cassette” refers to a specific vector containing a foreign gene and having elements in addition to the foreign gene that facilitate transformation of a particular host cell. “Expression cassette” refers to a specific vector containing a foreign gene and having elements in addition to the foreign gene that allow for enhanced expression of that gene in a foreign host.

[0078] As used herein the term “Codon degeneracy” refers to the nature in the genetic code permitting variation of the nucleotide sequence without affecting the amino acid sequence of an encoded polypeptide. The skilled artisan is well aware of the “codon-bias” exhibited by a specific host cell in usage of nucleotide codons to specify a given amino acid. Therefore, when synthesizing a gene for improved expression in a host cell, it is desirable to design the gene such that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.

[0079] The term “codon-optimized”, as it refers to genes or coding regions of nucleic acid molecules for transformation of various hosts, refers to the alteration of codons in the gene or coding regions of the nucleic acid molecules to reflect the typical codon usage of the host organism without altering the polypeptide encoded by the DNA.

[0080] Standard recombinant DNA and molecular cloning techniques used here are well known in the art and are described by Sambrook et al. (Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989) (hereinafter “Maniatis”); and by Silhavy et al. (Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1984); and by Ausubel, F. M. et al. (Current Protocols in Molecular Biology, published by Greene Publishing Assoc. and Wiley-Interscience, 1987).

[0081] The present invention addresses a need that arises in the microbial production of isobutanol where the ketol-acid reductoisomerase enzyme performs a vital role. Wild type ketol-acid reductoisomerase enzymes typically use NADPH as their cofactor. However, in the formation of isobutanol an excess of NADH is produced by ancillary metabolic pathways. The invention provides mutant Class I KARI enzymes that have been evolved to utilize NADH as a cofactor, overcoming the cofactor problem and increasing the efficiency of the isobutanol biosynthetic pathway.

[0082] Production of isobutanol utilizes the glycolysis pathway present in the host organism. During the production of two molecules of pyruvate from glucose during glycolysis, there is net production of two molecules of NADH from NAD⁺ by the glyceraldehyde-3-phosphate dehydrogenase reaction. During the further production of one molecule of isobutanol from two molecules of pyruvate, there is net consumption of one molecule of NADPH, by the KARI reaction, and one molecule of NADH by the isobutanol dehydrogenase reaction. The overall reaction of glucose to isobutanol thus leads to net production of one molecule of NADH and net consumption of one molecule of NADPH. The interconversion of NADH with NADPH is generally slow and inefficient; thus, the NADPH consumed is generated by metabolism (for example, by the pentose phosphate pathway) consuming substrate in the process. Meanwhile, the cell strives to maintain homeostasis in the NAD⁺/NADH ratio, leading to the excess NADH produced in isobutanol production being consumed in wasteful reduction of other metabolic intermediates; e.g., by the production of lactate from pyruvate. Thus, the imbalance between NADH produced and NADPH consumed by the isobutanol pathway leads to a reduction in the molar yield of isobutanol produced from glucose in two ways: 1) unnecessary operation of metabolism to produce NADPH, and 2) wasteful reaction of metabolic intermediates to maintain NAD⁺/NADH homeostasis. The solution to this problem is to invent a KARI that is specific for NADH as its cofactor, so that both molecules of NADH produced in glycolysis are consumed in the synthesis of isobutanol from pyruvate.

Ketol-Acid Reductoisomerase (KARI) Enzymes

[0083] Acetohydroxy acid isomeroreductase or ketol-acid reductoisomerase (KARI; EC 1.1.1.86) catalyzes two steps in the biosynthesis of branched-chain amino acids and is a key enzyme in their biosynthesis. KARI is found in a variety of organisms, and amino acid sequence comparisons across species have revealed that there are 2 types of this enzyme: a short form (class I) found in fungi and most bacteria, and a long form (class II) typical of plants.

[0084] Class I KARIs typically have between 330-340 amino acid residues. The long form KARI enzymes have about 490 amino acid residues. However, some bacteria such as Escherichia coli possess a long form, where the amino acid sequence differs appreciably from that found in plants. KARI is encoded by the ilvC gene and is an essential enzyme for growth of E. coli and other bacteria in a minimal medium. Typically KARI uses NADPH as cofactor and requires a divalent cation such as Mg²⁺ for its activity. In addition to utilizing acetolactate in the valine pathway, KARI also converts acetohydroxybutanoate to dihydroxymethylpentanoate in the isoleucine production pathway.

[0085] Class II KARIs generally consist of a 225-residue N-terminal domain and a 287-residue C-terminal domain. The N-terminal domain, which contains the NADPH-binding site, has an α/β structure and resembles domains found in other pyridine nucleotide-dependent oxidoreductases. The C-terminal domain consists almost entirely of α-helices and is of a previously unknown topology.

[0086] The crystal structure of the E. coli KARI enzyme at 2.6 Å resolution has been solved (Tyagi, et al., Protein Science, 14, 3089-3100, 2005). This enzyme consists of two domains, one with mixed α/β structure which is similar to that found in other pyridine nucleotide-dependent dehydrogenases. The second domain is mainly α-helical and shows strong evidence of internal duplication. Comparison of the active sites of KARI of E. coli, Pseudomonas aeruginosa, and spinach showed that most residues in the active site of the enzyme occupy conserved positions. While the E. coli KARI was crystallized as a tetramer, which is probably the biologically active unit, the P. aeruginosa KARI (Ahn, et al., J. Mol. Biol., 328, 505-515, 2003) formed a dodecamer, and the enzyme from spinach formed a dimer. Known KARIs are slow enzymes with a reported turnover number (kcat) of 2 s⁻¹ (Aulabaugh et al., Biochemistry, 29, 2824-2830, 1990) or 0.12 s⁻¹ (Rane et al., Arch. Biochem. Biophys. 338, 83-89, 1997) for acetolactate. Studies have shown that genetic control of isoleucine-valine biosynthesis in E. coli is different than that in Ps. aeruginosa (Marinus, et al., Genetics, 63, 547-56, 1969).

Identification of Amino Acid Target Sites for Cofactor Switching

[0087] It was reported that phosphate p2 oxygen atoms of NADPH form hydrogen bonds with side chains of Arg162,

Ser165 and Ser167 of spinach KARI (Biou V. et al., The EMBO Journal, 16: 3405-3415, 1997). Multiple sequence alignments were performed, using Vector NTI (Invitrogen Corp., Carlsbad, Calif.), with KARI enzymes from spinach, Pseudomonas aeruginosa (PAO-KARI) and Pseudomonas fluorescens (PF5-KARI). The NADPH binding sites are shown in FIG. 2a. The amino acids arginine, threonine and serine appear to play similar roles in forming hydrogen bonds with phosphate p2 oxygen atoms of NADPH in KARI enzymes. Studies by Ahn et al. (J. Mol. Biol. 328: 505-515, 2003) had identified three NADPH phosphate binding sites (Arg47, Ser50 and Thr52) for Pseudomonas aeruginosa (PAO-KARI) following comparing its structure with that of the spinach KARI. Hypothesizing that these three NADPH phosphate binding sites of the three KARI enzymes used in the disclosure were conserved, Arg47, Ser50 and Thr52 of PF5-KARI were targeted as the phosphate binding sites for this enzyme. This hypothesis was further confirmed through homology modeling.

[0088] Multiple sequence alignment among PF5-ilvC and several other KARI enzymes with promiscuous nucleotide specificity was also performed. As shown in FIG. 2b, the amino acids glycine (G50) and tryptophan (W53), in other KARI enzymes in FIG. 2b, always appear together as a pair in the sequences of those enzymes. It was therefore assumed that the bulky tryptophan 53 residue was important in determining nucleotide specificity by reducing the size of the nucleotide binding pocket to favor the smaller nucleotide, NADH. Position 53 of PF5-ilvC was therefore chosen as a target for mutagenesis.

[0089] Several site-saturation gene libraries were prepared containing genes encoding KARI enzymes by commercially available kits for the generation of mutants. Clones from each library were screened for improved KARI activity using the NADH consumption assay described herein. Screening resulted in the identification of a number of genes having mutations that can be correlated to KARI activity. The location of the mutations was identified using the amino acid sequence of the Pseudomonas fluorescens PF5 ilvC protein (SEQ ID NO:17). Mutants having improved KARI activity were those which had mutations at the following positions: 47, 50, 52 and 53. More specifically, desirable mutations included the following substitutions:

[0090] a) the residue at position 47 has an amino acid substitution selected from the group consisting of A, C, D, F, G, I, L, N, P, and Y;

[0091] b) the residue at position 50 has an amino acid substitution selected from the group consisting of A, C, D, E, F, G, M, N, V, W;

[0092] c) the residue at position 52 has an amino acid substitution selected from the group consisting of A, C, D, G, H, N, S;

[0093] d) the residue at position 53 has an amino acid substitution selected from the group consisting of A, H, I, W.

[0094] In another embodiment, additional mutagenesis, using error prone PCR, performed on the mutants listed above identified suitable mutation positions as: 156, 165, 61, 170, 115 and 24. More specifically, the desirable mutants with lower Km for NADH contained the following substitutions:

[0095] e) the residue at position 156 has an amino acid substitution of V;

[0096] f) the residue at position 165 has an amino acid substitution of M;

[0097] g) the residue at position 61 has an amino acid substitution of F;

[0098] h) the residue at position 170 has an amino acid substitution of A;

[0099] i) the residue at position 24 has an amino acid substitution of F; and

[0100] j) the residue at position 115 has an amino acid substitution of L.

[0101] In further work, multiple sequence alignment of Pseudomonas fluorescens PF5-ilvC and Bacillus cereus ilvC1 and ilvC2 and spinach KARI was performed, which allowed identification of positions 24, 33, 47, 50, 52, 53, 61, 80, 156 and 170 for further mutagenesis. More specifically, mutants with much lower Km for NADH were obtained. These mutations are also based on the Pseudomonas fluorescens KARI enzyme (SEQ ID NO:17) as a reference sequence, wherein the reference sequence comprises at least one amino acid substitution selected from the group consisting of:

[0102] k) the residue at position 24 has an amino acid substitution of phenylalanine;

[0103] l) the residue at position 50 has an amino acid substitution of alanine;

[0104] m) the residue at position 52 has an amino acid substitution of aspartic acid;

[0105] n) the residue at position 53 has an amino acid substitution of alanine;

[0106] o) the residue at position 61 has an amino acid substitution of phenylalanine;

[0107] p) the residue at position 156 has an amino acid substitution of valine;

[0108] q) the residue at position 33 has an amino acid substitution of leucine;

[0109] r) the residue at position 47 has an amino acid substitution of tyrosine;

[0110] s) the residue at position 80 has an amino acid substitution of isoleucine; and

[0111] t) the residue at position 170 has an amino acid substitution of alanine.

[0112] The present invention includes a mutant polypeptide having KARI activity, said polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 24, 25, 26, 27 and 28.

[0113] A consensus sequence for the mutant ilvC was generated from the multiple sequence alignment and is provided as SEQ ID NO: 29, which represents all experimentally verified mutations of the KARI enzyme based on the amino acid sequence of the KARI enzyme isolated from Pseudomonas fluorescens (SEQ ID NO:17).

[0114] Additionally, the present invention describes mutation positions identified using a profile Hidden Markov Model (HMM) built based on sequences of 25 functionally verified Class I and Class II KARI enzymes. Profile HMM identified mutation positions 24, 33, 47, 50, 52, 53, 61, 80, 115, 156, and 170 (the numbering is based on the sequences of Pseudomonas fluorescens PF5 KARI). Thus, it will be appreciated by the skilled person that mutations at these positions, as well as those discussed above that have been experimentally verified, will also give rise to KARI enzymes having the ability to bind NADH.

The Host Strains for KARI Engineering

[0115] Two host strains, E. coli TOP10 from Invitrogen and E. coli BW25113 (ΔilvC, an ilvC gene-knockout), were used for making constructs over-expressing the KARI enzyme in this disclosure. In the BW25113 strain, the entire ilvC gene of the E. coli chromosome was replaced by a kanamycin cassette using the Lambda red homology recombination technology described by Datsenko and Wanner (Kirill A. Datsenko and Barry L. Wanner, Proc. Natl. Acad. Sci. USA, 97, 6640-6645, 2000).

Homology Modeling of PF5 KARI with Bound Substrates

[0116] The structure of PF5-KARI with bound NADPH, acetolactate and magnesium ions was built based on the crystal structure of P. aeruginosa PAO1-KARI (PDB ID 1NP3, Ahn H. J. et al., J. Mol. Biol. 328, 505-515, 2003), which has 92% amino acid sequence homology to PF5 KARI. The PAO1-KARI structure is a homo-dodecamer, and each dodecamer consists of six homo-dimers with an extensive dimer interface. The active site of KARI is located in this dimer interface. The biological assembly is formed by six homo-dimers positioned on the edges of a tetrahedron, resulting in a highly symmetrical dodecamer of 23 point group symmetry. For simplicity, only the dimeric unit (monomer A and monomer B) was built for the homology model of PF5-KARI in this study because the active site is in the homo-dimer interface.

[0117] The model of the PF5-KARI dimer was built based on the coordinates of monomer A and monomer B of PAO1-KARI and the sequence of PF5-KARI using DeepView/Swiss PDB viewer (Guex, N. and Peitsch, M. C., Electrophoresis 18: 2714-2723, 1997). This model was then imported to program O (Jones, T. A. et al., Acta Crystallogr. A 47, 110-119, 1991) on a Silicon Graphics system for further modification.

[0118] The structure of PAO1-KARI has no NADPH, substrate or inhibitor or magnesium in the active site. Therefore, the spinach KARI structure (PDB ID 1yve, Biou V. et al., The EMBO Journal, 16: 3405-3415, 1997), which has magnesium ions, NADPH and inhibitor (N-Hydroxy-N-isopropyloxamate) in the acetolactate binding site, was used to model these molecules in the active site. The plant KARI has very little sequence homology to either PF5- or PAO1-KARI (<20% amino acid identity); however, the structures in the active site region of these two KARI enzymes are very similar. To overlay the active site of these two KARI structures, commands LSQ_ext, LSQ_improve, LSQ_mol in the program O were used to line up the active site of monomer A of spinach KARI to the monomer A of the PF5 KARI model. The coordinates of NADPH, two magnesium ions and the inhibitor bound in the active site of spinach KARI were extracted and incorporated to molecule A of PF5 KARI. A set of the coordinates of these molecules was generated for monomer B of PF5 KARI by applying the transformation operator from monomer A to monomer B calculated by the program.

[0119] Because there is no NADPH in the active site of the PAO1 KARI crystal structure, the structures of the phosphate binding loop region in the NADPH binding site (residues 44-45 in PAO1 KARI, 157-170 in spinach KARI) are very different between the two. To model the NADPH bound form, the model of the PF5-KARI phosphate binding loop (44-55) was replaced by that of 1yve (157-170). Any discrepancy of side chains between these two was converted to those in the PF5-KARI sequence using the mutate_replace command in program O, and the conformations of the replaced side-chains were manually adjusted. The entire NADPH/Mg/inhibitor bound dimeric PF5-KARI model went through one round of energy minimization using program CNX (ACCELRYS, San Diego, Calif.; Brunger, A. T. and Warren, G. L., Acta Crystallogr., D 54, 905-921, 1998), after which the inhibitor was replaced by the substrate, acetolactate (AL), in the model. The conformation of AL was manually adjusted to favor hydride transfer of C4 of the nicotinamide of NADPH and the substrate. No further energy minimization was performed on this model (coordinates of the model created for this study are attached in a separate Word file). The residues in the phosphate binding loop and their interactions with NADPH are illustrated in FIG. 3.

Application of a Profile Hidden Markov Model for Identification of Residue Positions Involved in Cofactor Switching in KARI Enzymes

[0120] Applicants have developed a method for identifying KARI enzymes and the residue positions that are involved in cofactor switching from NADPH to NADH. To structurally characterize KARI enzymes, a Profile Hidden Markov Model (HMM) was prepared as described in Example 5 using amino acid sequences of 25 KARI proteins with experimentally verified function as outlined in Table 6. These KARIs were from Pseudomonas fluorescens Pf-5 (SEQ ID NO: 17), Sulfolobus solfataricus P2 (SEQ ID NO: 13), Pyrobaculum aerophilum str. IM2 (SEQ ID NO: 14), Natronomonas pharaonis DSM 2160 (SEQ ID NO: 30), Bacillus subtilis subsp. subtilis str. 168 (SEQ ID NO: 31), Corynebacterium glutamicum ATCC 13032 (SEQ ID NO: 32), Phaeospirillum molischianum (SEQ ID NO: 33), Ralstonia solanacearum GMI1000 (SEQ ID NO: 15), Zymomonas mobilis subsp. mobilis ZM4 (SEQ ID NO: 34), Alkalilimnicola ehrlichei MLHE-1 (SEQ ID NO: 35), Campylobacter lari RM2100 (SEQ ID NO: 36), Marinobacter aquaeolei VT8 (SEQ ID NO: 37), Psychrobacter arcticus 273-4 (SEQ ID NO: 38), Hahella chejuensis KCTC 2396 (SEQ ID NO: 39), Thiobacillus denitrificans ATCC 25259 (SEQ ID NO: 40), Azotobacter vinelandii AvOP (SEQ ID NO: 41), Pseudomonas syringae pv. syringae B728a (SEQ ID NO: 42), Pseudomonas syringae pv. tomato str. DC3000 (SEQ ID NO: 43), Pseudomonas putida KT2440 (Protein SEQ ID NO: 44), Pseudomonas entomophila L48 (SEQ ID NO: 45), Pseudomonas mendocina ymp (SEQ ID NO: 46), Pseudomonas aeruginosa PAO1 (SEQ ID NO: 16), Bacillus cereus ATCC 10987 (SEQ ID NO: 47), Bacillus cereus ATCC 10987 (SEQ ID NO: 48), and Spinacia oleracea (SEQ ID NO: 18).

[0121] In addition, using methods disclosed in this application, sequences of Class II KARI enzymes such as E. coli (SEQ ID NO: 63, GenBank Accession Number P05793), marine gamma Proteobacterium HTCC2207 (SEQ ID NO: 64, GenBank Accession Number ZP_01224863.1), Desulfuromonas acetoxidans (SEQ ID NO: 65, GenBank Accession Number ZP_01313517.1) and Pisum sativum (pea) (SEQ ID NO: 66, GenBank Accession Number O82043) could be mentioned.

[0122] This Profile HMM for KARIs can be used to identify any KARI related proteins. Any protein that matches the Profile HMM with an E value of <10 using the hmmsearch program in the HMMER package is expected to be a functional KARI, which can be either a Class I or a Class II KARI. Sequences matching the Profile HMM given herein are then analyzed for the location of the 12 positions in Pseudomonas fluorescens Pf-5 that switch the cofactor from NADPH to NADH. The eleven nodes, as defined in the section on Profile HMM building, in the profile HMM representing the columns in the alignment which correspond to the eleven cofactor switching positions in Pseudomonas fluorescens Pf-5 KARI are identified as nodes 24, 33, 47, 50, 52, 53, 61, 80, 115, 156 and 170. The lines corresponding to these nodes in the model

file are identified in Table 9. One skilled in the art will readily be able to identify these 12 positions in the amino acid sequence of a KARI protein from the alignment of the sequence to the profile HMM using the hmmsearch program in the HMMER package.

[0123] The KARI enzymes identified by this method include both Class I and Class II KARI enzymes from either microbial or plant natural sources. Any KARI identified by this method may be used for heterologous expression in microbial cells.

[0124] For example, each of the KARI encoding nucleic acid fragments described herein may be used to isolate genes encoding homologous proteins. Isolation of homologous genes using sequence-dependent protocols is well known in the art. Examples of sequence-dependent protocols include, but are not limited to: 1) methods of nucleic acid hybridization; 2) methods of DNA and RNA amplification, as exemplified by various uses of nucleic acid amplification technologies, e.g., polymerase chain reaction (PCR), Mullis et al., U.S. Pat. No. 4,683,202; ligase chain reaction (LCR), Tabor, S. et al., Proc. Natl. Acad. Sci. USA 82:1074 (1985); or strand displacement amplification (SDA), Walker, et al., Proc. Natl. Acad. Sci. U.S.A., 89:392 (1992); and 3) methods of library construction and screening by complementation.

[0125] Although the sequence homology between Class I and Class II KARI enzymes is low, the three dimensional structure of both Classes of the enzymes, particularly around the active site and nucleotide binding domains, is highly conserved (Tyagi, R., et al., Protein Science, 34:399-408, 2001). The key amino acid residues that make up the substrate binding pocket are highly conserved between these two Classes even though they may not align well in a simple sequence comparison. It can therefore be concluded that the residues affecting cofactor specificity identified in Class I KARI (e.g., positions 24, 33, 47, 50, 52, 53, 61, 80, 115, 156, and 170 of PF5 KARI) can be extended to Class II KARI enzymes.

Isobutanol Biosynthetic Pathways

[0126] Carbohydrate utilizing microorganisms employ the Embden-Meyerhof-Parnas (EMP) pathway, the Entner and Doudoroff pathway and the pentose phosphate cycle as the central, metabolic routes to provide energy and cellular precursors for growth and maintenance. These pathways have in common the intermediate glyceraldehyde-3-phosphate and, ultimately, pyruvate is formed directly or in combination with the EMP pathway. Subsequently, pyruvate is transformed to acetyl-coenzyme A (acetyl-CoA) via a variety of means. Acetyl-CoA serves as a key intermediate, for example, in generating fatty acids, amino acids and secondary metabolites. The combined reactions of sugar conversion to pyruvate produce energy (e.g. adenosine-5'-triphosphate, ATP) and reducing equivalents (e.g. reduced nicotinamide adenine dinucleotide, NADH, and reduced nicotinamide adenine dinucleotide phosphate, NADPH). NADH and NADPH must be recycled to their oxidized forms (NAD⁺ and NADP⁺, respectively). In the presence of inorganic electron acceptors (e.g. O₂, NO₃⁻ and SO₄²⁻), the reducing equivalents may be used to augment the energy pool; alternatively, a reduced carbon byproduct may be formed.

[0127] There are four potential pathways for production of isobutanol from carbohydrate sources with recombinant microorganisms as shown in FIG. 1. All potential pathways for conversion of carbohydrates to isobutanol have been described in the commonly owned U.S. patent application Ser. No. 11/586,315, which is incorporated herein by reference.

[0128] The preferred pathway for conversion of pyruvate to isobutanol consists of enzymatic steps “a”, “b”, “c”, “d”, and “e” (FIG. 1) and includes the following substrate to product conversions:

[0129] a) pyruvate to acetolactate, as catalyzed for example by acetolactate synthase,

[0130] b) (S)-acetolactate to 2,3-dihydroxyisovalerate, as catalyzed for example by acetohydroxy acid isomeroreductase,

[0131] c) 2,3-dihydroxyisovalerate to α-ketoisovalerate, as catalyzed for example by acetohydroxy acid dehydratase,

[0132] d) α-ketoisovalerate to isobutyraldehyde, as catalyzed for example by a branched-chain keto acid decarboxylase, and

[0133] e) isobutyraldehyde to isobutanol, as catalyzed for example by a branched-chain alcohol dehydrogenase.

[0134] This pathway combines enzymes involved in well characterized pathways for valine biosynthesis (pyruvate to α-ketoisovalerate) and valine catabolism (α-ketoisovalerate to isobutanol). Since many valine biosynthetic enzymes also catalyze analogous reactions in the isoleucine biosynthetic pathway, substrate specificity is a major consideration in selecting the gene sources. For this reason, the primary genes of interest for the acetolactate synthase enzyme are those from Bacillus (alsS) and Klebsiella (budB). These particular acetolactate synthases are known to participate in butanediol fermentation in these organisms and show increased affinity for pyruvate over ketobutyrate (Gollop et al., J. Bacteriol. 172, 3444-3449, 1990); and (Holtzclaw et al., J. Bacteriol. 121, 917-922, 1975). The second and third pathway steps are catalyzed by acetohydroxy acid reductoisomerase and dehydratase, respectively. These enzymes have been characterized from a number of sources, such as, for example, E. coli (Chunduru et al., Biochemistry 28, 486-493, 1989); and (Flint et al., J. Biol. Chem. 268, 14732-14742, 1993). The final two steps of the preferred isobutanol pathway are known to occur in yeast, which can use valine as a nitrogen source and, in the process, secrete isobutanol. α-Ketoisovalerate can be converted to isobutyraldehyde by a number of keto acid decarboxylase enzymes, such as, for example, pyruvate decarboxylase. To prevent misdirection of pyruvate away from isobutanol production, a decarboxylase with decreased affinity for pyruvate is desired. So far, there are two such enzymes known in the art (Smit et al., Appl. Environ. Microbiol., 71, 303-311, 2005); and (de la Plaza et al., FEMS Microbiol. Lett., 238, 367-374, 2004). Both enzymes are from strains of Lactococcus lactis and have a 50-200-fold preference for ketoisovalerate over pyruvate. Finally, a number of aldehyde reductases have been identified in yeast, many with overlapping substrate specificity. Those known to prefer branched-chain substrates over acetaldehyde include, but are not limited to, alcohol dehydrogenase VI (ADH6) and Ypr1p (Larroy et al., Biochem. J. 361, 163-172, 2002); and (Ford et al., Yeast 19, 1087-1096, 2002), both of which use NADPH as electron donor. An NADPH-dependent reductase, YqhD, active with branched-chain substrates has also been recently identified in E. coli (Sulzenbacher et al., J. Mol. Biol. 342, 489-502, 2004).

[0135] Two of the other potential pathways for isobutanol production also contain the initial three steps of “a”, “b” and

“c” (FIG. 1). One pathway consists of enzymatic steps “a”, “b”, “c”, “f”, “g”, “e” (FIG. 1). Step “f” uses a “branched-chain keto acid dehydrogenase” with EC number 1.2.4.4, and step “g” uses an “acylating aldehyde dehydrogenase” with EC numbers 1.2.1.10 and 1.2.1.57, in addition to step “e”, which uses the “branched chain alcohol dehydrogenase”. The other potential pathway consists of steps “a”, “b”, “c”, “h”, “i”, “j”, “e” (FIG. 1). Step “h” uses either a “transaminase” with EC numbers 2.6.1.42 and 2.6.1.66 or a “valine dehydrogenase” with EC numbers 1.4.1.8 and 1.4.1.9; step “i” uses a “valine decarboxylase” with EC number 4.1.1.14. Finally, step “j” will use an “omega transaminase” with EC number 2.6.1.18 to generate isobutyraldehyde, which will be reduced by step “e” to produce isobutanol. All potential pathways for conversion of pyruvate to isobutanol are depicted in FIG. 1.

[0136] Additionally, a number of organisms are known to produce butyrate and/or butanol via a butyryl-CoA intermediate (Dürre, et al., FEMS Microbiol. Rev. 17, 251-262, 1995); and (Abbad-Andaloussi et al., Microbiology 142, 1149-1158, 1996). Therefore isobutanol production in these organisms will take place using steps “k”, “g” and “e” shown in FIG. 1. Step “k” will use an “isobutyryl-CoA mutase” with EC number 5.4.99.13. The next step will involve using the “acylating aldehyde dehydrogenase” with EC numbers 1.2.1.10 and 1.2.1.57 to produce isobutyraldehyde, followed by enzymatic step “e” to produce isobutanol. All these pathways are fully described in the commonly owned patent application CL3243, herein incorporated by reference.

[0137] Thus, in providing multiple recombinant pathways from pyruvate to isobutanol, there exist a number of choices to fulfill the individual conversion steps, and the person of skill in the art will be able to utilize publicly available sequences to construct the relevant pathways.

Microbial Hosts for Isobutanol Production

[0138] Microbial hosts for isobutanol production may be selected from bacteria, cyanobacteria, filamentous fungi and yeasts. The microbial host used for isobutanol production should be tolerant to isobutanol so that the yield is not limited by butanol toxicity. Microbes that are metabolically active at high titer levels of isobutanol are not well known in the art. Although butanol-tolerant mutants have been isolated from solventogenic Clostridia, little information is available concerning the butanol tolerance of other potentially useful bacterial strains. Most of the studies on the comparison of alcohol tolerance in bacteria suggest that butanol is more toxic than ethanol (de Cavalho, et al., Microsc. Res. Tech. 64, 215-22, 2004) and (Kabelitz, et al., FEMS Microbiol. Lett. 220, 223-227, 2003). Tomas, et al. (J. Bacteriol. 186, 2006-2018, 2004) report that the yield of 1-butanol during fermentation in Clostridium acetobutylicum may be limited by 1-butanol toxicity. The primary effect of 1-butanol on Clostridium acetobutylicum is disruption of membrane functions (Hermann et al., Appl. Environ. Microbiol. 50, 1238-1243, 1985).

[0139] The microbial hosts selected for the production of isobutanol should be tolerant to isobutanol and should be able to convert carbohydrates to isobutanol. The criteria for selection of suitable microbial hosts include the following: intrinsic tolerance to isobutanol, high rate of glucose utilization, availability of genetic tools for gene manipulation, and the ability to generate stable chromosomal alterations.

[0140] Suitable host strains with a tolerance for isobutanol may be identified by screening based on the intrinsic tolerance of the strain. The intrinsic tolerance of microbes to isobutanol may be measured by determining the concentration of isobutanol that is responsible for 50% inhibition of the growth rate (IC₅₀) when grown in a minimal medium. The IC₅₀ values may be determined using methods known in the art. For example, the microbes of interest may be grown in the presence of various amounts of isobutanol and the growth rate monitored by measuring the optical density at 600 nanometers. The doubling time may be calculated from the logarithmic part of the growth curve and used as a measure of the growth rate. The concentration of isobutanol that produces 50% inhibition of growth may be determined from a graph of the percent inhibition of growth versus the isobutanol concentration. Preferably, the host strain should have an IC₅₀ for isobutanol of greater than about 0.5%.

[0141] The microbial host for isobutanol production should also utilize glucose at a high rate. Most microbes are capable of utilizing carbohydrates. However, certain environmental microbes cannot utilize carbohydrates to high efficiency, and therefore would not be suitable hosts.

[0142] The ability to genetically modify the host is essential for the production of any recombinant microorganism. The mode of gene transfer technology may be by electroporation, conjugation, transduction or natural transformation. A broad range of host conjugative plasmids and drug resistance markers are available. The cloning vectors are tailored to the host organisms based on the nature of antibiotic resistance markers that can function in that host.

[0143] The microbial host also has to be manipulated in order to inactivate competing pathways for carbon flow by deleting various genes. This requires the availability of either transposons to direct inactivation or chromosomal integration vectors. Additionally, the production host should be amenable to chemical mutagenesis so that mutations to improve intrinsic isobutanol tolerance may be obtained.

[0144] Based on the criteria described above, suitable microbial hosts for the production of isobutanol include, but are not limited to, members of the genera Clostridium, Zymomonas, Escherichia, Salmonella, Rhodococcus, Pseudomonas, Bacillus, Vibrio, Lactobacillus, Enterococcus, Alcaligenes, Klebsiella, Paenibacillus, Arthrobacter, Corynebacterium, Brevibacterium, Pichia, Candida, Hansenula and Saccharomyces. Preferred hosts include: Escherichia coli, Alcaligenes eutrophus, Bacillus licheniformis, Paenibacillus macerans, Rhodococcus erythropolis, Pseudomonas putida, Lactobacillus plantarum, Enterococcus faecium, Enterococcus gallinarum, Enterococcus faecalis, Bacillus subtilis and Saccharomyces cerevisiae.

Construction of Production Host

[0145] Recombinant organisms containing the necessary genes that will encode the enzymatic pathway for the conversion of a fermentable carbon substrate to isobutanol may be constructed using techniques well known in the art. In the present invention, genes encoding the enzymes of one of the isobutanol biosynthetic pathways of the invention, for example, acetolactate synthase, acetohydroxy acid isomeroreductase, acetohydroxy acid dehydratase, branched-chain α-keto acid decarboxylase, and branched-chain alcohol dehydrogenase, may be isolated from various sources, as described above.

[0146] Methods of obtaining desired genes from a bacterial genome are common and well known in the art of molecular biology. For example, if the sequence of the gene is known,

suitable genomic libraries may be created by restriction endonuclease digestion and may be screened with probes complementary to the desired gene sequence. Once the sequence is isolated, the DNA may be amplified using standard primer directed amplification methods such as polymerase chain reaction (U.S. Pat. No. 4,683,202) to obtain amounts of DNA suitable for transformation using appropriate vectors. Tools for codon optimization for expression in a heterologous host are readily available. Some tools for codon optimization are available based on the GC content of the host organism.

[0147] Once the relevant pathway genes are identified and isolated they may be transformed into suitable expression hosts by means well known in the art. Vectors or cassettes useful for the transformation of a variety of host cells are common and commercially available from companies such as EPICENTRE® (Madison, Wis.), Invitrogen Corp. (Carlsbad, Calif.), Stratagene (La Jolla, Calif.), and New England Biolabs, Inc. (Beverly, Mass.). Typically the vector or cassette contains sequences directing transcription and translation of the relevant gene, a selectable marker, and sequences allowing autonomous replication or chromosomal integration. Suitable vectors comprise a region 5' of the gene which harbors transcriptional initiation controls and a region 3' of the DNA fragment which controls transcriptional termination. Both control regions may be derived from genes homologous to the transformed host cell, although it is to be understood that such control regions may also be derived from genes that are not native to the specific species chosen as a production host.

[0148] Initiation control regions or promoters, which are useful to drive expression of the relevant pathway coding regions in the desired host cell, are numerous and familiar to those skilled in the art. Virtually any promoter capable of driving these genetic elements is suitable for the present invention including, but not limited to, CYC1, HIS3, GAL1, GAL10, ADH1, PGK, PHO5, GAPDH, ADC1, TRP1, URA3, LEU2, ENO, TPI (useful for expression in Saccharomyces); AOX1 (useful for expression in Pichia); and lac, ara, tet, trp, lPL, lPR, T7, tac, and trc (useful for expression in Escherichia coli, Alcaligenes, and Pseudomonas) as well as the amy, apr, npr promoters and various phage promoters useful for expression in Bacillus subtilis, Bacillus licheniformis, and Paenibacillus macerans.

[0149] Termination control regions may also be derived from various genes native to the preferred hosts. Optionally, a termination site may be unnecessary; however, it is most preferred if included.

[0150] Certain vectors are capable of replicating in a broad range of host bacteria and can be transferred by conjugation. The complete and annotated sequence of pRK404 and three related vectors (pRK437, pRK442, and pRK442(H)) are available. These derivatives have proven to be valuable tools for genetic manipulation in Gram-negative bacteria (Scott et al., Plasmid 50, 74-79, 2003). Several plasmid derivatives of broad-host-range IncP4 plasmid RSF1010 are also available with promoters that can function in a range of Gram-negative bacteria. Plasmids pAYC36 and pAYC37 have active promoters along with multiple cloning sites to allow for heterologous gene expression in Gram-negative bacteria.

[0151] Chromosomal gene replacement tools are also widely available. For example, a thermosensitive variant of the broad-host-range replicon pWV101 has been modified to construct a plasmid pVE6002 which can be used to effect gene replacement in a range of Gram-positive bacteria (Maguin et al., J. Bacteriol. 174, 5633-5638, 1992). Additionally, in vitro transposomes are available to create random mutations in a variety of genomes from commercial sources such as EPICENTRE®.

[0152] The expression of an isobutanol biosynthetic pathway in various preferred microbial hosts is described in more detail below.

Expression of an Isobutanol Biosynthetic Pathway in E. coli

[0153] Vectors or cassettes useful for the transformation of E. coli are common and commercially available from the companies listed above. For example, the genes of an isobutanol biosynthetic pathway may be isolated from various sources, cloned into a modified pUC19 vector and transformed into E. coli NM522.

Expression of an Isobutanol Biosynthetic Pathway in Rhodococcus erythropolis

[0154] A series of E. coli-Rhodococcus shuttle vectors are available for expression in R. erythropolis, including, but not limited to, pRhBR17 and pDA71 (Kostichka et al., Appl. Microbiol. Biotechnol. 62, 61-68, 2003). Additionally, a series of promoters are available for heterologous gene expression in R. erythropolis (Nakashima et al., Appl. Environ. Microbiol. 70, 5557-5568, 2004 and Tao et al., Appl. Microbiol. Biotechnol. 68, 346-354, 2005). Targeted gene disruption of chromosomal genes in R. erythropolis may be created using the method described by Tao et al., supra, and Brans et al. (Appl. Environ. Microbiol. 66, 2029-2036, 2000).

[0155] The heterologous genes required for the production of isobutanol, as described above, may be cloned initially in pDA71 or pRhBR71 and transformed into E. coli. The vectors may then be transformed into R. erythropolis by electroporation, as described by Kostichka et al., supra. The recombinants may be grown in synthetic medium containing glucose and the production of isobutanol can be followed using methods known in the art.

Expression of an Isobutanol Biosynthetic Pathway in B. subtilis

[0156] Methods for gene expression and creation of mutations in B. subtilis are also well known in the art. For example, the genes of an isobutanol biosynthetic pathway may be isolated from various sources, cloned into a modified pUC19 vector and transformed into Bacillus subtilis BE1010. Additionally, the five genes of an isobutanol biosynthetic pathway can be split into two operons for expression. The three genes of the pathway (budB, ilvD, and kivD) can be integrated into the chromosome of Bacillus subtilis BE1010 (Payne, et al., J. Bacteriol. 173, 2278-2282, 1991). The remaining two genes (ilvC and bdhB) can be cloned into an expression vector and transformed into the Bacillus strain carrying the integrated isobutanol genes.

Expression of an Isobutanol Biosynthetic Pathway in B. licheniformis

[0157] Most of the plasmids and shuttle vectors that replicate in B. subtilis may be used to transform B. licheniformis by either protoplast transformation or electroporation. The genes required for the production of isobutanol may be cloned in plasmids pBE20 or pBE60 derivatives (Nagarajan et al., Gene 114, 121-126, 1992). Methods to transform B. licheniformis are known in the art (Fleming et al. Appl. Environ.

Microbiol. 61, 3775-3780, 1995). The plasmids constructed for expression in B. subtilis may be transformed into B. licheniformis to produce a recombinant microbial host that produces isobutanol.

Expression of an Isobutanol Biosynthetic Pathway in Paenibacillus macerans

[0158] Plasmids may be constructed as described above for expression in B. subtilis and used to transform Paenibacillus macerans by protoplast transformation to produce a recombinant microbial host that produces isobutanol.

Expression of the Isobutanol Biosynthetic Pathway in Alcaligenes (Ralstonia) eutrophus

[0159] Methods for gene expression and creation of mutations in Alcaligenes eutrophus are known in the art (Taghavi et al., Appl. Environ. Microbiol., 60, 3585-3591, 1994). The genes for an isobutanol biosynthetic pathway may be cloned in any of the broad host range vectors described above, and electroporated to generate recombinants that produce isobutanol. The poly(hydroxybutyrate) pathway in Alcaligenes has been described in detail, a variety of genetic techniques to modify the Alcaligenes eutrophus genome is known, and those tools can be applied for engineering an isobutanol biosynthetic pathway.

Expression of an Isobutanol Biosynthetic Pathway in Pseudomonas putida

[0160] Methods for gene expression in Pseudomonas putida are known in the art (see for example Ben-Bassat et al., U.S. Pat. No. 6,586,229, which is incorporated herein by reference). The butanol pathway genes may be inserted into pPCU18 and this ligated DNA may be electroporated into electrocompetent Pseudomonas putida DOT-T1C5aAR1 cells to generate recombinants that produce isobutanol.

Expression of an Isobutanol Biosynthetic Pathway in Saccharomyces cerevisiae

[0161] Methods for gene expression in Saccharomyces cerevisiae are known in the art (e.g., Methods in Enzymology, Volume 194, Guide to Yeast Genetics and Molecular and Cell Biology, Part A, 2004, Christine Guthrie and Gerald R. Fink, eds., Elsevier Academic Press, San Diego, Calif.). Expression of genes in yeast typically requires a promoter, followed by the gene of interest, and a transcriptional terminator. A number of yeast promoters can be used in constructing expression cassettes for genes encoding an isobutanol biosynthetic pathway, including, but not limited to, constitutive promoters FBA, GPD, ADH1, and GPM, and the inducible promoters GAL1, GAL10, and CUP1. Suitable transcriptional terminators include, but are not limited to, FBAt, GPDt, GPMt, ERG10t, GAL1t, CYC1, and ADH1. For example, suitable promoters, transcriptional terminators, and the genes of an isobutanol biosynthetic pathway may be cloned into E. coli-yeast shuttle vectors.

Expression of an Isobutanol Biosynthetic Pathway in Lactobacillus plantarum

[0162] The Lactobacillus genus belongs to the Lactobacillus family and many plasmids and vectors used in the transformation of Bacillus subtilis and Streptococcus may be used for Lactobacillus. Non-limiting examples of suitable vectors include pAMβ1 and derivatives thereof (Renault et al., Gene 183, 175-182, 1996); and (O'Sullivan et al., Gene 137, 227-231, 1993); pMBB1 and pHW800, a derivative of pMBB1 (Wyckoff et al., Appl. Environ. Microbiol. 62, 1481-1486, 1996); pMG1, a conjugative plasmid (Tanimoto et al., J. Bacteriol. 184, 5800-5804, 2002); pNZ9520 (Kleerebezem et al., Appl. Environ. Microbiol. 63, 4581-4584, 1997); pAM401 (Fujimoto et al., Appl. Environ. Microbiol. 67, 1262-1267, 2001); and pAT392 (Arthur et al., Antimicrob. Agents Chemother. 38, 1899-1903, 1994). Several plasmids from Lactobacillus plantarum have also been reported (van Kranenburg R, et al. Appl. Environ. Microbiol. 71, 1223-1230, 2005).

Expression of an Isobutanol Biosynthetic Pathway in Various Enterococcus species (E. faecium, E. gallinarum, and E. faecalis)

[0163] The Enterococcus genus belongs to the Lactobacillus family and many plasmids and vectors used in the transformation of Lactobacilli, Bacilli and Streptococci species may be used for Enterococcus species. Non-limiting examples of suitable vectors include pAMβ1 and derivatives thereof (Renault et al., Gene 183, 175-182, 1996); and (O'Sullivan et al., Gene 137, 227-231, 1993); pMBB1 and pHW800, a derivative of pMBB1 (Wyckoff et al. Appl. Environ. Microbiol. 62, 1481-1486, 1996); pMG1, a conjugative plasmid (Tanimoto et al., J. Bacteriol. 184, 5800-5804, 2002); pNZ9520 (Kleerebezem et al., Appl. Environ. Microbiol. 63, 4581-4584, 1997); pAM401 (Fujimoto et al., Appl. Environ. Microbiol. 67, 1262-1267, 2001); and pAT392 (Arthur et al., Antimicrob. Agents Chemother. 38, 1899-1903, 1994). Expression vectors for E. faecalis using the nisA gene from Lactococcus may also be used (Eichenbaum et al., Appl. Environ. Microbiol. 64, 2763-2769, 1998). Additionally, vectors for gene replacement in the E. faecium chromosome may be used (Nallaapareddy et al., Appl. Environ. Microbiol. 72, 334-345, 2006).

Fermentation Media

[0164] Fermentation media in the present invention must contain suitable carbon substrates. Suitable substrates may include but are not limited to monosaccharides such as glucose and fructose, oligosaccharides such as lactose or sucrose, polysaccharides such as starch or cellulose or mixtures thereof and unpurified mixtures from renewable feedstocks such as cheese whey permeate, cornsteep liquor, sugar beet molasses, and barley malt. Additionally the carbon substrate may also be one-carbon substrates such as carbon dioxide, or methanol for which metabolic conversion into key biochemical intermediates has been demonstrated. In addition to one and two carbon substrates, methylotrophic organisms are also known to utilize a number of other carbon containing compounds such as methylamine, glucosamine and a variety of amino acids for metabolic activity. For example, methylotrophic yeast are known to utilize the carbon from methylamine to form trehalose or glycerol (Bellion et al., Microb. Growth C1 Compd., Int. Symp., 7th (1993), 415-32. (eds): Murrell, J. Collin; Kelly, Don P. Publisher: Intercept, Andover, UK). Similarly, various species of Candida will metabolize alanine or oleic acid (Sulter et al., Arch. Microbiol. 153, 485-489, 1990). Hence it is contemplated that the source of carbon utilized in the present invention may encompass a wide variety of carbon containing substrates and will only be limited by the choice of organism.

[0165] Although it is contemplated that all of the above mentioned carbon substrates and mixtures thereof are suitable in the present invention, preferred carbon substrates are glucose, fructose, and sucrose.

[0166] In addition to an appropriate carbon source, fermentation media must contain suitable minerals, salts, cofactors, buffers and other components, known to those skilled in the art, suitable for the growth of the cultures and promotion of the enzymatic pathway necessary for isobutanol production.

Culture Conditions

[0167] Typically cells are grown at a temperature in the range of about 25° C. to about 40° C. in an appropriate medium. Suitable growth media in the present invention are common commercially prepared media such as Luria Bertani (LB) broth, Sabouraud Dextrose (SD) broth or Yeast Medium (YM) broth. Other defined or synthetic growth media may also be used, and the appropriate medium for growth of the particular microorganism will be known by one skilled in the art of microbiology or fermentation science. The use of agents known to modulate catabolite repression directly or indirectly, e.g., cyclic adenosine 2',3'-monophosphate (cAMP), may also be incorporated into the fermentation medium.

[0168] Suitable pH ranges for the fermentation are between pH 5.0 to pH 9.0, where pH 6.0 to pH 8.0 is preferred for the initial condition.

[0169] Fermentations may be performed under aerobic or anaerobic conditions, where anaerobic or microaerobic conditions are preferred.

Industrial Batch and Continuous Fermentations

[0170] The present process employs a batch method of fermentation. A classical batch fermentation is a closed system where the composition of the medium is set at the beginning of the fermentation and not subject to artificial alterations during the fermentation. Thus, at the beginning of the fermentation the medium is inoculated with the desired organism or organisms, and fermentation is permitted to occur without adding anything to the system. Typically, however, a “batch” fermentation is batch with respect to the addition of carbon source and attempts are often made at controlling factors such as pH and oxygen concentration. In batch systems the metabolite and biomass compositions of the system change constantly up to the time the fermentation is stopped. Within batch cultures cells moderate through a static lag phase to a high growth log phase and finally to a stationary phase where growth rate is diminished or halted. If untreated, cells in the stationary phase will eventually die. Cells in log phase generally are responsible for the bulk of production of end product or intermediate.

[0171] A variation on the standard batch system is the Fed-Batch system. Fed-Batch fermentation processes are also suitable in the present invention and comprise a typical batch system with the exception that the substrate is added in increments as the fermentation progresses. Fed-Batch systems are useful when catabolite repression is apt to inhibit the metabolism of the cells and where it is desirable to have limited amounts of substrate in the media. Measurement of the actual substrate concentration in Fed-Batch systems is difficult and is therefore estimated on the basis of the changes of measurable factors such as pH, dissolved oxygen and the partial pressure of waste gases such as CO₂. Batch and Fed-Batch fermentations are common and well known in the art and examples may be found in Thomas D. Brock in Biotechnology: A Textbook of Industrial Microbiology, Second Edition (1989) Sinauer Associates, Inc., Sunderland, Mass., or Deshpande, Mukund (Appl. Biochem. Biotechnol., 36, 227, 1992), herein incorporated by reference.

[0172] Although the present invention is performed in batch mode it is contemplated that the method would be adaptable to continuous fermentation methods. Continuous fermentation is an open system where a defined fermentation medium is added continuously to a bioreactor and an equal amount of conditioned media is removed simultaneously for processing. Continuous fermentation generally maintains the cultures at a constant high density where cells are primarily in log phase growth.

[0173] Continuous fermentation allows for modulation of one factor or any number of factors that affect cell growth or end product concentration. For example, one method will maintain a limiting nutrient such as the carbon source or nitrogen level at a fixed rate and allow all other parameters to moderate. In other systems a number of factors affecting growth can be altered continuously while the cell concentration, measured by media turbidity, is kept constant. Continuous systems strive to maintain steady state growth conditions and thus the cell loss due to the medium being drawn off must be balanced against the cell growth rate in the fermentation. Methods of modulating nutrients and growth factors for continuous fermentation processes as well as techniques for maximizing the rate of product formation are well known in the art of industrial microbiology and a variety of methods are detailed by Brock, supra.

[0174] It is contemplated that the present invention may be practiced using either batch, fed-batch or continuous processes and that any known mode of fermentation would be suitable. Additionally, it is contemplated that cells may be immobilized on a substrate as whole cell catalysts and subjected to fermentation conditions for isobutanol production.

Methods for Isobutanol Isolation from the Fermentation Medium

[0175] The biologically produced isobutanol may be isolated from the fermentation medium using methods known in the art for Acetone-butanol-ethanol (ABE) fermentations (see for example, Dürre, Appl. Microbiol. Biotechnol. 49, 639-648, 1998), and (Groot et al., Process. Biochem. 27, 61-75, 1992 and references therein). For example, solids may be removed from the fermentation medium by centrifugation, filtration, decantation and isobutanol may be isolated from the fermentation medium using methods such as distillation, azeotropic distillation, liquid-liquid extraction, adsorption, gas stripping, membrane evaporation, or pervaporation.

EXAMPLES

[0176] The present invention is further defined in the following Examples. It should be understood that these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various uses and conditions.

General Methods:

[0177] Standard recombinant DNA and molecular cloning techniques used in the Examples are well known in the art and

are described by Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989, by T. J. Silhavy, M. L. Berman, and L. W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1984, and by Ausubel, F. M. et al., Current Protocols in Molecular Biology, Greene Publishing Assoc. and Wiley-Interscience, N.Y., 1987. Materials and Methods suitable for the maintenance and growth of bacterial cultures are also well known in the art. Techniques suitable for use in the following Examples may be found in Manual of Methods for General Bacteriology, Philipp Gerhardt, R. G. E. Murray, Ralph N. Costilow, Eugene W. Nester, Willis A. Wood, Noel R. Krieg and G. Briggs Phillips, eds., American Society for Microbiology, Washington, D.C., 1994, or by Thomas D. Brock in Biotechnology: A Textbook of Industrial Microbiology, Second Edition, Sinauer Associates, Inc., Sunderland, Mass., 1989. All reagents, restriction enzymes and materials used for the growth and maintenance of bacterial cells were obtained from Aldrich Chemicals (Milwaukee, Wis.), BD Diagnostic Systems (Sparks, Md.), Life Technologies (Rockville, Md.), or Sigma Chemical Company (St. Louis, Mo.), unless otherwise specified.

[0178] The meaning of abbreviations used is as follows: “Å” means Angstrom, “min” means minute(s), “h” means hour(s), “µl” means microliter(s), “ng/µl” means nanogram per microliter, “pmol/µl” means picomole per microliter, “ml” means milliliter(s), “L” means liter(s), “g/L” means gram per liter, “ng” means nanogram, “sec” means second(s), “ml/min” means milliliter per minute(s), “w/v” means weight per volume, “v/v” means volume per volume, “nm” means nanometer(s), “mm” means millimeter(s), “cm” means centimeter(s), “mM” means millimolar, “M” means molar, “mmol” means millimole(s), “µmole” means micromole(s), “g” means gram(s), “µg” means microgram(s), “mg” means milligram(s), “g” means the gravitation constant, “rpm” means revolutions per minute, “HPLC” means high performance liquid chromatography, “MS” means mass spectrometry, “HPLC/MS” means high performance liquid chromatography/mass spectrometry, “EDTA” means ethylenediamine-tetraacetic acid, “dNTP” means deoxynucleotide triphosphate.

[0179] The oligonucleotide primers used in the following Examples have been described herein (see Table 1).

High Throughput Screening Assay of Gene Libraries

[0180] High throughput screening of the gene libraries of mutant KARI enzymes was performed as described herein: 10× freezing medium containing 554.4 g/L glycerol, 68 mM (NH4)2SO4, 4 mM MgSO4, 17 mM sodium citrate, 132 mM KH2PO4 and 36 mM K2HPO4 was prepared with molecular pure water and filter-sterilized. Freezing medium was prepared by diluting the 10× freezing medium with the LB medium. An aliquot (200 µl) of the freezing medium was used for each well of the 96-well archive plates (cat #3370, Corning Inc. Corning, N.Y.).

[0181] Clones from the LB agar plates were selected and inoculated into the 96-well archive plates containing the freezing medium and grown overnight at 37° C. without shaking. The archive plates were then stored at -80° C. E. coli strain Bw25113 transformed with pBAD-HisB (Invitrogen) was always used as the negative control. For libraries C, E, F and G, mutant T52D of PF5-ilvC was used as the positive control. The mutant T52D was a mutant of PF5-ilvC in which the threonine at position 52 was changed to aspartic acid. For library H, mutant C3B11 (R47F/S50A/T52D/V53W of PF5 ilvC) was used as the positive control.

[0182] Clones from archive plates were inoculated into the 96-deep well plates. Each well contained 3.0 µl of cells from thawed archive plates and 300 µl of the LB medium containing 100 µg/ml ampicillin and 0.02% (w/v) arabinose as the inducer. Cells were then grown overnight at 37° C. with 80% humidity while shaking (900 rpm) and harvested by centrifugation (4000 rpm, 5 min at 25° C.) (Eppendorf centrifuge, Brinkmann Instruments, Inc. Westbury, N.Y.), and the cell pellet was stored at -20° C. for later analysis.

[0183] The assay substrate, (R,S)-acetolactate, was synthesized as described by Aulabaugh and Schloss (Aulabaugh and Schloss, Biochemistry, 29, 2824-2830, 1990): 1.0 g of 2-acetoxy-2-methyl-3-oxobutyric acid ethyl ester (Aldrich, Milwaukee, Wis.) was mixed with 10 ml NaOH (1.0 M) and stirred at room temperature. When the solution's pH became neutral, additional NaOH was slowly added until pH ~8.0 was maintained. All other chemicals used in the assay were purchased from Sigma.

[0184] The enzymatic conversion of acetolactate to α,β-dihydroxy-isovalerate by KARI was followed by measuring the disappearance of the cofactor, NADPH or NADH, from the reaction at 340 nm using a plate reader (Molecular Device, Sunnyvale, Calif.). The activity was calculated using the molar extinction coefficient of 6220 M⁻¹ cm⁻¹ for either NADPH or NADH. The stock solutions used were: K2HPO4 (0.2 M); KH2PO4 (0.2 M); EDTA (0.5 M); MgCl2 (1.0 M); NADPH (2.0 mM); NADH (2.0 mM) and acetolactate (45 mM). A 100 ml reaction buffer mix stock containing 4.8 ml K2HPO4, 0.2 ml KH2PO4, 4.0 ml MgCl2, 0.1 ml EDTA and 90.9 ml water was prepared.

[0185] Frozen cell pellets in deep-well plates and BugBuster were warmed up at room temperature for 30 min at the same time. Each well of 96-well assay plates was filled with 120 µl of the reaction buffer and 20 µl of NADH (2.0 mM). After the 30 min warm-up, 150 µl of BugBuster was added to each well and the cells were suspended using Genmate (Tecan Systems Inc. San Jose, Calif.) by pipetting the cell suspension up and down (×5). The plates were incubated at room temperature for 20 min and then heated at 60° C. for 10 min. The cell debris and protein precipitates were removed by centrifugation at 4,000 rpm for 5 min at 25° C. An aliquot (50 µl) of the supernatant was transferred into each well of 96-well assay plates, the solution was mixed and the bubbles were removed by centrifugation at 4000 rpm at 25° C. for 1 min. Absorbance at 340 nm was recorded as background. Then 20 µl of acetolactate (4.5 mM, diluted with the reaction buffer) was added to each well and mixed with shaking by the plate reader. Absorbance at 340 nm was recorded at 0 and 60 minutes after substrate addition. The difference in absorbance (before and after substrate addition) was used to determine the activity of the mutants. Mutants with higher KARI activity compared to the wild type were selected for re-screening.

[0186] About 5,000 clones were screened for library C and 360 top performers were selected for re-screening. About 92 clones were screened for library E and 16 top performers were selected for re-screening. About 92 clones were screened for library F and 8 top performers were selected for re-screening. About 92 clones were screened for library G and 20 top performers were selected for re-screening. About 8,000 clones were screened for library H and 62 top performers were selected for re-screening.
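The activity calculation used in the screening assay above can be illustrated with a short Python sketch. It converts a drop in absorbance at 340 nm into the amount of NADH or NADPH oxidized, using the molar extinction coefficient of 6220 M⁻¹ cm⁻¹ quoted in the text. The function names and the default 1 cm path length are assumptions made for illustration only; the effective path length of a 96-well plate depends on the fill volume and is not stated in this document.

```python
# Extinction coefficient quoted in the text for NADH and NADPH at 340 nm.
EXTINCTION_340_PER_M_PER_CM = 6220.0


def cofactor_consumed_molar(a340_start, a340_end, path_length_cm=1.0):
    """Return the cofactor oxidized (mol/L) from the drop in A340.

    Uses the Beer-Lambert law: delta_A = epsilon * path_length * delta_c.
    The 1 cm default path length is an assumption, not a value from the text.
    """
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    delta_a = a340_start - a340_end
    return delta_a / (EXTINCTION_340_PER_M_PER_CM * path_length_cm)


def rate_micromolar_per_min(a340_start, a340_end, minutes, path_length_cm=1.0):
    """Return the average oxidation rate in micromolar per minute."""
    if minutes <= 0:
        raise ValueError("elapsed time must be positive")
    consumed = cofactor_consumed_molar(a340_start, a340_end, path_length_cm)
    return consumed * 1e6 / minutes


if __name__ == "__main__":
    # Hypothetical readings: A340 falls from 0.622 to 0.000 over 10 minutes.
    print(cofactor_consumed_molar(0.622, 0.0))
    print(rate_micromolar_per_min(0.622, 0.0, 10.0))
```

In a screen, the same calculation would be applied per well after subtracting the background reading recorded before substrate addition.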

[0187] For library C, about 360 top performers were re-screened using the same procedure as for the general screening. Among them, 45 top performers were further selected for re-screening as described below.

Secondary Assay of Active Mutants

[0188] Cells containing pBad-ilvC and its mutants identified by high throughput screening were grown overnight, at 37° C., in 3.0 ml of the LB medium containing 100 µg/ml ampicillin and 0.02% (w/v) arabinose as the inducer while shaking at 250 rpm. The cells were then harvested by centrifugation at 18,000×g for 1 min at room temperature (Sigma micro-centrifuge model 1-15, Laurel, Md.). The cell pellets were re-suspended in 300 µl of BugBuster Master Mix (EMD Chemicals). The reaction mixture was first incubated at room temperature for 20 min and then heated at 60° C. for 10 min. The cell debris and protein precipitate were removed by centrifugation at 18,000×g for 5 min at room temperature.

[0189] The reaction buffer (120 µl) prepared as described above was mixed with either NADH or NADPH (20 µl) stock and cell extract (20 µl) in each well of a 96-well assay plate. The absorbance at 340 nm at 25° C. was recorded as background. Then 20 µl of acetolactate (4.5 mM, diluted with reaction buffer) was added to each well and mixed with shaking by the plate reader. The absorbance at 340 nm at 0 min, 2 min and 5 min after adding acetolactate was recorded. The absorbance difference before and after adding substrate was used to determine the activity of the mutants. The mutants with high activity were selected for sequencing.

[0190] Five top performers from “Library C” were identified and sequenced (FIG. 4). The best performer was mutant R47F/S50A/T52D/V53W, which completely reversed the nucleotide specificity. The best performers from “Libraries E, F and G” were R47P, S50D and T52D respectively (FIG. 5). For “Library H”, 5 top performers were identified and sequenced (FIG. 6) and the best performer was R47P/S50G/T52D, which also completely reversed the nucleotide specificity. Enzymes containing activities higher than the background were considered positive.

KARI Enzyme Assay

[0191] KARI enzyme activity can be routinely measured by NADH or NADPH oxidation as described above; however, to measure formation of the 2,3-dihydroxyisovalerate product directly, analysis of the reaction was performed using LC/MS.

[0192] Protein concentration of crude cell extract from BugBuster lysed cells (as described above) was measured using the BioRad protein assay reagent (BioRad Laboratories, Inc., Hercules, Calif. 94547). A total of 0.5 micrograms of crude extract protein was added to a reaction buffer consisting of 100 mM HEPES-KOH, pH 7.5, 10 mM MgCl2, 1 mM glucose-6-phosphate (Sigma-Aldrich), 0.2 Units of Leuconostoc mesenteroides glucose-6-phosphate dehydrogenase (Sigma-Aldrich), and various concentrations of NADH or NADPH, to a volume of 96 µl. The reaction was initiated by the addition of 4 µl of acetolactate to a final concentration of 4 mM and a final volume of 100 µl. After timed incubations at 30° C., typically between 2 and 15 min, the reaction was quenched by the addition of 10 µl of 0.5 M EDTA pH 8.0 (Life Technologies, Grand Island, N.Y. 14072). To measure the K_M of NADH, the concentrations used were 0.03, 0.1, 0.3, 1, 3, and 10 mM.

[0193] To analyze for 2,3-dihydroxyisovalerate, the sample was diluted 10× with water, and 8.0 µl was injected into a Waters Acquity HPLC equipped with a Waters SQD mass spectrometer (Waters Corporation, Milford, Mass.). The chromatography conditions were: flow rate (0.5 ml/min), on a Waters Acquity HSS T3 column (2.1 mm diameter, 100 mm length). Buffer A consisted of 0.1% (v/v) formic acid in water; Buffer B was 0.1% formic acid in acetonitrile. The sample was analyzed using 1% buffer B (in buffer A) for 1 min, followed by a linear gradient from 1% buffer B at 1 min to 75% buffer B at 1.5 min. The reaction product, 2,3-dihydroxyisovalerate, was detected by ionization at m/z = 133, using the electrospray ionization device at -30 V cone voltage. The amount of product 2,3-dihydroxyisovalerate was calculated by comparison to an authentic standard.

[0194] To calculate the K_M for NADH, the rate data for DHIV formation was plotted in Kaleidagraph (Synergy Software, Reading, Pa.) and fitted to the single substrate Michaelis-Menten equation, assuming saturating acetolactate concentration.

Example 1

Construction of Site-Saturation Gene Libraries

[0195] Seven gene libraries were constructed (Table 2) using two steps: 1) synthesis of MegaPrimers using commercially synthesized oligonucleotides described in Table 1; and 2) construction of mutated genes using the MegaPrimers obtained in step 1. These primers were prepared using high fidelity pfu-ultra polymerase (Stratagene, La Jolla, Calif.) for one pair of primers containing one forward and one reverse primer. The templates for libraries C, E, F, G and H were the wild type of PF5 ilvC. The DNA templates for library N were those mutants having detectable NADH activity from library C, while those for library O were those mutants having detectable NADH activity from library H. A 50 µl reaction mixture contained: 5.0 µl of 10× reaction buffer supplied with the pfu-ultra polymerase (Stratagene), 1.0 µl of 50 ng/µl template, 1.0 µl each of 10 pmol/µl forward and reverse primers, 1.0 µl of 40 mM dNTP mix (Promega, Madison, Wis.), 1.0 µl pfu-ultra DNA polymerase (Stratagene) and 39 µl water. The mixture was placed in a thin well 200 µl tube for the PCR reaction in a Mastercycler gradient equipment (Brinkmann Instruments, Inc. Westbury, N.Y.). The following conditions were used for the PCR reaction: the starting temperature was 95° C. for 30 sec followed by 30 heating/cooling cycles. Each cycle consisted of 95° C. for 30 sec, 54° C. for 1 min, and 70° C. for 2 min. At the completion of the temperature cycling, the samples were kept at 70° C. for 4 min more, and then held awaiting sample recovery at 4° C. The PCR product was cleaned up using a DNA cleaning kit (Cat# D4003, Zymo Research, Orange, Calif.) as recommended by the manufacturer.

TABLE 2
GENE LIBRARIES

Library name   Templates   Targeted position(s) of Pf5 ilvC   Primers used
C              PF5 ilvC    47, 50, 52 and 53                  SEQ ID No: 1 and 2
E              PF5 ilvC    47                                 SEQ ID No: 1 and 3
F              PF5 ilvC    50                                 SEQ ID No: 1 and 4
G              PF5 ilvC    52                                 SEQ ID No: 1 and 5

TABLE 2-continued
GENE LIBRARIES

Library name   Templates                     Targeted position(s) of Pf5 ilvC   Primers used
H              PF5 ilvC                      47, 50, and 52                     SEQ ID No: 1 and 6
N              Good mutants from library C   53                                 SEQ ID NO: 20 and 21
O              Good mutants from library H   53                                 SEQ ID NO: 20 and 21

[0196] The MegaPrimers were then used to generate gene libraries using the QuickChange II XL site directed mutagenesis kit (Catalog #200524, Stratagene, La Jolla Calif.). A 50 µl reaction mixture contained: 5.0 µl of 10× reaction buffer, 1.0 µl of 50 ng/µl template, 42 µl Megaprimer, 1.0 µl of 40 mM dNTP mix, 1.0 µl pfu-ultra DNA polymerase. Except for the MegaPrimer and the templates, all reagents used here were supplied with the kit indicated above. This reaction mixture was placed in a thin well 200 µl-capacity PCR tube and the following conditions were used for the PCR: the starting temperature was 95° C. for 30 sec followed by 25 heating/cooling cycles. Each cycle consisted of 95° C. for 30 sec, 55° C. for 1 min, and 68° C. for 6 min. At the completion of the temperature cycling, the samples were kept at 68° C. for 8 min more, and then held at 4° C. for later processing. Dpn I restriction enzyme (1.0 µl) (supplied with the kit above) was directly added to the finished reaction mixture, enzyme digestion was performed at 37° C. for 1 hr and the PCR product was cleaned up using a DNA cleaning kit (Zymo Research). The cleaned PCR product (10 µl) contained mutated genes for a gene library.

[0197] The cleaned PCR product was transformed into an electro-competent strain of E. coli Bw25113 (ΔilvC) using a BioRad Gene Pulser II (Bio-Rad Laboratories Inc., Hercules, Calif.). The transformed clones were streaked on agar plates containing the Luria Broth medium and 100 µg/ml ampicillin (Cat# L1004, Teknova Inc. Hollister, Calif.) and incubated at 37° C. overnight. Dozens of clones were randomly chosen for DNA sequencing to confirm the quality of the library.

TABLE 3
List of some mutants having NADH activities identified from saturation libraries

Mutant    Position 47   Position 50   Position 52   Position 53
6883G10   R47P          S50D          T52S
11463D8   R47P          S50F          T52D
11461A2   R47P          S50F          T52D          V53I

Example 2

Construction of Error Prone PCR Libraries

[0198] Several rounds of error prone PCR (epPCR) libraries were created using the GeneMorph II kit (Stratagene) as recommended by the manufacturer. All the epPCR libraries target the N-terminal of PF5-KARI. The forward primer (SEQ ID No. 20) and the reverse primer (SEQ ID No. 22) were used for all epPCR libraries.

[0199] The DNA templates for each epPCR library were mutants having relatively good NADH activities from the previous library. For example, the DNA templates for the nth epPCR library were mutants having good NADH activities from the (n-1)th epPCR library. The templates of the first epPCR library were mutants having relatively good NADH activities from libraries N and O. The mutation rate of a library made by this kit was controlled by the amount of template added in the reaction mixture and the number of amplification cycles. Typically, 1.0 ng of each DNA template was used in 100 µl of reaction mixture. The number of amplification cycles was 70. The following conditions were used for the PCR reaction: the starting temperature was 95° C. for 30 sec followed by 70 heating/cooling cycles. Each cycle consisted of 95° C. for 30 sec, 55° C. for 30 min, and 70° C. for 2 min. After the first 35 heating/cooling cycles finished, more dNTP and Mutazyme II DNA polymerase were added. The PCR product was cleaned up using a DNA cleaning kit (Cat# D4003, Zymo Research, Orange, Calif.) as recommended by the manufacturer. The cleaned PCR product was treated as megaprimer and introduced into the vector using the QuickChange kit as described in Example 1. Table 4 below lists the KARI mutants obtained and the significant improvement observed in their binding of NADH. The K_M was reduced from 1100 µM for mutant C3B11 to 50 µM for mutant 12957G9.
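The K_M values quoted above come from fitting rate data to the single-substrate Michaelis-Menten equation, v = Vmax·[S]/(K_M + [S]). As a rough illustration only, the Python sketch below estimates K_M and Vmax with a Hanes-Woolf linearization ([S]/v plotted against [S]) and ordinary least squares. This is a stand-in technique chosen to keep the example dependency-free; it is not the fitting procedure of the Kaleidagraph software named in the text, and the function name and the synthetic data are hypothetical.

```python
def fit_michaelis_menten(concs, rates):
    """Estimate (K_M, Vmax) by Hanes-Woolf linear regression.

    Hanes-Woolf form: [S]/v = [S]/Vmax + K_M/Vmax, so a straight-line fit of
    [S]/v against [S] has slope 1/Vmax and intercept K_M/Vmax.
    """
    if len(concs) != len(rates) or len(concs) < 2:
        raise ValueError("need at least two matching (concentration, rate) pairs")
    xs = list(concs)
    ys = [s / v for s, v in zip(concs, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept / slope
    return km, vmax


if __name__ == "__main__":
    # NADH concentrations (mM) listed in the text for K_M measurements.
    nadh_mm = [0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
    # Synthetic, noise-free rates generated with K_M = 1.1 mM and Vmax = 2.0
    # (arbitrary units); these are NOT measured data from the document.
    synthetic_rates = [2.0 * s / (1.1 + s) for s in nadh_mm]
    km, vmax = fit_michaelis_menten(nadh_mm, synthetic_rates)
    print(f"K_M = {km:.3f} mM, Vmax = {vmax:.3f}")
```

With real, noisy measurements a nonlinear least-squares fit of the untransformed equation is generally preferred, because the linearization distorts the error structure.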

TABLE 4
List of some mutants with their measured K_M values

Example 3

Thermostability of PF5-ilvC and its Mutants

[0200] Cells containing mutated pBad-ilvC were grown overnight at 37° C. in 25 ml of the LB medium containing 100 µg/ml ampicillin and 0.02% (w/v) arabinose inducer while shaking at 250 rpm. The cells were then harvested by centrifugation at 18,000×g for 1 min at room temperature and the cell pellets were re-suspended in 300 µl of BugBuster Master Mix (EMD Chemicals). The reaction mixture was first incubated at room temperature for 20 min and aliquots of this cell mixture (e.g. 50 µl) were incubated at different temperatures (from room temperature to 75° C.) for 10 min. The precipitate was removed by centrifugation at 18,000×g for 5 min at room temperature. The remaining activity of the supernatant was analyzed as described above. As shown in FIG. 7, pBad-ilvC was very stable with T50 equal to 68° C. (T50 is the temperature at which 50% of the protein lost its activity after 10 min incubation).

[0201] The thermostability of PF5-ilvC allowed destruction of most of the other non-KARI NADH oxidation activity within these cells, reducing the NADH background consumption and thus facilitating the KARI activity assays. This heat treatment protocol was used in all screening and re-screening assays. The mutants thus obtained were all thermostable, which allowed easier selection of the desirable mutants.

Example 4

Stoichiometric Production of 2,3-Dihydroxyisovalerate by KARI During Consumption of Cofactors NADH or NADPH

[0202] Screening and routine assays of KARI activity rely on the 340 nm absorption decrease associated with oxidation of the pyridine nucleotides NADPH or NADH. To ensure that this metric was coupled to formation of the other reaction product, oxidation of pyridine nucleotide and formation of 2,3-dihydroxyisovalerate were measured in the same samples.

[0203] The oxidation of NADH or NADPH was measured at 340 nm in a 1 cm path length cuvette on an Agilent model 8453 spectrophotometer (Agilent Technologies, Wilmington, Del.). Crude cell extract (0.1 ml) prepared as described above, containing either wild type PF5 KARI or the C3B11 mutant, was added to 0.9 ml of K-phosphate buffer (10 mM, pH 7.6) containing 10 mM MgCl2 and 0.2 mM of either NADPH or NADH. The reaction was initiated by the addition of acetolactate to a final concentration of 0.4 mM. After a 10-20% decrease in the absorption (about 5 min), 50 µl of the reaction mixture was rapidly withdrawn and added to a 1.5 ml Eppendorf tube containing 10 µl 0.5 mM EDTA to stop the reaction, and the actual absorption decrease for each sample was accurately recorded. Production of 2,3-dihydroxyisovalerate was measured and quantitated by LC/MS as described above.

[0204] The coupling ratio is defined by the ratio between the amount of 2,3-dihydroxyisovalerate (DHIV) produced and the amount of either NADH or NADPH consumed during the experiment. The coupling ratio for the wild type enzyme (PF5-ilvC), using NADPH, was 0.98 DHIV/NADPH, while that for the mutant (C3B11), using NADH, was on average around 1.10 DHIV/NADPH.

Example 5

Identification of Amino Acids Involved in Cofactor Binding in KARI for Cofactor Specificity Switching Using Bioinformatic Tools

[0205] To discover if naturally existing KARI sequences could provide clues for amino acid positions that should be targeted for mutagenesis, a multiple sequence alignment (MSA) using PF5 KARI, its close homolog PAO1 KARI and three KARI sequences with measurable NADH activity, i.e., B. cereus ilvC1 and ilvC2 and spinach KARI, was performed (FIG. 8). Based on the multiple sequence alignment, positions 33, 43, 59, 61, 71, 80, 101, and 119 were chosen for saturation mutagenesis. Saturation mutagenesis on all of these positions was performed simultaneously using the QuickChange II XL site directed mutagenesis kit (Catalog #200524, Stratagene, La Jolla Calif.) with the manufacturer's suggested protocol. Starting material for this mutagenesis was a mixed template consisting of the mutants already identified in Table 4 and Example 2. The primers used are listed in Table 5. The library of mutants thus obtained was named library “Z”. Mutants with good NADH activity from this library were identified using high throughput screening, and their KARI activity and the K_M for NADH were measured as described above. These mutants (listed in Table 6) possess much lower K_M values for NADH compared to the parent templates (Table 4). A megaprimer, using primers (SEQ ID Nos. 20 and 58), was created and mutations at positions 156 and 170 were eliminated. Further screening of this set of mutants identified mutant 3361G8 (SEQ ID NO: 67) (Table 7). The hits from library Z were further subjected to saturation mutagenesis at position 53 using primers (SEQ ID Nos. 20 and 21), and subsequent screening identified the remaining mutants in Table 7. As shown in Table 7, the new mutants possessed much lower K_M for NADH (e.g., 4.0 to 5.5 µM) compared to mutants listed in Table 6 (e.g., 14-40 µM).

TABLE 5
Primers for Example 5

Targeted position(s)
of Pf5 ilvC            Primers
33                     pBAD-405-C33 090808f: GCTCAAGCANNKAACCTGAAGG (SEQ ID NO: 49)
                       pBAD-427-C33 090808r: CCTTCAGGTTKNNTGCTTGAGC (SEQ ID NO: 50)
43                     pBAD-435-T43 090808f:

TABLE 5-continued
Primers for Example 5

Targeted position(s)
of Pf5 ilvC            Primers
                       (SEQ ID NO: 51)
                       pBAD-456-T43 090808r: CAGGCCAACKNNCACGTCTAC (SEQ ID NO: 52)
59 and 61              pBAD-484-H59L61 090808f: CTGAAGCCNNKGGCNNKAAAGTGAC (SEQ ID NO: 53)
                       pBAD-509-H59L61 090808r: GTCACTTTKNNGCCKNNGGCTTCAG (SEQ ID NO: 54)
71                     pBAD-519-A71 090808f: GCAGCCGTTNNKGGTGCCGACT (SEQ ID NO: 55)
                       pBAD-541-A71 090808r: AGTCGGCACCKNNAACGGCTGC (SEQ ID NO: 56)
80                     pBAD-545-T80 090808f: CATGATCCTGNNKCCGGACGAG (SEQ ID NO: 57)
                       pBAD-567-T80 090808r: CTCGTCCGGKNNCAGGATCATG (SEQ ID NO: 58)
101                    pBAD-608-A101 090808f: CAAGAAGGGCNNKACTCTGGCCT (SEQ ID NO: 59)
                       pBAD-631-A101 090808r: AGGCCAGAGTKNNGCCCTTCTTG (SEQ ID NO: 60)
119                    pBAD-663-R119 090808f: GTTGTGCCTNNKGCCGACCTCG (SEQ ID NO: 61)
                       pBAD-685-R119 090808r: CGAGGTCGGCKNNAGGCACAAC (SEQ ID NO: 62)

TABLE 6
List of some mutants with their measured K_M values (the mutated positions in those mutants were identified by bioinformatic tools)

(SEQ ID NO: 28)

TABLE 7
Mutants further optimized for improved K_M (for NADH)

Mutant                    Mutation Locations                               NADH K_M (µM)
3361G8 (SEQ ID NO: 67)    C33L/R47Y/S50A/T52D/V53A/L61F/T80I               5.5
2H10 (SEQ ID NO: 68)      Y24F/C33L/R47Y/S50A/T52D/V53I/L61F/T80I/A156V    5.3
1D2 (SEQ ID NO: 69)       Y24F/R47Y/S50A/T52D/V53A/L61F/T80I/A156V         4.1
3F12 (SEQ ID NO: 70)      Y24F/C33L/R47Y/S50A/T52D/V53A/L61F/T80I/A156V    4.0

[0206] Further analyses using bioinformatic tools were therefore performed to expand the mutational sites to other KARI sequences as described below.

Sequence Analysis

[0207] Members of the protein family of ketol-acid reductoisomerase (KARI) were identified through BlastP searches of publicly available databases using the amino acid sequence of Pseudomonas fluorescens PF5 KARI (SEQ ID NO: 17) with the following search parameters: E value=10, word size=3, Matrix=Blosum62, Gap opening=11 and gap extension=1, E value cutoff of 10. Identical sequences and sequences that were shorter than 260 amino acids were removed. In addition, sequences that lack the typical GxGxx(G/A) motif involved in the binding of NAD(P)H in the N-terminal domain were also removed. These analyses resulted in a set of 692 KARI sequences.

[0208] A profile HMM was generated from the set of the experimentally verified Class I and Class II KARI enzymes from various sources as described in Table 8. Details on building, calibrating, and searching with this profile HMM are provided below. Any sequence that can be retrieved by HMM search using the profile HMM for KARI at E-value above 1E is considered a member of the KARI family. Positions in a KARI sequence aligned to the following profile HMM nodes (defined below in the section on profile HMM building) are claimed to be responsible for NADH utilization: 24, 33, 47, 50, 52, 53, 61, 80, 115, 156, and 170 (the numbering is based on the sequence of Pseudomonas fluorescens PF5 KARI).

Preparation of Profile HMM

[0209] A group of KARI sequences were expressed in E. coli and have been verified to have KARI activity. These KARIs are listed in Table 6. The amino acid sequences of these experimentally verified functional KARIs were analyzed using the HMMER software package (the theory behind profile HMMs is described in R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press, 1998; Krogh et al., 1994; J. Mol. Biol. 235:1501-1531), following the user guide which is available from: HMMER (Janelia Farm Research Campus, Ashburn, Va.). The output of the HMMER software program is a profile Hidden Markov Model (profile HMM) that characterizes the input sequences. As stated in the user guide, profile HMMs are statistical descriptions of the consensus of a multiple sequence alignment. They use position-specific scores for amino acids (or nucleotides) and position-specific scores for opening and extending an insertion or deletion.
Compared to other profile based methods, HMMs have a formal probabilistic basis. Profile HMMs for a large number of protein families are publicly available in the PFAM database (Janelia Farm Research Campus, Ashburn, Va.).

[0210] The profile HMM was built as follows:

Step 1. Build a Sequence Alignment

[0211] The 25 sequences for the functionally verified KARIs listed above were aligned using Clustal W (Thompson, J. D., Higgins, D. G., and Gibson T. J. (1994) Nuc. Acid Res. 22: 4673-4680) with default parameters. The alignment is shown in FIG. 9.

TABLE 8
25 Experimentally verified KARI enzymes

GI Number    Accession         SEQ ID NO:   Organism
70732562     YP_262325.        17           Pseudomonas fluorescens Pf-5
15897495     NP_342100.        13           Sulfolobus solfataricus P2
18313972     NP_560639.        14           Pyrobaculum aerophilum str. IM2
76801743     YP_326751.        30           Natronomonas pharaonis DSM 2160
16079881     NP_390707.        31           Bacillus subtilis subsp. subtilis str. 168
19552493     NP_600495.        32           Corynebacterium glutamicum ATCC 13032
6225553      O32414            33           Phaeospirillum molischianum
17546794     NP_520196.        15           Ralstonia solanacearum GMI1000
56552037     YP_162876.        34           Zymomonas mobilis subsp. mobilis ZM4
114319705    YP_741388.        35           Alkalilimnicola ehrlichei MLHE-1
57240359     ZP_00368308.1     36           Campylobacter lari RM2100
120553816    YP_958167.        37           Marinobacter aquaeolei VT8
71065099     YP_263826.        38           Psychrobacter arcticus 273-4
83648555     YP_436990.        39           Hahella chejuensis KCTC 2396
74318007     YP_315747.        40           Thiobacillus denitrificans ATCC 25259
67159493     ZP_00420011.1     41           Azotobacter vinelandii AvOP
66044103     YP_233944.        42           Pseudomonas syringae pv. syringae B728a
28868203     NP_790822.        43           Pseudomonas syringae pv. tomato str. DC3000
26991362     NP_746787.        44           Pseudomonas putida KT2440
104783656    YP_610154.        45           Pseudomonas entomophila L48
146306044    YP_001186509.1    46           Pseudomonas mendocina ymp
15599888     NP_253382.        16           Pseudomonas aeruginosa PAO1
42780593     NP_977840.        47           Bacillus cereus ATCC 10987
42781005     NP_978252.        48           Bacillus cereus ATCC 10987
266346       Q01292            18           Spinacia oleracea

Step 2. Build a Profile HMM

[0212] The hmmbuild program was run on the set of aligned sequences using default parameters. hmmbuild reads the multiple sequence alignment file, builds a new profile HMM, and saves the profile HMM to file. Using this program an un-calibrated profile was generated from the multiple sequence alignment for twenty-four experimentally verified KARIs as described above.

[0213] The following information, based on the HMMER software user guide, gives some description of the way that the hmmbuild program prepares a profile HMM. A profile HMM is a linear state machine consisting of a series of nodes, each of which corresponds roughly to a position (column) in the multiple sequence alignment from which it is built. If gaps are ignored, the correspondence is exact, i.e., the profile HMM has a node for each column in the alignment, and each node can exist in one state, a match state. The word “match” here implies that there is a position in the model for every position in the sequence to be aligned to the model. Gaps are modeled using insertion (I) states and deletion (D) states. All columns that contain more than a certain fraction x of gap characters will be assigned as an insert column. By default, x is set to 0.5. Each match state has an I and a D state associated with it. HMMER calls a group of three states (M/D/I) at the same consensus position in the alignment a “node”.

[0214] A profile HMM has several types of probabilities associated with it. One type is the transition probability, the probability of transitioning from one state to another. There are also emission probabilities associated with each match state, based on the probability of a given residue existing at that position in the alignment. For example, for a fairly well conserved column in an alignment, the emission probability for the most common amino acid may be 0.81, while for each of the other 19 amino acids it may be 0.01.

[0215] A profile HMM is completely described in a HMMER2 profile save file, which contains all the probabilities that are used to parameterize the HMM. The emission probabilities of a match state or an insert state are stored as a log-odds ratio relative to a null model: log2(p_x/null_x), where p_x is the probability of an amino acid residue, at a particular position in the alignment, according to the profile HMM, and null_x is the probability according to the Null model. The Null model is a simple one state probabilistic model with a pre-calculated set of emission probabilities for each of the 20 amino acids derived from the distribution of amino acids in the SWISSPROT release 24. State transition scores are also stored as log-odds parameters and are proportional to log2(t_x), where t_x is the transition probability of transiting from one state to another state.

Step 3. Calibrate the Profile HMM

[0216] The profile HMM was read using hmmcalibrate, which scores a large number of synthesized random sequences with the profile (the default number of synthetic sequences used is 5,000), fits an extreme value distribution

(EVD) to the histogram of those scores, and re-saves the HMM file now including the EVD parameters. These EVD parameters (µ and λ) are used to calculate the E-values of bit scores when the profile is searched against a protein sequence database. hmmcalibrate writes two parameters into the HMM file on a line labeled “EVD”: these parameters are the µ (location) and λ (scale) parameters of an extreme value distribution (EVD) that best fits a histogram of scores calculated on randomly generated sequences of about the same length and residue composition as SWISS-PROT. This calibration was done once for the profile HMM.

[0217] The calibrated profile HMM for the set of KARI sequences is provided appended hereto as a profile HMM Excel chart (Table 9). In the main model section starting from the HMM flag line, the model has three lines per node, for M nodes (where M is the number of match states, as given by the LENG line). The first line reports the match emission log-odds scores: the log-odds ratio of emitting each amino acid from that state and from the Null model. The first number is the node number (1..M). The next K numbers are the match emission scores, one per amino acid. The highest scoring amino acid is indicated in the parenthesis after the node number. These log-odds scores can be converted back to HMM probabilities using the null model probability. The last number on the line represents the alignment column index for this match state. The second line reports the insert emission scores, and the third line reports the state transition scores: M->M, M->I, M->D; I->M, I->I; D->M, D->D; B->M; M->E.

Step 4. Test the Specificity and Sensitivity of the Built Profile HMMs

[0218] The profile HMM was evaluated using hmmsearch, which reads a profile HMM from an hmm file and searches a sequence file for significantly similar sequence matches. The sequence file searched contained 692 sequences (see above). During the search, the size of the database (Z parameter) was set to 1 billion. This size setting ensures that significant E-values against the current database will remain significant in the foreseeable future. The E-value cutoff was set at 10.

[0219] A hmmer search, using hmmsearch, with the profile HMM generated from the alignment of the twenty-five KARIs with experimentally verified function, matched all 692 sequences with an E value <10. This result indicates that members of the KARI family share significant sequence similarity. A hmmer search with a cutoff of E value 10 was used to separate KARIs from other proteins.

Step 5. Identify Positions that are Relevant for NAD(P)H Utilization

[0220] Eleven positions have been identified in KARI of Pseudomonas fluorescens Pf-5 that switch the cofactor from NADPH to NADH. Since the KARI sequences share significant sequence similarity (as described above), it can be reasoned that the homologous positions in the alignment of KARI sequences should contribute to the same functional specificity. The profile HMM for KARI enzymes has been generated from the multiple sequence alignment which contains the sequence of Pseudomonas fluorescens Pf-5 KARI. The eleven positions in the profile HMM representing the columns in the alignment which correspond to the eleven cofactor switching positions in Pseudomonas fluorescens Pf-5 KARI are identified as positions 24, 33, 47, 50, 52, 53, 61, 80, 115, 156, and 170. The lines corresponding to these positions in the model file are highlighted in yellow in Table 9.

[0221] For any query sequence, hmmsearch is used to search the profile HMM for KARI against the query sequence, and the alignment of the query to the HMM is recorded in the output file. In the alignment section of the output, the top line is the HMM consensus. The amino acid shown for the consensus is the highest probability amino acid at that position according to the HMM (not necessarily the highest scoring amino acid). The center line shows letters for “exact” matches to the highest probability residue in the HMM, or a “+” when the match has a positive score. The third line shows the sequence itself. The positions in the query sequence that are deemed relevant for cofactor switching are identified as those that are aligned to these eleven nodes in the profile HMM as described above. An example of the alignment of Pseudomonas fluorescens Pf-5 KARI to the profile HMM of KARI is shown in FIG. 10, and the eleven positions that are responsible for cofactor switching are shaded in grey.
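The scoring quantities described in Steps 2 to 4 can be sketched in a few lines of Python. The example below converts between emission probabilities and integer log-odds scores, and turns a bit score into a P-value and an E-value using the extreme value distribution with location µ and scale λ. It assumes the HMMER2 convention of storing scores as 1000 × log2(p/null) rounded to an integer; that scaling factor, the function names, the uniform null probability of 0.05 and the µ and λ values in the usage lines are illustrative assumptions, not values taken from this document. Only the database size Z of 1 billion comes from the text.

```python
import math

# Assumption: HMMER2 save files store log-odds scores scaled by 1000 (log base 2).
INTSCALE = 1000


def prob_to_score(p, null_p):
    """Convert an emission probability to a scaled, rounded log2-odds score."""
    return int(math.floor(0.5 + INTSCALE * math.log2(p / null_p)))


def score_to_prob(score, null_p):
    """Convert a scaled log2-odds score back to a probability."""
    return null_p * 2.0 ** (score / INTSCALE)


def evd_pvalue(bit_score, mu, lam):
    """P(S >= bit_score) for a Gumbel-type EVD with location mu and scale lam."""
    # -expm1(-y) equals 1 - exp(-y) but stays accurate when y is tiny.
    return -math.expm1(-math.exp(-lam * (bit_score - mu)))


def evalue(bit_score, mu, lam, z):
    """Expected number of hits scoring >= bit_score in a database of z sequences."""
    return z * evd_pvalue(bit_score, mu, lam)


if __name__ == "__main__":
    # The 0.81 emission probability mirrors the conserved-column example in the
    # text; the null probability of 0.05 is a hypothetical uniform background.
    s = prob_to_score(0.81, 0.05)
    print(s, score_to_prob(s, 0.05))
    # Hypothetical EVD parameters; Z = 1e9 is the database size set in the text.
    print(evalue(50.0, -100.0, 0.25, 1e9))
```

Because the E-value scales linearly with Z, fixing Z at a large value makes reported E-values conservative relative to the actual size of the searched file.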

TABLE 9

HMM        A      C      D      E      F      G      H      I      K      L      M      N
        m->m   m->i   m->d   i->m   i->i   d->m   d->d   b->m   m->e
        -650      *  -1463
1(Q)    -648  -1356   -136    -44  -1453  -1166   -219  -1455    321  -1417   -911   -227
        -149   -500    233     43   -381    399    106   -626    210   -466   -720    275
         -38  -5840  -6882   -894  -1115   -701  -1378   -650      *
2(M)   -4231  -3929  -5216  -5402  -3438  -4370  -4528  -3232  -5113  -2613   5320  -5052
        -147   -501    232     42   -382    397    104   -625    209   -467   -722    276
       -3303  -3318   -325  -3473   -136   -701  -1378      *      *
3(F)   -1308  -1104  -2227  -2120   3516  -2093   -244   -196  -1891     64     66  -1626
        -149   -500    233     43   -381    399    106   -626    210   -466   -720    275
         -38  -5840  -6882   -894  -1115   -943  -1060      *      *
4(A)    1616  -1744   1125     33  -2015  -1540   -262  -1686    937  -1765   -911   -252
        -149   -500    233     43   -381    399    106   -626    210   -466   -720    275
        -901  -7402  -1125   -894  -1115  -2352   -314      *      *
5(C)    -346   2578   1084   -712   2092  -1540   -384   -167   -624   -482    125   -731
        -149   -500    235     43   -381    398    106   -626    210   -466   -721    275
       -1009  -1006  -7567   -131  -3527  -1916   -444      *      *
6(S)     800   -586  -1937  -1415   -821  -1740   -954   1279  -1204   -584     19  -1258
        -149   -500    233     43   -381    399    106   -626    210   -466   -720    275
         -17  -6953  -7995   -894  -1115   -146  -3378      *      *


-2411 5O1 -2743 -1919 -558 -2483 -2420 -15O2 57 -500 43 -38 399 O6 -626 -466 -720 275 -894 -1115 -70 -1378 -2010 -4702 -2534 -4789 -4391 2241 -151 -1318 -4442 -500 43 399 O6 -626 -466 -720 275 -894 -70 -1378 -5505 -5069 -1332 -3424 -392 -2838 -3726 -500 43 399 O6 -626 -466 -720 275 -894 -70 -1378 -2625 -2097 -2986 -1481 -2628 9 O 6 -2674 -2098 -2051 -500 43 399 O6 -626 -466 -720 275 -894 -70 -1378 -4412 1042 -2437 -1765 -4500 -4361 -3682 515 -500 43 399 O6 -626 -466 -720 275 -894 -1378 -2371 819 -527 -2443 -2387 -1461 590 -500 43 O6 -626 -466 -720 275 -894 :2 -4633 S8O -4738 -4578 -3963 -1073 -500 43 -626 393 -466 -720 275 -894 -3818 762 -1437 -1051 -3233 -500 43 -626 32 -466 -720 275 -894 -290S S42 -2977 2 9 O -2912 -2023 1270 -500 43 -626 -466 -720 275 -894 -540 569 2299 -236 -2381 -500 43 -626 -466 -720 275 -894 -2877 1045 -2963 -2901 -2011 1860 -500 43 -626 -466 -720 275 -894 -832 -1110 -2091 -2264 -1691 -978 -500 43 -626 -466 -720 275 -5840 -894 -1313 -482 -1552 -1493 -1035 -579 -500 43 -626 -466 -720 275 -5840 -894 -1812 432 -2172 -2269 -1704 99 -500 43 -626 -466 -720 275 -5840 -894 -1695 2831 -1804 -1919 -1331 69 -500 43 -626 -466 -720 275 -5840 -894 -1337 -1229 -1596 -918 -769 -58S -1229 -500 43 -626 -466 -720 275 -5840 -894 -1931 -4227 2306 1990 -634 -3878 -500 43 -626 -466 -720 275 -8139 -894 -2299 -SOO3 3051 1593 -869 -4829 -500 43 -626 -466 -720 275 -8139 -894 -2632 -500 -2712 -2619 -1730 -778 -500 43 -626 O -466 -720 275 -8139 -894 -3900 392 -4030 -163 -3937 -3173 -967 -5O1 42 -625 -463 -722 276 -3318 -3674 -3775 -2558 -4021 3 8 -3617 -2982 -2368 -500 43 -626 -466 -720 275 -8139 -894 -2925 -979 -3021 2 7 3 -2865 -2032 2O2 -500 43 -626 2 -466 -720 275 -8139 -894 -2122 -4990 2388 49 4 5 -1532 -1474 -4790 -500 43 -626 2 O -466 -720 275 -8139 -894 : -1828 -4294 -4147 -416 9 -4428 -3497 -2821 -500 43 399 -626 -466 -720 275 -8139 -894 -2122 -4993 -511 2881 -495 -1532 -1474 -4796 -500 43 399 -626 -466 -720 275 -8139 -894 -70 -1378



-4790 -4977 -4823 -4692 -4459 -3629 -4103 -4017 7200% 396 44 95 361 21 -368 -296 -251

-2278 -1503 -1798 -1617 -13SO -389 305 1335 8600% 394 45 96 359 7 -369 -294 -249

-1658 154 -383 -488 -2O38 -1421 87.00% 394 45 96 359 -369 -294 -249

-1705 -451 -883 -631 -50 -774 -133 88.00% 394 45 96 359 -369 -295 -249

-1964 -1013 -1358 1715 - 476 1117 -132O -938 9000% 394 45 96 359 7 -369 -294 -249

-2010 1146 458 829 224 -2040 -2577 -1913 91.00% 394 45 96 359 7 -369 -294 -249

-4600 -4417 -4628 -408O 3023 -3952 -3510 92.00% 394 45 96 359 -369 -294 -249

-4920 -3835 -4458 -4313 -3643 -581 4349 93.00% 394 45 96 359 -369 -294 -249

-32O6 -1513 -1078 -2258 -2435 -2009 418S 94.00% 394 45 96 359 -369 -294 -249

-2961 -1429 -2799 -2158 -3974 -4SSO -3541 95.00% 394 45 96 359 -369 -294 -249

-1960 -68 904 -67 -1993 -2554 -1871 96.00% 394 45 96 359 -369 -294 -249

-3046 -1551 -2987 -2292 -2742 -42O1 -4759 -3709 97.00% 394 45 96 359 7 -369 -294 -249

-3509 -3212 -3411 -2499 -1792 1507 -2796 -2431 98.00% 394 45 96 359 7 -369 -294 -249

-2294 -489 -1186 53 -2518 -3O86 -2349 99.00% 394 45 96 359 -369 -294 -249

-2862 -2O89 -2316 -232 -1213 1306 -1645 -1304 394 45 96 359 -369 -294 -249

-2289 -489 -1184 2139 -2SO3 -3077 -2343 O100% 394 45 96 359 -369 -294 -249

-1499 -1202 -1421 -646 -15SO -1916 -1919 394 45 96 359 -369 -294 -249

-1675 -363 -322 -934 -1354 -725 107 O300% 394 45 96 359 -369 -294 -249

-1453 -184 -1141 -728 -1814 -2146 -1646 04.00% 394 45 96 359 -369 -294 -249

-1441 -527 -653 -1512 -1988 -1SOS OSOO% 394 45 96 359 -369 -294 -249

-21 63 -1111 -1301 -1443 -932 592 3932 06.00% 394 45 96 359 -369 -294 -249

94 -3538 -3812 -3411 -2247 1576 -2891 -2629 O7.00% 394 45 96 359 -369 -294 -249

-4788 -4454 -4829 -4493 1435 -3781 -3585 O800% 394 45 96 359 -369 -294 -249

-2231 2257 968 -1109 -2288 -2738 -2136 09.00% 394 45 96 359 -369 -294 -249

-2810 -2362 1069 -3530 -4130 -3229 1000% 396 44 96 358 -371 -296 -251


-358O -1076 1318 -3119 -2876 -3817 -3.395 -3374 26.00% 394 45 96 359 7 -369 -294 -249

-2582 1301 804 -1564 1681 -2645 -290S -2448 27.00% 394 45 96 359 7 -369 -294 -249

-4868 -4890 -5101 -4482 -2619 3219 -4SOS -3990 28.00% 394 45 96 359 -369 -294 -249

-2904 -3694 -3937 -1470 -2957 -4610 -4522 29.00% 394 45 96 359 -369 -294 -249

-4873 -4896 -5108 -4492 2896 -4512 -3997 394 45 96 359 -369 -294 -249

-48O2 -4495 -4860 -4506 1.192 -3838 -3622 31.00% 394 45 96 359 -369 -294 -249

-4804 -5546 -S385 -4727 -5862 -4924 -5849 3200% 394 45 96 359 -369 -294 -249

-4963 -3861 -4500 -4356 -3881 2986 4507 3300% 394 45 96 359 -369 -294 -249

-4804 -5546 -S385 -4727 -5862 -4924 -5849 34.00% 394 45 96 359 -369 -294 -249

-3093 -3.395 -3541 3475 -1885 -2307 -3927 -3474 35.00% 394 45 96 359 7 -369 -294 -249

-4693 4575 -3826 -4704 -4772 -5612 -4577 -47S1 36.00% 394 45 96 359 7 -369 -294 -249

-3149 -3871 -4137 -1784 -2005 -3297 -4725 -473S 3700% 394 45 96 359 7 -369 -294 -249

866 -1265 1506 -2614 -2557 -3469 -3282 -2908 38.00% 394 45 96 359 -369 -294 -249

-3728 -4477 -4545 -2S67 -2762 -3852 -4724 -4942 39.00% 394 45 96 359 7 -369 -294 -249

-3764 2546 -1428 -2990 -2976 -33O8 2269 -295 4000% 394 45 96 359 7 -369 -294 -249

-2900 -3608 -3823 217 -1660 -276 -4363 -4211 4100% 394 45 96 359 7 -369 -294 -249

-2446 2895 -1441 -1392 -693 -1678 -1278 42009/o 394 45 96 359 -369 -294 -249

-4479 -42SS -4592 -4115 -S371 -46SO -4731 4300% 394 45 96 359 -369 -294 -249

-4997 -47SO -5002 -5379 -2629 -3665 -3690 4400% 394 45 96 359 -369 -294 -249

-2603 2808 -1596 -1613 -2730 -2995 -2515 4500% 394 96 359 -369 -294 -249

-3196 -1786 -3536 -25O1 -3007 -4S17 -5004 -3956 46.00% 394 45 96 359 7 -369 -294 -249

-3026 -2967 -3497 3508 -1962 -3259 -4477 -4066 47.00% 393 45 95 360 8 -370 -295 -2SO

-3115 -1836 -3440 -2284 -2716 -4157 -488O -3914 S400% 394 45 96 359 -369 -294 -249

-4518 -398O -4367 -4081 -2716 3323 -3037 -2660 5500% 394 45 96 359 -369 -294 -249

-2505 -770 -1595 -1483 -1666 332 -3460 -2676 57.00% 394 45 96 359 -369 -294 -249


-4579 -4940 -4923 -4013 -3297 3796 -4414 -4190 S800% 394 45 96 359 7 -369 -294 -249

-2610 -1809 -2O37 -1630 1166 2145 -1385 -343 59.00% 394 45 96 359 7 -369 -294 -249

-4871 -4894 -5106 -4488 -262O 3O88 -4511 -3996 6000% 394 45 96 359 7 -369 -294 -249

-3132 -3863 -4127 -1761 -1982 -3275 -472O -4725 6100% 394 45 96 359 7 -369 -294 -249

-4820 -4126 -4691 -4757 -33S1 883 -3184 -3234 62.00% 394 45 96 359 -369 -294 -249

-3439 -978 38OO -2894 -2709 -3682 -33S3 -3267 6300% 394 45 96 359 7 -369 -294 -249

1229 727 1079 -566 -893 -2041 -2S79 -1915 6400% 394 45 96 359 7 -369 -294 -249

-3049 -1587 -3230 -2297 -2766 -4245 -4850 -3752 6500% 394 45 96 359 7 -369 -294 -249

-2827 -1794 -1902 2738 -1771 -2970 -391O -3479 6600% 394 45 96 359 7 -369 -294 -249

-1861 886 -512 833 -740 -1868 -2441 -1767 67.00% 394 45 96 359 7 -369 -294 -249

-2948 -3606 -3832 3517 228 -3028 -46OO -4451 68.00% 394 45 96 359 7 -369 -294 -249

-2617 - 1898 18 -1648 -972 983 4091 -958 69.00% 394 45 96 359 7 -369 -294 -249

-1985 -100 -659 96 70 -2025 -2589 -1903 7000% 394 45 96 359 -369 -294 -249

-2052 692 -617 -889 -906 -361 -24S5 -1836 7100% 394 45 96 359 7 -369 -294 -249

-3728 -4477 -4545 -2S67 -2762 -3852 -4724 -4942 7200% 394 45 96 359 7 -369 -294 -249

-2052 1923 873 -888 -630 -2596 -1949 7300% 394 45 96 359 -369 -294 -249

- 1977 -90 -648 666 -2014 -2577 - 1892 7400% 394 45 96 359 -369 -294 -249

-2329 -543 -1250 -1245 -1368 -25O1 -3O87 -S30 7500% 394 45 96 359 -369 -294 -249

-3545 -3376 -4005 -2451 -2700 -3996 -47O6 -4575 7600% 394 45 96 359 7 -369 -294 -249

-3961 -3157 -3524 -3277 -2SO9 -1337 -1621 -840 77.00% 394 45 96 359 -369 -294 -249

-1947 672 436 806 687 -1970 -2533 -1851 7800% 393 45 95 359 -370 -295 -2SO

112 -3099 -3291 -2619 -767 3269 -2708 -2354 8400% 394 45 96 359 7 -369 -294 -249

-2401 -1174 -1547 -764 - 172 -528 -1519 1413 8500% 394 45 96 359 -369 -294 -249

-2242 -427 -1095 1451 1827 -2411 -2986 -2264 8600% 394 45 96 359 7 -369 -294 -249

686 -1884 -2111 -1695 945 2346 -1458 -1106 87.00% 394 45 96 359 -369 -294 -249


-789 -73 421 23 -1936 2212 -1843 88.00% 394 45 96 359 -369 -294 -249

-2659 -1005 -1976 -400 -3194 3790 -2957 89.00% 394 45 96 359 -369 -294 -249

-2948 -3730 -3919 -1525 -2894 458O -4483 9000% 394 45 96 359 -369 -294 -249

-38SS -3494 -3700 -2979 2574 3091 -2698 91.00% 394 45 96 359 -369 -294 -249

-2135 -2O3 1088 428 -2148 2652 -2027 92.00% 394 45 96 359 -369 -294 -249

-1952 1955 -609 -382 -1966 2858 -1853 93.00% 394 45 96 359 -369 -294 -249

-2929 -3729 -3959 706 -1 718 -3OO1 4636 -4534 94.00% 394 45 96 359 -369 -294 -249

-3O82 603 -3296 -2353 -2844 -4347 4929 -3809 95.00% 394 45 96 359 -369 -294 -249

-4771 -4509 -4836 -4427 2741 3899 -3628 96.00% 394 45 96 359 -369 -294 -249

-4852 -4798 -SO38 -4487 3019 4355 -3902 97.00% 394 45 96 359 -369 -294 -249

-2878 250 -2265 -228 -506 1629 -1313 98.00% 394 45 96 359 -369 -294 -249

-4228 -3615 -3972 -3687 1692 2860 -2711 99.00% 394 45 96 359 -369 -294 -249 89(L) -4997 -47SO -5002 -5379 -2629 3665 -3690 394 45 96 359 -369 -294 -249

-3211 -2583 -2782 -23O8 1299 2020 -1688 2O100% 394 45 96 359 -369 -294 -249 91(P) 3993 -3625 -3900 666 -3354 461 O -4474 2O2.00% 394 45 96 359 -369 -294 -249 92(D) -45O1 -3870 -4926 -4440 -5894 4922 -5231 2O300% 394 45 96 359 -369 -294 -249 93 ( E) -2250 -554 -1093 463 -1902 2660 -2064 2O4.00% 394 45 96 359 -369 -294 -249 94(H) -2374 1474 -1479 -94 896 1557 -1158 394 45 96 359 7 -369 -294 -249 95 (Q) -3710 4317 -1866 -2894 -2844 -2559 3295 -2711 20600% 394 45 96 359 7 -369 -294 -249 96(A) 1217 711 445 447 -874 -1951 -1883 2O7.00% 394 45 96 359 7 -369 -249 97(D) -1947 1017 362 -760 -623 2535 -1852 20800% 394 45 96 359 -369 -294 -249 98(V) -3870 -33S1 -356S -31OS -2OOO 2330 2796 -2442 209.00% 394 45 96 359 -369 -294 -249 99(Y) -4964 -3861 -45O1 -4357 -4690 -3883 3325 4377 21000% 394 45 96 359 7 -369 -294 -249 100(E) -1952 890 286 -766 -1975 2S36 -1856 21100% 394 45 96 359 -369 -294 -249 101 (E) -2143 -136 -1056 26S -2377 2948 -22O7 212.00% 394 45 96 359 -369 -294 -249 US 2009/0163376 A1 Jun. 25, 2009 40


-1994 1767 -673 109 -8 85 -1023 -2605 -1917 21300% 394 45 96 359 7 -369 -294 -249

-4828 -4705 -4975 -4470 2240 -42O2 -3814 214.00% 394 45 96 359 -369 -294 -249

-1983 881 -618 -804 -1954 -2S30 -1862 215.00% 394 45 96 359 -369 -294 -249

2974 -812 -1635 469 -2849 -3462 -2693 216.00% 394 45 96 359 -369 -294 -249

-2218 -415 -957 -1093 -198 -2486 647 217.00% 394 45 96 359 -369 -294 -249

-4778 -4005 -4613 -4776 69 -3071 -3194 21800% 394 45 96 359 -369 -294 -249

-2766 1457 1261 -1817 -2858 -3OO2 -2622 219.00% 394 45 96 359 -369 -294 -249

1941 1446 -6SS -480 -2050 -261O - 1935 22000% 394 45 96 359 -369 -294 -249

-3507 -3243 -3937 -2418 -3974 -4703 -4521 22100% 394 45 96 359 -369 -294 -249

-1966 788 -630 790 -303 -1863 222.00% 394 45 96 359 -369 -249

-2388 -1111 -15O1 334 -1581 -1182 223.00% 394 45 96 359 -294 -249

-4857 -4172 -4762 -4864 SO6 -3264 -33S1 224.00% 394 45 96 359 -369 -294 -249

-2228 660 -1135 -1111 1305 -1913 -1442 394 45 96 359 -369 -294 -249

-4871 -3987 -4561 -4547 -2374 -1356 -292 22600% 394 45 96 359 -369 -294 -249

-2896 -3656 -3927 1514 -2983 -4632 -4539 227.00% 394 45 96 359 -369 -294 -249

-4960 -5O11 -4732 -S391 -6022 -4O63 -3641 228.00% 394 45 96 359 -369 -294 -249

-4804 -5546 -S385 -4727 -5862 -4924 -5849 229.00% 394 45 96 359 -369 -294 -249

-488O -3998 -4592 -4639 -2200 -1536 -S23 23000% 394 45 96 359 -369 -294 -249

-2633 -1676 -2141 413 -2139 -3194 -2737 23100% 394 45 96 359 -369 -294 -249

-4824 -4681 -4961 -4472 1969 -4158 -3791 232.00% 394 45 96 359 -369 -294 -249

-418.5 -3770 -3879 -3508 -4793 -417O -3759 233.00% 394 45 96 359 -369 -294 -249

-495S -3851 -4483 -4344 -3870 -S47 3677 234.00% 394 45 96 359 -369 -294 -249

-2090 -229 -938 -10 -2133 -2708 -2021 23S00% 394 45 359 -369 -294 -249

-3153 3585 -1718 -2137 -19 64 -1719 -2789 -2414 236.00% 394 45 96 359 -369 -294 -249

-4259 -4255 230 -22 2003 -3673 -3237 237.00% 394 96 359 -369 -294 -249


-1974 655 -646 -794 -827 1255 -2448 -1798 238.00% 394 45 96 359 7 -369 -294 -249

3775 -3593 -3911 -15SO -3067 -4647 -4548 239.00% 394 45 96 359 -369 -294 -249

2238 1247 2195 51 -218.4 -2757 -2174 24000% 394 45 96 359 -369 -294 -249

-2161 -319 -936 297 -2285 -2853 -2146 24100% 394 45 96 359 -369 -294 -249

-2278 -1503 -1798 -1617 -389 305 1335 242.00% 394 45 96 359 -369 -294 -249 32(P) 3539 -1065 -1192. -789 -1383 -1765 -1661 243.00% 394 45 96 359 -369 -294 -249 33(K) -1569 232 698 -786 -1358 -1637 -1317 24400% 394 45 96 359 -369 -294 -249 34(D) -2936 -1416 -2754 -2102 -2471 -3802 -4324 637 245.00% 394 45 96 359 7 -369 -294 -249

-3873 -3371 -3587 -342 -2009 1904 -2784 -2441 246.00% 394 45 96 359 7 -369 -294 -249

-2782 -1248 -230S 478 -2088 -3196 -3911 -3098 247.00% 394 45 96 359 7 -369 -294 -249

-4579 -4940 -4923 -4013 -3297 3796 -4414 -4190 248.00% 394 45 96 359 7 -369 -294 -249

-2608 -1885 -2O86 -1633 276 -271 -1325 844 249.00% 394 45 96 359 7 -369 -294 -249

-3994 -3.361 -3676 -3242 -2531 -1114 -2746 -2629 394 45 96 359 -369 -294 -249

-4869 -4890 -51O2 -4483 -2619 32O6 -4506 -3991 251.00% 394 45 96 359 -369 -294 -249

-3052 -3976 -4112 -1643 -1844 -2929 -4572 -4519 394 45 96 359 7 -369 -294 -249

4310 -5648 -S396 -5166 -5.194 -6092 -4900 -S786 2S300% 394 45 96 359 -369 -294 -249

-4535 -3079 -21.69 -4529 -4408 -5264 -4403 -4729 2S400% 394 45 96 359 7 -369 -294 -249

-2898 -3661 -3939 910 -1682 -2994 -4647 -4556 255.00% 394 45 96 359 7 -369 -294 -249

4036 -3912 -3822 -2883 -3249 -4O27 -3787 256.00% 394 45 96 359 -369 -294 -249

-4804 -5546 -S385 -4727 -5862 -4924 -5849 257.00% 394 45 96 359 -369 -294 -249

1551 -1305 -748 -2470 -2510 -3482 -3434 -2990 25800% 394 45 96 359 -369 -294 -249

-2747 -2399 -2684 567 2687 -1484 -2682 -2351 259.00% 394 45 96 359 7 -369 -294 -249

-4579 -4940 -4923 -4013 3796 -4414 -4190 26000% 394 45 96 359 -369 -294 -249

-4754 -3672 42.19 -4989 -5644 -4538 -4993 261.00% 394 45 96 359 -369 -294 -249

-2018 -128 2308 1224 -2023 -2585 -1919 262.00% 394 45 96 359 -369 -294 -249

TABLE 9-continued 52(E) -2024 816 -736 -858 1303 -287 -2295 537 26300% 394 45 96 359 7 -369 -294 -249 153(Y) -4959 -3868 -4500 -4356 -4679 -3867 -S6S 4052 264.00% 394 45 96 359 7 -369 -294 -249 154(V) -2057 695 -796 443 -344 1871 -2.192 -1626 394 45 96 359 -369 -294 -249 55 (Q) -1949 1878 419 -764 -1976 -2S38 -1856 266.00% 394 45 96 359 -369 -294 -249 156(G) -390S -31.87 -4377 -3211 -4837 -4895 -4826 26700% 393 45 95 359 -368 -295 -2SO 57(G) -2112 -224 471 -962 -2149 -2694 -2O33 273.00% 394 45 96 359 -369 -294 -249 158(G) -3216 -3901 -4170 -1874 -3384 -4734 -4766 274.00% 394 45 96 359 -369 -294 -249 159(V) -4648 -4554 -4752 -4115 2986 -4159 -3678 27500% 394 45 96 359 -369 -294 -249 160(P) 4.031 -3244 -3757 -2665 -29 1 -4148 -4583 -4362 276.00% 394 45 96 359 -369 -294 -249 61(C) -2199 -545 -1093 872 -1043 -1946 -2701 -2102 277.00% 394 45 96 359 -369 -294 -249 162(L) -3626 -3433 -3567 48O -2414 -1973 -3145 -2755 278.00% 394 45 96 359 7 -369 -294 -249 63(I) -4439 -4023 -4320 -3916 -2488 2176 -3342 -3040 279.00% 394 45 96 359 -369 -294 -249 164(A) -3728 -4477 -4545 -2S67 -2762 -3852 -4724 -4942 28000% 394 45 96 359 -369 -294 -249 165(V) -4869 -4891 -51O2 -4483 -2619 3200 -4506 -3991 281.00% 394 45 96 359 -369 -294 -249 66(H) -3330 -1815 -2362 -2318 -2272 -981 3315 282009/o 394 45 96 359 -369 -294 -249

-4693 4575 -3826 -4704 -5612 -4577 -47S1 283.00% 394 45 96 359 -369 -294 -249 168(D) -3235 -1872 -3575 -2522 -3009 -4491 -4932 -3946 284.00% 394 45 96 359 -369 -294 -249 169(A. 1212 -538 -1013 -1056 -894 1275 -1968 635 394 45 96 359 -369 -294 -249 70(S) -2814 -1909 -2758 26.66 23 3 -3045 -4143 -3560 286.00% 394 45 96 359 -369 -294 -249 171(G) -3946 -3312 -3140 -3128 -3.317 -4492 -4622 -4749 28700% 394 45 96 359 -369 -294 -249 172(N) -1954 436 799 671 -539 -2534 -1855 288.00% 394 45 96 359 -369 -294 -249 173(A. -3728 -4477 -4545 -2S67 -3852 -4724 -4942 289.00% 394 45 96 359 -369 -294 -249 174(K) -2360 859 1012 -1273 -224.5 -2690 -21 65 394 45 96 359 -369 -294 -249 175(D) -2115 345 -869 -969 -2231 -2793 -2087 291.00% 394 45 96 359 -369 -294 -249

76(V) -2688 -1950 -2151 -1725 1669 -1425 827 292.00% 394 45 96 359 -369 -294 -249

TABLE 9-continued 77 (A -2903 -3689 -3929 -1469 409 -2949 -4599 -4508 293.00% 394 45 96 359 7 -369 -294 -249 178(L) -4997 -47SO -5002 -5379 -4399 -2629 -3665 -3690 294.00% 394 45 96 359 -369 -294 -249 179(s) -2892 -3613 -3895 2686 -1675 -2982 -4623 -4523 295.00% 394 45 96 359 7 -369 -294 -249 180(Y) -4494 -3522 -4004 -3782 -3536 -2621 2928 4349 296.00% 394 45 96 359 -369 -294 -249 181(A, -2929 -3729 -3959 706 -1718 -3OO1 -4636 -4534 297.00% 394 45 96 359 -369 -294 -249

-2022 -205 -736 1541 -1628 -233O -1726 298.00% 394 45 96 359 -369 -294 -249 183(G) -3070 -3837 -4092 -1679 -3191 -4701 -4686 299.00% 394 45 96 359 -369 -294 -249 184(1) -4099 -3656 -3864 -3388 1742 -3147 -2761 394 45 96 359 -369 -294 -249 185(G) -4804 -5546 -S385 -4727 -5862 -4924 -5849 301.00% 394 45 96 359 -369 -294 -249 86(G) -2890 -3744 -3957 712 -2914 -4SS3 -4470 302009/o 394 45 96 359 -369 -294 -249 187(G) 986 -33O2 -3621 -1441 -2867 -4408 -4236 3O300% 394 45 96 359 -369 -294 -249 88(R) -4263 -2675 3948 -3768 -3671 -3813 -2293 -1328 3.04.00% 394 45 96 359 7 -369 -294 -249 189(A. -2734 -3301 -3572 1088 1907 -2713 -4286 -4136 394 45 96 359 7 -369 -294 -249 190(G) -4606 -5312 -5178 -4461 -5613 -4754 -S63S 306.00% 394 45 96 359 -369 -294 -249 91(V) -4668 -4584 -4783 -4148 2919 -4191 -3706 3.07.00% 394 45 96 359 -369 -294 -249 192(1) -4535 -4400 -4134 757 -3223 -3041 3O800% 394 96 359 -369 -294 -249 193(E) 1208 -1533 -2450 -1842 -3305 -4121 -339S 3.09.00% 394 45 96 359 -369 -294 -249 194(T) -3998 -458O -4545 -2999 -4113 -4684 -4915 31000% 394 45 96 359 -369 -294 -249 95 (T) -2901 -2988 -3387 -113 -2969 -4405 -4119 3.11.00% 394 45 96 359 -369 -294 -249 196(F) -4871 -3987 -4561 -4547 -2374 -1356 -292 3.12.00% 394 45 96 359 -369 -294 -249 197(K) -2475 767 2O54 -1421 -2544 -2858 -2356 313.00% 394 45 96 359 -369 -294 -249 198(E) -2713 1281 -2087 -1777 -2041 -3.331 -3909 -3046 3.14.00% 394 45 96 359 7 -369 -294 -249

99(E) -4513 -3838 -4570 -4456 -4726 -S786 -4878 -S197 315.00% 394 45 96 359 7 -369 -294 -249 200(T) -2815 -2309 -2526 -1685 3305 195 -1933 1430 316.00% 394 45 96 359 -369 -294 -249 201 E) -2749 -923 -1064 -1790 -1902 -171 -3398 -28O2 317.00% 394 45 96 359 7 -369 -294 -249 US 2009/0163376 A1 Jun. 25, 2009 44

TABLE 9-continued 202(T) -2947 -3588 -3797 697 -2989 -4554 -4399 3.18.00% 394 45 96 359 -369 -294 -249 203(D) -45O1 -3870 -4926 -4440 -5894 -4922 -5231 3.19.00% 394 45 96 359 -369 -294 -249 204(L) -4963 -4163 -4828 -5215 -1279 -3159 -3298 394 45 96 359 -369 -294 -249 205(F) -47S3 -3956 -4473 -4270 - 1899 -1349 -269 32100% 394 45 96 359 -369 -294 -249 206(G) -4804 -5546 -S385 -4727 -5862 -4924 -5849 32200% 394 45 96 359 -369 -294 -249 207(E) -4513 -3838 -4570 -4456 -4726 -S786 -4878 -S197 323.00% 394 45 96 359 7 -369 -294 -249 208(Q) -3620 4200 1284 -3063 -2944 -3900 -3556 -342O 324.00% 394 45 96 359 7 -369 -294 -249 209(A. -2798 -2295 -2S67 -1629 932 -2.245 - 1899 394 45 96 359 -369 -294 -249 210(V) -4855 -487O -5082 -4456 -2619 3416 -4480 -3969 326.00% 394 45 96 359 -369 -294 -249 211 (L) -4997 -47SO -5002 -5379 -2629 -3665 -3690 327.00% 394 45 96 359 -369 -294 -249 212(C) -4040 -3778 -4010 -3184 1347 -3209 -2883 328.00% 394 45 96 359 -369 -294 -249 213(G) -4804 -5546 -S385 -4727 -5862 -4924 -5849 329.00% 394 45 96 359 -369 -294 -249 214(G) -3149 -3871 -4137 -1784 -3297 -4725 -473S 33000% 394 45 96 359 -369 -294 -249 215(V) -3014 -238.2 -2566 -2O89 1949 -1773 -1423 33100% 394 45 96 359 -369 -294 -249 216(M) -224.5 -718 -1173 773 1715 1332 -1794 -1343 332009/o 394 45 96 359 -369 -294 -249 217(E) -2162 -351 -939 545 -1095 -2227 -2828 -2143 333.00% 394 45 96 359 7 -369 -294 -249 218 (L) -4963 -4163 -4828 -5215 -3571 -1279 -3159 -3298 334.00% 394 45 96 359 -369 -294 -249 219(V) -4772 -4654 -4895 -4362 -2584 3.018 -4181 -3758 33500% 394 45 96 359 -369 -294 -249 220(K) -2541 1714 784 -1509 -2617 -2894 -2418 33600% 394 45 96 359 -369 -294 -249 221(A) -2498 -1493 -1792 -1483 453 -1419 5O1 337.00% 394 45 96 359 -369 -294 -249 222G) -2977 -3326 -3545 -1606 -2S74 -4O68 -3810 338.00% 394 45 96 359 -369 -294 -249 223(F) -4949 -3874 -4496 -4351 -3829 -591 1725 33900% 394 45 96 359 -369 -294 -249 224(E) -2951 -1336 871 -2119 -3763 -4239 -339S 34000% 394 45 96 359 -369 -294 -249 
225(T) -2936 -1995 920 -1748 967 -2989 -2654 341.00% 394 45 96 359 -369 -294 -249 226(L) -4756 -3864 -4314 -4462 -3729 -2115 -1672 1736 342.00% 394 45 96 359 7 -369 -294 -249 US 2009/0163376 A1 Jun. 25, 2009 45

TABLE 9-continued 227(V) -3600 -3879 -4011 -2397 25 2999 -3974 -3608 343.00% 394 45 96 359 -369 -294 -249 228(E) -3182 -1761 -3454 -2476 -4462 -4951 -3920 344.00% 394 45 96 359 -369 -294 -249 229(A. -23SS -90S -1425 512 -1817 -2697 -2174 345.00% 394 45 96 359 -369 -294 -249 230(G) -4804 -5546 -S385 -4727 -5862 -4924 -5849 346.00% 394 45 96 359 -369 -294 -249 231(Y. -4707 -3735 -4111 -4O65 -3172 -847 4618 347.00% 394 45 96 359 -369 -294 -249 232(Q) 2O86 345 902 -2042 -2608 -1941 348.00% 45 96 359 -369 -294 -249 233(P) 4045 -2989 -3760 -332O -4817 -4763 -4636 349.00% 394 45 96 359 -369 -294 -249 234(E) -3188 -1768 -3458 -2484 -4467 -495O -3925 394 45 96 359 -369 -294 -249 235(M. -4610 -391.5 -4423 -4378 1140 -3054 -3074 3S100% 394 45 96 359 -369 -294 -249 236(A) -3728 -4477 -4545 -2S67 -3852 -4724 -4942 352.00% 394 45 96 359 -369 -294 -249 237(Y) -4951 -3876 -4497 -4354 -3859 -588 4723 353.00% 394 45 96 359 -369 -294 -249 238(F) -4434 -3287 -2952 -3868 -3571 -1593 -494 3S400% 394 45 96 359 -369 -294 -249 239(E) -3182 -1755 -3103 -2442 -4306 -47S3 -3820 355.00% 394 45 96 359 -369 -294 -249 240C) -3133 -3588 -3744 -1803 112S -3677 -3390 356.00% 394 45 96 359 -369 -294 -249 241 (L) -4777 -4026 -4622 -47SS 814 -3103 -3213 357.00% 394 45 96 359 -369 -294 -249 242(H) -3009 -1439 -2361 -2209 -3893 -4267 -33S3 358.00% 394 45 96 359 -369 -294 -249 243 ( E) -4005 -3370 -3802 -3269 -4260 -4554 -4524 35900% 394 45 96 359 -369 -294 -249 244(L) -3170 -25O1 -2721 -2289 436 -1859 -1558 36000% 394 45 96 359 -369 -294 -249 245(K) -3377 -1447 -616 -2537 -3485 -3625 -3471 361.00% 394 45 96 359 -369 -294 -249 246(L) -4852 -4044 -4689 -4963 742 -3O82 -3239 363.00% 394 45 96 359 -369 -294 -249

247 (I) -4833 -4427 -4799 -4598 -64 -3627 -3397 364.00% 394 45 96 359 -369 -294 -249 248(V) -3419 -3145 -3320 367 -1807 3332 -287.4 -2504 36500% 394 45 96 359 -369 -294 -249 249(D) -3314 - 1891 -3072 -2624 -3060 -4467 -4770 -3962 366.00% 394 45 96 359 -369 -294 -249 250(L) -4179 -3740 -3989 -3.399 -1545 -3154 -3.367 367.00% 394 45 96 359 -369 -294 -249 251(M) -46O1 -4487 -4251 -2764 766 -3321 -3210 368.00% 394 96 359 7 -369 -294 -249 US 2009/0163376 A1 Jun. 25, 2009 46

TABLE 9-continued 252(Y) -4868 -3786 -4393 381 -4432 -3662 2413 4375 369.00% 394 45 96 359 7 -369 -294 -249 253(E) -2716 1879 -1666 -1757 -3162 -3709 -2960 37000% 394 45 96 359 -369 -294 -249 254(G) -2459 -486 658 -1397 -2521 -2923 -2378 371.00% 394 45 96 359 -369 -294 -249 255(G) -4804 -5546 -S385 -4727 -5862 -4924 -5849 372.00% 394 45 96 359 -369 -294 -249 256(1) -3653 -2958 -3221 -2884 -344 -2325 -2O73 373.00% 394 45 96 359 -369 -294 -249 257 (A -21 SS -498 -993 1037 -1149 -2002 624 37400% 394 45 96 359 -369 -294 -249 258(N. -2763 -1148 -2243 -1845 -3433 -4O27 -3144 375.00% 394 45 96 359 -369 -294 -249 259(M) -4838 -4039 -4539 -486O -1557 -3044 376.00% 394 45 96 359 -369 -294 260(R) -2814 -1215 24.08 -1789 -1468 1995 1546 377.00% 394 45 96 359 -369 -294 -249 261(Y) -2446 -659 -1264 -1389 -2582 -2924 3695 378.00% 394 45 96 359 -369 -294 -249 262(s) -2903 -3648 -3936 3391 -3009 -4661 -4566 37900% 394 45 96 359 -369 -294 -249 263(I) -4827 -4713 -4978 -4466 2521 -4218 -3821 38000% 394 45 96 359 -369 -294 -249 264(s) -3638 -42O3 -4355 3681 -2664 -3902 -4616 -4605 381.00% 394 45 96 359 7 -369 -294 -249 265(N) -3069 -1612 -3.311 -2334 -2821 -4325 -4919 -3794 382O0% 394 45 96 359 7 -369 -294 -249 266(T) -2850 -1369 -2277 - 1897 -33S1 -4OSS -3271 383.00% 394 45 96 359 -369 -294 -249 267(A. 
-3O80 -3858 -4099 -1690 -3194 -4697 -4687 384.00% 394 45 96 359 -369 -294 -249 268(E) -2827 1323 -773 -1923 -31.99 -3455 -2917 38500% 394 45 96 359 -369 -294 -249 269(Y) -4847 -3741 -4299 -4199 -4391 -3637 2997 4211 386.00% 394 45 96 359 -369 -294 -249 270(G) -4606 -5312 -5178 -4461 -4560 -5613 -4754 -S63S 387.00% 394 45 96 359 -369 -294 -249 271(D) -3073 -1621 -3297 -233O -2809 -4301 -4894 -3793 388.00% 394 45 96 359 7 -369 -294 -249 272Y) -3886 -3109 -3468 -3181 -2410 -1282 -1615 3508 389.00% 394 45 96 359 -369 -294 -249 273(V) -3060 942 -2514 -2129 1701 2578 - 1907 -1553 39000% 394 45 96 359 -369 -294 -249 274(T) -2421 -1149 887 1341 2345 822 -2454 -2006 391.00% 394 45 96 359 -369 -294 -249 2.75G) -3149 -3871 -4137 -1784 -2005 -3297 -4725 -473S 392.00% 394 45 96 359 7 -369 -294 -249 276(P) 2813 -260 465 -873 -2467 1093 393.00% 394 45 359 -369 -294 -249 US 2009/0163376 A1 Jun. 25, 2009 47

TABLE 9-continued 277 (R) -2228 -273 2862 417 -1 -2191 -2671 -2084 394.00% 394 45 96 359 -369 -294 -249 278(V) -4649 -4585 -4784 -4102 3125 -42O2 -3717 395.00% 394 45 96 359 -369 -294 -249 279(I) -4279 -4082 -4274 226 2182 -3688 -32SO 396.00% 394 45 96 359 -369 -294 -249 280(D) -2573 -899 -1742 -1561 -2804 -3.366 1327 397.00% 394 45 96 359 -369 -294 -249 281 (E) -2190 -358 -992 -1065 -2310 -2885 1152 398.00% 394 45 96 359 -369 -294 -249 282(E) 478 1356 -620 116 -1994 -2555 -1871 399.00% 394 45 96 359 -369 -294 -249 283(T) -2821 -2916 -3214 2251 1397 -3549 -3270 40000% 394 45 96 359 -369 -294 -249 284(K) -3430 -937 2705 -287.4 -3556 -3267 -31.89 4O100% 394 45 96 359 -369 -294 -249 285(E) -1960 1120 437 106 -1992 -2SS2 -1869 394 45 96 359 -369 -294 -249 286(A) -2019 -136 1056 -62 -1889 -2488 -1849 40300% 394 45 96 359 -369 -294 -249 287(M) -4838 -4039 -4539 -486O -1557 -3044 40400% 394 45 96 359 -369 -294 288(K) -2SS4 -472 1762 -1527 -2597 -2885 1245 394 45 96 359 -369 -294 -249 289(E) -1975 663 831 -795 -2015 -2577 - 1891 406.00% 394 45 96 359 -369 -294 -249 290(C) -3957 -3613 -38OS -3144 -2 2342 -3118 -2720 4O7.00% 394 45 96 359 -369 -294 -249 291 (L) -3782 -3025 -3298 -2972 -1269 -1846 1565 4.08.00% 394 45 96 359 -369 -294 -249 292(K) -1949 889 -603 -763 -1968 -2S32 -1851 409.00% 394 45 96 359 -369 -294 -249 293(D) -2782 -1139 2142 -1879 -3473 -4024 -31SS 41000% 394 45 96 359 -369 -294 -249 294(1) -4862 -4869 -5086 -4473 2O71 -4467 -3968 411.00% 394 45 96 359 -369 -294 -249

295 (Q) -2936 3817 -430 -2O18 -3060 -3278 -2908 412.00% 394 45 96 359 -369 -294 -249 296(s) -2288 724 -1177 1978 -2508 -3076 -2340 41300% 394 45 96 359 -369 -294 -249 297(G) -4804 -5546 -S385 -4727 -5862 -4924 -5849 414.00% 394 45 96 359 -369 -294 -249 298.E) -1991 -97 777 829 -1999 -2559 - 1890 415.00% 394 45 96 359 -369 -294 -249 29(F) -47SO -3628 -4209 -4O78 -3546 -601 2.917 416.00% 394 45 96 359 -369 -294 -249 300(A) -3432 -3073 -3298 -2387 1100 -2812 -2447 417.00% 394 45 96 359 -369 -294 -249 301(K) -3047 -758 2488 -458 -3194 -3159 -2895 418.00% 394 45 96 359 -369 -294 -249 US 2009/0163376 A1 Jun. 25, 2009 48

TABLE 9-continued 302(M) -1958 -66 722 775 -833 -1982 -2545 -1864 419.00% 394 45 96 359 7 -369 -294 -249 303(W) -4179 -3330 -3743 -3475 -2878 664 4754 -84 394 45 96 359 -369 -294 -249 304(1) -4713 -4488 -4769 -4301 -2597 2384 -3934 -3596 421.00% 394 45 96 359 -369 -294 -249 305(L) -2325 -915 -1349 767 84 -1756 -1329 422.00% 394 45 96 359 -369 -294 -249 306(E) -2893 -1339 -2626 -2045 -2402 -3767 -4362 -3401 42300% 394 45 96 359 7 -369 -294 -249 307(N) -21 SS -482 16S -1022 30 -1990 1446 424.00% 394 45 96 359 -369 -294 -249 308(Q) -1989 2135 1528 -812 -868 -2019 -2566 - 1895 42500% 394 45 96 359 7 -369 -294 -249 309(A. -2103 -346 1657 -953 -261 -21.69 -1624 426.00% 394 45 96 359 -369 -294 -249 310(G) -3167 -2SSO -318S -1986 -2162 -2814 -1874 427.00% 394 45 96 359 -294 -249 311 (Y) -2267 10O2 1577 -1172 -800 -1730 2446 42800% 394 45 96 359 -369 -294 -249 312(P) 3598 -1697 -2362 -1566 -2803 -3832 -3277 429.00% 394 45 96 359 -369 -294 -249 313(K -1569 232 698 -786 -1358 -1637 -1317 394 45 96 359 -369 -294 -249 314(E) -1441 -527 -653 -1512 -1988 -1SOS 43100% 394 45 96 359 -369 -294 -249 315(T) -2004 -128 839 1365 -2017 -2592 -1915 432.00% 394 45 96 359 -369 -294 -249 316(M) -3673 -2938 -3257 -2944 326 -2014 1662 43300% 394 45 96 359 -369 -294 -249 317(H) 472 -69 433 -772 1411 -299 -2SO7 -1836 434.00% 394 45 96 359 -369 -294 -249 318(A -2110 837 -805 -138 -1006 -2092 -2699 -2O39 435.00% 394 45 96 359 7 -369 -294 -249 319(M) -2474 -1419 102O -1455 670 -411 831 535 436.00% 394 45 96 359 7 -369 -294 -249 320(R) 597 -382 2908 -1246 -1219 -2044 -2S79 -2061 437.00% 394 45 96 359 -369 -294 -249 321(R. -2020 -177 2045 95 -893 -1984 -2543 - 1892 438.00% 394 45 96 359 7 -369 -294 -249 322(N) - 1957 348 224 551 -821 -1910 -2497 -1828 439.00% 394 45 96 359 7 -369 -294 -249 323(E) -1963 682 -629 -781 -279 -2472 476 44000% 394 45 96 359 -369 -294 -249 324(N. 
-1827 1398 447 -748 -1713 -1779 -1595 44100% 394 45 96 359 -369 -294 -249 325(N) -2417 -657 -1430 -1361 -2727 -3303 -2540 442O0% 394 45 96 359 -369 -294 -249 326(H) -2961 1288 -1287 670 -3161 -3310 -26O1 443.00% 394 45 96 359 -369 -294 -249 US 2009/0163376 A1 Jun. 25, 2009 49

TABLE 9-continued 327 (Q) 1404 2021 -634 -785 -561 -2495 -1830 444.00% 394 45 96 359 -369 -294 -249 328(1) -3016 -1615 822 -2097 -560 -2157 -1805 445.00% 394 45 96 359 -369 -294 -249 329(E) -3440 -2O54 -2556 -2649 -2 -3234 -392O -3370 446.00% 394 45 96 359 -369 -294 -249 330(W) -2104 -1454 -1405 -1757 -1218 S462 838 447.00% 394 45 96 359 -369 -294 -249 331(K) -2030 1425 -709 -864 1337 -2323 -1722 448.00% 394 45 96 359 -369 -294 -249 332(V) -4575 -4456 -4653 -3990 3227 -4O65 -3597 44900% 394 45 96 359 -369 -294 -249 333(G) -4804 -5546 -S385 -4727 -5862 -4924 -5849 45000% 394 45 96 359 -369 -294 -249 334(E) -2297 -451 1346 -1191 -2412 -2952 -2288 451.00% 394 45 96 359 -369 -294 -249 335(K) -2200 1198 862 -1073 -2287 -2770 -2133 452.00% 394 45 96 359 -369 -294 -249 336(L) -4852 -4044 -4689 -4963 742 -3O82 -3239 45300% 394 45 96 359 -369 -294 -249 337(R) -4754 -3672 42.19 -4989 -5644 -4538 -4993 454.00% 394 45 96 359 -369 -294 -249 338(E) -1998 -117 -681 1129 -2044 -2.609 -1921 394 45 96 359 -369 -294 -249 339(M) -4742 -3963 -4548 -4689 -3 273 771 -3O38 -3137 456.00% 394 45 96 359 7 -369 -294 -249 340(M. 
-3334 -2559 641 -2SO3 -1 740 404 -1934 -1661 457.00% 394 45 96 359 7 -369 -294 -249 341(P) 3205 695 -1607 703 -1 -1700 -2613 -2121 458.00% 394 45 96 359 -369 -294 -249 342(W) -4206 -2971 -3OOS -3484 -3 396 -2818 5644 611 459.00% 394 45 96 359 -369 -294 -249 343(1) -4407 -4340 -4.559 -3974 -2 216 21SO -391.5 -3445 46000% 394 45 96 359 -369 -294 -249 344(A) -1822 37 -536 966 724 -1859 -2439 -1755 46100% 394 45 96 359 7 -369 -294 -249 345(A) -1922 1335 -611 251 768 -123 -2361 -1732 462.00% 394 45 96 359 7 -369 -294 -249 346(N) 1624 -733 -1607 -1438 -1 645 -2887 -34.66 -2662 46300% 394 45 96 359 7 -369 -294 -249 347(K) -2774 -488 1175 -1994 -1 888 -2848 -2822 -2610 464.00% 394 45 96 359 -369 -294 -249 348(L) -2134 -805 485 -1085 471 -1340 -944 465.00% 394 45 96 359 -369 -294 -249 349(V) -2830 -2012 -2236 - 1892 2537 -1701 -1344 466.00% 394 45 96 359 -369 -294 -249 350(D) -2733 -1154 -2294 -1833 25 575 -3993 -3117 467.00% 394 45 96 359 -369 -294 -249 351(K) 531 720 -562 -724 290 -1929 -2493 -1812 46800% 394 45 96 359 -369 -294 -249 US 2009/0163376 A1 Jun. 25, 2009 50

TABLE 9-continued 352(D) 34 -182 -708 541 -74 -281 -2188 -16OS 46900% 394 45 96 359 117 -369 -294 -249

353(K) -2548 -407 1469 -1628 - 1517 -119 -2497 1619 47000% 394 45 96 359 117 -369 -294 -249

354(N) 1327 - 1460 -2119 1182 - 1362 -2558 -36O1 -3O46 47100% : : : : : : : :

HMMER2.0 [2.2g]
File format version: a unique identifier for this save file format.

NAME Functionally Verified KARIs
Name of the profile HMM.

LENG 354
Model length: the number of match states in the model.

ALPH Amino
Symbol alphabet: this determines the symbol alphabet and the size of the symbol emission probability distributions. If Amino, the alphabet size is set to 20 and the symbol alphabet to ACDEFGHIKLMNPQRSTVWY (alphabetic order).

MAP yes
Map annotation flag: if set to yes, each line of data for the match state/consensus column in the main section of the file is followed by an extra number. This number gives the index of the alignment column that the match state was made from. This information provides a "map" of the match states (1..M) onto the columns of the alignment (1..alen). It is used for quickly aligning the model back to the original alignment, e.g. when using hmmalign --mapali.

COM hmmbuild -n Functionally Verified KARIs exp-KARI.hmm exp-KARI_mod.aln
Command line for every HMMER command that modifies the save file: this one means that hmmbuild (default parameters) was applied to generate the save file.

COM hmmcalibrate exp-KARI.hmm
Command line for every HMMER command that modifies the save file: this one means that hmmcalibrate (default parameters) was applied to the save profile.

NSEQ 25
Sequence number: the number of sequences the HMM was trained on.

DATE Mon Dec 8 17:34:51 2008
Creation date: when the save file was generated.

XT -8455 -4 -1000 -1000 -8455 -4 -8455 -4
Eight special transitions for controlling parts of the algorithm-specific parts of the Plan7 model. The null probability used to convert these back to model probabilities is 1.0. The order of the eight fields is N->B, N->N, E->C, E->J, C->T, C->C, J->B, J->J.

NULT -4 -8455
The transition probability distribution for the null model (single G state).

NULE 595 -1558 85 338 -294 453 -1158 197 249 902 -1085 -142 -21 -313 45 531 201 384 -1998 -644

EVD -333.712708 0.110102
The extreme value distribution parameters mu and lambda respectively; both floating point values. These values are set when the model is calibrated with hmmcalibrate. They are used to determine E-values of bit scores.
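The header fields above can be read numerically with a short sketch. It assumes the usual HMMER2 convention that integer scores are log2 odds scaled by 1000, and it uses the Gumbel (extreme value) tail to turn a bit score into a P-value and an E-value. The function names and the `n_targets` parameter are illustrative, not part of HMMER.

```python
import math

INTSCALE = 1000  # assumed HMMER2 scaling: score = 1000 * log2(p / null)

def score_to_prob(score, null_prob=1.0):
    """Convert an integer save-file score back to a probability."""
    return null_prob * 2.0 ** (score / INTSCALE)

# XT line of the profile, in the order N->B, N->N, E->C, E->J, C->T, C->C, J->B, J->J
XT = [-8455, -4, -1000, -1000, -8455, -4, -8455, -4]
xt_probs = [score_to_prob(s) for s in XT]  # null probability is 1.0 for XT

def evd_pvalue(bit_score, mu, lam):
    """P(S >= bit_score) under an extreme value distribution (mu, lambda)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x
    return -math.expm1(-math.exp(-lam * (bit_score - mu)))

def evd_evalue(bit_score, mu, lam, n_targets):
    """Expected number of hits at or above bit_score in n_targets sequences."""
    return n_targets * evd_pvalue(bit_score, mu, lam)

# EVD line of the profile
MU, LAMBDA = -333.712708, 0.110102
```

For example, `xt_probs[0] + xt_probs[1]` is close to 1, as expected for the two transitions leaving the N state, and `evd_evalue(score, MU, LAMBDA, n_targets)` shrinks as the bit score grows.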

SEQUENCE LISTING

NUMBER OF SEQ ID NOS: 70

<210> SEQ ID NO 1
<211> LENGTH: 30
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer
<400> SEQUENCE: 1
tgatgaacat cttcgcgtat tcgccgtcct 30

<210> SEQ ID NO 2
<211> LENGTH: 68
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: forward primer for library C
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(25)
<223> OTHER INFORMATION: n is a, c, g, or t
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (33)..(34)
<223> OTHER INFORMATION: n is a, c, g, or t
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (39)..(40)
<223> OTHER INFORMATION: n is a, c, g, or t
<400> SEQUENCE: 2
gcgtagacgt gactgttggc ctgnntaaag gcnnggctnn ctgggccaag gctgaagccc 60
acggcttg 68

<210> SEQ ID NO 3
<211> LENGTH: 68
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: forward primer for library E
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(25)
<223> OTHER INFORMATION: n is a, c, g, or t
<400> SEQUENCE: 3
gcgtagacgt gactgttggc ctgnntaaag gctcggctac cgttgccaag gctgaagccc 60
acggcttg 68

<210> SEQ ID NO 4
<211> LENGTH: 68
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: forward primer for library F
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (33)..(34)
<223> OTHER INFORMATION: n is a, c, g, or t
<400> SEQUENCE: 4
gcgtagacgt gactgttggc ctgcgtaaag gcnntgctac cgttgccaag gctgaagccc 60
acggcttg 68
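As a rough illustration of what the "n is a, c, g, or t" annotation implies for library size, the sketch below takes the library F primer (SEQ ID NO 4, transcribed without the scan's spacing artifacts), locates the degenerate positions, and enumerates the resulting variants. Treating positions 33 to 35 as a single codon is an assumption made here for illustration; the listing itself does not state the reading frame. All function names are hypothetical.

```python
from itertools import product

# SEQ ID NO 4, library F forward primer (68 nt), written in blocks of ten
PRIMER = ("gcgtagacgt" "gactgttggc" "ctgcgtaaag" "gcnntgctac"
          "cgttgccaag" "gctgaagccc" "acggcttg")

def n_positions(seq):
    """1-based positions of degenerate 'n' bases, as in the LOCATION field."""
    return [i + 1 for i, base in enumerate(seq) if base == "n"]

def expand(seq):
    """All concrete sequences obtained by replacing each n with a, c, g, t."""
    slots = ["acgt" if base == "n" else base for base in seq]
    return ["".join(p) for p in product(*slots)]

# Standard genetic code, bases ordered t, c, a, g
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

positions = n_positions(PRIMER)
variants = expand(PRIMER)

# Assumption: bases 33-35 (n, n, t) form one codon in the coding frame
codon_start = positions[0] - 1
amino_acids = {CODON_TABLE[v[codon_start:codon_start + 3]] for v in variants}

print(positions, len(variants), "".join(sorted(amino_acids)))
```

Under that reading-frame assumption, two randomized bases followed by a fixed t give 16 DNA variants and no stop codons, with serine encoded twice.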

<210> SEQ ID NO 5
<211> LENGTH: 68
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: forward primer for library G
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (39)..(40)
<223> OTHER INFORMATION: n is a, c, g, or t
<400> SEQUENCE: 5
gcgtagacgt gactgttggc ctgcgtaaag gctcggctnn tgttgccaag gctgaagccc 60
acggcttg 68

<210> SEQ ID NO 6
<211> LENGTH: 68
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: forward primer for library H
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (24)..(25)
<223> OTHER INFORMATION: n is a, c, g, or t
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (33)..(34)
<223> OTHER INFORMATION: n is a, c, g, or t
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (39)..(40)
<223> OTHER INFORMATION: n is a, c, g, or t
<400> SEQUENCE: 6
gcgtagacgt gactgttggc ctgnntaaag gcnntgctnn tgttgccaag gctgaagccc 60
acggcttg 68

<210> SEQ ID NO 7
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: sequencing primer (forward)
<400> SEQUENCE: 7
aagattagcg gatcctacct 20

<210> SEQ ID NO 8
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: sequencing primer (reverse)
<400> SEQUENCE: 8
aacagccaag cttttagttc 20

<210> SEQ ID NO 9
<211> LENGTH: 330
<212> TYPE: PRT
<213> ORGANISM: Methanococcus aeolicus

<4 OO SEQUENCE: 9 Met Llys Val Phe Tyr Asp Ser Asp Phe Llys Lieu. Asp Ala Lieu Lys Glu 1. 5 1O 15 Llys Thir Ile Ala Val Ile Gly Tyr Gly Ser Glin Gly Arg Ala Glin Ser 2O 25 3 O Lieu. Asn Met Lys Asp Ser Gly Lieu. Asn Val Val Val Gly Lieu. Arg Llys 35 4 O 45 Asn Gly Ala Ser Trp Glu Asn Ala Lys Ala Asp Gly. His Asn. Wal Met SO 55 6 O Thir Ile Glu Glu Ala Ala Glu Lys Ala Asp Ile Ile His Ile Lieu. Ile 65 70 7s 8O Pro Asp Glu Lieu. Glin Ala Glu Val Tyr Glu Ser Glin Ile Llys Pro Tyr 85 9 O 95 Lieu Lys Glu Gly Lys Thr Lieu Ser Phe Ser His Gly Phe Asin Ile His 1 OO 105 11O Tyr Gly Phe Ile Val Pro Pro Lys Gly Val Asn Val Val Lieu Val Ala 115 12O 125 Pro Llys Ser Pro Gly Lys Met Val Arg Arg Thr Tyr Glu Glu Gly Phe 13 O 135 14 O Gly Val Pro Gly Lieu. Ile Cys Ile Glu Ile Asp Ala Thr Asn. Asn Ala 145 150 155 160 US 2009/0163376 A1 Jun. 25, 2009 53


Phe Asp Ile Val Ser Ala Met Ala Lys Gly Ile Gly Lieu. Ser Arg Ala 1.65 17O 17s Gly Val Ile Glin Thr Thr Phe Lys Glu Glu Thr Glu Thr Asp Leu Phe 18O 185 190 Gly Glu Glin Ala Val Lieu. Cys Gly Gly Val Thr Glu Lieu. Ile Lys Ala 195 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu Thir Cys His Glu Lieu Lys Lieu. Ile Val Asp Lieu. Ile Tyr Glin 225 23 O 235 24 O Lys Gly Phe Lys Asn Met Trp Asin Asp Wal Ser Asn. Thir Ala Glu Tyr 245 250 255 Gly Gly Lieu. Thir Arg Arg Ser Arg Ile Val Thr Ala Asp Ser Lys Ala 26 O 265 27 O Ala Met Lys Glu Ile Lieu Lys Glu Ile Glin Asp Gly Arg Phe Thir Lys 27s 28O 285 Glu Phe Val Lieu. Glu Lys Glin Val Asn His Ala His Lieu Lys Ala Met 290 295 3 OO Arg Arg Ile Glu Gly Asp Lieu. Glin Ile Glu Glu Val Gly Ala Lys Lieu 3. OS 310 315 32O Arg Llys Met Cys Gly Lieu. Glu Lys Glu Glu 3.25 330
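For downstream comparison of the listed KARI sequences it is often convenient to collapse the three-letter residue codes into a one-letter string. The following is a minimal sketch, assuming the residue spellings have already been normalized to the standard three-letter codes; the position numbers printed under the residues are skipped. The example uses residues 1 to 16 of SEQ ID NO 9 with the scan's spellings normalized, and the helper name is hypothetical.

```python
# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(listing_text):
    """Convert three-letter residues to one-letter codes, ignoring numbers."""
    residues = [tok for tok in listing_text.split() if tok.isalpha()]
    # A KeyError here flags a token that is not a standard residue code
    return "".join(THREE_TO_ONE[tok] for tok in residues)

# Residues 1-16 of SEQ ID NO 9, followed by its position markers
first_row = ("Met Lys Val Phe Tyr Asp Ser Asp Phe Lys Leu Asp Ala Leu Lys Glu "
             "1 5 10 15")
one_letter = to_one_letter(first_row)
print(one_letter)
```

Raising on unknown tokens rather than guessing is deliberate: it makes residual extraction errors visible instead of silently propagating them into an alignment.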

<210 SEQ ID NO 10 <211 LENGTH: 330 &212> TYPE: PRT <213> ORGANISM: Methanococcus maripaludis <4 OO SEQUENCE: 10 Met Llys Val Phe Tyr Asp Ser Asp Phe Llys Lieu. Asp Ala Lieu Lys Glu 1. 5 1O 15 Llys Thir Ile Ala Val Ile Gly Tyr Gly Ser Glin Gly Arg Ala Glin Ser 2O 25 3 O Lieu. Asn Met Lys Asp Ser Gly Lieu. Asn Val Val Val Gly Lieu. Arg Llys 35 4 O 45 Asn Gly Ala Ser Trp Asn. Asn Ala Lys Ala Asp Gly. His Asn. Wal Met SO 55 6 O Thir Ile Glu Glu Ala Ala Glu Lys Ala Asp Ile Ile His Ile Lieu. Ile 65 70 7s 8O Pro Asp Glu Lieu. Glin Ala Glu Val Tyr Glu Ser Glin Ile Llys Pro Tyr 85 9 O 95 Lieu Lys Glu Gly Lys Thr Lieu Ser Phe Ser His Gly Phe Asin Ile His 1 OO OS 11O Tyr Gly Phe Ile Val Pro Pro Lys Gly Val Asn Val Val Lieu Val Ala 115 2O 125 Pro Llys Ser Pro Gly Lys Met Val Arg Arg Thr Tyr Glu Glu Gly Phe 13 O 35 14 O Gly Val Pro Gly Lieu. Ile Cys Ile Glu Ile Asp Ala Thr Asn. Asn Ala 145 SO 155 160 Phe Asp Ile Val Ser Ala Met Ala Lys Gly Ile Gly Lieu. Ser Arg Ala 1.65 70 17s Gly Val Ile Glin Thr Thr Phe Lys Glu Glu Thr Glu Thr Asp Leu Phe US 2009/0163376 A1 Jun. 25, 2009 54


18O 185 190 Gly Glu Glin Ala Val Lieu. Cys Gly Gly Val Thr Glu Lieu. Ile Lys Ala 195 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu Thir Cys His Glu Lieu Lys Lieu. Ile Val Asp Lieu. Ile Tyr Glin 225 23 O 235 24 O Lys Gly Phe Lys Asn Met Trp Asin Asp Wal Ser Asn. Thir Ala Glu Tyr 245 250 255 Gly Gly Lieu. Thir Arg Arg Ser Arg Ile Val Thr Ala Asp Ser Lys Ala 26 O 265 27 O Ala Met Lys Glu Ile Lieu. Arg Glu Ile Glin Asp Gly Arg Phe Thir Lys 27s 28O 285 Glu Phe Lieu. Lieu. Glu Lys Glin Val Ser Tyr Ala His Lieu Lys Ser Met 290 295 3 OO Arg Arg Lieu. Glu Gly Asp Lieu. Glin Ile Glu Glu Val Gly Ala Lys Lieu 3. OS 310 315 32O Arg Llys Met Cys Gly Lieu. Glu Lys Glu Glu 3.25 330

<210> SEQ ID NO 11
<211> LENGTH: 330
<212> TYPE: PRT
<213> ORGANISM: Methanococcus vannielii

<4 OO SEQUENCE: 11 Met Llys Val Phe Tyr Asp Ala Asp Ile Llys Lieu. Asp Ala Lieu Lys Ser 1. 5 1O 15 Llys Thir Ile Ala Val Ile Gly Tyr Gly Ser Glin Gly Arg Ala Glin Ser 2O 25 3 O Lieu. Asn Met Lys Asp Ser Gly Lieu. Asn Val Val Val Gly Lieu. Arg Llys 35 4 O 45 Asn Gly Ala Ser Trp Glu Asn Ala Lys Asn Asp Gly. His Glu Val Lieu SO 55 6 O Thir Ile Glu Glu Ala Ser Llys Lys Ala Asp Ile Ile His Ile Lieu. Ile 65 70 7s 8O Pro Asp Glu Lieu. Glin Ala Glu Val Tyr Glu Ser Glin Ile Llys Pro Tyr 85 9 O 95 Lieu. Thr Glu Gly Lys Thr Lieu Ser Phe Ser His Gly Phe Asin Ile His OO OS 1O yr Gly Phe Ile Ile Pro Pro Lys Gly Val Asn Val Val Lieu Val Ala 15 2O 25 Pro Llys Ser Pro Gly Lys Met Val Arg Llys Thr Tyr Glu Glu Gly Phe 3O 35 4 O Gly Val Pro Gly Lieu. Ile Cys Ile Glu Val Asp Ala Thr Asn. Thir Ala 45 SO 55 160 Phe Glu Thr Val Ser Ala Met Ala Lys Gly Ile Gly Lieu. Ser Arg Ala 65 70 7s Gly Val Ile Glin Thr Thr Phe Arg Glu Glu Thr Glu Thr Asp Leu Phe 8O 85 90 Gly Glu Glin Ala Val Lieu. Cys Gly Gly Val Thr Glu Lieu. Ile Lys Ala 95 2 OO 2O5 US 2009/0163376 A1 Jun. 25, 2009 55

- Continued Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ser Pro Glu Met Ala Tyr 210 215 22O Phe Glu Thir Cys His Glu Lieu Lys Lieu. Ile Val Asp Lieu. Ile Tyr Glin 225 23 O 235 24 O Lys Gly Phe Lys Asn Met Trp His Asp Val Ser Asn Thr Ala Glu Tyr 245 250 255 Gly Gly Lieu. Thir Arg Arg Ser Arg Ile Val Thr Ala Asp Ser Lys Ala 26 O 265 27 O Ala Met Lys Glu Ile Lieu Lys Glu Ile Glin Asp Gly Arg Phe Thir Lys 27s 28O 285 Glu Phe Val Lieu. Glu Asn Glin Ala Lys Met Ala His Lieu Lys Ala Met 290 295 3 OO Arg Arg Lieu. Glu Gly Glu Lieu. Glin Ile Glu Glu Val Gly Ser Lys Lieu 3. OS 310 315 32O Arg Llys Met Cys Gly Lieu. Glu Lys Asp Glu 3.25 330

<210 SEQ ID NO 12 <211 LENGTH: 349 &212> TYPE: PRT <213> ORGANISM: Saccharomyces cerevisiae <4 OO SEQUENCE: 12 Met Leu Lys Glin Ile Asn Phe Gly Gly Thr Val Glu Thr Val Tyr Glu 1. 5 1O 15 Arg Ala Asp Trp Pro Arg Glu Lys Lieu. Lieu. Asp Tyr Phe Lys Asn Asp 2O 25 3 O Thr Phe Ala Lieu. Ile Gly Tyr Gly Ser Glin Gly Tyr Gly Glin Gly Lieu. 35 4 O 45 Asn Lieu. Arg Asp Asn Gly Lieu. Asn Val Ile Ile Gly Val Arg Lys Asp SO 55 6 O Gly Ala Ser Trip Lys Ala Ala Ile Glu Asp Gly Trp Val Pro Gly Lys 65 70 7s 8O Asn Lieu. Phe Thr Val Glu Asp Ala Ile Lys Arg Gly Ser Tyr Val Met 85 9 O 95 Asn Lieu. Lieu. Ser Asp Ala Ala Glin Ser Glu Thir Trp Pro Ala Ile Llys OO OS 1O Pro Leu Lieu. Thir Lys Gly Lys Thr Lieu. Tyr Phe Ser His Gly Phe Ser 15 2O 25 Pro Val Phe Lys Asp Lieu. Thir His Val Glu Pro Pro Lys Asp Lieu. Asp 3O 35 4 O Val Ile Lieu Val Ala Pro Llys Gly Ser Gly Arg Thr Val Arg Ser Lieu 45 SO 55 160 Phe Lys Glu Gly Arg Gly Ile Asin Ser Ser Tyr Ala Val Trp Asn Asp 65 70 7s Val Thr Gly Lys Ala His Glu Lys Ala Glin Ala Lieu Ala Val Ala Ile 8O 85 90 Gly Ser Gly Tyr Val Tyr Glin Thr Thr Phe Glu Arg Glu Val Asin Ser 95 2 OO 2O5 Asp Lieu. Tyr Gly Glu Arg Gly Cys Lieu Met Gly Gly Ile His Gly Met 210 215 22O Phe Lieu Ala Glin Tyr Asp Val Lieu. Arg Glu Asn Gly His Ser Pro Ser 225 23 O 235 24 O US 2009/0163376 A1 Jun. 25, 2009 56


Glu Ala Phe Asin Glu Thr Val Glu Glu Ala Thr Glin Ser Lieu. Tyr Pro 245 250 255 Lieu. Ile Gly Lys Tyr Gly Met Asp Tyr Met Tyr Asp Ala Cys Ser Thr 26 O 265 27 O Thir Ala Arg Arg Gly Ala Lieu. Asp Trp Tyr Pro Ile Phe Lys Asn Ala 27s 28O 285 Lieu Lys Pro Val Phe Glin Asp Leu Tyr Glu Ser Thr Lys Asn Gly Thr 290 295 3 OO Glu Thir Lys Arg Ser Lieu. Glu Phe Asn. Ser Glin Pro Asp Tyr Arg Glu 3. OS 310 315 32O Llys Lieu. Glu Lys Glu Lieu. Asp Thir Ile Arg Asn Met Glu Ile Trp Llys 3.25 330 335 Val Gly Lys Glu Val Arg Llys Lieu. Arg Pro Glu Asn Glin 34 O 345

<210> SEQ ID NO 13
<211> LENGTH: 335
<212> TYPE: PRT
<213> ORGANISM: Sulfolobus solfataricus

<4 OO SEQUENCE: 13 Met Lys Cys Thir Ser Lys Ile Tyr Thr Asp Asn Asp Ala Asn Lieu. Asp 1. 5 1O 15 Lieu. Ile Lys Gly Lys Arg Ile Ala Val Lieu. Gly Tyr Gly Ser Glin Gly 2O 25 3 O Arg Ala Trp Ala Glin Asn Lieu. Arg Asp Ser Gly Lieu. Asn Val Val Val 35 4 O 45 Gly Lieu. Glu Arg Glu Gly Llys Ser Trp Glu Lieu Ala Lys Ser Asp Gly SO 55 6 O le Thr Pro Lieu. His Thr Lys Asp Ala Wall Lys Asp Ala Asp Ile Ile 65 70 7s 8O le Phe Leu Val Pro Asp Met Val Glin Arg Thr Lieu. Trp Lieu. Glu Ser 85 9 O 95 Val Glin Pro Tyr Met Lys Lys Gly Ala Asp Lieu Val Phe Ala His Gly OO OS 1O Phe Asn. Ile His Tyr Llys Lieu. Ile Asp Pro Pro Lys Asp Ser Asp Wall 15 2O 25 yr Met Ile Ala Pro Lys Gly Pro Gly Pro Thr Val Arg Glu Tyr Tyr 3O 35 4 O Lys Ala Gly Gly Gly Val Pro Ala Lieu Val Ala Val His Glin Asp Wall 45 SO 55 160 Ser Gly Thr Ala Lieu. His Lys Ala Lieu Ala Ile Ala Lys Gly Ile Gly 65 70 7s Ala Thr Arg Ala Gly Val Ile Pro Thr Thr Phe Lys Glu Glu. Thr Glu 8O 85 90 Thr Asp Leu Phe Gly Glu Glin Val Ile Leu Val Gly Gly Ile Met Glu 95 2 OO 2O5 Lieu Met Arg Ala Ala Phe Glu Thir Lieu Val Glu Glu Gly Tyr Glin Pro 210 215 22O Glu Val Ala Tyr Phe Glu Thir Ile Asn. Glu Lieu Lys Met Lieu Val Asp 225 23 O 235 24 O Lieu Val Tyr Glu Lys Gly Ile Ser Gly Met Lieu Lys Ala Val Ser Asp US 2009/0163376 A1 Jun. 25, 2009 57


245 250 255 Thr Ala Lys Tyr Gly Gly Met Thr Val Gly Llys Phe Val Ile Asp Glu 26 O 265 27 O Ser Val Arg Lys Arg Met Lys Glu Ala Lieu. Glin Arg Ile Llys Ser Gly 27s 28O 285 Llys Phe Ala Glu Glu Trp Val Glu Glu Tyr Gly Arg Gly Met Pro Thr 290 295 3 OO Val Val Asin Gly Lieu. Ser Asn Val Glin Asn. Ser Lieu. Glu Glu Lys Ile 3. OS 310 315 32O Gly Asn Glin Lieu. Arg Asp Lieu Val Glin Lys Gly Llys Pro Llys Ser 3.25 330 335

<210> SEQ ID NO 14
<211> LENGTH: 328
<212> TYPE: PRT
<213> ORGANISM: Pyrobaculum aerophilum

<4 OO SEQUENCE: 14 Met Ala Lys Ile Tyr Thr Asp Arg Glu Ala Ser Lieu. Glu Pro Lieu Lys 1. 5 1O 15 Gly Lys Thir Ile Ala Val Ile Gly Tyr Gly Ile Glin Gly Arg Ala Glin 2O 25 3 O Ala Lieu. Asn Lieu. Arg Asp Ser Gly Lieu. Glu Val Ile Ile Gly Lieu. Arg 35 4 O 45 Arg Gly Gly Lys Ser Trp Glu Lieu Ala Thir Ser Glu Gly Phe Arg Val SO 55 6 O Tyr Glu Ile Gly Glu Ala Val Arg Lys Ala Asp Val Ile Lieu Val Lieu. 65 70 7s 8O le Pro Asp Met Glu Gln Pro Llys Val Trp Gln Glu Glin Ile Ala Pro 85 9 O 95 Asn Lieu Lys Glu Gly Val Val Val Asp Phe Ala His Gly Phe Asin Val OO OS O His Phe Gly Lieu. Ile Llys Pro Pro Lys Asn e Asp Val Ile Met Val 15 2O 25 Ala Pro Lys Ala Pro Gly Lys Ala Val Arg Glu Glu Tyr Lieu Ala Gly 3O 35 4 O Arg Gly Val Pro Ala Lieu Val Ala Val Tyr Glin Asp Tyr Ser Gly Ser 45 SO 55 160 Ala Lieu Lys Tyr Ala Lieu Ala Lieu Ala Lys Gly Ile Gly Ala Thr Arg 65 70 7s Ala Gly Val Ile Glu Thir Thr Phe Ala Glu Glu Thr Glu Thir Asp Leu 8O 85 90 e Gly Glu Glin Ile Val Lieu Val Gly Gly Lieu Met Glu Lieu. Ile Llys 95 2 OO 2O5 Lys Gly Phe Glu Val Lieu Val Glu Met Gly Tyr Gln Pro Glu Val Ala 210 215 22O Tyr Phe Glu Val Lieu. Asn. Glu Ala Lys Lieu. Ile Met Asp Lieu. Ile Trp 225 23 O 235 24 O Glin Arg Gly Ile Tyr Gly Met Lieu. Asn Gly Val Ser Asp Thir Ala Lys 245 250 255 Tyr Gly Gly Lieu. Thr Val Gly Pro Arg Val Ile Asp Glu Asn. Wall Lys 26 O 265 27 O US 2009/0163376 A1 Jun. 25, 2009 58

- Continued Arg Llys Met Lys Glu Ala Ala Met Arg Val Lys Ser Gly Glu Phe Ala 27s 28O 285 Lys Glu Trp Val Glu Glu Tyr Asn Arg Gly Ala Pro Thr Lieu. Arg Llys 290 295 3 OO Lieu Met Glu Glu Ala Arg Thr His Pro Ile Glu, Llys Val Gly Glu Glu 3. OS 310 315 32O Met Arg Llys Lieu Lleu Phe Gly Pro 3.25

<210> SEQ ID NO 15
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Ralstonia solanacearum

<4 OO SEQUENCE: 15 Met Llys Val Phe Tyr Asp Lys Asp Ala Asp Lieu. Ser Lieu. Ile Lys Gly 1. 5 1O 15 Lys Asn Val Thir Ile Ile Gly Tyr Gly Ser Glin Gly His Ala His Ala 2O 25 3 O Lieu. Asn Lieu. Asn Asp Ser Gly Val Llys Val Thr Val Gly Lieu. Arg Llys 35 4 O 45 Asn Gly Ala Ser Trp Asn Lys Ala Val Asn Ala Gly Lieu. Glin Val Lys SO 55 6 O Glu Val Ala Glu Ala Val Lys Asp Ala Asp Val Val Met Ile Lieu. Lieu. 65 70 7s 8O Pro Asp Glu Glin Ile Ala Asp Val Tyr Lys Asn. Glu Val His Gly Asn 85 9 O 95 le Lys Glin Gly Ala Ala Lieu Ala Phe Ala His Gly Phe Asn. Wal His OO OS 1O Tyr Gly Ala Val Ile Pro Arg Ala Asp Lieu. Asp Val Ile Met Val Ala 15 2O 25 Pro Lys Ala Pro Gly His Thr Val Arg Gly Thr Tyr Ala Glin Gly Gly 3O 35 4 O Gly Val Pro His Lieu. Ile Ala Val His Glin Asp Llys Ser Gly Ser Ala 45 SO 55 160 Arg Asp Ile Ala Lieu. Ser Tyr Ala Thir Ala Asn Gly Gly Gly Arg Ala 65 70 7s Gly Ile Ile Glu Thr Asn Phe Arg Glu Glu Thr Glu Thr Asp Leu Phe 8O 85 90 Gly Glu Glin Ala Val Lieu. Cys Gly Gly. Thr Val Glu Lieu. Ile Lys Ala 95 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu. Ile Tyr Glu 225 23 O 235 24 O Gly Gly Ile Gly Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Arg Val Val Thr Ala Glu Thir Lys Glin 26 O 265 27 O Ala Met Lys Glin Cys Lieu. His Asp Ile Glin Thr Gly Glu Tyr Ala Lys 27s 28O 285 Ser Phe Lieu. Lieu. Glu Asn Lys Ala Gly Ala Pro Thr Lieu. Ile Ser Arg 290 295 3 OO US 2009/0163376 A1 Jun. 25, 2009 59


Arg Arg Lieu. Thir Ala Asp His Glin Ile Glu Glin Val Gly Ala Lys Lieu 3. OS 310 315 32O Arg Ala Met Met Pro Trp Ile Ala Lys Asn Llys Lieu Val Asp Glin Ser 3.25 330 335 Lys Asn

<210> SEQ ID NO 16
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas aeruginosa

<4 OO SEQUENCE: 16 Met Arg Val Phe Tyr Asp Lys Asp Cys Asp Lieu. Ser Ile Ile Glin Gly 1. 5 1O 15 Llys Llys Val Ala Ile Ile Gly Tyr Gly Ser Glin Gly His Ala His Ala 2O 25 3 O Cys Asn Lieu Lys Asp Ser Gly Val Asp Val Thr Val Gly Lieu. Arg Ser 35 4 O 45 Gly Ser Ala Thr Val Ala Lys Ala Glu Ala His Gly Lieu Lys Val Ala SO 55 6 O Asp Wall Lys Thr Ala Val Ala Ala Ala Asp Val Val Met Ile Lieu. Thr 65 70 7s 8O Pro Asp Glu Phe Glin Gly Arg Lieu. Tyr Lys Glu Glu. Ile Glu Pro ASn 85 9 O 95 Lieu Lys Lys Gly Ala Thr Lieu Ala Phe Ala His Gly Phe Ser Ile His OO OS 1O yr Asn Glin Val Val Pro Arg Ala Asp Lieu. Asp Wal Ile Met Ile Ala 15 2O 25 Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly 3O 35 4 O Gly Ile Pro Asp Lieu. Ile Ala Ile Tyr Glin Asp Ala Ser Gly Asn Ala 45 SO 55 160 Lys Asn. Wall Ala Lieu. Ser Tyr Ala Cys Gly Val Gly Gly Gly Arg Thr 65 70 7s Gly Ile Ile Glu Thir Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe 8O 85 90 Gly Glu Glin Ala Val Lieu. Cys Gly Gly Cys Val Glu Lieu Val Lys Ala 95 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu Met Tyr Glu 225 23 O 235 24 O Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Ala 26 O 265 27 O Ala Met Arg Asn Ala Lieu Lys Arg Ile Glin Asp Gly Glu Tyr Ala Lys 27s 28O 285 Met Phe Ile Thr Glu Gly Ala Ala Asn Tyr Pro Ser Met Thr Ala Tyr 290 295 3 OO Arg Arg Asn. Asn Ala Ala His Pro Ile Glu Glin Ile Gly Glu Lys Lieu 3. OS 310 315 32O US 2009/0163376 A1 Jun. 25, 2009 60


Arg Ala Met Met Pro Trp Ile Ala Ala Asn Lys Ile Val Asp Llys Ser 3.25 330 335

Lys Asn

<210> SEQ ID NO 17
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 17 Met Llys Val Phe Tyr Asp Lys Asp Cys Asp Lieu. Ser Ile Ile Glin Gly 1. 5 1O 15 Llys Llys Val Ala Ile Ile Gly Tyr Gly Ser Glin Gly His Ala Glin Ala 2O 25 3 O Cys Asn Lieu Lys Asp Ser Gly Val Asp Val Thr Val Gly Lieu. Arg Llys 35 4 O 45 Gly Ser Ala Thr Val Ala Lys Ala Glu Ala His Gly Lieu Lys Val Thr SO 55 6 O Asp Wall Ala Ala Ala Val Ala Gly Ala Asp Lieu Val Met Ile Lieu. Thr 65 70 7s 8O Pro Asp Glu Phe Glin Ser Glin Lieu. Tyr Lys Asn. Glu Ile Glu Pro Asn 85 9 O 95 le Llys Lys Gly Ala Thr Lieu. Ala Phe Ser His Gly Phe Ala Ile His OO OS 1O yr Asn Glin Val Val Pro Arg Ala Asp Lieu. Asp Wal Ile Met Ile Ala 15 2O 25 Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly 3O 35 4 O Gly Ile Pro Asp Lieu. Ile Ala Ile Tyr Glin Asp Ala Ser Gly Asn Ala 45 SO 55 160 Lys Asn. Wall Ala Lieu. Ser Tyr Ala Ala Gly Val Gly Gly Gly Arg Thr 65 70 7s Gly Ile Ile Glu Thir Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe 8O 85 90 Gly Glu Glin Ala Val Lieu. Cys Gly Gly. Thr Val Glu Lieu Val Lys Ala 95 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu Met Tyr Glu 225 23 O 235 24 O Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Glin 26 O 265 27 O Ala Met Arg Asn Ala Lieu Lys Arg Ile Glin Asp Gly Glu Tyr Ala Lys 27s 28O 285 Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys 290 295 3 OO Arg Arg Asn. Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Lieu 3. OS 310 315 32O Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala 3.25 330 335 US 2009/0163376 A1 Jun. 25, 2009 61


Lys Asn

<210> SEQ ID NO 18
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Spinacia oleracea

<4 OO SEQUENCE: 18 Met Arg Val Phe Tyr Asp Lys Asp Cys Asp Lieu. Ser Ile Ile Glin Gly 1. 5 1O 15 Llys Llys Val Ala Ile Ile Gly Tyr Gly Ser Glin Gly His Ala His Ala 2O 25 3 O Cys Asn Lieu Lys Asp Ser Gly Val Asp Val Thr Val Gly Lieu. Arg Ser 35 4 O 45 Gly Ser Ala Thr Val Ala Lys Ala Glu Ala His Gly Lieu Lys Val Ala SO 55 6 O Asp Wall Lys Thr Ala Val Ala Ala Ala Asp Val Val Met Ile Lieu. Thr 65 70 7s 8O Pro Asp Glu Phe Glin Gly Arg Lieu. Tyr Lys Glu Glu Ile Glu Pro Asn 85 9 O 95 Lieu Lys Lys Gly Ala Thr Lieu Ala Phe Ala His Gly Phe Ser Ile His OO OS 1O Tyr ASn Glin Val Val Pro Arg Ala Asp Lieu. Asp Val Ile Met Ile Ala 15 2O 25 Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly 3O 35 4 O Gly Ile Pro Asp Lieu. Ile Ala Ile Tyr Glin Asp Ala Ser Gly Asn Ala 45 SO 55 160 Lys Asn. Wall Ala Lieu. Ser Tyr Ala Cys Gly Val Gly Gly Gly Arg Thr 65 70 7s Gly Ile Ile Glu Thir Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe 8O 85 90 Gly Glu Glin Ala Val Lieu. Cys Gly Gly Cys Val Glu Lieu Val Lys Ala 95 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu Met Tyr Glu 225 23 O 235 24 O Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Ala 26 O 265 27 O Ala Met Arg Asn Ala Lieu Lys Arg Ile Glin Asp Gly Glu Tyr Ala Lys 27s 28O 285 Met Phe Ile Thr Glu Gly Ala Ala Asn Tyr Pro Ser Met Thr Ala Tyr 290 295 3 OO Arg Arg Asn. Asn Ala Ala His Pro Ile Glu Glin Ile Gly Glu Lys Lieu 3. OS 310 315 32O Arg Ala Met Met Pro Trp Ile Ala Ala Asn Lys Ile Val Asp Llys Ser 3.25 330 335

Lys Asn

<210> SEQ ID NO 19
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 19 Met Llys Val Phe Tyr Asp Lys Asp Cys Asp Lieu. Ser Ile Ile Glin Gly 1. 5 1O 15 Llys Llys Val Ala Ile Ile Gly Phe Gly Ser Glin Gly His Ala Glin Ala 2O 25 3 O Cys Asn Lieu Lys Asp Ser Gly Val Asp Val Thr Val Gly Lieu. Tyr Lys 35 4 O 45 Gly Ala Ala Asp Ala Ala Lys Ala Glu Ala His Gly Phe Llys Val Thr SO 55 6 O Asp Wall Ala Ala Ala Val Ala Gly Ala Asp Lieu Val Met Ile Lieu. Thr 65 70 7s 8O Pro Asp Glu Phe Glin Ser Glin Lieu. Tyr Lys Asn. Glu Ile Glu Pro Asn 85 9 O 95 le Llys Lys Gly Ala Thr Lieu Ala Phe Ser His Gly Phe Ala Ile His OO OS 1O yr Asn Glin Val Val Pro Arg Ala Asp Lieu. Asp Wal Ile Met Ile Ala 15 2O 25 Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly 3O 35 4 O Gly Ile Pro Asp Lieu. Ile Ala Ile Tyr Glin Asp Ala Ser Gly Asn Ala 45 SO 55 160 Lys Asn. Wall Ala Lieu. Ser Tyr Ala Ala Ala Val Gly Gly Gly Arg Thr 65 70 7s Gly Ile Ile Glu Thir Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe 8O 85 90 Gly Glu Glin Ala Val Lieu. Cys Gly Gly. Thr Val Glu Lieu Val Lys Ala 95 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu Met Tyr Glu 225 23 O 235 24 O Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Glin 26 O 265 27 O Ala Met Arg Asn Ala Lieu Lys Arg Ile Glin Asp Gly Glu Tyr Ala Lys 27s 28O 285 Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys 290 295 3 OO Arg Arg Asn. Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Lieu 3. OS 310 315 32O Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala 3.25 330 335 Lys Asn

<210> SEQ ID NO 20
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 2O Met Llys Val Phe Tyr Asp Lys Asp Cys Asp Lieu. Ser Ile Ile Glin Gly 1. 5 1O 15 Llys Llys Val Ala Ile Ile Gly Phe Gly Ser Glin Gly His Ala Glin Ala 2O 25 3 O Cys Asn Lieu Lys Asp Ser Gly Val Asp Val Thr Val Gly Lieu. Tyr Lys 35 4 O 45 Gly Ala Ala Asp Ala Ala Lys Ala Glu Ala His Gly Phe Llys Val Thr SO 55 6 O Asp Wall Ala Ala Ala Val Ala Gly Ala Asp Lieu Val Met Ile Lieu. Thr 65 70 7s 8O Pro Asp Glu Phe Glin Ser Glin Lieu. Tyr Lys Asn. Glu Ile Glu Pro Asn 85 9 O 95 le Llys Lys Gly Ala Thr Lieu Ala Phe Ser His Gly Phe Ala Ile His OO OS 1O Tyr Asn Glin Val Val Pro Arg Ala Asp Lieu. Asp Val Ile Met Ile Ala 15 2O 25 Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly 3O 35 4 O Gly Ile Pro Asp Lieu. Ile Ala Ile Tyr Glin Asp Ala Ser Gly Asn Ala 45 SO 55 160 Lys Asn. Wall Ala Lieu. Ser Tyr Ala Ala Ala Val Gly Gly Gly Arg Thr 65 70 7s Gly Ile Ile Glu Thir Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe 8O 85 90 Gly Glu Glin Ala Val Lieu. Cys Gly Gly. Thr Val Glu Lieu Val Lys Ala 95 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu Met Tyr Glu 225 23 O 235 24 O Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Glin 26 O 265 27 O Ala Met Arg Asn Ala Lieu Lys Arg Ile Glin Asp Gly Glu Tyr Ala Lys 27s 28O 285 Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys 290 295 3 OO Arg Arg Asn. Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Lieu 3. OS 310 315 32O Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala 3.25 330 335 Lys Asn

<210> SEQ ID NO 21
<211> LENGTH: 24
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer pBAD 266
<400> SEQUENCE: 21
ctctctactg tttctccata cccg 24

<210> SEQ ID NO 22
<211> LENGTH: 27
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer PF5-53Mt
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (26)..(27)
<223> OTHER INFORMATION: n is a, c, g, or t
<400> SEQUENCE: 22
caagccgtgg gcttcagcct tckinn 27

<210> SEQ ID NO 23
<211> LENGTH: 23
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: primer pBAD 866
<400> SEQUENCE: 23
cggtttcagt ctcgtccttg aag 23

<210 SEQ ID NO 24 <211 LENGTH: 338 &212> TYPE: PRT <213> ORGANISM: artificial sequence &220s FEATURE: <223> OTHER INFORMATION: synthetic construct mutant ilcV <4 OO SEQUENCE: 24 Met Llys Val Phe Tyr Asp Lys Asp Cys Asp Lieu. Ser Ile Ile Glin Gly 1. 5 1O 15 Llys Llys Val Ala Ile Ile Gly Phe Gly Ser Glin Gly His Ala Glin Ala 2O 25 3 O Cys Asn Lieu Lys Asp Ser Gly Val Asp Val Thr Val Gly Lieu. Tyr Lys 35 4 O 45 Gly Ala Ala Asp Ala Ala Lys Ala Glu Ala His Gly Phe Llys Val Thr SO 55 6 O Asp Wall Ala Ala Ala Val Ala Gly Ala Asp Lieu Val Met Ile Lieu. Thr 65 70 7s 8O Pro Asp Glu Phe Glin Ser Glin Lieu. Tyr Lys Asn. Glu Ile Glu Pro Asn 85 9 O 95 Ile Llys Lys Gly Ala Thr Lieu Ala Phe Ser His Gly Phe Ala Ile His 1 OO 105 11O Tyr Asn Glin Val Val Pro Arg Ala Asp Lieu. Asp Val Ile Met Ile Ala 115 12O 125 Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly 13 O 135 14 O Gly Ile Pro Asp Lieu. Ile Ala Ile Tyr Glin Asp Val Ser Gly Asn Ala 145 150 155 160 Lys Asn. Wall Ala Lieu. Ser Tyr Ala Ala Gly Val Gly Gly Gly Arg Thr 1.65 17O 17s US 2009/0163376 A1 Jun. 25, 2009 65


Gly Ile Ile Glu Thir Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe 18O 185 190 Gly Glu Glin Ala Val Lieu. Cys Gly Gly. Thr Val Glu Lieu Val Lys Ala 195 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr O 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu Met Tyr Glu 225 23 O 235 24 O Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Glin 26 O 265 27 O Ala Met Arg Asn Ala Lieu Lys Arg Ile Glin Asp Gly Glu Tyr Ala Lys 27s 28O 285 Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys 290 295 3 OO Arg Arg Asn. Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Lieu 3. OS 310 315 32O Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala 3.25 330 335

Lys Asn

<210 SEQ ID NO 25 <211 LENGTH: 338 &212> TYPE: PRT <213> ORGANISM: artificial sequence &220s FEATURE: <223> OTHER INFORMATION: synthetic construct mutant ilcV <4 OO SEQUENCE: 25 Met Llys Val Phe Tyr Asp Lys Asp Cys Asp Lieu. Ser Ile Ile Glin Gly 1. 5 1O 15 Llys Llys Val Ala Ile Ile Gly Phe Gly Ser Glin Gly His Ala Glin Ala 2O 25 3 O Lieu. Asn Lieu Lys Asp Ser Gly Val Asp Val Thr Val Gly Lieu. Tyr Lys 35 4 O 45 Gly Ala Ala Asp Ala Ala Lys Ala Glu Ala His Gly Phe Llys Val Thr SO 55 6 O Asp Wall Ala Ala Ala Val Ala Gly Ala Asp Lieu Val Met Ile Lieu. Thr 65 70 7s 8O Pro Asp Glu Phe Glin Ser Glin Lieu. Tyr Lys Asn. Glu Ile Glu Pro Asn 85 9 O 95 Ile Llys Lys Gly Ala Thr Lieu Ala Phe Ser His Gly Phe Ala Ile His 1 OO 105 11O Tyr Asn Glin Val Val Pro Arg Ala Asp Lieu. Asp Val Ile Met Ile Ala 115 12O 125 Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly 13 O 135 14 O Gly Ile Pro Asp Lieu. Ile Ala Ile Tyr Glin Asp Ala Ser Gly Asn Ala 145 150 155 160 Lys Asn. Wall Ala Lieu. Ser Tyr Ala Ala Gly Val Gly Gly Gly Arg Thr 1.65 17O 17s US 2009/0163376 A1 Jun. 25, 2009 66

- Continued Gly Ile Ile Glu Thir Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe 18O 185 190 Gly Glu Glin Ala Val Lieu. Cys Gly Gly. Thr Val Glu Lieu Val Lys Ala 195 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu Met Tyr Glu 225 23 O 235 24 O Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Glin 26 O 265 27 O Ala Met Arg Asn Ala Lieu Lys Arg Ile Glin Asp Gly Glu Tyr Ala Lys 27s 28O 285 Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys 290 295 3 OO Arg Arg Asn. Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Lieu 3. OS 310 315 32O Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala 3.25 330 335 Lys Asn

<210> SEQ ID NO 26
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: a KARI mutant

<4 OO SEQUENCE: 26 Met Llys Val Phe Tyr Asp Lys Asp Cys Asp Lieu. Ser Ile Ile Glin Gly 1. 5 1O 15 Llys Llys Val Ala Ile Ile Gly Phe Gly Ser Glin Gly His Ala Glin Ala 2O 25 3 O Lieu. Asn Lieu Lys Asp Ser Gly Val Asp Val Thr Val Gly Lieu. Tyr Lys 35 4 O 45 Gly Ala Ala Asp Ala Ala Lys Ala Glu Ala His Gly Phe Llys Val Thr SO 55 6 O Asp Wall Ala Ala Ala Val Ala Gly Ala Asp Lieu Val Met Ile Lieu. Thr 65 70 7s 8O Pro Asp Glu Phe Glin Ser Glin Lieu. Tyr Lys Asn. Glu Ile Glu Pro Asn 85 9 O 95 Ile Llys Lys Gly Ala Thr Lieu Ala Phe Ser His Gly Phe Ala Ile His 1 OO OS 11O Tyr Asn Glin Val Val Pro Arg Ala Asp Lieu. Asp Val Ile Met Ile Ala 115 2O 125 Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly 13 O 35 14 O Gly Ile Pro Asp Lieu. Ile Ala Ile Tyr Glin Asp Val Ser Gly Asn Ala 145 SO 155 160 Lys Asn. Wall Ala Lieu. Ser Tyr Ala Ala Gly Val Gly Gly Gly Arg Thr 1.65 70 17s Gly Ile Ile Glu Thir Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe US 2009/0163376 A1 Jun. 25, 2009 67

- Continued

18O 185 190 Gly Glu Glin Ala Val Lieu. Cys Gly Gly. Thr Val Glu Lieu Val Lys Ala 195 2 OO 2O5 Gly Phe Glu Thir Lieu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr 210 215 22O Phe Glu. Cys Lieu. His Glu Lieu Lys Lieu. Ile Val Asp Lieu Met Tyr Glu 225 23 O 235 24 O Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn. Asn Ala Glu Tyr 245 250 255 Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Glin 26 O 265 27 O Ala Met Arg Asn Ala Lieu Lys Arg Ile Glin Asp Gly Glu Tyr Ala Lys 27s 28O 285 Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys 290 295 3 OO Arg Arg Asn. Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Lieu 3. OS 310 315 32O Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala 3.25 330 335 Lys Asn

<210> SEQ ID NO 27
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic construct mutant ilcV
<400> SEQUENCE: 27

Met Lys Val Phe Tyr Asp Lys Asp Cys Asp Leu Ser Ile Ile Gln Gly
1 5 10 15
Lys Lys Val Ala Ile Ile Gly Phe Gly Ser Gln Gly His Ala Gln Ala
20 25 30
Leu Asn Leu Lys Asp Ser Gly Val Asp Val Thr Val Gly Leu Tyr Lys
35 40 45
Gly Ala Ala Asp Ala Ala Lys Ala Glu Ala His Gly Phe Lys Val Thr
50 55 60
Asp Val Ala Ala Ala Val Ala Gly Ala Asp Leu Val Met Ile Leu Thr
65 70 75 80
Pro Asp Glu Phe Gln Ser Gln Leu Tyr Lys Asn Glu Ile Glu Pro Asn
85 90 95
Ile Lys Lys Gly Ala Thr Leu Ala Phe Ser His Gly Phe Ala Ile His
100 105 110
Tyr Asn Gln Val Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Ile Tyr Gln Asp Ala Ser Gly Asn Ala
145 150 155 160
Lys Asn Val Ala Leu Ser Tyr Ala Ala Ala Val Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Thr Val Glu Leu Val Lys Ala
195 200 205
Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Tyr Glu
225 230 235 240
Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Asn Ala Glu Tyr
245 250 255
Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Gln
260 265 270
Ala Met Arg Asn Ala Leu Lys Arg Ile Gln Asp Gly Glu Tyr Ala Lys
275 280 285
Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys
290 295 300
Arg Arg Asn Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Leu
305 310 315 320
Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala
325 330 335
Lys Asn

<210> SEQ ID NO 28
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic construct mutant ilcV
<400> SEQUENCE: 28

Met Lys Val Phe Tyr Asp Lys Asp Cys Asp Leu Ser Ile Ile Gln Gly
1 5 10 15
Lys Lys Val Ala Ile Ile Gly Tyr Gly Ser Gln Gly His Ala Gln Ala
20 25 30
Leu Asn Leu Lys Asp Ser Gly Val Asp Val Thr Val Gly Leu Tyr Lys
35 40 45
Gly Ala Ala Asp Ala Ala Lys Ala Glu Ala His Gly Phe Lys Val Thr
50 55 60
Asp Val Ala Ala Ala Val Ala Gly Ala Asp Leu Val Met Ile Leu Ile
65 70 75 80
Pro Asp Glu Phe Gln Ser Gln Leu Tyr Lys Asn Glu Ile Glu Pro Asn
85 90 95
Ile Lys Lys Gly Ala Thr Leu Ala Phe Ser His Gly Phe Ala Ile His
100 105 110
Tyr Asn Gln Val Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Ile Tyr Gln Asp Val Ser Gly Asn Ala
145 150 155 160
Lys Asn Val Ala Leu Ser Tyr Ala Ala Ala Val Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Thr Val Glu Leu Val Lys Ala
195 200 205
Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Tyr Glu
225 230 235 240
Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Asn Ala Glu Tyr
245 250 255
Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Gln
260 265 270
Ala Met Arg Asn Ala Leu Lys Arg Ile Gln Asp Gly Glu Tyr Ala Lys
275 280 285
Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys
290 295 300
Arg Arg Asn Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Leu
305 310 315 320
Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala
325 330 335
Lys Asn

<210> SEQ ID NO 29
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: synthetic construct
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (24)..(24)
<223> OTHER INFORMATION: Xaa = Tyr or Phe
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (33)..(33)
<223> OTHER INFORMATION: Xaa = Cys or Leu
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (47)..(47)
<223> OTHER INFORMATION: Xaa = Arg or Tyr
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (50)..(50)
<223> OTHER INFORMATION: Xaa = Ser or Ala
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (52)..(52)
<223> OTHER INFORMATION: Xaa = Thr or Asp
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (53)..(53)
<223> OTHER INFORMATION: Xaa = Val or Ala
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (61)..(61)
<223> OTHER INFORMATION: Xaa = Leu or Phe
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (80)..(80)
<223> OTHER INFORMATION: Xaa = Thr or Ile
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (156)..(156)
<223> OTHER INFORMATION: Xaa = Ala or Val
<220> FEATURE:
<221> NAME/KEY: MISC_FEATURE
<222> LOCATION: (170)..(170)
<223> OTHER INFORMATION: Xaa = Gly or Ala
<400> SEQUENCE: 29

Met Lys Val Phe Tyr Asp Lys Asp Cys Asp Leu Ser Ile Ile Gln Gly
1 5 10 15
Lys Lys Val Ala Ile Ile Gly Xaa Gly Ser Gln Gly His Ala Gln Ala
20 25 30
Xaa Asn Leu Lys Asp Ser Gly Val Asp Val Thr Val Gly Leu Xaa Lys
35 40 45
Gly Xaa Ala Xaa Xaa Ala Lys Ala Glu Ala His Gly Xaa Lys Val Thr
50 55 60
Asp Val Ala Ala Ala Val Ala Gly Ala Asp Leu Val Met Ile Leu Xaa
65 70 75 80
Pro Asp Glu Phe Gln Ser Gln Leu Tyr Lys Asn Glu Ile Glu Pro Asn
85 90 95
Ile Lys Lys Gly Ala Thr Leu Ala Phe Ser His Gly Phe Ala Ile His
100 105 110
Tyr Asn Gln Val Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Ile Tyr Gln Asp Xaa Ser Gly Asn Ala
145 150 155 160
Lys Asn Val Ala Leu Ser Tyr Ala Ala Xaa Val Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Thr Val Glu Leu Val Lys Ala
195 200 205
Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Tyr Glu
225 230 235 240
Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Asn Ala Glu Tyr
245 250 255
Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Gln
260 265 270
Ala Met Arg Asn Ala Leu Lys Arg Ile Gln Asp Gly Glu Tyr Ala Lys
275 280 285
Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys
290 295 300
Arg Arg Asn Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Gln Leu
305 310 315 320
Arg Ser Met Met Pro Trp Ile Gly Ala Asn Lys Ile Val Asp Lys Ala
325 330 335
Lys Asn

<210> SEQ ID NO 30
<211> LENGTH: 331
<212> TYPE: PRT
<213> ORGANISM: Natronomonas pharaonis
<400> SEQUENCE: 30

Met Thr Asp Ala Thr Ile Tyr Tyr Asp Asp Asp Ala Glu Ser Thr Val
1 5 10 15
Leu Asp Asp Lys Thr Val Ala Val Leu Gly Tyr Gly Ser Gln Gly His
20 25 30
Ala His Ala Gln Asn Leu Asp Asp Ser Gly Val Asp Val Val Val Gly
35 40 45
Leu Arg Glu Asp Ser Ser Ser Arg Ser Ala Ala Glu Ala Asp Gly Leu
50 55 60
Asp Val Ala Thr Pro Arg Gly Ala Ala Glu Gln Ala Asp Leu Val Ser
65 70 75 80
Val Leu Val Pro Asp Thr Val Gln Pro Ala Val Tyr Glu Gln Ile Glu
85 90 95
Asp Val Leu Gln Pro Gly Asp Thr Leu Gln Phe Ala His Gly Phe Asn
100 105 110
Ile His Tyr Gly Gln Ile Glu Pro Ser Glu Asp Val Asn Val Thr Met
115 120 125
Val Ala Pro Lys Ser Pro Gly His Leu Val Arg Arg Asn Tyr Glu Asn
130 135 140
Asp Glu Gly Thr Pro Gly Leu Leu Ala Val Tyr Gln Asp Pro Ser Gly
145 150 155 160
Glu Ala His Asp Leu Gly Leu Ala Tyr Ala Lys Ala Ile Gly Cys Thr
165 170 175
Arg Ala Gly Val Val Glu Thr Thr Phe Arg Glu Glu Thr Glu Thr Asp
180 185 190
Leu Phe Gly Glu Gln Ala Val Leu Cys Gly Gly Val Thr Ser Leu Val
195 200 205
Lys Thr Gly Tyr Glu Thr Leu Val Asp Ala Gly Tyr Ser Pro Glu Met
210 215 220
Ala Tyr Phe Glu Cys Leu Asn Glu Leu Lys Leu Ile Val Asp Leu Met
225 230 235 240
Tyr Glu Gly Gly Asn Ser Glu Met Trp Asp Ser Val Ser Asp Thr Ala
245 250 255
Glu Tyr Gly Gly Leu Thr Arg Gly Asp Arg Ile Val Asp Asp His Ala
260 265 270
Arg Glu Lys Met Glu Glu Val Leu Glu Glu Val Gln Asn Gly Thr Phe
275 280 285
Ala Arg Glu Trp Ile Ser Glu Asn Gln Ala Gly Arg Pro Ser Tyr Lys
290 295 300
Gln Leu Arg Ala Ala Glu Lys Asn His Asp Ile Glu Ala Val Gly Glu
305 310 315 320
Asp Leu Arg Ala Leu Phe Ala Trp Gly Asp Asp
325 330

<210> SEQ ID NO 31
<211> LENGTH: 342
<212> TYPE: PRT
<213> ORGANISM: Bacillus subtilis
<400> SEQUENCE: 31

Met Val Lys Val Tyr Tyr Asn Gly Asp Ile Lys Glu Asn Val Leu Ala
1 5 10 15
Gly Lys Thr Val Ala Val Ile Gly Tyr Gly Ser Gln Gly His Ala His
20 25 30
Ala Leu Asn Leu Lys Glu Ser Gly Val Asp Val Ile Val Gly Val Arg
35 40 45
Gln Gly Lys Ser Phe Thr Gln Ala Gln Glu Asp Gly His Lys Val Phe
50 55 60
Ser Val Lys Glu Ala Ala Ala Gln Ala Glu Ile Ile Met Val Leu Leu
65 70 75 80
Pro Asp Glu Gln Gln Gln Lys Val Tyr Glu Ala Glu Ile Lys Asp Glu
85 90 95
Leu Thr Ala Gly Lys Ser Leu Val Phe Ala His Gly Phe Asn Val His
100 105 110
Phe His Gln Ile Val Pro Pro Ala Asp Val Asp Val Phe Leu Val Ala
115 120 125
Pro Lys Gly Pro Gly His Leu Val Arg Arg Thr Tyr Glu Gln Gly Ala
130 135 140
Gly Val Pro Ala Leu Phe Ala Ile Tyr Gln Asp Val Thr Gly Glu Ala
145 150 155 160
Arg Asp Lys Ala Leu Ala Tyr Ala Lys Gly Ile Gly Gly Ala Arg Ala
165 170 175
Gly Val Leu Glu Thr Thr Phe Lys Glu Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Leu Ser Ala Leu Val Lys Ala
195 200 205
Gly Phe Glu Thr Leu Thr Glu Ala Gly Tyr Gln Pro Glu Leu Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Tyr Glu
225 230 235 240
Glu Gly Leu Ala Gly Met Arg Tyr Ser Ile Ser Asp Thr Ala Gln Trp
245 250 255
Gly Asp Phe Val Ser Gly Pro Arg Val Val Asp Ala Lys Val Lys Glu
260 265 270
Ser Met Lys Glu Val Leu Lys Asp Ile Gln Asn Gly Thr Phe Ala Lys
275 280 285
Glu Trp Ile Val Glu Asn Gln Val Asn Arg Pro Arg Phe Asn Ala Ile
290 295 300
Asn Ala Ser Glu Asn Glu His Gln Ile Glu Val Val Gly Arg Lys Leu
305 310 315 320
Arg Glu Met Met Pro Phe Val Lys Gln Gly Lys Lys Lys Glu Ala Val
325 330 335
Val Ser Val Ala Gln Asn
340

<210> SEQ ID NO 32
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Corynebacterium glutamicum
<400> SEQUENCE: 32

Met Ala Ile Glu Leu Leu Tyr Asp Ala Asp Ala Asp Leu Ser Leu Ile
1 5 10 15
Gln Gly Arg Lys Val Ala Ile Val Gly Tyr Gly Ser Gln Gly His Ala
20 25 30
His Ser Gln Asn Leu Arg Asp Ser Gly Val Glu Val Val Ile Gly Leu
35 40 45
Arg Glu Gly Ser Lys Ser Ala Glu Lys Ala Lys Glu Ala Gly Phe Glu
50 55 60
Val Lys Thr Thr Ala Glu Ala Ala Ala Trp Ala Asp Val Ile Met Leu
65 70 75 80
Leu Ala Pro Asp Thr Ser Gln Ala Glu Ile Phe Thr Asn Asp Ile Glu
85 90 95
Pro Asn Leu Asn Ala Gly Asp Ala Leu Leu Phe Gly His Gly Leu Asn
100 105 110
Ile His Phe Asp Leu Ile Lys Pro Ala Asp Asp Ile Ile Val Gly Met
115 120 125
Val Ala Pro Lys Gly Pro Gly His Leu Val Arg Arg Gln Phe Val Asp
130 135 140
Gly Lys Gly Val Pro Cys Leu Ile Ala Val Asp Gln Asp Pro Thr Gly
145 150 155 160
Thr Ala Gln Ala Leu Thr Leu Ser Tyr Ala Ala Ala Ile Gly Gly Ala
165 170 175
Arg Ala Gly Val Ile Pro Thr Thr Phe Glu Ala Glu Thr Val Thr Asp
180 185 190
Leu Phe Gly Glu Gln Ala Val Leu Cys Gly Gly Thr Glu Glu Leu Val
195 200 205
Lys Val Gly Phe Glu Val Leu Thr Glu Ala Gly Tyr Glu Pro Glu Met
210 215 220
Ala Tyr Phe Glu Val Leu His Glu Leu Lys Leu Ile Val Asp Leu Met
225 230 235 240
Phe Glu Gly Gly Ile Ser Asn Met Asn Tyr Ser Val Ser Asp Thr Ala
245 250 255
Glu Phe Gly Gly Tyr Leu Ser Gly Pro Arg Val Ile Asp Ala Asp Thr
260 265 270
Lys Ser Arg Met Lys Asp Ile Leu Thr Asp Ile Gln Asp Gly Thr Phe
275 280 285
Thr Lys Arg Leu Ile Ala Asn Val Glu Asn Gly Asn Thr Glu Leu Glu
290 295 300
Gly Leu Arg Ala Ser Tyr Asn Asn His Pro Ile Glu Glu Thr Gly Ala
305 310 315 320
Lys Leu Arg Asp Leu Met Ser Trp Val Lys Val Asp Ala Arg Ala Glu
325 330 335
Thr Ala

<210> SEQ ID NO 33
<211> LENGTH: 339
<212> TYPE: PRT
<213> ORGANISM: Phaeospirillum molischianum
<220> FEATURE:
<221> NAME/KEY: misc_feature
<222> LOCATION: (310)..(310)
<223> OTHER INFORMATION: Xaa can be any naturally occurring amino acid
<400> SEQUENCE: 33

Met Arg Val Tyr Tyr Asp Arg Asp Ala Asp Val Asn Leu Ile Lys Ser
1 5 10 15
Lys Lys Val Ala Val Ile Gly Tyr Gly Ser Gln Gly His Ala His Val
20 25 30
Leu Asn Leu Arg Asp Ser Gly Val Lys Asp Val Ala Val Ala Leu Arg
35 40 45
Pro Gly Ser Ala Ser Ile Lys Lys Ala Glu Ala Glu Gly Leu Lys Val
50 55 60
Leu Thr Pro Ala Glu Ala Ala Ala Trp Ala Asp Val Val Met Ile Leu
65 70 75 80
Thr Pro Asp Glu Leu Gln Ala Asp Leu Tyr Lys Ser Glu Leu Ala Ala
85 90 95
Asn Leu Lys Pro Gly Ala Ala Leu Val Phe Ala His Gly Leu Ala Ile
100 105 110
His Phe Lys Leu Ile Glu Ala Arg Ala Asp Leu Asp Val Phe Met Val
115 120 125
Ala Pro Lys Gly Pro Gly His Thr Val Arg Gly Glu Tyr Leu Lys Gly
130 135 140
Gly Gly Val Pro Cys Leu Val Ala Val Ala Gln Asn Pro Thr Gly Asn
145 150 155 160
Ala Leu Glu Leu Ala Leu Ser Tyr Ala Ser Ala Ile Gly Gly Gly Arg
165 170 175
Ser Gly Ile Ile Glu Thr Thr Phe Arg Glu Glu Cys Glu Thr Asp Leu
180 185 190
Phe Gly Glu Gln Val Val Leu Cys Gly Gly Leu Ser Lys Leu Ile Gln
195 200 205
Tyr Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala
210 215 220
Tyr Phe Glu Cys Leu His Glu Val Lys Leu Ile Val Asp Leu Ile Tyr
225 230 235 240
Glu Gly Gly Ile Ala Asn Met Arg Tyr Ser Ile Ser Asn Thr Ala Glu
245 250 255
Tyr Gly Asp Tyr Val Thr Gly Ser Arg Ile Ile Thr Glu Ala Thr Lys
260 265 270
Ala Glu Met Lys Arg Val Leu Ala Asp Ile Gln Ser Gly Arg Phe Val
275 280 285
Arg Asp Trp Met Leu Glu Cys Lys Ala Gly Gln Pro Ser Phe Lys Ala
290 295 300
Thr Arg Arg Ile Gln Xaa Glu His Val Ile G Val Val Gly Glu Lys
305 310 315 320
Leu Arg Gly Met Met Pro Trp Ile Ser Lys Asn Lys Leu Val Asp Lys
325 330 335
Ala Arg Asn

<210> SEQ ID NO 34
<211> LENGTH: 339
<212> TYPE: PRT
<213> ORGANISM: Zymomonas mobilis
<400> SEQUENCE: 34

Met Lys Val Tyr Tyr Asp Ser Asp Ala Asp Leu Gly Leu Ile Lys Ser
1 5 10 15
Lys Lys Ile Ala Ile Leu Gly Tyr Gly Ser Gln Gly His Ala His Ala
20 25 30
Gln Asn Leu Arg Asp Ser Gly Val Ala Glu Val Ala Ile Ala Leu Arg
35 40 45
Pro Asp Ser Ala Ser Val Lys Lys Ala Gln Asp Ala Gly Phe Lys Val
50 55 60
Leu Thr Asn Ala Glu Ala Ala Lys Trp Ala Asp Ile Leu Met Ile Leu
65 70 75 80
Ala Pro Asp Glu His Gln Ala Ala Ile Tyr Ala Glu Asp Leu Lys Asp
85 90 95
Asn Leu Arg Pro Gly Ser Ala Ile Ala Phe Ala His Gly Leu Asn Ile
100 105 110
His Phe Gly Leu Ile Glu Pro Arg Lys Asp Ile Asp Val Phe Met Ile
115 120 125
Ala Pro Lys Gly Pro Gly His Thr Val Arg Ser Glu Tyr Val Arg Gly
130 135 140
Gly Gly Val Pro Cys Leu Val Ala Val Asp Gln Asp Ala Ser Gly Asn
145 150 155 160
Ala His Asp Ile Ala Leu Ala Tyr Ala Ser Gly Ile Gly Gly Gly Arg
165 170 175
Ser Gly Val Ile Glu Thr Thr Phe Arg Glu Glu Val Glu Thr Asp Leu
180 185 190
Phe Gly Glu Gln Ala Val Leu Cys Gly Gly Leu Thr Ala Leu Ile Thr
195 200 205
Ala Gly Phe Glu Thr Leu Thr Glu Ala Gly Tyr Ala Pro Glu Met Ala
210 215 220
Phe Phe Glu Cys Met His Glu Met Lys Leu Ile Val Asp Leu Ile Tyr
225 230 235 240
Glu Ala Gly Ile Ala Asn Met Arg Tyr Ser Ile Ser Asn Thr Ala Glu
245 250 255
Tyr Gly Asp Ile Val Ser Gly Pro Arg Val Ile Asn Glu Glu Ser Lys
260 265 270
Lys Ala Met Lys Ala Ile Leu Asp Asp Ile Gln Ser Gly Arg Phe Val
275 280 285
Ser Lys Phe Val Leu Asp Asn Arg Ala Gly Gln Pro Glu Leu Lys Ala
290 295 300
Ala Arg Lys Arg Met Ala Ala His Pro Ile Glu Gln Val Gly Ala Arg
305 310 315 320
Leu Arg Lys Met Met Pro Trp Ile Ala Ser Asn Lys Leu Val Asp Lys
325 330 335
Ala Arg Asn

<210> SEQ ID NO 35
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Alkalilimnicola ehrlichei
<400> SEQUENCE: 35

Met Gln Val Tyr Tyr Asp Lys Asp Ala Asp Leu Ser Ile Ile Gln Gly
1 5 10 15
Lys Lys Val Ala Val Ile Gly Tyr Gly Ser Gln Gly His Ala His Ala
20 25 30
Asn Asn Leu Lys Glu Ser Gly Val Asp Val Val Val Gly Leu Arg Glu
35 40 45
Gly Ser Ser Ser Ala Ala Lys Ala Gln Lys Ala Gly Leu Ala Val Ala
50 55 60
Ser Ile Glu Asp Ala Ala Ala Gln Ala Asp Val Val Met Ile Leu Ala
65 70 75 80
Pro Asp Glu His Gln Ala Val Ile Tyr His Asn Gln Ile Ala Pro Asn
85 90 95
Val Lys Pro Gly Ala Ala Ile Ala Phe Ala His Gly Phe Asn Ile His
100 105 110
Phe Gly Gln Ile Gln Pro Ala Ala Asp Leu Asp Val Ile Met Val Ala
115 120 125
Pro Lys Gly Pro Gly His Leu Val Arg Ser Thr Tyr Val Glu Gly Gly
130 135 140
Gly Val Pro Ser Leu Ile Ala Ile His Gln Asp Ala Thr Gly Lys Ala
145 150 155 160
Lys Asp Ile Ala Leu Ser Tyr Ala Ser Ala Asn Gly Gly Gly Arg Ala
165 170 175
Gly Val Ile Glu Thr Ser Phe Arg Glu Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Ile Thr Ser Leu Ile Gln Ala
195 200 205
Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Thr Lys Leu Ile Val Asp Leu Leu Tyr Gln
225 230 235 240
Gly Gly Ile Ala Asn Met Arg Tyr Ser Ile Ser Asn Thr Ala Glu Tyr
245 250 255
Gly Asp Phe Thr Arg Gly Pro Arg Val Ile Asn Glu Glu Ser Arg Glu
260 265 270
Ala Met Arg Glu Ile Leu Ala Glu Ile Gln Glu Gly Glu Phe Ala Arg
275 280 285
Glu Phe Val Leu Glu Asn Gln Ala Gly Cys Pro Thr Leu Thr Ala Arg
290 295 300
Arg Arg Leu Ala Ala Glu His Glu Ile Glu Val Val Gly Glu Arg Leu
305 310 315 320
Arg Gly Met Met Pro Trp Ile Asn Ala Asn Lys Leu Val Asp Lys Asp
325 330 335
Lys Asn

<210> SEQ ID NO 36
<211> LENGTH: 340
<212> TYPE: PRT
<213> ORGANISM: Campylobacter lari
<400> SEQUENCE: 36

Met Ala Val Ser Ile Tyr Tyr Asp Lys Asp Cys Asp Ile Asn Leu Ile
1 5 10 15
Lys Ser Lys Lys Val Ala Ile Ile Gly Phe Gly Ser Gln Gly His Ala
20 25 30
His Ala Met Asn Leu Arg Asp Ser Gly Val Glu Val Ile Ile Gly Leu
35 40 45
Lys Glu Gly Gly Gln Ser Trp Ala Lys Ala Gln Lys Ala Asn Phe Ile
50 55 60
Val Lys Ser Val Lys Glu Ala Thr Lys Glu Ala Asp Leu Ile Met Ile
65 70 75 80
Leu Ala Pro Asp Glu Ile Gln Ser Glu Ile Phe Asn Glu Glu Ile Lys
85 90 95
Pro Glu Leu Lys Ala Gly Lys Thr Leu Ala Phe Ala His Gly Phe Asn
100 105 110
Ile His Tyr Gly Gln Ile Val Ala Pro Lys Gly Ile Asp Val Ile Met
115 120 125
Ile Ala Pro Lys Ala Pro Gly His Thr Val Arg His Glu Phe Ser Ile
130 135 140
Gly Gly Gly Thr Pro Cys Leu Ile Ala Ile His Gln Asp Glu Ser Lys
145 150 155 160
Asn Ala Lys Asn Leu Ala Leu Ser Tyr Ala Ser Ala Ile Gly Gly Gly
165 170 175
Arg Thr Gly Ile Ile Glu Thr Thr Phe Lys Ala Glu Thr Glu Thr Asp
180 185 190
Leu Phe Gly Glu Gln Ala Val Leu Cys Gly Gly Leu Ser Ala Leu Ile
195 200 205
Gln Ala Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Glu Pro Glu Met
210 215 220
Ala Tyr Phe Glu Cys Leu His Glu Met Lys Leu Ile Val Asp Leu Ile
225 230 235 240
Tyr Gln Gly Gly Ile Ala Asp Met Arg Tyr Ser Val Ser Asn Thr Ala
245 250 255
Glu Tyr Gly Asp Tyr Ile Thr Gly Pro Lys Ile Ile Thr Lys Glu Thr
260 265 270
Lys Glu Ala Met Lys Gly Val Leu Lys Asp Ile Gln Asn Gly Ser Phe
275 280 285
Ala Lys Asp Phe Ile Leu Glu Arg Arg Ala Asn Phe Ala Arg Met His
290 295 300
Ala Glu Arg Lys Leu Met Asn Asp Ser Leu Ile Glu Lys Thr Gly Arg
305 310 315 320
Glu Leu Arg Ala Met Met Pro Trp Ile Ser Ala Lys Lys Leu Val Asp
325 330 335
Lys Asp Lys Asn
340

<210> SEQ ID NO 37
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Marinobacter aquaeolei
<400> SEQUENCE: 37

Met Gln Val Tyr Tyr Asp Lys Asp Cys Asp Leu Ser Ile Ile Gln Gly
1 5 10 15
Lys Lys Val Ala Ile Leu Gly Phe Gly Ser Gln Gly His Ala His Ala
20 25 30
Cys Asn Leu Lys Asp Ser Gly Val Asp Val Val Val Gly Leu Arg Ala
35 40 45
Gly Ser Ser Ser Ile Ala Lys Ala Glu Ala Tyr Gly Leu Lys Thr Ser
50 55 60
Asp Val Ala Ser Ala Val Ala Ser Ala Asp Val Val Met Val Leu Thr
65 70 75 80
Pro Asp Glu Phe Gln Ala Gln Leu Tyr Arg Glu Glu Ile Glu Pro Asn
85 90 95

Leu Gln Gly Ala Leu Ala Phe Ala His Gly Phe Ala Ile His
100 105
Tyr Asn Gln Ile Val Pro Arg Asp Leu Val Ile Met Val Ala
115 120 125
Pro Ala Pro Gly Thr Val Arg Thr Glu Phe Thr Gly Gly
130 135 140
Ile Pro Asp Leu Ile Ala Ile Phe Gln Ala Ser Gly Asn Ala
150 155 160
Asn Val Ala Leu Ala Ser Gly Ile Gly Gly Gly Arg Thr
170 175
Ile Ile Glu Thr Phe Asp Glu Glu Thr Asp Leu Phe
185 190
Glu Gln Ala Val Leu Gly Gly Ala Val Glu Leu Val Ala
205
Phe Glu Thr Leu Glu Ala Gly Ala Pro Glu Met Ala
215 220
Glu Leu His Glu Leu Leu Ile Val Asp Leu Met Glu
230 235 240
Gly Ile Ala Asn Met Asn Ser Ile Ser Asn Asn Ala Glu
250 255
Glu Val Thr Gly Pro Glu Val Ile Asn Glu Gln Ser Arg Glu
265 270
Met Arg Asn Ala Leu Arg Ile Gln Ser Gly Glu Ala
275 280 285
Met Phe Ile Ser Glu Gly Ala Leu Asn Pro Ser Met Thr Ala Arg
290 295 300
Arg Arg Gln Asn Ala Ala His Glu Ile Glu Thr Val Gly Glu Leu
305 310 315 320
Arg Ser Met Met Pro Trp Ile Ser Ala Asn Lys Ile Val Asp Asp
325 330 335
Lys Asn

<210> SEQ ID NO 38
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Psychrobacter arcticus
<400> SEQUENCE: 38

Met Asn Val Tyr Tyr Asp Asp Asp Leu Ser Ile Val Gln Gly
1 5 15
Lys Lys Val Ala Ile Ile Gly Gly Ser Gln Gly His Ala His Ala
20 25 30
Leu Asn Leu Gln Asp Ser Asn Val Asp Val Thr Val Gly Leu Arg Ala
35 40 45
Asp Ser Gly Ser Trp Lys Ala Glu Asn Ala Gly Leu Val Ala
50 55 60
Glu Val Glu Glu Ala Val Ala Ala Asp Ile Ile Met Ile Leu Thr
65 70 75
Pro Asp Glu Phe Gln Lys Glu Leu Asn Asp Val Ile Glu Pro Asn
85 90 95
Ile Gln Gly Ala Thr Leu Ala Phe Ala His Gly Phe Ala Ile His
100 105 110
Tyr Asn Gln Val Ile Pro Arg Ser Asp Leu Asp Val Ile Met Val Ala
115 120 125
Pro Ala Pro Gly His Thr Val Arg Ser Glu Phe Ala Gly Gly
130 135 140
Ile Pro Asp Leu Ile Ala Ile Gln Asp Ala Ser Gly Gln Ala
150 155 160
Gln Leu Ala Leu Ser Ala Ala Gly Val Gly Gly Gly Arg Ser
170 175
Ile Ile Glu Thr Thr Phe Asp Glu Thr Glu Thr Asp Leu Phe
185 190
Glu Gln Ala Val Leu Gly Gly Ala Val Glu Leu Val Met
200 205
Phe Glu Thr Leu Thr Glu Ala Gly Ala Pro Glu Met Ala Tyr
215 220
Glu Leu His Glu Leu Leu Ile Val Asp Leu Met Glu
230 235 240
Gly Ile Ala Asp Met Asn Ser Ile Ser Asn Asn Ala Glu Tyr
250 255
Glu Val Thr Gly Pro Glu Val Ile Asn Glu Gln Ser Arg Glu
265 270
Met Arg Asn Ala Leu Arg Ile Gln Ser Gly Glu Ala Lys
275 280 285
Met Phe Ile Ser Glu Gly Ala Thr Asn Pro Ser Met Thr Ala Arg
290 295 300
Arg Arg Asn Asn Ala Glu His Gln Ile Glu Ile Thr Gly Ala Leu
305 310 315
Arg Gly Met Met Pro Trp Ile Gly Gly Asn Lys Ile Ile Asp Asp
325 330 335
Lys Asn

<210> SEQ ID NO 39
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Hahella chejuensis
<400> SEQUENCE: 39

Met Gln Val Tyr Tyr Asp Asp Asp Leu Ser Ile Ile Gln Gly
1 5 15
Lys Lys Val Ala Ile Ile Gly Gly Ser Gln Gly His Ala His Ala
20 25 30
Asn Asn Leu Lys Asp Ser Gly Val Asp Val Cys Val Gly Leu Arg Lys
35 40 45
Gly Ser Gly Ser Trp Ala Ala Glu Asn Ala Gly Leu Ala Val Lys
50 55 60
Glu Val Ala Glu Ala Val Ala Gly Ala Asp Val Val Met Ile Leu Thr
65 70 75
Pro Asp Glu Phe Gln Ala Gln Leu Ser Glu Ile Glu Pro Asn
85 90 95
Leu Ser Gly Ala Thr Leu Ala Phe Ala His Gly Phe Ser Ile His
100 105 110
Tyr Asn Gln Ile Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Ile Phe Gln Asp Ala Ser Gly Ser Ala
145 150 155 160
Lys Asp Leu Ala Leu Ser Tyr Ala Ser Gly Val Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Ala Val Glu Leu Val Lys Ala
195 200 205
Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Tyr Glu
225 230 235 240
Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Asn Ala Glu Tyr
245 250 255
Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Asp Gln Ser Arg Ala
260 265 270
Ala Met Arg Asn Ala Leu Lys Arg Ile Gln Asp Gly Glu Tyr Ala Lys
275 280 285
Met Phe Ile Ala Glu Gly Ala His Asn Tyr Pro Ser Met Thr Ala Tyr
290 295 300
Arg Arg Asn Asn Ala Ala His Pro Ile Glu Gln Val Gly Glu Lys Leu
305 310 315 320
Arg Ser Met Met Pro Trp Ile Ala Ser Asn Lys Ile Val Asp Lys Ser
325 330 335
Lys Asn

<210> SEQ ID NO 40
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Thiobacillus denitrificans
<400> SEQUENCE: 40

Met Lys Val Tyr Tyr Asp Lys Asp Ala Asp Leu Ser Leu Ile Lys Gln
1 5 10 15
Arg Lys Val Ala Ile Val Gly Tyr Gly Ser Gln Gly His Ala His Ala
20 25 30
Asn Asn Leu Lys Asp Ser Gly Val Asp Val Thr Val Ala Leu Arg Pro
35 40 45
Gly Ser Ala Ser Ala Lys Lys Ala Glu Asn Ala Gly Leu Thr Val Lys
50 55 60
Ser Val Pro Glu Ala Val Ala Gly Ala Asp Leu Val Met Ile Leu Thr
65 70 75 80
Pro Asp Glu Phe Gln Ser Arg Leu Tyr Arg Asp Glu Ile Glu Pro Asn
85 90 95
Ile Lys Gln Gly Ala Thr Leu Ala Phe Ala His Gly Phe Ser Ile His
100 105 110
Tyr Asn Gln Val Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Ile Tyr Gln Asp Ala Ser Gly Lys Ala
145 150 155 160
Lys Glu Thr Ala Leu Ser Tyr Ala Ser Ala Ile Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Ala Val Glu Leu Val Lys Ala
195 200 205
Gly Phe Asp Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Tyr Glu
225 230 235 240
Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Asn Ala Glu Tyr
245 250 255
Gly Glu Tyr Val Thr Gly Val Lys Val Ile Asn Glu Gln Ser Arg Ala
260 265 270
Ala Met Lys Glu Cys Leu Ala Asn Ile Gln Asn Gly Ala Tyr Ala Lys
275 280 285
Arg Phe Ile Leu Glu Gly Gln Ala Asn Tyr Pro Glu Met Thr Ala Trp
290 295 300
Arg Arg Asn Asn Ala Ala His Gln Ile Glu Val Val Gly Ala Lys Leu
305 310 315 320
Arg Ser Met Met Pro Trp Ile Ala Ala Asn Lys Leu Val Asp His Ser
325 330 335
Lys Asn

<210> SEQ ID NO 41
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Azotobacter vinelandii
<400> SEQUENCE: 41

Met Lys Val Tyr Tyr Asp Lys Asp Cys Asp Leu Ser Ile Ile Gln Ser
1 5 10 15
Lys Lys Val Ala Ile Ile Gly Tyr Gly Ser Gln Gly His Ala His Ala
20 25 30
Cys Asn Leu Lys Asp Ser Gly Val Asp Val Tyr Val Gly Leu Arg Ala
35 40 45
Gly Ser Ala Ser Val Ala Lys Ala Glu Ala His Gly Leu Thr Val Lys
50 55 60
Ser Val Lys Asp Ala Val Ala Ala Ala Asp Val Val Met Ile Leu Thr
65 70 75 80
Pro Asp Glu Phe Gln Gly Arg Leu Tyr Lys Asp Glu Ile Glu Pro Asn
85 90 95
Leu Lys Lys Gly Ala Thr Leu Ala Phe Ala His Gly Phe Ser Ile His
100 105 110
Tyr Asn Gln Val Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Arg Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Val Tyr Gln Asp Ala Ser Gly Asn Ala
145 150 155 160
Lys Asn Leu Ala Leu Ser Tyr Ala Cys Gly Val Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Cys Val Glu Leu Val Lys Ala
195 200 205
Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Phe Glu
225 230 235 240
Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Asn Ala Glu Tyr
245 250 255
Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Glu Gln Ser Arg Gln
260 265 270
Ala Met Arg Asn Ala Leu Lys Arg Ile Gln Asp Gly Glu Tyr Ala Lys
275 280 285
Met Phe Ile Thr Glu Gly Ala Ala Asn Tyr Pro Ser Met Thr Ala Tyr
290 295 300
Arg Arg Asn Asn Ala Ala His Gln Ile Glu Val Val Gly Glu Lys Leu
305 310 315 320
Arg Thr Met Met Pro Trp Ile Ala Ala Asn Lys Ile Val Asp Lys Thr
325 330 335
Lys Asn

<210> SEQ ID NO 42
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas syringae
<400> SEQUENCE: 42

Met Lys Val Phe Tyr Asp Lys Asp Cys Asp Leu Ser Ile Ile Gln Gly
1 5 10 15
Lys Lys Val Ala Ile Ile Gly Tyr Gly Ser Gln Gly His Ala Gln Ala
20 25 30
Cys Asn Leu Lys Asp Ser Gly Val Asp Val Thr Val Gly Leu Arg Lys
35 40 45
Gly Ser Ala Thr Val Ala Lys Ala Glu Ala His Gly Leu Lys Val Thr
50 55 60
Asp Val Ala Ser Ala Val Ala Ala Ala Asp Leu Val Met Ile Leu Thr
65 70 75 80
Pro Asp Glu Phe Gln Ser Gln Leu Tyr Lys Asn Glu Val Glu Pro Asn
85 90 95
Leu Lys Lys Gly Ala Thr Leu Ala Phe Ser His Gly Phe Ala Ile His
100 105 110
Tyr Asn Gln Val Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Thr Glu Phe Val Lys Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Val Tyr Gln Asp Ala Ser Gly Asn Ala
145 150 155 160
Lys Asn Val Ala Leu Ser Tyr Ala Ser Gly Val Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Thr Val Glu Leu Val Lys Ala
195 200 205
Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Tyr Glu
225 230 235 240
Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Asn Ala Glu Tyr
245 250 255
Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Gln
260 265 270
Ala Met Arg Asn Ala Leu Lys Arg Ile Gln Asp Gly Glu Tyr Ala Lys
275 280 285
Met Phe Ile Ser Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys
290 295 300
Arg Arg Asn Asn Ala Ala His Gly Ile Glu Ile Ile Gly Glu Lys Leu
305 310 315 320
Arg Ser Met Met Pro Trp Ile Ala Ala Asn Lys Ile Val Asp Lys Asp
325 330 335
Lys Asn

<210> SEQ ID NO 43
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas syringae
<400> SEQUENCE: 43

Met Lys Val Phe Tyr Asp Lys Asp Cys Asp Leu Ser Ile Ile Gln Gly
1 5 10 15
Lys Lys Val Ala Ile Ile Gly Tyr Gly Ser Gln Gly His Ala Gln Ala
20 25 30
Cys Asn Leu Lys Asp Ser Gly Val Asp Val Thr Val Gly Leu Arg Lys
35 40 45
Gly Ser Ala Thr Val Ala Lys Ala Glu Ala His Gly Leu Lys Val Thr
50 55 60
Asp Val Ala Ser Ala Val Ala Ala Ala Asp Leu Val Met Ile Leu Thr
65 70 75 80
Pro Asp Glu Phe Gln Ser Gln Leu Tyr Lys Asn Glu Val Glu Pro Asn
85 90 95
Leu Lys Lys Gly Ala Thr Leu Ala Phe Ser His Gly Phe Ala Ile His
100 105 110
Tyr Asn Gln Val Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Thr Glu Phe Val Lys Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Val Tyr Gln Asp Ala Ser Gly Asn Ala
145 150 155 160
Lys Asn Val Ala Leu Ser Tyr Ala Ser Gly Val Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Thr Val Glu Leu Val Lys Ala
195 200 205
Gly Phe Glu Thr Leu Val Glu Ala Gly Tyr Ala Pro Glu Met Ala Tyr
210 215 220
Phe Glu Cys Leu His Glu Leu Lys Leu Ile Val Asp Leu Met Tyr Glu
225 230 235 240
Gly Gly Ile Ala Asn Met Asn Tyr Ser Ile Ser Asn Asn Ala Glu Tyr
245 250 255
Gly Glu Tyr Val Thr Gly Pro Glu Val Ile Asn Ala Glu Ser Arg Gln
260 265 270
Ala Met Arg Asn Ala Leu Lys Arg Ile Gln Asp Gly Glu Tyr Ala Lys
275 280 285
Met Phe Ile Thr Glu Gly Ala Thr Gly Tyr Pro Ser Met Thr Ala Lys
290 295 300
Arg Arg Asn Asn Ala Glu His Gly Ile Glu Val Ile Gly Glu Lys Leu
305 310 315 320
Arg Ser Met Met Pro Trp Ile Ala Ala Asn Lys Ile Val Asp Lys Asp
325 330 335
Lys Asn

<210> SEQ ID NO 44
<211> LENGTH: 338
<212> TYPE: PRT
<213> ORGANISM: Pseudomonas putida
<400> SEQUENCE: 44

Met Lys Val Phe Tyr Asp Lys Asp Cys Asp Leu Ser Ile Ile Gln Gly
1 5 10 15
Lys Lys Val Ala Ile Ile Gly Tyr Gly Ser Gln Gly His Ala Gln Ala
20 25 30
Cys Asn Leu Lys Asp Ser Gly Val Asp Val Thr Val Gly Leu Arg Lys
35 40 45
Gly Ser Ala Thr Val Ala Lys Ala Glu Ala His Gly Leu Lys Val Ala
50 55 60
Asp Val Ala Thr Ala Val Ala Ala Ala Asp Leu Val Met Ile Leu Thr
65 70 75 80
Pro Asp Glu Phe Gln Gly Ala Leu Tyr Lys Asn Glu Ile Glu Pro Asn
85 90 95
Ile Lys Lys Gly Ala Thr Leu Ala Phe Ser His Gly Phe Ser Ile His
100 105 110
Tyr Asn Gln Val Val Pro Arg Ala Asp Leu Asp Val Ile Met Ile Ala
115 120 125
Pro Lys Ala Pro Gly His Thr Val Arg Ser Glu Phe Val Lys Gly Gly
130 135 140
Gly Ile Pro Asp Leu Ile Ala Ile Tyr Gln Asp Ala Ser Gly Asn Ala
145 150 155 160
Lys Asn Val Ala Leu Ser Tyr Ala Ser Gly Val Gly Gly Gly Arg Thr
165 170 175
Gly Ile Ile Glu Thr Thr Phe Lys Asp Glu Thr Glu Thr Asp Leu Phe
180 185 190
Gly Glu Gln Ala Val Leu Cys Gly Gly Thr Val Glu Leu Val Lys Ala
195 200 205